
Prometheus - Alertmanager - OpsGenie Integration (+ Grafana)

Test environment

Prometheus(container)

  • Image: prom/prometheus
  • Version: 2.36.0

Alertmanager(container)

  • Image: prom/alertmanager
  • Version: 0.24.0

OpsGenie

  • Team Name: MSP_Alert_Test
  • Team Member: sker
  • Integrations: Prometheus, Slack

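1. Create a team in OpsGenie.

  • A team is made up of accounts that already exist in OpsGenie.

2. In the team menu, add Prometheus and Slack from the "Integrations" tab on the left.

  • For Prometheus, copy the API key now; it goes into the Alertmanager config file.
  • For Slack, click "Add to Slack", sign in through the browser, and pick the channel where alerts should be posted.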

3. Insert the Prometheus integration API key into the Alertmanager config.

# alertmanager-config.yml
global:
  opsgenie_api_key: <paste the API key here>
route:
  group_by: [alertname]
  group_wait: 15s
  group_interval: 1m
  receiver: opsgenie-alert

receivers:
- name: 'opsgenie-alert'
  opsgenie_configs:
    - send_resolved: true
      message: '{{ range .Alerts }}{{ .Annotations.title }}{{ end }}'
      description: '{{ range .Alerts }}{{ .Annotations.text }}{{ end }}'
      priority: '{{ range .Alerts }}{{ .Labels.priority }}{{ end }}'
      responders:
        - name: MSP_Alert_Test
          type: team
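The `{{ range .Alerts }}...{{ end }}` templates above loop over every alert in one notification and join the field with no separator. Because `group_by: [alertname]` can put several alerts into one notification (the CPU rule shown later, for example, produces one series per CPU), `priority` may render as `P1P1`, which is not one of OpsGenie's `P1` to `P5` values. The Python sketch below only imitates that template behavior; the dict layout and the helper names `render_range` and `common_labels` are hypothetical. It also imitates Alertmanager's `.CommonLabels` (the labels shared by every alert in the group), which points to `priority: '{{ .CommonLabels.priority }}'` as one possible alternative.

```python
from typing import Dict, List

# Hypothetical in-memory shape of one alert; Alertmanager exposes the same
# data to templates as .Labels and .Annotations.
Alert = Dict[str, Dict[str, str]]

VALID_PRIORITIES = {"P1", "P2", "P3", "P4", "P5"}


def render_range(alerts: List[Alert], section: str, key: str) -> str:
    """Mimic '{{ range .Alerts }}{{ .<Section>.<key> }}{{ end }}'.

    Each alert's value is appended with no separator, like the Go template.
    """
    return "".join(alert[section][key] for alert in alerts)


def common_labels(alerts: List[Alert]) -> Dict[str, str]:
    """Mimic .CommonLabels: label pairs shared by every alert in the group."""
    if not alerts:
        return {}
    shared = dict(alerts[0]["labels"])
    for alert in alerts[1:]:
        shared = {k: v for k, v in shared.items() if alert["labels"].get(k) == v}
    return shared


# Two CPU alerts that end up in one notification because they share alertname.
group = [
    {"labels": {"alertname": "CPU 80%", "cpu": "0", "priority": "P1"},
     "annotations": {"title": "[WARNING] monitoring-ec2 CPU 80%"}},
    {"labels": {"alertname": "CPU 80%", "cpu": "1", "priority": "P1"},
     "annotations": {"title": "[WARNING] monitoring-ec2 CPU 80%"}},
]

ranged = render_range(group, "labels", "priority")
common = common_labels(group)["priority"]
print(ranged, ranged in VALID_PRIORITIES)   # P1P1 False
print(common, common in VALID_PRIORITIES)   # P1 True
```

With only one alert per notification both forms give the same result; the difference shows up once alerts are grouped.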

Result

  • Slack
  • OpsGenie

! That completes the basic alert setup between Prometheus and OpsGenie.

  • Below are the full Prometheus and Alertmanager configuration files.

  • prometheus.yml
global:
  scrape_interval:     15s  # how often targets are scraped
  evaluation_interval: 15s  # how often rules are evaluated

  external_labels:
    monitor: 'monitoring-test'

# Prometheus rule files used for alerting
rule_files:
  - /etc/prometheus/alert-rules.yml

# Alertmanager Config
alerting:
  alertmanagers:
  - scheme: http
    static_configs:
      - targets: ['alertmanager:9093']  # with docker-compose, the host is the compose service name

# exporter endpoints to scrape (job_name becomes the job label)
scrape_configs:
  - job_name: 'monitoring-ec2'
    scrape_interval: 5s
    static_configs:
      - targets: ['13.209.74.166:9100']
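The `monitoring-ec2` job scrapes a node_exporter on port 9100, and the alert rule below reads its `node_cpu_seconds_total` counter. node_exporter exposes that counter once per `cpu` and `mode` label pair, so every core is a separate series. The sketch below parses a made-up sample of the Prometheus text exposition format to show those labels; it is a simplified illustration with hypothetical helper names, not a complete parser (the official Python client ships one as `prometheus_client.parser.text_string_to_metric_families`).

```python
import re
from typing import Dict, List, Tuple

# Made-up sample in the text exposition format, shaped like node_exporter's
# /metrics output (the values are illustrative only).
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
node_cpu_seconds_total{cpu="1",mode="idle"} 12001
node_cpu_seconds_total{cpu="1",mode="user"} 310.25
"""

# metric name, optional {label="value",...} block, then the sample value.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str, metric: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs for one metric, skipping # comment lines.

    Simplified on purpose: label values are not unescaped and any trailing
    timestamp is ignored.
    """
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        samples.append((labels, float(match.group(3))))
    return samples


idle_by_cpu = [
    (labels["cpu"], value)
    for labels, value in parse_samples(SAMPLE, "node_cpu_seconds_total")
    if labels.get("mode") == "idle"
]
print(idle_by_cpu)  # [('0', 12345.67), ('1', 12001.0)]
```

The `cpu` label is what lets an unaggregated CPU rule produce one alert per core.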


  • alert-rules.yml
groups:
- name: Alerts
  rules:
  - alert: CPU 80%
    # Per-CPU usage (%) from the instantaneous rate of the idle counter
    # (last two samples in a 15s window); alert if it stays >= 80% for 1 minute
    expr: 100 - ((irate(node_cpu_seconds_total{mode="idle"}[15s])) * 100) >= 80
    for: 1m
    labels:
      severity: critical
      priority: P1
    annotations:
      title: "[WARNING] {{ $labels.job }} CPU 80%"
      text: |
        Name: {{ $labels.job }}
        IP: {{ $labels.instance }}
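To make the rule's arithmetic concrete: `irate()` uses only the last two samples inside the `[15s]` window (with `scrape_interval: 5s` the window holds about three), divides the increase by the time between them, and treats a drop in the counter as a reset. For the idle counter that rate is the fraction of each second the CPU spent idle, so `100 - rate * 100` is the busy percentage. `for: 1m` then keeps the alert pending until the condition has been true at every evaluation (every 15s here) for at least a minute. The Python sketch below is an illustrative re-implementation under those assumptions, with hypothetical function names and made-up sample values; it is not Prometheus code.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, counter value); the values used below are made up.
Sample = Tuple[float, float]


def irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples, like PromQL irate().

    A drop in value is treated as a counter reset, so the increase is
    taken to be the new value itself.
    """
    if len(samples) < 2:
        raise ValueError("irate needs at least two samples in the window")
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    increase = v2 - v1 if v2 >= v1 else v2
    return increase / (t2 - t1)


def cpu_usage_percent(idle_samples: List[Sample]) -> float:
    """100 - irate(idle) * 100: the share of time one CPU was not idle."""
    return 100 - irate(idle_samples) * 100


def fires_at(evaluations: List[Tuple[float, bool]], hold: float) -> Optional[float]:
    """First evaluation time at which the condition has held for `hold` seconds.

    Mimics the pending -> firing transition of `for:`; a false evaluation
    resets the clock. Returns None if the alert never fires.
    """
    active_since: Optional[float] = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= hold:
            return ts
    return None


# Idle counter for one CPU, scraped every 5s; only the last two samples count.
idle = [(0.0, 1000.0), (5.0, 1000.25), (10.0, 1000.75)]
print(cpu_usage_percent(idle))  # 90.0 -> above the 80% threshold

# Evaluated every 15s and true from t=0: pending at 0, firing at 60.
print(fires_at([(t, True) for t in range(0, 75, 15)], hold=60))  # 60
```

Because the expression has no aggregation, each CPU is evaluated on its own. If one alert per machine is preferred, a common variant averages first, for example `100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[15s])) * 100 >= 80`.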

  • alertmanager-config.yml
global:
  opsgenie_api_key: <secret>
route:
  group_by: [alertname]
  group_wait: 15s
  group_interval: 1m
  receiver: opsgenie-alert

receivers:
- name: 'opsgenie-alert'
  opsgenie_configs:
    - send_resolved: true                                                           # default: true
      message: '{{ range .Alerts }}{{ .Annotations.title }}{{ end }}'               # default: '{{ template "opsgenie.default.message" . }}'
      description: '{{ range .Alerts }}{{ .Annotations.text }}{{ end }}'            # default: '{{ template "opsgenie.default.description" . }}'
      priority: '{{ range .Alerts }}{{ .Labels.priority }}{{ end }}'                # P1, P2, P3, P4, P5
      responders:
        - name: MSP_Alert_Test                                                      # team name; use id: for the team's OpsGenie ID
          type: team
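To exercise the Alertmanager to OpsGenie (and Slack) path without waiting for a real CPU spike, one option is to push a hand-made alert into Alertmanager's v2 HTTP API, `POST /api/v2/alerts`, which accepts a JSON list of alerts with `labels`, `annotations`, `startsAt` and `endsAt`. The Python sketch below only builds the request; it assumes Alertmanager is reachable at `http://localhost:9093` (the port published in the compose file), and the helper names and label values are illustrative copies of the CPU rule, not something the original setup defines.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def build_test_alert(job: str, instance: str, priority: str = "P1") -> List[Dict]:
    """A one-alert list mirroring the CPU rule's labels and annotations.

    The values are illustrative; the v2 API only requires `labels`.
    """
    now = datetime.now(timezone.utc)
    return [{
        "labels": {
            "alertname": "CPU 80%",
            "job": job,
            "instance": instance,
            "priority": priority,  # read by the opsgenie_configs priority template
        },
        "annotations": {
            "title": f"[WARNING] {job} CPU 80%",       # becomes the OpsGenie message
            "text": f"Name: {job}\nIP: {instance}\n",  # becomes the description
        },
        "startsAt": now.isoformat(),
        # Once endsAt has passed, Alertmanager treats the alert as resolved.
        "endsAt": (now + timedelta(minutes=5)).isoformat(),
    }]


def make_request(base_url: str, alerts: List[Dict]) -> urllib.request.Request:
    """Build (but do not send) a POST to Alertmanager's /api/v2/alerts."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would then be `urllib.request.urlopen(make_request("http://localhost:9093", build_test_alert("monitoring-ec2", "10.0.0.1:9100")), timeout=5)`. After Alertmanager accepts the alert, `group_wait` (15s here) applies before the OpsGenie notification goes out, and because `endsAt` is five minutes ahead, the alert later resolves on its own, which also exercises `send_resolved: true`.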

!! Below is the docker-compose file that runs the services.

It also includes the grafana and mysql images.

Services: prometheus, alertmanager, grafana, mysql (for grafana)


  • docker-compose.yaml
version: "3"
services:

# Prometheus (Port: 9090, Host: prometheus)
  prometheus:
    image: prom/prometheus:v2.36.0
    container_name: prometheus
    volumes:
      - /Users/zlcus/tmp/prometheus/dir/conf_prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - /Users/zlcus/tmp/prometheus/dir/conf_prometheus/alert-rules.yml:/etc/prometheus/alert-rules.yml:ro
      - /Users/zlcus/tmp/prometheus/dir/conf_prometheus/data:/data:rw
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/data'           # where Prometheus stores metrics
      - '--storage.tsdb.retention.time=37d'   # data retention period
    restart: always
    ports:
      - "0.0.0.0:9090:9090"

# Alert Manager (Port: 9093, Host: alertmanager)
  alertmanager:
    image: prom/alertmanager:v0.24.0
    container_name: alertmanager
    volumes:
      - /Users/zlcus/tmp/prometheus/dir/conf_alertmanager/config.yml:/etc/alertmanager/config.yml:ro
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
    restart: always
    ports:
      - "9093:9093"

# MySQL (Port: 3306, Host: grafanadb)
  grafanadb:
    image: mysql:5.6
    platform: linux/x86_64
    container_name: mysql
    ports:
      - "3306:3306"
    volumes:
      - /Users/zlcus/tmp/prometheus/dir/conf_mysql/data:/var/lib/mysql:rw
      - /Users/zlcus/tmp/prometheus/dir/conf_mysql/my.cnf:/etc/mysql/my.cnf:ro
      - /Users/zlcus/tmp/prometheus/dir/conf_mysql/mysqld.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf:ro
    environment:
      - MYSQL_DATABASE=grafana
      - MYSQL_USER=grafanaadmin
      - MYSQL_PASSWORD=grafanaAdmin1!
      - MYSQL_RANDOM_ROOT_PASSWORD=1
    restart: always

# Grafana (Port: 8000, Host: grafana)
  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - "0.0.0.0:8000:3000"
    volumes:
      # - /Users/zlcus/tmp/prometheus/dir/conf_grafana/ssl:/etc/ssl/certs
      - /Users/zlcus/tmp/prometheus/dir/conf_grafana/grafana.ini:/config_files/grafana.ini:ro
      - /Users/zlcus/tmp/prometheus/dir/conf_grafana/home.json:/usr/share/grafana/conf/provisioning/dashboards/home.json:ro
      - /Users/zlcus/tmp/prometheus/dir/conf_grafana/datasource.yml:/usr/share/grafana/conf/provisioning/datasources/datasource.yml:ro
    environment:
      - GF_PATHS_CONFIG=/config_files/grafana.ini
    restart: always
    depends_on:
      - grafanadb
      - prometheus
    # Run as a specific user (https://grafana.com/docs/grafana/latest/installation/docker/#migrate-to-v51-or-later)
    # user: '472'
    # Grant extended privileges to the container
    privileged: true
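After `docker-compose up -d`, a quick way to confirm the stack is up is to poll each service's health endpoint through the published ports: Prometheus and Alertmanager both serve `/-/ready`, and Grafana serves `/api/health`. The Python sketch below assumes the stack runs on `localhost` with the port mappings above (9090, 9093, and 8000 for Grafana); the endpoint table and function names are illustrative, and MySQL is left out because it does not speak HTTP.

```python
import urllib.request
from typing import Dict

# Host-side URLs, assuming the compose port mappings on localhost.
HEALTH_ENDPOINTS: Dict[str, str] = {
    "prometheus": "http://localhost:9090/-/ready",
    "alertmanager": "http://localhost:9093/-/ready",
    "grafana": "http://localhost:8000/api/health",
}


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """True if the endpoint answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def check_stack(endpoints: Dict[str, str]) -> Dict[str, bool]:
    """Map each service name to whether its health endpoint responded."""
    return {name: is_healthy(url) for name, url in endpoints.items()}
```

Once the containers are running, `print(check_stack(HEALTH_ENDPOINTS))` should report `True` for all three services.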