Module 5: Alerting and Notifications
Module Objectives
- Configure Alertmanager
- Create Prometheus alerting rules
- Implement routing and silencing
- Integrate notification channels
- Apply alerting best practices
Duration: 2 hours
1. Alerting Architecture
1.1 Overview
Prometheus evaluates alerting rules against the metrics it stores. When a rule's expression keeps returning results for long enough, the alert starts firing and is pushed to Alertmanager, which deduplicates, groups, routes, silences and inhibits alerts before sending notifications to the configured receivers (email, Slack, PagerDuty, webhooks, and so on). A minimal Prometheus-side configuration is sketched below.
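As a reference for the rest of the module, here is a minimal sketch of the Prometheus side of that chain; the file paths and the alertmanager:9093 address are assumptions to adapt to your environment:

# prometheus.yml (sketch)
rule_files:
  - /etc/prometheus/rules/*.yml           # alerting rules evaluated by Prometheus

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']  # where firing alerts are pushed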
1.2 Alert States
An alert goes through three states: inactive (the expression returns nothing), pending (the expression is true but the 'for' duration has not yet elapsed), and firing (the expression has been true for at least the 'for' duration and the alert is sent to Alertmanager). The examples below show how to inspect these states.
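To observe these states in practice, you can query the built-in ALERTS metric or the HTTP API (localhost:9090 is an assumption):

# PromQL: one series per active alert, labelled with its state
ALERTS{alertstate="pending"}
ALERTS{alertstate="firing"}

# HTTP API: list active alerts and their state
curl -s http://localhost:9090/api/v1/alerts | jq '.data.alerts[] | {alert: .labels.alertname, state: .state}'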
2. Prometheus Alerting Rules
2.1 Basic Syntax
# /etc/prometheus/rules/alerts.yml
groups:
  - name: infrastructure
    interval: 30s  # evaluation interval (optional)
    rules:
      # Simple alert
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          team: infrastructure
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | printf \"%.1f\" }}% on {{ $labels.instance }}"

      # Same metric, several thresholds
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 85% on {{ $labels.instance }}"
          description: "Memory usage: {{ $value | printf \"%.1f\" }}%"

      - alert: CriticalMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Critical memory usage on {{ $labels.instance }}"
          description: "Memory usage: {{ $value | printf \"%.1f\" }}%"
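To verify that a rule fires the way you expect, promtool can replay synthetic series against it. Below is a minimal sketch of a unit test for HighCPUUsage; the tests/ path, the relative rule-file path and the sample values are assumptions (run it with: promtool test rules tests/alerts_test.yml):

# tests/alerts_test.yml (hypothetical path)
rule_files:
  - ../alerts.yml
evaluation_interval: 30s
tests:
  - interval: 1m
    input_series:
      # the idle counter grows by 6s per minute -> ~10% idle -> ~90% CPU usage
      - series: 'node_cpu_seconds_total{instance="server1:9100", mode="idle"}'
        values: '0+6x20'
    alert_rule_test:
      - eval_time: 15m
        alertname: HighCPUUsage
        exp_alerts:
          - exp_labels:
              instance: server1:9100
              severity: warning
              team: infrastructure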
2.2 Infrastructure Alerts
groups:
  - name: node-exporter
    rules:
      # Instance down
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.job }} instance {{ $labels.instance }} has been down for more than 1 minute"

      # Disk space
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk {{ $labels.mountpoint }} has {{ $value | printf \"%.1f\" }}% available"

      - alert: DiskSpaceCritical
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Critical disk space on {{ $labels.instance }}"
          description: "Disk {{ $labels.mountpoint }} has only {{ $value | printf \"%.1f\" }}% available"

      # Load average above the number of CPU cores (match on instance so the comparison works)
      - alert: HighSystemLoad
        expr: node_load15 > on(instance) count by(instance) (node_cpu_seconds_total{mode="idle"})
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High system load on {{ $labels.instance }}"
          description: "15-minute load average is {{ $value }}"

      # Swap
      - alert: HighSwapUsage
        expr: (1 - node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) * 100 > 50
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High swap usage on {{ $labels.instance }}"

      # Network errors
      - alert: NetworkErrors
        expr: rate(node_network_receive_errs_total[5m]) > 0 or rate(node_network_transmit_errs_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Network errors on {{ $labels.instance }}"
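After adding or editing rule files, Prometheus must reload its configuration before the new rules are evaluated. Two common ways to trigger a reload (the lifecycle endpoint only works if Prometheus was started with --web.enable-lifecycle, as in the Docker Compose file at the end of this module):

# Reload via the lifecycle endpoint
curl -X POST http://localhost:9090/-/reload

# Or, if Prometheus runs directly on the host, send it a SIGHUP
kill -HUP $(pgrep prometheus)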
2.3 Application Alerts
groups:
  - name: application
    rules:
      # HTTP error rate
      - alert: HighHTTPErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
            /
          sum(rate(http_requests_total[5m])) by (service)
            * 100 > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High HTTP error rate for {{ $labels.service }}"
          description: "Error rate is {{ $value | printf \"%.1f\" }}%"

      # P95 latency
      - alert: HighLatencyP95
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
          ) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High P95 latency for {{ $labels.service }}"
          description: "P95 latency is {{ $value | printf \"%.2f\" }}s"

      # Saturation (queue depth)
      - alert: HighQueueDepth
        expr: avg_over_time(queue_depth[5m]) > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High queue depth"
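Before turning an expression like the error-rate ratio above into a rule, it is worth evaluating it ad hoc against your Prometheus server (localhost:9090 is an assumption):

# Evaluate the alert expression once, without creating a rule
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(http_requests_total{status=~"5.."}[5m])) by (service) / sum(rate(http_requests_total[5m])) by (service) * 100'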
2.4 Blackbox Alerts
groups:
  - name: blackbox
    rules:
      - alert: EndpointDown
        expr: probe_success == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Endpoint {{ $labels.instance }} is down"
          description: "Probe failed for {{ $labels.instance }}"

      - alert: SSLCertExpiringSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon"
          description: "Certificate for {{ $labels.instance }} expires in {{ $value | printf \"%.0f\" }} days"

      - alert: SSLCertExpiryCritical
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 7
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "SSL certificate expiring very soon!"
          description: "Certificate for {{ $labels.instance }} expires in {{ $value | printf \"%.0f\" }} days"

      - alert: SlowResponse
        expr: probe_duration_seconds > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response from {{ $labels.instance }}"
          description: "Response time is {{ $value | printf \"%.2f\" }}s"
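These probe_* metrics only exist if Prometheus scrapes a blackbox_exporter. Here is a minimal sketch of the corresponding scrape job; the exporter address (blackbox:9115), the http_2xx module name and the target URLs are assumptions:

scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://www.example.com
          - https://api.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target      # the URL to probe
      - source_labels: [__param_target]
        target_label: instance            # keep the URL as the instance label
      - target_label: __address__
        replacement: blackbox:9115        # scrape the exporter, not the URL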
3. Alertmanager Configuration
3.1 Basic Structure
# alertmanager.yml
global:
  # Global settings
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'password'
  # Global Slack webhook
  slack_api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'

# Custom templates
templates:
  - '/etc/alertmanager/templates/*.tmpl'

# Routing tree
route:
  # Default route
  receiver: 'default-receiver'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  # Child routes
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      continue: true
    - match:
        team: infrastructure
      receiver: 'slack-infra'
    - match_re:
        service: (api|web)
      receiver: 'slack-backend'

# Inhibitions
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']

# Receivers
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'alerts@example.com'
  - name: 'slack-infra'
    slack_configs:
      - channel: '#alerts-infra'
        send_resolved: true
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: 'your-pagerduty-key'
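Before reloading Alertmanager, validate the file with amtool, which ships with Alertmanager (the path below is an assumption):

# Validate the configuration file
amtool check-config /etc/alertmanager/alertmanager.yml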
3.2 Advanced Routing
route:
  receiver: 'default'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  routes:
    # Critical alerts -> PagerDuty + Slack
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      routes:
        - match:
            env: production
          receiver: 'pagerduty-prod'
          continue: true
        - receiver: 'slack-critical'
    # Per team
    - match:
        team: platform
      receiver: 'slack-platform'
      routes:
        - match:
            severity: critical
          receiver: 'pagerduty-platform'
    - match:
        team: backend
      receiver: 'slack-backend'
    # Per environment
    - match:
        env: staging
      receiver: 'slack-staging'
      group_wait: 1m
      repeat_interval: 1h
    # Business hours only
    - match:
        severity: warning
      receiver: 'email-team'
      active_time_intervals:
        - business-hours

# Time intervals
time_intervals:
  - name: business-hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: '09:00'
            end_time: '18:00'
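To check which receiver a given set of labels would reach without sending anything, amtool can walk the routing tree offline; the label values below are only examples:

# Display the routing tree
amtool config routes show --config.file=alertmanager.yml

# Test which receiver a label set would be routed to
amtool config routes test --config.file=alertmanager.yml severity=critical team=platform env=production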
3.3 Receivers in Detail
receivers:
  # Email
  - name: 'email-team'
    email_configs:
      - to: 'team@example.com'
        send_resolved: true
        html: '{{ template "email.html" . }}'
        headers:
          Subject: '[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}'

  # Slack
  - name: 'slack-alerts'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#alerts'
        send_resolved: true
        username: 'Alertmanager'
        icon_emoji: ':warning:'
        title: '{{ if eq .Status "firing" }}:fire:{{ else }}:white_check_mark:{{ end }} {{ .GroupLabels.alertname }}'
        text: |-
          {{ range .Alerts }}
          *Alert:* {{ .Annotations.summary }}
          *Severity:* {{ .Labels.severity }}
          *Instance:* {{ .Labels.instance }}
          *Description:* {{ .Annotations.description }}
          {{ end }}
        actions:
          - type: button
            text: 'Runbook'
            url: '{{ (index .Alerts 0).Annotations.runbook_url }}'
          - type: button
            text: 'Dashboard'
            url: 'https://grafana.example.com/d/xxx'

  # PagerDuty
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - routing_key: 'your-routing-key'
        severity: '{{ .CommonLabels.severity }}'
        description: '{{ .CommonAnnotations.summary }}'
        details:
          firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
          resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'

  # Microsoft Teams
  - name: 'teams-alerts'
    webhook_configs:
      - url: 'https://outlook.office.com/webhook/xxx'
        send_resolved: true

  # Opsgenie
  - name: 'opsgenie-critical'
    opsgenie_configs:
      - api_key: 'your-api-key'
        message: '{{ .CommonAnnotations.summary }}'
        priority: '{{ if eq .CommonLabels.severity "critical" }}P1{{ else }}P3{{ end }}'

  # Generic webhook
  - name: 'webhook-custom'
    webhook_configs:
      - url: 'https://api.example.com/alerts'
        send_resolved: true
        http_config:
          bearer_token: 'your-token'
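The email receiver above references a template named email.html; such definitions live in the files matched by the templates: section (/etc/alertmanager/templates/*.tmpl). A minimal sketch of what that file could contain — the layout is an assumption, not a required format:

{{/* /etc/alertmanager/templates/email.tmpl */}}
{{ define "email.html" }}
<h2>[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}</h2>
<ul>
  {{ range .Alerts }}
  <li>
    <b>{{ .Labels.instance }}</b> - {{ .Annotations.summary }}<br/>
    {{ .Annotations.description }}
  </li>
  {{ end }}
</ul>
{{ end }}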
4. Silencing and Inhibition
4.1 Silences via the API
# Create a silence with amtool
amtool silence add alertname="HighCPUUsage" instance="server1:9100" \
  --comment="Maintenance planned" \
  --author="admin" \
  --duration="2h"

# Create a silence via the API
curl -X POST http://alertmanager:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{
    "matchers": [
      {"name": "alertname", "value": "HighCPUUsage", "isRegex": false},
      {"name": "instance", "value": "server1.*", "isRegex": true}
    ],
    "startsAt": "2024-01-15T10:00:00Z",
    "endsAt": "2024-01-15T12:00:00Z",
    "createdBy": "admin",
    "comment": "Planned maintenance"
  }'

# List silences
amtool silence query
curl http://alertmanager:9093/api/v2/silences

# Expire (delete) a silence
amtool silence expire <silence-id>
curl -X DELETE http://alertmanager:9093/api/v2/silence/<silence-id>
4.2 Inhibition Rules
# alertmanager.yml
inhibit_rules:
  # Critical inhibits warning for the same alert
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']

  # InstanceDown inhibits every other alert from that instance
  - source_match:
      alertname: 'InstanceDown'
    target_match_re:
      alertname: '.+'
    equal: ['instance']

  # Cluster down inhibits node-level alerts
  - source_match:
      alertname: 'ClusterDown'
    target_match_re:
      alertname: 'Node.*'
    equal: ['cluster']

  # Maintenance mode
  - source_match:
      alertname: 'MaintenanceMode'
    target_match_re:
      severity: '(warning|critical)'
    equal: ['instance']
5. Best Practices
5.1 Alert Design
ALERTING BEST PRACTICES
═════════════════════════
1. ACTIONABLE
─────────────────────────────────────────────
✓ Every alert should have a clear action
✓ Include a runbook_url in the annotations
✗ Avoid "for information only" alerts
2. SYMPTOMS vs CAUSES
─────────────────────────────────────────────
✓ Alert on symptoms (latency, errors)
✗ Avoid alerting on causes (CPU, RAM)
  unless they are genuinely critical
3. APPROPRIATE THRESHOLDS
─────────────────────────────────────────────
✓ Base thresholds on SLOs/SLAs
✓ Use 'for' to avoid flapping
✗ Avoid arbitrary thresholds
4. ALERT FATIGUE
─────────────────────────────────────────────
✓ Reduce noise as much as possible
✓ Group similar alerts
✓ Use inhibitions
✗ Do not ignore recurring alerts
5. DOCUMENTATION
─────────────────────────────────────────────
✓ Clear annotations (summary, description)
✓ Up-to-date runbooks
✓ Consistent labels (severity, team, service)
5.2 Alert Template
- alert: ServiceHighErrorRate
  # Clear, commented expression
  expr: |
    # 5xx error rate over 5 minutes
    sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
      /
    sum(rate(http_requests_total[5m])) by (service)
      * 100 > 1  # threshold: 1% errors
  # Duration before firing (avoids flapping)
  for: 5m
  # Labels for routing and filtering
  labels:
    severity: warning
    team: backend
    slo: availability
  # Annotations for human context
  annotations:
    summary: "High error rate for {{ $labels.service }}"
    description: |
      Error rate is {{ $value | printf "%.2f" }}% for service {{ $labels.service }}.
      This exceeds the 1% threshold for 5 minutes.
    runbook_url: "https://wiki.example.com/runbooks/high-error-rate"
    dashboard_url: "https://grafana.example.com/d/service-overview?var-service={{ $labels.service }}"
5.3 SLO-Based Alerting
groups:
  - name: slo-alerts
    rules:
      # Error budget burn rate
      - alert: ErrorBudgetBurn
        expr: |
          # Burn rate over 1h
          (
            sum(rate(http_requests_total{status=~"5.."}[1h])) by (service)
            /
            sum(rate(http_requests_total[1h])) by (service)
          ) > (1 - 0.999) * 14.4  # a sustained 14.4x burn rate exhausts a 30-day budget in ~2 days
        for: 5m
        labels:
          severity: warning
          type: error-budget
        annotations:
          summary: "Error budget burning fast for {{ $labels.service }}"

      # Multi-window, multi-burn-rate
      - alert: SLOViolation
        expr: |
          # Fast burn (1h window, 14.4x burn rate)
          (
            sum(rate(http_requests_total{status=~"5.."}[1h])) by (service)
            /
            sum(rate(http_requests_total[1h])) by (service)
            > 14.4 * (1 - 0.999)
          )
          and
          # Slow burn (6h window, 6x burn rate)
          (
            sum(rate(http_requests_total{status=~"5.."}[6h])) by (service)
            /
            sum(rate(http_requests_total[6h])) by (service)
            > 6 * (1 - 0.999)
          )
        for: 5m
        labels:
          severity: critical
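As a sanity check on these multipliers, assuming the usual 30-day SLO window and the 99.9% availability target used above:

# error budget       = 1 - SLO = 1 - 0.999 = 0.1% of requests
# burn rate          = observed error ratio / error budget
# time to exhaustion = 30 days / burn rate
#
#   burn rate 14.4 -> 30 / 14.4 ≈ 2.1 days  (fast-burn window)
#   burn rate 6    -> 30 / 6    = 5 days    (slow-burn window)
#   burn rate 1    -> the budget lasts exactly the 30-day window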
6. Exercise: Your Turn
Hands-On Practice
Goal: Set up a complete alerting pipeline with intelligent routing and silence management
Context: You operate a business-critical e-commerce platform. You must configure alerts that detect infrastructure and application problems, with an escalation path appropriate to each severity. Critical alerts must page the on-call engineer; warnings can wait for business hours.
Tasks:
- Create 8 alerting rules covering infrastructure and application
- Configure Alertmanager with severity-based routing
- Trigger alerts by generating load
- Create a silence for a planned maintenance window
- Test inhibition (a critical alert must mask the related warnings)
- Document every alert with a summary and a runbook_url
Validation criteria:
- [ ] Alerting rules are valid (checked with promtool)
- [ ] Alerts are visible in Prometheus (/alerts)
- [ ] Alertmanager receives and routes the alerts
- [ ] Grouping works (similar alerts are grouped together)
- [ ] Silences work
- [ ] Inhibition masks alerts correctly
Solution
1. Complete alerting rules
# prometheus/rules/infrastructure.yml
groups:
  - name: infrastructure
    interval: 30s
    rules:
      # Instance down
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "Instance {{ $labels.instance }} (job {{ $labels.job }}) has not responded for 1 minute"
          runbook_url: "https://wiki.example.com/runbooks/instance-down"

      # High CPU
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage at {{ $value | printf \"%.1f\" }}% on {{ $labels.instance }}"
          runbook_url: "https://wiki.example.com/runbooks/high-cpu"

      - alert: CriticalCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 95
        for: 2m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Critical CPU usage on {{ $labels.instance }}"
          description: "CPU usage at {{ $value | printf \"%.1f\" }}% - immediate action required"

      # High memory
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
        for: 5m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage at {{ $value | printf \"%.1f\" }}%"

      # Disk filling up
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes) * 100 < 15
        for: 5m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Partition {{ $labels.mountpoint }}: {{ $value | printf \"%.1f\" }}% available"

      - alert: DiskSpaceCritical
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes) * 100 < 5
        for: 1m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Critical disk space on {{ $labels.instance }}"
          description: "Partition {{ $labels.mountpoint }}: only {{ $value | printf \"%.1f\" }}% available"
# prometheus/rules/application.yml
groups:
  - name: application
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m]))
            * 100 > 5
        for: 5m
        labels:
          severity: warning
          team: backend
        annotations:
          summary: "High HTTP error rate"
          description: "Error rate at {{ $value | printf \"%.2f\" }}% (threshold: 5%)"
          runbook_url: "https://wiki.example.com/runbooks/high-error-rate"

      # High latency
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
          ) > 0.5
        for: 5m
        labels:
          severity: warning
          team: backend
        annotations:
          summary: "High P95 latency"
          description: "P95 latency at {{ $value | printf \"%.2f\" }}s (threshold: 500ms)"
2. Alertmanager configuration
# alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  # Slack webhook configuration
  # Replace with your real webhook URL

route:
  receiver: 'default'
  group_by: ['alertname', 'severity', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Critical alerts -> immediate escalation
    - match:
        severity: critical
      receiver: 'critical-alerts'
      group_wait: 10s
      repeat_interval: 1h
    # Warning alerts -> standard channel
    - match:
        severity: warning
      receiver: 'warning-alerts'
    # Per team
    - match:
        team: backend
      receiver: 'team-backend'
    - match:
        team: platform
      receiver: 'team-platform'

# Inhibition rules
inhibit_rules:
  # Critical inhibits warning for the same alert
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
  # InstanceDown inhibits every other alert from that instance
  - source_match:
      alertname: 'InstanceDown'
    target_match_re:
      alertname: '.+'
    equal: ['instance']

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://localhost:5001/webhook'
        send_resolved: true
  - name: 'critical-alerts'
    # In production: PagerDuty, Opsgenie, etc.
    webhook_configs:
      - url: 'http://localhost:5001/webhook/critical'
        send_resolved: true
  - name: 'warning-alerts'
    webhook_configs:
      - url: 'http://localhost:5001/webhook/warning'
  - name: 'team-backend'
    webhook_configs:
      - url: 'http://localhost:5001/webhook/backend'
  - name: 'team-platform'
    webhook_configs:
      - url: 'http://localhost:5001/webhook/platform'
3. Validating the rules
# Check rule syntax (the prom/prometheus image's entrypoint is the prometheus binary, so promtool must be set explicitly)
docker run --rm -v $(pwd)/prometheus/rules:/rules \
  --entrypoint promtool \
  prom/prometheus:latest \
  check rules /rules/infrastructure.yml /rules/application.yml

# Run rule unit tests against a test file
docker run --rm -v $(pwd)/prometheus/rules:/rules \
  --entrypoint promtool \
  prom/prometheus:latest \
  test rules /rules/test.yml
4. Generating test alerts
# Stress the CPU to trigger HighCPUUsage
docker exec -it node-exporter sh -c "yes > /dev/null &"

# Fill the disk (WARNING: test environments only)
# docker exec -it node-exporter dd if=/dev/zero of=/tmp/bigfile bs=1M count=1000

# Check the alerts in Prometheus
curl http://localhost:9090/api/v1/alerts | jq '.data.alerts[] | {name: .labels.alertname, state: .state}'

# Check in Alertmanager
curl http://localhost:9093/api/v2/alerts | jq '.[] | {labels: .labels.alertname, status: .status.state}'
5. Creating a silence
# Create a silence for a planned maintenance window (2 hours)
curl -X POST http://localhost:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{
    "matchers": [
      {"name": "instance", "value": "node-exporter:9100", "isRegex": false}
    ],
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
    "endsAt": "'$(date -u -d '+2 hours' +%Y-%m-%dT%H:%M:%SZ)'",
    "createdBy": "admin",
    "comment": "Planned maintenance - system upgrade"
  }'

# List active silences
curl http://localhost:9093/api/v2/silences | jq '.[] | {id: .id, comment: .comment, status: .status.state}'

# Expire a silence (replace SILENCE_ID)
# curl -X DELETE http://localhost:9093/api/v2/silence/SILENCE_ID
6. Testing inhibition
Trigger two alerts for the same instance, one warning and one critical. The critical alert should mask the warning thanks to the inhibition rule.
# List alerts that are currently inhibited
curl http://localhost:9093/api/v2/alerts | jq '.[] | select(.status.inhibitedBy | length > 0)'
7. Complete Docker Compose
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus/rules:/etc/prometheus/rules
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-lifecycle'

  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
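The compose file mounts prometheus/prometheus.yml but its content is not shown; for alerts to reach Alertmanager it must load the rule files and declare the Alertmanager target. A minimal sketch, which also assumes a node-exporter service named node-exporter (used by the test commands above) added to the same compose file:

# prometheus/prometheus.yml (sketch)
global:
  scrape_interval: 15s

rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']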
Final checklist:
- [ ] promtool check rules passes without errors
- [ ] The alerts appear in the Prometheus UI (Status > Rules)
- [ ] Alertmanager receives the alerts (visible in the UI on :9093)
- [ ] Grouping works (similar alerts are grouped together)
- [ ] Silences mask the alerts
- [ ] Inhibition works (critical masks warning)
- [ ] Every alert has a summary and a runbook_url
Quiz
1. What does the "pending" alert state mean?
- [ ] A. The alert is resolved
- [ ] B. The condition is true but the 'for' duration has not yet elapsed
- [ ] C. The alert is waiting for approval
2. What does an inhibition do?
- [ ] A. Permanently deletes an alert
- [ ] B. Suppresses certain alerts while another alert is active
- [ ] C. Raises the priority of an alert
3. Which parameter limits repeated notifications for the same alert?
- [ ] A. group_wait
- [ ] B. repeat_interval
- [ ] C. resolve_timeout
Answers: 1-B, 2-B, 3-B
Previous: Module 4 - Grafana Dashboards
Next: Final Lab - Complete Stack