Module 4 : Grafana - Dashboards Avancés

Objectifs du Module

Installer et configurer Grafana
Créer des dashboards professionnels
Utiliser les variables et templates
Maîtriser les panels avancés
Implémenter le provisioning as code

Durée : 3 heures

1. Installation et Configuration

1.1 Docker

# docker-compose.yml
grafana:
  image: grafana/grafana:10.2.0
  container_name: grafana
  ports:
    - "3000:3000"
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana/provisioning:/etc/grafana/provisioning
  environment:
    - GF_SECURITY_ADMIN_USER=admin
    - GF_SECURITY_ADMIN_PASSWORD=admin123
    - GF_USERS_ALLOW_SIGN_UP=false
    - GF_SERVER_ROOT_URL=http://localhost:3000
  restart: unless-stopped

volumes:
  grafana_data:

1.2 Configuration Avancée

# grafana.ini ou variables d'environnement

#################################### Server ####################################
[server]
protocol = http
http_port = 3000
domain = grafana.example.com
root_url = %(protocol)s://%(domain)s/

#################################### Security ##################################
[security]
admin_user = admin
admin_password = secure_password
secret_key = SW2YcwTIb9zpOOhoPsMm

# Désactiver la création de compte
[users]
allow_sign_up = false
auto_assign_org = true
auto_assign_org_role = Viewer

#################################### Auth ######################################
[auth]
disable_login_form = false

# LDAP
[auth.ldap]
enabled = true
config_file = /etc/grafana/ldap.toml

# OAuth (exemple avec Keycloak)
[auth.generic_oauth]
enabled = true
name = Keycloak
client_id = grafana
client_secret = your_secret
scopes = openid profile email
auth_url = https://keycloak.example.com/auth/realms/master/protocol/openid-connect/auth
token_url = https://keycloak.example.com/auth/realms/master/protocol/openid-connect/token
api_url = https://keycloak.example.com/auth/realms/master/protocol/openid-connect/userinfo

#################################### Database ##################################
[database]
type = postgres
host = postgres:5432
name = grafana
user = grafana
password = grafana_password

1.3 Datasources

DATASOURCES SUPPORTÉES
══════════════════════

TIME SERIES                 LOGS                    TRACES
───────────                 ────                    ──────
• Prometheus                • Loki                  • Tempo
• InfluxDB                  • Elasticsearch         • Jaeger
• Graphite                  • CloudWatch Logs       • Zipkin
• TimescaleDB
• VictoriaMetrics

SQL                         CLOUD                   AUTRES
───                         ─────                   ──────
• MySQL                     • CloudWatch            • JSON API
• PostgreSQL                • Azure Monitor         • CSV
• MSSQL                     • Google Cloud          • TestData
• ClickHouse               • Datadog

2. Création de Dashboards

2.1 Structure d'un Dashboard

Anatomie d'un Dashboard Grafana

2.2 Types de Panels

PANELS GRAFANA
══════════════

VISUALISATIONS
─────────────────────────────────────────────
Time series    │ Graphiques temporels classiques
Stat           │ Valeur unique avec seuils
Gauge          │ Jauge avec min/max
Bar gauge      │ Barres horizontales/verticales
Table          │ Données tabulaires
Heatmap        │ Distribution 2D
Histogram      │ Distribution des valeurs
Pie chart      │ Camembert
State timeline │ États au fil du temps
Status history │ Historique des statuts
Geomap         │ Carte géographique
Canvas         │ Visualisation personnalisée
Text           │ Markdown/HTML

WIDGETS
─────────────────────────────────────────────
Alert list     │ Liste des alertes actives
Annotation list│ Liste des annotations
Dashboard list │ Navigation entre dashboards
News           │ Flux RSS
Logs           │ Visualisation de logs (Loki)
Traces         │ Visualisation de traces (Tempo)

2.3 Panel Time Series

{
  "type": "timeseries",
  "title": "CPU Usage",
  "datasource": {
    "type": "prometheus",
    "uid": "prometheus"
  },
  "targets": [
    {
      "expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
      "legendFormat": "{{instance}}"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "percent",
      "min": 0,
      "max": 100,
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": null, "color": "green" },
          { "value": 70, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      },
      "custom": {
        "lineWidth": 2,
        "fillOpacity": 10,
        "gradientMode": "scheme",
        "showPoints": "never"
      }
    }
  },
  "options": {
    "legend": {
      "displayMode": "table",
      "placement": "bottom",
      "calcs": ["mean", "max", "last"]
    },
    "tooltip": {
      "mode": "multi",
      "sort": "desc"
    }
  }
}

2.4 Panel Stat

{
  "type": "stat",
  "title": "Uptime",
  "targets": [
    {
      "expr": "(time() - node_boot_time_seconds) / 86400",
      "legendFormat": "Days"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "d",
      "decimals": 1,
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": null, "color": "red" },
          { "value": 1, "color": "yellow" },
          { "value": 7, "color": "green" }
        ]
      }
    }
  },
  "options": {
    "reduceOptions": {
      "calcs": ["lastNotNull"]
    },
    "colorMode": "background",
    "graphMode": "none",
    "textMode": "value_and_name"
  }
}

2.5 Panel Table

{
  "type": "table",
  "title": "Server Status",
  "targets": [
    {
      "expr": "up{job=\"node\"}",
      "format": "table",
      "instant": true
    },
    {
      "expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
      "format": "table",
      "instant": true
    },
    {
      "expr": "100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)",
      "format": "table",
      "instant": true
    }
  ],
  "transformations": [
    {
      "id": "merge"
    },
    {
      "id": "organize",
      "options": {
        "excludeByName": {
          "Time": true,
          "job": true
        },
        "renameByName": {
          "instance": "Server",
          "Value #A": "Status",
          "Value #B": "CPU %",
          "Value #C": "Memory %"
        }
      }
    }
  ],
  "fieldConfig": {
    "overrides": [
      {
        "matcher": { "id": "byName", "options": "Status" },
        "properties": [
          {
            "id": "mappings",
            "value": [
              { "type": "value", "options": { "1": { "text": "UP", "color": "green" } } },
              { "type": "value", "options": { "0": { "text": "DOWN", "color": "red" } } }
            ]
          }
        ]
      }
    ]
  }
}

3. Variables et Templates

3.1 Types de Variables

TYPES DE VARIABLES
══════════════════

Query          │ Valeurs depuis une datasource
Custom         │ Liste statique de valeurs
Text box       │ Saisie libre
Constant       │ Valeur fixe (masquée)
Datasource     │ Sélection de datasource
Interval       │ Intervalles de temps
Ad hoc filters │ Filtres dynamiques

3.2 Variable Query (Prometheus)

# Variable: instance
# Type: Query
# Query: label_values(node_cpu_seconds_total, instance)
# Regex: /(.+):.+/  # Extraire hostname sans port
# Multi-value: true
# Include All: true

# Variable: job
# Query: label_values(up, job)

# Variable: mountpoint
# Query: label_values(node_filesystem_size_bytes{instance=~"$instance"}, mountpoint)
# Dépend de la variable instance

# Variable: cpu_mode
# Type: Custom
# Values: user,system,iowait,idle

3.3 Utilisation dans les Queries

# Utiliser une variable simple
up{instance="$instance"}

# Utiliser une variable multi-valeur
up{instance=~"$instance"}

# Combiner plusieurs variables
node_cpu_seconds_total{instance=~"$instance", mode="$cpu_mode"}

# Variable avec regex
up{job=~"$job"}

# Variable intervalle
rate(http_requests_total[$__interval])

# Variables auto
rate(http_requests_total[$__rate_interval])

3.4 Exemple Complet

{
  "templating": {
    "list": [
      {
        "name": "datasource",
        "type": "datasource",
        "query": "prometheus"
      },
      {
        "name": "env",
        "type": "query",
        "datasource": "$datasource",
        "query": "label_values(up, environment)",
        "current": { "selected": true, "text": "production", "value": "production" },
        "refresh": 1
      },
      {
        "name": "instance",
        "type": "query",
        "datasource": "$datasource",
        "query": "label_values(up{environment=\"$env\"}, instance)",
        "multi": true,
        "includeAll": true,
        "allValue": ".*",
        "refresh": 2
      },
      {
        "name": "interval",
        "type": "interval",
        "query": "1m,5m,10m,30m,1h",
        "current": { "text": "5m", "value": "5m" }
      }
    ]
  }
}

4. Transformations

4.1 Transformations Courantes

TRANSFORMATIONS GRAFANA
═══════════════════════

DONNÉES
─────────────────────────────────────────────
Merge            │ Fusionner plusieurs queries
Join by field    │ Joindre par un champ commun
Concatenate      │ Concaténer les frames
Group by         │ Grouper et agréger

COLONNES
─────────────────────────────────────────────
Organize fields  │ Renommer, réordonner, masquer
Filter by name   │ Filtrer les colonnes
Filter by value  │ Filtrer les lignes
Add field        │ Ajouter un champ calculé

CALCULS
─────────────────────────────────────────────
Reduce           │ Calculer une seule valeur (avg, sum, etc.)
Calculate field  │ Opérations entre champs
Binary operation │ Opérations binaires

4.2 Exemples de Transformations

{
  "transformations": [
    // Fusionner les résultats de plusieurs queries
    {
      "id": "merge"
    },
    // Réorganiser les colonnes
    {
      "id": "organize",
      "options": {
        "excludeByName": {
          "Time": true,
          "__name__": true
        },
        "indexByName": {
          "instance": 0,
          "cpu": 1,
          "memory": 2
        },
        "renameByName": {
          "instance": "Server",
          "Value #A": "CPU Usage",
          "Value #B": "Memory Usage"
        }
      }
    },
    // Filtrer les valeurs
    {
      "id": "filterByValue",
      "options": {
        "filters": [
          {
            "fieldName": "CPU Usage",
            "config": {
              "id": "greaterOrEqual",
              "options": { "value": 50 }
            }
          }
        ],
        "type": "include",
        "match": "any"
      }
    },
    // Ajouter un champ calculé
    {
      "id": "calculateField",
      "options": {
        "mode": "binary",
        "binary": {
          "left": "CPU Usage",
          "operator": "+",
          "right": "Memory Usage"
        },
        "alias": "Total Usage"
      }
    }
  ]
}

5. Provisioning as Code

5.1 Structure

grafana/provisioning/
├── dashboards/
│   ├── default.yml           # Configuration provider
│   └── dashboards/
│       ├── node-exporter.json
│       └── application.json
├── datasources/
│   └── datasources.yml
├── alerting/
│   └── alerting.yml
└── notifiers/
    └── notifiers.yml

5.2 Datasources

# provisioning/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
    jsonData:
      httpMethod: POST
      manageAlerts: true
      prometheusType: Prometheus
      prometheusVersion: "2.47.0"

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    editable: false
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: "traceID=(\\w+)"
          name: TraceID
          url: "$${__value.raw}"

  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    uid: tempo
    editable: false

5.3 Dashboard Provider

# provisioning/dashboards/default.yml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: 'Provisioned'
    folderUid: 'provisioned'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards/dashboards

5.4 Dashboard JSON

{
  "uid": "node-exporter",
  "title": "Node Exporter",
  "tags": ["prometheus", "node-exporter"],
  "timezone": "browser",
  "schemaVersion": 38,
  "version": 1,
  "refresh": "30s",
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "templating": {
    "list": [
      {
        "name": "instance",
        "type": "query",
        "datasource": { "type": "prometheus", "uid": "prometheus" },
        "query": "label_values(node_uname_info, instance)",
        "refresh": 2,
        "multi": true,
        "includeAll": true
      }
    ]
  },
  "panels": [
    {
      "id": 1,
      "type": "stat",
      "title": "CPU Usage",
      "gridPos": { "x": 0, "y": 0, "w": 6, "h": 4 },
      "targets": [
        {
          "expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\", instance=~\"$instance\"}[5m])) * 100)",
          "legendFormat": "{{instance}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              { "value": null, "color": "green" },
              { "value": 70, "color": "yellow" },
              { "value": 90, "color": "red" }
            ]
          }
        }
      }
    }
  ]
}

5.5 Alerting Rules

# provisioning/alerting/alerting.yml
apiVersion: 1

groups:
  - orgId: 1
    name: Infrastructure
    folder: Alerts
    interval: 1m
    rules:
      - uid: high-cpu
        title: High CPU Usage
        condition: C
        data:
          - refId: A
            relativeTimeRange:
              from: 300
              to: 0
            datasourceUid: prometheus
            model:
              expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
          - refId: B
            relativeTimeRange:
              from: 0
              to: 0
            datasourceUid: __expr__
            model:
              type: reduce
              expression: A
              reducer: mean
          - refId: C
            datasourceUid: __expr__
            model:
              type: threshold
              expression: B
              conditions:
                - evaluator:
                    type: gt
                    params: [80]
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High CPU usage on {{ $labels.instance }}
          description: CPU usage is above 80% for 5 minutes

6. Dashboards Recommandés

6.1 Dashboards Communautaires

# Importer depuis grafana.com par ID

# Node Exporter Full - ID: 1860
# → Dashboard complet pour Node Exporter

# Docker and System Monitoring - ID: 893
# → Monitoring Docker + système

# Kubernetes Cluster - ID: 6417
# → Vue cluster Kubernetes

# Nginx - ID: 9614
# → Monitoring Nginx

# PostgreSQL - ID: 9628
# → Monitoring PostgreSQL

# Redis - ID: 11835
# → Monitoring Redis

6.2 Best Practices

BONNES PRATIQUES DASHBOARDS
═══════════════════════════

ORGANISATION
─────────────────────────────────────────────
✓ Un dashboard = un focus (service/infra)
✓ Utiliser les folders pour organiser
✓ Nommer clairement avec tags
✓ Documenter avec des Text panels

DESIGN
─────────────────────────────────────────────
✓ Panels importants en haut
✓ Utiliser les Row pour regrouper
✓ Couleurs cohérentes (vert=OK, rouge=KO)
✓ Éviter la surcharge d'informations

PERFORMANCE
─────────────────────────────────────────────
✓ Limiter le nombre de panels (<20)
✓ Utiliser des variables pour filtrer
✓ Préférer instant queries pour les stats
✓ Éviter les regex complexes

7. Exercice : À Vous de Jouer

Mise en Pratique

Objectif : Créer un dashboard Grafana professionnel avec variables, transformations et provisioning

Contexte : Votre équipe a besoin d'un dashboard unifié pour monitorer tous les serveurs de l'infrastructure. Le dashboard doit être réutilisable pour n'importe quel serveur grâce aux variables, et doit être versionné via le provisioning as code.

Tâches à réaliser :

Créer un dashboard "Infrastructure Overview" avec 6 panels différents
Ajouter une variable instance pour filtrer par serveur
Ajouter une variable interval pour ajuster la fenêtre temporelle
Utiliser des transformations sur au moins un panel
Configurer des seuils colorés (vert/jaune/rouge) sur les panels
Exporter le dashboard en JSON
Configurer le provisioning pour charger automatiquement le dashboard au démarrage

Critères de validation :

[ ] Panel Stat affiche l'uptime du serveur
[ ] Panel Gauge affiche le CPU avec seuils colorés
[ ] Panel Time Series affiche l'évolution de la mémoire
[ ] Panel Table affiche l'utilisation disque par partition
[ ] Panel Bar Gauge affiche le trafic réseau
[ ] Panel Graph affiche la charge système (load average)
[ ] Les variables fonctionnent et filtrent correctement
[ ] Le dashboard est provisionné automatiquement

Solution

1. Structure du provisioning

grafana/
├── provisioning/
│   ├── datasources/
│   │   └── datasources.yml
│   └── dashboards/
│       ├── default.yml
│       └── dashboards/
│           └── infrastructure.json

2. Configuration datasource

# grafana/provisioning/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

3. Configuration dashboard provider

# grafana/provisioning/dashboards/default.yml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: 'Infrastructure'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards/dashboards

4. Panels à créer dans l'interface Grafana

Panel 1 - Stat : Uptime

{
  "type": "stat",
  "title": "Uptime",
  "targets": [{
    "expr": "(time() - node_boot_time_seconds{instance=~\"$instance\"}) / 86400"
  }],
  "fieldConfig": {
    "defaults": {
      "unit": "d",
      "decimals": 1,
      "thresholds": {
        "steps": [
          {"value": null, "color": "red"},
          {"value": 1, "color": "yellow"},
          {"value": 7, "color": "green"}
        ]
      }
    }
  }
}

Panel 2 - Gauge : CPU Usage

100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle",instance=~"$instance"}[$interval])) * 100)

Configuration : - Unit: percent (0-100) - Thresholds: 70 (yellow), 90 (red) - Display mode: Gradient

Panel 3 - Time Series : Memory

100 * (1 - node_memory_MemAvailable_bytes{instance=~"$instance"} / node_memory_MemTotal_bytes{instance=~"$instance"})

Configuration : - Unit: percent - Legend: {{instance}} - Fill opacity: 10

Panel 4 - Table : Disk Usage

Queries multiples :

# Query A - Filesystem
node_filesystem_size_bytes{instance=~"$instance",fstype!~"tmpfs|overlay"}

# Query B - Used
node_filesystem_size_bytes{instance=~"$instance",fstype!~"tmpfs|overlay"} - node_filesystem_avail_bytes{instance=~"$instance",fstype!~"tmpfs|overlay"}

# Query C - Available
node_filesystem_avail_bytes{instance=~"$instance",fstype!~"tmpfs|overlay"}

Transformations : - Merge - Organize fields (renommer les colonnes) - Add field from calculation (calcul du pourcentage)

Panel 5 - Bar Gauge : Network Traffic

rate(node_network_receive_bytes_total{instance=~"$instance",device!~"lo|veth.*"}[$interval]) / 1024 / 1024

Configuration : - Unit: MBps - Orientation: Horizontal - Display mode: Gradient

Panel 6 - Graph : Load Average

node_load1{instance=~"$instance"}
node_load5{instance=~"$instance"}
node_load15{instance=~"$instance"}

Configuration : - Legend: Load 1m, 5m, 15m - Threshold line au nombre de CPUs

5. Variables à configurer

Variable : datasource - Type: Datasource - Query: prometheus

Variable : instance - Type: Query - Query: label_values(node_cpu_seconds_total, instance) - Multi-value: true - Include All: true - Refresh: On Dashboard Load

Variable : interval - Type: Interval - Values: 1m,5m,10m,30m,1h - Auto: false

6. Export et sauvegarde

Une fois le dashboard créé dans l'UI :

Cliquer sur l'icône Share (⬆)
Onglet "Export"
"Save to file"
Sauvegarder dans grafana/provisioning/dashboards/dashboards/infrastructure.json

7. Docker Compose avec provisioning

grafana:
  image: grafana/grafana:latest
  ports:
    - "3000:3000"
  volumes:
    - ./grafana/provisioning:/etc/grafana/provisioning
    - grafana_data:/var/lib/grafana
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin123
    - GF_USERS_ALLOW_SIGN_UP=false

Validation complète :

# Démarrer la stack
docker-compose up -d

# Attendre que Grafana démarre
sleep 10

# Vérifier que le datasource est provisionné
curl -u admin:admin123 http://localhost:3000/api/datasources | jq .

# Vérifier que le dashboard est provisionné
curl -u admin:admin123 http://localhost:3000/api/search | jq .

# Accéder à Grafana
open http://localhost:3000

Points importants :

Utilisez toujours instance=~"$instance" pour filtrer par la variable
Utilisez $interval dans les fonctions rate() pour la flexibilité
Configurez "editable": true dans le JSON pour permettre les modifications
Testez le dashboard avec différentes sélections de variables

Quiz

Quel panel pour afficher une valeur unique ?
[ ] A. Time series
[ ] B. Stat
[ ] C. Table
Comment utiliser une variable multi-valeur dans une query ?
[ ] A. {instance="$instance"}
[ ] B. {instance=~"$instance"}
[ ] C. {instance IN $instance}
Où placer les dashboards pour le provisioning ?
[ ] A. /var/lib/grafana/dashboards
[ ] B. /etc/grafana/provisioning/dashboards
[ ] C. /usr/share/grafana/dashboards

Réponses : 1-B, 2-B, 3-B

Précédent : Module 3 - Exporters

Suivant : Module 5 - Alerting


← Module 3 : Exporters et Instrumentation	Module 5 : Alerting et Notifications →

Retour au Programme