Grafana High Availability (HA)¶
This guide explains how to enable Grafana HA in Privacera monitoring. When enabled, Grafana uses a shared database, multiple replicas, HA alerting, a Pod Disruption Budget (PDB), and anti-affinity for resilience.
What Grafana HA gives you¶
| Feature | What it does | Default when HA enabled |
|---|---|---|
| Shared database | Dashboards, alerts, users, and sessions are stored in a database so all Grafana pods see the same data. Without HA, Grafana uses SQLite (local file) which doesn't work with multiple replicas. | Database enabled; SQLite disabled |
| Multiple replicas | Runs multiple Grafana pods for high availability. If one pod fails, others continue serving requests. Provides load distribution and zero-downtime updates. | 1 replica (default, can be overridden) |
| No PVC for Grafana | Grafana does not use persistent volume storage. All state (dashboards, alerts, users, sessions) is stored in the shared database. | Persistence: disabled |
| HA alerting | Grafana pods coordinate over a gossip protocol on port 9094 so that each alert notification is sent once. Prevents duplicate notifications when multiple Grafana pods are running. | Enabled with headless service |
| Horizontal Pod Autoscaler (HPA) | Automatically scales Grafana pods up or down based on CPU and memory usage. Ensures optimal resource utilization and performance during varying load. | Enabled: min 1, max 3 pods, CPU 70%, memory 75% |
| Pod Disruption Budget (PDB) | Ensures a minimum number of Grafana pods remain available during voluntary disruptions (node maintenance, cluster upgrades). Prevents all pods from being terminated simultaneously. | 1 pod minimum available |
| Pod anti-affinity | Kubernetes scheduler tries to spread Grafana pods across different nodes. If one node fails, Grafana remains available on other nodes. | Preferred (soft rule) on hostname topology |
Prerequisites¶
- Database: A shared database (PostgreSQL or MySQL) is required for Grafana HA. You can use the same RDBMS service used by Privacera or any other database instance.
- Vars file: You will edit the monitoring vars file. Follow the steps below to set it up.
Setup vars file¶
- SSH into the instance where Privacera Manager is installed.
- Navigate to the `config` directory.
- Copy the `vars.monitoring.yml` file from the `sample-vars` folder to the `custom-vars` folder. If this file already exists in the `custom-vars` folder, you can skip this step.
- Open the file for editing.
In the rest of this guide, this file is referred to as `vars.monitoring.yml`.
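The copy step can be scripted so that it never overwrites an existing file. This is a minimal sketch, not the guide's original commands: the install path `~/privacera/privacera-manager/config` is an assumption, and the function name `copy_monitoring_vars` is hypothetical. Adjust `CONFIG_DIR` to match your installation.

```shell
# Assumption: default location of the Privacera Manager config directory.
CONFIG_DIR="${CONFIG_DIR:-$HOME/privacera/privacera-manager/config}"

# Copy vars.monitoring.yml from sample-vars to custom-vars unless it already exists.
copy_monitoring_vars() {
  config_dir="${1:-$CONFIG_DIR}"
  src="$config_dir/sample-vars/vars.monitoring.yml"
  dst="$config_dir/custom-vars/vars.monitoring.yml"
  if [ -f "$dst" ]; then
    echo "already present: $dst"
  elif [ -f "$src" ]; then
    cp "$src" "$dst" && echo "copied: $dst"
  else
    echo "sample file not found: $src" >&2
    return 1
  fi
}

# Usage: copy_monitoring_vars            (uses CONFIG_DIR)
#        copy_monitoring_vars /some/path (explicit config directory)
```

After the function reports `copied` or `already present`, open `custom-vars/vars.monitoring.yml` in your editor of choice.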
Step 1: Create the Grafana database and user¶
Before enabling Grafana HA, you must create a dedicated database and user for Grafana in your database server.
Database choice
You can use the same RDBMS service used by Privacera or any other database instance. The commands below work for both internal (deployed by Privacera) and external databases.
Login to your database¶
Connect to your database server using an admin user with privileges to create databases and users.
The connection command differs for MySQL and PostgreSQL. In either case, replace:
- `<username>`: Admin username
- `<hostname>`: Database hostname
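A typical login for each engine might look like the sketch below. It uses the standard `mysql` and `psql` client flags. The function name `db_login` is hypothetical, and connecting to the default `postgres` maintenance database is an assumption about your PostgreSQL server.

```shell
# Open an interactive admin session on the database server.
# $1 = database type (mysql | postgres), $2 = admin username, $3 = hostname
db_login() {
  db_type="$1"; user="$2"; host="$3"
  case "$db_type" in
    mysql)
      # -p makes the client prompt for the password
      mysql -h "$host" -u "$user" -p
      ;;
    postgres)
      # Assumption: the default "postgres" maintenance database exists
      psql -h "$host" -U "$user" -d postgres
      ;;
    *)
      echo "unsupported database type: $db_type" >&2
      return 2
      ;;
  esac
}

# Usage: db_login postgres <username> <hostname>
```

Both clients prompt for the password interactively, which keeps it out of the shell history.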
Create database and user¶
Once logged in, run the SQL commands that create the Grafana database and its user.
Important notes
- Replace `your_secure_password` with a strong password.
- Remember the database name, username, and password; you'll need them in Step 2.
- These commands work identically for internal and external databases.
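As an illustration of what such statements usually look like, the sketch below prints SQL for each engine. It is not the guide's original SQL. The names `grafana` and `grafana_user` are taken from the examples in Step 2; the privilege model (database ownership on PostgreSQL, `ALL PRIVILEGES` from any host `'%'` on MySQL) is an assumption you should tighten to your own policy. The function name `grafana_db_sql` is hypothetical.

```shell
# Print example SQL that creates the Grafana database and user.
# $1 = database type (mysql | postgres)
grafana_db_sql() {
  case "$1" in
    postgres)
      cat <<'SQL'
CREATE USER grafana_user WITH PASSWORD 'your_secure_password';
CREATE DATABASE grafana OWNER grafana_user;
SQL
      ;;
    mysql)
      # Assumption: '%' allows the user to connect from any host
      cat <<'SQL'
CREATE DATABASE grafana;
CREATE USER 'grafana_user'@'%' IDENTIFIED BY 'your_secure_password';
GRANT ALL PRIVILEGES ON grafana.* TO 'grafana_user'@'%';
SQL
      ;;
    *)
      echo "unsupported database type: $1" >&2
      return 2
      ;;
  esac
}

# Usage: grafana_db_sql postgres   (review the output, then paste it into your session)
```

Printing the statements first, instead of piping them straight into a client, leaves room to replace the placeholder password before anything is executed.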
Step 2: Configure Grafana HA variables¶
Edit vars.monitoring.yml and set the following variables:
Variable descriptions¶
| Variable | Purpose | Example |
|---|---|---|
| `GRAFANA_HA_ENABLE` | Master flag to enable all HA features | `"true"` |
| `GRAFANA_DB_TYPE` | Database driver to use | `"postgres"` or `"mysql"` |
| `GRAFANA_DB_HOST` | Database hostname or IP address | `"db.example.com"` or `"10.0.1.50"` |
| `GRAFANA_DB_PORT` | Database port | `"5432"` (PostgreSQL) or `"3306"` (MySQL/MariaDB) |
| `GRAFANA_DB_NAME` | Database name created for Grafana | `"grafana"` |
| `GRAFANA_DB_USER` | Database user created for Grafana | `"grafana_user"` |
| `GRAFANA_DB_PASSWORD` | Password for the database user | Your secure password |
| `GRAFANA_DB_SSL_MODE` | SSL mode for PostgreSQL (optional) | `"require"`, `"verify-full"`, or `"disable"` |
SSL Mode
GRAFANA_DB_SSL_MODE only applies to PostgreSQL and is ignored for MySQL.
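Put together, the variables from the table could look like the block printed by the sketch below. The values are the table's own examples for a PostgreSQL setup, not settings from a real environment, and the function name `grafana_ha_vars` is hypothetical.

```shell
# Print an example set of Grafana HA variables for vars.monitoring.yml.
# All values are illustrative examples for a PostgreSQL database.
grafana_ha_vars() {
  cat <<'EOF'
GRAFANA_HA_ENABLE: "true"
GRAFANA_DB_TYPE: "postgres"
GRAFANA_DB_HOST: "db.example.com"
GRAFANA_DB_PORT: "5432"
GRAFANA_DB_NAME: "grafana"
GRAFANA_DB_USER: "grafana_user"
GRAFANA_DB_PASSWORD: "your_secure_password"
GRAFANA_DB_SSL_MODE: "require"
EOF
}

# Usage: grafana_ha_vars   (copy the lines you need into vars.monitoring.yml)
```

For MySQL, change the type to `"mysql"` and the port to `"3306"`, and drop `GRAFANA_DB_SSL_MODE`, since it is ignored there.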
What happens when you enable HA¶
When GRAFANA_HA_ENABLE is set to "true", the following are automatically configured:
- Database usage enabled (SQLite disabled)
- Initial replicas set to the value of `GRAFANA_DEPLOYMENT_REPLICAS` (default: 1)
- HPA enabled with min 1, max 3 replicas
- Deployment strategy changed to RollingUpdate
- Headless service created for HA alerting
- Pod Disruption Budget enabled (min 1 pod available)
- Pod anti-affinity enabled (spread across nodes)
- Persistence disabled (no PVC for Grafana)
Step 3: Optional - Override HPA settings¶
What is HPA?¶
HPA (Horizontal Pod Autoscaler) automatically scales the number of Grafana pods based on resource usage (CPU and memory). When load increases, HPA adds more pods; when load decreases, it removes pods, ensuring optimal resource utilization and performance.
Default HPA values when HA is enabled¶
- Initial replicas: 1 (starting number of Grafana pods)
- HPA min replicas: Set to the value of `GRAFANA_DEPLOYMENT_REPLICAS`
- HPA max replicas: 3 (maximum pods allowed)
- CPU threshold: 70% (scale up when CPU exceeds this)
- Memory threshold: 75% (scale up when memory exceeds this)
Replica variable behavior
When HPA is enabled, GRAFANA_AUTOSCALING_HPA_MIN_REPLICA is automatically set to the value of GRAFANA_DEPLOYMENT_REPLICAS to ensure consistent baseline capacity. HPA then manages the replica count dynamically between min and max based on resource usage.
HPA variables explained¶
You can override these in vars.monitoring.yml:
| Variable | What it means | Example |
|---|---|---|
| `GRAFANA_DEPLOYMENT_REPLICAS` | Initial/starting number of Grafana pods | `1` = Start with 1 pod |
| `GRAFANA_AUTOSCALING_HPA_MIN_REPLICA` | Minimum pods to keep running at all times (set to the value of `GRAFANA_DEPLOYMENT_REPLICAS`) | `GRAFANA_DEPLOYMENT_REPLICAS` = Use same as initial replicas |
| `GRAFANA_AUTOSCALING_HPA_MAX_REPLICA` | Maximum pods allowed during high load | `3` = Never scale above 3 pods |
| `GRAFANA_AUTOSCALING_HPA_TARGET_CPU` | CPU percentage that triggers scale-up | `"70"` = Scale up when CPU > 70% |
| `GRAFANA_AUTOSCALING_HPA_TARGET_MEM` | Memory percentage that triggers scale-up | `"75"` = Scale up when memory > 75% |
How HPA works
If your Grafana pods average 80% CPU utilization (above the 70% target), HPA adds pods, up to the maximum of 3. When average utilization falls far enough below the target that fewer pods could carry the load (for example, 2 pods at 30% CPU), HPA removes pods, down to the minimum, which is the value of `GRAFANA_DEPLOYMENT_REPLICAS`.
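Kubernetes documents the HPA calculation as `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`, clamped to the configured minimum and maximum. The sketch below reproduces that arithmetic for a single metric. It is a simplification: it ignores the controller's tolerance band and scale-down stabilization window, and a real HPA with both CPU and memory targets takes the larger of the two results. The function name `hpa_desired` is hypothetical.

```shell
# Estimate the replica count HPA would request for one metric.
# $1 = current replicas, $2 = current average utilization (%),
# $3 = target utilization (%), $4 = min replicas, $5 = max replicas
hpa_desired() {
  current="$1"; usage="$2"; target="$3"; min="$4"; max="$5"
  # Integer ceiling of current * usage / target
  desired=$(( (current * usage + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

hpa_desired 1 80 70 1 3    # 1 pod at 80% CPU, target 70%: prints 2
hpa_desired 3 50 70 1 3    # 3 pods at 50% CPU: prints 3 (no scale-down yet)
hpa_desired 2 30 70 1 3    # 2 pods at 30% CPU: prints 1
```

The second call shows why a modest drop below the target does not immediately remove pods: two pods would each run at about 75%, which is above the 70% target.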
Step 4: Apply and verify¶
Regenerate and deploy¶
- Run setup so the Grafana Helm values are regenerated from your vars.
- Install or upgrade as you normally do, or run your usual playbook that deploys monitoring/Grafana.
Sanity checks¶
- Pods: Check that Grafana pods are running. You should see Grafana pods (e.g. 1 to 3 depending on load and HPA).
- HPA: Verify HPA is active. You should see the Grafana HPA with current/min/max replicas.
- DB configuration in generated values: After running setup, verify that database variables are correctly rendered in the Grafana Helm values file. Check the `grafana.ini` section (database config) and confirm that the database type, host, port, name, and user are set as per your `vars.monitoring.yml`. This file is generated from your vars; if DB settings are wrong here, fix them in `vars.monitoring.yml` and run setup again.
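The pod and HPA checks above, plus a look at the PDB and headless service, can be run with standard `kubectl get` commands. The sketch below is one way to bundle them. The namespace is whatever your monitoring stack is deployed into (the guide does not name it here), and the function name `grafana_ha_checks` and the `KUBECTL` override are hypothetical conveniences.

```shell
# Assumption: kubectl is on PATH; override KUBECTL to wrap or dry-run the calls.
KUBECTL="${KUBECTL:-kubectl}"

# List the resources that HA mode is expected to create in the given namespace.
# $1 = namespace of the monitoring deployment
grafana_ha_checks() {
  ns="$1"
  "$KUBECTL" get pods -n "$ns"   # expect Grafana pods in Running state
  "$KUBECTL" get hpa -n "$ns"    # expect current/min/max replicas for Grafana
  "$KUBECTL" get pdb -n "$ns"    # expect a budget with 1 pod minimum available
  "$KUBECTL" get svc -n "$ns"    # a headless service shows CLUSTER-IP "None"
}

# Usage: grafana_ha_checks <namespace>
```

Setting `KUBECTL=echo` before calling the function prints the commands without contacting the cluster, which is handy for reviewing them first.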
Quick reference: variables summary¶
Required variables for HA¶
| Variable | Example | Description |
|---|---|---|
| `GRAFANA_HA_ENABLE` | `"true"` | Master flag to enable HA |
| `GRAFANA_DB_TYPE` | `"postgres"` or `"mysql"` | Database type |
| `GRAFANA_DB_HOST` | `"db.example.com"` | Database hostname |
| `GRAFANA_DB_PORT` | `"5432"` or `"3306"` | Database port |
| `GRAFANA_DB_NAME` | `"grafana"` | Database name |
| `GRAFANA_DB_USER` | `"grafana_user"` | Database user |
| `GRAFANA_DB_PASSWORD` | `"your_password"` | Database password |
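Before running setup, it can help to confirm that every required variable is present and non-empty. The sketch below is one way to check; it assumes the vars file keeps each variable as a top-level `NAME: value` line starting in column 0, and the function name `missing_grafana_vars` is hypothetical.

```shell
# Report required Grafana HA variables that are absent or empty in a vars file.
# $1 = path to vars.monitoring.yml. Returns 0 if all are set, 1 otherwise.
missing_grafana_vars() {
  vars_file="$1"
  missing=0
  for name in GRAFANA_HA_ENABLE GRAFANA_DB_TYPE GRAFANA_DB_HOST GRAFANA_DB_PORT \
              GRAFANA_DB_NAME GRAFANA_DB_USER GRAFANA_DB_PASSWORD; do
    # Match "NAME:" followed by a value that is not just whitespace or a comment
    if ! grep -Eq "^${name}:[[:space:]]*[^[:space:]#]" "$vars_file"; then
      echo "missing or empty: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: missing_grafana_vars custom-vars/vars.monitoring.yml
```

Commented-out lines are treated as missing, because the pattern only matches a variable name at the start of a line.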
Optional variables (override defaults)¶
| Variable | Default when HA | Purpose |
|---|---|---|
| `GRAFANA_DEPLOYMENT_REPLICAS` | `1` | Initial number of Grafana pods |
| `GRAFANA_AUTOSCALING_HPA_MIN_REPLICA` | Value of `GRAFANA_DEPLOYMENT_REPLICAS` | HPA minimum replicas |
| `GRAFANA_AUTOSCALING_HPA_MAX_REPLICA` | `3` | HPA maximum replicas |
| `GRAFANA_AUTOSCALING_HPA_TARGET_CPU` | `"70"` | CPU threshold for scaling |
| `GRAFANA_AUTOSCALING_HPA_TARGET_MEM` | `"75"` | Memory threshold for scaling |
| `GRAFANA_DB_SSL_MODE` | (varies) | PostgreSQL SSL mode (optional) |
Automatically configured when HA is enabled¶
- `GRAFANA_DB_ENABLED` → `"true"`
- `GRAFANA_AUTOSCALING_HPA_ENABLED` → `"true"`
- `GRAFANA_POD_DISRUPTION_BUDGET_ENABLED` → `"true"`
- `GRAFANA_POD_ANTI_AFFINITY_ENABLED` → `"true"`
- Persistence for Grafana → disabled (no PVC)
- Headless service and HA alerting → enabled
Troubleshooting¶
| Issue | What to check |
|---|---|
| Grafana pods not starting / DB connection errors | Verify the `GRAFANA_DB_*` variables (host, port, name, user, password). Ensure the database and user exist and are reachable from the Kubernetes cluster. Test connectivity from a pod in the cluster. |
| YAML error on Helm upgrade | Ensure numeric variables use the proper format (e.g., `3` not `"3"` for replicas). Quote values that contain special characters. |
| Duplicate alert notifications | Confirm HA is enabled (headless service, multiple replicas, shared DB). |
| HPA not scaling | Check HPA status with `kubectl describe hpa <grafana-hpa> -n <namespace>`. Ensure metrics-server is running in your cluster. Verify CPU/memory requests are set on Grafana pods. |