Grafana High Availability (HA)

This guide explains how to enable Grafana HA in Privacera monitoring. When enabled, Grafana uses a shared database, multiple replicas, HA alerting, a Pod Disruption Budget (PDB), and anti-affinity for resilience.

What Grafana HA gives you

Feature	What it does	Default when HA enabled
Shared database	Dashboards, alerts, users, and sessions are stored in a database so all Grafana pods see the same data. Without HA, Grafana uses SQLite (local file) which doesn't work with multiple replicas.	Database enabled; SQLite disabled
Multiple replicas	Runs more than one Grafana pod so that if one pod fails, the others continue serving requests. Also distributes load and allows zero-downtime rolling updates.	1 replica (default; set GRAFANA_DEPLOYMENT_REPLICAS to 2 or more for redundancy at all times)
No PVC for Grafana	Grafana does not use persistent volume storage. All state (dashboards, alerts, users, sessions) is stored in the shared database.	Persistence: disabled
HA alerting	Grafana pods coordinate over a gossip protocol on port 9094 so that each alert notification is sent once rather than once per pod. Prevents duplicate notifications when multiple Grafana pods are running.	Enabled with headless service
Horizontal Pod Autoscaler (HPA)	Automatically scales Grafana pods up or down based on CPU and memory usage. Ensures optimal resource utilization and performance during varying load.	Enabled: min 1, max 3 pods, CPU 70%, memory 75%
Pod Disruption Budget (PDB)	Ensures a minimum number of Grafana pods remain available during voluntary disruptions (node maintenance, cluster upgrades). Prevents all pods from being terminated simultaneously.	1 pod minimum available
Pod anti-affinity	Kubernetes scheduler tries to spread Grafana pods across different nodes. If one node fails, Grafana remains available on other nodes.	Preferred (soft rule) on hostname topology

Prerequisites

Database: A shared database (PostgreSQL or MySQL) is required for Grafana HA. You can use the same RDBMS service used by Privacera or any other database instance.
Vars file: You will edit the monitoring vars file. Follow the steps below to set it up.

Set up the vars file

SSH into the instance where Privacera Manager is installed.

Navigate to the config directory:

Bash
cd ~/privacera/privacera-manager/config/

Copy the vars.monitoring.yml file from the sample-vars folder to the custom-vars folder:

Note: If this file already exists in the custom-vars folder, skip this step.

Bash
cp sample-vars/vars.monitoring.yml custom-vars/

Open the file for editing:

Bash
vi custom-vars/vars.monitoring.yml

In the rest of this guide, this file is referred to as vars.monitoring.yml.
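
The copy step above can be made safe to re-run, so that an existing custom-vars/vars.monitoring.yml is never overwritten. The following is a minimal sketch; the function name copy_vars_if_missing is hypothetical and is not part of Privacera Manager.

```shell
#!/usr/bin/env bash
# Copy vars.monitoring.yml into custom-vars only when it is not already there.
# copy_vars_if_missing is an illustrative helper, not a Privacera Manager command.
copy_vars_if_missing() {
  local config_dir="$1"
  local src="$config_dir/sample-vars/vars.monitoring.yml"
  local dst="$config_dir/custom-vars/vars.monitoring.yml"

  if [ -f "$dst" ]; then
    # Keep any edits that were already made to the custom copy
    echo "exists: $dst (left unchanged)"
  else
    cp "$src" "$dst" && echo "copied: $dst"
  fi
}
```

For example, `copy_vars_if_missing ~/privacera/privacera-manager/config` copies the sample on the first run and leaves your edited file alone on later runs.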

Step 1: Create the Grafana database and user

Before enabling Grafana HA, you must create a dedicated database and user for Grafana in your database server.

Database choice

You can use the same RDBMS service used by Privacera or any other database instance. The commands below work for both internal (deployed by Privacera) and external databases.

Connect to your database server using an admin user with privileges to create databases and users.

For MySQL:

Bash
mysql -u <username> -p -h <hostname> -P 3306

For PostgreSQL:

Bash
psql -U <username> -d <database_name> -h <hostname> -p 5432

Replace:

<username> - Admin username
<database_name> - An existing database to connect to, for example postgres (PostgreSQL only)
<hostname> - Database hostname

Create database and user

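Once logged in, run the statements that match your database type. The user and grant syntax differs between MySQL and PostgreSQL.

For MySQL:

SQL
-- Create database
CREATE DATABASE grafana;

-- Create user ('%' allows connections from any host; narrow it if your policy requires)
CREATE USER 'grafana_user'@'%' IDENTIFIED BY 'your_secure_password';

-- Grant privileges on every table in the grafana database
GRANT ALL PRIVILEGES ON grafana.* TO 'grafana_user'@'%';

-- Apply changes
FLUSH PRIVILEGES;

For PostgreSQL:

SQL
-- Create database
CREATE DATABASE grafana;

-- Create user
CREATE USER grafana_user WITH PASSWORD 'your_secure_password';

-- Grant privileges
-- On PostgreSQL 15 or later, grafana_user may also need CREATE on the public schema of the grafana database.
GRANT ALL PRIVILEGES ON DATABASE grafana TO grafana_user;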

Important notes

Replace your_secure_password with a strong password
Remember the database name, username, and password - you'll need them in Step 2
These commands work identically for internal and external databases
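
Before moving on, it can help to confirm that the new user can log in to the new database. The following is a minimal sketch, assuming the mysql or psql client is installed on the machine you run it from and that you kept the names grafana and grafana_user. The helper name grafana_db_login_cmd is hypothetical, and it only prints the command so that you can review it before running it.

```shell
# Print a login check for the Grafana database user created above.
# grafana_db_login_cmd is an illustrative helper; it prints the command and does not run it.
grafana_db_login_cmd() {
  local db_type="$1" host="$2"
  case "$db_type" in
    mysql)
      # -p prompts for the password; the trailing "grafana" is the database name
      echo "mysql -u grafana_user -p -h $host -P 3306 grafana -e 'SELECT 1;'"
      ;;
    postgres)
      # psql prompts for the password when the server requires one
      echo "psql -U grafana_user -d grafana -h $host -p 5432 -c 'SELECT 1;'"
      ;;
    *)
      echo "unsupported database type: $db_type" >&2
      return 1
      ;;
  esac
}
```

For example, `grafana_db_login_cmd mysql db.example.com` prints the MySQL variant. If the printed command logs in and the query succeeds, the credentials are ready for Step 2.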

Step 2: Configure Grafana HA variables

Edit vars.monitoring.yml and set the following variables:

YAML
###############################################
######## Grafana High Availability (HA) #########
###############################################

# Enable Grafana HA (required for all HA features)
GRAFANA_HA_ENABLE: "true"

# --- Database Configuration ---
GRAFANA_DB_TYPE: "mysql"                    # Options: "postgres" or "mysql"
GRAFANA_DB_HOST: "your-db-host.example.com" # Database hostname or IP
GRAFANA_DB_PORT: "3306"                     # Port: 5432 for PostgreSQL, 3306 for MySQL/MariaDB
GRAFANA_DB_NAME: "grafana"                  # Database name you created
GRAFANA_DB_USER: "grafana_user"             # Database user you created
GRAFANA_DB_PASSWORD: "your_secure_password" # Password for the database user

# --- Optional: PostgreSQL only ---
# GRAFANA_DB_SSL_MODE: "require"            # Options: require, verify-full, disable

## Grafana HA Replicas Configuration
## Uncomment to override the default replica count (default: 1 when HA is enabled)
# GRAFANA_DEPLOYMENT_REPLICAS: 1

Variable descriptions

Variable	Purpose	Example
`GRAFANA_HA_ENABLE`	Master flag to enable all HA features	`"true"`
`GRAFANA_DB_TYPE`	Database driver to use	`"postgres"` or `"mysql"`
`GRAFANA_DB_HOST`	Database hostname or IP address	`"db.example.com"` or `"10.0.1.50"`
`GRAFANA_DB_PORT`	Database port	`"5432"` (PostgreSQL) or `"3306"` (MySQL/MariaDB)
`GRAFANA_DB_NAME`	Database name created for Grafana	`"grafana"`
`GRAFANA_DB_USER`	Database user created for Grafana	`"grafana_user"`
`GRAFANA_DB_PASSWORD`	Password for the database user	Your secure password
`GRAFANA_DB_SSL_MODE`	SSL mode for PostgreSQL (optional)	`"require"`, `"verify-full"`, or `"disable"`

SSL Mode

GRAFANA_DB_SSL_MODE only applies to PostgreSQL and is ignored for MySQL.
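
Before running setup, a quick check that every required variable is present and uncommented can catch typos early. The following is a minimal sketch; the helper name missing_ha_vars is hypothetical, and the check only looks for an uncommented "NAME: value" line without validating the value itself.

```shell
# List required Grafana HA variables that are missing or commented out in a vars file.
# missing_ha_vars is an illustrative helper, not part of Privacera Manager.
missing_ha_vars() {
  local vars_file="$1" name
  for name in GRAFANA_HA_ENABLE GRAFANA_DB_TYPE GRAFANA_DB_HOST GRAFANA_DB_PORT \
              GRAFANA_DB_NAME GRAFANA_DB_USER GRAFANA_DB_PASSWORD; do
    # A variable counts as set only if an uncommented "NAME: value" line exists
    if ! grep -Eq "^[[:space:]]*${name}:[[:space:]]*[^[:space:]#]" "$vars_file"; then
      echo "$name"
    fi
  done
}
```

For example, `missing_ha_vars custom-vars/vars.monitoring.yml` prints one variable name per line for anything still missing, and prints nothing when all seven are set.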

What happens when you enable HA

When GRAFANA_HA_ENABLE is set to "true", the following are automatically configured:

Database usage enabled (SQLite disabled)
Initial replicas set to the value of GRAFANA_DEPLOYMENT_REPLICAS (default: 1)
HPA enabled with min 1, max 3 replicas
Deployment strategy changed to RollingUpdate
Headless service created for HA alerting
Pod Disruption Budget enabled (min 1 pod available)
Pod anti-affinity enabled (spread across nodes)
Persistence disabled (no PVC for Grafana)

Step 3: Optional - Override HPA settings

What is HPA?

HPA (Horizontal Pod Autoscaler) automatically scales the number of Grafana pods based on resource usage (CPU and memory). When load increases, HPA adds more pods; when load decreases, it removes pods, ensuring optimal resource utilization and performance.

Default HPA values when HA is enabled

Initial replicas: 1 (starting number of Grafana pods)
HPA min replicas: Set to value of GRAFANA_DEPLOYMENT_REPLICAS
HPA max replicas: 3 (maximum pods allowed)
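CPU threshold: 70% (target average CPU utilization, measured against the pods' CPU requests; HPA scales up when usage exceeds it)
Memory threshold: 75% (target average memory utilization, measured against the pods' memory requests; HPA scales up when usage exceeds it)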

Replica variable behavior

When HPA is enabled, GRAFANA_AUTOSCALING_HPA_MIN_REPLICA is automatically set to the value of GRAFANA_DEPLOYMENT_REPLICAS to ensure consistent baseline capacity. HPA then manages the replica count dynamically between min and max based on resource usage.

HPA variables explained

You can override these in vars.monitoring.yml:

YAML
# Initial number of Grafana pods (starting point before HPA takes over)
GRAFANA_DEPLOYMENT_REPLICAS: 1

# Minimum number of pods HPA will maintain (set to value of GRAFANA_DEPLOYMENT_REPLICAS)
GRAFANA_AUTOSCALING_HPA_MIN_REPLICA: "{{GRAFANA_DEPLOYMENT_REPLICAS}}"

# Maximum number of pods HPA can scale up to (under high load)
GRAFANA_AUTOSCALING_HPA_MAX_REPLICA: 3

# CPU threshold: HPA scales up when average CPU usage exceeds this percentage
GRAFANA_AUTOSCALING_HPA_TARGET_CPU: "70"

# Memory threshold: HPA scales up when average memory usage exceeds this percentage
GRAFANA_AUTOSCALING_HPA_TARGET_MEM: "75"

Variable	What it means	Example
`GRAFANA_DEPLOYMENT_REPLICAS`	Initial/starting number of Grafana pods	`1` = Start with 1 pod
`GRAFANA_AUTOSCALING_HPA_MIN_REPLICA`	Minimum pods to keep running at all times (set to value of `GRAFANA_DEPLOYMENT_REPLICAS`)	`"{{GRAFANA_DEPLOYMENT_REPLICAS}}"` = Use same as initial replicas
`GRAFANA_AUTOSCALING_HPA_MAX_REPLICA`	Maximum pods allowed during high load	`3` = Never scale above 3 pods
`GRAFANA_AUTOSCALING_HPA_TARGET_CPU`	CPU percentage that triggers scale-up	`"70"` = Scale up when CPU > 70%
`GRAFANA_AUTOSCALING_HPA_TARGET_MEM`	Memory percentage that triggers scale-up	`"75"` = Scale up when memory > 75%

How HPA works

If your Grafana pods average 80% CPU utilization (above the 70% target), HPA adds pods, up to the maximum of 3. If utilization falls far enough below the target that fewer pods could carry the load, HPA removes pods, down to the minimum (the value of GRAFANA_DEPLOYMENT_REPLICAS).
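
The scaling decision follows the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas × current usage / target), with no change while usage is within 10% of the target (the default tolerance). The following is a minimal sketch for a single metric. The function name desired_replicas is hypothetical, and the real controller also applies stabilization windows and takes the larger result when both CPU and memory targets are set.

```shell
# Approximate the replica count the Kubernetes HPA would request for one metric.
# desired_replicas is an illustrative helper, not a kubectl or Privacera command.
# Arguments: current replicas, current usage %, target %, min replicas, max replicas.
desired_replicas() {
  local current="$1" usage="$2" target="$3" min="$4" max="$5"
  local diff=$(( usage > target ? usage - target : target - usage ))
  local desired="$current"

  # Skip scaling while usage is within 10% of the target (default HPA tolerance)
  if [ $(( diff * 10 )) -gt "$target" ]; then
    # ceil(current * usage / target) using integer arithmetic
    desired=$(( (current * usage + target - 1) / target ))
  fi

  # Clamp to the configured minimum and maximum
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

With the defaults from this guide, `desired_replicas 1 80 70 1 3` gives 2. Scale-down needs a larger drop than it may seem: `desired_replicas 3 50 70 1 3` still gives 3, because ceil(3 × 50 / 70) is 3, while `desired_replicas 3 40 70 1 3` gives 2.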

Step 4: Apply and verify

Regenerate and deploy

Run setup so the Grafana Helm values are regenerated from your vars:

Bash
./privacera-manager.sh setup

Install or upgrade (as you normally do):

Bash
./pm_with_helm.sh install

Or run your usual playbook that deploys monitoring/Grafana.

Sanity checks

Pods: Check that the Grafana pods are running:

Bash
kubectl get pods -n <namespace> -l app.kubernetes.io/name=grafana

You should see between 1 and 3 Grafana pods, depending on load and your HPA settings.

HPA: Verify that the HPA is active:

Bash
kubectl get hpa -n <namespace>

You should see the Grafana HPA with its current, minimum, and maximum replica counts.

DB configuration in generated values: After running setup, verify that the database variables are rendered correctly in the Grafana Helm values file. Check the database configuration in the grafana.ini section of:

privacera/privacera-manager/output/kubernetes/helm/monitoring/grafana/grafana-values.yml
Confirm that the database type, host, port, name, and user are set as per your vars.monitoring.yml. This file is generated from your vars; if DB settings are wrong here, fix them in vars.monitoring.yml and run setup again.
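
The database check above can be done without opening the whole file. The following is a minimal sketch; the helper name show_grafana_db_section is hypothetical, and both the "database:" key and the 8-line window are assumptions about how the generated values file is laid out, so adjust them to what your file contains. The password is masked so that the output is safe to paste into a ticket.

```shell
# Print the database section of the generated Grafana values file with the password masked.
# show_grafana_db_section is an illustrative helper; the "database:" key and the
# 8-line window are assumptions about the layout of the generated file.
show_grafana_db_section() {
  local values_file="$1"
  grep -A 8 'database:' "$values_file" | sed -E 's/(password:).*/\1 ********/'
}
```

For example, pass the path of the generated grafana-values.yml as the only argument and compare the printed type, host, name, and user with vars.monitoring.yml.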

Quick reference: variables summary

Required variables for HA

Variable	Example	Description
`GRAFANA_HA_ENABLE`	`"true"`	Master flag to enable HA
`GRAFANA_DB_TYPE`	`"postgres"` or `"mysql"`	Database type
`GRAFANA_DB_HOST`	`"db.example.com"`	Database hostname
`GRAFANA_DB_PORT`	`"5432"` or `"3306"`	Database port
`GRAFANA_DB_NAME`	`"grafana"`	Database name
`GRAFANA_DB_USER`	`"grafana_user"`	Database user
`GRAFANA_DB_PASSWORD`	`"your_password"`	Database password

Optional variables (override defaults)

Variable	Default when HA	Purpose
`GRAFANA_DEPLOYMENT_REPLICAS`	1	Initial number of Grafana pods
`GRAFANA_AUTOSCALING_HPA_MIN_REPLICA`	Value of `GRAFANA_DEPLOYMENT_REPLICAS`	HPA minimum replicas
`GRAFANA_AUTOSCALING_HPA_MAX_REPLICA`	3	HPA maximum replicas
`GRAFANA_AUTOSCALING_HPA_TARGET_CPU`	`"70"`	CPU threshold for scaling
`GRAFANA_AUTOSCALING_HPA_TARGET_MEM`	`"75"`	Memory threshold for scaling
`GRAFANA_DB_SSL_MODE`	(varies)	PostgreSQL SSL mode (optional)

Automatically configured when HA is enabled

GRAFANA_DB_ENABLED → "true"
GRAFANA_AUTOSCALING_HPA_ENABLED → "true"
GRAFANA_POD_DISRUPTION_BUDGET_ENABLED → "true"
GRAFANA_POD_ANTI_AFFINITY_ENABLED → "true"
Persistence for Grafana → disabled (no PVC)
Headless service and HA alerting → enabled
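
The automatically configured resources listed above can be inspected after deployment. The following is a minimal sketch that groups standard kubectl queries into one helper. The function name grafana_ha_resources is hypothetical, and the pod label is the one used in the sanity checks of this guide.

```shell
# Inspect the Kubernetes resources that HA mode creates for Grafana.
# grafana_ha_resources is an illustrative helper; pass the namespace where monitoring is deployed.
grafana_ha_resources() {
  local ns="$1"
  # -o wide adds the NODE column, which shows whether anti-affinity spread the pods
  kubectl get pods -n "$ns" -l app.kubernetes.io/name=grafana -o wide
  # The Pod Disruption Budget and the autoscaler
  kubectl get pdb,hpa -n "$ns"
  # Services; a headless service shows "None" in the CLUSTER-IP column
  kubectl get svc -n "$ns"
}
```

For example, `grafana_ha_resources <namespace>` runs all three queries against your monitoring namespace.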

Troubleshooting

Issue	What to check
Grafana pods not starting / DB connection errors	Verify `GRAFANA_DB_*` variables (host, port, name, user, password). Ensure the database and user exist and are reachable from the Kubernetes cluster. Test connectivity from a pod in the cluster.
YAML error on Helm upgrade	Ensure numeric variables use proper format (e.g., `3` not `"3"` for replicas). Quote values that contain special characters.
Duplicate alert notifications	Confirm HA is enabled (headless service, multiple replicas, shared DB) and that Grafana pods can reach each other on port 9094, which the HA alerting gossip uses.
HPA not scaling	Check HPA status with `kubectl describe hpa <grafana-hpa> -n <namespace>`. Ensure metrics-server is running in your cluster. Verify CPU/memory requests are set on Grafana pods.