Grafana High Availability (HA)

This guide explains how to enable Grafana HA in Privacera monitoring. When enabled, Grafana uses a shared database, multiple replicas, HA alerting, a Pod Disruption Budget (PDB), and anti-affinity for resilience.


What Grafana HA gives you

| Feature | What it does | Default when HA enabled |
| --- | --- | --- |
| Shared database | Dashboards, alerts, users, and sessions are stored in a database so all Grafana pods see the same data. Without HA, Grafana uses SQLite (a local file), which does not work with multiple replicas. | Database enabled; SQLite disabled |
| Multiple replicas | Runs multiple Grafana pods. If one pod fails, the others continue serving requests. Also provides load distribution and zero-downtime updates. | 1 replica (default, can be overridden; set 2 or more if you need redundancy at all times) |
| No PVC for Grafana | Grafana does not use persistent volume storage. All state (dashboards, alerts, users, sessions) is stored in the shared database. | Persistence: disabled |
| HA alerting | Grafana pods coordinate over a gossip protocol on port 9094 so that each alert notification is sent only once. This prevents duplicate notifications when multiple Grafana pods are running. | Enabled with headless service |
| Horizontal Pod Autoscaler (HPA) | Automatically scales Grafana pods up or down based on CPU and memory usage, so capacity follows load. | Enabled: min 1, max 3 pods, CPU 70%, memory 75% |
| Pod Disruption Budget (PDB) | Ensures a minimum number of Grafana pods remain available during voluntary disruptions (node maintenance, cluster upgrades). Prevents all pods from being terminated simultaneously. | 1 pod minimum available |
| Pod anti-affinity | The Kubernetes scheduler tries to spread Grafana pods across different nodes. If one node fails, Grafana remains available on other nodes. | Preferred (soft rule) on hostname topology |

Prerequisites

  • Database: A shared database (PostgreSQL or MySQL) is required for Grafana HA. You can use the same RDBMS service used by Privacera or any other database instance.
  • Vars file: You will edit the monitoring vars file. Follow the steps below to set it up.

Set up the vars file

  1. SSH into the instance where Privacera Manager is installed.

  2. Navigate to the config directory:

    Bash
    cd ~/privacera/privacera-manager/config/
    

  3. Copy the vars.monitoring.yml file from the sample-vars folder to the custom-vars folder:

    Note: If this file already exists in the custom-vars folder, skip this step.

    Bash
    cp sample-vars/vars.monitoring.yml custom-vars/
    

  4. Open the file for editing:

    Bash
    vi custom-vars/vars.monitoring.yml
    

In the rest of this guide, this file is referred to as vars.monitoring.yml.


Step 1: Create the Grafana database and user

Before enabling Grafana HA, you must create a dedicated database and user for Grafana in your database server.

Database choice

You can use the same RDBMS service used by Privacera or any other database instance. The commands below work for both internal (deployed by Privacera) and external databases.

Log in to your database

Connect to your database server using an admin user with privileges to create databases and users.

For MySQL:

Bash
mysql -u <username> -p -h <hostname> -P 3306

For PostgreSQL:

Bash
psql -U <username> -d <database_name> -h <hostname> -p 5432

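Replace:

  • <username> - Admin username
  • <hostname> - Database hostname
  • <database_name> - An existing database to connect to (PostgreSQL only), for example the default postgres database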

Create database and user

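Once logged in, execute the SQL commands for your database type. The statements differ between MySQL and PostgreSQL, so run only the matching set:

SQL
-- MySQL / MariaDB
CREATE DATABASE grafana;
-- '%' allows connections from any host; restrict it if your setup allows
CREATE USER 'grafana_user'@'%' IDENTIFIED BY 'your_secure_password';
GRANT ALL PRIVILEGES ON grafana.* TO 'grafana_user'@'%';
FLUSH PRIVILEGES;

-- PostgreSQL
CREATE DATABASE grafana;
CREATE USER grafana_user WITH PASSWORD 'your_secure_password';
GRANT ALL PRIVILEGES ON DATABASE grafana TO grafana_user;
-- On PostgreSQL 15 or later the user also needs rights on the public schema,
-- for example by owning the database:
ALTER DATABASE grafana OWNER TO grafana_user;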

Important notes

  • Replace your_secure_password with a strong password.
  • Record the database name, username, and password. You will need them in Step 2.
  • The commands are the same for internal and external databases. Only the syntax differs between MySQL and PostgreSQL.

Step 2: Configure Grafana HA variables

Edit vars.monitoring.yml and set the following variables:

YAML
###############################################
######## Grafana High Availability (HA) #########
###############################################

# Enable Grafana HA (required for all HA features)
GRAFANA_HA_ENABLE: "true"

# --- Database Configuration ---
GRAFANA_DB_TYPE: "mysql"                    # Options: "postgres" or "mysql"
GRAFANA_DB_HOST: "your-db-host.example.com" # Database hostname or IP
GRAFANA_DB_PORT: "3306"                     # Port: 5432 for PostgreSQL, 3306 for MySQL/MariaDB
GRAFANA_DB_NAME: "grafana"                  # Database name you created
GRAFANA_DB_USER: "grafana_user"             # Database user you created
GRAFANA_DB_PASSWORD: "your_secure_password" # Password for the database user

# --- Optional: PostgreSQL only ---
# GRAFANA_DB_SSL_MODE: "require"            # Options: require, verify-full, disable

## Grafana HA Replicas Configuration
## Change this value to override the default replica count (default: 1 when HA is enabled)
GRAFANA_DEPLOYMENT_REPLICAS: 1

Variable descriptions

| Variable | Purpose | Example |
| --- | --- | --- |
| GRAFANA_HA_ENABLE | Master flag to enable all HA features | "true" |
| GRAFANA_DB_TYPE | Database driver to use | "postgres" or "mysql" |
| GRAFANA_DB_HOST | Database hostname or IP address | "db.example.com" or "10.0.1.50" |
| GRAFANA_DB_PORT | Database port | "5432" (PostgreSQL) or "3306" (MySQL/MariaDB) |
| GRAFANA_DB_NAME | Database name created for Grafana | "grafana" |
| GRAFANA_DB_USER | Database user created for Grafana | "grafana_user" |
| GRAFANA_DB_PASSWORD | Password for the database user | Your secure password |
| GRAFANA_DB_SSL_MODE | SSL mode for PostgreSQL (optional) | "require", "verify-full", or "disable" |

SSL Mode

GRAFANA_DB_SSL_MODE only applies to PostgreSQL and is ignored for MySQL.
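
Before running setup, it can help to confirm that every required variable is present and uncommented in the vars file. The following is a minimal sketch of such a check. The function name check_grafana_ha_vars is hypothetical and not part of Privacera Manager, and the check only looks for the variable name at the start of a line; it does not validate the values.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: report required Grafana HA variables that are
# missing or commented out in a vars file. Not part of Privacera Manager.
check_grafana_ha_vars() {
  local file=$1 missing=0 var
  for var in GRAFANA_HA_ENABLE GRAFANA_DB_TYPE GRAFANA_DB_HOST GRAFANA_DB_PORT \
             GRAFANA_DB_NAME GRAFANA_DB_USER GRAFANA_DB_PASSWORD; do
    # Match an uncommented "VAR:" at the start of a line
    if ! grep -Eq "^${var}:" "$file"; then
      echo "missing: ${var}"
      missing=1
    fi
  done
  # Exit status 0 when all variables were found, 1 otherwise
  return "$missing"
}

# Usage (run from the config directory used in this guide):
# check_grafana_ha_vars custom-vars/vars.monitoring.yml && echo "all required variables set"
```

A variable that is still commented out, such as a line starting with "# GRAFANA_DB_HOST:", is reported as missing.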

What happens when you enable HA

When GRAFANA_HA_ENABLE is set to "true", the following are automatically configured:

  • Database usage enabled (SQLite disabled)
  • Initial replicas set to the value of GRAFANA_DEPLOYMENT_REPLICAS (default: 1).
  • HPA enabled with min 1, max 3 replicas
  • Deployment strategy changed to RollingUpdate
  • Headless service created for HA alerting
  • Pod Disruption Budget enabled (min 1 pod available)
  • Pod anti-affinity enabled (spread across nodes)
  • Persistence disabled (no PVC for Grafana)

Step 3: Optional - Override HPA settings

What is HPA?

HPA (Horizontal Pod Autoscaler) automatically scales the number of Grafana pods based on resource usage (CPU and memory). When load increases, HPA adds more pods; when load decreases, it removes pods, ensuring optimal resource utilization and performance.

Default HPA values when HA is enabled

  • Initial replicas: 1 (starting number of Grafana pods)
  • HPA min replicas: Set to value of GRAFANA_DEPLOYMENT_REPLICAS
  • HPA max replicas: 3 (maximum pods allowed)
  • CPU threshold: 70% (scale up when CPU exceeds this)
  • Memory threshold: 75% (scale up when memory exceeds this)

Replica variable behavior

When HPA is enabled, GRAFANA_AUTOSCALING_HPA_MIN_REPLICA is automatically set to the value of GRAFANA_DEPLOYMENT_REPLICAS to ensure consistent baseline capacity. HPA then manages the replica count dynamically between min and max based on resource usage.

HPA variables explained

You can override these in vars.monitoring.yml:

YAML
# Initial number of Grafana pods (starting point before HPA takes over)
GRAFANA_DEPLOYMENT_REPLICAS: 1

# Minimum number of pods HPA will maintain (set to value of GRAFANA_DEPLOYMENT_REPLICAS)
GRAFANA_AUTOSCALING_HPA_MIN_REPLICA: "{{GRAFANA_DEPLOYMENT_REPLICAS}}"

# Maximum number of pods HPA can scale up to (under high load)
GRAFANA_AUTOSCALING_HPA_MAX_REPLICA: 3

# CPU threshold: HPA scales up when average CPU usage exceeds this percentage
GRAFANA_AUTOSCALING_HPA_TARGET_CPU: "70"

# Memory threshold: HPA scales up when average memory usage exceeds this percentage
GRAFANA_AUTOSCALING_HPA_TARGET_MEM: "75"

| Variable | What it means | Example |
| --- | --- | --- |
| GRAFANA_DEPLOYMENT_REPLICAS | Initial/starting number of Grafana pods | 1 = Start with 1 pod |
| GRAFANA_AUTOSCALING_HPA_MIN_REPLICA | Minimum pods to keep running at all times (set to the value of GRAFANA_DEPLOYMENT_REPLICAS) | "{{GRAFANA_DEPLOYMENT_REPLICAS}}" = Use same as initial replicas |
| GRAFANA_AUTOSCALING_HPA_MAX_REPLICA | Maximum pods allowed during high load | 3 = Never scale above 3 pods |
| GRAFANA_AUTOSCALING_HPA_TARGET_CPU | CPU percentage that triggers scale-up | "70" = Scale up when CPU > 70% |
| GRAFANA_AUTOSCALING_HPA_TARGET_MEM | Memory percentage that triggers scale-up | "75" = Scale up when memory > 75% |

How HPA works

If your Grafana pods average 80% CPU (above the 70% target), HPA adds pods, up to the maximum of 3. When usage falls far enough below the target that fewer pods could carry the load, HPA removes pods, down to the minimum (the value of GRAFANA_DEPLOYMENT_REPLICAS). Utilization is measured relative to the CPU and memory requests set on the Grafana pods.
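
The scaling decision follows the replica formula documented for the Kubernetes HPA: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the configured minimum and maximum. The sketch below works through that arithmetic with the defaults from this guide. It is illustrative only: the function name hpa_desired_replicas is hypothetical, and it leaves out the controller's tolerance band, its scale-down stabilization window, and the rule that the larger result wins when both CPU and memory targets are set.

```bash
#!/usr/bin/env bash
# Sketch of the documented HPA replica calculation:
#   desired = ceil(current_replicas * current_utilization / target_utilization)
# clamped to [min, max]. The function name is hypothetical.
hpa_desired_replicas() {
  local current=$1 usage=$2 target=$3 min=$4 max=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  local desired=$(( (current * usage + target - 1) / target ))
  (( desired < min )) && desired=$min
  (( desired > max )) && desired=$max
  echo "$desired"
}

hpa_desired_replicas 1 80 70 1 3   # 1 pod averaging 80% CPU, target 70%
hpa_desired_replicas 3 50 70 1 3   # 3 pods averaging 50% CPU
hpa_desired_replicas 3 40 70 1 3   # 3 pods averaging 40% CPU
```

Under these assumptions the three calls print 2, 3, and 2. One pod at 80% scales up to two. Three pods at 50% stay at three, because two pods would each run above the 70% target. Usage has to drop to about 40% before the formula removes a pod.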


Step 4: Apply and verify

Regenerate and deploy

  1. From the Privacera Manager directory, run setup so the Grafana Helm values are regenerated from your vars:

    Bash
    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh setup
    

  2. Install or upgrade (as you normally do):

    Bash
    ./pm_with_helm.sh install
    

Or run your usual playbook that deploys monitoring/Grafana.

Sanity checks

  • Pods: Check that Grafana pods are running:

    Bash
    kubectl get pods -n <namespace> -l app.kubernetes.io/name=grafana
    
    You should see Grafana pods (e.g. 1 to 3 depending on load and HPA).

  • HPA: Verify HPA is active:

    Bash
    kubectl get hpa -n <namespace>
    
    You should see the Grafana HPA with current/min/max replicas.

  • DB configuration in generated values: After running setup, verify that database variables are correctly rendered in the Grafana Helm values file. Check the grafana.ini section (database config) in:

    Bash
    privacera/privacera-manager/output/kubernetes/helm/monitoring/grafana/grafana-values.yml
    
    Confirm that the database type, host, port, name, and user are set as per your vars.monitoring.yml. This file is generated from your vars; if DB settings are wrong here, fix them in vars.monitoring.yml and run setup again.
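
To review the rendered database settings without opening the whole file, a small helper like the one below can print just that block with the password masked. This is a sketch under stated assumptions: the function name show_grafana_db_config is hypothetical, and it assumes the settings sit under a "database:" key followed by at most eight related lines. Adjust the -A value if your generated file is laid out differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the database block of a generated values file
# with the password masked. Assumes the block starts at a "database:" key and
# spans at most 8 following lines.
show_grafana_db_config() {
  grep -A 8 -E '^[[:space:]]*database:' "$1" \
    | sed -E 's/(password:[[:space:]]*).*/\1****/'
}

# Usage (path as given in this guide):
# show_grafana_db_config privacera/privacera-manager/output/kubernetes/helm/monitoring/grafana/grafana-values.yml
```

Masking the password keeps it out of terminal scrollback and shared screenshots while still showing the type, host, name, and user.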


Quick reference: variables summary

Required variables for HA

| Variable | Example | Description |
| --- | --- | --- |
| GRAFANA_HA_ENABLE | "true" | Master flag to enable HA |
| GRAFANA_DB_TYPE | "postgres" or "mysql" | Database type |
| GRAFANA_DB_HOST | "db.example.com" | Database hostname |
| GRAFANA_DB_PORT | "5432" or "3306" | Database port |
| GRAFANA_DB_NAME | "grafana" | Database name |
| GRAFANA_DB_USER | "grafana_user" | Database user |
| GRAFANA_DB_PASSWORD | "your_password" | Database password |

Optional variables (override defaults)

| Variable | Default when HA | Purpose |
| --- | --- | --- |
| GRAFANA_DEPLOYMENT_REPLICAS | 1 | Initial number of Grafana pods |
| GRAFANA_AUTOSCALING_HPA_MIN_REPLICA | Value of GRAFANA_DEPLOYMENT_REPLICAS | HPA minimum replicas |
| GRAFANA_AUTOSCALING_HPA_MAX_REPLICA | 3 | HPA maximum replicas |
| GRAFANA_AUTOSCALING_HPA_TARGET_CPU | "70" | CPU threshold for scaling |
| GRAFANA_AUTOSCALING_HPA_TARGET_MEM | "75" | Memory threshold for scaling |
| GRAFANA_DB_SSL_MODE | (varies) | PostgreSQL SSL mode (optional) |

Automatically configured when HA is enabled

  • GRAFANA_DB_ENABLED → "true"
  • GRAFANA_AUTOSCALING_HPA_ENABLED → "true"
  • GRAFANA_POD_DISRUPTION_BUDGET_ENABLED → "true"
  • GRAFANA_POD_ANTI_AFFINITY_ENABLED → "true"
  • Persistence for Grafana → disabled (no PVC)
  • Headless service and HA alerting → enabled

Troubleshooting

| Issue | What to check |
| --- | --- |
| Grafana pods not starting / DB connection errors | Verify GRAFANA_DB_* variables (host, port, name, user, password). Ensure the database and user exist and are reachable from the Kubernetes cluster. Test connectivity from a pod in the cluster. |
| YAML error on Helm upgrade | Ensure numeric variables use proper format (e.g., 3 not "3" for replicas). Quote values that contain special characters. |
| Duplicate alert notifications | Confirm HA is enabled (headless service, multiple replicas, shared DB). |
| HPA not scaling | Check HPA status with kubectl describe hpa <grafana-hpa> -n <namespace>. Ensure metrics-server is running in your cluster. Verify CPU/memory requests are set on Grafana pods. |
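
For the connectivity check mentioned in the first row, one option is a plain TCP probe of the database port. The sketch below uses bash's built-in /dev/tcp redirection, so it needs no database client. The function name db_port_reachable is hypothetical, and the check assumes that bash and the timeout command are available wherever you run it. It only shows that the port accepts connections, not that the credentials are valid.

```bash
#!/usr/bin/env bash
# Hypothetical reachability check using bash's built-in /dev/tcp redirection.
# Succeeds if a TCP connection to host:port opens within 3 seconds.
# It only proves the port is reachable, not that the credentials are valid.
db_port_reachable() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage, from a host or pod inside the cluster that has bash and timeout:
# db_port_reachable your-db-host.example.com 3306 && echo reachable || echo unreachable
```

If the probe fails from inside the cluster but succeeds from your workstation, the cause is more likely network policy, security groups, or DNS than the Grafana configuration.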