
Tuning Apache Solr installation

Overview

Apache Solr is a search platform that powers the Privacera Portal search screens for Access Manager Access Audits, Privacera Discovery classifications, and Privacera PolicySync activity logs. Apache Zookeeper manages the Solr cluster and ensures high availability.

Apache Solr and Zookeeper are installed as part of the Privacera Manager installation for Self Managed and Data Plane with Discovery deployments. These are critical components that must be configured properly for optimal performance: they help in monitoring and triaging issues with Access Manager policies and PolicySync activity, and they power the Discovery classifications user interface.

The out-of-the-box installation starts with one Solr instance and one Zookeeper instance. For production load, however, a minimum of three replicas of both Solr and Zookeeper is recommended to ensure high availability and fault tolerance. In addition, the Solr heap size and the pod memory requests and limits should be sized to match the expected on-disk size of the collections.

Prerequisites

This setup increases the number of Solr and Zookeeper pods to 3 and also increases the heap size and the memory requests and limits. Ensure that the Kubernetes cluster has enough capacity to handle the increased number of pods and the additional memory.
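As a rough sizing sketch, the additional cluster capacity can be estimated from the pod counts and memory limits used later in this guide. The Zookeeper pod memory below is an assumed illustrative value, not a Privacera default; substitute your own settings.

```shell
# Back-of-the-envelope capacity estimate for 3 Solr + 3 Zookeeper pods.
# solr_mb matches SOLR_K8S_MEM_LIMITS below; zk_mb is an assumed value.
solr_pods=3; solr_mb=40960
zk_pods=3;   zk_mb=2048
total_mb=$(( solr_pods * solr_mb + zk_pods * zk_mb ))
echo "Memory needed across the cluster: ${total_mb} MB"
```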

Setup

Below are the steps to configure Solr and Zookeeper for production use.

Warning

These steps delete the existing Solr and Zookeeper deployments and PVCs, resulting in the loss of all existing Solr collections and data. This is not reversible.

  1. For Access Audits, ensure that you have already configured the Audit Server to send the access audits to a cloud object store such as AWS S3, ADLS, or GCS.
  2. For PolicySync Activity Audits, please note that you will lose all the existing PolicySync activity logs.
  3. For Discovery, please consult with your Privacera representative to ensure that you have a backup of the Discovery classifications and for the instructions to re-create the Discovery classifications.

Step 1: Update the configuration as follows

Copy the sample file to config/custom-vars/vars.solr.yml and edit it to set the following variables:

Bash
cd ~/privacera/privacera-manager
cp -n config/sample-vars/vars.solr.yml config/custom-vars/
vi config/custom-vars/vars.solr.yml
YAML
# The Java heap size for Solr should not exceed 32 GB. You can start with a lower value such
# as 8 GB and increase it as needed.
# Refer - https://solr.apache.org/guide/solr/latest/deployment-guide/jvm-settings.html
SOLR_HEAP_MIN_MEMORY: "16384m"
SOLR_HEAP_MAX_MEMORY: "16384m"

# The memory requests and limits of Solr pod should match the expected size of the collections on 
#  disk as Solr uses off-heap memory.
#
# Refer https://solr.apache.org/guide/solr/latest/deployment-guide/jvm-settings.html 
#
# As an example, we are setting the memory requests and limits to 40 GB. This means the 
# Kubernetes nodes should have at least 40 GB of memory available for Solr pods including 
# JVM heap memory and off-heap memory.
SOLR_K8S_MEM_REQUESTS: "40960M"
SOLR_K8S_MEM_LIMITS: "40960M"

# Set the maximum retention period for Access Manager (Ranger) access audits to 15 days. 
# You can tune this as required based on your access audit volume. Note that the 
# audit search in Portal is for triaging current issues. For analysis of usage patterns
# that requires historical access audits, you should persist the access audits to object-store 
# like AWS S3, ADLS or GCS and run queries on them.
MAX_AUDIT_RETENTION_DAYS: "15"

# The number of shards for the Ranger collection should be set to 3. 
RANGER_SOLR_NUMBER_OF_SHARDS: "3"

# The Solr pods should be configured with 3 replicas minimum.
SOLR_K8S_CLUSTER_SIZE: 3

# The Zookeeper pods should be configured with 3 replicas. There is no need to go beyond this.
ZOOKEEPER_CLUSTER_SIZE: 3
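A quick sanity check on the values above: the JVM heap must fit inside the pod memory limit with headroom left over, because Solr also relies on off-heap memory (the OS page cache for memory-mapped index files). This sketch mirrors the settings in the file:

```shell
# Heap vs. pod memory check; values mirror vars.solr.yml above.
heap_mb=16384      # SOLR_HEAP_MAX_MEMORY
pod_mb=40960       # SOLR_K8S_MEM_LIMITS
offheap_mb=$(( pod_mb - heap_mb ))
echo "Off-heap headroom: ${offheap_mb} MB"
[ "$heap_mb" -lt "$pod_mb" ] && echo "Heap fits within the pod limit"
```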

Step 2: Backup the config/ssl folder and delete the generated files

Warning

You will need to re-distribute Plugins packages for EMR, Databricks, Apache Spark and any other Ranger plugins that you are using after the upgrade as the mutual TLS keys are re-generated.

Bash
cd ~/privacera/privacera-manager
cd config
tar -czvf backup-ssl-$(date '+%Y-%m-%d-%H%M%S').tar.gz ssl
cd ssl
rm -f *.p12 *.cer *.pem *.jceks *.jks
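It is prudent to confirm the archive is readable before deleting the generated files. The following is a self-contained sketch of that backup-then-delete pattern, using a throwaway directory and illustrative file names so it is safe to run anywhere:

```shell
# Demonstrate: archive a directory, verify the archive, then delete originals.
workdir=$(mktemp -d)
mkdir -p "$workdir/ssl"
touch "$workdir/ssl/keystore.jks" "$workdir/ssl/cert.pem"   # stand-in files
tar -czf "$workdir/backup-ssl.tar.gz" -C "$workdir" ssl
# Only delete the generated files after the archive lists them back correctly.
if tar -tzf "$workdir/backup-ssl.tar.gz" | grep -q 'keystore.jks'; then
  rm -f "$workdir"/ssl/*.jks "$workdir"/ssl/*.pem
fi
remaining=$(ls "$workdir/ssl" | wc -l)
echo "Files remaining after cleanup: $remaining"
```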

Step 3: Generate the Helm charts

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh setup

Step 4: Uninstall the Solr and Zookeeper Helm charts

Warning

All search indexes will be cleared. Perform this step before rolling out to production.

Bash
cd ~/privacera/privacera-manager
helm uninstall solr -n <namespace>
helm uninstall zookeeper -n <namespace>

Confirm that the deployments and PVCs are deleted.

Step 5: Scale down Portal and Ranger deployments

Bash
cd ~/privacera/privacera-manager
kubectl scale deployment portal --replicas=0 -n <namespace>
kubectl scale deployment ranger-admin --replicas=0 -n <namespace>

Confirm that the pods are deleted.

Step 6: Install the new Helm charts

Bash
cd ~/privacera/privacera-manager
helm upgrade --install zookeeper --namespace <namespace> output/kubernetes/helm/zookeeper
helm upgrade --install solr --namespace <namespace> output/kubernetes/helm/solr

Wait for the Solr and Zookeeper pods to be up and running.
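Rather than re-running a status command by hand, a small polling helper can wrap the check. This is a hypothetical convenience function, not part of Privacera Manager; it is shown polling a no-op command so the sketch runs as-is, but in practice the condition would be a `kubectl get pods` readiness check.

```shell
# wait_for TRIES CMD...: retry CMD up to TRIES times, one second apart,
# returning success as soon as CMD succeeds.
wait_for() {
  local tries=$1; shift
  local i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep 1
  done
}
wait_for 5 true && echo "condition met"
```

For example, `wait_for 60 sh -c 'kubectl get pods -n <namespace> | grep -q "solr.*Running"'` would poll for up to a minute.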

Step 7: Scale up the Portal and Ranger deployments

Bash
cd ~/privacera/privacera-manager
kubectl scale deployment portal --replicas=1 -n <namespace>
kubectl scale deployment ranger-admin --replicas=1 -n <namespace>

If you run multiple replicas, scale up to your desired replica count. This step re-creates all the Solr collections. You can log into the Solr Admin UI to check the status of the collections.
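The collections can also be confirmed without the admin UI via the Solr Collections API LIST action, e.g. `curl 'http://<solr-host>:8983/solr/admin/collections?action=LIST&wt=json'` (host is a placeholder). The sketch below parses a sample response of the shape that call returns, so it runs without a live cluster; `ranger_audits` is the Ranger audit collection, other collection names vary by deployment.

```shell
# Extract the collection list from a (sample) Collections API LIST response.
sample='{"responseHeader":{"status":0},"collections":["ranger_audits"]}'
collections=$(echo "$sample" | sed -n 's/.*"collections":\[\(.*\)\].*/\1/p')
echo "collections: $collections"
```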

Step 8: Run the post-install task

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh post-install

