Tuning Apache Solr installation¶
Overview¶
Apache Solr is a search platform that powers the Privacera Portal search screens for Access Manager Access Audits, Privacera Discovery Classifications, and Privacera PolicySync activity logs. Apache Zookeeper is used to manage the Solr cluster and ensure high availability.
Apache Solr and Zookeeper are installed as part of the Privacera Manager installation for Self Managed and Data Plane with Discovery. They are critical components that need to be configured properly for optimal performance: they are used to monitor and triage issues with Access Manager policies and PolicySync activity, and they power the Discovery classifications user interface.
The out-of-the-box base installation starts with one Solr instance and one Zookeeper instance. However, for production loads it is recommended to run a minimum of three replicas of both Solr and Zookeeper to ensure high availability and fault tolerance. In addition, the Solr heap size and the memory requests and limits should be set to match the expected on-disk size of the collections.
Prerequisites¶
This setup increases the number of Solr and Zookeeper pods to three and also increases the heap size and the memory requests and limits. Ensure that the Kubernetes cluster has enough capacity to handle the additional pods and memory.
Setup¶
Below are the steps to configure Solr and Zookeeper for production use.
Warning
This step deletes the existing Solr and Zookeeper deployments and PVCs, which results in the loss of all existing Solr collections and data. This step is not reversible.
- For Access Audits, ensure that you have already configured the Audit Server to send the access audits to a cloud object store such as AWS S3, ADLS, or GCS.
- For PolicySync Activity Audits, please note that you will lose all the existing PolicySync activity logs.
- For Discovery, consult your Privacera representative to ensure that you have a backup of the Discovery classifications and to get the instructions for re-creating them.
Step 1: Update the configuration as follows¶
Create the file config/custom-vars/vars.solr.yml and add the replica, heap, and memory variables for Solr and Zookeeper, editing the values to match your environment:
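The original variable list is not reproduced here; the following is a minimal sketch of what the file might contain. The property names (SOLR_K8S_REPLICAS, ZOOKEEPER_K8S_REPLICAS, SOLR_JVM_HEAP_SIZE, SOLR_K8S_MEM_REQUESTS, SOLR_K8S_MEM_LIMITS) and the example values are assumptions, so confirm the exact Privacera Manager property names with your Privacera representative before applying them.

```bash
# Minimal sketch only: the property names and values below are assumptions,
# not confirmed Privacera Manager variables. Verify them before use.
cd ~/privacera/privacera-manager   # assumed Privacera Manager home

cat > config/custom-vars/vars.solr.yml <<'EOF'
# Run three replicas of Solr and Zookeeper for high availability (assumed property names)
SOLR_K8S_REPLICAS: "3"
ZOOKEEPER_K8S_REPLICAS: "3"

# Size the heap and memory to the expected on-disk size of the collections (assumed property names)
SOLR_JVM_HEAP_SIZE: "8g"
SOLR_K8S_MEM_REQUESTS: "10Gi"
SOLR_K8S_MEM_LIMITS: "12Gi"
EOF
```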
Step 2: Back up the config/ssl folder and delete the generated files¶
Warning
After the upgrade, you will need to re-distribute the plugin packages for EMR, Databricks, Apache Spark, and any other Ranger plugins you are using, because the mutual TLS keys are re-generated.
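A minimal sketch of the backup and cleanup, assuming the Privacera Manager home is ~/privacera/privacera-manager and that the generated TLS material lives directly under config/ssl; the backup location and the exact set of files to delete are assumptions, so verify them against your own layout before running anything destructive.

```bash
# Sketch only: paths and the exact set of generated files are assumptions;
# verify against your installation before deleting anything.
cd ~/privacera/privacera-manager   # assumed Privacera Manager home

# Back up the existing SSL material outside the config tree
cp -r config/ssl ~/privacera-ssl-backup-$(date +%Y%m%d%H%M%S)

# Delete the generated files so they are re-created on the next Privacera Manager run.
# Which files are generated (as opposed to user-supplied) depends on your deployment;
# review the backup before removing anything.
rm -rf config/ssl/*
```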
Step 3: Generate the helm charts¶
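The original command for this step is not preserved here. As a rough sketch, the charts are normally regenerated by re-running Privacera Manager with the updated custom vars; the exact command for your Privacera Manager version may differ, so treat the following as an assumption and confirm it in your Privacera Manager documentation.

```bash
# Sketch only: the exact Privacera Manager command for your version may differ.
cd ~/privacera/privacera-manager   # assumed Privacera Manager home
./privacera-manager.sh update      # assumed command that regenerates the helm charts from config/custom-vars
```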
Step 4: Uninstall the Solr and Zookeeper helm charts¶
Warning
All the search indexes will be cleared. This step should be done prior to rolling out to production.
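A minimal sketch of the uninstall, assuming Helm release names solr and zookeeper and a namespace named privacera; the release names, namespace, and PVC label selectors are assumptions, so check helm list and kubectl get pvc first.

```bash
# Sketch only: release names, namespace, and label selectors are assumptions.
NAMESPACE=privacera

# Check the actual release names first
helm list -n "$NAMESPACE"

# Uninstall the Solr and Zookeeper releases
helm uninstall solr -n "$NAMESPACE"
helm uninstall zookeeper -n "$NAMESPACE"

# Delete the persistent volume claims left behind; check the labels used by your charts
kubectl get pvc -n "$NAMESPACE"
kubectl delete pvc -n "$NAMESPACE" -l app=solr
kubectl delete pvc -n "$NAMESPACE" -l app=zookeeper

# Verify that nothing remains
kubectl get deployments,statefulsets,pvc -n "$NAMESPACE"
```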
Confirm that the deployments and PVCs are deleted.
Step 5: Scale down the Portal and Ranger deployments¶
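A minimal sketch of scaling down, assuming Portal and Ranger run as Deployments named portal and ranger-admin in the privacera namespace; the resource names and namespace are assumptions, so list the deployments first and adjust the names accordingly.

```bash
# Sketch only: deployment names and namespace are assumptions.
NAMESPACE=privacera

# Check the actual deployment names first
kubectl get deployments -n "$NAMESPACE"

kubectl scale deployment portal --replicas=0 -n "$NAMESPACE"
kubectl scale deployment ranger-admin --replicas=0 -n "$NAMESPACE"

# Verify the pods are gone
kubectl get pods -n "$NAMESPACE"
```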
Confirm that the pods are deleted.
Step 6: Install the new helm charts¶
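The original commands are not preserved here. As a sketch, the installation uses the charts generated by Privacera Manager in the earlier step; the chart paths, release names, and namespace below are assumptions and should be replaced with the values from your environment.

```bash
# Sketch only: chart locations, release names, and namespace are assumptions;
# install the charts generated by Privacera Manager in the earlier step.
NAMESPACE=privacera

helm install zookeeper <path-to-generated-zookeeper-chart> -n "$NAMESPACE"
helm install solr <path-to-generated-solr-chart> -n "$NAMESPACE"

# Watch until all Solr and Zookeeper pods are Running and Ready
kubectl get pods -n "$NAMESPACE" -w
```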
Wait for the Solr and Zookeeper pods to be up and running.
Step 7: Scale up the Portal and Ranger deployments¶
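A minimal sketch of scaling the services back up, again assuming Deployments named portal and ranger-admin in the privacera namespace; adjust the names, namespace, and replica counts to match your environment.

```bash
# Sketch only: deployment names, namespace, and replica counts are assumptions.
NAMESPACE=privacera

kubectl scale deployment portal --replicas=1 -n "$NAMESPACE"
kubectl scale deployment ranger-admin --replicas=1 -n "$NAMESPACE"

# Confirm the pods come back up; the Solr collections are re-created on startup
kubectl get pods -n "$NAMESPACE"
```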
If you run multiple replicas of these services, scale them up to the desired replica count. This step re-creates all the Solr collections; you can log in to the Solr Admin UI to check the status of the collections.