
Backing up audits in Apache Solr

In self-managed deployments, audit logs are stored in Apache Solr, which Privacera Portal uses to display them. To back up the audit logs stored in Solr, use the script provided in this section.

We strongly recommend that you configure the Audit Server to send audit logs to external storage, such as GCS, ADLS, or S3.

Prerequisites

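| Prerequisite | Description |
|---|---|
| Apache Solr | In self-managed deployments, Apache Solr is installed by default. |
| curl and jq | The backup script uses curl to query Solr and jq to parse the JSON responses. Both must be installed on the machine where you run the script. |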
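Before running the backup, you can confirm that the machine has the commands the script calls and that Solr responds. The following is a minimal sketch, not part of the Privacera script: the function names `check_commands` and `check_solr` are hypothetical, and it assumes your Solr exposes the standard `/admin/info/system` endpoint and requires no additional authentication.

```shell
#!/bin/bash
# Minimal pre-flight checks for backup_solr_docs.sh (hypothetical helper functions).

# Return non-zero if any of the given commands is missing from PATH.
check_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "${cmd}" >/dev/null 2>&1; then
            echo "Missing required command: ${cmd}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Return non-zero if Solr at the given base URL does not answer successfully.
# Assumption: the standard /admin/info/system endpoint is available without authentication.
check_solr() {
    local solr_url=$1
    if ! curl -sS --fail --max-time 10 -o /dev/null "${solr_url}/admin/info/system?wt=json"; then
        echo "Solr is not reachable at ${solr_url}" >&2
        return 1
    fi
    echo "Solr is reachable at ${solr_url}"
}
```

For example, `check_commands curl jq && check_solr "https://localhost:8983/solr"` reports a missing tool or an unreachable Solr URL before any backup files are written.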

Backup Script

To manually back up audits from the ranger_audits collection, follow these steps:

Create a script file named backup_solr_docs.sh with the following content:

Update the value of SOLR_URL, and make sure the URL is accessible from the machine where you run the script.

Script to Back Up Audits in Apache Solr
backup_solr_docs.sh
#!/bin/bash
# Back up documents from a Solr collection for a date range, one JSON file per batch.
# Requires curl and jq on the machine where the script runs.

# Solr configuration
# Update with your Solr URL and make sure it is accessible from the machine where the script is executed
SOLR_URL="https://localhost:8983/solr"  # Update with your Solr URL
COLLECTION_NAME="ranger_audits"         # Replace with your collection name
DATE_FIELD="evtTime"                    # Replace with your date field name
BATCH_SIZE=1000                         # Number of documents per output file

# Input arguments: start date and end date (YYYY-MM-DD), optional output directory
START_DATE=$1
END_DATE=$2
OUTPUT_DIR=${3:-"./solr_backup"}

# Filter query that covers whole days from START_DATE to END_DATE (UTC)
FILTER_QUERY="${DATE_FIELD}:[${START_DATE}T00:00:00Z TO ${END_DATE}T23:59:59Z]"

# Print the total number of documents in the date range; return non-zero on failure
get_total_documents() {
    local response total
    response=$(curl -sS --fail -G "${SOLR_URL}/${COLLECTION_NAME}/select" \
        --data-urlencode "q=*:*" \
        --data-urlencode "fq=${FILTER_QUERY}" \
        --data-urlencode "rows=0") || return 1
    total=$(echo "${response}" | jq '.response.numFound')
    if [[ -z "${total}" || "${total}" == "null" ]]; then
        return 1
    fi
    echo "${total}"
}

# Back up documents with pagination
backup_documents() {
    local total_docs start end output_file
    # exit inside $(...) would only leave the subshell, so check the status here
    if ! total_docs=$(get_total_documents); then
        echo "Error: Unable to determine total documents. Check query or Solr response." >&2
        exit 1
    fi
    echo "Total documents to back up: ${total_docs}"

    start=0
    while [ "${start}" -lt "${total_docs}" ]; do
        end=$((start + BATCH_SIZE))
        output_file="${OUTPUT_DIR}/backup_${start}_to_${end}.json"
        echo "Backing up documents ${start} to ${end}..."

        # -G with --data-urlencode encodes spaces, colons, and brackets in the query;
        # --fail makes curl return non-zero on HTTP errors
        if curl -sS --fail -G "${SOLR_URL}/${COLLECTION_NAME}/select" \
            --data-urlencode "q=*:*" \
            --data-urlencode "fq=${FILTER_QUERY}" \
            --data-urlencode "sort=${DATE_FIELD} desc" \
            --data-urlencode "start=${start}" \
            --data-urlencode "rows=${BATCH_SIZE}" \
            --data-urlencode "indent=true" \
            -o "${output_file}"; then
            echo "Backup saved to ${output_file}"
        else
            echo "Error: Backup failed for documents ${start} to ${end}" >&2
        fi

        # Increment start for the next batch
        start=${end}
    done

    echo "Backup completed. Files saved in ${OUTPUT_DIR}"
}

# Main execution
if [ -z "${START_DATE}" ] || [ -z "${END_DATE}" ]; then
    echo "Usage: $0 <start_date> <end_date> [output_dir]" >&2
    echo "Dates use the YYYY-MM-DD format, for example: $0 2024-01-01 2024-12-31" >&2
    exit 1
fi

# Create output directory if it doesn't exist
mkdir -p "${OUTPUT_DIR}"

backup_documents
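The script builds its Solr filter query directly from the two dates you pass and does not validate them, so a malformed date surfaces as a failed or empty query rather than a clear error. The sketch below uses hypothetical helper names, `is_valid_date` and `build_filter_query`, which are not part of the script above. It checks that each value has the `YYYY-MM-DD` shape and prints the filter for the range in the same form the script sends. It checks the format only, not whether the date exists on the calendar.

```shell
#!/bin/bash
# Hypothetical helpers to validate dates and preview the Solr range filter.

# Succeed only if the argument looks like YYYY-MM-DD (format check only).
is_valid_date() {
    local re='^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
    [[ "$1" =~ $re ]]
}

# Print a range filter on the date field covering whole days in UTC.
# Usage: build_filter_query <date_field> <start_date> <end_date>
build_filter_query() {
    local field=$1 start_date=$2 end_date=$3
    if ! is_valid_date "${start_date}" || ! is_valid_date "${end_date}"; then
        echo "Error: dates must use the YYYY-MM-DD format" >&2
        return 1
    fi
    echo "${field}:[${start_date}T00:00:00Z TO ${end_date}T23:59:59Z]"
}
```

For example, `build_filter_query evtTime 2024-01-01 2024-12-31` prints `evtTime:[2024-01-01T00:00:00Z TO 2024-12-31T23:59:59Z]`. One possible design is to call it in the main section before `backup_documents` so that invalid input stops the script before any request reaches Solr.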
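  1. Make the script executable:

        chmod +x backup_solr_docs.sh

  2. Run the script with the start date and the end date (both in YYYY-MM-DD format) and, optionally, the output directory. If you omit the output directory, the script writes to ./solr_backup.

        ./backup_solr_docs.sh 2024-01-01 2024-12-31 /home/privacera

     This example backs up the audits with an event time from January 1 through December 31, 2024, into /home/privacera.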
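After the script finishes, you can check that the files in the output directory contain the number of documents reported on the "Total documents to back up" line. Each file holds a regular Solr `select` response, so the documents are in its `response.docs` array. The sketch below uses a hypothetical function, `count_backed_up_docs`, relies on jq (which the backup script already requires), and assumes the script's default `backup_*.json` file names.

```shell
#!/bin/bash
# Hypothetical helper: total number of documents across the backup files.
# Usage: count_backed_up_docs <output_dir>
count_backed_up_docs() {
    local dir=$1 file count total=0
    for file in "${dir}"/backup_*.json; do
        [ -e "${file}" ] || continue   # skip the literal pattern when no files match
        count=$(jq '.response.docs | length' "${file}") || return 1
        total=$((total + count))
    done
    echo "${total}"
}
```

For example, `count_backed_up_docs /home/privacera` prints the number of backed-up documents. If audits within the date range are written while the backup runs, the two numbers can differ, because the script pages through results with `start` offsets.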
    
