# Setup for Access Management for Apache Spark

## Configure
This section outlines the steps to set up Apache Spark OLAC or OLAC_FGAC with the Privacera Plugin. Ensure that all prerequisites are completed before beginning the setup process.

Perform the following steps to configure the Apache Spark OLAC connector:
-  SSH into the instance where Privacera Manager is installed. 
-  Run the following command to navigate to the `config` directory and copy the YAML files:
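    A typical invocation, assuming the standard Privacera Manager layout under `~/privacera/privacera-manager`:

    ```bash
    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.spark-standalone.yml custom-vars/
    ```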
-  Modify the following properties:

    -  In the `vars.spark-standalone.yml` file, update the following properties with the appropriate values:

        ```yaml
        # Spark env name
        SPARK_ENV_TYPE: "<PLEASE_CHANGE>"

        # Add the Spark home path
        # Eg: "/opt/spark"
        SPARK_HOME: "<PLEASE_CHANGE>"

        # Spark user home directory
        # Eg: "/opt/privacera"
        SPARK_USER_HOME: "<PLEASE_CHANGE>"
        ```

        | Variable | Definition | Example |
        |----------|------------|---------|
        | `SPARK_ENV_TYPE` | Set the environment type. | `privacera_spark_olac` |
        | `SPARK_HOME` | Home path of your Spark installation. | `/opt/spark` |
        | `SPARK_USER_HOME` | User home directory of your Spark installation. | `/opt/privacera` |
-  Once the properties are configured, update your Privacera Manager platform instance by running the following commands.
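    On a typical Privacera Manager installation, the update is run as:

    ```bash
    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    ```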
-  Once the post-install process is complete, you will see a `spark-standalone` folder in the `~/privacera/privacera-manager/output` directory, with the following folder structure:

    ```
    output/
    ├── spark-standalone/
    │   ├── spark_custom_conf/
    │   │   ├── FGAC/
    │   │   │   ├── README.md
    │   │   │   ├── jwttoken.pub
    │   │   │   ├── privacera_spark.properties
    │   │   ├── OLAC/
    │   │   │   ├── privacera_spark.properties
    │   │   │   ├── README.md
    │   │   ├── OLAC_FGAC/
    │   │   │   ├── README.md
    │   │   │   ├── jwttoken.pub
    │   │   │   ├── privacera_spark.properties
    │   │   ├── auditserver-secrets-keystore.jks
    │   │   ├── global-truststore.p12
    │   │   ├── ranger-plugin-keystore.p12
    │   │   ├── ranger.jceks
    │   ├── privacera_setup.sh
    │   ├── spark_custom_conf.zip
    │   ├── standalone_spark_FGAC.sh
    │   ├── standalone_spark_OLAC.sh
    │   ├── standalone_spark_OLAC_FGAC.sh
    ```
## Setup Scripts and Configuration Files

### Creating Script Files
-  Create a `privacera-oss-plugin` folder and navigate to it.
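    For example (this path matches the copy commands used later on this page):

    ```bash
    mkdir -p ~/privacera-oss-plugin
    cd ~/privacera-oss-plugin
    ```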
-  Create a `penv.sh` file and copy the following content to it.
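    A minimal sketch of such a `penv.sh`, exporting the environment variables consumed by `build_privacera_plugin.sh` and `standalone_spark_setup.sh`; the values shown are placeholders:

    ```bash
    # Placeholders -- set these to match your environment
    export SPARK_HOME="/opt/spark"                        # Spark installation path
    export SPARK_VERSION="3.5.1"                          # used to pick matching Delta Lake jars
    export SPARK_PLUGIN_TYPE="OLAC"                       # OLAC, FGAC, or OLAC_FGAC
    export ENV_TYPE="PLATFORM"                            # PLATFORM or PCLOUD
    export PRIVACERA_BASE_DOWNLOAD_URL="<PLEASE_CHANGE>"  # base URL for the Privacera packages
    export OSS_DELTA_LAKE_ENABLE="false"                  # set to true to enable Delta Lake support
    ```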
-  Run the following command to create and edit the `build_privacera_plugin.sh` file. Expand the following section and copy its content to the `build_privacera_plugin.sh` file.
-  Create a `scripts` folder and navigate to it.
-  Create a `privacera_setup.sh` script file and copy the following content to it.
-  Create a `standalone_spark_setup.sh` script file and copy the following content to it:

    ```bash
    #!/bin/bash
    set -x

    # No need to change
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    PRIVACERA_WORK_DIR=${PRIVACERA_WORK_DIR:-${SCRIPT_DIR}/work}
    PRIVACERA_OUT_FILE=${PRIVACERA_OUT_FILE:-${PRIVACERA_WORK_DIR}/privacera.out}
    PKG_NAME=${PKG_NAME:-privacera-spark-plugin.tar.gz}
    PRIVACERA_CONF_FOLDER=${PRIVACERA_CONF_FOLDER:-${SPARK_HOME}/conf}
    PRIVACERA_CUSTOM_CONF_ZIP=${PRIVACERA_CUSTOM_CONF_ZIP:-spark_custom_conf.zip}
    SPARK_DEFAULT_CONF=${SPARK_HOME}/conf/spark-defaults.conf
    SPARK_PLUGIN_TYPE=${SPARK_PLUGIN_TYPE:-"OLAC"}
    OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE:-false}
    ENV_TYPE=${ENV_TYPE:-"PLATFORM"}
    SPARK_VERSION=${SPARK_VERSION}

    if [[ $ENV_TYPE != "PCLOUD" ]]; then
      # For platform we have to append, as we are using the same path for both
      export PRIVACERA_BASE_DOWNLOAD_URL=${PRIVACERA_BASE_DOWNLOAD_URL}/spark-plugin
    fi

    rm -rf ${PRIVACERA_OUT_FILE}

    function log(){
      msg=$1
      currentTime=`date`
      echo "${currentTime} : ${msg} " >> ${PRIVACERA_OUT_FILE}
    }

    function createFolders() {
      # Delete existing jars and the work folder so an update does not create issues,
      # as the jar name is different in every release
      rm -rf ${SPARK_HOME}/jars/ranger-*
      rm -rf ${SPARK_HOME}/jars/privacera*
      rm -rf ${PRIVACERA_WORK_DIR}
      mkdir -p ${PRIVACERA_WORK_DIR}
    }

    function validation(){
      if [[ $SPARK_HOME == "" ]];then
        log " SPARK_HOME can't be empty "
        exit 1;
      fi
      log "SPARK_HOME = ${SPARK_HOME}"
    }

    download_pkg(){
      log "Downloading pkg from ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
      cd ${PRIVACERA_WORK_DIR}
      if [[ ${PRIVACERA_BASE_DOWNLOAD_URL} == https* ]]; then
        log "Downloading pkg using wget ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
        wget -nv ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME} -O ${PKG_NAME}
      else
        log "download pkg path is not yet supported ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
      fi
    }

    function install() {
      log "Extracting ${PRIVACERA_WORK_DIR}/${PKG_NAME} into ${PRIVACERA_WORK_DIR}"
      tar xzf ${PRIVACERA_WORK_DIR}/${PKG_NAME} --directory ${PRIVACERA_WORK_DIR}
      # Put the Privacera jars into the Spark home
      cp -r ${PRIVACERA_WORK_DIR}/spark-plugin/* $SPARK_HOME/jars/
      if [ ! -f $SPARK_DEFAULT_CONF ]; then
        cp $SPARK_DEFAULT_CONF.template $SPARK_DEFAULT_CONF
      fi
    }

    function update_jars() {
      # Find the Java major version
      JAVA_MAJOR_VERSION=$(java -version 2>&1 | sed -E -n 's/.* version "([^.-]*).*"/\1/p' | cut -d' ' -f1)
      echo "Java major version: ${JAVA_MAJOR_VERSION}"
      if [ ${JAVA_MAJOR_VERSION} -ge 15 ]; then
        cp -r ${PRIVACERA_WORK_DIR}/jdk15-jars/* $SPARK_HOME/jars/ranger-spark-plugin-impl/
      fi
    }

    function configure() {
      log "Configure started"
      # NOTE: Equivalent of this will be done in the entrypoint script
      PRIVACERA_CONF_WORK_DIR="${PRIVACERA_WORK_DIR}/spark-conf"
      cp ${PRIVACERA_CONF_WORK_DIR}/resource_type_plugin_map.json ${PRIVACERA_CONF_FOLDER}/

      # Set the Spark agent for the driver
      if [ `grep -rin "spark.driver.extraJavaOptions" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ];then
        sed -i '' -e 's|spark.driver.extraJavaOptions .*|spark.driver.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
      else
        echo "spark.driver.extraJavaOptions -javaagent:${SPARK_HOME}/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties" >> $SPARK_DEFAULT_CONF
      fi

      if [[ "${SPARK_PLUGIN_TYPE}" == *"FGAC"* ]]; then
        log "Spark python/sql plugin "
        if [ `grep -rin "spark.sql.extensions" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ];then
          sed -i '' -e 's|spark.sql.extensions.*|spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension|g' $SPARK_DEFAULT_CONF
        else
          echo "#Setting Privacera spark-plugin properties" >> $SPARK_DEFAULT_CONF
          echo "spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension" >> $SPARK_DEFAULT_CONF
        fi
      elif [[ "${SPARK_PLUGIN_TYPE}" == *"OLAC"* ]]; then
        echo "Setting agent for executor"
        # Set the Spark agent for the executor
        if [ `grep -rin "# spark.executor.extraJavaOptions" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ];then
          sed -i '' -e 's|# spark.executor.extraJavaOptions .*|spark.executor.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
        elif [ `grep -rin "spark.executor.extraJavaOptions" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ];then
          sed -i '' -e 's|spark.executor.extraJavaOptions .*|spark.executor.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
        else
          echo "spark.executor.extraJavaOptions -javaagent:${SPARK_HOME}/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties " >> $SPARK_DEFAULT_CONF
        fi
      fi

      ## Add or append 'spark.sql.hive.metastore.sharedPrefixes'
      if [ `grep -rin "^spark.sql.hive.metastore.sharedPrefixes" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ];then
        sharedPrefixes=`grep -ri "^spark.sql.hive.metastore.sharedPrefixes" $SPARK_DEFAULT_CONF`
        sharedPrefixes="${sharedPrefixes#spark.sql.hive.metastore.sharedPrefixes}"
        # trim leading whitespaces
        ### sharedPrefixes="${sharedPrefixes##*( )}"
        # trim trailing whitespaces
        ## sharedPrefixes="${sharedPrefixes%%*( )}"
        sed -i -e "s|^spark.sql.hive.metastore.sharedPrefixes .*|#spark.sql.hive.metastore.sharedPrefixes ${sharedPrefixes}|g" $SPARK_DEFAULT_CONF
        echo "spark.sql.hive.metastore.sharedPrefixes ${sharedPrefixes},com.privacera,com.amazonaws" >> $SPARK_DEFAULT_CONF
      else
        echo "spark.sql.hive.metastore.sharedPrefixes com.privacera,com.amazonaws" >> $SPARK_DEFAULT_CONF
      fi
      log "Configure completed"
    }

    function setup_deltalake() {
      # Configure Delta Lake support
      log "setup_deltalake started"
      if [ "${OSS_DELTA_LAKE_ENABLE}" == true ]; then
        if [ "${SPARK_VERSION}" = "3.4.0" ] || [ "${SPARK_VERSION}" = "3.4.1" ]; then
          wget https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.4.0/delta-core_2.12-2.4.0.jar -P "${SPARK_HOME}/jars/"
          wget https://repo1.maven.org/maven2/io/delta/delta-storage/2.4.0/delta-storage-2.4.0.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.0" ]; then
          wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.0.0/delta-spark_2.12-3.0.0.jar -P "${SPARK_HOME}/jars/"
          wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.0.0/delta-storage-3.0.0.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.1" ]; then
          wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.1.0/delta-spark_2.12-3.1.0.jar -P "${SPARK_HOME}/jars/"
          wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.1.0/delta-storage-3.1.0.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.3" ]; then
          wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.2.1/delta-spark_2.12-3.2.1.jar -P "${SPARK_HOME}/jars/"
          wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.2.1/delta-storage-3.2.1.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.4" ]; then
          wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.3.0/delta-spark_2.12-3.3.0.jar -P "${SPARK_HOME}/jars/"
          wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.3.0/delta-storage-3.3.0.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.5" ]; then
          wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.3.1/delta-spark_2.12-3.3.1.jar -P "${SPARK_HOME}/jars/"
          wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.3.1/delta-storage-3.3.1.jar -P "${SPARK_HOME}/jars/"
        else
          cp -r "${PRIVACERA_WORK_DIR}/oss-delta-jars/"* "${SPARK_HOME}/jars/"
        fi

        # Update spark-defaults.conf
        if [ $(grep -rin "spark.sql.catalog.spark_catalog" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
          sed -i '' -e 's|spark.sql.catalog.spark_catalog.*|spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog|g' "$SPARK_DEFAULT_CONF"
        else
          echo "spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog" >> "$SPARK_DEFAULT_CONF"
        fi

        if [[ "${SPARK_PLUGIN_TYPE}" == *"FGAC"* ]]; then
          if [ $(grep -rin "spark.sql.extensions" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
            sed -i "s|spark.sql.extensions.*|spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension,io.delta.sql.DeltaSparkSessionExtension|g" "$SPARK_DEFAULT_CONF"
          else
            echo "spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension,io.delta.sql.DeltaSparkSessionExtension" >> "$SPARK_DEFAULT_CONF"
          fi
        else
          if [ $(grep -rin "spark.sql.extensions" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
            sed -i "s|spark.sql.extensions.*|spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension|g" "$SPARK_DEFAULT_CONF"
          else
            echo "spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension" >> "$SPARK_DEFAULT_CONF"
          fi
        fi
      fi
    }

    function verify() {
      log "Privacera Spark setup completed"
    }

    createFolders
    validation
    download_pkg
    install
    update_jars
    configure
    setup_deltalake
    verify
    ```
-  Make the `privacera_setup.sh` and `standalone_spark_setup.sh` script files executable.
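    For example:

    ```bash
    chmod +x privacera_setup.sh standalone_spark_setup.sh
    ```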
### Create Configuration Files
-  Create a `config` folder and navigate to it.
-  Copy the `privacera_spark.properties` and the required configuration files to the `config` folder. For OLAC, only the `privacera_spark.properties` file (from the `OLAC` folder) and `global-truststore.p12` are required; for OLAC_FGAC, copy the following files:

    ```bash
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/OLAC_FGAC/privacera_spark.properties ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/auditserver-secrets-keystore.jks ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/ranger-plugin-keystore.p12 ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/ranger.jceks ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/global-truststore.p12 ~/privacera-oss-plugin/config/
    ```

-  Navigate to the `config` folder and update the `privacera_spark.properties` file as follows:

    ```bash
    cd ~/privacera-oss-plugin/config/
    vi privacera_spark.properties
    ```

    ```properties
    privacera.signer.truststore=/opt/privacera/global-truststore.p12
    xasecure.audit.keystore.path=/opt/privacera/auditserver-secrets-keystore.jks
    xasecure.policymgr.clientssl.keystore=/opt/privacera/ranger-plugin-keystore.p12
    xasecure.policymgr.clientssl.keystore.credential.file=jceks://file//opt/privacera/ranger.jceks
    xasecure.policymgr.clientssl.truststore=/opt/privacera/global-truststore.p12
    xasecure.policymgr.clientssl.truststore.credential.file=jceks://file//opt/privacera/ranger.jceks
    ```
-  Remove the following property from the `privacera_spark.properties` file:
-  Create a `log4j2.properties` file and copy the following content to it.
## Generate Privacera Deployment File
Note
- To enable Delta Lake support, refer to Enable delta lake.
- To enable MinIO support, refer to MinIO configuration.
- To enable static JWT public key support, refer to Configure static JWT.
- Execute the `build_privacera_plugin.sh` script to generate the Privacera deployment tarball file.
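    A typical run, assuming the folder layout above:

    ```bash
    cd ~/privacera-oss-plugin
    ./build_privacera_plugin.sh
    ```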
## Building the Docker Image

### Create Dockerfile
-  Create a `Dockerfile` and copy the following content to it.

    Note

    - The `Dockerfile` provided here is an example that uses the open-source Apache Spark images. Alternatively, you can use your own custom Spark Dockerfile and integrate the steps below to deploy the Privacera Plugin into it.
- To achieve that, add the following commands to your existing `Dockerfile`:

    ```dockerfile
    ARG PRIVACERA_SPARK_PLUGIN_TAR_GZ=./privacera-spark-plugin.tar.gz

    # Add the privacera-spark-plugin tar.gz to the image at the root
    ADD ${PRIVACERA_SPARK_PLUGIN_TAR_GZ} /

    RUN mkdir -p ${PRIVACERA_HOME}
    RUN chown -R ${USER_NAME}:${GROUP_NAME} ${PRIVACERA_HOME}

    # Symlinks for common files
    RUN ln -sf /privacera-secret/privacera_spark.properties /opt/spark/conf/privacera_spark.properties && \
        ln -sf /privacera-secret/global-truststore.p12 /opt/privacera/global-truststore.p12 && \
        ln -sf /privacera-conf/log4j2.properties /opt/spark/conf/log4j2.properties

    # Symlinks for OLAC_FGAC
    RUN if [ "$PRIVACERA_SPARK_PLUGIN_TYPE" = "OLAC_FGAC" ]; then \
            ln -sf /privacera-secret/auditserver-secrets-keystore.jks /opt/privacera/auditserver-secrets-keystore.jks && \
            ln -sf /privacera-secret/ranger-plugin-keystore.p12 /opt/privacera/ranger-plugin-keystore.p12 && \
            ln -sf /privacera-secret/ranger.jceks /opt/privacera/ranger.jceks ; \
        fi
    ```
### Build the Docker Image
-  Run the following command to create and edit the `build_spark_image.sh` file. Expand the following section and copy its content to the `build_spark_image.sh` file.
-  Execute the `build_spark_image.sh` script to generate the Privacera deployment files.
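    A typical run, assuming the script lives alongside `build_privacera_plugin.sh`:

    ```bash
    cd ~/privacera-oss-plugin
    ./build_spark_image.sh
    ```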
### Push the Docker Image to the Remote HUB
- Use your internal HUB to publish the image.
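    A hedged example; the registry host and image name are hypothetical placeholders:

    ```bash
    # Tag the locally built image for your internal registry and push it
    docker tag privacera-spark-plugin:latest registry.example.com/privacera-spark-plugin:latest
    docker push registry.example.com/privacera-spark-plugin:latest
    ```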
## Kubernetes Deployment

### Create Kubernetes YAML Template Files
-  Create a `k8s/templates` folder and navigate to it.
-  Create a `namespace.yml` file and copy the following content to it.
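    A minimal sketch of what such a namespace template typically contains; the placeholder token is hypothetical and would be substituted by `replace.sh`:

    ```yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: PRIVACERA_SPARK_NAMESPACE   # hypothetical placeholder replaced by replace.sh
    ```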
-  Create a `role-binding.yml` file and copy the following content to it.
-  Create a `role.yml` file and copy the following content to it.
-  Create a `service-account.yml` file and copy the following content to it.
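    Likewise, a minimal sketch with hypothetical placeholder names:

    ```yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: PRIVACERA_SPARK_SERVICE_ACCOUNT   # hypothetical placeholder replaced by replace.sh
      namespace: PRIVACERA_SPARK_NAMESPACE
    ```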
-  Create a `privacera-spark-examples.yml` file and copy the following content to it.
### Create Scripts to Generate Kubernetes Deployment Files from Templates
-  Create a `replace.sh` script file and copy the following content to it.
-  Create an `apply.sh` script file and copy the following content to it.
### Apply the Kubernetes Deployment Files
-  Make the `apply.sh` and `replace.sh` script files executable.
-  Execute the `replace.sh` script to replace the variables in the `k8s/templates` folder.
-  Execute the `apply.sh` script to create the necessary Kubernetes secrets and apply the Spark deployment configs.
## Validate the Deployment
-  Check the status of the pods.
-  Access the pod (see the sketch after this list).
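    A minimal validation sketch; the namespace and pod names are placeholders, so use the values from your deployment:

    ```bash
    # Check that the Spark pods are running
    kubectl get pods -n privacera-spark

    # Open a shell inside a running Spark pod
    kubectl exec -it <spark-pod-name> -n privacera-spark -- /bin/bash
    ```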