Setup for Access Management for Apache Spark¶
Configure¶
This section outlines the steps to set up the Apache Spark OLAC with the Privacera Plugin. Please ensure that all prerequisites are completed before beginning the setup process.
Perform the following steps to configure the Apache Spark OLAC connector:
- SSH into the instance where Privacera Manager is installed.
- Run the following command to navigate to the `config` directory and copy the yml files:
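For example, assuming the standard Privacera Manager layout in which sample files live under `config/sample-vars` and customizations go into `config/custom-vars` (verify the paths in your installation):

```bash
# Assumption: Privacera Manager is installed under ~/privacera/privacera-manager
cd ~/privacera/privacera-manager/config

# Copy the sample Spark standalone vars file into custom-vars so it can be edited
cp sample-vars/vars.spark-standalone.yml custom-vars/
```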
- Modify the following properties:

    In the `vars.spark-standalone.yml` file, update the following properties with the appropriate values:

    | Variable | Definition | Example |
    | --- | --- | --- |
    | `SPARK_ENV_TYPE` | Set the environment type. | `privacera_spark_olac` |
    | `SPARK_HOME` | Home path of your Spark installation. | `/opt/spark` |
    | `SPARK_USER_HOME` | User home directory of your Spark installation. | `/opt/privacera` |
- Once the properties are configured, update your Privacera Manager platform instance by running the Privacera Manager update commands (see the example below).
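A hedged example of the usual update flow; the exact commands for your release are in the Privacera Manager documentation:

```bash
# Assumption: Privacera Manager is installed under ~/privacera/privacera-manager
cd ~/privacera/privacera-manager

# Regenerate the configuration and apply it to the platform instance
./privacera-manager.sh update
```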
- Once the post-install process is complete, you will see a `spark-standalone` folder in the `~/privacera/privacera-manager/output` directory, with the following folder structure:
Setup scripts and configuration files¶
Creating script files¶
- Create a `privacera-oss-plugin` folder and navigate to it.
- Create a `penv.sh` file and copy the following content to it.

    Note

    To get the `PRIVACERA_BASE_DOWNLOAD_URL`, run the following command where Privacera is installed. Update this download URL in the `penv.sh` script file.

    Bash penv.sh
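A hedged sketch of what `penv.sh` typically exports, based on the environment variables consumed by `standalone_spark_setup.sh` later on this page; all values are placeholders, so use the content provided for your installation:

```bash
#!/bin/bash
# Illustrative sketch only -- replace every value with the ones for your environment.
export PRIVACERA_BASE_DOWNLOAD_URL="<PRIVACERA_BASE_DOWNLOAD_URL from your Privacera installation>"
export SPARK_VERSION="3.5.1"           # Spark version you are packaging
export SPARK_HOME="/opt/spark"         # must match vars.spark-standalone.yml
export SPARK_PLUGIN_TYPE="OLAC"        # OLAC or FGAC
export OSS_DELTA_LAKE_ENABLE="false"   # set to true to enable Delta Lake support
export ENV_TYPE="PLATFORM"             # PLATFORM or PCLOUD
```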
- Run the following command to create and edit the `build_privacera_plugin.sh` file. Expand the following section and copy its content to the `build_privacera_plugin.sh` file.

    Bash build_privacera_plugin.sh
- Create a `scripts` folder and navigate to it.
- Create a `privacera_setup.sh` script file and copy the following content to it.

    Bash privacera_setup.sh
- Create a `standalone_spark_setup.sh` script file and copy the following content to it.
```bash
#!/bin/bash
set -x

# No need to change
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
PRIVACERA_WORK_DIR=${PRIVACERA_WORK_DIR:-${SCRIPT_DIR}/work}
PRIVACERA_OUT_FILE=${PRIVACERA_OUT_FILE:-${PRIVACERA_WORK_DIR}/privacera.out}
PKG_NAME=${PKG_NAME:-privacera-spark-plugin.tar.gz}
PRIVACERA_CONF_FOLDER=${PRIVACERA_CONF_FOLDER:-${SPARK_HOME}/conf}
PRIVACERA_CUSTOM_CONF_ZIP=${PRIVACERA_CUSTOM_CONF_ZIP:-spark_custom_conf.zip}
SPARK_DEFAULT_CONF=${SPARK_HOME}/conf/spark-defaults.conf
SPARK_PLUGIN_TYPE=${SPARK_PLUGIN_TYPE:-"OLAC"}
OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE:-false}
ENV_TYPE=${ENV_TYPE:-"PLATFORM"}
SPARK_VERSION=${SPARK_VERSION}

if [[ $ENV_TYPE != "PCLOUD" ]]; then
    # For platform we have to append, as we are using the same path for both
    export PRIVACERA_BASE_DOWNLOAD_URL=${PRIVACERA_BASE_DOWNLOAD_URL}/spark-plugin
fi

rm -rf ${PRIVACERA_OUT_FILE}

function log(){
    msg=$1
    currentTime=`date`
    echo "${currentTime} : ${msg} " >> ${PRIVACERA_OUT_FILE}
}

function createFolders() {
    # Delete existing jars and the work folder so an update does not create issues, as the jar name differs in every release
    rm -rf ${SPARK_HOME}/jars/ranger-*
    rm -rf ${SPARK_HOME}/jars/privacera*
    rm -rf ${PRIVACERA_WORK_DIR}
    mkdir -p ${PRIVACERA_WORK_DIR}
}

function validation(){
    if [[ $SPARK_HOME == "" ]]; then
        log " SPARK_HOME can't be empty "
        exit 1
    fi
    log "SPARK_HOME = ${SPARK_HOME}"
}

download_pkg(){
    log "Downloading pkg from ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
    cd ${PRIVACERA_WORK_DIR}
    if [[ ${PRIVACERA_BASE_DOWNLOAD_URL} == https* ]]; then
        log "Downloading pkg using wget ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
        wget -nv ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME} -O ${PKG_NAME}
    else
        log "download pkg path is not yet supported ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
    fi
}

function install() {
    log "Extracting ${PRIVACERA_WORK_DIR}/${PKG_NAME} into ${PRIVACERA_WORK_DIR}"
    tar xzf ${PRIVACERA_WORK_DIR}/${PKG_NAME} --directory ${PRIVACERA_WORK_DIR}
    # Put the Privacera jars into the Spark home
    cp -r ${PRIVACERA_WORK_DIR}/spark-plugin/* $SPARK_HOME/jars/
    if [ ! -f $SPARK_DEFAULT_CONF ]; then
        cp $SPARK_DEFAULT_CONF.template $SPARK_DEFAULT_CONF
    fi
}

function update_jars() {
    # Find the Java major version
    JAVA_MAJOR_VERSION=$(java -version 2>&1 | sed -E -n 's/.* version "([^.-]*).*"/\1/p' | cut -d' ' -f1)
    echo "Java major version: ${JAVA_MAJOR_VERSION}"
    if [ ${JAVA_MAJOR_VERSION} -ge 15 ]; then
        cp -r ${PRIVACERA_WORK_DIR}/jdk15-jars/* $SPARK_HOME/jars/ranger-spark-plugin-impl/
    fi
}

function configure() {
    log "Configure started"
    # NOTE: Equivalent of this will be done in the entrypoint script
    PRIVACERA_CONF_WORK_DIR="${PRIVACERA_WORK_DIR}/spark-conf"
    cp ${PRIVACERA_CONF_WORK_DIR}/resource_type_plugin_map.json ${PRIVACERA_CONF_FOLDER}/

    # Set the Spark agent for the driver
    if [ `grep -rin "spark.driver.extraJavaOptions" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ]; then
        sed -i '' -e 's|spark.driver.extraJavaOptions .*|spark.driver.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
    else
        echo "spark.driver.extraJavaOptions -javaagent:${SPARK_HOME}/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties" >> $SPARK_DEFAULT_CONF
    fi

    if [ ${SPARK_PLUGIN_TYPE} == "FGAC" ]; then
        log "Spark python/sql plugin "
        if [ `grep -rin "spark.sql.extensions" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ]; then
            sed -i '' -e 's|spark.sql.extensions.*|spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension|g' $SPARK_DEFAULT_CONF
        else
            echo "#Setting Privacera spark-plugin properties" >> $SPARK_DEFAULT_CONF
            echo "spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension" >> $SPARK_DEFAULT_CONF
        fi
    elif [ ${SPARK_PLUGIN_TYPE} == "OLAC" ]; then
        echo "Setting agent for executor"
        # Set the Spark agent for the executor
        if [ `grep -rin "# spark.executor.extraJavaOptions" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ]; then
            sed -i '' -e 's|# spark.executor.extraJavaOptions .*|spark.executor.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
        elif [ `grep -rin "spark.executor.extraJavaOptions" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ]; then
            sed -i '' -e 's|spark.executor.extraJavaOptions .*|spark.executor.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
        else
            echo "spark.executor.extraJavaOptions -javaagent:${SPARK_HOME}/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties " >> $SPARK_DEFAULT_CONF
        fi
    fi

    ## Add or append 'spark.sql.hive.metastore.sharedPrefixes'
    if [ `grep -rin "^spark.sql.hive.metastore.sharedPrefixes" $SPARK_DEFAULT_CONF | wc -l` -gt 0 ]; then
        sharedPrefixes=`grep -ri "^spark.sql.hive.metastore.sharedPrefixes" $SPARK_DEFAULT_CONF`
        sharedPrefixes="${sharedPrefixes#spark.sql.hive.metastore.sharedPrefixes}"
        # trim leading whitespaces
        ### sharedPrefixes="${sharedPrefixes##*( )}"
        # trim trailing whitespaces
        ## sharedPrefixes="${sharedPrefixes%%*( )}"
        sed -i -e "s|^spark.sql.hive.metastore.sharedPrefixes .*|#spark.sql.hive.metastore.sharedPrefixes ${sharedPrefixes}|g" $SPARK_DEFAULT_CONF
        echo "spark.sql.hive.metastore.sharedPrefixes ${sharedPrefixes},com.privacera,com.amazonaws" >> $SPARK_DEFAULT_CONF
    else
        echo "spark.sql.hive.metastore.sharedPrefixes com.privacera,com.amazonaws" >> $SPARK_DEFAULT_CONF
    fi
    log "Configure completed"
}

function setup_deltalake() {
    # Configure Delta Lake support
    log "setup_deltalake started"
    if [ "${OSS_DELTA_LAKE_ENABLE}" == true ]; then
        if [ "${SPARK_VERSION}" = "3.4.0" ] || [ "${SPARK_VERSION}" = "3.4.1" ]; then
            wget https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.4.0/delta-core_2.12-2.4.0.jar -P "${SPARK_HOME}/jars/"
            wget https://repo1.maven.org/maven2/io/delta/delta-storage/2.4.0/delta-storage-2.4.0.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.0" ]; then
            wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.0.0/delta-spark_2.12-3.0.0.jar -P "${SPARK_HOME}/jars/"
            wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.0.0/delta-storage-3.0.0.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.1" ]; then
            wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.1.0/delta-spark_2.12-3.1.0.jar -P "${SPARK_HOME}/jars/"
            wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.1.0/delta-storage-3.1.0.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.3" ]; then
            wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.2.1/delta-spark_2.12-3.2.1.jar -P "${SPARK_HOME}/jars/"
            wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.2.1/delta-storage-3.2.1.jar -P "${SPARK_HOME}/jars/"
        elif [ "${SPARK_VERSION}" = "3.5.4" ]; then
            wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.3.0/delta-spark_2.12-3.3.0.jar -P "${SPARK_HOME}/jars/"
            wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.3.0/delta-storage-3.3.0.jar -P "${SPARK_HOME}/jars/"
        else
            cp -r "${PRIVACERA_WORK_DIR}/oss-delta-jars/"* "${SPARK_HOME}/jars/"
        fi

        # Update spark-defaults.conf
        if [ $(grep -rin "spark.sql.catalog.spark_catalog" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
            sed -i '' -e 's|spark.sql.catalog.spark_catalog.*|spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog|g' "$SPARK_DEFAULT_CONF"
        else
            echo "spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog" >> "$SPARK_DEFAULT_CONF"
        fi

        if [ "${SPARK_PLUGIN_TYPE}" == "FGAC" ]; then
            if [ $(grep -rin "spark.sql.extensions" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
                sed -i "s|spark.sql.extensions.*|spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension,io.delta.sql.DeltaSparkSessionExtension|g" "$SPARK_DEFAULT_CONF"
            else
                echo "spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension,io.delta.sql.DeltaSparkSessionExtension" >> "$SPARK_DEFAULT_CONF"
            fi
        else
            if [ $(grep -rin "spark.sql.extensions" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
                sed -i "s|spark.sql.extensions.*|spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension|g" "$SPARK_DEFAULT_CONF"
            else
                echo "spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension" >> "$SPARK_DEFAULT_CONF"
            fi
        fi
    fi
}

function verify() {
    log "Privacera Spark setup completed"
}

createFolders
validation
download_pkg
install
update_jars
configure
setup_deltalake
verify
```
- Make the `privacera_setup.sh` and `standalone_spark_setup.sh` script files executable.
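For example, from the `scripts` folder:

```bash
chmod +x privacera_setup.sh standalone_spark_setup.sh
```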
Create configuration files¶
- Create a `config` folder and navigate to it.
- Copy the `privacera_spark.properties` and `global-truststore.p12` files to the `config` folder.
- Navigate to the `config` folder and update the `privacera_spark.properties` file as follows:

    Bash

    - Remove the following property from the `privacera_spark.properties` file:

        Bash
- Create a `log4j2.properties` file and copy the following content to it.

    Bash log4j2.properties
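If you need a starting point, the sketch below only illustrates the general shape of a Log4j2 properties file (a console appender at WARN level); prefer the `log4j2.properties` content provided by Privacera:

```bash
# Hypothetical starting point only -- replace with the content provided by Privacera.
cat > log4j2.properties <<'EOF'
status = warn
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
rootLogger.level = warn
rootLogger.appenderRef.console.ref = console
EOF
```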
Generate Privacera deployment file¶
Note

- To enable Delta Lake support, please refer to Enable delta lake.
- To enable MinIO support, please refer to MinIO configuration.

- Execute the `build_privacera_plugin.sh` script to generate the Privacera deployment tarball file:
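For example, from the `privacera-oss-plugin` folder:

```bash
bash build_privacera_plugin.sh
```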
Building the Docker image¶
Create Dockerfile¶
- Create a `Dockerfile` and copy the following content to it.

    Note

    - The `Dockerfile` provided here is an example that uses the open-source Apache Spark images. Instead, you can use your own custom Spark Dockerfile and integrate the steps below to deploy the Privacera Plugin into it.
    - To do that, add the commands below to your existing Dockerfile.

    Dockerfile
Build the Docker image¶
- Run the following command to create and edit the `build_spark_image.sh` file. Expand the following section and copy its content to the `build_spark_image.sh` file.

    build_spark_image.sh

- Execute the `build_spark_image.sh` script to build the Docker image:
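For example, from the folder containing the script:

```bash
bash build_spark_image.sh
```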
Push the Docker image to the remote HUB¶
- Use your internal HUB (container registry) to publish the image, for example:
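A hedged example using the standard Docker CLI; the registry and image names below are hypothetical placeholders:

```bash
# Substitute your own registry and image name/tag
docker tag privacera-spark-plugin:latest my-registry.example.com/privacera/privacera-spark-plugin:latest
docker push my-registry.example.com/privacera/privacera-spark-plugin:latest
```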
Kubernetes Deployment¶
Create Kubernetes YAML template files¶
- Create a `k8s/templates` folder and navigate to it.
- Create a `namespace.yml` file and copy the following content to it.

    Bash

- Create a `role-binding.yml` file and copy the following content to it.

    Bash role-binding.yml

- Create a `role.yml` file and copy the following content to it.

    Bash role.yml

- Create a `service-account.yml` file and copy the following content to it.

    Bash

- Create a `privacera-spark-examples.yml` file and copy the following content to it.

    Bash privacera-spark-examples.yml
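As an illustration of the smallest of these templates, a `namespace.yml` generally looks like the sketch below; the namespace value is a hypothetical placeholder for whatever variable your templates use:

```bash
# Hypothetical sketch of namespace.yml -- the namespace name is a placeholder
cat > namespace.yml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: ${SPARK_NAMESPACE}
EOF
```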
Create Scripts to generate Kubernetes deployment files from templates¶
- Create a `replace.sh` script file and copy the following content to it.

    replace.sh

- Create an `apply.sh` script file and copy the following content to it.

    Bash apply.sh
Apply the Kubernetes deployment files¶
- Make the `apply.sh` and `replace.sh` script files executable.
- Execute the `replace.sh` script to replace the variables in the `k8s/templates` folder.
- Execute the `apply.sh` script to create the necessary Kubernetes secrets and apply the Spark deployment configs.
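A minimal sketch of these three steps, assuming you run them from the folder that contains the scripts:

```bash
chmod +x replace.sh apply.sh   # make the scripts executable
./replace.sh                   # substitute the variables in k8s/templates
./apply.sh                     # create the Kubernetes secrets and apply the Spark deployment configs
```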
Validate the Deployment¶
- Check the status of the pods.
- Access the pod.
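A hedged example using standard `kubectl` commands; the namespace and pod name are placeholders from your deployment:

```bash
# Check the status of the pods (namespace is a placeholder)
kubectl get pods -n <spark-namespace>

# Open a shell inside the Spark pod (pod name is a placeholder)
kubectl exec -it <spark-pod-name> -n <spark-namespace> -- bash
```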