
Setup for Access Management for Apache Spark

Configure

This section outlines the steps to set up the Apache Spark OLAC (object-level access control) connector with the Privacera Plugin. Ensure that all prerequisites are completed before beginning the setup process.

Perform the following steps to configure the Apache Spark OLAC connector:

  1. SSH into the instance where Privacera Manager is installed.

  2. Run the following commands to navigate to the config directory and copy the sample vars file into custom-vars:

    Bash
    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.spark-standalone.yml custom-vars/
    

  3. Modify the following properties:

    • In the vars.spark-standalone.yml file, update the following properties with the appropriate values:

      Bash
      #Spark Env name.
      SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
      
      #Add the spark home path
      #Eg: "/opt/spark"
      SPARK_HOME: "<PLEASE_CHANGE>"
      
      #Spark user home directory
      #Eg: "/opt/privacera"
      SPARK_USER_HOME: "<PLEASE_CHANGE>"
      

      Variable        | Definition                                      | Example
      ----------------|-------------------------------------------------|---------------------
      SPARK_ENV_TYPE  | Set the environment type.                       | privacera_spark_olac
      SPARK_HOME      | Home path of your Spark installation.           | /opt/spark
      SPARK_USER_HOME | User home directory of your Spark installation. | /opt/privacera
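
      For example, a completed vars.spark-standalone.yml using the values above might look like the following (adjust the paths to match your installation):

      Bash
      SPARK_ENV_TYPE: "privacera_spark_olac"
      SPARK_HOME: "/opt/spark"
      SPARK_USER_HOME: "/opt/privacera"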
  4. Once the properties are configured, update your Privacera Manager platform instance by running the following commands:

    Bash
    cd ~/privacera/privacera-manager
    ./privacera-manager.sh post-install
    
  5. Once the post-install process is complete, you will see a spark-standalone folder in the ~/privacera/privacera-manager/output directory with the following structure:

    Bash
    output/
    └── spark-standalone/
        ├── spark_custom_conf/
        │   ├── FGAC/
        │   │   ├── README.md
        │   │   ├── jwttoken.pub
        │   │   └── privacera_spark.properties
        │   ├── OLAC/
        │   │   ├── privacera_spark.properties
        │   │   └── README.md
        │   ├── OLAC_FGAC/
        │   │   ├── README.md
        │   │   ├── jwttoken.pub
        │   │   └── privacera_spark.properties
        │   ├── auditserver-secrets-keystore.jks
        │   ├── global-truststore.p12
        │   ├── ranger-plugin-keystore.p12
        │   └── ranger.jceks
        ├── privacera_setup.sh
        ├── spark_custom_conf.zip
        ├── standalone_spark_FGAC.sh
        ├── standalone_spark_OLAC.sh
        └── standalone_spark_OLAC_FGAC.sh
    

Setup scripts and configuration files

Creating script files

  1. Create a privacera-oss-plugin folder and navigate to it

    Bash
    mkdir ~/privacera-oss-plugin
    cd ~/privacera-oss-plugin
    

  2. Create a penv.sh file and copy the following content to it

    Bash
    cd ~/privacera-oss-plugin
    vi penv.sh
    

    Note

    To get the PRIVACERA_BASE_DOWNLOAD_URL, run the following command on the instance where Privacera Manager is installed, then update the download URL in the penv.sh script file:

    Bash
    grep -i 'PRIVACERA_BASE_DOWNLOAD_URL=' ~/privacera/privacera-manager/output/spark-standalone/standalone_spark_OLAC.sh
    

    penv.sh
    Bash
    #!/bin/bash
    
    ## Set up environment variables for Privacera Spark plugin deployment
    
    
    # Privacera configuration
    export PRIVACERA_BASE_DOWNLOAD_URL="<PRIVACERA_BASE_DOWNLOAD_URL>"
    
    # Uncomment this to enable delta lake support
    #export OSS_DELTA_LAKE_ENABLE="true"
    
    
    # Spark configuration
    # Spark Docker image (e.g. spark:3.5.3-python3)
    export SPARK_BASE_IMAGE="<SPARK_BASE_IMAGE>"
    # Spark version (e.g., 3.5.3)
    export SPARK_VERSION="<SPARK_VERSION>"
    
    
    # Docker image configuration
    export SPARK_PLUGIN_IMAGE="<SPARK_PLUGIN_IMAGE>"
    export SPARK_DOCKER_PULL_SECRET="<SPARK_DOCKER_PULL_SECRET>"
    export TAG="<DOCKER_IMAGE_TAG>"
    
    
    ## K8S configuration
    export SPARK_NAME_SPACE="<SPARK_NAME_SPACE>"
    export PRIVACERA_SECRET_NAME="privacera-spark-secret"
    export PRIVACERA_CONFIGMAP_NAME="privacera-spark-configmap"
    
    export SPARK_PLUGIN_ROLE_BINDING="privacera-sa-spark-plugin-role-binding"
    export SPARK_PLUGIN_SERVICE_ACCOUNT="privacera-sa-spark-plugin"
    export SPARK_PLUGIN_ROLE="privacera-sa-spark-plugin-role"
    export SPARK_PLUGIN_APP_NAME="privacera-spark-examples"
    
    
    # Input and output directories for k8s templates
    export INPUT_DIR="./templates"
    export OUTPUT_DIR="./output"
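
    For reference, a filled-in penv.sh might look like the following; every value shown is a placeholder for illustration (use the URL returned by the grep command above and your own image, tag, and namespace details):

    Bash
    export PRIVACERA_BASE_DOWNLOAD_URL="https://privacera.example.com/downloads"  # placeholder; use your actual URL
    export SPARK_BASE_IMAGE="spark:3.5.3-python3"
    export SPARK_VERSION="3.5.3"
    export SPARK_PLUGIN_IMAGE="hub.docker.us/spark-plugin:latest"                 # placeholder registry/repo
    export SPARK_DOCKER_PULL_SECRET="docker-hub"                                  # placeholder pull secret name
    export TAG="privacera-spark-plugin:latest"                                    # placeholder local build tag
    export SPARK_NAME_SPACE="privacera-spark-plugin-test"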
    
  3. Run the following command to create and edit the build_privacera_plugin.sh file. Expand the following section and copy its content into the build_privacera_plugin.sh file

    Bash
    vi build_privacera_plugin.sh
    

    build_privacera_plugin.sh
    Bash
    #!/bin/bash
    
    set -x
    
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    ENV_FILE="${SCRIPT_DIR}/penv.sh"
    
    # if penv.sh exists, load it
    if [ -f ${ENV_FILE} ]; then
      echo "Sourcing env file ${ENV_FILE}"
      source ${ENV_FILE}
    else
      echo "${ENV_FILE} file not found"
      exit 1
    fi
    
    SPARK_HOME=${SCRIPT_DIR}/spark_home/opt/spark
    PRIVACERA_HOME=${SCRIPT_DIR}/privacera_home
    PRIVACERA_WORK_DIR=${PRIVACERA_HOME}/work
    
    PRIVACERA_SPARK_PLUGIN_TYPE=OLAC
    ENV_TYPE=PLATFORM
    SPARK_DEFAULT_CONF=${SPARK_HOME}/conf/spark-defaults.conf
    USER_NAME=spark
    GROUP_NAME=spark
    
    export PRIVACERA_HOME=${PRIVACERA_HOME}
    export PRIVACERA_WORK_DIR=${PRIVACERA_WORK_DIR}
    export PRIVACERA_DOWNLOAD=$PRIVACERA_HOME/downloads
    export SPARK_DEFAULT_CONF=${SPARK_DEFAULT_CONF}
    export PRIVACERA_SPARK_PLUGIN_TYPE=${PRIVACERA_SPARK_PLUGIN_TYPE}
    export PRIVACERA_BASE_DOWNLOAD_URL=${PRIVACERA_BASE_DOWNLOAD_URL}
    export ENV_TYPE=${ENV_TYPE}
    export OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE}
    export SPARK_VERSION=${SPARK_VERSION}
    export PRIVACERA_SETUP_SCRIPT="privacera_setup.sh"
    export SPARK_USER=${USER_NAME}
    export SPARK_GROUP=${GROUP_NAME}
    export SPARK_HOME=${SPARK_HOME}
    
    rm -rf ${PRIVACERA_HOME}
    rm -rf ${SPARK_HOME}
    
    mkdir -p ${PRIVACERA_HOME}
    mkdir -p ${SPARK_HOME}
    mkdir -p ${SPARK_HOME}/jars
    mkdir -p ${SPARK_HOME}/conf
    
    touch ${SPARK_HOME}/conf/spark-defaults.conf
    
    # run the actual script that downloads and installs the plugin
    bash scripts/privacera_setup.sh
    
    cp ${PRIVACERA_HOME}/work/privacera_version.txt ${SPARK_HOME}/conf
    sed -i "s|${SPARK_HOME}|/opt/spark|g" ${SPARK_HOME}/conf/spark-defaults.conf
    
    tar cvfz privacera-spark-plugin.tar.gz -C ${SPARK_HOME}/../.. .
    
  4. Create a scripts folder and navigate to it

    Bash
    mkdir ~/privacera-oss-plugin/scripts
    cd ~/privacera-oss-plugin/scripts
    

  5. Create a privacera_setup.sh script file and copy the following content to it

    Bash
    vi privacera_setup.sh
    

    privacera_setup.sh
    Bash
    #!/bin/bash
    
    set -x
    
    echo "SPARK_DEFAULT_CONF ${SPARK_DEFAULT_CONF}"
    echo "SPARK_HOME ${SPARK_HOME}"
    echo "PRIVACERA_SPARK_PLUGIN_TYPE ${PRIVACERA_SPARK_PLUGIN_TYPE}"
    echo "PRIVACERA_BASE_DOWNLOAD_URL ${PRIVACERA_BASE_DOWNLOAD_URL}"
    echo "SPARK_VERSION ${SPARK_VERSION}"
    
    export ENV_TYPE=${ENV_TYPE:-"PLATFORM"}
    export OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE:-false}
    export SPARK_VERSION=${SPARK_VERSION}
    
    # spark folders
    export SPARK_PLUGIN_TYPE=${PRIVACERA_SPARK_PLUGIN_TYPE}
    SCRIPT_NAME=standalone_spark_setup.sh
    
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    
    chmod +x ${SCRIPT_DIR}/${SCRIPT_NAME}
    . ${SCRIPT_DIR}/${SCRIPT_NAME}
    
  6. Create a standalone_spark_setup.sh script file and copy the following content to it

    Bash
    vi standalone_spark_setup.sh
    

    standalone_spark_setup.sh
    Bash
    #!/bin/bash
    
    set -x
    
    #no need to change
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    PRIVACERA_WORK_DIR=${PRIVACERA_WORK_DIR:-${SCRIPT_DIR}/work}
    PRIVACERA_OUT_FILE=${PRIVACERA_OUT_FILE:-${PRIVACERA_WORK_DIR}/privacera.out}
    PKG_NAME=${PKG_NAME:-privacera-spark-plugin.tar.gz}
    PRIVACERA_CONF_FOLDER=${PRIVACERA_CONF_FOLDER:-${SPARK_HOME}/conf}
    PRIVACERA_CUSTOM_CONF_ZIP=${PRIVACERA_CUSTOM_CONF_ZIP:-spark_custom_conf.zip}
    SPARK_DEFAULT_CONF=${SPARK_HOME}/conf/spark-defaults.conf
    SPARK_PLUGIN_TYPE=${SPARK_PLUGIN_TYPE:-"OLAC"}
    OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE:-false}
    ENV_TYPE=${ENV_TYPE:-"PLATFORM"}
    SPARK_VERSION=${SPARK_VERSION}
    
    if [[ $ENV_TYPE != "PCLOUD" ]]; then
      # for platform we have to append as we are using same path for both
      export PRIVACERA_BASE_DOWNLOAD_URL=${PRIVACERA_BASE_DOWNLOAD_URL}/spark-plugin
    fi
    
    rm -rf ${PRIVACERA_OUT_FILE}
    function log(){
      msg=$1
      currentTime=`date`
      echo "${currentTime} : ${msg} " >> ${PRIVACERA_OUT_FILE}
    }
    
    function createFolders() {
      # delete existing jars and the work folder so an upgrade does not conflict (jar names change in every release)
      rm -rf ${SPARK_HOME}/jars/ranger-*
      rm -rf ${SPARK_HOME}/jars/privacera*
      rm -rf ${PRIVACERA_WORK_DIR}
      mkdir -p ${PRIVACERA_WORK_DIR}
    }
    
    function validation(){
      if [[ $SPARK_HOME == "" ]];then
        log " SPARK_HOME can't be empty "
        exit 1;
      fi
      log "SPARK_HOME = ${SPARK_HOME}"
    }
    
    download_pkg(){
      log "Downloading pkg from ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
      cd ${PRIVACERA_WORK_DIR}
      if [[ ${PRIVACERA_BASE_DOWNLOAD_URL} == https* ]]; then
        log "Downloading pkg using wget ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
        wget -nv ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME} -O ${PKG_NAME}
      else
        log "download pkg path is not yet supported ${PRIVACERA_BASE_DOWNLOAD_URL}/${PKG_NAME}"
      fi
    }
    
    function install() {
      log "Extracting ${PRIVACERA_WORK_DIR}/${PKG_NAME} into ${PRIVACERA_WORK_DIR}"
      tar xzf ${PRIVACERA_WORK_DIR}/${PKG_NAME} --directory ${PRIVACERA_WORK_DIR}
    
      #putting privacera jar into spark home
      cp -r ${PRIVACERA_WORK_DIR}/spark-plugin/* $SPARK_HOME/jars/
    
      if [ ! -f $SPARK_DEFAULT_CONF ]; then
       cp $SPARK_DEFAULT_CONF.template $SPARK_DEFAULT_CONF
      fi
    }
    
    function update_jars() {
      # find java major version
      JAVA_MAJOR_VERSION=$(java -version 2>&1 | sed -E -n 's/.* version "([^.-]*).*"/\1/p' | cut -d' ' -f1)
      echo "Java major version: ${JAVA_MAJOR_VERSION}"
      if [ ${JAVA_MAJOR_VERSION} -ge 15 ]; then
        cp -r ${PRIVACERA_WORK_DIR}/jdk15-jars/* $SPARK_HOME/jars/ranger-spark-plugin-impl/
      fi
    }
    
    function configure() {
        log "Configure started"
    
        #  NOTE: Equivalent of this will be done in the entrypoint script
        PRIVACERA_CONF_WORK_DIR="${PRIVACERA_WORK_DIR}/spark-conf"
    
        cp ${PRIVACERA_CONF_WORK_DIR}/resource_type_plugin_map.json ${PRIVACERA_CONF_FOLDER}/
    
        #setting spark agent
        if [ `grep -rin "spark.driver.extraJavaOptions" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
          sed -i -e 's|spark.driver.extraJavaOptions .*|spark.driver.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
        else
          echo "spark.driver.extraJavaOptions -javaagent:${SPARK_HOME}/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties" >> $SPARK_DEFAULT_CONF
        fi
    
        if [ ${SPARK_PLUGIN_TYPE} == "FGAC" ];then
          log "Spark python/sql plugin "
          if [ `grep -rin "spark.sql.extensions" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
            sed -i -e 's|spark.sql.extensions.*|spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension|g' $SPARK_DEFAULT_CONF
          else
            echo "#Setting Privacera spark-plugin properties" >> $SPARK_DEFAULT_CONF
            echo "spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension" >> $SPARK_DEFAULT_CONF
          fi
        elif [ ${SPARK_PLUGIN_TYPE} == "OLAC" ];then
          echo "Setting agent for executor"
            #setting spark agent
            if [ `grep -rin "# spark.executor.extraJavaOptions" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
              sed -i -e 's|# spark.executor.extraJavaOptions .*|spark.executor.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
            elif [ `grep -rin "spark.executor.extraJavaOptions" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
              sed -i -e 's|spark.executor.extraJavaOptions .*|spark.executor.extraJavaOptions -javaagent:'${SPARK_HOME}'/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties|g' $SPARK_DEFAULT_CONF
            else
              echo "spark.executor.extraJavaOptions -javaagent:${SPARK_HOME}/jars/privacera-agent.jar -Dlog4j.configurationFile=file:///privacera-conf/log4j2.properties " >> $SPARK_DEFAULT_CONF
            fi
        fi
    
        ## add or append 'spark.sql.hive.metastore.sharedPrefixes'
        if [ `grep -rin "^spark.sql.hive.metastore.sharedPrefixes" $SPARK_DEFAULT_CONF |  wc -l` -gt 0 ];then
          sharedPrefixes=`grep -ri "^spark.sql.hive.metastore.sharedPrefixes" $SPARK_DEFAULT_CONF`
          sharedPrefixes="${sharedPrefixes#spark.sql.hive.metastore.sharedPrefixes}"
          # trim leading whitespaces ###
          sharedPrefixes="${sharedPrefixes##*( )}"
          # trim trailing whitespaces  ##
          sharedPrefixes="${sharedPrefixes%%*( )}"
          sed -i -e "s|^spark.sql.hive.metastore.sharedPrefixes .*|#spark.sql.hive.metastore.sharedPrefixes ${sharedPrefixes}|g" $SPARK_DEFAULT_CONF
          echo "spark.sql.hive.metastore.sharedPrefixes ${sharedPrefixes},com.privacera,com.amazonaws" >> $SPARK_DEFAULT_CONF
        else
          echo "spark.sql.hive.metastore.sharedPrefixes com.privacera,com.amazonaws" >> $SPARK_DEFAULT_CONF
        fi
    
        log "Configure completed"
    }
    
    function setup_deltalake() {
        # configure deltalake support
        log "setup_deltalake started"
    
        if [ "${OSS_DELTA_LAKE_ENABLE}" == true ]; then
    
            if [ "${SPARK_VERSION}" = "3.4.0" ] || [ "${SPARK_VERSION}" = "3.4.1" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.4.0/delta-core_2.12-2.4.0.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/2.4.0/delta-storage-2.4.0.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.0" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.0.0/delta-spark_2.12-3.0.0.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.0.0/delta-storage-3.0.0.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.1" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.1.0/delta-spark_2.12-3.1.0.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.1.0/delta-storage-3.1.0.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.3" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.2.1/delta-spark_2.12-3.2.1.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.2.1/delta-storage-3.2.1.jar -P "${SPARK_HOME}/jars/"
    
            elif [ "${SPARK_VERSION}" = "3.5.4" ]; then
                wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.3.0/delta-spark_2.12-3.3.0.jar -P "${SPARK_HOME}/jars/"
                wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.3.0/delta-storage-3.3.0.jar -P "${SPARK_HOME}/jars/"
    
            else
                cp -r "${PRIVACERA_WORK_DIR}/oss-delta-jars/"* "${SPARK_HOME}/jars/"
            fi
    
            # update spark-defaults.conf
            if [ $(grep -rin "spark.sql.catalog.spark_catalog" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
                sed -i -e 's|spark.sql.catalog.spark_catalog.*|spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog|g' "$SPARK_DEFAULT_CONF"
            else
                echo "spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog" >> "$SPARK_DEFAULT_CONF"
            fi
    
            if [ "${SPARK_PLUGIN_TYPE}" == "FGAC" ]; then
                if [ $(grep -rin "spark.sql.extensions" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
                    sed -i "s|spark.sql.extensions.*|spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension,io.delta.sql.DeltaSparkSessionExtension|g" "$SPARK_DEFAULT_CONF"
                else
                    echo "spark.sql.extensions com.privacera.spark.agent.SparkSQLExtension,io.delta.sql.DeltaSparkSessionExtension" >> "$SPARK_DEFAULT_CONF"
                fi
            else
                if [ $(grep -rin "spark.sql.extensions" "$SPARK_DEFAULT_CONF" | wc -l) -gt 0 ]; then
                    sed -i "s|spark.sql.extensions.*|spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension|g" "$SPARK_DEFAULT_CONF"
                else
                    echo "spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension" >> "$SPARK_DEFAULT_CONF"
                fi
            fi
    
        fi  
    }
    
    function  verify() {
        log "Privacera Spark setup completed"
    }
    
    createFolders
    validation
    download_pkg
    install
    update_jars
    configure
    setup_deltalake
    verify
    
  7. Make the privacera_setup.sh and standalone_spark_setup.sh script files executable

    Bash
    chmod +x privacera_setup.sh standalone_spark_setup.sh
    

Create configuration files

  1. Create a config folder and navigate to it

    Bash
    mkdir ~/privacera-oss-plugin/config
    cd ~/privacera-oss-plugin/config
    

  2. Copy the privacera_spark.properties and global-truststore.p12 files to the config folder

    Bash
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/OLAC/privacera_spark.properties ~/privacera-oss-plugin/config/
    cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf/global-truststore.p12 ~/privacera-oss-plugin/config/
    

  3. Navigate to the config folder and update the privacera.signer.truststore property in the privacera_spark.properties file as follows:

    Bash
    cd ~/privacera-oss-plugin/config/
    vi privacera_spark.properties
    
    privacera.signer.truststore=/opt/privacera/global-truststore.p12
    

    • Remove the following property from the privacera_spark.properties file:
      Bash
      privacera.clusterName=<CLUSTER_NAME>
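
    If you prefer to make both changes non-interactively, commands along these lines (assuming GNU sed) achieve the same result:

    Bash
    cd ~/privacera-oss-plugin/config/
    # point the signer truststore at the path where it will be mounted
    sed -i 's|^privacera.signer.truststore=.*|privacera.signer.truststore=/opt/privacera/global-truststore.p12|' privacera_spark.properties
    # remove the cluster name property
    sed -i '/^privacera.clusterName=/d' privacera_spark.properties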
      
  4. Create a log4j2.properties file and copy the following content to it

    Bash
    vi log4j2.properties
    

    log4j2.properties
    Bash
    # Root logger configuration
    rootLogger.level = info
    rootLogger.appenderRefs = stdout, file
    rootLogger.appenderRef.stdout.ref = console
    rootLogger.appenderRef.file.ref = RollingFileAppender
    
    # console appender
    appender.console.type = Console
    appender.console.name = console
    appender.console.target = SYSTEM_ERR
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} [%t] %p %c :%L %m%n%ex
    
    ## rolling file config
    appender.rolling.type = RollingFile
    appender.rolling.name = RollingFileAppender
    appender.rolling.fileName = /tmp/${sys:user.name}/privacera.log
    appender.rolling.filePattern = /tmp/${sys:user.name}/privacera-%d{yyyy-MM-dd}-%i.log
    appender.rolling.layout.type = PatternLayout
    appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} [%t] %p %c :%L %m%n%ex
    appender.rolling.policies.type = Policies
    appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
    appender.rolling.policies.time.interval = 1
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size = 100MB
    
    # privacera
    logger.privacera.name = com.privacera
    logger.privacera.level = info
    logger.privacera.additivity = false
    logger.privacera.appenderRefs = rolling
    logger.privacera.appenderRef.rolling.ref = RollingFileAppender
    
    # ranger
    logger.ranger.name = org.apache.ranger
    logger.ranger.level = info
    logger.ranger.additivity = false
    logger.ranger.appenderRefs = rolling
    logger.ranger.appenderRef.rolling.ref = RollingFileAppender
    
    # aws sdk v1
    logger.amazon.name = com.amazon
    logger.amazon.level = info
    logger.amazon.additivity = false
    logger.amazon.appenderRefs = rolling
    logger.amazon.appenderRef.rolling.ref = RollingFileAppender
    
    logger.amazonaws.name = com.amazonaws
    logger.amazonaws.level = info
    logger.amazonaws.additivity = false
    logger.amazonaws.appenderRefs = rolling
    logger.amazonaws.appenderRef.rolling.ref = RollingFileAppender
    
    # aws sdk v2
    logger.software-amazon.name = software.amazon.awssdk
    logger.software-amazon.level = info
    logger.software-amazon.additivity = false
    logger.software-amazon.appenderRefs = rolling
    logger.software-amazon.appenderRef.rolling.ref = RollingFileAppender
    
    # apache http
    logger.apache.name = org.apache.http.wire
    logger.apache.level = info
    logger.apache.additivity = false
    logger.apache.appenderRefs = rolling
    logger.apache.appenderRef.rolling.ref = RollingFileAppender
    
    # apache http client
    logger.httpclient.name = org.apache.http.client
    logger.httpclient.level = ERROR
    logger.httpclient.additivity = false
    logger.httpclient.appenderRefs = console
    logger.httpclient.appenderRef.console.ref = console
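
    When troubleshooting policy enforcement, you can raise the Privacera and Ranger logger levels in this file; for example:

    Bash
    logger.privacera.level = debug
    logger.ranger.level = debug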
    

Generate Privacera deployment file

  • Execute the build_privacera_plugin.sh script to generate the Privacera deployment tarball (privacera-spark-plugin.tar.gz):
    Bash
    cd ~/privacera-oss-plugin
    chmod +x build_privacera_plugin.sh
    ./build_privacera_plugin.sh
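
  • On success, the script writes privacera-spark-plugin.tar.gz to the current directory. Optionally, inspect its contents before building the image:

    Bash
    tar tzf privacera-spark-plugin.tar.gz | head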
    

Building the Docker image

Create Dockerfile

  • Create a Dockerfile and copy the following content to it

    Note

    • The Dockerfile provided here is an example that uses the open-source Apache Spark images. Alternatively, you can use your own custom Spark Dockerfile and integrate the steps below to deploy the Privacera Plugin into it.
    • To do so, add the following commands to your existing Dockerfile:
      Bash
      ARG PRIVACERA_SPARK_PLUGIN_TAR_GZ=./privacera-spark-plugin.tar.gz
      
      # Add the privacera-spark-plugin tar.gz to the image at the root
      ADD ${PRIVACERA_SPARK_PLUGIN_TAR_GZ} /
      
      RUN mkdir -p ${PRIVACERA_HOME}
      RUN chown -R ${USER_NAME}:${GROUP_NAME} ${PRIVACERA_HOME}
      
      RUN ln -sf /privacera-secret/privacera_spark.properties /opt/spark/conf/privacera_spark.properties
      RUN ln -sf /privacera-secret/global-truststore.p12 /opt/privacera/global-truststore.p12
      RUN ln -sf /privacera-conf/log4j2.properties /opt/spark/conf/log4j2.properties
      
    Bash
    cd ~/privacera-oss-plugin
    vi Dockerfile
    
    Dockerfile
    Bash
    ARG SPARK_BASE_IMAGE
    FROM ${SPARK_BASE_IMAGE}
    
    ARG SPARK_VERSION
    ARG PRIVACERA_SPARK_PLUGIN_TAR_GZ=./privacera-spark-plugin.tar.gz
    ARG PRIVACERA_SPARK_PLUGIN_TYPE=OLAC
    ARG SPARK_HOME=/opt/spark
    ARG USER_NAME=spark
    ARG GROUP_NAME=spark
    ARG PRIVACERA_HOME=/opt/privacera
    
    ENV PRIVACERA_HOME=${PRIVACERA_HOME}
    ENV PRIVACERA_SPARK_PLUGIN_TYPE=${PRIVACERA_SPARK_PLUGIN_TYPE}
    ENV OSS_DELTA_LAKE_ENABLE=${OSS_DELTA_LAKE_ENABLE}
    ENV SPARK_VERSION=${SPARK_VERSION}
    ENV SPARK_USER=${USER_NAME}
    ENV SPARK_GROUP=${GROUP_NAME}
    ENV SPARK_HOME=${SPARK_HOME}
    
    USER root
    
    # Download AWS JARs for supported Spark versions
    RUN if [ "${SPARK_VERSION}" = "3.5.3" ] || \
           [ "${SPARK_VERSION}" = "3.5.4" ]; then \
            wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar \
                -P ${SPARK_HOME}/jars/; \
            wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.781/aws-java-sdk-bundle-1.12.781.jar \
                -P ${SPARK_HOME}/jars/; \
        else \
            echo "No hadoop-aws and aws-java-sdk-bundle jars downloaded"; \
        fi
    
    # Add the privacera-spark-plugin tar.gz to the image at the root
    ADD ${PRIVACERA_SPARK_PLUGIN_TAR_GZ} /
    
    # Set up Privacera directories and permissions
    RUN mkdir -p ${PRIVACERA_HOME}
    RUN chown -R ${USER_NAME}:${GROUP_NAME} ${PRIVACERA_HOME}
    
    RUN ln -sf /privacera-secret/privacera_spark.properties /opt/spark/conf/privacera_spark.properties
    RUN ln -sf /privacera-secret/global-truststore.p12 /opt/privacera/global-truststore.p12
    RUN ln -sf /privacera-conf/log4j2.properties /opt/spark/conf/log4j2.properties
    
    USER ${USER_NAME}
    

Build the Docker image

  1. Run the following command to create and edit the build_spark_image.sh file. Expand the following section and copy its content into the build_spark_image.sh file

    Bash
    cd ~/privacera-oss-plugin
    vi build_spark_image.sh
    

    build_spark_image.sh
    Bash
    #!/bin/bash
    
    set -x
    
    SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
    ENV_FILE="${SCRIPT_DIR}/penv.sh"
    
    # if penv.sh exists, load it
    if [ -f ${ENV_FILE} ]; then
      echo "Sourcing env file ${ENV_FILE}"
      source ${ENV_FILE}
    else
      echo "${ENV_FILE} file not found"
      exit 1
    fi
    
    # Fallback defaults if not set in penv.sh
    SPARK_BASE_IMAGE="${SPARK_BASE_IMAGE:-spark:3.5.3-python3}"
    SPARK_VERSION="${SPARK_VERSION:-3.5.3}"
    
    # create plugin image
    DOCKER_BUILDKIT=1 docker build \
        -t ${TAG} \
        --build-arg SPARK_BASE_IMAGE=${SPARK_BASE_IMAGE} \
        --build-arg SPARK_VERSION=${SPARK_VERSION} \
        --progress plain \
        -f Dockerfile .
    
  2. Execute the build_spark_image.sh script to build the Privacera Spark plugin Docker image:

    Bash
    chmod +x build_spark_image.sh
    ./build_spark_image.sh
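
    Once the build finishes, you can confirm the image exists locally; the tag is the value you set for TAG in penv.sh:

    Bash
    docker image ls "<DOCKER_IMAGE_TAG>"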
    

Push the Docker image to the remote HUB

  • Publish the image to your internal registry (HUB); it must be pullable from your Kubernetes cluster.
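
    For example, tagging and pushing to a hypothetical internal registry might look like this; replace the registry, repository, and tag with your own, and make sure the pushed reference matches SPARK_PLUGIN_IMAGE in penv.sh:

    Bash
    docker tag <DOCKER_IMAGE_TAG> <your-registry>/<repo>:<tag>
    docker push <your-registry>/<repo>:<tag>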

Kubernetes Deployment

Create Kubernetes YAML template files

  1. Create a k8s/templates folder and navigate to it

    Bash
    mkdir -p ~/privacera-oss-plugin/k8s/templates
    cd ~/privacera-oss-plugin/k8s/templates
    

  2. Create a namespace.yml file and copy the following content to it

    Bash
    vi namespace.yml
    

    namespace.yml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: SPARK_NAME_SPACE
    
  3. Create a role-binding.yml file and copy the following content to it

    Bash
    vi role-binding.yml
    

    role-binding.yml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: SPARK_PLUGIN_ROLE_BINDING
      namespace: SPARK_NAME_SPACE
    subjects:
    - kind: ServiceAccount
      name: SPARK_PLUGIN_SERVICE_ACCOUNT
      namespace: SPARK_NAME_SPACE
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: SPARK_PLUGIN_ROLE
    
  4. Create a role.yml file and copy the following content to it

    Bash
    vi role.yml
    

    role.yml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: SPARK_PLUGIN_ROLE
      namespace: SPARK_NAME_SPACE
    rules:
    - apiGroups: ["", "extensions", "apps"]
      resources: ["*"]
      verbs: ["*"]
    - apiGroups: ["batch"]
      resources:
      - jobs
      - cronjobs
      verbs: ["*"]
    
  5. Create a service-account.yml file and copy the following content to it

    Bash
    vi service-account.yml
    

    service-account.yml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: SPARK_PLUGIN_SERVICE_ACCOUNT
      namespace: SPARK_NAME_SPACE
    
  6. Create a privacera-spark-examples.yml file and copy the following content to it

    Bash
    vi privacera-spark-examples.yml
    

    privacera-spark-examples.yml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: SPARK_PLUGIN_APP_NAME
      name: SPARK_PLUGIN_APP_NAME
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: SPARK_PLUGIN_APP_NAME
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: SPARK_PLUGIN_APP_NAME
        spec:
          serviceAccountName: SPARK_PLUGIN_SERVICE_ACCOUNT
          securityContext:
            fsGroup: 200
          imagePullSecrets:
            - name: SPARK_DOCKER_PULL_SECRET
          containers:
            - image: SPARK_PLUGIN_IMAGE
              name: SPARK_PLUGIN_APP_NAME-exec
              imagePullPolicy: Always
              command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
              env:
                - name: SPARK_PLUGIN_POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: SPARK_PLUGIN_POD_IP
                  valueFrom:
                    fieldRef:
                      fieldPath: status.podIP
              ports:
                - name: spark-ui
                  containerPort: 4040
                - name: sparkdriver
                  containerPort: 7077
                - name: blockmanager
                  containerPort: 7078
              resources:
                limits:
                  memory: "2Gi"
                  cpu: "0.5"
                requests:
                  memory: "1Gi"
                  cpu: "0.2"
              volumeMounts:
                - name: privacera-spark-secret-volume
                  mountPath: /privacera-secret
                - name: privacera-spark-conf-volume
                  mountPath: /privacera-conf
          restartPolicy: Always
          volumes:
            - name: privacera-spark-secret-volume
              secret:
                secretName: PRIVACERA_SECRET_NAME
            - name: privacera-spark-conf-volume
              configMap:
                name: PRIVACERA_CONFIGMAP_NAME
    status: {}
    

Create scripts to generate Kubernetes deployment files from templates

  1. Create a replace.sh script file and copy the following content to it

    Bash
    cd ~/privacera-oss-plugin/k8s  
    vi replace.sh
    

    replace.sh
    Bash
    #!/bin/bash
    
    set -x
    
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    
    ENV_FILE="${SCRIPT_DIR}/../penv.sh"
    
    # if penv.sh exists, load it
    if [ -f ${ENV_FILE} ]; then
      echo "Sourcing env file ${ENV_FILE}"
      source ${ENV_FILE}
    else
      echo "${ENV_FILE} file not found"
      exit 1
    fi
    
    SPARK_NAME_SPACE=${SPARK_NAME_SPACE:-privacera-spark-plugin-test}
    SPARK_PLUGIN_ROLE_BINDING=${SPARK_PLUGIN_ROLE_BINDING:-privacera-sa-spark-plugin-role-binding}
    SPARK_PLUGIN_SERVICE_ACCOUNT=${SPARK_PLUGIN_SERVICE_ACCOUNT:-privacera-sa-spark-plugin}
    SPARK_PLUGIN_ROLE=${SPARK_PLUGIN_ROLE:-privacera-sa-spark-plugin-role}
    SPARK_PLUGIN_APP_NAME=${SPARK_PLUGIN_APP_NAME:-privacera-spark-examples}
    SPARK_PLUGIN_IMAGE=${SPARK_PLUGIN_IMAGE:-hub.docker.us/spark-plugin:latest}
    SPARK_DOCKER_PULL_SECRET=${SPARK_DOCKER_PULL_SECRET:-docker-hub}
    INPUT_DIR=${INPUT_DIR:-./templates}
    OUTPUT_DIR=${OUTPUT_DIR:-./output}
    
    mkdir -p ${OUTPUT_DIR}
    
    files=(namespace.yml role-binding.yml service-account.yml privacera-spark-examples.yml role.yml)
    for i in ${!files[@]}
    do
        in_file="${INPUT_DIR}/${files[i]}"
        out_file="${OUTPUT_DIR}/${files[i]}"
        if [ -f "${in_file}" ]; then
          echo "replace variable in file ${in_file} > out_file=${out_file}"
          cp "${in_file}" "${out_file}"
          sed -i "s|SPARK_NAME_SPACE|${SPARK_NAME_SPACE}|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_ROLE_BINDING|${SPARK_PLUGIN_ROLE_BINDING}|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_SERVICE_ACCOUNT|${SPARK_PLUGIN_SERVICE_ACCOUNT}|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_ROLE|$SPARK_PLUGIN_ROLE|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_APP_NAME|$SPARK_PLUGIN_APP_NAME|g" "${out_file}"
          sed -i "s|SPARK_PLUGIN_IMAGE|$SPARK_PLUGIN_IMAGE|g" "${out_file}"
          sed -i "s|SPARK_DOCKER_PULL_SECRET|$SPARK_DOCKER_PULL_SECRET|g" "${out_file}"
          sed -i "s|PRIVACERA_SECRET_NAME|${PRIVACERA_SECRET_NAME}|g" "${out_file}"
          sed -i "s|PRIVACERA_CONFIGMAP_NAME|${PRIVACERA_CONFIGMAP_NAME}|g" "${out_file}"
        fi
    done
    
  2. Create an apply.sh script file and copy the following content to it

    Bash
    vi apply.sh
    

    apply.sh
    Bash
    #!/bin/bash
    
    set -x
    
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    
    ENV_FILE="${SCRIPT_DIR}/../penv.sh"
    
    # if penv.sh exists, load it
    if [ -f ${ENV_FILE} ]; then
      echo "Sourcing env file ${ENV_FILE}"
      source ${ENV_FILE}
    else
      echo "${ENV_FILE} file not found"
      exit 1
    fi
    
    kubectl apply -f ${OUTPUT_DIR}/namespace.yml
    kubectl apply -f ${OUTPUT_DIR}/service-account.yml
    kubectl apply -f ${OUTPUT_DIR}/role.yml
    kubectl apply -f ${OUTPUT_DIR}/role-binding.yml
    
    # Delete and recreate Kubernetes secret
    kubectl delete secret ${PRIVACERA_SECRET_NAME} -n ${SPARK_NAME_SPACE}
    kubectl create secret generic ${PRIVACERA_SECRET_NAME} \
      --from-file=${SCRIPT_DIR}/../config/privacera_spark.properties \
      --from-file=${SCRIPT_DIR}/../config/global-truststore.p12 \
      -n ${SPARK_NAME_SPACE}
    
    # Delete and recreate Kubernetes configmap
    kubectl delete configmap ${PRIVACERA_CONFIGMAP_NAME} -n ${SPARK_NAME_SPACE}
    kubectl create configmap ${PRIVACERA_CONFIGMAP_NAME} \
      --from-file=${SCRIPT_DIR}/../config/log4j2.properties \
      -n ${SPARK_NAME_SPACE}
    
    kubectl apply -f ${OUTPUT_DIR}/privacera-spark-examples.yml -n ${SPARK_NAME_SPACE}
    

Apply the Kubernetes deployment files

  1. Make the apply.sh and replace.sh script files executable

    Bash
    chmod +x apply.sh replace.sh
    

  2. Execute the replace.sh script to substitute the variables in the k8s/templates files and generate the final manifests in the k8s/output folder

    Bash
    ./replace.sh
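
    After the script runs, the rendered manifests in the k8s/output folder should contain no remaining template tokens; a quick check such as the following should print nothing:

    Bash
    grep -rn "SPARK_NAME_SPACE\|SPARK_PLUGIN_\|PRIVACERA_SECRET_NAME\|PRIVACERA_CONFIGMAP_NAME" ./output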
    

  3. Execute the apply.sh script to create the necessary Kubernetes secret and configmap and apply the Spark deployment configs

    Bash
    ./apply.sh
    

Validate the Deployment

  1. Check the status of the pods

    Bash
    export SPARK_NAME_SPACE=<SPARK_NAME_SPACE>
    kubectl get pods -n ${SPARK_NAME_SPACE}
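
    The deployment creates a pod named after SPARK_PLUGIN_APP_NAME (privacera-spark-examples by default). If the pod is not in Running state, inspect its events:

    Bash
    kubectl describe pod <SPARK_PLUGIN_POD_NAME> -n ${SPARK_NAME_SPACE}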
    

  2. Access the pod

    Bash
    kubectl exec -it <SPARK_PLUGIN_POD_NAME> -n ${SPARK_NAME_SPACE} -- /bin/bash
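
    Once inside the pod, a quick smoke test is to run one of the bundled Spark examples; the command below assumes the stock Apache Spark image layout, where the examples ship under /opt/spark:

    Bash
    cd /opt/spark
    ./bin/run-example SparkPi 10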
    
