Connector Guide - Access - Apache Spark OLAC

This is the connector guide for using Apache Spark OLAC with Privacera. Please make sure that the connector has been installed and configured correctly before proceeding with the instructions in this guide.

Run a Spark session

  1. Navigate to the ${SPARK_HOME}/bin folder and export the JWT token:

    Bash
    cd ${SPARK_HOME}/bin
    export JWT_TOKEN="<JWT_TOKEN>"
    

  2. Start a Spark session using one of spark-shell, pyspark, or spark-sql.

  3. To pass the JWT token directly as a command-line argument, use the following configuration when connecting to the cluster:

    Bash
    ./<spark-shell | pyspark | spark-sql> \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"
    

  4. To point Spark at a file containing the JWT token, use the following configuration:

    Bash
    ./<spark-shell | pyspark | spark-sql> \
    --conf "spark.hadoop.privacera.jwt.token=<path-to-jwt-token-file>"
    
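As a worked sketch of the steps above, the invocation below combines them into a single session start. The file path in Option B is a hypothetical example for illustration, not a path created by the installer; substitute your own values.

```shell
# Sketch: start a Spark session authenticated with a Privacera JWT token.
cd ${SPARK_HOME}/bin

# Option A: pass the token string directly on the command line.
export JWT_TOKEN="<JWT_TOKEN>"
./pyspark --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"

# Option B: reference a file containing the token.
# (/etc/privacera/jwt_token.txt is a hypothetical path.)
./pyspark --conf "spark.hadoop.privacera.jwt.token=/etc/privacera/jwt_token.txt"
```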

Run a Spark session with executors

  1. SSH into the driver pod in the Spark namespace.

  2. Export the following variables:

    Bash
    SPARK_NAME_SPACE=<SPARK_NAME_SPACE>
    SPARK_IMAGE=<SPARK_IMAGE>
    
    SVC_ACCOUNT=privacera-sa-spark-plugin
    K8S_MASTER=k8s://https://kubernetes.default.svc
    
    export JWT_TOKEN="<JWT_TOKEN>"
    

  3. Run the following command to start a Spark session with executors:

    Bash
    /opt/spark/bin/spark-shell \
      --master ${K8S_MASTER} \
      --deploy-mode client \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=${SVC_ACCOUNT} \
      --conf spark.kubernetes.namespace=${SPARK_NAME_SPACE} \
      --conf spark.kubernetes.driver.request.cores=0.001 \
      --conf spark.driver.memory=1g \
      --conf spark.kubernetes.executor.request.cores=0.001 \
      --conf spark.executor.memory=1g \
      --conf spark.kubernetes.container.image=${SPARK_IMAGE} \
      --conf spark.kubernetes.container.image.pullPolicy=Always \
      --conf spark.driver.host=${SPARK_PLUGIN_POD_IP} \
      --conf spark.driver.port=7077 \
      --conf spark.blockManager.port=7078 \
      --conf spark.kubernetes.executor.secrets.privacera-spark-secret=/privacera-secret \
      --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"
    
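Once the session is up, you can sanity-check that the requested executors registered. A minimal check from outside the session, assuming kubectl access to the same namespace (Spark on Kubernetes labels executor pods with spark-role=executor, though labels can vary by deployment):

```shell
# Sketch: confirm two executor pods are running in the Spark namespace.
kubectl get pods -n "${SPARK_NAME_SPACE}" -l spark-role=executor
```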

Run a Spark session with MinIO

  1. To enable custom S3 endpoints for accessing specific MinIO buckets alongside S3 buckets, set the following properties for each bucket individually when starting the Spark session:

    Bash
    ./<spark-shell | pyspark | spark-sql> \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-1>.path.style.access=true" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-1>.connection.ssl.enabled=true" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-1>.endpoint=https://<MINIO_HOST>:<MINIO_PORT>" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-2>.path.style.access=true" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-2>.connection.ssl.enabled=true" \
    --conf "spark.hadoop.fs.s3a.bucket.<bucket-2>.endpoint=https://<MINIO_HOST>:<MINIO_PORT>"
    

  2. To enable a global endpoint that accesses only MinIO buckets, include the following properties when starting the Spark session:

    Bash
    ./<spark-shell | pyspark | spark-sql> \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.fs.s3a.path.style.access=true" \
    --conf "spark.hadoop.fs.s3a.connection.ssl.enabled=true" \
    --conf "spark.hadoop.fs.s3a.endpoint=https://<MINIO_HOST>:<MINIO_PORT>"
    
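To verify the MinIO configuration, a quick smoke test can read an object from one of the configured buckets. This is a sketch only: the bucket, object path, and the technique of feeding the read statements to pyspark on stdin are illustrative assumptions.

```shell
# Sketch: start pyspark with a global MinIO endpoint and read a test object.
# <bucket>, <MINIO_HOST>, <MINIO_PORT>, and the CSV path are placeholders.
./pyspark \
  --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
  --conf "spark.hadoop.fs.s3a.path.style.access=true" \
  --conf "spark.hadoop.fs.s3a.connection.ssl.enabled=true" \
  --conf "spark.hadoop.fs.s3a.endpoint=https://<MINIO_HOST>:<MINIO_PORT>" <<'EOF'
df = spark.read.csv("s3a://<bucket>/test/sample.csv")
df.show()
EOF
```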
