Privacera Plugin in Spark Standalone
This section describes how to use Privacera Manager to generate the setup script and the SSL/TLS Spark custom configuration needed to install the Privacera plugin in an open-source Spark environment.
Note
The steps below apply to Spark 3.x only.
Prerequisites
Ensure the following prerequisites are met:
- A working Spark environment.
- Privacera services must be up and running.
Configuration
- SSH to the instance as USER.
- Run the following commands:

  ```bash
  cd ~/privacera/privacera-manager
  cp config/sample-vars/vars.spark-standalone.yml config/custom-vars/
  vi config/custom-vars/vars.spark-standalone.yml
  ```
- Edit the following properties. For property details and descriptions, refer to Configuration Properties below.

  ```yaml
  SPARK_STANDALONE_ENABLE: "true"
  SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
  SPARK_HOME: "<PLEASE_CHANGE>"
  SPARK_USER_HOME: "<PLEASE_CHANGE>"
  ```
- Run the following commands:

  ```bash
  cd ~/privacera/privacera-manager
  ./privacera-manager.sh update
  ```

  After the update completes, the setup scripts (`privacera_setup.sh`, `standalone_spark_FGAC.sh`, `standalone_spark_OLAC.sh`) and the Spark custom configuration for SSL (`spark_custom_conf.zip`) are generated under `~/privacera/privacera-manager/output/spark-standalone`.
- In your Spark environment, you can enable either FGAC or OLAC.

  To enable fine-grained access control (FGAC), do the following:

  - Copy `standalone_spark_FGAC.sh` and `spark_custom_conf.zip` into the same folder.
  - Add permissions to execute the script:

    ```bash
    chmod +x standalone_spark_FGAC.sh
    ```

  - Run the script to install the Privacera plugin in your Spark environment:

    ```bash
    ./standalone_spark_FGAC.sh
    ```
  To enable object-level access control (OLAC), do the following:

  - Copy `standalone_spark_OLAC.sh` and `spark_custom_conf.zip` into the same folder.
  - Add permissions to execute the script:

    ```bash
    chmod +x standalone_spark_OLAC.sh
    ```

  - Run the script to install the Privacera plugin in your Spark environment:

    ```bash
    ./standalone_spark_OLAC.sh
    ```
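Before copying the files, it can help to confirm that the update step actually produced all four artifacts. A minimal sketch; the default path below is the generation location from the update step above:

```shell
# Sanity check: confirm Privacera Manager generated all expected artifacts.
# OUTPUT_DIR defaults to the generation path used earlier in this section.
OUTPUT_DIR="${OUTPUT_DIR:-$HOME/privacera/privacera-manager/output/spark-standalone}"
report=$(
  for f in privacera_setup.sh standalone_spark_FGAC.sh standalone_spark_OLAC.sh spark_custom_conf.zip; do
    if [ -f "$OUTPUT_DIR/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
    fi
  done
)
echo "$report"
```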
Configuration Properties
| Property | Description | Example |
|---|---|---|
| SPARK_STANDALONE_ENABLE | Enables generation of the setup script and configs for Spark standalone plugin installation. | true |
| SPARK_ENV_TYPE | Sets the environment type. It can be any user-defined value; for example, use `local` for an environment that runs locally, or `prod` for a production environment. | local |
| SPARK_HOME | Home path of your Spark installation. | ~/privacera/spark/spark-3.1.1-bin-hadoop3.2 |
| SPARK_USER_HOME | User home directory of your Spark installation. | /home/ec2-user |
| SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED | Enables or disables fallback to the privacera_files and privacera_hive services, which determines whether users are allowed or denied access to resource files. Set to `true` to enable the fallback, `false` to disable it. | true |
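For reference, a filled-in `config/custom-vars/vars.spark-standalone.yml` might look like the following; the values are illustrative, so substitute your own paths:

```yaml
SPARK_STANDALONE_ENABLE: "true"
SPARK_ENV_TYPE: "local"
SPARK_HOME: "/home/ec2-user/privacera/spark/spark-3.1.1-bin-hadoop3.2"
SPARK_USER_HOME: "/home/ec2-user"
SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED: "true"
```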
Validations
To verify the successful installation of the Privacera plugin, do the following:

- Create an S3 bucket `${S3_BUCKET}` for sample testing.
- Download the sample data using the following link and put it in the bucket at `s3://${S3_BUCKET}/customer_data`:

  ```bash
  wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv
  ```

- (Optional) Add the AWS JARs to Spark. Download the JARs that match the Spark Hadoop version in your environment:

  ```bash
  cd <SPARK_HOME>/jars
  ```

  For Spark 3.1.1 with Hadoop 3.2:

  ```bash
  wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
  wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
  ```
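The hadoop-aws JAR must match the Hadoop version bundled with your Spark build; a mismatch typically surfaces only as runtime failures. One way to find the bundled version is to read it off the hadoop-common JAR name in `<SPARK_HOME>/jars`; the sketch below demonstrates the extraction on a sample file name:

```shell
# The Hadoop jars bundled with Spark encode their version in the file name,
# e.g. <SPARK_HOME>/jars/hadoop-common-3.2.0.jar. In practice, list them with:
#   ls "$SPARK_HOME"/jars/hadoop-common-*.jar
jar="hadoop-common-3.2.0.jar"   # sample file name for illustration
hadoop_version=$(printf '%s' "$jar" | sed 's/^hadoop-common-\(.*\)\.jar$/\1/')
echo "$hadoop_version"   # download the hadoop-aws jar with this same version
```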
- Run the following command:

  ```bash
  cd <SPARK_HOME>/bin
  ```

- Run spark-shell to execute Scala commands:

  ```bash
  ./spark-shell
  ```
Validations with JWT Token
- Run the following command:

  ```bash
  cd <SPARK_HOME>/bin
  ```

- Set the JWT_TOKEN:

  ```bash
  JWT_TOKEN="<JWT_TOKEN>"
  ```

- Start spark-shell with the following parameters:

  ```bash
  ./spark-shell \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.privacera.jwt.oauth.enable=true"
  ```
Validations with JWT Token and Public Key
- If the JWT token is generated with a private/public key pair, create a local file containing the public key.
- Set the following variables according to the payload of the JWT token:

  ```bash
  JWT_TOKEN="<JWT_TOKEN>"
  # The following variables are optional; set them only if the token contains them, otherwise leave them empty
  JWT_TOKEN_ISSUER="<JWT_TOKEN_ISSUER>"
  JWT_TOKEN_PUBLIC_KEY_FILE="<JWT_TOKEN_PUBLIC_KEY_FILE_PATH>"
  JWT_TOKEN_USER_KEY="<JWT_TOKEN_USER_KEY>"
  JWT_TOKEN_GROUP_KEY="<JWT_TOKEN_GROUP_KEY>"
  JWT_TOKEN_PARSER_TYPE="<JWT_TOKEN_PARSER_TYPE>"
  ```

- Start spark-shell with the following parameters:

  ```bash
  ./spark-shell \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.privacera.jwt.oauth.enable=true" \
    --conf "spark.hadoop.privacera.jwt.token.publickey=${JWT_TOKEN_PUBLIC_KEY_FILE}" \
    --conf "spark.hadoop.privacera.jwt.token.issuer=${JWT_TOKEN_ISSUER}" \
    --conf "spark.hadoop.privacera.jwt.token.parser.type=${JWT_TOKEN_PARSER_TYPE}" \
    --conf "spark.hadoop.privacera.jwt.token.userKey=${JWT_TOKEN_USER_KEY}" \
    --conf "spark.hadoop.privacera.jwt.token.groupKey=${JWT_TOKEN_GROUP_KEY}"
  ```
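`JWT_TOKEN_USER_KEY` and `JWT_TOKEN_GROUP_KEY` must name claims that actually exist in the token's payload. A quick way to see which claims a token carries is to base64-decode its middle segment. The sketch below (GNU coreutils `base64` assumed) builds a hypothetical unsigned token inline for illustration; it is not a real credential:

```shell
# Build a hypothetical unsigned JWT (header.payload.signature) for illustration
header=$(printf '%s' '{"alg":"RS256"}' | base64 | tr -d '=\n')
payload=$(printf '%s' '{"sub":"spark_user","groups":["analysts"]}' | base64 | tr -d '=\n')
JWT_TOKEN="${header}.${payload}.signature-placeholder"

# Extract the payload segment of the token
p=$(printf '%s' "$JWT_TOKEN" | cut -d '.' -f 2)
# Real JWTs use the URL-safe base64 alphabet; map it back before decoding
p=$(printf '%s' "$p" | tr '_-' '/+')
# Restore the padding that JWT segments strip
while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done
claims=$(printf '%s' "$p" | base64 -d)
echo "$claims"
# Here the user claim key is "sub" and the group claim key is "groups"
```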
Use Cases
- Add a policy in Access Manager with read permission to `${S3_BUCKET}`, then read the sample data:

  ```scala
  val file_path = "s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv"
  val df = spark.read.csv(file_path)
  df.show(5)
  ```

- Add a policy in Access Manager with delete and write permission to `${S3_BUCKET}`, then write the data back:

  ```scala
  df.write.format("csv").mode("overwrite").save("s3a://${S3_BUCKET}/csv/customer_data.csv")
  ```