Privacera Documentation

Encryption UDFs for Apache Spark on PrivaceraCloud

This section describes how to install and configure the Privacera Crypto jar in open-source Apache Spark so that you can use Encryption UDFs to encrypt and decrypt data.

Syntax of Privacera Encryption UDFs for Apache Spark

The Privacera Crypto jar includes the following encryption-related UDFs.

Encrypt: Using the quoted '<encryption_scheme_name>', the protect UDF encrypts all values of <column_name> in <table_name> and returns the encrypted data as <new_column_name>:

select protect(<column_name>, '<encryption_scheme_name>') as <new_column_name> from <database_name>.<table_name>;

Decrypt: Using the quoted '<encryption_scheme_name>', the unprotect UDF decrypts all values of <column_name> in <table_name> and returns the decrypted data as <new_column_name>:

select unprotect(<column_name>, '<encryption_scheme_name>') as <new_column_name> from <database_name>.<table_name>;

Decrypt with obfuscation: Using the quoted '<encryption_scheme_name>', the unprotect UDF decrypts all values of <column_name> in <table_name>, further obfuscates the decrypted data with the quoted '<presentation_scheme_name>', and returns the decrypted, obfuscated data as <new_column_name>:

select unprotect(<column_name>, '<encryption_scheme_name>', '<presentation_scheme_name>') as <new_column_name> from <database_name>.<table_name>;
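To reduce quoting mistakes, the syntax above can be wrapped in a small shell helper. This is a hypothetical sketch, not part of the Privacera Crypto jar: the function names sql_quote, build_protect_query, and build_unprotect_query are illustrative, and the helpers only print query text. They wrap scheme names in single quotes, because in Spark SQL single quotes denote a string literal while backticks denote an identifier, and they reject names that contain a single quote instead of trying to escape them.

```shell
# Hypothetical helpers (not part of the Privacera Crypto jar) that print
# protect/unprotect queries following the documented syntax.

# Wrap a scheme name in single quotes; refuse names that contain a quote.
sql_quote() {
  case "$1" in
    *"'"*) echo "scheme name must not contain a single quote: $1" >&2; return 1 ;;
  esac
  printf "'%s'" "$1"
}

# $1 = column, $2 = encryption scheme, $3 = output column, $4 = database.table
build_protect_query() {
  scheme=$(sql_quote "$2") || return 1
  printf 'select protect(%s, %s) as %s from %s;\n' "$1" "$scheme" "$3" "$4"
}

# Same arguments as build_protect_query, plus an optional $5 = presentation
# scheme; when $5 is set, the obfuscation form of unprotect is printed.
build_unprotect_query() {
  scheme=$(sql_quote "$2") || return 1
  if [ -n "${5:-}" ]; then
    presentation=$(sql_quote "$5") || return 1
    printf 'select unprotect(%s, %s, %s) as %s from %s;\n' \
      "$1" "$scheme" "$presentation" "$3" "$4"
  else
    printf 'select unprotect(%s, %s) as %s from %s;\n' "$1" "$scheme" "$3" "$4"
  fi
}
```

For example, build_protect_query CUSTOMER_EMAIL EMAIL ENC_EMAIL sales.CUSTOMERS prints select protect(CUSTOMER_EMAIL, 'EMAIL') as ENC_EMAIL from sales.CUSTOMERS; where sales and ENC_EMAIL are illustrative names.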

For example usage, see Example queries to verify UDFs.

Download and install Privacera Crypto jar

To install the Privacera Crypto jar file in Apache Spark, get the URL of the jar file and download it to your Apache Spark instance:

  1. In your PrivaceraCloud account, go to Settings > API Key.

  2. Under the PEG heading, for PEG Crypto Starburst Trino Jar, click COPY URL, and use that URL with wget on the command line of your Apache Spark instance. This URL is shown below as <Privacera_Crypto_Jar_URL>.

    cd <path_to_apache_spark_home_directory>
    wget <Privacera_Crypto_Jar_URL> -O privacera-crypto-jar-with-dependencies.jar
  3. Copy the jar file to the plugins/privacera directory under your Apache Spark home directory. Create the directory first if it does not already exist.
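Steps 2 and 3 can be scripted. The sketch below is an illustration under stated assumptions: download_crypto_jar and install_crypto_jar are hypothetical function names, and you supply the Apache Spark home directory and the <Privacera_Crypto_Jar_URL> copied from Settings > API Key. The download uses the same wget command as step 2; the install step creates plugins/privacera if needed and copies the jar there.

```shell
# Hypothetical helper for step 2: download the jar into the Spark home.
# $1 = Apache Spark home directory, $2 = jar URL copied from PrivaceraCloud
download_crypto_jar() {
  wget "$2" -O "$1/privacera-crypto-jar-with-dependencies.jar"
}

# Hypothetical helper for step 3: copy the jar into plugins/privacera.
# $1 = Apache Spark home directory, $2 = path to the downloaded jar
# Prints the destination path on success.
install_crypto_jar() {
  # Create plugins/privacera (and any missing parents) if it does not exist.
  mkdir -p "$1/plugins/privacera" || return 1
  cp "$2" "$1/plugins/privacera/" || return 1
  echo "$1/plugins/privacera/$(basename "$2")"
}
```

For example, run download_crypto_jar /opt/spark "<Privacera_Crypto_Jar_URL>" followed by install_crypto_jar /opt/spark /opt/spark/privacera-crypto-jar-with-dependencies.jar, where /opt/spark is an illustrative Spark home.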

Set up in Apache Spark

After you have downloaded the Privacera Crypto jar, you need to set some properties, update your Apache Spark start-up script, and define the UDFs.

Set variables in Apache Spark conf/crypto.properties

Create the file <path_to_apache_spark_home_directory>/conf/crypto.properties and add the following properties to it, where:

  • <PrivaceraCloud_Encryption_URL> is your Privacera Encryption endpoint on PrivaceraCloud. To get it, click the Copy Url link in Settings > API Key.

privacera.crypto.native.threadpool.size=100
privacera.crypto.shared.secret=secret
privacera.crypto.session.cache.size=1000
privacera.deployment.mode.saas=true
privacera.peg.base.url=<PrivaceraCloud_Encryption_URL>
privacera.peg.username=<PrivaceraCloud_Encryption_Username>
privacera.peg.password=<PrivaceraCloud_Encryption_Password>
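If you prefer to script this step, a heredoc can write the file in one go. This is a sketch: write_crypto_properties is a hypothetical helper, and its URL, username, and password arguments stand in for the placeholders above. The remaining values are copied unchanged from the listing above. Passing the password as an argument can leave it in your shell history, so adapt this to how you handle secrets.

```shell
# Hypothetical helper: write <spark_home>/conf/crypto.properties.
# $1 = Apache Spark home, $2 = <PrivaceraCloud_Encryption_URL>,
# $3 = <PrivaceraCloud_Encryption_Username>, $4 = <PrivaceraCloud_Encryption_Password>
write_crypto_properties() {
  mkdir -p "$1/conf" || return 1
  # Unquoted EOF so that $2, $3, and $4 are expanded into the file.
  cat > "$1/conf/crypto.properties" <<EOF
privacera.crypto.native.threadpool.size=100
privacera.crypto.shared.secret=secret
privacera.crypto.session.cache.size=1000
privacera.deployment.mode.saas=true
privacera.peg.base.url=$2
privacera.peg.username=$3
privacera.peg.password=$4
EOF
}
```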

Add an environment variable to spark-env.sh

Open spark-env.sh and add the following line, which sets CRYPTO_CONFIG_DIR to the directory that contains crypto.properties:

vi <absolute_path_to_apache_spark_home_directory>/conf/spark-env.sh
# Add this line to spark-env.sh:
export CRYPTO_CONFIG_DIR=<absolute_path_to_apache_spark_home_directory>/conf
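Editing the file by hand more than once can leave duplicate export lines. A hypothetical helper such as the following appends the line only when it is missing, assuming spark-env.sh lives in <absolute_path_to_apache_spark_home_directory>/conf as shown above. The name set_crypto_config_dir is illustrative.

```shell
# Hypothetical helper: make sure spark-env.sh exports CRYPTO_CONFIG_DIR.
# $1 = absolute path to the Apache Spark home directory
set_crypto_config_dir() {
  env_file="$1/conf/spark-env.sh"
  line="export CRYPTO_CONFIG_DIR=$1/conf"
  mkdir -p "$1/conf" || return 1
  touch "$env_file" || return 1
  # Append the export only if the exact line is not already present,
  # so running this helper more than once does not create duplicates.
  grep -qxF "$line" "$env_file" || printf '%s\n' "$line" >> "$env_file"
}
```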

Restart Apache Spark

# Go to the Apache Spark bin directory
cd <absolute_path_to_apache_spark_home_directory>/bin
# Start a new Spark SQL session so that the new settings take effect
./spark-sql
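This page does not say how the Privacera Crypto jar is put on Spark's classpath. One way, assumed here rather than taken from Privacera's instructions, is the standard --jars option that spark-sql accepts from spark-submit, pointing at the jar copied into plugins/privacera. start_spark_sql is a hypothetical helper; setting DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical helper: start spark-sql with the Privacera Crypto jar attached.
# $1 = Apache Spark home. Assumes the jar was copied to plugins/privacera.
# spark-sql sources conf/spark-env.sh, so CRYPTO_CONFIG_DIR is picked up too.
start_spark_sql() {
  jar="$1/plugins/privacera/privacera-crypto-jar-with-dependencies.jar"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of running it.
    printf '%s\n' "$1/bin/spark-sql --jars $jar"
  else
    "$1/bin/spark-sql" --jars "$jar"
  fi
}
```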

Create Privacera protect and unprotect UDFs

To create the Privacera protect, unprotect, and mask user-defined functions (UDFs), run the following SQL commands in Apache Spark:

CREATE DATABASE IF NOT EXISTS privacera;
DROP FUNCTION IF EXISTS privacera.protect;
DROP FUNCTION IF EXISTS privacera.unprotect;
DROP FUNCTION IF EXISTS privacera.mask;

CREATE FUNCTION privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF';
CREATE FUNCTION privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF';
CREATE FUNCTION privacera.mask AS 'com.privacera.crypto.PrivaceraMaskUDF';
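The statements above can also be saved to a file and run non-interactively with the -f option of spark-sql. In this sketch, write_udf_ddl and create_udfs are hypothetical helpers, the file path is your choice, and passing the jar with Spark's standard --jars option is an assumption, since this page does not say how the jar reaches Spark's classpath.

```shell
# Hypothetical helper: write the UDF DDL from this section to the file $1.
write_udf_ddl() {
  # Quoted 'EOF' so nothing inside the SQL is expanded by the shell.
  cat > "$1" <<'EOF'
CREATE DATABASE IF NOT EXISTS privacera;
DROP FUNCTION IF EXISTS privacera.protect;
DROP FUNCTION IF EXISTS privacera.unprotect;
DROP FUNCTION IF EXISTS privacera.mask;
CREATE FUNCTION privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF';
CREATE FUNCTION privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF';
CREATE FUNCTION privacera.mask AS 'com.privacera.crypto.PrivaceraMaskUDF';
EOF
}

# Hypothetical helper: run the DDL file $2 with spark-sql from Spark home $1.
# Assumes the jar was copied to plugins/privacera as described earlier.
create_udfs() {
  "$1/bin/spark-sql" \
    --jars "$1/plugins/privacera/privacera-crypto-jar-with-dependencies.jar" \
    -f "$2"
}
```

For example, write_udf_ddl /tmp/privacera_udfs.sql followed by create_udfs /opt/spark /tmp/privacera_udfs.sql, where both paths are illustrative.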
 

Example queries to verify UDFs

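See the syntax detailed in Syntax of Privacera Encryption UDFs for Apache Spark. The example queries call the UDFs without a database prefix; if privacera is not your current database, qualify them as privacera.protect and privacera.unprotect.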

Encrypt: The following example query with the protect UDF encrypts the cleartext CUSTOMER_EMAIL column of the CUSTOMERS table using the quoted 'EMAIL' encryption scheme:

select protect(CUSTOMER_EMAIL, 'EMAIL') from CUSTOMERS;

Decrypt: The following example query with the unprotect UDF decrypts the encrypted CUSTOMER_EMAIL column of the CUSTOMERS table using the quoted 'EMAIL' encryption scheme:

select unprotect(CUSTOMER_EMAIL, 'EMAIL') from CUSTOMERS;

Decrypt with obfuscation: The following example query with the unprotect UDF decrypts the encrypted CUSTOMER_EMAIL column of the CUSTOMERS table using the quoted 'EMAIL' encryption scheme and obfuscates the decrypted data with the quoted 'PRESENTATION_EMAIL' presentation scheme:

select unprotect(CUSTOMER_EMAIL, 'EMAIL', 'PRESENTATION_EMAIL') as OPTIONAL_OUTPUT_COLUMN_FOR_OBFUSCATED_DATA from CUSTOMERS;
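As an extra check, you can confirm that decrypting encrypted values returns the original cleartext. This is a sketch that assumes the 'EMAIL' scheme is reversible and that CUSTOMERS holds cleartext in CUSTOMER_EMAIL. verify_round_trip is a hypothetical helper; it qualifies the UDFs with the privacera database they were created in, and passing the jar with --jars is an assumption. A result of 0 mismatches means every non-null value survived the round trip. Set DRY_RUN=1 to print the query without running it.

```shell
# Hypothetical helper: count rows whose value changes after protect + unprotect.
# $1 = Apache Spark home, $2 = table, $3 = cleartext column, $4 = encryption scheme
verify_round_trip() {
  # Rows where the column is NULL compare as NULL and are not counted.
  query="select count(*) as mismatches from $2 where privacera.unprotect(privacera.protect($3, '$4'), '$4') <> $3;"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$query"
  else
    "$1/bin/spark-sql" \
      --jars "$1/plugins/privacera/privacera-crypto-jar-with-dependencies.jar" \
      -e "$query"
  fi
}
```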