Encryption with Databricks, Hive, StreamSets, Trino

Databricks UDFs for encryption and masking on Privacera Platform

Set up Databricks encryption and masking

This topic describes how to install and configure the Privacera Encryption JAR file in Databricks, how to create UDFs for encryption and masking, and how to create policies for users and groups.

The overall approach is as follows:

  1. Install the Privacera Manager Encryption JAR in Databricks with the Databricks CLI or UI

  2. Upload Privacera Manager configuration files to Databricks

  3. Define UDFs in Databricks to call the Privacera Manager encryption protect and unprotect methods.

Prerequisites for setting up Databricks encryption and masking
  • In Databricks, make sure that the users who will use the UDFs have sufficient access to write to the pertinent tables.

  • In Privacera Manager, make sure to configure the Databricks datasource: Databricks Spark Plugin (Python/SQL) on AWS, Azure, or GCP.

  • In Privacera Manager, make sure that Privacera Encryption has been enabled.

  • In Privacera Manager, make sure that the users who will use the UDFs in Databricks have been given permission to access the encryption scheme policies that are part of the UDF syntax.

  • In Privacera Manager, make sure that these same users have been given permission to access the encryption keys in the Ranger KMS.

Methods for installing the Privacera encryption JAR

You can install the Privacera encryption JAR file in either of the following ways:

  • Using the Databricks CLI

  • Using the Databricks UI

After you install the JAR file, you need to define some configuration properties and User-Defined Functions (UDFs) to call the Privacera encryption /protect and /unprotect API endpoints.

Install the Privacera encryption JAR using Databricks CLI

To install the Privacera encryption JAR using the Databricks CLI, follow these steps:

  1. Download the JAR to a local machine.

    The variable PRIVACERA_BASE_DOWNLOAD_URL depends on the version of the Privacera software you want. See Configure and Install Core Services.

    # Replace <PRIVACERA_BASE_DOWNLOAD_URL> with the base URL for your Privacera version.
    export PRIVACERA_BASE_DOWNLOAD_URL="<PRIVACERA_BASE_DOWNLOAD_URL>"
    wget "${PRIVACERA_BASE_DOWNLOAD_URL}/privacera-crypto-jar-with-dependencies.jar" -O privacera-crypto-jar-with-dependencies.jar
    
  2. Upload the JAR file to DBFS or to an S3 location that the Databricks cluster can access.

  3. To upload the JAR into DBFS with the Databricks CLI, run the following commands:

    databricks fs ls
    databricks fs mkdirs dbfs:/privacera/crypto/jars
    databricks fs cp privacera-crypto-jar-with-dependencies.jar dbfs:/privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar
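
A failed download can leave an empty file or an HTML error page under the JAR's name. The following is a minimal sketch of a sanity check to run before the upload. The function name looks_like_jar is hypothetical and not part of Privacera or the Databricks CLI; it only checks that the file is non-empty and starts with the ZIP signature that JAR archives normally begin with.

```shell
# Return success (0) when the file is non-empty and begins with the
# two-byte ZIP signature "PK", which JAR archives normally start with.
looks_like_jar() {
  [ -s "$1" ] || return 1
  [ "$(head -c 2 "$1")" = "PK" ]
}

# Hypothetical usage before the upload step:
#   looks_like_jar privacera-crypto-jar-with-dependencies.jar \
#     && databricks fs cp privacera-crypto-jar-with-dependencies.jar \
#          dbfs:/privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar
```

This check does not validate the archive contents; it only catches the most common download failures.
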
Install the Privacera encryption JAR using Databricks UI

To install the Privacera encryption JAR using the Databricks UI, follow these steps:

  1. Navigate to the Databricks cluster details page by selecting Clusters > cluster name > Libraries.

  2. Click Install > New.

  3. Drop or upload the JAR file.

    dbfs:/privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar

  4. Wait until the JAR file is installed.

Create and upload encryption configuration files

The steps here rely on the default location of the Privacera crypto properties file. However, you can change this location to a directory of your choice. Follow the steps here and then see Custom Path to Crypto Properties File in Databricks.

To create and upload the encryption configuration files, do the following:

  1. Create the configuration file on your local machine. In the next step, upload the file to the Databricks cluster.

    mkdir -p privacera/crypto/configs
    cd privacera/crypto/configs
    # Edit the crypto_default.properties file to set the following properties.
    vi crypto_default.properties
    privacera.portal.base.url=http://<APP_HOSTNAME>:6868
    privacera.portal.username=<SOME_USERNAME>
    privacera.portal.password=<SOME_PASSWORD>
    # Mode of encryption/decryption: rpc or native
    privacera.crypto.mode=native
    
  2. Upload the configuration file to DBFS.

    databricks fs ls
    databricks fs mkdirs dbfs:/privacera/crypto/configs
    databricks fs cp crypto_default.properties dbfs:/privacera/crypto/configs/crypto_default.properties
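
If you prefer to generate the properties file from a script rather than edit it by hand, the following sketch writes the same four properties shown above. The function name write_crypto_properties and its argument order are assumptions made for illustration; only the property keys and the port 6868 come from this topic.

```shell
# Write a crypto properties file with the four properties used in this topic.
# Arguments: host, username, password, mode (rpc or native), output file.
write_crypto_properties() {
  local host="$1"
  local user="$2"
  local password="$3"
  local mode="${4:-native}"
  local out="${5:-crypto_default.properties}"

  # Reject any mode other than the two documented values.
  case "$mode" in
    rpc|native) ;;
    *) echo "error: mode must be rpc or native" >&2; return 1 ;;
  esac

  cat > "$out" <<EOF
privacera.portal.base.url=http://${host}:6868
privacera.portal.username=${user}
privacera.portal.password=${password}
privacera.crypto.mode=${mode}
EOF
  # The file holds a password, so restrict it to the owner.
  chmod 600 "$out"
}

# Hypothetical usage:
#   write_crypto_properties <APP_HOSTNAME> <SOME_USERNAME> <SOME_PASSWORD> native
```
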
    
Create and run Databricks UDFs for encryption

You can create Privacera Encryption user-defined functions (UDFs) by running SQL queries in your Databricks cluster.

Create Privacera protect UDF

To create a Privacera protect user-defined function (UDF), run the following SQL query in your Databricks cluster:

create database if not exists privacera;
drop function if exists privacera.protect;
CREATE FUNCTION privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF';
Create Privacera unprotect UDF

To create a Privacera unprotect user-defined function (UDF), run the following SQL query in your Databricks cluster:

create database if not exists privacera;
drop function if exists privacera.unprotect;
CREATE FUNCTION privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF';
Run sample queries in Databricks to verify

You can run queries to verify that the UDFs defined in Databricks work. These queries do the following:

  1. Run the protect UDF to encrypt a database column, shown as the variable <colname>. Substitute your own column name for this variable.

  2. Run the unprotect UDF to decrypt that same database column.

Sample query to run encryption:

select privacera.protect(<colname>,'<SCHEME_NAME>') from <db_name>.<table_name> limit 10;

Sample query to run encryption and decryption in a single query to verify the setup:

select privacera.unprotect(privacera.protect(<colname>,'<SCHEME_NAME>'),'<SCHEME_NAME>') from <db_name>.<table_name> limit 10;
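
Reading ten rows by eye is a weak check on a large table. As a sketch only, the helper below prints a query that counts rows where the protect and unprotect round trip does not return the original value. The function name roundtrip_check_query is hypothetical, and it assumes the column compares cleanly with the decrypted output; rows with NULL values are not counted by the comparison.

```shell
# Print a SQL query that counts rows whose encrypt-then-decrypt result
# differs from the original column value.
# Arguments: column name, scheme name, qualified table name.
roundtrip_check_query() {
  local col="$1"
  local scheme="$2"
  local table="$3"
  cat <<EOF
select count(*) as mismatches
from ${table}
where privacera.unprotect(privacera.protect(${col}, '${scheme}'), '${scheme}') <> ${col};
EOF
}

# Hypothetical usage: paste the printed query into a Databricks SQL cell.
#   roundtrip_check_query <colname> <SCHEME_NAME> <db_name>.<table_name>
```

A result of 0 suggests that the two UDFs are consistent for the non-NULL values in that column.
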
Create and run Databricks UDF for masking

You can create Privacera user-defined functions (UDFs) for masking by running the following SQL query in your Databricks cluster:

drop function if exists db.mask;
CREATE FUNCTION db.mask AS 'com.privacera.crypto.PrivaceraMaskUDF';
Run sample queries to verify masking and encryption

Redact the email column of the customer_data table in the db database with the masking scheme EMAIL_REDACT_SCHEME and return the output in a column called RedactedEmail.

select db.mask(email,'EMAIL_REDACT_SCHEME')
as RedactedEmail
from db.customer_data;

Single query to encrypt and mask: Encrypt (protect) the PERSON_NAME column of the customer_data table with the PERSON_NAME_ENCRYPTION_SCHEME and mask the EMAIL column of the same table with the masking scheme EMAIL_MASKING_SCHEME. The data are transformed in place with no intermediate location.

select privacera.protect(PERSON_NAME,'PERSON_NAME_ENCRYPTION_SCHEME'),
db.mask(EMAIL,'EMAIL_MASKING_SCHEME')
from db.customer_data;
Create a custom path to the crypto properties file in Databricks

For your Databricks UDFs, you might want to change the location of Privacera's crypto properties file in your Databricks cluster to enhance security. The crypto.properties file contains configuration settings for Privacera Encryption.

To change the location of the crypto.properties file, follow these steps:

  1. Move the properties file to a new directory on your Databricks cluster.

  2. Define an environment variable in Databricks to point to that new directory.

  3. Define the same path in a Privacera custom variable on your Privacera host.

The following sections describe how to complete each of the above steps.

Note

This procedure changes the location of the crypto properties file on the local file system of your Databricks cluster. It does not refer to a DBFS location.

Move the crypto properties file to new location in Databricks

In your Databricks cluster, the default location of the Privacera crypto properties file is /databricks/crypto/config/crypto.properties. This is an absolute path starting with /.

  1. On your Privacera host, move the properties file from its default location to the new path. This must be an absolute path starting with /.

  2. Make a note of this new path.

In the steps here, this new location is called <absolute_path_on_databricks_cluster_to_directory_with_crypto.properties_file>.

Define an environment variable in Databricks

You must set an environment variable to point to the new location of the Privacera crypto properties file.

As the Databricks administrator, in your Databricks cluster, do the following:

  1. Navigate to the system Configuration tab.

  2. In the Environment Variables section, add the following line: CRYPTO_CONFIG_DIR=<absolute_path_on_databricks_cluster_to_directory_with_crypto.properties_file>.

  3. Save the change.

Define custom variable in Privacera

You need to define the same new path of the crypto properties file in your Privacera installation and update the configuration.

As the Privacera administrator, on the Privacera host, run the following commands:

cd ~/privacera/privacera-manager
cp config/sample-vars/vars.databricks.plugin.yml config/custom-vars/vars.databricks.plugin.yml
vi config/custom-vars/vars.databricks.plugin.yml
# In the file, set the following property:
DATABRICKS_CRYPTO_CONFIG_DIR: "<absolute_path_on_databricks_cluster_to_directory_with_crypto.properties_file>"
# Save the file
# Update the configuration
cd ~/privacera/privacera-manager
./privacera-manager.sh update
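
The same absolute path has to appear in two places: the CRYPTO_CONFIG_DIR environment variable in Databricks and the DATABRICKS_CRYPTO_CONFIG_DIR variable in Privacera Manager. The following sketch prints both settings from a single input so that they cannot drift apart. The function name print_crypto_dir_settings is hypothetical, and the example path in the usage comment is a placeholder, not a recommended location.

```shell
# Print the Databricks environment variable line and the Privacera Manager
# YAML line for one directory. Fails if the path is not absolute.
print_crypto_dir_settings() {
  case "$1" in
    /*) ;;
    *) echo "error: '$1' is not an absolute path" >&2; return 1 ;;
  esac
  printf 'CRYPTO_CONFIG_DIR=%s\n' "$1"
  printf 'DATABRICKS_CRYPTO_CONFIG_DIR: "%s"\n' "$1"
}

# Hypothetical usage (the path is an example only):
#   print_crypto_dir_settings /opt/privacera/crypto
```
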

Hive UDFs for encryption on Privacera Platform

This topic provides instructions on how to enable encryption using a Hive user-defined function (UDF).

Add Privacera UDF in Hive

To add a Privacera user-defined function (UDF) in Hive, follow these steps:

  1. Log in to the Privacera Portal.

  2. From the navigation menu, select Encryption & Masking > Encryption.

  3. Under the Diagnostics tab, click Encryption.

  4. Click Create for UDF - Protect, UnProtect.

  5. Click Yes.

    The UDF is created.

Confirm Privacera UDF in Hive
  1. SSH to the instance.

  2. Run kinit for <user>.

  3. Connect to beeline.

    # Example
    beeline -u "jdbc:hive2://<hostname>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n <user>
    
  4. Enter the following two commands to describe the functions created by Privacera:

    • DESCRIBE FUNCTION EXTENDED privacera.protect;

    • DESCRIBE FUNCTION EXTENDED privacera.unprotect;

    Note

    If the UDF has not been created successfully, you will see the error message “Function ‘privacera.protect’ does not exist”.

Test Privacera UDF in Hive
  1. Log in to Privacera Portal.

  2. From the navigation menu, select Encryption & Masking > Schemes.

  3. Click Add.

  4. Create a scheme by entering the following details:

    • Scheme Name: SWIFT_ID

    • Format type: FPE_ALPHA_NUMERIC

    • Scope: All

    • Algorithm: FPE

  5. Click Save.

  6. Check if the KMS key is generated for the scheme:

    1. Log in to Ranger at http://<hostname>:6080 with the username keyadmin.

    2. Click Encryption > Key Manager.

  7. SSH to the instance.

  8. Run kinit for <user>.

  9. Connect to beeline.

    # Example
    beeline -u "jdbc:hive2://<hostname>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n <user>

  10. After connecting to beeline, enter the following commands:

    select privacera.protect("TEXT" ,"SWIFT_ID");
    select privacera.unprotect(privacera.protect("TEXT" ,"SWIFT_ID"), "SWIFT_ID");
    
  11. Check the KMS audits in Ranger.

    • Log in to Ranger at http://<hostname>:6080 with the username keyadmin.

    • Click Audit > Access.

    • If the UDF fails, the message "denied access" is recorded in Ranger KMS audits.

      If access is denied, you need to give permission to <user>.

    • If the UDF result is successful, the audits are shown as Allowed.

    • To check crypto UDF logs, run this command:

      sudo su
      tail -f /var/log/hive/hiveserver2.log
      
Give users access to UDFs

Users need access to the UDFs.

Set up Privacera Encryption
Check if the shaded JAR is needed

You might need to use a shaded JAR depending on certain conditions described below.

To check if the shaded JAR is needed, go to the Hive libraries directory:

cd /opt/cloudera/parcels/CDH/lib/hive/lib/
ls | grep http

You need to install the shaded JAR if:

  • The version of the HTTP client library in that directory is earlier than 4.5.6.

  • The UDF throws a NoClassDefFoundError for the class org/apache/http/config/Lookup.

To install the shaded JAR, see Install the shaded JAR.
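
The version comparison in the first condition can be scripted. The sketch below assumes that the file listed by ls follows the common <name>-<version>.jar naming pattern (for example, httpclient-4.5.2.jar) and that sort supports the -V option, as GNU sort does. The function names are hypothetical. It covers only the version condition, not the class-loading error.

```shell
# Succeed when version $1 is strictly lower than version $2.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Extract the trailing version from a file name such as httpclient-4.5.2.jar.
jar_version() {
  basename "$1" .jar | sed 's/^.*-\([0-9][0-9.]*\)$/\1/'
}

# Succeed when the given HTTP client JAR is older than 4.5.6,
# which is the threshold stated in this topic.
needs_shaded_jar() {
  version_lt "$(jar_version "$1")" 4.5.6
}

# Hypothetical usage:
#   needs_shaded_jar httpclient-4.5.2.jar && echo "install the shaded JAR"
```
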

Create Privacera UDF in CDH/CDP

To install the Privacera Encryption JAR in a CDH/CDP cluster, do the following:

  1. Download the JAR to the cluster node.

    sudo su - privacera
    export PRIVACERA_BASE_DOWNLOAD_URL="<PRIVACERA_BASE_DOWNLOAD_URL>"
    wget "${PRIVACERA_BASE_DOWNLOAD_URL}/privacera-crypto-jar-with-dependencies.jar" -O privacera-crypto-jar-with-dependencies.jar
    
  2. Upload the JAR into an HDFS location where Hive can access it. The following are the commands to upload the JAR into HDFS using Hadoop CLI:

    kinit -kt /opt/privacera/keytab/privacera.headless.keytab privacera
    hadoop fs -ls
    hadoop fs -mkdir -p /privacera/crypto/jars
    hadoop fs -put privacera-crypto-jar-with-dependencies.jar /privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar
    
  3. Create configuration files.

    hadoop fs -mkdir -p /privacera/crypto/configs
    
    vi crypto.properties
    
    privacera.portal.base.url=http://<PRIVACERA_SERVER>:6868
    privacera.portal.username=<USER_NAME>
    privacera.portal.password=<PASSWORD>
    # Mode of encryption/decryption: rpc or native
    privacera.crypto.mode=native
    
    cp crypto.properties  crypto_default.properties
    
  4. Upload the configuration files to HDFS.

    hadoop fs -put crypto.properties /privacera/crypto/configs/crypto.properties
    hadoop fs -put crypto_default.properties /privacera/crypto/configs/crypto_default.properties
    
  5. Install the crypto JAR into the cluster.

    1. SSH to the cluster node.

    2. Run kinit for <user> (if Kerberos is enabled).

    3. Connect to beeline.

    4. If you have a Kerberos-enabled cluster, use the following commands to log in to beeline:

      Update the values of the variables HIVE_SERVER2_HOSTNAME and REALM for your environment.

      # Example values; replace with your own
      HIVE_SERVER2_HOSTNAME=10.1.1.2
      REALM=EXAMPLE.PRIVACERA.US
      beeline -u "jdbc:hive2://${HIVE_SERVER2_HOSTNAME}:10000/default;principal=hive/${HIVE_SERVER2_HOSTNAME}@${REALM}"

  6. Add the Privacera crypto UDF JAR. Run the following command in beeline.

    • Command to add the JAR:

      add jar hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar;
      
  7. Create the Privacera encryption UDFs. Run the following SQL queries in beeline.

    • SQL query to create the Privacera unprotect UDF:

      create database if not exists privacera;
      drop function if exists privacera.unprotect;
      create function privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar';

    • SQL query to create the Privacera protect UDF:

      drop function if exists privacera.protect;
      create function privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies.jar';
      
Install the shaded JAR

To install the shaded Privacera Encryption JAR in a CDH/CDP cluster, follow these steps:

  1. Download the JAR to the cluster node.

    sudo su - privacera
    export PRIVACERA_BASE_DOWNLOAD_URL="<PRIVACERA_BASE_DOWNLOAD_URL>"
    wget "${PRIVACERA_BASE_DOWNLOAD_URL}/privacera-crypto-jar-with-dependencies-shaded.jar" -O privacera-crypto-jar-with-dependencies-shaded.jar

  2. Upload the JAR into an HDFS location where Hive can access it. The following commands upload the JAR into HDFS using the Hadoop CLI:

    kinit -kt /opt/privacera/keytab/privacera.headless.keytab privacera
    hadoop fs -ls /
    hadoop fs -mkdir -p /privacera/crypto/jars
    hadoop fs -put privacera-crypto-jar-with-dependencies-shaded.jar /privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar
    

  3. Create configuration files.

    hadoop fs -mkdir -p /privacera/crypto/configs
    vi crypto.properties
    privacera.portal.base.url=http://<PRIVACERA_SERVER>:6868
    privacera.portal.username=<USER_NAME>
    privacera.portal.password=<PASSWORD>

    # Mode of encryption/decryption: rpc or native
    privacera.crypto.mode=native
    cp crypto.properties  crypto_default.properties

  4. Upload the configuration files to HDFS.

    hadoop fs -put crypto.properties /privacera/crypto/configs/crypto.properties
    hadoop fs -put crypto_default.properties /privacera/crypto/configs/crypto_default.properties

  5. Install the crypto JAR into the cluster.

    1. SSH to the cluster node.

    2. Run kinit for <user> (if Kerberos is enabled).

    3. Connect to beeline.

    4. If you have a Kerberos-enabled cluster, use the following commands to log in to beeline:

      Update the values of HIVE_SERVER2_HOSTNAME and REALM for your environment.

      # Example values; replace with your own
      HIVE_SERVER2_HOSTNAME=10.1.1.2
      REALM=EXAMPLE.PRIVACERA.US
      beeline -u "jdbc:hive2://${HIVE_SERVER2_HOSTNAME}:10000/default;principal=hive/${HIVE_SERVER2_HOSTNAME}@${REALM}"

  6. Add the Privacera crypto UDF JAR. Run the following command in beeline.

    • Command to add the JAR:

      add jar hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar;

  7. Create the Privacera encryption UDFs. Run the following SQL queries in beeline.

    • SQL query to create the Privacera unprotect UDF:

      create database if not exists privacera;
      drop function if exists privacera.unprotect;
      CREATE FUNCTION privacera.unprotect AS 'com.privacera.crypto.PrivaceraDecryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar';
    • SQL query to create the Privacera protect UDF:

      drop function if exists privacera.protect;
      CREATE FUNCTION privacera.protect AS 'com.privacera.crypto.PrivaceraEncryptUDF' using jar 'hdfs:///privacera/crypto/jars/privacera-crypto-jar-with-dependencies-shaded.jar';
Hive TEZ: Add properties file in beeline

If you are using Hive TEZ, before executing the UDFs, you must add the crypto.properties file in beeline with the following command:

add file hdfs:///privacera/crypto/configs/crypto.properties;
Sample queries to verify setup
  • Sample query to run encryption:

    select privacera.protect("test_data","<SCHEME_NAME>") limit 10;
  • Sample query to run encryption and decryption in one query to verify setup:

    select privacera.unprotect(privacera.protect("test_data","<SCHEME_NAME>"),"<SCHEME_NAME>") limit 10;
  • For authorization and leveraging Hive Masking policies, you need to install the Hive plug-in in hiveserver2.

If you do not want to install the Hive plug-in, you can authorize use of the keys based on KMS.

  • Create a view on top of your raw table:

    create view secure_view as select col1, privacera.protect(col2, 'SSN') as col2 from db.table;
    select * from secure_view;
    
  • A user who is not present in Privacera can still access the protect function. Using the Hive plug-in is recommended because it lets you control access to the resource with Ranger policies and manage those policies through a simple UI.

StreamSets Data Collector (SDC) and Privacera Encryption on Privacera Platform

This topic provides instructions on how to install and configure the Privacera StreamSets plugin for Ranger and Privacera Encryption.

Enable Encryption for SDC

To enable Privacera Encryption for the StreamSets Data Collector (SDC), do the following:

  1. Run the following command:

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.crypto.streamset.yml custom-vars/vars.crypto.streamset.yml
    
  2. Update Privacera Manager:

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
    
Configure Encryption for SDC
  1. Copy the StreamSets Privacera package.

    1. If you have StreamSets and Privacera Manager running on different systems, copy the following two files from ~/privacera/privacera-manager/output/streamset/ on the Privacera Manager host machine:

      • privacera-streamset.tar.gz

      • crypto-config

      If you have JCEKS enabled, also copy the following file from ~/privacera/privacera-manager/config/keystores/ on the Privacera Manager host machine:

      • cryptoprop.jceks

    2. If you have StreamSets and Privacera Manager running on the same system, do the following:

      cp ~/privacera/privacera-manager/output/streamset/privacera-streamset.tar.gz ~/privacera/downloads
      cp -r ~/privacera/privacera-manager/output/streamset/crypto-config ~/privacera/downloads/crypto-config
      

      If you have JCEKS enabled, do the following:

      cp ~/privacera/privacera-manager/config/keystores/cryptoprop.jceks ~/privacera/downloads/crypto-config/
      
  2. Extract the StreamSets Privacera package.

    cd ~/privacera/downloads
    mkdir streamsets
    tar xfz ~/privacera/downloads/privacera-streamset.tar.gz -C streamsets
    
  3. Access the StreamSets installation directory as root user.

    sudo su
    
  4. Set the StreamSets installation directory.

    export STREAMSET_HOME=/opt/streamset/streamsets-datacollector-3.13.0
    
  5. Copy the Privacera library into the StreamSets data collector user-libs directory:

    cp -r streamsets/privacera-streamset/ ${STREAMSET_HOME}/user-libs/

  6. Copy the configuration files.

    cp -r crypto-config ${STREAMSET_HOME}/../crypto-config
    
  7. Define a security policy.

    cat << EOF >> ${STREAMSET_HOME}/etc/sdc-security.policy
    grant {
    permission java.io.FilePermission "/opt/privacera/-", "read";
    permission java.io.FilePermission "/opt/streamset/-", "read,write";
    permission java.net.SocketPermission "*", "connect,accept,listen,resolve";
    };
    EOF
  8. Stop StreamSets.

    kill -9 $(ps aux | grep 'sdc'| awk '{print $2}')
  9. Restart StreamSets.

    ulimit -n 32768
    nohup ${STREAMSET_HOME}/bin/streamsets dc &

  10. Verify the logs to make sure that StreamSets is running.

    tail -f ${STREAMSET_HOME}/log/sdc.log
    
Verify StreamSets setup

To verify that Privacera Encryption is now working with the StreamSets Data Collector (SDC), follow these steps:

  1. Configure a sample pipeline to encrypt a local file. You can use the following sample. Import this sample pipeline into StreamSets. For more information, see Sample pipeline.

  2. Access the StreamSets installation directory as root user.

    sudo su
    
  3. Create data directories.

    DATA_DIR=/opt/streamset/
    cd ${DATA_DIR}
    mkdir -p customer_data/input 
    mkdir -p customer_data/output
    mkdir -p customer_data/input_error
    mkdir -p customer_data/output/encrypted_error
    
  4. Create a sample data file:

    cat << EOF > customer_data/input/customer_data_with_header.csv 
    id,name,ssn,email_address,amount
    1,Tamara,898453744,aphillips@vang.info,162454.67
    2,Richard,65511350,vreynolds@gmail.com,602.89
    3,Tanya,634090950,harringtonwilliam@diaz-king.com,48712.67
    4,Richard,829439881,martinvalerie@yahoo.com,5122.02
    5,Raymond,227804351,sarachavez@yahoo.com,97963.857
    6,Melissa,553465892,kevinwillis@gmail.com,36654.806
    7,Deborah,782539839,brittney24@yahoo.com,19.231
    8,Rodney,515337130,jenniferkelly@davis-bond.biz,65083.651
    9,Katherine,137057143,jperkins@gmail.com,4822.343
    10,David,432941241,wmccann@hotmail.com,4069.34
    EOF
  5. Create a metadata file to map the input dataset columns to Privacera Encryption schemes:

    cat << EOF > customer_data/customer_data.meta
    COLUMN_NAME|SCHEME_NAME
    id|
    name|SYSTEM_PERSON_NAME
    ssn|SYSTEM_SSN
    email_address|SYSTEM_EMAIL
    amount|
    EOF

    To run the sample pipeline, make sure that the Privacera user has been created in Ranger and has permissions on the KMS keys whose names start with pmsk*.
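
Before running the pipeline, it can help to confirm that every column in the CSV header has a row in the metadata file. The following sketch does that for the two file formats shown above (comma-separated data, pipe-separated metadata with a header row). The function name missing_meta_columns is hypothetical and is not part of the Privacera StreamSets package.

```shell
# Print each column from the CSV header that has no row in the metadata file.
# Arguments: CSV data file, pipe-delimited metadata file.
missing_meta_columns() {
  local csv_file="$1"
  local meta_file="$2"
  head -n 1 "$csv_file" | tr ',' '\n' | while IFS= read -r col; do
    # Skip the metadata header row (NR > 1) and match on the first field.
    awk -F'|' -v c="$col" 'NR > 1 && $1 == c { found = 1 } END { exit !found }' "$meta_file" ||
      printf '%s\n' "$col"
  done
}

# Hypothetical usage from the data directory created above:
#   missing_meta_columns customer_data/input/customer_data_with_header.csv \
#                        customer_data/customer_data.meta
```

No output means that every header column is listed in the metadata file. Columns mapped to an empty scheme, such as id and amount above, still count as listed.
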

Add permission for keys in Ranger
  1. Log in to the Ranger UI as an administrator and create the Privacera user. You can grant permissions to the Privacera user on keys.

  2. Log in to Ranger with keyadmin credentials and click on privacera_kms.

  3. Create or update a policy for the Privacera user.

  4. Run the StreamSets pipeline preview and verify the encrypted value on the right side of the table.

Trino UDFs for encryption and masking on Privacera Platform

This topic provides instructions on how to install and configure the Privacera crypto plugin for Trino. Doing so allows you to use Privacera-supplied encryption user-defined functions (UDFs) in Trino to encrypt or decrypt data.

The protect and unprotect UDFs work with privacera_starburstenterprise but not with privacera_hive. Starburst has three possible configurations (Hive, System, and Hive + System), of which only the system-level has been verified.

Privacera Encryption UDFs for Trino

The Privacera crypto plugin includes the following UDFs:

  • Encrypt: With the quoted <encryption_scheme_name>, the protect UDF encrypts all values of <column_name> in a table:

    select protect(<column_name>, '<encryption_scheme_name>') from <table_name>;                  
  • Decrypt: With the <encryption_scheme_name>, the unprotect UDF decrypts all values of <column_name> in a table:

    select unprotect(<column_name>, '<encryption_scheme_name>') from <table_name>;                       
  • Decrypt with obfuscation: With the quoted <encryption_scheme_name>, the unprotect UDF decrypts all values of <column_name> in a table, further obfuscates the decrypted data via <presentation_scheme_name>, and writes the decrypted, obfuscated data to <optional_column_name_for_obfuscated_data>:

    select unprotect(<column_name>, '<encryption_scheme_name>', <presentation_scheme_name>) <optional_column_name_for_obfuscated_data> from <table_name>;
    

    For example usage, see Example Queries to Verify Privacera-supplied UDFs.

Prerequisites for installing Privacera crypto plugin for Trino

Before installing the Privacera crypto plugin for Trino, do the following:

  • Install Trino. In this topic, the location of the installed Trino software is shown as:

    <absolute_path_to_trino_home_directory>
    
  • Identify the users who will use the UDFs and ensure they have access to the pertinent tables.

  • Determine the required paths to the crypto JAR and crypto.properties file. The Encryption plugin for Trino relies on these files. The paths for each file depend on whether you have deployed Trino in a container (such as Docker). These different paths are detailed in the following sections.

Install the Privacera crypto plugin for Trino using Privacera Manager

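To install the Privacera crypto plugin, follow these steps:

  1. Upgrade Privacera Manager.

  2. Configure the Privacera crypto plugin for Trino.

  3. Run the shell script to install the Privacera crypto plugin.

  4. Verify that the shell script ran correctly.

  5. Restart Trino to register the Privacera crypto UDFs.

See the following sections for details about how to complete each step.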

Upgrade Privacera Manager

To install the Privacera crypto plugin, you first need to update Privacera Manager to get a shell script. This shell script downloads the Privacera Encryption crypto plugin for Trino.

To do so, run the following commands:

# Change to Privacera Manager directory
cd ~/privacera/privacera-manager

# Upgrade Privacera Manager itself
 ./privacera-manager.sh upgrade-manager  
Configure the Privacera crypto plugin for Trino
 # Copy the Trino properties file to Privacera Manager config/custom-vars directory
 cp config/sample-vars/vars.starburst.enterprise.trino.yml config/custom-vars/

 # Set property STARBURST_TRINO_ENABLE to true
 vi config/custom-vars/vars.starburst.enterprise.trino.yml
 ...
 STARBURST_TRINO_ENABLE: "true"
 ...
 # Save the file
 # Edit starburst-trino-crypto.yml to specify Trino home directory
 vi ansible/privacera-docker/roles/defaults/main/starburst-trino-crypto.yml
 ...
 STARBURST_TRINO_INSTALL_DIR: <absolute_path_to_trino_home_directory>
 ...
 # Save the file
Run shell script to install Privacera crypto plugin
 # Change to Privacera Manager directory
 cd ~/privacera/privacera-manager

 # Update Privacera Manager to get shell script
 ./privacera-manager.sh update

 # Change to new directory created by privacera-manager update
 cd output/starburst-trino-crypto/

 # Make the script executable
 chmod +x privacera_crypto_trino_setup.sh
 #
 ######################################
 # NOTE: You must copy the script to your Trino or Starburst instance
 ######################################
 #
 #  Run the script on the instance to which you copied it
 ./privacera_crypto_trino_setup.sh
Verify that the shell script ran correctly

Verify the following:

  • The location of the Privacera crypto JAR:

    # For non-container deployment
    ls -l <absolute_path_to_trino_home_directory>/plugin/privacera/privacera-crypto-jar-with-dependencies.jar
    
    # For container deployment
    ls -l /data/starburst/plugin/privacera/privacera-crypto-jar-with-dependencies.jar
    
  • The location of the crypto.properties file in Trino's etc directory:

    # Verify existence of crypto.properties file
    # For non-container deployment
    ls -l <absolute_path_to_trino_home_directory>/etc/crypto.properties
    
    # For container deployment
    ls -l /data/starburst/etc/crypto.properties
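
The two checks above can be combined into one command that reports both files and returns a non-zero status if either is missing. This is a sketch only; the function name check_trino_crypto_files is hypothetical. Pass the Trino home directory for a non-container deployment, or /data/starburst for a container deployment.

```shell
# Report whether the crypto JAR and crypto.properties exist under the
# given base directory. Returns 1 if either file is missing.
check_trino_crypto_files() {
  local base="$1"
  local status=0
  local f
  for f in \
    "$base/plugin/privacera/privacera-crypto-jar-with-dependencies.jar" \
    "$base/etc/crypto.properties"; do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}

# Hypothetical usage:
#   check_trino_crypto_files <absolute_path_to_trino_home_directory>
#   check_trino_crypto_files /data/starburst
```
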
                               
Restart Trino to register the Privacera crypto UDFs for Trino
# Go to Trino bin directory
cd /<trino_installation_directory>/bin

# Restart Trino
./launcher restart                   
privacera.unprotect with optional presentation scheme

The unprotect UDF supports an optional specification of a presentation scheme that further obfuscates the decrypted data.

Syntax:

select <id>, privacera.unprotect(<COLUMN_NAME>, <ENCRYPTION_SCHEME_NAME>, <PRESENTATION_SCHEME_NAME>) <OPTIONAL_NAME_FOR_COLUMN_TO_WRITE_OBFUSCATED_OUTPUT> from <DB_NAME>.<TABLE_NAME>;

where:

  • <PRESENTATION_SCHEME_NAME> is the name of the chosen Privacera presentation scheme with which to further obfuscate the decrypted data.

  • <OPTIONAL_NAME_FOR_COLUMN_TO_WRITE_OBFUSCATED_OUTPUT> is a "pretty" name for the column that the obfuscated data is written to.

  • Other arguments are the same as in the preceding unprotect example.

Example queries to verify Privacera-supplied UDFs

See the syntax detailed in Syntax of Privacera Encryption UDFs for Trino.

  • Encrypt: The following example query with the protect UDF encrypts the cleartext CUSTOMER_EMAIL column of the CUSTOMERS table using the quoted 'EMAIL' encryption scheme:

    select protect(CUSTOMER_EMAIL, 'EMAIL') from CUSTOMERS;
    
  • Decrypt: The following example query with the unprotect UDF decrypts the encrypted CUSTOMER_EMAIL column of the CUSTOMERS table using the quoted 'EMAIL' encryption scheme:

    select unprotect(CUSTOMER_EMAIL, 'EMAIL') from CUSTOMERS;
    
  • Decrypt with obfuscation: The following example query with the unprotect UDF decrypts the encrypted CUSTOMER_EMAIL column of the CUSTOMERS table using the quoted 'EMAIL' encryption scheme, obfuscates the decrypted data with the presentation scheme PRESENTATION_EMAIL, and writes the decrypted, obfuscated data to OPTIONAL_OUTPUT_COLUMN_FOR_OBFUSCATED_DATA:

    select unprotect(CUSTOMER_EMAIL, 'EMAIL', PRESENTATION_EMAIL) OPTIONAL_OUTPUT_COLUMN_FOR_OBFUSCATED_DATA from CUSTOMERS;