Databricks SQL Encryption

The following steps enable use of Privacera encryption services in a Databricks SQL notebook:

Create a secret shared by Privacera Encryption Gateway (PEG) and Databricks.
Create Resource Policies in Privacera for data access to Databricks SQL resources.
Create Privacera encryption and decryption User-Defined Functions (UDFs) in Databricks.

For more information about Privacera encryption schemes, see the Privacera Encryption Guide.

Prerequisites

A working Databricks SQL installation connected to PrivaceraCloud. See Databricks to learn more.Databricks
Databricks CLI installed to your client system and configured to attach to your Databricks host. See Databricks Documentation: Databricks CLI and Databricks Documentation: Authenticating using Databricks personal access tokens.
Privacera Encryption Gateway (PEG) enabled and configured in your account settings. See About Account.

Grant permission in encryption scheme policy

To use Databricks SQL encryption, you must create a scheme policy for a user that will use the Databricks UDF. This scheme policy must grant the getSchemes permission. See Create Scheme Policies on PrivaceraCloud to learn more.

Configure Databricks

Create Databricks secrets

With the Databricks CLI:

Create a secret scope called privaceracloud:

databricks secrets create-scope --scope privaceracloud

Add secrets to this scope:
- peg_username, peg_password, and peg_secret are literals and should be entered exactly as shown.
- The <username>, <password>, and <sharedsecret> values below are the same as what you entered in PrivaceraCloud when adding the PEG service. See API Key to learn more.
```
databricks secrets put --scope privaceracloud --key peg_username --string-value <username>
databricks secrets put --scope privaceracloud --key peg_password --string-value <password> 
databricks secrets put --scope privaceracloud --key peg_secret --string-value <sharedsecret>
```

Add Privacera environment variables to Databricks cluster

Add the following environment variables in your Databricks cluster:

PEG_SECRET={{secrets/privaceracloud/peg_secret}}
PEG_PASSWORD={{secrets/privaceracloud/peg_password}}
PEG_USERNAME={{secrets/privaceracloud/peg_username}}

Caution

Note that there can be existing environment variables. Do not remove these.

Create Privacera protect and unprotect User-Defined Functions (UDFs)

First log into Databricks, create a notebook, and set the language to SQL.

Run the following SQL commands in Databricks to create UDFs for Privacera encryption services, named protect and unprotect.

Note

com.privacera.crypto functions enable use of encryption schemes, but do not accept presentation schemes.

Create Privacera protect UDF:

create database if not exists privacera;
use privacera;
drop function if exists privacera.protect;
CREATE FUNCTION privacera.protect AS com.privacera.crypto.PrivaceraEncryptUDF'

Create Privacera unprotect UDF:

use privacera;
drop function if exists privacera.unprotect;
CREATE FUNCTION privacera.unprotect AS com.privacera.crypto.PrivaceraDecryptUDF'

Configure Privacera resource policies

Databricks SQL resources are managed under Access Manager > Resource Policies > privacera_hive.

To add resource policies to allow access to selected resources:

Create a policy to give data access users, groups, or roles the select privilege to target database resources. On the Add Policy page, under Allow Conditions use Select Role, Select Group and/or Select User then under Permissions choose select.
For example:
Create a policy to grant data access users, groups, or roles the select privilege to the protect and unprotect UDFs. On the Add Policy page, under Allow Conditions use Select Role, Select Group and/or Select User then under Permissions choose select.
For example:

How to use UDFs in SQL to encrypt and decrypt

The following are SQL command examples for privacera.protect (encrypt) and privacera.unprotect (decrypt) UDFs:

privacera.protect

select privacera.protect(<COLNAME>,'<ENCRYPTION_SCHEME_NAME>') from <DB_NAME>.<TABLE_NAME>;

<COLNAME> is the identifier of the column to encrypt.
<ENCRYPTION_SCHEME_NAME> is the name of the chosen Privacera encryption scheme.
<DB_NAME>.<TABLE_NAME> are the names of the database and table in that database.

Example

In this example, the email column of the bigdatabase.customer_data table is encrypted with the SYSTEM_EMAIL encryption scheme.

select privacera.protect(email, \'SYSTEM\_EMAIL\') from bigdatabase.customer\_data;

privacera.unprotect

select privacera.unprotect(<COLNAME>,'<ENCRYPTION_SCHEME_NAME>') from <DB_NAME>.<TABLE_NAME>;

<COLNAME> is the identifier of the column to decrypt.
<ENCRYPTION_SCHEME_NAME> is the name of the chosen Privacera encryption scheme, which must be the same encryption scheme used to originally encrypt.
<DB_NAME>.<TABLE_NAME> are the names of the database and table in that database.

Example

In this example, the email column of the bigdatabase.customer_data table is decrypted with the SYSTEM_EMAIL encryption scheme.

select privacera.unprotect(email, 'SYSTEM_EMAIL') from bigdatabase.customer_data;

privacera.unprotect with optional presentation scheme

The unprotect UDF supports an optional specification of a presentation scheme that further obfuscates the decrypted data.

For an example of data transformation with the optional presentation scheme, see Example of Data Transformation with /unprotect and Presentation Scheme..PEG REST API on PrivaceraCloud

Example query:

select id, privacera.unprotect(<COLUMN_NAME>, <ENCRYPTION_SCHEME_NAME>, <PRESENTATION_SCHEME_NAME>) <OPTIONAL_NAME_FOR_COLUMN_TO_WRITE_OBFUSCATED_OUPUT> from <DB_NAME>.<TABLE_NAME>;

<PRESENTATION_SCHEME_NAME> is the name of the chosen Privacera presentation scheme with which to further obfuscate the decrypted data.
<OPTIONAL_NAME_FOR_COLUMN_TO_WRITE_OBFUSCATED_OUTPUT> is a "pretty" name for the column that the obfuscated data is written to.
Other arguments are the same as in the preceding unprotect example.

PrivaceraCloud Documentation

Table of ContentsTable of Contents