PrivaceraCloud Documentation

Privacera Encryption UDF for masking in Databricks

Privacera Encryption includes a UDF for Databricks that can one-way mask your data. For background, see Masking schemes.

Syntax of Databricks UDF for masking

The masking UDF for Databricks has the following syntax:

Mask: The mask UDF one-way transforms all values of <column_name> in <table_name>, using the masking scheme named by the quoted '<mask_scheme_name>':

select mask(<column_name>, '<mask_scheme_name>') from <table_name>;

Prerequisites for Databricks masking UDF

The following prerequisites must already be met:

  • The Privacera init script for Databricks is installed in your Databricks instance. See Databricks.

  • You have a fully functional installation of Databricks.

  • The users who will use the UDFs have sufficient access to the pertinent tables in Databricks.

Define the mask UDF in Databricks

In your Databricks instance, run the following command to define the mask UDF:

drop function if exists db.mask;
create function db.mask as 'com.privacera.crypto.PrivaceraMaskUDF';
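
One way to confirm that the function was registered is to describe it. This is a minimal sketch that assumes the function was created in the db database as shown above; DESCRIBE FUNCTION is standard Spark SQL, not a Privacera-specific command.

```sql
-- Show the details registered for db.mask. If the definition succeeded,
-- the reported class should be com.privacera.crypto.PrivaceraMaskUDF.
describe function extended db.mask;
```
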

Example query to verify Privacera-supplied mask UDF

See the syntax detailed in Syntax of Databricks UDF for masking.

Mask: The following example query with the mask UDF one-way transforms the cleartext CUSTOMER_EMAIL column of the CUSTOMERS table using the quoted 'MASK_SCHEME_EMAIL' masking scheme:
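
Applying the syntax from Syntax of Databricks UDF for masking to the names in this description, the query could be written as in the sketch below. It assumes that the CUSTOMERS table is in the current database and that a masking scheme named MASK_SCHEME_EMAIL exists. If db is not the current database, the function may need to be referenced as db.mask.

```sql
-- Mask the cleartext CUSTOMER_EMAIL column of the CUSTOMERS table.
-- The masking scheme name is passed as a quoted string literal.
select mask(CUSTOMER_EMAIL, 'MASK_SCHEME_EMAIL') from CUSTOMERS;
```
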


Redact: The following example query redacts the email column of the customer_data table with the masking scheme EMAIL_REDACT_SCHEME and returns the output in a column called RedactedEmail:

select mask(email, 'EMAIL_REDACT_SCHEME') as RedactedEmail from db.customer_data;

Single query to encrypt and mask: Encrypt (protect) the PERSON_NAME column of the customer_data table with the PERSON_NAME_ENCRYPTION_SCHEME and mask the EMAIL column of the same table with the masking scheme EMAIL_MASKING_SCHEME. The data are transformed in place with no intermediate location.
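
The single query that encrypts and masks is not shown on this page. A hedged sketch of what it might look like follows. It assumes that an encryption UDF has already been defined and is callable as protect; that function name and the column aliases EncryptedName and MaskedEmail are illustrative assumptions, not taken from this section.

```sql
-- Hypothetical sketch: "protect" is an assumed name for an encryption UDF
-- defined separately; only the mask UDF is defined on this page.
select
  protect(PERSON_NAME, 'PERSON_NAME_ENCRYPTION_SCHEME') as EncryptedName,  -- assumed alias
  mask(EMAIL, 'EMAIL_MASKING_SCHEME') as MaskedEmail                       -- assumed alias
from db.customer_data;
```

Both transformations are applied in the same select, so no intermediate table is needed between the encryption step and the masking step.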