
Privacera Encryption - Encrypt and Decrypt Files on a Databricks Cluster

This guide explains how to integrate Privacera Encryption with your Databricks clusters and how to encrypt and decrypt files from Scala using input and output streams.

Prerequisites

Before you begin, ensure the following prerequisites are met:

  • A Databricks cluster is up and running.
  • The Databricks environment supports Scala.
  • Privacera Encryption Gateway (PEG) is enabled and properly configured. For details, see the PEG setup documentation.
  • The necessary system schemes have been generated in Privacera.
  • The target table exists and contains valid data for protect, unprotect, or mask operations.

Encrypting Files with Scala on Databricks

To encrypt a file, run the following commands in a Databricks notebook:

Scala
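import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

// Resolve the DBFS file system from the notebook's Spark session.
val hadoopConf = spark.sessionState.newHadoopConf()
val fs = FileSystem.get(new URI("dbfs:/"), hadoopConf)

// Open the source file and create (or overwrite) the encrypted target file.
val inputStream = fs.open(new Path("<INPUT_FILE_PATH>"))
val outputStream = fs.create(new Path("<OUTPUT_FILE_PATH>"), true)

try {
  // SYSTEM_ADDRESS is the Privacera scheme applied to the stream.
  val binaryCryptoUtil = new com.privacera.encryption.hive.util.BinaryCryptoUtil()
  binaryCryptoUtil.encryptStream("SYSTEM_ADDRESS", inputStream, outputStream)
} finally {
  // Close both streams even if encryption fails.
  try inputStream.close() finally outputStream.close()
}

println("✅ Input file is encrypted")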

Decrypting Files with Scala on Databricks

To decrypt a file, run the following commands in a Databricks notebook:

Scala
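import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

// Resolve the DBFS file system from the notebook's Spark session.
val hadoopConf = spark.sessionState.newHadoopConf()
val fs = FileSystem.get(new URI("dbfs:/"), hadoopConf)

// Open the encrypted file and create (or overwrite) the decrypted target file.
val inputStream = fs.open(new Path("<INPUT_FILE_PATH>"))
val outputStream = fs.create(new Path("<OUTPUT_FILE_PATH>"), true)

try {
  // Use the same scheme that was used to encrypt the file.
  val binaryCryptoUtil = new com.privacera.encryption.hive.util.BinaryCryptoUtil()
  binaryCryptoUtil.decryptStream("SYSTEM_ADDRESS", inputStream, outputStream)
} finally {
  // Close both streams even if decryption fails.
  try inputStream.close() finally outputStream.close()
}

println("✅ Output file is decrypted")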
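After an encrypt and decrypt round trip, one way to check that the decrypted file matches the original is to compare checksums of the two files. The following is a minimal sketch under stated assumptions, not part of the Privacera API: `sha256Hex` is a hypothetical helper built only on the JDK's `java.security.MessageDigest`, and it accepts any `java.io.InputStream`. The in-memory streams at the end only stand in for real files.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.security.MessageDigest

// Hypothetical helper (not part of Privacera): reads the stream in
// fixed-size chunks, feeds them to SHA-256, closes the stream, and
// returns the digest as a lowercase hex string.
def sha256Hex(in: InputStream): String = {
  val digest = MessageDigest.getInstance("SHA-256")
  val buffer = new Array[Byte](8192)
  try {
    var read = in.read(buffer)
    while (read != -1) {
      digest.update(buffer, 0, read)
      read = in.read(buffer)
    }
  } finally {
    in.close()
  }
  digest.digest().map(b => f"${b & 0xff}%02x").mkString
}

// Local demonstration: identical content yields identical checksums.
val original = sha256Hex(new ByteArrayInputStream("abc".getBytes("UTF-8")))
val restored = sha256Hex(new ByteArrayInputStream("abc".getBytes("UTF-8")))
assert(original == restored)
```

In a notebook, the same helper could be called on `fs.open(new Path(...))` for the original file and for the decrypted file, and the two hex strings compared.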

Note

  • The cluster name must not contain special characters other than the underscore (_).
  • Do not add extra libraries to the cluster manually. This approach is deprecated, and manual additions may cause conflicts.
  • The init script sets up the configuration that PEG requires on the Databricks cluster.

Warning

  • Always handle input and output streams securely in Scala: close them when the operation finishes, including on failure, to prevent data leakage or unauthorized access.
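One common way to make sure a stream is always closed is a small loan-pattern helper. The sketch below is an illustration under stated assumptions, not Privacera or Databricks code: `withResource` is a hypothetical name, and it relies only on `try`/`finally` so that it does not depend on a particular Scala version. The in-memory streams stand in for the file streams used in a notebook.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

// Hypothetical helper (not part of Privacera or Databricks): runs `body`
// with the resource and closes the resource afterwards, even if `body` throws.
def withResource[R <: AutoCloseable, A](resource: R)(body: R => A): A =
  try body(resource)
  finally resource.close()

// Local demonstration: copy bytes between two in-memory streams.
// Nesting the calls means the input is closed even if the output fails to open.
val copied: Array[Byte] =
  withResource(new ByteArrayInputStream("sample".getBytes("UTF-8"))) { in =>
    withResource(new ByteArrayOutputStream()) { out =>
      val buffer = new Array[Byte](4096)
      var read = in.read(buffer)
      while (read != -1) {
        out.write(buffer, 0, read)
        read = in.read(buffer)
      }
      out.toByteArray
    }
  }
```

In a notebook, the streams returned by `fs.open` and `fs.create` could take the place of the in-memory streams, with the `encryptStream` or `decryptStream` call placed in the innermost body.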
