Access Management for Databricks all-purpose compute clusters with Object-Level Access Control (OLAC)

Introduction

Privacera integrates with Databricks all-purpose compute clusters that support Object-Level Access Control (OLAC) to enforce data access policies, monitor data usage, and help ensure regulatory compliance. This document outlines the key features, benefits, and configuration steps for integrating Databricks all-purpose compute clusters with Privacera.

Connector Details

Topics	Details
Integration methodology	Dataserver Signature generation
Access Tools	Databricks Console, JDBC
Supported User Identities for Policies	LDAP/AD/SCIM Users, LDAP/AD/SCIM Groups, Privacera Roles
Data Source User Identities	SAML/SSO Databricks Login using Email Address, Databricks Token, Databricks Service Principal, JWT token file in cluster

Supported Access Management Features

Feature	Supported	Native	Using SecureView
S3 Files Access control (s3a/s3n/s3)	Yes	No	N/A
Azure Data Lake Access control (abfs/abfss)	Yes	No	N/A
Centralized Access Audit	Yes	N/A	N/A
Granular Access Audit Record	Yes	N/A	N/A
Database Access Control	No	No	N/A
Table Access Control	No	No	N/A
View Access Control	No	No	N/A
Column Access Control	No	No	N/A
Row Access Control	No	No	N/A
Dynamic Column Data Masking	No	No	N/A
Dynamic Column Data Encryption	No	No	N/A
DBFS Files Access control	No	No	N/A

Supported Databricks matrix

Here is the supported Databricks matrix for Privacera integration with Databricks all-purpose compute clusters:

Supported runtime versions

Databricks offers multiple runtime versions; Privacera supports the following:

Runtime Version	Supported	Scala Version
10.4 LTS	Yes	2.12
11.3 LTS	Yes	2.12
12.2 LTS	Yes	2.12
13.3 LTS	Yes	2.12
14.3 LTS	Yes	2.12
15.4 LTS	Yes	2.12
16.4 LTS	Yes	2.12
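As a pre-flight check, a provisioning script could compare a cluster's runtime against the table above. The sketch below is a hypothetical helper, not part of Privacera or Databricks tooling. It assumes the runtime is supplied as a standard Databricks runtime key such as `13.3.x-scala2.12`, and it deliberately rejects Photon, ML, and other variant keys because this page does not state whether they are supported.

```python
import re

# Supported runtimes transcribed from the table above: runtime version -> Scala version.
SUPPORTED_RUNTIMES = {
    "10.4": "2.12",
    "11.3": "2.12",
    "12.2": "2.12",
    "13.3": "2.12",
    "14.3": "2.12",
    "15.4": "2.12",
    "16.4": "2.12",
}

# Assumed format of a standard runtime key, e.g. "13.3.x-scala2.12".
_RUNTIME_KEY = re.compile(r"^(\d+\.\d+)\.x-scala(\d+\.\d+)$")


def is_supported_runtime(spark_version: str) -> bool:
    """Return True if a standard runtime key matches a supported runtime and Scala version."""
    match = _RUNTIME_KEY.match(spark_version)
    if match is None:
        # Unrecognized format (including Photon/ML variants): not covered by this check.
        return False
    runtime, scala = match.groups()
    return SUPPORTED_RUNTIMES.get(runtime) == scala
```

For example, `is_supported_runtime("14.3.x-scala2.12")` returns `True`, while `is_supported_runtime("9.1.x-scala2.12")` returns `False`.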

Notebook languages

Databricks notebooks support multiple languages; Privacera supports the following:

Language	Supported
Python (%python)	Yes
SQL (%sql)	Yes
Scala (%scala)	Yes
R (%r)	Yes
hadoop fs (%fs)	Yes

Note

R language is not supported on clusters with Shared Access Mode.

Supported Databricks cluster deployment matrix

Here are the supported cluster types for Privacera integration with Databricks all-purpose compute clusters:

Interactive cluster

For interactive clusters, Privacera supports the following cluster types:

Cluster type	Supported
Standard (Scala/Python/R/SQL)	Yes
High Concurrency (Python/R/SQL)	Yes
Single Node (Scala/Python/R/SQL)	Yes

Job on new cluster

For jobs on new clusters, Privacera supports the following job types:

Job type	Supported
Notebook	Yes
JAR (Scala/Java)	Yes
spark-submit	Yes
Python	No
Python wheel	No
Delta Live Tables pipeline	No

Job on existing cluster

For jobs on existing clusters, Privacera does not support any of the following job types:

Job type	Supported
Notebook	No
JAR (Scala/Java)	No
spark-submit	No
Delta Live Tables pipeline	No
Python wheel	No
Python	No
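The two job tables can be encoded as a lookup so that job-submission tooling rejects unsupported combinations early. This is a hypothetical sketch that simply transcribes the tables; `JobType` and `is_job_supported` are illustrative names, not a Privacera API.

```python
from enum import Enum


class JobType(Enum):
    NOTEBOOK = "Notebook"
    JAR = "JAR (Scala/Java)"
    SPARK_SUBMIT = "spark-submit"
    PYTHON = "Python"
    PYTHON_WHEEL = "Python wheel"
    DLT_PIPELINE = "Delta Live Tables pipeline"


# "Job on new cluster" table: only these three job types are marked "Yes".
SUPPORTED_ON_NEW_CLUSTER = frozenset({JobType.NOTEBOOK, JobType.JAR, JobType.SPARK_SUBMIT})

# "Job on existing cluster" table: every job type is marked "No".
SUPPORTED_ON_EXISTING_CLUSTER = frozenset()


def is_job_supported(job_type: JobType, on_new_cluster: bool) -> bool:
    """Return True if the job type is supported for the given cluster placement."""
    supported = SUPPORTED_ON_NEW_CLUSTER if on_new_cluster else SUPPORTED_ON_EXISTING_CLUSTER
    return job_type in supported
```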

How it Works

Privacera integrates with Databricks all-purpose compute clusters through the Privacera Spark plugin, which is deployed by init scripts during cluster creation. The plugin calls the Privacera Dataserver to obtain a signature, and the request is authorized by the Apache Ranger plugin, which runs as part of the Dataserver.
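The request flow can be modeled as a small simulation. This is a conceptual sketch only: `AccessRequest`, `PolicyAuthorizer`, and `Dataserver` are hypothetical names, the prefix-matching policy stands in for Apache Ranger policy evaluation, and HMAC-SHA256 is an illustrative signature scheme, not the one Privacera uses. It assumes the Dataserver authorizes a request before returning a signature.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str      # Databricks identity, e.g. an email address
    resource: str  # e.g. "s3a://bucket/path/file.csv"
    action: str    # e.g. "read" or "write"


class PolicyAuthorizer:
    """Hypothetical stand-in for the Apache Ranger plugin inside the Dataserver."""

    def __init__(self, allowed: set[tuple[str, str, str]]):
        # Each entry is (user, resource prefix, action).
        self._allowed = allowed

    def is_allowed(self, req: AccessRequest) -> bool:
        return any(
            req.user == user and req.resource.startswith(prefix) and req.action == action
            for user, prefix, action in self._allowed
        )


class Dataserver:
    """Hypothetical Dataserver: authorizes the request first, then issues a signature."""

    def __init__(self, authorizer: PolicyAuthorizer, secret: bytes):
        self._authorizer = authorizer
        self._secret = secret

    def request_signature(self, req: AccessRequest) -> str:
        if not self._authorizer.is_allowed(req):
            raise PermissionError(f"{req.user} may not {req.action} {req.resource}")
        # HMAC-SHA256 over the request fields is only an illustrative signature scheme.
        message = f"{req.user}|{req.resource}|{req.action}".encode()
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()
```

In this sketch a denied request raises `PermissionError` instead of returning a signature, so the calling plugin never receives a signature for access the policy does not grant.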

User Identity Mapping

Policies in Privacera are configured for users and groups sourced from AD/LDAP or SCIM, and for roles created in Privacera. These identities are mapped to Databricks user identities as follows:

Privacera Identity	Databricks Identity	Notes
LDAP/AD/SCIM User	Email Address / Databricks Service Principal
LDAP/AD/SCIM Group	N/A
Privacera Role	N/A

The Apache Ranger plugin, which operates as part of the Dataserver, maps the user's email address to the AD/SCIM user. The groups and roles corresponding to the user are dynamically fetched from Privacera and utilized to enforce group and role-based policies in the Databricks clusters.
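The mapping from a Databricks email address to the user, group, and role principals that a policy can name might look like the following. The in-memory `DIRECTORY` stands in for the user, group, and role data fetched from Privacera. All names here are hypothetical, lowercasing the email address is an assumption, and service principals are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryUser:
    name: str
    groups: set[str] = field(default_factory=set)
    roles: set[str] = field(default_factory=set)


# Hypothetical directory data (AD/LDAP/SCIM users plus Privacera roles), keyed by email.
DIRECTORY = {
    "alice@example.com": DirectoryUser("alice", groups={"finance"}, roles={"analyst"}),
}


def resolve_principals(databricks_identity: str) -> set[str]:
    """Map a Databricks email identity to its user, group, and role principals."""
    # Assumption: email lookups are case-insensitive.
    user = DIRECTORY.get(databricks_identity.lower())
    if user is None:
        return set()
    return (
        {f"user:{user.name}"}
        | {f"group:{g}" for g in user.groups}
        | {f"role:{r}" for r in user.roles}
    )


def policy_matches(policy_principals: set[str], databricks_identity: str) -> bool:
    """A policy applies if it names the user, any of the user's groups, or any role."""
    return bool(policy_principals & resolve_principals(databricks_identity))
```

Because groups and roles are resolved at lookup time, a group-based policy such as `{"group:finance"}` applies to `alice@example.com` without naming her directly.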

Privacera also supports user identities taken from a JWT token, mapped as follows:

Privacera Identity	JWT token user-identity	Notes
LDAP/AD/SCIM User	JWT payload user
LDAP/AD/SCIM Group	JWT payload group/scope	The user-to-group mapping is extracted from the JWT payload, so no explicit group mapping is needed for access control.
Privacera Role	N/A
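Extracting the user and groups from a JWT payload can be sketched with the Python standard library alone. The function name `identity_from_jwt` is hypothetical, and the claim names `sub` (user) and `scope` (groups) are assumptions; the claims actually read depend on how Privacera is configured. This sketch does not verify the token's signature, which any real deployment must do before trusting its claims.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def identity_from_jwt(
    token: str, user_claim: str = "sub", group_claim: str = "scope"
) -> tuple[str, set[str]]:
    """Extract (user, groups) from a JWT payload WITHOUT verifying its signature."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("not a JWT: expected three dot-separated segments") from None
    claims = json.loads(_b64url_decode(payload))
    user = claims[user_claim]
    groups = claims.get(group_claim, [])
    # OAuth-style "scope" claims are often a space-separated string; lists are accepted too.
    if isinstance(groups, str):
        groups = groups.split()
    return user, set(groups)
```

A payload of `{"sub": "alice@example.com", "scope": "finance analysts"}` yields the user `alice@example.com` with the groups `finance` and `analysts`.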

Attribute-based access control (ABAC) and tag-based policies configured in Privacera are enforced by the Apache Ranger plugin at runtime.
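A tag-based policy check can be reduced to the simplified model below. It is a hypothetical sketch, not Apache Ranger's evaluation algorithm: each `TagPolicy` names a tag and the groups allowed to access resources carrying it, and a resource is accessible only if every tag policy that applies to it admits at least one of the user's groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    tag: str                        # e.g. "PII"
    allowed_groups: frozenset[str]  # groups permitted to access resources with this tag


def is_access_allowed(
    resource_tags: set[str], user_groups: set[str], policies: list[TagPolicy]
) -> bool:
    """Deny if any policy for a tag on the resource admits none of the user's groups."""
    for policy in policies:
        if policy.tag in resource_tags and not (policy.allowed_groups & user_groups):
            return False
    # Tags without a matching policy impose no restriction in this simplified model.
    return True
```

Evaluating tags at request time means that retagging a resource changes who can access it without editing any policy.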
