Access Management for Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC)¶

Introduction¶

For Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC), Privacera provides seamless integration to enforce data access policies, monitor data usage, and ensure compliance with regulatory requirements. This document provides an overview of the key features, benefits, and configuration steps for integrating Databricks all-purpose compute clusters with Privacera.

Connector Details¶

Topics	Details
Integration methodology	Apache Ranger Plugin
Access Tools	Databricks Console, JDBC
Supported User Identities for Policies	LDAP/AD/SCIM Users, LDAP/AD/SCIM Groups, Privacera Roles
Data Source User Identities	SAML/SSO, Databricks Login using Email Address, Databricks Token, Databricks Service Principal, JWT token file in cluster

Supported Access Management Features¶

Feature	Supported	Native	Using SecureView
Database Access Control	Yes	Yes	N/A
Table Access Control	Yes	Yes	N/A
View Access Control	Yes	Yes	N/A
Column Access Control	Yes	Yes	N/A
Row Access Control	Yes	Yes	N/A
Dynamic Column Data Masking	Yes	Yes	N/A
Dynamic Column Data Encryption	Yes	Yes	N/A
Centralized Access Audit	Yes	N/A	N/A
Granular Access Audit Record	Yes	N/A	N/A
S3 Files Access Control (s3a/s3n/s3)	Yes	Yes	N/A
Azure Data Lake Access Control (abfs/abfss)	Yes	Yes	N/A
DBFS Files Access Control	Yes	Yes	N/A

Supported Databricks Matrix¶

The following tables list the Databricks runtime versions and notebook languages that Privacera supports for all-purpose compute clusters.

Supported Runtime Versions¶

Databricks offers multiple runtime versions. Privacera supports the following runtime versions:

Runtime Version	Supported	Scala Version
10.4 LTS	Yes	2.12
11.3 LTS	Yes	2.12
12.2 LTS	Yes	2.12
13.3 LTS	Yes	2.12
14.3 LTS	Yes	2.12
15.4 LTS	Yes	2.12
16.4 LTS	Yes	2.12
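
As a rough illustration, the table above can be turned into a small pre-flight check. This is a minimal sketch, not part of Privacera or Databricks: the function name `is_supported_runtime` is hypothetical, and it assumes the runtime is given as a string that begins with `<major>.<minor>`, such as `13.3.x-scala2.12`.

```python
import re

# Runtime versions listed as supported in the table above.
SUPPORTED_RUNTIMES = {"10.4", "11.3", "12.2", "13.3", "14.3", "15.4", "16.4"}


def is_supported_runtime(spark_version: str) -> bool:
    """Return True if the runtime string starts with a version from the table.

    Assumption: the string begins with "<major>.<minor>", for example
    "13.3.x-scala2.12". Anything else is treated as unsupported.
    """
    match = re.match(r"^(\d+\.\d+)", spark_version)
    return bool(match) and match.group(1) in SUPPORTED_RUNTIMES
```

A check like this could run in deployment tooling before a cluster is created, so that an unsupported runtime is caught early.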

Notebook Languages¶

Databricks supports multiple languages in notebooks. Privacera supports the following languages:

Language	Supported
Python (%python)	Yes
SQL (%sql)	Yes
hadoop fs (%fs)	Yes
R (%r)	Yes
Scala (%scala)	No

Note

Databricks has deprecated support for the R language. Refer to the Databricks documentation on Shared Access Mode for additional information.

Supported Databricks Cluster Deployment Matrix¶

Here are the supported cluster types for Privacera integration with Databricks all-purpose compute clusters:

Interactive Cluster¶

For interactive clusters, Privacera supports the following cluster types:

Cluster type	Supported
High Concurrency (Python/R/SQL)	Yes
Standard (Scala/Python/R/SQL)	No
Single Node (Scala/Python/R/SQL)	No

Job on New Cluster¶

For jobs on new clusters, Privacera supports the following job types:

Job type	Supported
Notebook	Yes
Python	Yes
Python wheel	Yes
JAR (Scala/Java)	No
spark-submit	No
Delta Live Tables pipeline	No

Job on Existing Cluster¶

For jobs on existing clusters, Privacera supports the following job types:

Job type	Supported
Notebook	Yes
Python wheel	Yes
JAR (Scala/Java)	No
spark-submit	No
Delta Live Tables pipeline	No
Python	No

Limitations for Access Management Features¶

Scala should not be enabled on the cluster. For Scala, only Object-Level Access Control (OLAC) is supported. Refer to the OLAC documentation for more information.
Databricks has limitations on supporting R in certain clusters. Refer to the Databricks documentation for more information.

How it Works¶

Privacera integrates with Databricks all-purpose compute clusters with FGAC through the Apache Ranger plugin. The plugin is deployed as part of the Databricks Spark process by init scripts that run when the cluster is created. It fetches policies from the Privacera Policy Server. When Databricks users execute SQL queries, the plugin intercepts the queries and enforces the policies.
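
The fetch, intercept, and enforce flow described above can be sketched in a few lines. This is a conceptual illustration only and does not reflect the Apache Ranger plugin's real API or policy format: `Policy`, `PolicyCache`, and `run_query` are hypothetical names, policies are reduced to an exact resource match, and access is denied unless a policy allows it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, heavily simplified stand-in for an access policy."""
    resource: str            # for example "sales.orders"
    users: frozenset         # user identities the policy applies to
    access_types: frozenset  # for example {"select"}


class PolicyCache:
    """Holds policies fetched from a policy server (stubbed as a callable)."""

    def __init__(self, fetch_policies):
        self._fetch = fetch_policies
        self._policies = []

    def refresh(self):
        # In this sketch, fetching is a plain function call.
        self._policies = list(self._fetch())

    def is_allowed(self, user, resource, access_type):
        # Assumption: deny unless at least one policy grants the access.
        return any(
            p.resource == resource
            and user in p.users
            and access_type in p.access_types
            for p in self._policies
        )


def run_query(cache, user, table, execute):
    """Intercept a query: check the policy first, then run it if allowed."""
    if not cache.is_allowed(user, table, "select"):
        raise PermissionError(f"{user} is not allowed to select from {table}")
    return execute()
```

The point of the sketch is the ordering: the policy check happens inside the query path, before any data is read, so a denied user never reaches the execution step.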

User Identity Mapping¶

Policies in Privacera are configured for users and groups from AD/LDAP or SCIM and for roles created in Privacera. These identities are mapped to the Databricks user identities as follows:

Privacera Identity	Databricks Identity	Notes
AD/SCIM User	Email Address/ Databricks Service Principals
AD/SCIM Group	N/A
Privacera Role	N/A

The Apache Ranger plugin, which runs as part of the Databricks Spark process, maps the user's email address to the AD/SCIM user. The groups and roles for that user are fetched dynamically from Privacera and used to enforce group-based and role-based policies in the Databricks clusters.
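
The mapping described above can be pictured as a two-step lookup: email address to user, then user to groups and roles. The sketch below is illustrative only. The function names and the dictionary-based lookups are hypothetical, and stand in for whatever Privacera uses internally to resolve identities.

```python
def resolve_identity(email, users_by_email, groups_by_user, roles_by_user):
    """Map a Databricks login email to a user plus that user's groups and roles.

    All three lookup tables are hypothetical inputs for this sketch.
    Returns None when the email does not correspond to a known user.
    """
    user = users_by_email.get(email)
    if user is None:
        return None
    return {
        "user": user,
        "groups": set(groups_by_user.get(user, ())),
        "roles": set(roles_by_user.get(user, ())),
    }


def policy_applies(identity, policy_users=(), policy_groups=(), policy_roles=()):
    """Return True if a policy names the user, one of the groups, or one of the roles."""
    return (
        identity["user"] in policy_users
        or bool(identity["groups"] & set(policy_groups))
        or bool(identity["roles"] & set(policy_roles))
    )
```

For example, a policy granted only to a group still applies to a user whose email resolves to a member of that group, even though the group has no Databricks identity of its own.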

The Privacera connector for Databricks clusters also supports user identity taken from a JWT token:

Privacera Identity	JWT token user-identity	Notes
AD/SCIM User	JWT payload user
AD/SCIM Group	JWT payload group/scope	The user-to-group mapping is extracted from the JWT token payload, so no explicit mapping is needed for access control.
Privacera Role	N/A
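
To make the table above concrete, the following sketch reads the user and the groups from a JWT payload using only the Python standard library. It is an illustration under stated assumptions, not Privacera's implementation: the claim names `sub` and `groups` are placeholders for whatever claims a deployment configures, a space-delimited string (as commonly used for `scope`) is split into a list, and the signature is NOT verified here, which a real integration must do before trusting the payload.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded without padding; restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def identity_from_jwt(token, user_claim="sub", group_claim="groups"):
    """Return (user, groups) from the payload.

    The claim names are assumptions for this sketch; pass the names
    that the token issuer actually uses.
    """
    payload = decode_jwt_payload(token)
    groups = payload.get(group_claim, [])
    if isinstance(groups, str):
        # Scope-style claims are often one space-delimited string.
        groups = groups.split()
    return payload[user_claim], list(groups)
```

Because the groups travel inside the token, the caller gets the user-to-group mapping in one step, with no separate lookup.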

Any attribute-based access control (ABAC) and tag-based policies configured in Privacera are enforced by the Apache Ranger plugin at runtime.
