
Access Management for Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC)

Introduction

Privacera integrates with Databricks all-purpose compute clusters that use Fine-Grained Access Control (FGAC) to enforce data access policies, monitor data usage, and help ensure compliance with regulatory requirements. This document describes the supported features and Databricks configurations, current limitations, and how the integration works.

Connector Details

Integration methodology: Apache Ranger plugin
Access tools: Databricks Console, JDBC
Supported user identities for policies:
  • LDAP/AD/SCIM users
  • LDAP/AD/SCIM groups
  • Privacera roles
Data source user identities:
  • SAML/SSO
  • Databricks login using email address
  • Databricks token
  • Databricks service principal
  • JWT token file in cluster

Supported Access Management Features

Feature | Supported | Native | Using SecureView
🟢 Database Access Control | Yes | Yes | N/A
🟢 Table Access Control | Yes | Yes | N/A
🟢 View Access Control | Yes | Yes | N/A
🟢 Column Access Control | Yes | Yes | N/A
🟢 Row Access Control | Yes | Yes | N/A
🟢 Dynamic Column Data Masking | Yes | Yes | N/A
🟢 Dynamic Column Data Encryption | Yes | Yes | N/A
🟢 Centralized Access Audit | Yes | N/A | N/A
🟢 Granular Access Audit Record | Yes | N/A | N/A
🟢 S3 File Access Control (s3a/s3n/s3) | Yes | Yes | N/A
🟢 Azure Data Lake Access Control (abfs/abfss) | Yes | Yes | N/A
🟢 DBFS File Access Control | Yes | Yes | N/A

Supported Databricks matrix

The following tables list the Databricks runtime versions, notebook languages, cluster types, and job types that Privacera supports for its integration with Databricks all-purpose compute clusters.

Supported runtime versions

Databricks offers multiple runtime versions. Privacera supports the following:

Runtime version | Supported | End-of-support date
🔴 7.3 LTS | No (limited support) |
🟢 9.1 LTS | Yes |
🟢 10.4 LTS | Yes |
🟢 11.3 LTS | Yes |
🟢 12.2 LTS | Yes |
🟢 13.3 LTS | Yes |
🟢 14.3 LTS | Yes |
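As a pre-flight check, the runtime table can be encoded as a lookup. The sketch below is an illustration, not a Privacera tool: it assumes the cluster's runtime is given as a Databricks `spark_version` key such as `13.3.x-scala2.12`, keys only on the major.minor number, and treats any version missing from the table as unsupported. It does not distinguish ML or other runtime variants that share a version number. The function name `is_runtime_supported` is hypothetical.

```python
import re

# (major, minor) -> supported, transcribed from the runtime table.
# 7.3 LTS is listed as "No (limited support)", so it maps to False.
SUPPORTED_LTS_RUNTIMES = {
    (7, 3): False,
    (9, 1): True,
    (10, 4): True,
    (11, 3): True,
    (12, 2): True,
    (13, 3): True,
    (14, 3): True,
}


def is_runtime_supported(spark_version: str) -> bool:
    """Return True if a spark_version key like '13.3.x-scala2.12'
    maps to a runtime marked as supported in the table."""
    match = re.match(r"^(\d+)\.(\d+)\.", spark_version)
    if match is None:
        return False  # not in the expected "<major>.<minor>.x-..." form
    key = (int(match.group(1)), int(match.group(2)))
    # Versions absent from the table are treated as unsupported.
    return SUPPORTED_LTS_RUNTIMES.get(key, False)
```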

Notebook languages

Databricks notebooks support multiple languages. Privacera supports the following:

Language | Supported
🟢 Python (%python) | Yes
🟢 SQL (%sql) | Yes
🟢 hadoop fs (%fs) | Yes
🟢 R (%r) | Yes
🔴 Scala (%scala) | No

Supported Databricks cluster deployment matrix

The following tables list the cluster types and job types that Privacera supports for this integration.

Interactive cluster

For interactive clusters, Privacera supports the following cluster types:

Cluster type | Supported
🟢 High Concurrency (Python/R/SQL) | Yes
🔴 Standard (Scala/Python/R/SQL) | No
🔴 Single Node (Scala/Python/R/SQL) | No

Job on new cluster

For jobs on new clusters, Privacera supports the following job types:

Job type | Supported
🟢 Notebook | Yes
🟢 Python | Yes
🟢 Python wheel | Yes
🔴 JAR (Scala/Java) | No
🔴 spark-submit | No
🔴 Delta Live Tables pipeline | No

Job on existing cluster

For jobs on existing clusters, Privacera supports the following job types:

Job type | Supported
🟢 Notebook | Yes
🟢 Python wheel | Yes
🔴 JAR (Scala/Java) | No
🔴 spark-submit | No
🔴 Delta Live Tables pipeline | No
🔴 Python | No
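The cluster and job matrices can also be kept as data, for example in a deployment script that rejects unsupported job definitions before they reach a cluster. In this sketch the dictionary keys (`high_concurrency`, `python_wheel`, and so on) are hypothetical labels chosen for readability, not values from the Databricks Clusters or Jobs APIs.

```python
from typing import Dict

# Transcribed from the cluster and job tables. Keys are illustrative
# labels only, not Databricks API values.
INTERACTIVE_CLUSTER_SUPPORT: Dict[str, bool] = {
    "high_concurrency": True,
    "standard": False,
    "single_node": False,
}

JOB_SUPPORT_NEW_CLUSTER: Dict[str, bool] = {
    "notebook": True,
    "python": True,
    "python_wheel": True,
    "jar": False,
    "spark_submit": False,
    "delta_live_tables": False,
}

JOB_SUPPORT_EXISTING_CLUSTER: Dict[str, bool] = {
    "notebook": True,
    "python_wheel": True,
    "jar": False,
    "spark_submit": False,
    "delta_live_tables": False,
    "python": False,
}


def is_job_supported(job_type: str, existing_cluster: bool) -> bool:
    """Look up a job type in the matching matrix; unknown types are unsupported."""
    matrix = JOB_SUPPORT_EXISTING_CLUSTER if existing_cluster else JOB_SUPPORT_NEW_CLUSTER
    return matrix.get(job_type, False)
```

Keeping two separate dictionaries makes one easy-to-miss difference explicit: a plain Python job is supported on a new cluster but not on an existing one.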

Limitations for Access Management Features

Do not enable Scala on the cluster. For Scala, Privacera supports only Object Level Access Control (OLAC); refer to the OLAC documentation for more information.
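One way to audit this limitation is to inspect a cluster's Spark configuration. The sketch below assumes the Databricks Spark property `spark.databricks.repl.allowedLanguages`, which restricts the notebook languages a cluster accepts; whether Privacera's init script manages this property for you is not covered here, so treat the check as an assumption-based illustration. The function name `scala_disabled` is hypothetical.

```python
from typing import Dict

# Databricks Spark property listing the notebook languages a cluster allows.
ALLOWED_LANGUAGES_KEY = "spark.databricks.repl.allowedLanguages"


def scala_disabled(spark_conf: Dict[str, str]) -> bool:
    """Return True only if the Spark config explicitly restricts notebook
    languages to a list that does not include Scala."""
    allowed = spark_conf.get(ALLOWED_LANGUAGES_KEY)
    if allowed is None:
        # No explicit restriction: conservatively assume Scala may be enabled.
        return False
    languages = {lang.strip().lower() for lang in allowed.split(",")}
    return "scala" not in languages
```

For example, a cluster configured with `python,sql,r` passes the check, while one with no restriction or with `scala` in the list does not.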

How it Works

Privacera integrates with Databricks all-purpose compute clusters with FGAC through the Apache Ranger plugin. An init script configured at cluster creation deploys the plugin into the Databricks Spark process. The plugin fetches policies from the Privacera Policy Server, and when a Databricks user runs a SQL query, the plugin intercepts the query and enforces the applicable policies.
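To make the intercept-and-enforce flow concrete, here is a toy model of the decision made for a `SELECT`. It is a minimal sketch under stated assumptions, not the Apache Ranger policy engine: policies are plain in-memory objects (in the real integration they come from the Privacera Policy Server), access is denied unless some policy covers every requested column, and the first matching mask wins. The `Policy` class and `authorize` function are hypothetical, and `MASK_SHOW_LAST_4` is used only as an example mask-type label.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Policy:
    database: str
    table: str
    columns: Set[str]                   # "*" matches every column
    principals: Set[str]                # users, groups, or roles
    access: str = "select"
    masks: Dict[str, str] = field(default_factory=dict)  # column -> mask type


def authorize(policies: Iterable[Policy], principals: Set[str], database: str,
              table: str, columns: Iterable[str],
              access: str = "select") -> Dict[str, str]:
    """Deny by default: every requested column must be allowed by a policy
    naming one of the caller's principals. Returns the masks to apply."""
    policies = list(policies)
    masks: Dict[str, str] = {}
    for column in columns:
        matching = [
            p for p in policies
            if p.database == database and p.table == table and p.access == access
            and principals & p.principals
            and ("*" in p.columns or column in p.columns)
        ]
        if not matching:
            raise PermissionError(f"{access} denied on {database}.{table}.{column}")
        for p in matching:
            if column in p.masks:
                masks[column] = p.masks[column]
                break  # first matching mask wins in this sketch
    return masks
```

A query from a member of `analysts` that reads `email` from a table covered by a wildcard policy with an `email` mask would be allowed, with that mask returned for the plugin to apply; a caller matching no policy gets a `PermissionError`.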

User Identity Mapping

Policies in Privacera are configured for users and groups from AD/LDAP or SCIM, and for roles created in Privacera. These identities are mapped to Databricks user identities as follows:

Privacera identity | Databricks identity | Notes
AD/SCIM user | Email address / Databricks service principal |
AD/SCIM group | N/A |
Privacera role | N/A |

The Apache Ranger plugin, which runs inside the Databricks Spark process, maps the user's email address to the corresponding AD/SCIM user. It then fetches the user's groups and roles dynamically from Privacera and uses them to enforce group- and role-based policies on the Databricks cluster.
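The mapping can be sketched as a lookup chain: email to AD/SCIM user, user to groups, and user or group to roles. The dictionaries below are hypothetical stand-ins for the user, group, and role data the plugin fetches from Privacera; the sketch assumes an exact, case-sensitive email match and returns `None` for identities it cannot map. `ResolvedIdentity` and `resolve_identity` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class ResolvedIdentity:
    user: str
    groups: FrozenSet[str]
    roles: FrozenSet[str]


def resolve_identity(email: str,
                     users_by_email: Dict[str, str],
                     groups_by_user: Dict[str, Set[str]],
                     roles_by_user: Dict[str, Set[str]],
                     roles_by_group: Dict[str, Set[str]]) -> Optional[ResolvedIdentity]:
    """Map a Databricks login email to the AD/SCIM user, then collect the
    user's groups and the roles granted to the user or any of its groups."""
    user = users_by_email.get(email)
    if user is None:
        return None  # unknown identity: no user, group, or role policies apply
    groups = frozenset(groups_by_user.get(user, set()))
    roles = set(roles_by_user.get(user, set()))
    for group in groups:
        roles |= roles_by_group.get(group, set())
    return ResolvedIdentity(user, groups, frozenset(roles))
```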

Privacera also supports user identities taken from a JWT token:

Privacera identity | JWT token identity | Notes
AD/SCIM user | JWT payload user |
AD/SCIM group | JWT payload group/scope | The user's group mapping is extracted from the JWT payload, so no explicit user-to-group mapping is needed for access control.
Privacera role | N/A |
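The sketch below shows how a user and groups can be read from a JWT payload. The claim names (`sub` for the user, `groups` or `scope` for groups) are assumptions; the claims your identity provider issues, and the ones the Privacera plugin is configured to read, may differ. The sketch only decodes the payload and does not validate the signature, which a real enforcement point must do before trusting any claim. `jwt_identity` is a hypothetical name.

```python
import base64
import json
from typing import Set, Tuple


def jwt_identity(token: str, user_claim: str = "sub",
                 group_claim: str = "groups") -> Tuple[str, Set[str]]:
    """Extract (user, groups) from a compact JWT's payload.

    NOTE: decodes only; it does NOT verify the signature.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a compact JWT with three dot-separated parts")
    payload_b64 = parts[1]
    # JWT segments are base64url without "=" padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    user = payload[user_claim]
    groups = payload.get(group_claim, [])
    if isinstance(groups, str):
        # A "scope" claim is conventionally a space-separated string.
        groups = groups.split()
    return user, set(groups)
```

Passing `group_claim="scope"` handles tokens that carry group-like values as an OAuth scope string rather than a JSON list.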

The Apache Ranger plugin also enforces any attribute-based access control (ABAC) and tag-based policies configured in Privacera at runtime.
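As an illustration of tag-based masking combined with an ABAC condition, the sketch below masks any column tagged `PII` unless the user has a given attribute value. The `TagMaskPolicy` type, the tag and attribute names, and the first-match rule are all hypothetical; Privacera and Apache Ranger define their own policy model and evaluation order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class TagMaskPolicy:
    tag: str                                     # e.g. "PII"
    mask: Callable[[str], str]                   # function applied to the cell value
    exempt_if: Optional[Tuple[str, str]] = None  # (attribute, value) that skips masking


def apply_tag_policies(row: Dict[str, str],
                       column_tags: Dict[str, Set[str]],
                       policies: List[TagMaskPolicy],
                       user_attributes: Dict[str, str]) -> Dict[str, str]:
    """Return a masked copy of row; the input row is left unchanged."""
    masked = dict(row)
    for column, tags in column_tags.items():
        if column not in masked:
            continue
        for policy in policies:
            if policy.tag not in tags:
                continue
            if policy.exempt_if is not None:
                attribute, value = policy.exempt_if
                if user_attributes.get(attribute) == value:
                    continue  # ABAC exemption: this user sees the clear value
            masked[column] = policy.mask(masked[column])
            break  # first matching policy wins in this sketch
    return masked


def show_last_4(value: str) -> str:
    """Replace all but the last four characters with 'X'."""
    return "X" * max(len(value) - 4, 0) + value[-4:]
```

For a row `{"name": "Alice", "ssn": "123456789"}` with `ssn` tagged `PII` and a policy exempting `department == "hr"`, a user in `sales` sees `XXXXX6789`, while a user in `hr` sees the original value.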
