Access Management for Databricks all-purpose compute clusters with Fine-Grained Access Control (FGAC)
Introduction
Privacera integrates with Databricks all-purpose compute clusters that use Fine-Grained Access Control (FGAC) to enforce data access policies, monitor data usage, and support compliance with regulatory requirements. This document provides an overview of the key features, benefits, and configuration steps for this integration.
Connector Details
Topics | Details |
---|---|
Integration methodology | Apache Ranger Plugin |
Access Tools | Databricks Console, JDBC |
Supported User Identities for Policies | Users, groups, and roles |
Data Source User Identities | Email address, Databricks service principal, JWT token user identity |
Supported Access Management Features
Feature | Supported | Native | Using SecureView |
---|---|---|---|
Database Access Control | Yes | Yes | N/A |
Table Access Control | Yes | Yes | N/A |
View Access Control | Yes | Yes | N/A |
Column Access Control | Yes | Yes | N/A |
Row Access Control | Yes | Yes | N/A |
Dynamic Column Data Masking | Yes | Yes | N/A |
Dynamic Column Data Encryption | Yes | Yes | N/A |
Centralized Access Audit | Yes | N/A | N/A |
Granular Access Audit Record | Yes | N/A | N/A |
S3 Files Access Control (s3a/s3n/s3) | Yes | Yes | N/A |
Azure Data Lake Access Control (abfs/abfss) | Yes | Yes | N/A |
DBFS Files Access Control | Yes | Yes | N/A |
Supported Databricks matrix
The following tables show which Databricks runtime versions, notebook languages, and cluster and job types Privacera supports for Databricks all-purpose compute clusters.
Supported runtime versions
Databricks offers multiple runtime versions. Privacera supports the following:
Runtime version | Supported | End-of-support date |
---|---|---|
7.3 LTS | No (Limited Support) | |
9.1 LTS | Yes | |
10.4 LTS | Yes | |
11.3 LTS | Yes | |
12.2 LTS | Yes | |
13.3 LTS | Yes | |
14.3 LTS | Yes | |
15.4 LTS | Yes | |
Notebook languages
Databricks notebooks support multiple languages. Privacera supports the following:
Language | Supported |
---|---|
Python (%python) | Yes |
SQL (%sql) | Yes |
hadoop fs (%fs) | Yes |
R (%r) | Yes |
Scala (%scala) | No |
Supported Databricks cluster deployment matrix
The following tables list the cluster types and job types that Privacera supports for Databricks all-purpose compute clusters:
Interactive cluster
For interactive clusters, Privacera supports the following cluster types:
Cluster type | Supported |
---|---|
High Concurrency (Python/R/SQL) | Yes |
Standard (Scala/Python/R/SQL) | No |
Single Node (Scala/Python/R/SQL) | No |
Job on new cluster
For jobs on new clusters, Privacera supports the following job types:
Job type | Supported |
---|---|
Notebook | Yes |
Python | Yes |
Python wheel | Yes |
JAR (Scala/Java) | No |
spark-submit | No |
Delta Live Tables pipeline | No |
Job on existing cluster
For jobs on existing clusters, Privacera supports the following job types:
Job type | Supported |
---|---|
Notebook | Yes |
Python wheel | Yes |
JAR (Scala/Java) | No |
spark-submit | No |
Delta Live Tables pipeline | No |
Python | No |
Limitations for Access Management Features
Do not enable Scala on the cluster. For Scala, only Object Level Access Control (OLAC) is supported; refer to the OLAC documentation for more information.
How it Works
The Privacera integration with Databricks all-purpose compute clusters with FGAC uses the Apache Ranger plugin. The plugin is deployed into the Databricks Spark process through init scripts when the cluster is created. It fetches policies from the Privacera Policy Server, and when Databricks users run SQL queries, it intercepts those queries and enforces the policies.
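The intercept-and-enforce flow described above can be sketched in Python. This is a simplified, hypothetical model, not the Apache Ranger plugin API: `Policy`, `PolicyEnforcer`, and the default-deny behavior when no policy matches are assumptions made for illustration, and the in-memory policy list stands in for the policies the plugin fetches from the Privacera Policy Server.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified allow policy: which users/groups may run an action on a table."""
    table: str
    action: str  # e.g. "select"
    users: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()


class PolicyEnforcer:
    """Hypothetical stand-in for the plugin running inside the Spark process.

    It keeps a local copy of policies (in the real integration these are
    fetched from the Privacera Policy Server) and checks every intercepted
    query against them. Assumption: access is denied when no policy matches.
    """

    def __init__(self, policies: Iterable[Policy]) -> None:
        self._policies = list(policies)

    def is_allowed(self, user: str, groups: Set[str], table: str, action: str) -> bool:
        for policy in self._policies:
            if policy.table != table or policy.action != action:
                continue
            # A policy matches if it names the user or shares a group with the user.
            if user in policy.users or not policy.groups.isdisjoint(groups):
                return True
        return False  # default deny (assumption)

    def on_query(self, user: str, groups: Set[str], table: str, action: str) -> None:
        """Called for each intercepted query; raises if the policy check fails."""
        if not self.is_allowed(user, groups, table, action):
            raise PermissionError(f"{user} may not {action} on {table}")
```

For example, with a single policy that lets the `analysts` group `select` from `sales.orders`, `on_query("alice@example.com", {"analysts"}, "sales.orders", "select")` returns normally, while the same call for a user outside that group raises `PermissionError`.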
User Identity Mapping
Policies in Privacera are configured for users and groups sourced from AD/LDAP or SCIM, and for roles created in Privacera. These identities map to Databricks user identities as follows:
Privacera Identity | Databricks Identity | Notes |
---|---|---|
AD/SCIM User | Email address / Databricks service principal | |
AD/SCIM Group | N/A | |
Privacera Role | N/A | |
The Apache Ranger plugin, which runs as part of the Databricks Spark process, maps the user's email address to the AD/SCIM user. The groups and roles associated with that user are fetched dynamically from Privacera and used to enforce group- and role-based policies on the Databricks clusters.
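The mapping step can be sketched as follows. `Principal`, `resolve_principal`, and the in-memory `USER_GROUPS`/`USER_ROLES` tables are hypothetical; in the actual integration the group and role data are fetched dynamically from Privacera, and this sketch does not model how service principal names are formatted.

```python
from typing import Dict, FrozenSet, NamedTuple


class Principal(NamedTuple):
    """The identity that a policy check runs against."""
    user: str
    groups: FrozenSet[str]
    roles: FrozenSet[str]


# Hypothetical stand-ins for the user-group and user-role data that the
# plugin would fetch dynamically from Privacera at runtime.
USER_GROUPS: Dict[str, FrozenSet[str]] = {
    "alice@example.com": frozenset({"analysts", "finance"}),
}
USER_ROLES: Dict[str, FrozenSet[str]] = {
    "alice@example.com": frozenset({"report_reader"}),
}


def resolve_principal(databricks_user: str) -> Principal:
    """Map a Databricks identity (email address or service principal) to a principal.

    The Databricks identity is used directly as the AD/SCIM user name; groups
    and roles are looked up for that user and default to empty when unknown.
    """
    groups = USER_GROUPS.get(databricks_user, frozenset())
    roles = USER_ROLES.get(databricks_user, frozenset())
    return Principal(databricks_user, groups, roles)
```

With this data, `resolve_principal("alice@example.com")` carries the `analysts` and `finance` groups and the `report_reader` role, so group- and role-based policies can match. An unknown user resolves to empty groups and roles, leaving only user-level policies applicable.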
Privacera also supports user identities carried in JWT tokens:
Privacera Identity | JWT Token User Identity | Notes |
---|---|---|
AD/SCIM User | JWT payload user | |
AD/SCIM Group | JWT payload group/scope | The user's groups are extracted from the JWT payload, so no explicit user-group mapping is needed for access control. |
Privacera Role | N/A | |
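Below is a minimal sketch of pulling the user and group identities out of a JWT payload using only the Python standard library. The claim names (`sub` for the user, `groups` with a fallback to a space-delimited `scope` claim) and all helper names are assumptions, not Privacera's configuration. The sketch only decodes the payload and skips signature verification, which a real deployment must perform before trusting any claim.

```python
import base64
import json
from typing import Any, Dict, List, Tuple


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded with the "=" padding stripped; restore it.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def identity_from_jwt(token: str, user_claim: str = "sub",
                      group_claim: str = "groups") -> Tuple[str, List[str]]:
    """Extract (user, groups) from a JWT payload.

    NOTE: decodes only; it does NOT verify the signature.
    The claim names are assumptions for illustration.
    """
    _header, payload_b64, _signature = token.split(".")
    payload: Dict[str, Any] = json.loads(_b64url_decode(payload_b64))
    user = payload[user_claim]
    groups = payload.get(group_claim)
    if groups is None:
        # Fall back to an OAuth 2.0-style space-delimited "scope" claim.
        groups = payload.get("scope", "").split()
    elif isinstance(groups, str):
        groups = [groups]
    return user, list(groups)


def make_unsigned_token(payload: Dict[str, Any]) -> str:
    """Build an unsigned token for local experiments only (hypothetical helper)."""
    header = _b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(payload).encode())
    return f"{header}.{body}."
```

Because the groups come straight from the token, `identity_from_jwt(make_unsigned_token({"sub": "alice", "groups": ["analysts"]}))` yields `("alice", ["analysts"])` without any separate user-group lookup, which is the property the table above describes.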
Attribute-based access control (ABAC) and tag-based policies configured in Privacera are also enforced by the Apache Ranger plugin at runtime.
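To illustrate what tag-based enforcement means at runtime, here is a hypothetical sketch of tag-based column masking: a masking rule is attached to a tag rather than to individual columns, so every column carrying that tag inherits the rule. The tag and group names, the `redact` function, and the "first matching tag wins" rule are assumptions; the sketch does not model ABAC conditions or Privacera's actual masking types.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple


def redact(value: str) -> str:
    """Replace the whole value with a fixed placeholder."""
    return "****"


# Hypothetical tag metadata: column name -> tags attached to that column.
COLUMN_TAGS: Dict[str, FrozenSet[str]] = {
    "email": frozenset({"PII"}),
    "country": frozenset(),
}

# Hypothetical tag-based masking policies:
# tag -> (groups exempt from masking, masking function).
TAG_MASKS: Dict[str, Tuple[FrozenSet[str], Callable[[str], str]]] = {
    "PII": (frozenset({"privacy_officers"}), redact),
}


def apply_tag_masks(row: Dict[str, str], groups: Set[str]) -> Dict[str, str]:
    """Mask each column value according to the masking policies on its tags."""
    result = {}
    for column, value in row.items():
        # Sort tags so the outcome is deterministic when several tags have rules.
        for tag in sorted(COLUMN_TAGS.get(column, frozenset())):
            rule = TAG_MASKS.get(tag)
            if rule is None:
                continue
            exempt_groups, mask = rule
            if exempt_groups.isdisjoint(groups):
                value = mask(value)
                break  # apply at most one mask per column
        result[column] = value
    return result
```

A member of `analysts` reading `{"email": "alice@example.com", "country": "US"}` would see the email redacted, while a member of `privacy_officers` would see the row unchanged. Tagging a new column as `PII` is enough to bring it under the same rule.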