Spark Fine-grained Access Control (FGAC)

Enable View-level Access Control

  1. Edit the Spark Config of your existing Privacera-enabled Databricks cluster.

  2. Add the following property.

    spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable true
    
  3. Save and restart the Databricks cluster.
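
To confirm the property took effect, you can read it back after the restart (a minimal sketch; `spark` is the session Databricks provides in notebooks):

    # Read the cluster Spark config back; expect the string "true".
    key = "spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable"
    print(spark.sparkContext.getConf().get(key, "not set"))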

Apply View-level Access Control

To run CREATE VIEW with the Spark plug-in, you need the DATA_ADMIN permission.

The source table on which you create the view requires DATA_ADMIN access in the Ranger policy.

Use Case

  • Let’s take a use case with an 'employee_db' database and two tables inside it, holding the data below. First, create the database:

    -- Requires the Create privilege on the database (enabled by default).
    create database if not exists employee_db;
    

  • Create two tables.

    -- Requires the Create privilege at the table level.
    
    create table if not exists employee_db.employee_data(id int,userid string,country string);
    create table if not exists employee_db.country_region(country string,region string);
    

  • Insert test data.

    -- Requires the Update privilege at the table level.
    
    insert into employee_db.country_region values ('US','NA'), ('CA','NA'), ('UK','UK'), ('DE','EU'), ('FR','EU'); 
    insert into employee_db.employee_data values (1,'james','US'),(2,'john','US'), (3,'mark','UK'), (4,'sally-sales','UK'),(5,'sally','DE'), (6,'emily','DE');
    

    -- Requires the Select privilege at the column level.
    select * from employee_db.country_region;

    -- Requires the Select privilege at the column level.
    select * from employee_db.employee_data;
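
    For reference, the full contents of employee_data with this test data (a notebook sketch in Python; `spark` is the session Databricks provides):

    spark.sql("select * from employee_db.employee_data").show()
    # +---+-----------+-------+
    # | id|     userid|country|
    # +---+-----------+-------+
    # |  1|      james|     US|
    # |  2|       john|     US|
    # |  3|       mark|     UK|
    # |  4|sally-sales|     UK|
    # |  5|      sally|     DE|
    # |  6|      emily|     DE|
    # +---+-----------+-------+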
    
  • Now try to create a view on top of the two tables created above; it fails with the error below:

    create view employee_db.employee_region(userid, region) as select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;
    
    Error: Error while compiling statement: 
    FAILED: HiveAccessControlException 
    Permission denied: user [emily] does not have [DATA_ADMIN] privilege on [employee_db/employee_data] (state=42000,code=40000)
    

  • Create a Ranger policy granting the user DATA_ADMIN on the source tables.

    With that policy in place, execute the same CREATE VIEW statement again; this time it passes. (A REST-based sketch of such a policy follows the note below.)

    Note

    Granting the DATA_ADMIN privilege on a resource implicitly grants the SELECT privilege on the same resource.
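
    For reference, an equivalent policy can be sketched against Ranger's public REST API. This is a hypothetical sketch only: the endpoint, service name, credentials, and the exact access-type spelling (data_admin) all depend on your Privacera/Ranger deployment.

    import requests

    # All values below are hypothetical; adjust them for your deployment.
    ranger_url = "https://<ranger-host>:6182"
    policy = {
        "service": "privacera_hive",          # assumed Ranger service name
        "name": "employee_db - allow view creation",
        "resources": {
            "database": {"values": ["employee_db"]},
            "table": {"values": ["employee_data", "country_region"]},
            "column": {"values": ["*"]},
        },
        "policyItems": [{
            "users": ["emily"],
            # DATA_ADMIN implicitly grants SELECT as well (see the note above).
            "accesses": [{"type": "data_admin", "isAllowed": True}],
        }],
    }
    resp = requests.post(ranger_url + "/service/public/v2/api/policy",
                         json=policy, auth=("admin", "<password>"))
    resp.raise_for_status()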

Alter View

-- Requires the Alter permission on the view.
ALTER VIEW employee_db.employee_region AS select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;

Rename View

-- Requires the Alter permission on the view.
ALTER VIEW employee_db.employee_region RENAME TO employee_db.employee_region_renamed;

Drop View

-- Requires the Drop permission on the view.
DROP VIEW employee_db.employee_region_renamed;

Row Level Filter

With the extension enabled, row-filter policies defined in Ranger on the source tables are enforced when the view is queried. Recreate the view:

create view if not exists employee_db.employee_region(userid, region) as select e.userid, cr.region from employee_db.employee_data e, employee_db.country_region cr where e.country = cr.country;

select * from employee_db.employee_region;
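
For illustration, suppose a hypothetical Ranger row-filter policy limits the current user to rows where country = 'UK' on employee_db.employee_data. Querying the view then returns only the rows derived from them (a notebook sketch):

    # Assumes the hypothetical row-filter policy (country = 'UK') on
    # employee_db.employee_data for the current user.
    spark.sql("select * from employee_db.employee_region").show()
    # +-----------+------+
    # |     userid|region|
    # +-----------+------+
    # |       mark|    UK|
    # |sally-sales|    UK|
    # +-----------+------+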

Column Masking

Column-masking policies defined in Ranger on the source tables are likewise applied when the view is queried:

select * from employee_db.employee_region;
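
Similarly, with a hypothetical Ranger masking policy (say, a hash mask) on employee_db.employee_data.userid for the current user, the same query returns masked user IDs (a notebook sketch):

    # Assumes the hypothetical hash-mask policy on employee_data.userid;
    # the region column is returned as-is.
    spark.sql("select * from employee_db.employee_region").show()
    # The userid column comes back masked (e.g. hashed values instead of
    # 'james', 'john', ...); region is unchanged.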

Whitelisting for Py4J Security Manager

Certain Python methods are blacklisted on Databricks clusters to enhance cluster security. When you try to access such a method, you might receive the following error:

Error

py4j.security.Py4JSecurityException: … is not whitelisted
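
For example, dropping from a DataFrame to its underlying RDD goes through JVM methods such as org.apache.spark.api.java.JavaRDD.rdd, which may not be whitelisted on such a cluster. A minimal sketch of a call that can fail this way (`spark` is the Databricks notebook session):

    # DataFrame-to-RDD access calls JVM methods (e.g. JavaRDD.rdd) that the
    # Py4J security manager may block.
    df = spark.range(5)
    df.rdd.count()  # may raise py4j.security.Py4JSecurityException: ... is not whitelisted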

If you still want to access the Python classes or methods, you can add them to a whitelisting file. To whitelist classes or methods, do the following:

  1. Create a file containing a list of all the packages, class constructors or methods that should be whitelisted.

    1. To whitelist a complete Java package (including all of its classes), append .* to the package name.

      org.apache.spark.api.python.*
      
    2. To whitelist the constructors of a given class, add the fully qualified class name.

      org.apache.spark.api.python.PythonRDD
      
    3. To whitelist specific methods of a given class, add the fully qualified class name followed by the method name.

      org.apache.spark.api.python.PythonRDD.runJobToPythonFile
      org.apache.spark.api.python.SerDeUtil.pythonToJava
      
  2. Once you have added all the required packages, classes, and methods, the file contains a list of entries like the following:

    org.apache.spark.sql.SparkSession.createRDDFromTrustedPath
    org.apache.spark.api.java.JavaRDD.rdd
    org.apache.spark.rdd.RDD.isBarrier
    org.apache.spark.api.python.*
    
  3. Upload the file to a DBFS location that can be referenced from the cluster's Spark Config.

    Suppose whitelist.txt contains the classes and methods to be whitelisted. Run the following command to upload it to DBFS:

    dbfs cp whitelist.txt dbfs:/privacera/whitelist.txt
    
  4. Add the following property to the Spark Config, referencing the DBFS file location.

    spark.hadoop.privacera.whitelist dbfs:/privacera/whitelist.txt
    
  5. Restart your cluster.
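
After the restart, calls covered by the whitelist entries should go through. Continuing the earlier sketch:

    # With org.apache.spark.api.java.JavaRDD.rdd, org.apache.spark.rdd.RDD.isBarrier
    # and org.apache.spark.api.python.* whitelisted, the RDD round-trip now works.
    df = spark.range(5)
    print(df.rdd.count())  # 5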