
Whitelist py4j Security Manager via S3 or DBFS in Databricks FGAC

To uphold security, Databricks blacklists certain py4j classes and methods by default. This ensures that unauthorized Python libraries cannot bypass cluster security and access the underlying IAM role of the compute nodes; a sketch of the kind of call that is gated follows the note below. If you need to use any of these classes or methods, you can add them to a whitelist file.

Whitelisting alters Databricks' default security. Ensure this is aligned with your security policies.
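As a concrete illustration, here is a minimal sketch of the kind of py4j call the security manager gates. It assumes a Databricks notebook where `spark` is the provided session; the class names are the same ones used in the whitelist example below.

Python
# spark.sparkContext._jvm is the py4j gateway into the driver JVM.
# On an FGAC cluster, resolving or invoking a JVM class that is not
# whitelisted may be rejected by the py4j security manager.
jvm = spark.sparkContext._jvm
serde = jvm.org.apache.spark.api.python.SerDeUtil  # allowed only if whitelisted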

  1. Create the whitelist.txt File:

    • This file lists the packages, class constructors, or methods you intend to whitelist, one entry per line.

    • Example:

      Text Only
      # Whitelist an entire package (including all its classes) 
      org.apache.spark.api.python.*
      
      # Whitelist specific constructors
      org.apache.spark.api.python.PythonRDD
      
      # Whitelist specific methods
      org.apache.spark.api.python.PythonRDD.runJobToPythonFile
      org.apache.spark.api.python.SerDeUtil.pythonToJava
      

  2. Upload the whitelist.txt File:

    • To DBFS, upload the file with the Databricks CLI:

      Text Only
      dbfs cp whitelist.txt dbfs:/privacera/whitelist.txt
      
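      If you use the newer unified Databricks CLI instead, the equivalent copy should be the following (a sketch; authentication flags depend on your setup):

      Text Only
      databricks fs cp whitelist.txt dbfs:/privacera/whitelist.txt
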

    • To S3, use the S3 console, or upload the file with the AWS CLI as in the sketch below.
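
      A minimal AWS CLI sketch; the bucket name is a placeholder and should match the Spark configuration in step 3:

      Text Only
      aws s3 cp whitelist.txt s3://your-bucket/whitelist.txt
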

  3. Update Databricks Spark Configuration:

    • In Databricks, open your cluster's Spark configuration and add a property that points to the whitelist file:

    • For DBFS:

      Text Only
      spark.hadoop.privacera.whitelist dbfs:/privacera/whitelist.txt
      

    • For S3:

      Text Only
      spark.hadoop.privacera.whitelist s3://your-bucket/whitelist.txt
      
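
    • When the file is on S3, the cluster must also be able to read that bucket (for example, via its instance profile).

    • If you manage clusters through the Databricks Clusters API rather than the UI, the same property can be supplied in the cluster spec. A minimal sketch of the relevant fragment (the bucket name is a placeholder):

      Text Only
      {
        "spark_conf": {
          "spark.hadoop.privacera.whitelist": "s3://your-bucket/whitelist.txt"
        }
      }
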

  4. Restart Your Databricks Cluster:

    • After making these changes, restart your Databricks cluster for the new whitelist to take effect.
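
    • You can restart from the cluster UI or, as a sketch, with the Databricks CLI (the cluster ID is a placeholder; the exact flag syntax depends on your CLI version):

      Text Only
      databricks clusters restart --cluster-id <your-cluster-id>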
