
Whitelist py4j security manager via S3 or DBFS

To enforce security, Databricks blacklists certain Python methods by default. Privacera, however, makes use of some of these methods.

The following error indicates that the default blacklisting is in effect:

py4j.security.Py4JSecurityException: … is not whitelisted

If you still want to access these Python classes or methods, you can add them to a whitelist file.

Note

Whitelisting changes Databricks' default security. It is not strictly required; whether to use it depends entirely on your own security policies.

The whitelist.txt file can be stored on either S3 or DBFS. In either case, its location is configured in the Databricks console.

  1. Create a file called whitelist.txt containing a list of all the packages, class constructors, or methods that should be whitelisted. A shell sketch for creating the file follows the examples below.

    • To whitelist a complete Java package (including all of its classes), add the package name ending in .*. For example:

      org.apache.spark.api.python.*
    • To whitelist the constructors of a given class, add the fully qualified class name. For example:

      org.apache.spark.api.python.PythonRDD
    • To whitelist specific methods of a given class, add the fully qualified class name followed by the method name. For example:

      org.apache.spark.api.python.PythonRDD.runJobToPythonFile
      org.apache.spark.api.python.SerDeUtil.pythonToJava
    • A full example combining the above constructs:

      org.apache.spark.sql.SparkSession.createRDDFromTrustedPath
      org.apache.spark.api.java.JavaRDD.rdd
      org.apache.spark.rdd.RDD.isBarrier
      org.apache.spark.api.python.*
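
    One way to create this file from a shell, using the example entries above (a sketch; adjust the list to your own policies):

      printf '%s\n' \
        'org.apache.spark.sql.SparkSession.createRDDFromTrustedPath' \
        'org.apache.spark.api.java.JavaRDD.rdd' \
        'org.apache.spark.rdd.RDD.isBarrier' \
        'org.apache.spark.api.python.*' > whitelist.txt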
  2. Upload the whitelist.txt file to an S3 or DBFS location that is accessible from Databricks. You will reference this location in the cluster's Spark Application Configuration.

    • To upload the whitelist file to DBFS, run the following Databricks CLI command:

      dbfs cp whitelist.txt dbfs:/privacera/whitelist.txt
    • To upload the whitelist file to S3, use the S3 console to copy it to the desired location.
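
    • Alternatively, you can upload the file to S3 from a shell with the AWS CLI. This is a sketch; the bucket and key below are placeholders for your own location:

      aws s3 cp whitelist.txt s3://your-bucket/privacera/whitelist.txt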

    • For either S3 or DBFS, add the full path to the uploaded whitelist.txt file in the cluster's Spark Application Configuration in Databricks.

    • This example is for a whitelist.txt file stored in DBFS. You could instead specify an S3 path, as sketched below.

      spark.hadoop.privacera.whitelist dbfs:/privacera/whitelist.txt
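
      For an S3 location, the same property would point at the S3 path instead; the bucket and key here are placeholders:

      spark.hadoop.privacera.whitelist s3://your-bucket/privacera/whitelist.txt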
  3. Restart your Databricks cluster so that the new Spark configuration takes effect.
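
    After the restart, previously blocked calls that match your whitelist entries should no longer raise Py4JSecurityException. If you stored the file on DBFS, you can also confirm it is in place with the Databricks CLI:

      dbfs ls dbfs:/privacera/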