Databricks user guide
Spark Fine-grained Access Control (FGAC)
Enable View-level access control
Edit the Spark Config of your existing Privacera-enabled Databricks cluster.
Add the following property:
spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable true
Save and restart the Databricks cluster.
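Optionally, you can confirm the property took effect by reading it back from a notebook cell. This is a minimal sketch, assuming the cluster has restarted with the updated Spark config:
# Optional check: read back the FGAC view-level property from the cluster's Spark config.
print(spark.conf.get("spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable"))  # expected: true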
Apply View-level access control
To run CREATE VIEW with the Spark plug-in, you need the DATA_ADMIN permission.
The source table on which you create the view requires DATA_ADMIN access in the Ranger policy.
Use Case
Let's take a use case where we have an 'employee_db' database and two tables inside it with the data shown below.
# Requires create privilege on the database (enabled by default)
create database if not exists employee_db;
Create two tables.
# Requires privilege for table creation
create table if not exists employee_db.employee_data (id int, userid string, country string);
create table if not exists employee_db.country_region (country string, region string);
Insert test data.
# Requires update privilege for tables
insert into employee_db.country_region values ('US','NA'), ('CA','NA'), ('UK','UK'), ('DE','EU'), ('FR','EU');
insert into employee_db.employee_data values (1,'james','US'), (2,'john','US'), (3,'mark','UK'), (4,'sally-sales','UK'), (5,'sally','DE'), (6,'emily','DE');
# Requires select privilege for columns
select * from employee_db.country_region;
select * from employee_db.employee_data;
Now try to create a view on top of the two tables created above; the statement fails with the error shown below:
create view employee_db.employee_region(userid, region) as
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;

Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [emily] does not have [DATA_ADMIN] privilege on [employee_db/employee_data] (state=42000,code=40000)
Create a Ranger policy that grants DATA_ADMIN on the source table, then run the same CREATE VIEW statement again; this time it succeeds.
Note
Granting the DATA_ADMIN privilege on a resource implicitly grants the SELECT privilege on the same resource.
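For example (a minimal sketch from a notebook cell, assuming the DATA_ADMIN policy from the use case above is in place), the same user can select from the source table without a separate SELECT policy item:
# Sketch: DATA_ADMIN on employee_db.employee_data implies SELECT on the same table.
spark.sql("select * from employee_db.employee_data").show()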
Alter View
# Requires alter permission on the view
ALTER VIEW employee_db.employee_region AS
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;
Rename View
# Requires alter permission on the view
ALTER VIEW employee_db.employee_region RENAME TO employee_db.employee_region_renamed;
Drop View
# Requires drop permission on the view
DROP VIEW employee_db.employee_region_renamed;
Row-Level Filter
Recreate the view if needed and query it; with a row-level filter policy applied for the current user, only the permitted rows are returned:
create view if not exists employee_db.employee_region(userid, region) as
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;

select * from employee_db.employee_region;
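The same check can be run from a notebook cell with PySpark. This is a minimal sketch; the rows actually returned depend on the row-filter condition defined in the Ranger policy:
# Sketch: query the row-filtered view via PySpark; the row-level filter policy
# for the logged-in user is applied transparently.
df = spark.sql("select * from employee_db.employee_region")
df.show()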

Column Masking
Query the view; with a column-masking policy applied for the current user, the masked columns are returned with transformed values:
select * from employee_db.employee_region;

Access AWS S3 using Boto3 from Databricks
This section describes how to use the AWS SDK for Python (Boto3) with Privacera Platform to access AWS S3 file data through the Privacera DataServer proxy.
Prerequisites
Ensure that the following prerequisites are met:
Add the iptables rule to the Databricks init script.
To enable boto3 access control in your Databricks environment, add the following command to open port 8282 for outgoing connections:
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
Restart the Databricks cluster.
Pass the iptables command through the Privacera Manager properties in the vars.databricks.plugin.yml file as shown below, and then run the Privacera Manager update command.
DATABRICKS_POST_PLUGIN_COMMAND_LIST:
  - echo "Completed Installation"
  - sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
Accessing AWS S3 files
The following commands must be run in a Databricks notebook:
Install the AWS Boto3 libraries
pip install boto3
Import the required libraries
import boto3
Fetch the DataServer certificate
If SSL is enabled on the DataServer, the port is 8282.
%sh
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
dirname="/tmp/lib3"
mkdir -p -- "$dirname"
DS_URL="https://{DATASERVER_EC2_OR_K8S_LB_URL}:{DAS_SSL_PORT}"
# Sample URL as shown below
# DS_URL="https://10.999.99.999:8282"
DS_CERT_FILE="$dirname/ds.pem"
curl -k -H "connection:close" -o "${DS_CERT_FILE}" "${DS_URL}/services/certificate"
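Optionally, before configuring Boto3, you can confirm from a notebook cell that the certificate was downloaded; a minimal sketch using the same path as DS_CERT_FILE above:
# Optional check: confirm the DataServer certificate fetched in the previous step exists.
import os

cert_path = "/tmp/lib3/ds.pem"  # same location as DS_CERT_FILE above
if not os.path.isfile(cert_path):
    raise FileNotFoundError("DataServer certificate not found at {}".format(cert_path))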
Access the AWS S3 files
def check_s3_file_exists(bucket, key, access_key, secret_key, endpoint_url, dataserver_cert, region_name):
    exec_status = False
    try:
        s3 = boto3.resource(
            service_name='s3',
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key,
            endpoint_url=endpoint_url,
            region_name=region_name,
            verify=dataserver_cert)
        print(s3.Object(bucket_name=bucket, key=key).get()['Body'].read().decode('utf-8'))
        exec_status = True
    except Exception as e:
        print("Got error: {}".format(e))
    finally:
        return exec_status

def read_s3_file(bucket, key, access_key, secret_key, endpoint_url, dataserver_cert, region_name):
    exec_status = False
    try:
        s3 = boto3.client(
            service_name='s3',
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key,
            endpoint_url=endpoint_url,
            region_name=region_name,
            verify=dataserver_cert)
        obj = s3.get_object(Bucket=bucket, Key=key)
        print(obj['Body'].read().decode('utf-8'))
        exec_status = True
    except Exception as e:
        print("Got error: {}".format(e))
    finally:
        return exec_status

readFilePath = "file data/data/format=txt/sample/sample_small.txt"
bucket = "infraqa-test"

# platform
access_key = "${privacera_access_key}"
secret_key = "${privacera_secret_key}"
endpoint_url = "https://${DATASERVER_EC2_OR_K8S_LB_URL}:${DAS_SSL_PORT}"
# sample value as shown below
# endpoint_url = "https://10.999.99.999:8282"
priv_dataserver_cert = "/tmp/lib3/ds.pem"
region_name = "us-east-1"

print(f"got file===== {readFilePath} ============= bucket= {bucket}")
status = check_s3_file_exists(bucket, readFilePath, access_key, secret_key, endpoint_url, priv_dataserver_cert, region_name)
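As a follow-up, the read_s3_file helper defined above can be exercised the same way; a minimal sketch that reuses the variables assigned in the previous cell:
# Sketch: read the same object through the Privacera DataServer proxy using read_s3_file,
# reusing the variables assigned above.
status = read_s3_file(bucket, readFilePath, access_key, secret_key, endpoint_url, priv_dataserver_cert, region_name)
print("read_s3_file succeeded: {}".format(status))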
Access Azure files using the Azure SDK from Databricks
This section describes how to use the Azure SDK with Privacera Platform to access Azure Data Storage/Data Lake file data through the Privacera DataServer proxy.
Prerequisites
Ensure that the following prerequisites are met:
Add the iptables rule to the Databricks init script.
To enable Azure SDK access control in your Databricks environment, add the following command to open port 8282 for outgoing connections:
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
Restart the Databricks cluster.
Pass the iptables command through the Privacera Manager properties in the vars.databricks.plugin.yml file as shown below, and then run the Privacera Manager update command.
DATABRICKS_POST_PLUGIN_COMMAND_LIST:
  - echo "Completed Installation"
  - sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
Accessing Azure files
The following commands must be run in a Databricks notebook:
Install the Azure SDK libraries
pip install azure-storage-file-datalake
Import the required libraries
import os, uuid, sys
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
Fetch the DataServer certificate
If SSL is enabled on the DataServer, the port is 8282.
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
dirname="/tmp/lib3"
mkdir -p -- "$dirname"
DS_URL="https://{DATASERVER_EC2_OR_K8S_LB_URL}:{DAS_SSL_PORT}"
# Sample URL as shown below
# DS_URL="https://10.999.99.999:8282"
DS_CERT_FILE="$dirname/ds.pem"
curl -k -H "connection:close" -o "${DS_CERT_FILE}" "${DS_URL}/services/certificate"
Initialize the storage account using the connection string method
def initialize_storage_account_connect_str(my_connection_string):
    try:
        global service_client
        print(my_connection_string)
        os.environ['REQUESTS_CA_BUNDLE'] = '/tmp/lib3/ds.pem'
        service_client = DataLakeServiceClient.from_connection_string(
            conn_str=my_connection_string,
            headers={'x-ms-version': '2020-02-10'})
    except Exception as e:
        print(e)
Prepare the connection string
def prepare_connect_str():
    try:
        connect_str = "DefaultEndpointsProtocol=https;AccountName=${privacera_access_key}-{storage_account_name};AccountKey=${base64_encoded_value_of(privacera_access_key|privacera_secret_key)};BlobEndpoint=https://${DATASERVER_EC2_OR_K8S_LB_URL}:${DAS_SSL_PORT};"
        # sample value is shown below
        #connect_str = "DefaultEndpointsProtocol=https;AccountName=MMTTU5Njg4Njk0MDAwA6amFpLnBhdGVsOjE6MTY1MTU5Njg4Njk0MDAw==-pqadatastorage;AccountKey=TVRVNUTU5Njg4Njk0MDAwTURBd01UQTZhbUZwTG5CaGRHVnNPakU2TVRZMU1URTJOVGcyTnpVMTU5Njg4Njk0MDAwVZwLzNFbXBCVEZOQWpkRUNxNmpYcjTU5Njg4Njk0MDAwR3Q4N29UNFFmZWpMOTlBN1M4RkIrSjdzSE5IMFZic0phUUcyVHTU5Njg4Njk0MDAwUxnPT0=;BlobEndpoint=https://10.999.99.999:8282;"
        return connect_str
    except Exception as e:
        print(e)
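As the template above indicates, the AccountKey is the Base64 encoding of "privacera_access_key|privacera_secret_key". A minimal sketch for building it, assuming the two Privacera keys are available as plain strings (the helper name is illustrative, not part of any SDK):
import base64

# Illustrative helper (assumption): Base64-encode "access_key|secret_key" to produce
# the AccountKey value used in the connection string template above.
def build_account_key(privacera_access_key, privacera_secret_key):
    raw = "{}|{}".format(privacera_access_key, privacera_secret_key).encode("utf-8")
    return base64.b64encode(raw).decode("utf-8")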
Define a sample access method to list Azure files and directories
def list_directory_contents(connect_str):
    try:
        initialize_storage_account_connect_str(connect_str)
        file_system_client = service_client.get_file_system_client(file_system="{storage_container_name}")
        # sample values as shown below
        #file_system_client = service_client.get_file_system_client(file_system="infraqa-test")
        paths = file_system_client.get_paths(path="{directory_path}")
        # sample values as shown below
        #paths = file_system_client.get_paths(path="file data/data/format=csv/sample/")
        for path in paths:
            print(path.name + '\n')
    except Exception as e:
        print(e)
To verify that the proxy is functioning, call the access methods:
connect_str = prepare_connect_str()
list_directory_contents(connect_str)
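Once listing works through the proxy, a single file can be read in a similar way. The following is a sketch only; the container and file_path arguments are illustrative placeholders, not values from this guide:
# Sketch: download one file through the DataServer proxy.
# container and file_path are illustrative placeholders.
def read_file_contents(connect_str, container, file_path):
    try:
        initialize_storage_account_connect_str(connect_str)
        file_system_client = service_client.get_file_system_client(file_system=container)
        file_client = file_system_client.get_file_client(file_path)
        print(file_client.download_file().readall().decode("utf-8"))
    except Exception as e:
        print(e)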
Whitelist py4j security manager via S3 or DBFS
To enforce security, certain Python methods are blacklisted by Databricks. However, Privacera makes use of these methods.
The following error indicates the default blacklisting behavior:
py4j.security.Py4JSecurityException: … is not whitelisted
If you still want to access the Python classes or methods, you can add them to a whitelisting file.
Note
Whitelisting changes Databricks default security. This whitelisting is not absolutely required and depends entirely on your own security policies.
Steps for whitelisting via S3 or DBFS
The whitelisting.txt file can be stored on either S3 or DBFS. In either case, its location is configured in the Databricks console.
Create a file called whitelisting.txt containing a list of all the packages, class constructors, or methods that should be whitelisted.
To whitelist a complete Java package (including all its classes), add the package name ending with ".*". For example:
org.apache.spark.api.python.*
To whitelist the constructors of a given class, add the fully qualified class name. For example:
org.apache.spark.api.python.PythonRDD
To whitelist specific methods of a given class, add the fully qualified class name followed by the method name. For example:
org.apache.spark.api.python.PythonRDD.runJobToPythonFile
org.apache.spark.api.python.SerDeUtil.pythonToJava
Full example of the above constructs:
org.apache.spark.sql.SparkSession.createRDDFromTrustedPath
org.apache.spark.api.java.JavaRDD.rdd
org.apache.spark.rdd.RDD.isBarrier
org.apache.spark.api.python.*
Upload the file to an S3 or DBFS location that is accessible from Databricks's Spark Application Configuration page.
Suppose the whitelist.txt file contains the classes and methods to be whitelisted.
To upload the whitelisting file to DBFS, run the following command:
dbfs cp whitelist.txt dbfs:/privacera/whitelist.txt
To upload the whitelisting file to S3, use the S3 console to upload it to the desired location.
For either S3 or DBFS, in Databricks' Spark Application Configuration, add the full path to the uploaded whitelisting.txt file location. This example is for a whitelisting.txt file stored in DBFS; you could instead specify an S3 path:
spark.hadoop.privacera.whitelist dbfs:/privacera/whitelist.txt
Restart your Databricks cluster.
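Once the cluster is back up, you can optionally confirm from a notebook cell that the property was picked up; a minimal sketch:
# Optional check: read back the whitelist property set in the Spark Application Configuration.
print(spark.conf.get("spark.hadoop.privacera.whitelist"))  # e.g. dbfs:/privacera/whitelist.txt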