- Platform Release 6.5
- Privacera Platform Installation
- About Privacera Manager (PM)
- Install overview
- Prerequisites
- Installation
- Default services configuration
- Component services configurations
- Access Management
- Data Server
- PolicySync
- Snowflake
- Redshift
- Redshift Spectrum
- PostgreSQL
- Microsoft SQL Server
- Databricks SQL
- RocksDB
- Google BigQuery
- Power BI
- UserSync
- Privacera Plugin
- Databricks
- Spark standalone
- Spark on EKS
- Trino Open Source
- Dremio
- AWS EMR
- AWS EMR with Native Apache Ranger
- GCP Dataproc
- Starburst Enterprise
- Privacera services (Data Assets)
- Audit Fluentd
- Grafana
- Access Request Manager (ARM)
- Ranger Tagsync
- Discovery
- Encryption & Masking
- Privacera Encryption Gateway (PEG) and Cryptography with Ranger KMS
- AWS S3 bucket encryption
- Ranger KMS
- AuthZ / AuthN
- Security
- Access Management
- Reference - Custom Properties
- Validation
- Additional Privacera Manager configurations
- CLI actions
- Debugging and logging
- Advanced service configuration
- Increase Privacera portal timeout for large requests
- Order of precedence in PolicySync filter
- Configure system properties
- PolicySync
- Databricks
- Table properties
- Upgrade Privacera Manager
- Troubleshooting
- Possible Errors and Solutions in Privacera Manager
- Unable to Connect to Docker
- Terminate Installation
- 6.5 Platform Installation fails with invalid apiVersion
- Ansible Kubernetes Module does not load
- Unable to connect to Kubernetes Cluster
- Common Errors/Warnings in YAML Config Files
- Delete old unused Privacera Docker images
- Unable to debug error for an Ansible task
- Unable to upgrade from 4.x to 5.x or 6.x due to Zookeeper snapshot issue
- Storage issue in Privacera UserSync & PolicySync
- Permission Denied Errors in PM Docker Installation
- Unable to initialize the Discovery Kubernetes pod
- Portal service
- Grafana service
- Audit server
- Audit Fluentd
- Privacera Plugin
- How-to
- Appendix
- AWS topics
- AWS CLI
- AWS IAM
- Configure S3 for real-time scanning
- Install Docker and Docker compose (AWS-Linux-RHEL)
- AWS S3 MinIO quick setup
- Cross account IAM role for Databricks
- Integrate Privacera services in separate VPC
- Securely access S3 buckets using IAM roles
- Multiple AWS account support in Dataserver using Databricks
- Multiple AWS S3 IAM role support in Dataserver
- Azure topics
- GCP topics
- Kubernetes
- Microsoft SQL topics
- Snowflake configuration for PolicySync
- Create Azure resources
- Databricks
- Spark Plug-in
- Azure key vault
- Add custom properties
- Migrate Ranger KMS master key
- IAM policy for AWS controller
- Customize topic and table names
- Configure SSL for Privacera
- Configure Real-time scan across projects in GCP
- Upload custom SSL certificates
- Deployment size
- Service-level system properties
- PrestoSQL standalone installation
- Privacera Platform User Guide
- Introduction to Privacera Platform
- Settings
- Data inventory
- Token generator
- System configuration
- Diagnostics
- Notifications
- How-to
- Privacera Discovery User Guide
- What is Discovery?
- Discovery Dashboard
- Scan Techniques
- Processing order of scan techniques
- Add and scan resources in a data source
- Start or cancel a scan
- Tags
- Dictionaries
- Patterns
- Scan status
- Data zone movement
- Models
- Disallowed Tags Policy
- Rules
- Types of rules
- Example rules and classifications
- Create a structured rule
- Create an unstructured rule
- Create a rule mapping
- Export rules and mappings
- Import rules and mappings
- Post-processing in real-time and offline scans
- Enable post-processing
- Example of post-processing rules on tags
- List of structured rules
- Supported scan file formats
- Data Source Scanning
- Data Inventory
- TagSync using Apache Ranger
- Compliance Workflow
- Data zones and workflow policies
- Workflow Policies
- Alerts Dashboard
- Data Zone Dashboard
- Data zone movement
- Example Workflow Usage
- Discovery health check
- Reports
- Built-in Reports
- Saved reports
- Offline reports
- Reports with the query builder
- How-to
- Privacera Encryption Guide
- Essential Privacera Encryption terminology
- Install Privacera Encryption
- Encryption Key Management
- Schemes
- Scheme Policies
- Encryption Schemes
- Presentation Schemes
- Masking schemes
- Encryption formats, algorithms, and scopes
- Deprecated encryption formats, algorithms, and scopes
- Encryption with PEG REST API
- PEG REST API on Privacera Platform
- PEG API Endpoint
- Encryption Endpoint Summary for Privacera Platform
- Authentication Methods on Privacera Platform
- Anatomy of the /protect API Endpoint on Privacera Platform
- About Constructing the datalist for protect
- About Deconstructing the datalist for unprotect
- Example of Data Transformation with /unprotect and Presentation Scheme
- Example PEG API endpoints
- /unprotect with masking scheme
- REST API Response Partial Success on Bulk Operations
- Audit Details for PEG REST API Accesses
- REST API Reference
- Make calls on behalf of another user
- Troubleshoot REST API Issues on Privacera Platform
- Encryption with Databricks, Hive, Streamsets, Trino
- Databricks UDFs for encryption and masking
- Hive UDFs
- Streamsets
- Trino UDFs
- Privacera Access Management User Guide
- Privacera Access Management
- How Policies are evaluated
- Resource policies
- Policies overview
- Creating Resource Based Policies
- Configure Policy with Attribute-Based Access Control
- Configuring Policy with Conditional Masking
- Tag Policies
- Entitlement
- Request Access
- Approve access requests
- Service Explorer
- User/Groups/Roles
- Permissions
- Reports
- Audit
- Security Zone
- Access Control using APIs
- AWS User Guide
- Overview of Privacera on AWS
- Set policies for AWS services
- Using Athena with data access server
- Using DynamoDB with data access server
- Databricks access manager policy
- Accessing Kinesis with data access server
- Accessing Firehose with Data Access Server
- EMR user guide
- AWS S3 bucket encryption
- S3 browser
- Getting started with Minio
- Plugins
- How to Get Support
- Coordinated Vulnerability Disclosure (CVD) Program of Privacera
- Shared Security Model
- Privacera documentation changelog
Databricks user guide
Spark Fine-grained Access Control (FGAC)
Enable View-level access control
Edit the Spark Config of your existing Privacera-enabled Databricks cluster.
Add the following property:
spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable true
Save and restart the Databricks cluster.
Apply View-level access control
To run CREATE VIEW with the Spark plug-in, you need the DATA_ADMIN permission.
The source table on which you create a view requires DATA_ADMIN access in the Ranger policy.
Use Case
Consider a use case with an 'employee_db' database and two tables inside it, populated with the data shown below.
-- Requires CREATE privilege on the database (enabled by default)
create database if not exists employee_db;
Create two tables.
-- Requires privilege for table creation
create table if not exists employee_db.employee_data(id int, userid string, country string);
create table if not exists employee_db.country_region(country string, region string);
Insert test data.
-- Requires UPDATE privilege on the tables
insert into employee_db.country_region values ('US','NA'), ('CA','NA'), ('UK','UK'), ('DE','EU'), ('FR','EU');
insert into employee_db.employee_data values (1,'james','US'), (2,'john','US'), (3,'mark','UK'), (4,'sally-sales','UK'), (5,'sally','DE'), (6,'emily','DE');
-- Requires SELECT privilege on the columns
select * from employee_db.country_region;
select * from employee_db.employee_data;
Now try to create a view on top of the two tables created above; the statement fails with the following error:
create view employee_db.employee_region(userid, region) as
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;

Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [emily] does not have [DATA_ADMIN] privilege on [employee_db/employee_data] (state=42000,code=40000)
Create a Ranger policy that grants the DATA_ADMIN permission on the source tables (here, employee_db.employee_data and employee_db.country_region) to the user creating the view.
With that policy in place, run the same CREATE VIEW statement again; this time it succeeds.
Note
Granting the DATA_ADMIN privilege on a resource implicitly grants the SELECT privilege on the same resource.
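For example (a hypothetical illustration of the note above): once a policy grants emily DATA_ADMIN on employee_db.employee_data, a plain read of that table also succeeds without a separate SELECT grant.
-- Allowed through the SELECT implied by DATA_ADMIN
select * from employee_db.employee_data;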
Alter View
-- Requires ALTER permission on the view
ALTER VIEW employee_db.employee_region AS
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;
Rename View
-- Requires ALTER permission on the view
ALTER VIEW employee_db.employee_region RENAME TO employee_db.employee_region_renamed;
Drop View
-- Requires DROP permission on the view
DROP VIEW employee_db.employee_region_renamed;
Row-Level Filter
create view if not exists employee_db.employee_region(userid, region) as
select e.userid, cr.region
from employee_db.employee_data e, employee_db.country_region cr
where e.country = cr.country;

select * from employee_db.employee_region;
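What the query returns depends on the row-filter policy applied to the querying user. As a hypothetical illustration, assume a Ranger row-level filter of region = 'EU' on employee_db.employee_region; with the sample data inserted earlier, the query then returns only the EU rows:
select * from employee_db.employee_region;
-- userid   region
-- sally    EU
-- emily    EU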

Column Masking
select * from employee_db.employee_region;
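Again as a hypothetical illustration, assume a Ranger masking policy (for example, a hash mask) on the userid column of employee_db.employee_region for the querying user; the query then returns masked values for userid while region is returned unchanged:
select * from employee_db.employee_region;
-- userid values come back masked (for example, as hashes); region values are unchanged.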

Whitelisting for Py4J security manager
Certain Python methods are blacklisted on Databricks clusters to enhance cluster security. When you try to access such a method, you might receive the following error:
py4j.security.Py4JSecurityException: … is not whitelisted
If you still want to access the Python classes or methods, you can add them to a whitelisting file. To whitelist classes or methods, do the following:
Create a file containing a list of all the packages, class constructors or methods that should be whitelisted.
To whitelist a complete Java package (including all its classes), add the package name followed by .*:
org.apache.spark.api.python.*
To whitelist the constructors of a given class, add the fully qualified class name:
org.apache.spark.api.python.PythonRDD
To whitelist specific methods of a given class, add the fully qualified class name followed by the method name:
org.apache.spark.api.python.PythonRDD.runJobToPythonFile
org.apache.spark.api.python.SerDeUtil.pythonToJava
Once you have added all the required packages, classes, and methods, the file contains a list of entries like the following.
org.apache.spark.sql.SparkSession.createRDDFromTrustedPath
org.apache.spark.api.java.JavaRDD.rdd
org.apache.spark.rdd.RDD.isBarrier
org.apache.spark.api.python.*
Upload the file to a DBFS location that can be referenced from the cluster's Spark configuration.
Suppose the whitelist.txt file contains the classes/methods to be whitelisted. Run the following command to upload it to Databricks:
dbfs cp whitelist.txt dbfs:/privacera/whitelist.txt
Add the following property to the Spark Config, referencing the DBFS file location:
spark.hadoop.privacera.whitelist dbfs:/privacera/whitelist.txt
Restart your cluster.
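As a quick check (a hypothetical notebook snippet, not part of the product documentation): converting a DataFrame to an RDD is a common way to hit non-whitelisted JVM methods such as JavaRDD.rdd. With the sample entries shown above whitelisted and the cluster restarted, the following runs; without them it fails with the Py4JSecurityException shown earlier.
# Hypothetical check: relies on entries such as org.apache.spark.api.java.JavaRDD.rdd
# and org.apache.spark.api.python.* from the sample whitelist above.
print(spark.range(3).rdd.collect())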
Access AWS S3 using Boto3 from Databricks
This section describes how to use the AWS SDK (Boto3) for Privacera Platform to access AWS S3 file data through a Privacera DataServer proxy.
Prerequisites
Ensure that the following prerequisites are met:
Add the iptables command to the Databricks init script.
To enable boto3 access control in your Databricks environment, add the following command to open port 8282 for outgoing connections:
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
Restart the Databricks cluster.
Pass the iptables command through the Privacera Manager properties in the vars.databricks.plugin.yml file, as shown below, and then run the Privacera Manager update command.
DATABRICKS_POST_PLUGIN_COMMAND_LIST:
  - echo "Completed Installation"
  - <iptables command shown above>
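For reference, a minimal init-script sketch (hypothetical script name, assuming the standard Databricks init-script mechanism) that opens the port could look like this:
#!/bin/bash
# open-privacera-dataserver-port.sh: allow outgoing traffic to the DataServer proxy port
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT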
Accessing AWS S3 files
Run the following commands in a Databricks notebook:
Install the AWS Boto3 libraries
pip install boto3
Import the required libraries
import boto3
Fetch the DataServer certificate
Note
If SSL is enabled on the dataserver, the port is 8282.
%sh
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
dirname="/tmp/lib3"
mkdir -p -- "$dirname"
DS_URL="https://{DATASERVER_EC2_OR_K8S_LB_URL}:{DAS_SSL_PORT}"
#Sample url as shown below
#DS_URL="https://10.999.99.999:8282"
DS_CERT_FILE="$dirname/ds.pem"
curl -k -H "connection:close" -o "${DS_CERT_FILE}" "${DS_URL}/services/certificate"
Access the AWS S3 files
def check_s3_file_exists(bucket, key, access_key, secret_key, endpoint_url, dataserver_cert, region_name):
    exec_status = False
    try:
        s3 = boto3.resource(service_name='s3',
                            aws_access_key_id=access_key,
                            aws_secret_access_key=secret_key,
                            endpoint_url=endpoint_url,
                            region_name=region_name,
                            verify=dataserver_cert)
        print(s3.Object(bucket_name=bucket, key=key).get()['Body'].read().decode('utf-8'))
        exec_status = True
    except Exception as e:
        print("Got error: {}".format(e))
    finally:
        return exec_status

def read_s3_file(bucket, key, access_key, secret_key, endpoint_url, dataserver_cert, region_name):
    exec_status = False
    try:
        s3 = boto3.client(service_name='s3',
                          aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key,
                          endpoint_url=endpoint_url,
                          region_name=region_name,
                          verify=dataserver_cert)
        obj = s3.get_object(Bucket=bucket, Key=key)
        print(obj['Body'].read().decode('utf-8'))
        exec_status = True
    except Exception as e:
        print("Got error: {}".format(e))
    finally:
        return exec_status

readFilePath = "file data/data/format=txt/sample/sample_small.txt"
bucket = "infraqa-test"

#platform
access_key = "${privacera_access_key}"
secret_key = "${privacera_secret_key}"
endpoint_url = "https://${DATASERVER_EC2_OR_K8S_LB_URL}:${DAS_SSL_PORT}"
#sample value as shown below
endpoint_url = "https://10.999.99.999:8282"
priv_dataserver_cert = "/tmp/lib3/ds.pem"
region_name = "us-east-1"

print(f"got file===== {readFilePath} ============= bucket= {bucket}")
status = check_s3_file_exists(bucket, readFilePath, access_key, secret_key, endpoint_url, priv_dataserver_cert, region_name)
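The read_s3_file helper defined above is not called in the snippet; you can invoke it the same way to print the object contents through the proxy, for example:
# Optional: read the same object through the proxy using the helper defined above
status = read_s3_file(bucket, readFilePath, access_key, secret_key, endpoint_url, priv_dataserver_cert, region_name)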
Access Azure file using Azure SDK from Databricks
This section describes how to use the Azure SDK for Privacera Platform to access Azure Data Lake Storage file data through a Privacera DataServer proxy.
Prerequisites
Ensure that the following prerequisites are met:
Add the iptables command to the Databricks init script.
To enable Azure SDK access control in your Databricks environment, add the following command to open port 8282 for outgoing connections:
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
Restart the Databricks cluster.
Pass the iptables command through the Privacera Manager properties in the vars.databricks.plugin.yml file, as shown below, and then run the Privacera Manager update command.
DATABRICKS_POST_PLUGIN_COMMAND_LIST:
  - echo "Completed Installation"
  - <iptables command shown above>
Accessing Azure files
Run the following commands in a Databricks notebook:
Install the Azure SDK libraries
pip install azure-storage-file-datalake
Import the required libraries
import os, uuid, sys
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
Fetch the DataServer certificate
Note
If SSL is enabled on the dataserver, the port is 8282.
sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8282 -j ACCEPT
dirname="/tmp/lib3"
mkdir -p -- "$dirname"
DS_URL="https://{DATASERVER_EC2_OR_K8S_LB_URL}:{DAS_SSL_PORT}"
#Sample url as shown below
#DS_URL="https://10.999.99.999:8282"
DS_CERT_FILE="$dirname/ds.pem"
curl -k -H "connection:close" -o "${DS_CERT_FILE}" "${DS_URL}/services/certificate"
Initialize the storage account using the connection-string method
def initialize_storage_account_connect_str(my_connection_string):
    try:
        global service_client
        print(my_connection_string)
        os.environ['REQUESTS_CA_BUNDLE'] = '/tmp/lib3/ds.pem'
        service_client = DataLakeServiceClient.from_connection_string(
            conn_str=my_connection_string,
            headers={'x-ms-version': '2020-02-10'})
    except Exception as e:
        print(e)
Prepare the connection string
def prepare_connect_str():
    try:
        connect_str = "DefaultEndpointsProtocol=https;AccountName=${privacera_access_key}-{storage_account_name};AccountKey=${base64_encoded_value_of(privacera_access_key|privacera_secret_key)};BlobEndpoint=https://${DATASERVER_EC2_OR_K8S_LB_URL}:${DAS_SSL_PORT};"
        # sample value is shown below
        #connect_str = "DefaultEndpointsProtocol=https;AccountName=MMTTU5Njg4Njk0MDAwA6amFpLnBhdGVsOjE6MTY1MTU5Njg4Njk0MDAw==-pqadatastorage;AccountKey=TVRVNUTU5Njg4Njk0MDAwTURBd01UQTZhbUZwTG5CaGRHVnNPakU2TVRZMU1URTJOVGcyTnpVMTU5Njg4Njk0MDAwVZwLzNFbXBCVEZOQWpkRUNxNmpYcjTU5Njg4Njk0MDAwR3Q4N29UNFFmZWpMOTlBN1M4RkIrSjdzSE5IMFZic0phUUcyVHTU5Njg4Njk0MDAwUxnPT0=;BlobEndpoint=https://10.999.99.999:8282;"
        return connect_str
    except Exception as e:
        print(e)
Define a sample access method to list Azure files and directories
def list_directory_contents(connect_str):
    try:
        initialize_storage_account_connect_str(connect_str)
        file_system_client = service_client.get_file_system_client(file_system="{storage_container_name}")
        #sample values as shown below
        #file_system_client = service_client.get_file_system_client(file_system="infraqa-test")
        paths = file_system_client.get_paths(path="{directory_path}")
        #sample values as shown below
        #paths = file_system_client.get_paths(path="file data/data/format=csv/sample/")
        for path in paths:
            print(path.name + '\n')
    except Exception as e:
        print(e)
To verify that the proxy is functioning, call the access methods
connect_str = prepare_connect_str()
list_directory_contents(connect_str)
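Optionally, you can also read a single file through the proxy with the same service client. The following sketch (a hypothetical helper with placeholder values, not from the original guide) uses the standard DataLakeFileClient download API:
def read_file_contents(connect_str, container, file_path):
    try:
        initialize_storage_account_connect_str(connect_str)
        file_system_client = service_client.get_file_system_client(file_system=container)
        file_client = file_system_client.get_file_client(file_path)
        data = file_client.download_file().readall()
        print(data.decode('utf-8'))
    except Exception as e:
        print(e)

#sample placeholder values as shown below
#read_file_contents(connect_str, "{storage_container_name}", "{file_path}")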