AWS EMR
This topic shows how to configure AWS EMR with Privacera using Privacera Manager.
Configuration
SSH to the instance as USER.
Run the following commands.
cd ~/privacera/privacera-manager
cp config/sample-vars/vars.emr.yml config/custom-vars/
vi config/custom-vars/vars.emr.yml
Edit the following properties.
| Property | Description | Example |
|---|---|---|
| EMR_ENABLE | Enable EMR template creation. | true |
| EMR_CLUSTER_NAME | A unique name for the EMR cluster. | Privacera-EMR |
| EMR_CREATE_SG | Set to true if you don't have existing security groups and want Privacera Manager to add security group creation steps to the EMR CloudFormation (CF) template. | false |
| EMR_MASTER_SG_ID | If EMR_CREATE_SG is false, set this property. Security group ID for the EMR master node group. | sg-xxxxxxx |
| EMR_SLAVE_SG_ID | If EMR_CREATE_SG is false, set this property. Security group ID for the EMR slave node group. | sg-xxxxxxx |
| EMR_SERVICE_ACCESS_SG_ID | If EMR_CREATE_SG is false, set this property only if you are creating EMR in a private network. Security group ID for EMR ServiceAccessSecurity. | sg-xxxxxxx |
| EMR_SG_VPC_ID | If EMR_CREATE_SG is true, set this property. ID of the VPC in which to create the EMR cluster. | vpc-xxxxxxxxxxx |
| EMR_MASTER_SG_NAME | If EMR_CREATE_SG is true, set this property. Security group name for the EMR master node group. The name is added to emr-template.json. | priv-master-sg |
| EMR_SLAVE_SG_NAME | If EMR_CREATE_SG is true, set this property. Security group name for the EMR slave node group. The name is added to emr-template.json. | priv-slave-sg |
| EMR_SERVICE_ACCESS_SG_NAME | If EMR_CREATE_SG is true, set this property only if you are creating EMR in a private network. Security group name for EMR ServiceAccessSecurity. The name is added to emr-template.json. | priv-private-sg |
| EMR_SUBNET_ID | ID of the subnet in which the EMR cluster is created. | |
| EMR_KEYPAIR | An existing EC2 key pair used to SSH into the master node of the cluster. | privacera-test-pair |
| EMR_EC2_MARKET_TYPE | Market type: SPOT or ON_DEMAND. | SPOT |
| EMR_EC2_INSTANCE_TYPE | EC2 instance type, such as m5.xlarge or r5.xlarge. | m5.large |
| EMR_MASTER_NODE_COUNT | Number of master nodes (1, 2, and so on). | 1 |
| EMR_CORE_NODE_COUNT | Number of core nodes (1, 2, and so on). | 1 |
| EMR_VERSION | EMR release version. | emr-x.xx.x |
| EMR_EC2_DOMAIN | Domain used by the nodes. It depends on the EMR region; for example, .ec2.internal is used in us-east-1. | .ec2.internal |
| EMR_USE_STS_REGIONAL_ENDPOINTS | Enable or disable regional endpoints for S3 requests. Default value is false. | true |
| EMR_TERMINATION_PROTECT | Enable or disable termination protection. | true |
| EMR_LOGS_PATH | S3 location for storing EMR logs. | s3://privacera-logs-bucket/ |
| EMR_KERBEROS_ENABLE | Set to true to enable Kerberos on EMR. | false |
| EMR_KDC_ADMIN_PASSWORD | If EMR_KERBEROS_ENABLE is true, set this property. Password used within the cluster for the kadmin service. | |
| EMR_CROSS_REALM_PASSWORD | If EMR_KERBEROS_ENABLE is true, set this property. Cross-realm trust principal password, which must be identical across realms. | |
| EMR_SECURITY_CONFIG | Name of the EMR security configuration. This can be a pre-created configuration, or Privacera Manager can generate a template with which you create it. | |
| EMR_KERB_TICKET_LIFETIME | Set this property if EMR_KERBEROS_ENABLE is true and you want Privacera Manager to create a CF template for the security configuration. The period for which a Kerberos ticket issued by the cluster's KDC is valid. Cluster applications and services auto-renew tickets after they expire. | 24 |
| EMR_KERB_REALM | Set this property if EMR_KERBEROS_ENABLE is true and you want Privacera Manager to create a CF template for the security configuration. Kerberos realm name of the other realm in the trust relationship. | |
| EMR_KERB_DOMAIN | Set this property if EMR_KERBEROS_ENABLE is true and you want Privacera Manager to create a CF template for the security configuration. Domain name of the other realm in the trust relationship. | |
| EMR_KERB_ADMIN_SERVER | Set this property if EMR_KERBEROS_ENABLE is true and you want Privacera Manager to create a CF template for the security configuration. Fully qualified domain name (FQDN), with an optional port, of the Kerberos admin server in the other realm. If no port is specified, 749 is used. | |
| EMR_KERB_KDC_SERVER | Set this property if EMR_KERBEROS_ENABLE is true and you want Privacera Manager to create a CF template for the security configuration. Fully qualified domain name (FQDN), with an optional port, of the KDC in the other realm. If no port is specified, 88 is used. | |
| EMR_AWS_ACCT_ID | AWS account ID where the EMR cluster resides. | 9999999 |
| EMR_DEFAULT_ROLE | Default role attached to the EMR cluster for performing cluster-related activities. This must be a pre-created role. | EMR_DefaultRole |
| EMR_ROLE_FOR_CLUSTER_NODES | IAM role attached to each node in the EMR cluster. It should have only the minimal permissions needed to download privacera_cust_conf.zip and for basic EMR capabilities. It can be an existing role; if not, generate it with the IAM role CF template after running the Privacera Manager update. | restricted_node_role |
| EMR_USE_SINGLE_ROLE_FOR_APPS | Set this property if you want Privacera Manager to generate a CF template for IAM role configuration. When true, a single IAM role is created and used by all EMR applications. | true |
| EMR_ROLE_FOR_APPS | Set this property if you want Privacera Manager to generate a CF template for IAM role configuration. Name of the IAM role used by all EMR applications. | app_data_access_role |
| EMR_ROLE_FOR_SPARK | Set this property if you want Privacera Manager to generate a CF template for IAM role configuration. To create separate IAM roles for specific applications, set EMR_USE_SINGLE_ROLE_FOR_APPS to false. Name of the IAM role used by the Spark application (Dataserver) for data access. | spark_data_access_role |
| EMR_ROLE_FOR_HIVE | Set this property if you want Privacera Manager to generate a CF template for IAM role configuration. Name of the IAM role used by the Hive application for data access. | hive_data_access_role |
| EMR_ROLE_FOR_PRESTO | Set this property if you want Privacera Manager to generate a CF template for IAM role configuration. Name of the IAM role used by the Presto application for data access. | presto_data_access_role |
| EMR_HIVE_METASTORE | Metastore type: glue, or hive for an external Hive metastore. | glue |
| EMR_HIVE_METASTORE_PATH | S3 location for the Hive metastore. | s3://hive-warehouse |
| EMR_HIVE_METASTORE_CONNECTION_URL | If EMR_HIVE_METASTORE is hive, set this property. JDBC connection URL for connecting to Hive. | jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true |
| EMR_HIVE_METASTORE_CONNECTION_DRIVER | If EMR_HIVE_METASTORE is hive, set this property. JDBC driver name. | org.mariadb.jdbc.Driver |
| EMR_HIVE_METASTORE_CONNECTION_USERNAME | If EMR_HIVE_METASTORE is hive, set this property. JDBC user name. | hive |
| EMR_HIVE_METASTORE_CONNECTION_PASSWORD | If EMR_HIVE_METASTORE is hive, set this property. JDBC password. | StRong@PassW0rd |
| EMR_HIVE_SERVICE_NAME | Custom Hive service name for the Hive application in EMR. | teamA_policy |
| EMR_TRINO_HIVE_SERVICE_NAME | Custom Hive service name for the Trino application in EMR. | teamB_policy |
| EMR_SPARK_HIVE_SERVICE_NAME | Custom Hive service name for Spark applications in EMR. | teamC_policy |
| EMR_APP_SPARK_OLAC_ENABLE | Set to true to install the Spark application with the Privacera plugin for Object Level Access Control (OLAC). Note: Recommended when complete access control on objects in AWS S3 is required. When set to true, the s3 and s3n protocols are not supported on EMR clusters when running Spark queries. | true |
| EMR_APP_SPARK_FGAC_ENABLE | Set to true to install the Spark application with the Privacera plugin for Fine Grained Access Control (FGAC) on tables and columns. Note: Recommended for compliance purposes, since the whole cluster still has direct access to AWS S3 data. | false |
| EMR_APP_PRESTO_DB_ENABLE | Set to true to install the PrestoDB application with the Privacera plugin. PrestoDB and Trino are mutually exclusive; enable only one at a time. | false |
| EMR_APP_PRESTO_SQL_ENABLE | Set to true to install the Trino application with the Privacera plugin. PrestoDB and Trino are mutually exclusive; enable only one at a time. Note: Trino is supported for EMR versions 6.1.0 and higher. If the EMR version is 6.4.0, setting this flag installs the Trino plugin. | false |
| EMR_APP_HIVE_ENABLE | Set to true to install the Hive application with the Privacera plugin. | true |
| EMR_APP_ZEPPELIN_ENABLE | Set to true to install the Zeppelin application. | true |
| EMR_APP_LIVY_ENABLE | Set to true to install the Livy application. | true |
| EMR_CUST_CONF_ZIP_PATH | Location where the privacera_cust_conf.zip file will be placed. Privacera Manager generates privacera_cust_conf.zip in the ~/privacera/privacera-manager/output/emr folder; this file must be placed at an S3 or other HTTPS location from which the EMR cluster can download it. | s3://privacera-artifacts/ |
| EMR_SPARK_ENABLE_VIEW_LEVEL_ACCESS_CONTROL | Set to true to enable view-level column masking and row filtering for Spark SQL. This property can be used only when EMR_APP_SPARK_FGAC_ENABLE is true. See the Privacera documentation on view-level access control in Spark for usage details. | false |
| EMR_RANGER_IS_FALLBACK_SUPPORTED | Enables or disables fallback to the privacera_files and privacera_hive services, which determines whether the user is allowed or denied access to resource files. Set to true to enable the fallback or false to disable it. | true |
| EMR_SPARK_DELTA_LAKE_ENABLE | Set to true to enable Delta Lake on EMR Spark. | true |
| EMR_SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL | Download URL of the Delta Lake core JAR. The Delta Lake core JAR depends on the Spark version, so find the version that matches your EMR release (see Delta Lake compatibility with Apache Spark), then get the matching download link (see Delta Core) and set it here. The example URL is for Spark 3.1.x. | https://repo1.maven.org/maven2/io/delta/delta-core_2.12/1.0.1/delta-core_2.12-1.0.1.jar |
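Several properties in the table only apply when another property has a particular value (security group settings depend on EMR_CREATE_SG, Kerberos passwords on EMR_KERBEROS_ENABLE, JDBC settings on EMR_HIVE_METASTORE), PrestoDB and Trino are mutually exclusive, and view-level access control requires FGAC. The script below is a hypothetical pre-flight sketch of those rules, not a Privacera Manager feature. It assumes each property is a one-line KEY: value entry without inline comments (it is not a YAML parser), and the function names get_var, require, and check_emr_vars are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for config/custom-vars/vars.emr.yml (not part of
# Privacera Manager). Assumes one-line "KEY: value" entries with no inline comments.

# get_var FILE KEY: print KEY's value with quotes, CR and trailing blanks removed
get_var() {
  sed -n "s/^[[:space:]]*$2:[[:space:]]*//p" "$1" | head -n 1 \
    | tr -d "\"'\r" | sed 's/[[:space:]]*$//'
}

# require FILE REASON KEY...: report every KEY that is missing or empty
require() {
  local file="$1" reason="$2" key rc=0
  shift 2
  for key in "$@"; do
    if [ -z "$(get_var "$file" "$key")" ]; then
      echo "$key must be set when $reason"
      rc=1
    fi
  done
  return "$rc"
}

# check_emr_vars FILE: print one line per problem; return 1 if any were found
check_emr_vars() {
  local f="$1" rc=0 ver major minor
  case "$(get_var "$f" EMR_CREATE_SG)" in
    true)  require "$f" "EMR_CREATE_SG is true" \
             EMR_SG_VPC_ID EMR_MASTER_SG_NAME EMR_SLAVE_SG_NAME || rc=1 ;;
    false) require "$f" "EMR_CREATE_SG is false" \
             EMR_MASTER_SG_ID EMR_SLAVE_SG_ID || rc=1 ;;
  esac
  if [ "$(get_var "$f" EMR_KERBEROS_ENABLE)" = "true" ]; then
    require "$f" "EMR_KERBEROS_ENABLE is true" \
      EMR_KDC_ADMIN_PASSWORD EMR_CROSS_REALM_PASSWORD || rc=1
  fi
  if [ "$(get_var "$f" EMR_HIVE_METASTORE)" = "hive" ]; then
    require "$f" "EMR_HIVE_METASTORE is hive" EMR_HIVE_METASTORE_CONNECTION_URL \
      EMR_HIVE_METASTORE_CONNECTION_DRIVER EMR_HIVE_METASTORE_CONNECTION_USERNAME \
      EMR_HIVE_METASTORE_CONNECTION_PASSWORD || rc=1
  fi
  if [ "$(get_var "$f" EMR_APP_PRESTO_DB_ENABLE)" = "true" ] &&
     [ "$(get_var "$f" EMR_APP_PRESTO_SQL_ENABLE)" = "true" ]; then
    echo "EMR_APP_PRESTO_DB_ENABLE and EMR_APP_PRESTO_SQL_ENABLE are mutually exclusive"
    rc=1
  fi
  if [ "$(get_var "$f" EMR_SPARK_ENABLE_VIEW_LEVEL_ACCESS_CONTROL)" = "true" ] &&
     [ "$(get_var "$f" EMR_APP_SPARK_FGAC_ENABLE)" != "true" ]; then
    echo "EMR_SPARK_ENABLE_VIEW_LEVEL_ACCESS_CONTROL requires EMR_APP_SPARK_FGAC_ENABLE: true"
    rc=1
  fi
  # Trino needs EMR 6.1.0 or later; only numeric versions such as emr-6.4.0 are checked
  ver=$(get_var "$f" EMR_VERSION); ver=${ver#emr-}
  major=${ver%%.*}; minor=${ver#*.}; minor=${minor%%.*}
  case "$major$minor" in
    ''|*[!0-9]*) ;;  # unset or a placeholder such as x.xx.x: skip the check
    *) if [ "$(get_var "$f" EMR_APP_PRESTO_SQL_ENABLE)" = "true" ] &&
          { [ "$major" -lt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -lt 1 ]; }; }; then
         echo "EMR_APP_PRESTO_SQL_ENABLE requires EMR 6.1.0 or later (EMR_VERSION is emr-$ver)"
         rc=1
       fi ;;
  esac
  return "$rc"
}
```

After sourcing the script, running check_emr_vars config/custom-vars/vars.emr.yml from ~/privacera/privacera-manager prints one line per problem and returns a non-zero status if anything needs fixing. Checking only these few rules keeps the sketch short; it does not validate values against AWS.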
If your cluster was running while the external Hive metastore was down and you cannot connect to it, restart the following three services:
sudo systemctl restart hive-hcatalog-server
sudo systemctl restart hive-server2
sudo systemctl restart presto-server
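To restart the three services in one step and confirm that each one came back, a loop such as the following can be used. It is a sketch that assumes systemd manages these services on the node; restart_hive_presto and the SYSTEMCTL override hook are illustrative names, not part of EMR or Privacera Manager.

```shell
#!/usr/bin/env bash
# Sketch: restart the three services listed above, then report whether each is active.
# Assumes systemd manages the services. SYSTEMCTL is a hypothetical override hook
# (default: "sudo systemctl") that makes the function easy to dry-run.

restart_hive_presto() {
  local svc failed=0
  for svc in hive-hcatalog-server hive-server2 presto-server; do
    # Left unquoted on purpose so "sudo systemctl" splits into command and argument
    ${SYSTEMCTL:-sudo systemctl} restart "$svc" || failed=1
    if ${SYSTEMCTL:-sudo systemctl} is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: NOT active"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on the EMR master node:
# restart_hive_presto
```

A non-zero return status means at least one service failed to restart or is not active afterwards.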
Run the following commands.
cd ~/privacera/privacera-manager
./privacera-manager.sh update
After the update finishes, all the CloudFormation JSON template files and privacera_cust_conf.zip are available in ~/privacera/privacera-manager/output/emr.
Run the following steps on the AWS instance where Privacera is installed, from the ~/privacera/privacera-manager/output/emr directory, because the commands reference the templates by relative path.
(Optional) Create IAM roles using the emr-roles-creation-template.json template. Run the following command:
aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-role-creation --template-body file://emr-roles-creation-template.json --capabilities CAPABILITY_NAMED_IAM
Note
This creates IAM roles with minimal permissions. Add bucket permissions to the respective IAM roles as your requirements dictate.
(Optional) Create the security configuration using the emr-security-config-template.json template. Run the following command:
aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-security-config-creation --template-body file://emr-security-config-template.json
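When EMR_CUST_CONF_ZIP_PATH is an S3 location, one way to copy the generated zip there is the AWS CLI's aws s3 cp command. The sketch below reads the destination from vars.emr.yml; it assumes the AWS CLI is configured with write access to the bucket and that the property is a one-line KEY: value entry. upload_cust_conf, PM_HOME, and the AWS_CLI dry-run hook are illustrative names, and HTTPS destinations are not handled.

```shell
#!/usr/bin/env bash
# Sketch: copy the generated privacera_cust_conf.zip to EMR_CUST_CONF_ZIP_PATH.
# Assumptions: AWS CLI configured; the property is a one-line "KEY: value" entry;
# PM_HOME (illustrative) points at the Privacera Manager directory.
# AWS_CLI is a hypothetical hook: set it to "echo" to print the command instead.

PM_HOME="${PM_HOME:-$HOME/privacera/privacera-manager}"

upload_cust_conf() {
  local vars="$PM_HOME/config/custom-vars/vars.emr.yml"
  local zip="$PM_HOME/output/emr/privacera_cust_conf.zip"
  local dest
  dest=$(sed -n 's/^[[:space:]]*EMR_CUST_CONF_ZIP_PATH:[[:space:]]*//p' "$vars" \
    | head -n 1 | tr -d "\"'\r" | sed 's/[[:space:]]*$//')
  if [ ! -f "$zip" ]; then
    echo "not found: $zip (run ./privacera-manager.sh update first)" >&2
    return 1
  fi
  case "$dest" in
    s3://*) ;;  # a destination ending in "/" keeps the file name
    *) echo "this sketch only handles s3:// destinations, got: '$dest'" >&2
       return 1 ;;
  esac
  ${AWS_CLI:-aws} s3 cp "$zip" "$dest"
}

# Usage:
# upload_cust_conf
```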
Confirm that the privacera_cust_conf.zip file has been copied to the location specified in EMR_CUST_CONF_ZIP_PATH.
Create the EMR cluster using the emr-template.json template. Run the following command:
aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-creation --template-body file://emr-template.json
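Because create-stack returns as soon as CloudFormation accepts the request, it can help to wait for each stack to finish before starting the next one, for example so the IAM roles exist before the EMR stack refers to them. The hypothetical create_stack helper below wraps the commands above with aws cloudformation wait stack-create-complete; the AWS_CLI dry-run hook is an illustrative addition.

```shell
#!/usr/bin/env bash
# Sketch: create one of the generated CloudFormation stacks and block until it finishes.
# Assumptions: the AWS CLI is configured for the target account and the current
# directory is ~/privacera/privacera-manager/output/emr. AWS_CLI is a hypothetical
# hook: set it to "echo" to print the commands instead of running them.

# create_stack REGION STACK_NAME TEMPLATE_FILE [extra create-stack options...]
create_stack() {
  local region="$1" name="$2" template="$3"
  shift 3
  ${AWS_CLI:-aws} --region "$region" cloudformation create-stack \
    --stack-name "$name" --template-body "file://$template" "$@" || return 1
  # Polls the stack; returns non-zero if creation fails or the wait times out
  ${AWS_CLI:-aws} --region "$region" cloudformation wait stack-create-complete \
    --stack-name "$name"
}

# Example order (replace <AWS-REGION>; the first two stacks are optional):
# create_stack <AWS-REGION> privacera-emr-role-creation emr-roles-creation-template.json \
#   --capabilities CAPABILITY_NAMED_IAM
# create_stack <AWS-REGION> privacera-emr-security-config-creation emr-security-config-template.json
# create_stack <AWS-REGION> privacera-emr-creation emr-template.json
```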
Note
If you are upgrading EMR from version 6.3 or earlier to version 6.4 or higher to use the Trino plugin, you must re-create the EMR security configuration from the new template generated by Privacera Manager, because the new security configuration adds the trino user.
Note
For PrestoDB, secrets encryption of the Solr authentication password is not supported. However, the properties file that contains the password is accessible only to the presto service user, which limits its exposure.