AWS EMR with Native Apache Ranger
AWS EMR provides native Apache Ranger integration with the open source Apache Ranger plugins for Apache Spark and Hive. Connecting EMR’s native Ranger integration with Privacera’s Ranger-based data access governance provides the following key advantages:
Companies can sync their existing policies with their EMR solution.
Companies can extend Apache Ranger’s open source capabilities with Privacera’s centralized, enterprise-ready solution.
Note
Supported EMR versions: 5.32 and later in the EMR 5.x series.
Prerequisites
Two AWS secrets are required to store the Ranger Admin and Ranger plugin certificates:
ranger-admin-pub-cert
ranger-plugin-private-keypair
To create the two secrets in AWS Secrets Manager, do the following:
Log in to the AWS console, navigate to Secrets Manager, and click Store a new secret.
For the secret type, select Other type of secrets, then go to the Plaintext tab. Keep the default value unchanged; the actual value for this secret is obtained after the installation is done.
Select the encryption key as required.
Click Next.
Under Secret name, enter a name for the secret, for example, ranger-admin-pub-cert or ranger-plugin-private-keypair.
Click Next. The Configure automatic rotation page is displayed.
Click Next.
On the Review page, check your secret settings and click Store.
The secret is stored. Repeat these steps to create the second secret.
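As an alternative to the console steps above, the two placeholder secrets can be created with the AWS CLI. This is a minimal sketch, not a Privacera-provided script: it assumes AWS CLI v2 is configured with credentials allowed to call secretsmanager:CreateSecret, that the default Secrets Manager encryption key is acceptable, and that the example secret names from this page are used. The helper names create_placeholder_secret and create_ranger_secrets are hypothetical. Each secret's full ARN is printed, which is the value needed later for EMR_NATIVE_RANGER_ADMIN_SECRET_ARN and EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN.

```shell
# Sketch (hypothetical helpers, not part of Privacera Manager): create the two
# placeholder secrets with the AWS CLI. Assumes AWS CLI v2 with permission to
# call secretsmanager:CreateSecret; uses the default Secrets Manager KMS key.

# Create one secret holding a placeholder value and print its full ARN.
create_placeholder_secret() {
  local region="$1" name="$2"
  aws --region "$region" secretsmanager create-secret \
    --name "$name" \
    --secret-string "placeholder" \
    --query ARN \
    --output text
}

# Create both secrets; the real PEM contents are written later, after
# emr-native-create-certs.sh has generated them.
create_ranger_secrets() {
  local region="$1" name arn
  for name in ranger-admin-pub-cert ranger-plugin-private-keypair; do
    arn="$(create_placeholder_secret "$region" "$name")" || return 1
    printf '%s\t%s\n' "$name" "$arn"
  done
}
```

For example, source the file and run `create_ranger_secrets us-east-1`; each output line contains a secret name and its ARN, separated by a tab.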
Configuration
SSH to the instance as USER.
Run the following commands.
cd ~/privacera/privacera-manager
cp config/sample-vars/vars.emr.native.ranger.yml config/custom-vars/
vi config/custom-vars/vars.emr.native.ranger.yml
Edit the following properties.
Property
Description
Example
EMR_NATIVE_ENABLE
Property to enable EMR native Ranger integration.
EMR_NATIVE_ENABLE: "true"
Properties for EMR Specifications
EMR_NATIVE_CLUSTER_NAME
Name of the EMR Cluster.
EMR_NATIVE_CLUSTER_NAME: "Privacera-EMR-Native-Ranger"
EMR_NATIVE_AWS_REGION
AWS Region where the cluster will reside.
EMR_NATIVE_AWS_REGION: "{{AWS_REGION}}"
EMR_NATIVE_AWS_ACCT_ID
AWS Account ID where the EMR Cluster and its resources will reside.
EMR_NATIVE_AWS_ACCT_ID: "587946681758"
EMR_NATIVE_SUBNET_ID
Subnet ID where the EMR Cluster nodes will reside.
EMR_NATIVE_SUBNET_ID: ""
EMR_NATIVE_KEYPAIR
An existing EC2 key pair used to SSH into the cluster nodes.
EMR_NATIVE_KEYPAIR: "privacera-test-pair"
EMR_NATIVE_EC2_MARKET_TYPE
Market Type for the EMR Cluster nodes. For example, SPOT or ON_DEMAND.
EMR_NATIVE_EC2_MARKET_TYPE: "SPOT"
EMR_NATIVE_EC2_INSTANCE_TYPE
Instance Type for the EMR Cluster nodes.
EMR_NATIVE_EC2_INSTANCE_TYPE: "m5.2xlarge"
EMR_NATIVE_MASTER_NODE_COUNT
Number of master nodes.
EMR_NATIVE_MASTER_NODE_COUNT: "1"
EMR_NATIVE_CORE_NODE_COUNT
Number of core nodes.
EMR_NATIVE_CORE_NODE_COUNT: "1"
EMR_NATIVE_VERSION
EMR release label. EMR native Ranger integration is supported from 5.32 onward.
EMR_NATIVE_VERSION: "emr-5.32.0"
EMR_NATIVE_TERMINATION_PROTECT
Enables termination protection for the EMR cluster.
EMR_NATIVE_TERMINATION_PROTECT: "true"
EMR_NATIVE_LOGS_PATH
S3 location for storing EMR logs.
EMR_NATIVE_LOGS_PATH: "s3://privacera-emr/logs"
Properties to configure EMR Security Group
EMR_NATIVE_CREATE_SG
Set this to true if you don't have existing security groups and want Privacera Manager to add the security group creation steps to the EMR CloudFormation template.
EMR_NATIVE_CREATE_SG: "false"
If EMR_NATIVE_CREATE_SG is false, fill the following properties with existing security group IDs:
EMR_NATIVE_MASTER_SG_ID
Security Group ID for EMR Master Node Group.
EMR_NATIVE_MASTER_SG_ID: "sg-xxxxxxx"
EMR_NATIVE_SLAVE_SG_ID
Security Group ID for EMR Slave Node Group.
EMR_NATIVE_SLAVE_SG_ID: "sg-xxxxxxx"
EMR_NATIVE_SERVICE_ACCESS_SG_ID
Security Group ID for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a private network.
EMR_NATIVE_SERVICE_ACCESS_SG_ID: "sg-xxxxxxx"
If EMR_NATIVE_CREATE_SG is true, fill the following properties with the names of the new security groups that will be added in emr-template.json:
EMR_NATIVE_SG_VPC_ID
VPC ID in which you want to create the EMR Cluster.
EMR_NATIVE_SG_VPC_ID: "vpc-xxxxxxxxxxx"
EMR_NATIVE_MASTER_SG_NAME
Security Group Name for EMR Master Node Group.
EMR_NATIVE_MASTER_SG_NAME: "priv-master-sg"
EMR_NATIVE_SLAVE_SG_NAME
Security Group Name for EMR Slave Node Group.
EMR_NATIVE_SLAVE_SG_NAME: "priv-slave-sg"
EMR_NATIVE_SERVICE_ACCESS_SG_NAME
Security Group Name for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a private network.
EMR_NATIVE_SERVICE_ACCESS_SG_NAME: "priv-private-sg"
EMR_NATIVE_SECURITY_CONFIG
Name of the security configuration for EMR. This can be an existing configuration, or Privacera Manager can generate a template from which a new configuration can be created. The template is available at ~/privacera/privacera-manager/output/emr-native-ranger/emr-native-sec-config-template.json after you run the Privacera Manager update command.
EMR_NATIVE_SECURITY_CONFIG: ""
Properties for EMR Hive Metastore
EMR_NATIVE_HIVE_METASTORE
Metastore type: internal, or hive (for an external Hive metastore).
EMR_NATIVE_HIVE_METASTORE: "hive"
EMR_NATIVE_HIVE_METASTORE_WAREHOUSE_PATH
S3 location for Hive metastore warehouse
EMR_NATIVE_HIVE_METASTORE_WAREHOUSE_PATH: "s3://hive-warehouse"
Fill the following properties if EMR_NATIVE_HIVE_METASTORE is hive:
EMR_NATIVE_METASTORE_CONNECTION_URL
JDBC Connection URL for connecting to Hive Metastore.
EMR_NATIVE_METASTORE_CONNECTION_URL: "jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true"
EMR_NATIVE_METASTORE_CONNECTION_DRIVER
JDBC driver class name.
EMR_NATIVE_METASTORE_CONNECTION_DRIVER: "org.mariadb.jdbc.Driver"
EMR_NATIVE_METASTORE_CONNECTION_USERNAME
JDBC username.
EMR_NATIVE_METASTORE_CONNECTION_USERNAME: "hive"
EMR_NATIVE_METASTORE_CONNECTION_PASSWORD
JDBC password.
EMR_NATIVE_METASTORE_CONNECTION_PASSWORD: "StRong@PassWord"
Properties of Kerberos Server
EMR_NATIVE_KDC_ADMIN_PASSWORD
The password used within the cluster for the kadmin service.
EMR_NATIVE_KDC_ADMIN_PASSWORD: ""
EMR_NATIVE_CROSS_REALM_PASSWORD
The cross-realm trust principal password, which must be identical across realms.
EMR_NATIVE_CROSS_REALM_PASSWORD: ""
EMR_NATIVE_KERB_TICKET_LIFETIME
The period, in hours, for which a Kerberos ticket issued by the cluster’s KDC is valid. Cluster applications and services auto-renew tickets after they expire.
EMR_NATIVE_KERB_TICKET_LIFETIME: 24
EMR_NATIVE_KERB_REALM
The Kerberos realm name for the other realm in the trust relationship.
EMR_NATIVE_KERB_REALM: ""
EMR_NATIVE_KERB_DOMAIN
The domain name of the other realm in the trust relationship.
EMR_NATIVE_KERB_DOMAIN: ""
EMR_NATIVE_KERB_ADMIN_SERVER
The fully qualified domain name (FQDN) and optional port for the Kerberos admin server in the other realm. If a port is not specified, 749 is used.
EMR_NATIVE_KERB_ADMIN_SERVER: ""
EMR_NATIVE_KERB_KDC_SERVER
The fully qualified domain name (FQDN) and optional port for the KDC in the other realm. If a port is not specified, 88 is used.
EMR_NATIVE_KERB_KDC_SERVER: ""
Properties of certificate secrets
EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN
Full ARN of the AWS Secrets Manager secret for the Ranger plugin key pair (for example, ranger-plugin-private-keypair), created in the Prerequisites section.
EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:99999999999:secret:ranger-plugin-key-pair-ixZbO2"
EMR_NATIVE_RANGER_ADMIN_SECRET_ARN
Full ARN of the AWS Secrets Manager secret for the Ranger Admin public certificate (for example, ranger-admin-pub-cert), created in the Prerequisites section.
EMR_NATIVE_RANGER_ADMIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:99999999999:secret:ranger-admin-public-cert-ixfCO5"
Properties of EMR application
EMR_NATIVE_APP_SPARK_ENABLE
Installs Spark application with EMR native Ranger plugin, if set to true.
EMR_NATIVE_APP_SPARK_ENABLE: "true"
EMR_NATIVE_APP_HIVE_ENABLE
Installs Hive application with EMR native Ranger plugin, if set to true.
EMR_NATIVE_APP_HIVE_ENABLE: "true"
EMR_NATIVE_APP_ZEPPELIN_ENABLE
Installs Zeppelin application, if set to true.
EMR_NATIVE_APP_ZEPPELIN_ENABLE: "true"
EMR_NATIVE_APP_LIVY_ENABLE
Installs Livy application, if set to true.
EMR_NATIVE_APP_LIVY_ENABLE: "true"
Properties of IAM Role Configuration
EMR_NATIVE_DEFAULT_ROLE
Default role attached to EMR cluster for performing cluster related activities. This should be an existing role.
EMR_NATIVE_DEFAULT_ROLE: "EMR_DefaultRole"
EMR_NATIVE_INSTANCE_ROLE
The IAM Role which will be attached to each node in the EMR Cluster. This should have only minimal permissions for basic EMR functionalities.
EMR_NATIVE_INSTANCE_ROLE: "restricted_instance_role"
EMR_NATIVE_DATA_ACCESS_ROLE
This role provides credentials for trusted execution engines, such as Apache Hive and the AWS EMR Record Server, to access AWS S3 data. Use this role only to access AWS S3 data, including any KMS keys if you are using S3 SSE-KMS.
EMR_NATIVE_DATA_ACCESS_ROLE: "emr_native_data_access_role"
EMR_NATIVE_USER_ACCESS_ROLE
This role provides users who are not trusted execution engines with credentials to interact with AWS services, if needed. Do not use this IAM role to allow access to AWS S3 data unless it is data that should be accessible to all users.
EMR_NATIVE_USER_ACCESS_ROLE: "emr_native_user_access_role"
Properties to send EMR Ranger Engines Audits to Solr
EMR_NATIVE_ENABLE_SOLR_AUDITS
Enable audits to Solr.
EMR_NATIVE_ENABLE_SOLR_AUDITS: "true"
AUDITSERVER_AUTH_TYPE
The EMR native Ranger audit framework does not support basic authentication, so it must be disabled. If this property already exists in vars.auditserver.yml, change it there.
AUDITSERVER_AUTH_TYPE: "none"
AUDITSERVER_SSL_ENABLE
EMR native Ranger does not support self-signed SSL certificates for Solr audits. If the AuditServer uses a self-signed certificate, disable AuditServer SSL.
AUDITSERVER_SSL_ENABLE: "false"
EMR_NATIVE_CLOUDWATCH_GROUPNAME
CloudWatch log group to which Ranger audits are pushed. This must be an existing log group.
EMR_NATIVE_CLOUDWATCH_GROUPNAME: "emr_privacera_native_logs"
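The properties above can also be set without opening vi. The following is a minimal sketch, assuming bash and a POSIX awk on the Privacera Manager host; set_pm_var is a hypothetical helper, not part of Privacera Manager. It replaces an existing `KEY:` line (or a commented-out `#KEY:` line) in a vars file, or appends the property if it is absent. Values are passed through ENVIRON so awk does not reinterpret backslashes. The value is always written in double quotes, matching most examples in the table; set unquoted values such as EMR_NATIVE_KERB_TICKET_LIFETIME by hand, and note that values containing a double quote are not escaped.

```shell
# Hypothetical helper (not part of Privacera Manager): set KEY: "VALUE" in a vars file.
# Replaces an existing or commented-out "#KEY:" line, otherwise appends the property.
set_pm_var() {
  local key="$1" value="$2" file="$3" tmp
  tmp="$(mktemp)" || return 1
  KEY="$key" VALUE="$value" awk '
    BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VALUE"]; found = 0 }
    # Match "KEY:" at the start of a line, optionally commented out with "#".
    $0 ~ ("^#?[ \t]*" k ":") { print k ": \"" v "\""; found = 1; next }
    { print }
    END { if (!found) print k ": \"" v "\"" }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back with cat so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example: `set_pm_var EMR_NATIVE_ENABLE true config/custom-vars/vars.emr.native.ranger.yml`. Because the target file is an argument, the same helper can set AUDITSERVER_AUTH_TYPE in vars.auditserver.yml.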
Note
You can also add custom properties that are not included by default. See EMR.
Run the following commands.
cd ~/privacera/privacera-manager
./privacera-manager.sh update
When the update completes, all the CloudFormation JSON template files are available in ~/privacera/privacera-manager/output/emr-native-ranger.
On the AWS instance where Privacera is installed, run the following command.
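Before continuing, it can help to confirm that the update produced the files used in the following steps. This is a minimal sketch: check_emr_native_output is a hypothetical helper, and the file names are the ones referenced on this page. It assumes all four files are generated; if your configuration does not use an optional template (for example, because you supply an existing security configuration), a missing file may be expected.

```shell
# Sketch: check that Privacera Manager generated the files used in the next steps.
# The file names are those referenced on this page; the default directory is the
# PM output path. check_emr_native_output is a hypothetical helper name.
check_emr_native_output() {
  local dir="${1:-$HOME/privacera/privacera-manager/output/emr-native-ranger}"
  local f missing=0
  for f in emr-native-create-certs.sh \
           emr-native-role-creation-template.json \
           emr-native-sec-config-template.json \
           emr-native-template.json; do
    # -s is true only if the file exists and is not empty.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "all expected files found in $dir"
  return "$missing"
}
```

Run it without arguments to check the default output directory, or pass a different directory as the first argument.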
cd ~/privacera/privacera-manager/output/emr-native-ranger
Create the certificates that need to be added to AWS Secrets Manager.
You will be prompted several times for the keystore password. At each prompt, enter the value of RANGER_PLUGIN_SSL_KEYSTORE_PASSWORD set in ~/privacera/privacera-manager/config/custom-vars/vars.ssl.yml. Run the following command:
./emr-native-create-certs.sh
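To look up the keystore password that emr-native-create-certs.sh prompts for, the value can be read from vars.ssl.yml. This is a minimal sketch assuming the property is written on a single line as `KEY: value`, optionally wrapped in single or double quotes; get_pm_var is a hypothetical helper, and commented-out lines and trailing comments are not handled.

```shell
# Hypothetical helper: print the value of KEY from a Privacera Manager vars file,
# removing one pair of surrounding single or double quotes. Only the first
# uncommented "KEY:" line is used.
get_pm_var() {
  KEY="$1" awk '
    BEGIN { k = ENVIRON["KEY"] ":"; q = sprintf("%c", 39) }
    index($0, k) == 1 {
      v = substr($0, length(k) + 1)
      sub(/^[ \t]+/, "", v); sub(/[ \t]+$/, "", v)
      first = substr(v, 1, 1); last = substr(v, length(v), 1)
      if (length(v) >= 2 && first == last && (first == "\"" || first == q))
        v = substr(v, 2, length(v) - 2)
      print v
      exit
    }
  ' "$2"
}
```

For example: `get_pm_var RANGER_PLUGIN_SSL_KEYSTORE_PASSWORD ~/privacera/privacera-manager/config/custom-vars/vars.ssl.yml`. This prints the password to the terminal, so avoid running it where the output is logged.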
This creates the following two files. Use their contents to update the two secrets created in the Prerequisites section:
ranger-admin-pub-cert.pem
ranger-plugin-keypair.pem
Display the contents of the ranger-admin-pub-cert.pem file.
cat ranger-admin-pub-cert.pem
Select the file contents and then right-click in the terminal to copy the contents.
Log in to the AWS console, navigate to Secrets Manager, and click ranger-admin-pub-cert.
In the Secret value section, go to Retrieve Secret Value > Edit > Plaintext.
Replace the secret value with the contents of ranger-admin-pub-cert.pem that you copied.
Similarly, display the contents of ranger-plugin-keypair.pem, copy them, and use them to replace the value of the ranger-plugin-private-keypair secret in AWS Secrets Manager.
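Instead of copying and pasting through the console, the secrets can also be updated with the AWS CLI. This is a minimal sketch assuming AWS CLI v2 with permission to call secretsmanager:PutSecretValue, that it is run from ~/privacera/privacera-manager/output/emr-native-ranger, and that the secrets use the example names from the Prerequisites section (adjust them if you chose different names). The helper names are hypothetical. The `file://` prefix tells the AWS CLI to read the parameter value from the local file.

```shell
# Sketch (hypothetical helpers): write the generated PEM files into the existing
# secrets. Assumes AWS CLI v2 with permission to call secretsmanager:PutSecretValue.
put_secret_from_file() {
  local region="$1" secret_id="$2" file="$3"
  # Refuse to upload anything that does not look like PEM content.
  if ! grep -q -- '-----BEGIN ' "$file" 2>/dev/null; then
    echo "not a PEM file: $file" >&2
    return 1
  fi
  # file:// makes the AWS CLI read the secret string from the local file.
  aws --region "$region" secretsmanager put-secret-value \
    --secret-id "$secret_id" \
    --secret-string "file://$file"
}

# Update both secrets with the files produced by emr-native-create-certs.sh.
update_ranger_secrets() {
  local region="$1"
  put_secret_from_file "$region" ranger-admin-pub-cert ranger-admin-pub-cert.pem &&
    put_secret_from_file "$region" ranger-plugin-private-keypair ranger-plugin-keypair.pem
}
```

For example: `update_ranger_secrets us-east-1`.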
(Optional) Create IAM roles using the emr-native-role-creation-template.json template.
aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-role-creation --template-body file://emr-native-role-creation-template.json --capabilities CAPABILITY_NAMED_IAM
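Once the roles exist, S3 access for Hive and Spark is granted through the role set in EMR_NATIVE_DATA_ACCESS_ROLE. The following is an illustrative sketch, not Privacera's required policy: it assumes AWS CLI v2 with permission to call iam:PutRoleAttachment is not needed, only iam:PutRolePolicy, and attaches a hypothetical inline policy allowing list, read, write, and delete on one bucket. The bucket name, the policy name pattern, and the helper names are assumptions; adjust the actions to your needs, and add KMS permissions if the bucket uses SSE-KMS.

```shell
# Sketch (hypothetical helpers): attach an inline S3 policy to the data access role.
# Assumes AWS CLI v2 with permission to call iam:PutRolePolicy. The actions below
# are an example only; narrow or extend them to match your data access needs.
s3_policy_json() {
  local bucket="$1"
  printf '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:ListBucket"],"Resource":"arn:aws:s3:::%s"},{"Effect":"Allow","Action":["s3:GetObject","s3:PutObject","s3:DeleteObject"],"Resource":"arn:aws:s3:::%s/*"}]}\n' "$bucket" "$bucket"
}

grant_data_access() {
  local role="$1" bucket="$2" policy_file rc
  policy_file="$(mktemp)" || return 1
  s3_policy_json "$bucket" > "$policy_file"
  # file:// makes the AWS CLI read the policy document from the temporary file.
  aws iam put-role-policy \
    --role-name "$role" \
    --policy-name "emr-native-s3-$bucket" \
    --policy-document "file://$policy_file"
  rc=$?
  rm -f "$policy_file"
  return "$rc"
}
```

For example: `grant_data_access emr_native_data_access_role my-data-bucket`, where my-data-bucket is a placeholder for your bucket name.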
Note
To give Apache Hive and Apache Spark access to data, navigate to IAM Management in the AWS console and add the required S3 policies to the role set in EMR_NATIVE_DATA_ACCESS_ROLE.
(Optional) Create security configurations using the emr-native-sec-config-template.json template.
aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-security-config-creation --template-body file://emr-native-sec-config-template.json
Create the EMR cluster using the emr-native-template.json template.
aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-creation --template-body file://emr-native-template.json
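Each create-stack command above returns as soon as CloudFormation accepts the request. The following minimal sketch, assuming AWS CLI v2, creates a stack and then blocks on `aws cloudformation wait stack-create-complete`, so that the EMR stack is created only after the optional role and security configuration stacks have finished (the cluster references both). The helper name create_stack_and_wait is hypothetical; any extra arguments, such as `--capabilities CAPABILITY_NAMED_IAM`, are passed through to create-stack.

```shell
# Sketch (hypothetical helper): create one CloudFormation stack and wait for it.
# Assumes AWS CLI v2 and that the template file is in the current directory.
create_stack_and_wait() {
  local region="$1" stack="$2" template="$3"
  shift 3
  # Remaining arguments (for example, --capabilities CAPABILITY_NAMED_IAM) pass through.
  aws --region "$region" cloudformation create-stack \
    --stack-name "$stack" \
    --template-body "file://$template" "$@" || return 1
  # Blocks until CREATE_COMPLETE; returns non-zero if creation fails or rolls back.
  aws --region "$region" cloudformation wait stack-create-complete --stack-name "$stack"
}
```

For example, run the three steps in order: `create_stack_and_wait <AWS_REGION> privacera-emr-native-role-creation emr-native-role-creation-template.json --capabilities CAPABILITY_NAMED_IAM`, then the same helper with privacera-emr-native-security-config-creation and emr-native-sec-config-template.json, and finally with privacera-emr-native-creation and emr-native-template.json.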