- PrivaceraCloud Release 4.5
EMR Native Ranger Integration with PrivaceraCloud
AWS EMR provides native Apache Ranger integration with the open source Apache Ranger plug-ins for Apache Spark and Hive. Connecting EMR’s plug-ins to PrivaceraCloud’s Ranger-based data access governance has the following advantages:
Enterprises can sync their existing policies with EMR.
Organizations can extend Apache Ranger’s open source capabilities to take advantage of Privacera’s centralized enterprise-ready solution.
Prerequisite
Connect the Elastic MapReduce from Amazon and EMRFS S3 applications in your PrivaceraCloud portal.
Configuration
Certificate setup in Secrets Manager
AWS EMR native Ranger integration requires mutual TLS between the Ranger plug-ins and the Privacera Ranger Admin. The TLS certificates must be stored in AWS Secrets Manager and referenced in an EMR security configuration. Perform the following steps:
Create two secrets in AWS Secrets Manager:
Ranger Admin Public Cert
Log in to the AWS Console, navigate to Secrets Manager, and click Store a new secret.
Select Other type of secret as the secret type, and then go to the Plaintext tab.
In your PrivaceraCloud account, navigate to Settings > ApiKey > AWS EMR Native Ranger Plugin > Ranger Admin Public Cert and click Download Certificate.
Paste the contents of this certificate into the Plaintext tab.
Select the encryption key as required.
Click Next and enter the secret name, for example ranger-admin-pub-cert.
Click Next. The Configure automatic rotation page is displayed. No action required.
Click Next.
Review Secret details and click Store.
The Secret is stored successfully.
Ranger Client KeyPair
Log in to the AWS Console, navigate to Secrets Manager, and click Store a new secret.
Select Other type of secret as the secret type, and then go to the Plaintext tab.
In your PrivaceraCloud account, navigate to Settings > ApiKey > AWS EMR Native Ranger Plugin > Ranger Client KeyPair and click Download Certificate.
Paste the contents of this certificate into the Plaintext tab.
Select the encryption key as required.
Click Next and enter the secret name, for example ranger-plugin-key-cert.
Click Next. The Configure automatic rotation page is displayed. No action required.
Click Next.
Review Secret details and click Store.
The Secret is stored successfully.
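If you prefer to script these two steps, the following Python sketch assembles both create-secret requests from the certificate files downloaded from PrivaceraCloud. It is a minimal sketch under stated assumptions: the local file paths are hypothetical, the PEM check only looks for BEGIN/END markers, and the dictionaries are shaped for the `create_secret(Name=..., SecretString=...)` call of the boto3 Secrets Manager client, which the sketch itself does not invoke.

```python
from __future__ import annotations

from pathlib import Path

# Example secret names from the steps above.
ADMIN_CERT_SECRET = "ranger-admin-pub-cert"
PLUGIN_KEYPAIR_SECRET = "ranger-plugin-key-cert"


def looks_like_pem(text: str) -> bool:
    """Rough check (assumption): PEM data contains BEGIN and END markers."""
    return "-----BEGIN " in text and "-----END " in text


def build_secret_request(name: str, pem_text: str) -> dict:
    """Return keyword arguments for a Secrets Manager create_secret call.

    The PEM text is stored as a plaintext secret, like the console's
    "Other type of secret" > Plaintext option.
    """
    if not looks_like_pem(pem_text):
        raise ValueError(f"content for {name!r} does not look like PEM data")
    return {"Name": name, "SecretString": pem_text}


def build_all_requests(admin_cert_file: Path, plugin_keypair_file: Path) -> list[dict]:
    """Read the two downloaded files (hypothetical local paths) and build both requests."""
    return [
        build_secret_request(ADMIN_CERT_SECRET, admin_cert_file.read_text()),
        build_secret_request(PLUGIN_KEYPAIR_SECRET, plugin_keypair_file.read_text()),
    ]
```

With boto3 installed and AWS credentials configured, each dictionary could be passed as `boto3.client("secretsmanager").create_secret(**request)`. The ARN returned for each secret is the value the later templates expect as RangerAdminPublicSecretArn and RangerPluginKeyPairSecretArn.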
IAM roles setup
Recommended CloudFormation setup
The following three IAM roles must be created before you launch the cluster:
A custom Amazon EC2 instance profile for Amazon EMR, attached to all cluster nodes: EmrNativePrivaceraInstanceRole.
An IAM role for the Apache Ranger engines, used for data access from S3: EmrNativePrivaceraDataAccessRole.
An IAM role for other AWS services, used to attach any other permissions that users on the EMR cluster require: EmrNativePrivaceraUserAccessRole.
You can create these roles with the minimal required permissions using the following CloudFormation template. Modify the template as required; for example, replace examplebucket in the DataAccessPolicy resource with the S3 buckets that the Ranger plug-ins should access.
Sample CloudFormation template:
{ "AWSTemplateFormatVersion": "2010-09-09", "Description": "Create roles and policies for use by Emr-Native Ranger with Privacera", "Parameters": { "EmrNativePrivaceraInstanceRole": { "Description": "IAM Role which will be attached to all Instances in the cluster. Should have minimal permissions. e.g. emr_native_privacera_restricted_instance_role", "Type": "String", "Default": "emr_native_privacera_restricted_instance_role" }, "EmrNativePrivaceraDataAccessRole": { "Description": "IAM Role which will be used by EMR Applications for accessing actual S3 Data. e.g. emr_native_privacera_data_access_role", "Type": "String", "Default": "emr_native_privacera_data_access_role" }, "EmrNativePrivaceraUserAccessRole": { "Description": "IAM Role which allows users to interact with AWS Services. Shouldn't be used to access s3 Data. e.g. emr_native_privacera_user_access_role", "Type": "String", "Default": "emr_native_privacera_user_access_role" }, "RangerPluginKeyPairSecretArn": { "Description": "Full ARN of secret [stored in AWS Secrets Manager] for ranger plugin key-pair. e.g. arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-plugin-cert-k4xsLM", "Type": "String", "Default": "" }, "RangerAdminPublicSecretArn": { "Description": "Full ARN of secret [stored in AWS Secrets Manager] for ranger admin public cert. e.g. arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-admin-cert-3W5Zdt", "Type": "String", "Default": "" }, "Region": { "Description": "AWS Region where cluster will be created. e.g. us-east-1", "Type": "String", "Default": "us-east-1" }, "CloudwatchLogGroupName": { "Description": "CloudWatch Log group name which will be used to store RangerAudits. This should be an existing one e.g. emr_native_privacera_audits", "Type": "String", "Default": "" }, "AwsAcctId": { "Description": "Account ID of your Amazon Account. e.g. 999999999999", "Type": "String", "Default": "" }, "LogsBucketS3": { "Description": "S3 path to store emr logs (without the protocol). e.g. privacera-logs/emr-native-logs", "Type": "String", "Default": "" } }, "Resources": { "EmrPrivaceraInstanceRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": { "Fn::Sub": "${EmrNativePrivaceraInstanceRole}" }, "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": [ "ec2.amazonaws.com" ] }, "Action": [ "sts:AssumeRole" ] } ] }, "Path": "/" } }, "EmrPrivaceraInstancePolicy": { "Type": "AWS::IAM::Policy", "Properties": { "PolicyName": { "Fn::Join": [ "", [ "emr_native_privacera_instance_policy" ] ] }, "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Sid": "EmrServiceLimited", "Effect": "Allow", "Resource": "*", "Action": [ "ec2:Describe*", "elasticmapreduce:Describe*", "elasticmapreduce:ListBootstrapActions", "elasticmapreduce:ListClusters", "elasticmapreduce:ListInstanceGroups", "elasticmapreduce:ListInstances", "elasticmapreduce:ListSteps", "glue:CreateDatabase", "glue:UpdateDatabase", "glue:DeleteDatabase", "glue:GetDatabase", "glue:GetDatabases", "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable", "glue:GetTable", "glue:GetTables", "glue:GetTableVersions", "glue:CreatePartition", "glue:BatchCreatePartition", "glue:UpdatePartition", "glue:DeletePartition", "glue:BatchDeletePartition", "glue:GetPartition", "glue:GetPartitions", "glue:BatchGetPartition", "glue:CreateUserDefinedFunction", "glue:UpdateUserDefinedFunction", "glue:DeleteUserDefinedFunction", "glue:GetUserDefinedFunction", "glue:GetUserDefinedFunctions" ] }, { "Sid": "EmrS3Limited", "Effect": "Allow", "Action": "s3:*", "Resource": [ "arn:aws:s3:::*.elasticmapreduce/*", "arn:aws:s3:::elasticmapreduce/*", "arn:aws:s3:::elasticmapreduce", { "Fn::Sub": "arn:aws:s3:::${LogsBucketS3}" }, { "Fn::Sub": "arn:aws:s3:::${LogsBucketS3}/*" } ] }, { "Sid": "AllowAssumeOfRolesAndTagging", "Effect": "Allow", "Action": [ "sts:TagSession", "sts:AssumeRole" ], "Resource": [ { "Fn::Sub": 
"arn:aws:iam::${AwsAcctId}:role/${EmrNativePrivaceraDataAccessRole}" }, { "Fn::Sub": "arn:aws:iam::${AwsAcctId}:role/${EmrNativePrivaceraUserAccessRole}" } ] }, { "Sid": "AllowSecretsRetrieval", "Effect": "Allow", "Action": "secretsmanager:GetSecretValue", "Resource": [ { "Fn::Sub": "${RangerPluginKeyPairSecretArn}" }, { "Fn::Sub": "${RangerAdminPublicSecretArn}" } ] } ] }, "Roles": [ { "Ref": "EmrPrivaceraInstanceRole" } ] } }, "EmrPrivaceraUserAccessRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": { "Fn::Sub": "${EmrNativePrivaceraUserAccessRole}" }, "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": [ "ec2.amazonaws.com" ] }, "Action": [ "sts:AssumeRole" ] }, { "Effect": "Allow", "Principal": { "AWS": { "Fn::GetAtt": [ "EmrPrivaceraInstanceRole", "Arn" ] } }, "Action": "sts:AssumeRole" } ] }, "Path": "/" } }, "EmrPrivaceraDataAccessRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": { "Fn::Sub": "${EmrNativePrivaceraDataAccessRole}" }, "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": [ "ec2.amazonaws.com" ] }, "Action": [ "sts:AssumeRole" ] }, { "Effect": "Allow", "Principal": { "AWS": { "Fn::GetAtt": [ "EmrPrivaceraInstanceRole", "Arn" ] } }, "Action": "sts:AssumeRole" } ] }, "Path": "/" } }, "DataAccessPolicy": { "Type": "AWS::IAM::Policy", "Properties": { "PolicyName": { "Fn::Join": [ "", [ "emr_native_privacera_data_access_policy" ] ] }, "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Sid": "CloudwatchLogsPermissions", "Action": [ "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents" ], "Effect": "Allow", "Resource": [ { "Fn::Sub": "arn:aws:logs:${Region}:${AwsAcctId}:log-group:${CloudwatchLogGroupName}:*" } ] }, { "Sid": "BucketPermissionsInS3Buckets", "Action": [ "s3:CreateBucket", "s3:DeleteBucket", "s3:ListAllMyBuckets", "s3:ListBucket" ], "Effect": "Allow", "Resource": 
[ "arn:aws:s3:::examplebucket" ] }, { "Sid": "ObjectPermissionsInS3Objects", "Action": [ "s3:GetObject", "s3:DeleteObject", "s3:PutObject" ], "Effect": "Allow", "Resource": [ "arn:aws:s3:::examplebucket/*" ] } ] }, "Roles": [ { "Ref": "EmrPrivaceraDataAccessRole" } ] } }, "EmrPrivaceraInstanceProfile": { "Type": "AWS::IAM::InstanceProfile", "Properties": { "InstanceProfileName": { "Ref": "EmrNativePrivaceraInstanceRole" }, "Roles": [ { "Ref": "EmrPrivaceraInstanceRole" } ] } }, "EmrNativePrivaceraDataAccessProfile": { "Type": "AWS::IAM::InstanceProfile", "Properties": { "InstanceProfileName": { "Ref": "EmrNativePrivaceraDataAccessRole" }, "Roles": [ { "Ref": "EmrPrivaceraDataAccessRole" } ] } }, "EmrNativePrivaceraUserAccessProfile": { "Type": "AWS::IAM::InstanceProfile", "Properties": { "InstanceProfileName": { "Ref": "EmrNativePrivaceraUserAccessRole" }, "Roles": [ { "Ref": "EmrPrivaceraUserAccessRole" } ] } } }, "Outputs": { "EmrNativePrivaceraInstanceRole": { "Value": { "Ref": "EmrPrivaceraInstanceRole" } }, "EmrNativePrivaceraDataAccessRole": { "Value": { "Ref": "EmrPrivaceraDataAccessRole" } }, "EmrNativePrivaceraUserAccessRole": { "Value": { "Ref": "EmrPrivaceraUserAccessRole" } } } }
For details on creating a stack from a CloudFormation template, see Create CloudFormation Stack.
After the stack is created successfully, you will have three IAM roles. Use the EmrNativePrivaceraDataAccessRole IAM role to give the Apache Ranger services access to S3 data.
For detailed information, see IAM roles for native integration with Apache Ranger.
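As an alternative to the console, the stack could be created programmatically. The Python sketch below only assembles and sanity-checks the request; it assumes the boto3 CloudFormation `create_stack` call shape (StackName, TemplateBody, Parameters, Capabilities) and the standard `aws` ARN partition, and the function names are hypothetical. Because the template sets explicit role and instance-profile names, CloudFormation requires the CAPABILITY_NAMED_IAM acknowledgement.

```python
from __future__ import annotations

import re

# Parameter names taken from the IAM-roles CloudFormation template above.
IAM_TEMPLATE_PARAMS = (
    "EmrNativePrivaceraInstanceRole",
    "EmrNativePrivaceraDataAccessRole",
    "EmrNativePrivaceraUserAccessRole",
    "RangerPluginKeyPairSecretArn",
    "RangerAdminPublicSecretArn",
    "Region",
    "CloudwatchLogGroupName",
    "AwsAcctId",
    "LogsBucketS3",
)

# Assumes the standard "aws" partition.
SECRET_ARN_RE = re.compile(r"^arn:aws:secretsmanager:[a-z0-9-]+:\d{12}:secret:.+$")


def build_stack_request(stack_name: str, template_body: str, values: dict[str, str]) -> dict:
    """Validate parameter values and return keyword arguments for create_stack."""
    unknown = set(values) - set(IAM_TEMPLATE_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not re.fullmatch(r"\d{12}", values.get("AwsAcctId", "")):
        raise ValueError("AwsAcctId must be a 12-digit AWS account ID")
    for key in ("RangerPluginKeyPairSecretArn", "RangerAdminPublicSecretArn"):
        if not SECRET_ARN_RE.match(values.get(key, "")):
            raise ValueError(f"{key} must be a full Secrets Manager ARN")
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()],
        # The template names its IAM resources explicitly, which needs CAPABILITY_NAMED_IAM.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def role_arn(account_id: str, role_name: str) -> str:
    """Role ARN in the same form as the template's Fn::Sub expressions."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"
```

Parameters that are omitted fall back to the template defaults; the secret ARNs and account ID have no usable default, which is why the sketch insists on them.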
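Manually set up IAM roles
Create the following three IAM roles with the same trust relationships and permissions as the corresponding roles in the CloudFormation template above:
EMR_RS_INSTANCE_ROLE: the EC2 instance profile attached to all cluster nodes (corresponds to EmrNativePrivaceraInstanceRole).
EMR_RS_DATA_ACCESS_ROLE: the role the Apache Ranger engines use to access S3 data (corresponds to EmrNativePrivaceraDataAccessRole).
EMR_RS_USER_ACCESS_ROLE: the role for other AWS services (corresponds to EmrNativePrivaceraUserAccessRole).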
Create security configurations
Recommended CloudFormation setup
Create a new security configuration containing the Kerberos server and Ranger integration details. It will be attached to the EMR cluster.
You can create it using the following CloudFormation template.
This template assumes a cluster-dedicated KDC with cross-realm trust enabled.
You can modify the CloudFormation template based on your requirements.
Keep the values of parameters shared with the previous setup steps the same.
Sample CloudFormation template:
{ "AWSTemplateFormatVersion": "2010-09-09", "Description": "Create Security Configuration for use by Privacera-Protected EMR clusters", "Parameters": { "EmrNativePrivaceraSecConfName": { "Description": "Name to be given for the Security Configuration. e.g. emr_native_privacera_sec_conf", "Type": "String", "Default": "emr_native_privacera_sec_conf" }, "EmrNativePrivaceraDataAccessRole": { "Description": "IAM Role which will be used by EMR Applications for accessing actual S3 Data. e.g. emr_native_privacera_data_access_role", "Type": "String", "Default": "emr_native_privacera_data_access_role" }, "EmrNativePrivaceraUserAccessRole": { "Description": "IAM Role which allows users to interact with AWS Services. Shouldn't be used to access s3 Data. e.g. emr_native_privacera_user_access_role", "Type": "String", "Default": "emr_native_privacera_user_access_role" }, "RangerPluginKeyPairSecretArn": { "Description": "Full ARN of secret [stored in AWS Secrets Manager] for ranger plugin key-pair. e.g. arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-plugin-cert-k4xsLM", "Type": "String", "Default": "" }, "RangerAdminPublicSecretArn": { "Description": "Full ARN of secret [stored in AWS Secrets Manager] for ranger admin public cert. e.g. arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-admin-cert-3W5Zdt", "Type": "String", "Default": "" }, "Region": { "Description": "AWS Region where cluster will be created. e.g. us-east-1", "Type": "String", "Default": "" }, "CloudwatchLogGroupName": { "Description": "CloudWatch Log group name which will be used to store RangerAudits. This should be an existing one e.g. emr_native_privacera_audits", "Type": "String", "Default": "" }, "AwsAcctId": { "Description": "Account ID of your Amazon Account. e.g. 999999999999", "Type": "String", "Default": "" }, "EmrNativeRangerAdminUrl": { "Description": "Get from--> PCloud Portal >> Access Manager >> Settings >> ApiKey >> Click Info Icon >> AWS EMR Native Ranger Plugin Section >> Ranger Admin mTLS URL >> Copy URL. e.g. https://api-mtls.privaceracloud.com/api/<api-key>", "Type": "String", "Default": "https://api-mtls.privaceracloud.com/api/<api-key>" }, "HiveRepoName": { "Description": "Hive Repo Name in RangerAdmin", "Type": "String", "Default": "privacera_hive" }, "EmrfsRepoName": { "Description": "EMRFS-S3 Repo Name in RangerAdmin", "Type": "String", "Default": "privacera_emrfs_s3" }, "KerberosTicketLifetime": { "Description": "The period for which a Kerberos ticket issued by the cluster’s KDC is valid. Cluster applications and services auto-renew tickets after they expire", "Type": "Number", "Default": 24 }, "KerberosAdminServer": { "Description": "The fully qualified domain name (FQDN) and optional port for the Kerberos admin server in the other realm. If a port is not specified, 749 is used", "Type": "String", "Default": "" }, "KerberosDomain": { "Description": "The domain name of the other realm in the trust relationship", "Type": "String", "Default": "" }, "KDCServer": { "Description": "The fully qualified domain name (FQDN) and optional port for the KDC in the other realm. If a port is not specified, 88 is used", "Type": "String", "Default": "" }, "KerberosRealm": { "Description": "The Kerberos realm name for the other realm in the trust relationship", "Type": "String", "Default": "" } }, "Resources": { "SecurityConfiguration": { "Type": "AWS::EMR::SecurityConfiguration", "Properties": { "Name": { "Fn::Sub": "${EmrNativePrivaceraSecConfName}" }, "SecurityConfiguration": { "AuthorizationConfiguration": { "RangerConfiguration": { "AdminServerURL": { "Fn::Sub": "${EmrNativeRangerAdminUrl}" }, "RoleForRangerPluginsARN": { "Fn::Sub": "arn:aws:iam::${AwsAcctId}:role/${EmrNativePrivaceraDataAccessRole}" }, "RoleForOtherAWSServicesARN": { "Fn::Sub": "arn:aws:iam::${AwsAcctId}:role/${EmrNativePrivaceraUserAccessRole}" }, "AdminServerSecretARN": { "Fn::Sub": "${RangerAdminPublicSecretArn}" }, "RangerPluginConfigurations": [ { "App": "Spark", "ClientSecretARN": { "Fn::Sub": "${RangerPluginKeyPairSecretArn}" }, "PolicyRepositoryName": { "Fn::Sub": "${HiveRepoName}" } }, { "App": "Hive", "ClientSecretARN": { "Fn::Sub": "${RangerPluginKeyPairSecretArn}" }, "PolicyRepositoryName": { "Fn::Sub": "${HiveRepoName}" } }, { "App": "EMRFS-S3", "ClientSecretARN": { "Fn::Sub": "${RangerPluginKeyPairSecretArn}" }, "PolicyRepositoryName": { "Fn::Sub": "${EmrfsRepoName}" } } ], "AuditConfiguration": { "Destinations": { "AmazonCloudWatchLogs": { "CloudWatchLogGroup": { "Fn::Sub": "arn:aws:logs:${Region}:${AwsAcctId}:log-group:${CloudwatchLogGroupName}:*" } } } } } }, "AuthenticationConfiguration": { "KerberosConfiguration": { "Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": { "TicketLifetimeInHours": { "Ref": "KerberosTicketLifetime" }, "CrossRealmTrustConfiguration": { "AdminServer": { "Fn::Sub": "${KerberosAdminServer}" }, "Domain": { "Fn::Sub": "${KerberosDomain}" }, "KdcServer": { "Fn::Sub": "${KDCServer}" }, "Realm": { "Fn::Sub": "${KerberosRealm}" } } } } } } } } } }
For details on creating a stack from a CloudFormation template, see Create CloudFormation Stack.
Manually set up security configurations
Log in to the AWS Console and navigate to EMR Console > Security Configuration (left panel) > Create New Security Configuration.
Enter the security configuration name, for example EMR_NATIVE_WITH_PLCOUD.
In the Authentication section, select the Enable Kerberos authentication checkbox and enter the Kerberos environment details.
In the Authorization section, select Enable integration with Apache Ranger for fine-grained access control and enter the following details:
IAM role for Apache Ranger: EMR_RS_DATA_ACCESS_ROLE (created during IAM roles setup).
IAM role for other AWS Services: EMR_RS_USER_ACCESS_ROLE (created during IAM roles setup).
Ranger Policy Manager: In your PrivaceraCloud account, go to Settings > ApiKey > AWS EMR Native Ranger > Ranger Admin mTLS URL, click Copy URL, and paste the URL here.
Admin PEM secret: Choose ranger-admin-pub-cert using drop-down.
EMRFS client PEM secret: Choose ranger-plugin-key-cert using drop-down.
EMRFS policy repository: privacera_emrfs_s3
Spark configurations: Select this option if you want to enable the Spark application.
Spark client PEM secret: Choose ranger-plugin-key-cert from the drop-down.
Spark policy repository: privacera_hive
Hive configurations: Select this option if you want to enable the Hive application.
Hive client PEM secret: Choose ranger-plugin-key-cert from the drop-down.
Hive policy repository: privacera_hive
CloudWatch Log Group: Select a CloudWatch log group for pushing audits, if required. Note: EMR_RS_DATA_ACCESS_ROLE must have permission to create log streams and put log events in this log group (configured during IAM roles setup).
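The console fields above correspond to the JSON document that EMR stores for a security configuration, and the CloudFormation template uses the same keys. The following Python sketch builds the Ranger part of that document with the standard library only. It is an illustration under assumptions: the function and parameter names are hypothetical, the ARNs you pass in are your own, and because the steps above also enable Kerberos, you would pass the same KerberosConfiguration block the template uses through `kerberos_config`.

```python
from __future__ import annotations

import json


def build_security_configuration(
    admin_url: str,
    data_access_role_arn: str,
    user_access_role_arn: str,
    admin_secret_arn: str,
    plugin_secret_arn: str,
    log_group_arn: str,
    kerberos_config: dict | None = None,
    hive_repo: str = "privacera_hive",
    emrfs_repo: str = "privacera_emrfs_s3",
    apps: tuple[str, ...] = ("Spark", "Hive", "EMRFS-S3"),
) -> str:
    """Return an EMR security configuration document as a JSON string."""
    # Spark and Hive share the Hive policy repository; EMRFS-S3 has its own.
    repo_for = {"Spark": hive_repo, "Hive": hive_repo, "EMRFS-S3": emrfs_repo}
    plugins = [
        {"App": app, "ClientSecretARN": plugin_secret_arn, "PolicyRepositoryName": repo_for[app]}
        for app in apps
    ]
    doc: dict = {
        "AuthorizationConfiguration": {
            "RangerConfiguration": {
                "AdminServerURL": admin_url,
                "RoleForRangerPluginsARN": data_access_role_arn,
                "RoleForOtherAWSServicesARN": user_access_role_arn,
                "AdminServerSecretARN": admin_secret_arn,
                "RangerPluginConfigurations": plugins,
                "AuditConfiguration": {
                    "Destinations": {
                        "AmazonCloudWatchLogs": {"CloudWatchLogGroup": log_group_arn}
                    }
                },
            }
        }
    }
    if kerberos_config is not None:
        # Passed through unchanged, e.g. the ClusterDedicatedKdc block from the template.
        doc["AuthenticationConfiguration"] = {"KerberosConfiguration": kerberos_config}
    return json.dumps(doc, indent=2)
```

The resulting string could be saved to a file and passed to `aws emr create-security-configuration --name EMR_NATIVE_WITH_PLCOUD --security-configuration file://sec-conf.json`, where the file name is only an example.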
Create EMR cluster
Recommended CloudFormation setup
You can use the following CloudFormation template to create the EMR cluster. Modify the template as required.
Keep the values of parameters shared with the previous setup steps the same.
Sample CloudFormation template:
{ "AWSTemplateFormatVersion": "2010-09-09", "Description": "Create EMR Cluster - Native Ranger Integration with Privacera", "Parameters": { "ClusterName": { "Description": "Name of the emr cluster", "Type": "String", "Default": "Privacera-EMR-Native-Ranger" }, "EMRVersion": { "Description": "EMR Native Ranger integration is supported from 5.32 onwards. e.g. emr-5.32.0, emr-5.33.0, etc.", "Type": "String", "Default": "emr-5.32.0" }, "MasterSecurityGroup": { "Description": "Security Group ID for EMR Master Node Group. e.g. sg-xxxxxxx", "Type": "String", "Default": "" }, "SlaveSecurityGroup": { "Description": "Security Group ID for EMR Slave Node Group. e.g. sg-xxxxxxx", "Type": "String", "Default": "" }, "ServiceAccessSecurityGroup": { "Description": "Security Group ID for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a Private Network. e.g. sg-xxxxxxx", "Type": "String", "Default": "" }, "NodeSubnetId": { "Description": "Subnet id for the cluster nodes. e.g. subnet-xxxx", "Type": "String", "Default": "" }, "SecurityConfig": { "Description": "SecurityConfiguration name that will be attached to the EMR Cluster. e.g. emr_native_privacera_sec_conf", "Type": "String", "Default": "emr_native_privacera_sec_conf" }, "HiveMetaStoreWarehouseS3Path": { "Description": "Hive metastore warehouse s3 path. e.g. s3://hive-warehouse/data", "Type": "String", "Default": "" }, "NodeKeyPair": { "Description": "An existing EC2 key pair to SSH into the node of cluster. e.g. privacera-test-pair", "Type": "String", "Default": "" }, "NodeMarketType": { "Description": "Node Instance market type. e.g. SPOT, ON_DEMAND", "Type": "String", "Default": "" }, "KdcAdminPassword": { "Description": "The password used within the cluster for the kadmin service.", "Type": "String", "Default": "" }, "CrossRealmTrustPrincipalPassword": { "Description": "The cross-realm trust principal password, which must be identical across realms.", "Type": "String", "Default": "" }, "RangerAuditsSetupScriptUrl": { "Description": "Get from--> PCloud Portal >> Access Manager >> Settings >> ApiKey >> Click Info Icon >> AWS EMR Native Ranger Plugin Section >> Ranger Audit Setup Script >> Copy URL", "Type": "String", "Default": "" }, "EmrMasterNodeCount": { "Description": "Node count for Master. e.g. 1", "Type": "Number", "Default": 1 }, "EmrCoreNodeCount": { "Description": "Node count for Core. e.g. 1", "Type": "Number", "Default": 1 }, "EmrNodeInstanceType": { "Description": "e.g. m5.large, m5.2xlarge, r5.xlarge, etc.", "Type": "String", "Default": "" }, "EmrTerminationProtection": { "Description": "To enable termination protection. Can be true/false", "Type": "String", "Default": "true" }, "EmrLogsPath": { "Description": "S3 location for emr logs storage. e.g. s3://privacera-emr/logs", "Type": "String", "Default": "" }, "EmrNativePrivaceraInstanceRole": { "Description": "IAM Role which will be attached to all Instances in the Cluster. Should have minimal permissions. e.g. emr_native_privacera_restricted_instance_role", "Type": "String", "Default": "emr_native_privacera_restricted_instance_role" }, "EmrDefaultRole": { "Description": "Default role attached to EMR Cluster for performing cluster related activities. This should be a pre-created one. e.g. EMR_DefaultRole", "Type": "String", "Default": "EMR_DefaultRole" }, "EmrHiveMetastoreConnectionUrl": { "Description": "JDBC Connection URL for connecting to hive. e.g. jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true", "Type": "String", "Default": "" }, "EmrHiveMetastoreConnectionDriver": { "Description": "JDBC Driver Name. e.g. org.mariadb.jdbc.Driver", "Type": "String", "Default": "" }, "EmrHiveMetastoreConnectionUsername": { "Description": "JDBC UserName", "Type": "String", "Default": "" }, "EmrHiveMetastoreConnectionPassword": { "Description": "JDBC Password", "Type": "String", "Default": "" } }, "Resources": { "EMRCLUSTER": { "Type": "AWS::EMR::Cluster", "Properties": { "Name": { "Ref": "ClusterName" }, "KerberosAttributes": { "Realm": "EC2.INTERNAL", "KdcAdminPassword": { "Ref": "KdcAdminPassword" }, "CrossRealmTrustPrincipalPassword": { "Ref": "CrossRealmTrustPrincipalPassword" } }, "SecurityConfiguration": { "Ref": "SecurityConfig" }, "VisibleToAllUsers": true, "EbsRootVolumeSize": 15, "Instances": { "MasterInstanceGroup": { "InstanceCount": { "Ref": "EmrMasterNodeCount" }, "InstanceType": { "Fn::Sub": "${EmrNodeInstanceType}" }, "Market": { "Fn::Sub": "${NodeMarketType}" }, "Name": "Master Instance Group" }, "CoreInstanceGroup": { "InstanceCount": { "Ref": "EmrCoreNodeCount" }, "InstanceType": { "Fn::Sub": "${EmrNodeInstanceType}" }, "Market": { "Fn::Sub": "${NodeMarketType}" }, "Name": "Core Instance Group" }, "Ec2KeyName": { "Ref": "NodeKeyPair" }, "EmrManagedSlaveSecurityGroup": { "Fn::Sub": "${SlaveSecurityGroup}" }, "EmrManagedMasterSecurityGroup": { "Fn::Sub": "${MasterSecurityGroup}" }, "ServiceAccessSecurityGroup": { "Fn::Sub": "${ServiceAccessSecurityGroup}" }, "Ec2SubnetId": { "Fn::Sub": "${NodeSubnetId}" }, "TerminationProtected": { "Fn::Sub": "${EmrTerminationProtection}" } }, "BootstrapActions": [ { "Name": "Configure Ranger Audits for Master Node", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=true" }, { "Fn::Sub": "wget ${RangerAuditsSetupScriptUrl}; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh" } ] } }, { "Name": "Configure Ranger Audits for Worker Nodes", "ScriptBootstrapAction": { "Path": "s3://elasticmapreduce/bootstrap-actions/run-if", "Args": [ { "Fn::Sub": "instance.isMaster=false" }, { "Fn::Sub": "wget ${RangerAuditsSetupScriptUrl}; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh" } ] } } ], "Applications": [ { "Name": "Hive" }, { "Name": "Spark" }, { "Name": "Zeppelin" }, { "Name": "Livy" }, { "Name": "Hue" } ], "Configurations": [ { "Classification": "spark", "ConfigurationProperties": { "maximizeResourceAllocation": "true" }, "Configurations": [] }, { "Classification": "spark-hive-site", "ConfigurationProperties": { "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreWarehouseS3Path" } } }, { "Classification": "hive-site", "ConfigurationProperties": { "javax.jdo.option.ConnectionURL": { "Fn::Sub": "${EmrHiveMetastoreConnectionUrl}" }, "javax.jdo.option.ConnectionDriverName": { "Fn::Sub": "${EmrHiveMetastoreConnectionDriver}" }, "javax.jdo.option.ConnectionUserName": { "Fn::Sub": "${EmrHiveMetastoreConnectionUsername}" }, "javax.jdo.option.ConnectionPassword": { "Fn::Sub": "${EmrHiveMetastoreConnectionPassword}" }, "hive.metastore.warehouse.dir": { "Ref": "HiveMetaStoreWarehouseS3Path" } } } ], "LogUri": { "Fn::Sub": "${EmrLogsPath}" }, "JobFlowRole": { "Fn::Sub": "${EmrNativePrivaceraInstanceRole}" }, "ServiceRole": { "Fn::Sub": "${EmrDefaultRole}" }, "ReleaseLabel": { "Fn::Sub": "${EMRVersion}" } } } } }
For details on creating a stack from a CloudFormation template, see Create CloudFormation Stack.
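Before creating the cluster stack, it can help to check that the shared values line up. The sketch below is hypothetical helper code, not part of PrivaceraCloud: it encodes only the lower bound stated in the template's EMRVersion description (5.32) and checks that SecurityConfig matches the name of the security configuration created in the previous step. Consult the AWS documentation for the exact set of EMR releases that support the Ranger integration.

```python
from __future__ import annotations

import re

# Lower bound taken from the EMRVersion description in the cluster template.
MIN_EMR_RELEASE = (5, 32)


def parse_release_label(label: str) -> tuple[int, int, int]:
    """Parse an EMR release label such as 'emr-5.32.0' into (5, 32, 0)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected release label: {label!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def check_cluster_params(params: dict[str, str], security_conf_name: str) -> list[str]:
    """Return human-readable problems with the cluster parameters (empty list if none)."""
    problems: list[str] = []
    major, minor, _ = parse_release_label(params.get("EMRVersion", ""))
    if (major, minor) < MIN_EMR_RELEASE:
        problems.append(f"EMRVersion {params['EMRVersion']} is older than emr-5.32")
    if params.get("SecurityConfig") != security_conf_name:
        problems.append("SecurityConfig does not match the security configuration name")
    if not params.get("RangerAuditsSetupScriptUrl"):
        problems.append("RangerAuditsSetupScriptUrl is empty; copy it from PrivaceraCloud")
    return problems
```

Comparing (major, minor) tuples keeps the version check numeric, so a label such as emr-5.9.0 is correctly treated as older than emr-5.32.0.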
Manually set up the EMR cluster
Log in to the AWS Console, navigate to the EMR service, and click Create cluster.
Click the Go to advanced options link.
Under the Software Configuration:
Select Release Version.
Select additional applications as required for your environment.
If you select the Hive or Spark applications, you must also select HCatalog.
Under Edit software settings, select Enter configuration and add the following text if you want to use an external Hive metastore.
The Glue metastore is not supported.
[ { "Classification": "hive-site", "Properties": { "javax.jdo.option.ConnectionUserName": "${user-name}", "javax.jdo.option.ConnectionDriverName": "${jdbc-driver}", "javax.jdo.option.ConnectionURL": "${jdbc-url}", "javax.jdo.option.ConnectionPassword": "${jdbc-password}" } } ]
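To avoid leaving a ${...} placeholder in this snippet, the values can be generated rather than edited by hand. This Python sketch (hypothetical helper name, standard library only) produces the same classification with the same four javax.jdo.option keys and rejects empty or unreplaced values.

```python
from __future__ import annotations

import json


def hive_site_classification(user: str, driver: str, jdbc_url: str, password: str) -> str:
    """Build the hive-site classification for an external Hive metastore.

    Uses the same keys as the console snippet; the arguments replace its
    ${...} placeholders.
    """
    props = {
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionDriverName": driver,
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionPassword": password,
    }
    for key, value in props.items():
        # Catch values that were left empty or still contain a template placeholder.
        if not value or "${" in value:
            raise ValueError(f"{key} still needs a real value")
    return json.dumps([{"Classification": "hive-site", "Properties": props}], indent=2)
```

For example, `hive_site_classification("hive", "org.mariadb.jdbc.Driver", "jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true", "...")` mirrors the driver and URL forms suggested in the cluster template's parameter descriptions; the angle-bracket parts remain yours to fill in.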
Click Next.
Under Hardware settings, select the networking, node, and instance values appropriate for your environment.
Under General cluster settings:
Enter the cluster name.
Select the Logging, Debugging, and Termination protection checkboxes as required for your environment.
To enable audit logging for your applications in PrivaceraCloud, add the following two bootstrap actions. They install the Ranger audit configuration on the master and worker nodes.
Configure Ranger audit logging for the master node:
Under Additional Options, expand Bootstrap Actions, select the Run if bootstrap action, and click Configure and add.
The Add Bootstrap Action dialog appears.
In this dialog, enter the name Configure Ranger Audits for Master Node.
In the Optional arguments field, add the following arguments, replacing <ranger-audit-setup-script-url> with your own script URL.
<ranger-audit-setup-script-url>: In the PrivaceraCloud portal, go to Access Manager > Settings > ApiKey, click the Info icon, and under Ranger Audit Setup Script click Copy URL.
instance.isMaster=true "wget <ranger-audit-setup-script-url>; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
Click Add.
Configure Ranger audit logging for the worker nodes:
Under Additional Options, expand Bootstrap Actions, select the Run if bootstrap action, and click Configure and add.
The Add Bootstrap Action dialog appears.
In this dialog, enter the name Configure Ranger Audits for Worker Nodes.
In the Optional arguments field, add the following arguments, replacing <ranger-audit-setup-script-url> with your own script URL.
<ranger-audit-setup-script-url>: In the PrivaceraCloud portal, go to Access Manager > Settings > ApiKey, click the Info icon, and under Ranger Audit Setup Script click Copy URL.
instance.isMaster=false "wget <ranger-audit-setup-script-url>; chmod +x ./privacera_emr_native.sh ; sudo ./privacera_emr_native.sh"
Click Add.
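The two bootstrap actions differ only in the run-if condition. The Python sketch below builds both from the copied script URL, using the same action names, run-if path, and command string as the cluster CloudFormation template. The dictionaries follow the BootstrapActions shape that EMR's RunJobFlow API accepts (for example through boto3's `run_job_flow`); the function name is hypothetical.

```python
from __future__ import annotations

# run-if path and script name used by the cluster template above.
RUN_IF = "s3://elasticmapreduce/bootstrap-actions/run-if"
SCRIPT = "privacera_emr_native.sh"


def audit_bootstrap_actions(setup_script_url: str) -> list[dict]:
    """Return the master and worker bootstrap actions that configure Ranger audits.

    Each action downloads the audit setup script, makes it executable, and
    runs it with sudo, but only on nodes matching its instance.isMaster condition.
    """
    command = f"wget {setup_script_url}; chmod +x ./{SCRIPT} ; sudo ./{SCRIPT}"
    actions = []
    for condition, label in (("true", "Master Node"), ("false", "Worker Nodes")):
        actions.append({
            "Name": f"Configure Ranger Audits for {label}",
            "ScriptBootstrapAction": {
                "Path": RUN_IF,
                "Args": [f"instance.isMaster={condition}", command],
            },
        })
    return actions
```

Generating both actions from a single function keeps the command string identical on master and worker nodes, which is easy to get wrong when typing the arguments twice in the console.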
Under Security Options:
Enter or select the security options appropriate for your environment.
Under the Permissions section:
EMR role: Select EMR_DefaultRole.
EC2 instance profile: Select EMR_RS_INSTANCE_ROLE, created during IAM roles setup.
Expand Security Configuration and select the configuration you created earlier, for example EMR_NATIVE_WITH_PLCOUD.
Set the realm and enter a KDC admin password.
Click Create cluster.
Application usage
In your PrivaceraCloud account, expand Settings and click Applications. For more information, see Elastic MapReduce from Amazon and EMRFS S3.
Spark
Spark SQL use case:
SSH to the EMR master node.
Run kinit as your user.
Start the Spark SQL shell with spark-sql.
Run SQL queries with Spark.
Policies are evaluated against the privacera_hive policy repository. Audits can be seen under Access Manager > Audits.
Spark Shell use case:
SSH to the EMR master node.
Run kinit as your user.
Start the Spark shell with spark-shell.
Run Scala code with Spark.
Policies are evaluated against the privacera_emrfs_s3 policy repository for any S3 access. Audits can be seen under Access Manager > Audits.
Hive
SSH to the EMR master node.
Run kinit as your user.
Log in to the beeline shell using the following command:
beeline -u "jdbc:hive2://`hostname -f`:10000/default;principal=hive/`hostname -f`@EC2.INTERNAL"
Run Hive queries.
Policies are evaluated against the privacera_hive policy repository. Audits can be seen under Access Manager > Audits.
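The beeline URL embeds the host name twice along with the Kerberos realm, which is EC2.INTERNAL in the cluster template. This hypothetical Python helper builds the same URL for a given host. `socket.getfqdn()` stands in for `hostname -f` and may return a different name on some hosts, so pass the host explicitly if in doubt.

```python
from __future__ import annotations

import shlex
import socket


def hive_jdbc_url(host: str, realm: str = "EC2.INTERNAL", port: int = 10000,
                  database: str = "default") -> str:
    """Build the HiveServer2 JDBC URL used with beeline on the Kerberized cluster."""
    return f"jdbc:hive2://{host}:{port}/{database};principal=hive/{host}@{realm}"


def beeline_command(host: str | None = None) -> str:
    """Return a shell-safe beeline command line for the given (or local) host."""
    host = host or socket.getfqdn()  # approximation of `hostname -f`
    # The URL contains ';', so it must be quoted for the shell.
    return "beeline -u " + shlex.quote(hive_jdbc_url(host))
```

Quoting matters here: without it, the shell would treat the `;` in the URL as a command separator and run `principal=...` as a separate command.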
Last update: February 18, 2022