Setup for Access Management for EMR Cluster

This section outlines the steps to set up the EMR Cluster with the Privacera Plugin. Please ensure that all prerequisites are completed before beginning the setup process.

Perform the following steps to configure the EMR connector:

  1. SSH to the instance where Privacera is installed.

  2. Run the following command to navigate to the /config directory.

    Bash
    cd ~/privacera/privacera-manager/config
    

  3. Run the following command to copy the sample vars:

    Bash
    cp sample-vars/vars.emr.yml custom-vars/
    

  4. Run the following command to open the .yml file to be edited.

    Bash
    vi custom-vars/vars.emr.yml
    

  5. Modify the following properties:

Variable                     Definition

EMR_CLUSTER_NAME             Unique name for the EMR cluster.
EMR_VERSION                  EMR release version (for example, emr-7.1.0).

Security group config
EMR_MASTER_SG_ID             Security group ID for the EMR master node group.
EMR_SLAVE_SG_ID              Security group ID for the EMR slave node group.
EMR_SERVICE_ACCESS_SG_ID     Security group ID for EMR service access.

Instance config
EMR_SUBNET_ID                Subnet ID for the instance.
EMR_KEYPAIR                  Existing EC2 key pair used to SSH into the master node.
EMR_EC2_MARKET_TYPE          EC2 instance market type. Supported values: SPOT, ON_DEMAND.
EMR_EC2_INSTANCE_TYPE        EC2 instance type (for example, m5.xlarge).
EMR_MASTER_NODE_COUNT        Number of master node instances in the cluster.
EMR_CORE_NODE_COUNT          Number of core node instances in the cluster.

Security config
EMR_SECURITY_CONFIG          Name of the security configuration created for EMR.
EMR_KERBEROS_ENABLE          Enables Kerberos. This should be set to 'true'.
EMR_KDC_ADMIN_PASSWORD       Cluster KDC admin password.
EMR_CROSS_REALM_PASSWORD     Cross-realm principal password.
EMR_KERB_REALM               Kerberos realm name.
EMR_KERB_DOMAIN              Domain name of the other realm.
EMR_KERB_ADMIN_SERVER        FQDN or IP address of the admin server.
EMR_KERB_KDC_SERVER          FQDN or IP address of the KDC server.

AWS Account & IAM config
EMR_AWS_ACCT_ID              AWS account ID where the EMR cluster will be created.
EMR_DEFAULT_ROLE             Role attached to the EMR cluster for performing cluster-related activities.
EMR_ROLE_FOR_CLUSTER_NODES   IAM role attached to each node in the EMR cluster.
EMR_ROLE_FOR_APPS            Name of the IAM role used by all EMR applications.

Spark OLAC config
EMR_APP_SPARK_OLAC_ENABLE    Set to enable Object-Level Access Control (OLAC) for EMR Spark.

Trino config
EMR_APP_PRESTO_SQL_ENABLE    Set to enable the Trino plugin for EMR Trino.
EMR_APP_PRESTO_DB_ENABLE     Set to enable the PrestoDB plugin for EMR PrestoDB.

Hive config
EMR_HIVE_METASTORE_PATH      Hive Metastore path.

Other config
EMR_LOGS_PATH                S3 location for storing EMR cluster logs.

  6. Once the properties are configured, run the following commands to update your Privacera Manager platform instance:

    Bash
    cd ~/privacera/privacera-manager
    
    # Step 1 - run setup, which generates the Helm charts.
    # This step usually takes a few minutes.
    ./privacera-manager.sh setup
    
    # Step 2 - install or upgrade the Privacera Manager Helm charts
    # (pass either install or upgrade).
    ./pm_with_helm.sh [install|upgrade]
    
    # Step 3 - run the post-installation steps, which generate plugin configurations,
    # update Route 53 DNS, etc.
    ./privacera-manager.sh post-install
    

  7. Once the update is complete, all CloudFormation JSON template files are available at the following path:

    Bash
    ~/privacera/privacera-manager/output/emr/
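
Before running the update commands above, it can help to confirm that the properties from step 5 are actually present in custom-vars/vars.emr.yml. The following is a minimal sketch, assuming the file holds one `KEY: value` entry per line. The function name check_emr_vars and the short list of keys it checks are illustrative only; extend the list to match the properties your deployment relies on.

```shell
# check_emr_vars FILE
# Prints every expected key that has no "KEY:" entry in FILE.
# Returns 0 when all keys are present, 1 otherwise.
# Assumption: the vars file holds one "KEY: value" pair per line.
check_emr_vars() {
  local file="$1" rc=0 key
  for key in EMR_CLUSTER_NAME EMR_VERSION EMR_SUBNET_ID EMR_KEYPAIR \
             EMR_SECURITY_CONFIG EMR_AWS_ACCT_ID EMR_LOGS_PATH; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*:" "$file"; then
      echo "missing: ${key}"
      rc=1
    fi
  done
  return "$rc"
}

# Example (run from ~/privacera/privacera-manager/config):
#   check_emr_vars custom-vars/vars.emr.yml && echo "all keys present"
```

A commented-out entry (a line starting with #) is reported as missing, which is usually the desired behavior for a pre-flight check.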
    

  1. In PrivaceraCloud, go to Settings -> Applications.

  2. On the Applications screen, select EMR.

  3. Enter the application Name and Description, then click Save. The name can be anything you choose, for example: AWS EMR Connector for account 123456.

  4. Open the EMR application.

  5. Enable the Access Management option using the toggle button.

Configure shared-secret

Note: This step is required only for EMR Spark OLAC.

  1. In PrivaceraCloud, go to Settings -> Applications.
  2. For information about connecting the S3 application, see Connect S3 to PrivaceraCloud.
  3. On the Applications screen, select S3.
  4. Click the edit icon and go to the ADVANCED tab.
  5. Add the following property:
    Properties
    dataserver.shared.secret=<shared_secret_value>
    
  6. Click Save.
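
The value used for <shared_secret_value> is left to the administrator. One way to produce a hard-to-guess value is sketched below. It assumes that any sufficiently long random string is acceptable for this property, since no format requirements are stated here; the function name generate_shared_secret is illustrative.

```shell
# generate_shared_secret
# Prints 32 random bytes from /dev/urandom as a 64-character lowercase hex string.
generate_shared_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Example: print a property line ready to paste into the ADVANCED tab.
echo "dataserver.shared.secret=$(generate_shared_secret)"
```

Store the generated value somewhere safe, because it cannot be recomputed later.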

Privacera Plugin Script

  1. In PrivaceraCloud, go to Settings -> Applications.
  2. On the Applications screen, select EMR.
  3. From the screen, either copy the download URL or download the script.
  4. If you downloaded the script, complete the following steps:
    • Upload it to an S3 bucket location of your choice.
    • Get the object URL of the uploaded script file from S3.
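
The upload in step 4 can also be done from the command line. The sketch below assumes the AWS CLI is installed and configured, and it uses a hypothetical bucket name (my-emr-bucket), key, and region. The helper names s3_object_url and upload_plugin_script are illustrative. The printed URL follows the S3 virtual-hosted-style format.

```shell
# s3_object_url BUCKET REGION KEY
# Prints the virtual-hosted-style object URL for an S3 object.
s3_object_url() {
  printf 'https://%s.s3.%s.amazonaws.com/%s\n' "$1" "$2" "$3"
}

# upload_plugin_script LOCAL_FILE BUCKET REGION KEY
# Copies the downloaded script to S3, then prints its object URL.
upload_plugin_script() {
  aws s3 cp "$1" "s3://$2/$4" --region "$3" && s3_object_url "$2" "$3" "$4"
}

# Example with hypothetical names:
#   upload_plugin_script privacera_emr.sh my-emr-bucket us-east-1 privacera/privacera_emr.sh
```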

Setup IAM Roles

The following two IAM roles must be created before launching the cluster. The IAM roles template provided below creates them with minimal permissions.

  • Node role
  • Application data access role

These IAM roles are required to access AWS resources. In the template, the Node role is named 'EmrPrivaceraNodeRole' and the Application data access role is named 'EmrPrivaceraDataAccessRole'.

privacera-emr-iam-role-template
JSON
 {
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description":"Create roles and policies for use by Privacera-Protected EMR Clusters",
  "Resources":{
    "EmrRestrictedRole":{
      "Type":"AWS::IAM::Role",
      "Properties":{
        "RoleName":{
          "Fn::Join":[
            "",
            [
              "EmrPrivaceraNodeRole"
            ]
          ]
        },
        "AssumeRolePolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Effect":"Allow",
              "Principal":{
                "Service":[
                  "ec2.amazonaws.com"
                ]
              },
              "Action":[
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "Path":"/"
      }
    },
    "EmrRestrictedPolicy":{
      "Type":"AWS::IAM::ManagedPolicy",
      "Properties":{
        "ManagedPolicyName":{
          "Fn::Join":[
            "",
            [
              "EMRPrivaceraNodePolicy"
            ]
          ]
        },
        "PolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Sid":"EmrServiceLimited",
              "Effect":"Allow",
              "Action":[
                "glue:CreateDatabase",
                "glue:UpdateDatabase",
                "glue:DeleteDatabase",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetTableVersions",
                "glue:CreatePartition",
                "glue:BatchCreatePartition",
                "glue:UpdatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition",
                "glue:CreateUserDefinedFunction",
                "glue:UpdateUserDefinedFunction",
                "glue:DeleteUserDefinedFunction",
                "glue:GetUserDefinedFunction",
                "glue:GetUserDefinedFunctions",
                "ec2:Describe*",
                "elasticmapreduce:Describe*",
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:ListInstanceGroups",
                "elasticmapreduce:ListInstances",
                "elasticmapreduce:ListSteps"
              ],
              "Resource":"*"
            },
            {
              "Sid":"EmrS3Limited",
              "Effect":"Allow",
              "Action":"s3:*",
              "Resource":[
                "arn:aws:s3:::*.elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce",
                "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_LOG_S3_PATH>",
                "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_HIVE_WAREHOUSE_S3_PATH>"
              ]
            },
            {
              "Sid":"EmrAssumeIAM",
              "Effect":"Allow",
              "Action":"sts:AssumeRole",
              "Resource":[
                "arn:aws:iam::<PLEASE_UPDATE AWS_ACCOUNT_ID>:role/EmrPrivaceraDataAccessRole"
              ]
            }
          ]
        },
        "Roles":[
          {
            "Ref":"EmrRestrictedRole"
          }
        ]
      }
    },
    "EmrRoleForApps":{
      "Type":"AWS::IAM::Role",
      "Properties":{
        "RoleName":{
          "Fn::Join":[
            "",
            [
              "EmrPrivaceraDataAccessRole"
            ]
          ]
        },
        "AssumeRolePolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Effect":"Allow",
              "Principal":{
                "Service":[
                  "ec2.amazonaws.com"
                ]
              },
              "Action":[
                "sts:AssumeRole"
              ]
            },
            {
              "Effect":"Allow",
              "Principal":{
                "AWS":{
                  "Fn::GetAtt":[
                    "EmrRestrictedRole",
                    "Arn"
                  ]
                }
              },
              "Action":"sts:AssumeRole"
            }
          ]
        },
        "Path":"/"
      }
    },
    "DataAccessPolicy":{
      "Type":"AWS::IAM::ManagedPolicy",
      "Properties":{
        "ManagedPolicyName":{
          "Fn::Join":[
            "",
            [
              "EmrPrivaceraDataAcessPolicy"
            ]
          ]
        },
        "PolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Sid":"S3DataAccess",
              "Effect":"Allow",
              "Action":[
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:DeleteBucket",
                "s3:ListBucketMultipartUploads",
                "s3:GetBucketAcl",
                "s3:GetBucketPolicy",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:PutObjectAcl"
              ],
              "Resource":[
                "arn:aws:s3:::*.elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce",
                "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_LOG_S3_PATH>",
                "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_HIVE_WAREHOUSE_S3_PATH>"
              ]
            }
          ]
        },
        "Roles":[
          {
            "Ref":"EmrRoleForApps"
          }
        ]
      }
    },
    "EmrRestrictedRoleProfile":{
      "Type":"AWS::IAM::InstanceProfile",
      "Properties":{
        "InstanceProfileName":{
          "Ref":"EmrRestrictedRole"
        },
        "Roles":[
          {
            "Ref":"EmrRestrictedRole"
          }
        ]
      }
    },
    "EmrRoleForAppsProfile":{
      "Type":"AWS::IAM::InstanceProfile",
      "Properties":{
        "InstanceProfileName":{
          "Ref":"EmrRoleForApps"
        },
        "Roles":[
          {
            "Ref":"EmrRoleForApps"
          }
        ]
      }
    }
  },
  "Outputs":{
    "EMRRestrictedRole":{
      "Value":{
        "Ref":"EmrRestrictedRole"
      }
    },
    "EmrRoleForApps":{
      "Value":{
        "Ref":"EmrRoleForApps"
      }
    }
  }
}

To create the roles, save the template above as emr-roles-creation-template.json and run the following AWS CLI CloudFormation command:

Bash
aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-role-creation --template-body file://emr-roles-creation-template.json --capabilities CAPABILITY_NAMED_IAM
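
The templates on this page contain upper-case placeholders in angle brackets, such as <PLEASE_UPDATE ...>, that must be replaced before a stack is created. A quick pre-flight check is sketched below. The function name find_placeholders is illustrative, and the pattern is an assumption: it flags any upper-case token wrapped in angle brackets.

```shell
# find_placeholders FILE
# Lists (with line numbers) every line that still contains an upper-case
# <PLACEHOLDER>. Returns 0 when the file is clean, 1 when placeholders remain.
find_placeholders() {
  if grep -nE '<[A-Z][A-Z0-9_ -]*>' "$1"; then
    return 1
  fi
  return 0
}

# Example: stop early when the template has not been fully filled in.
#   find_placeholders emr-roles-creation-template.json || echo "replace the placeholders listed above first"
```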

Setup EMR Security Configuration

A new security configuration must be established to integrate the Kerberos server with the EMR cluster.

A security configuration can be created in two ways:

  1. Using the AWS CLI CloudFormation command
  2. Using the AWS EMR console

In-transit encryption is required

Certificates must be generated to enable in-transit encryption. AWS requires in-transit encryption as a prerequisite for enabling Kerberos for Trino. For more information, see the AWS documentation.
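
For test clusters, a self-signed certificate bundle can be produced with openssl, following the approach shown in the Amazon EMR documentation for PEM certificates. This is a sketch only. The subject fields, the 365-day validity, the wildcard common name *.ec2.internal (which assumes the us-east-1 private DNS suffix), and the archive name my-certs.zip are placeholder assumptions, and production clusters would normally use certificates issued by a trusted CA. The function name make_emr_certs is illustrative.

```shell
# make_emr_certs [COMMON_NAME] [ZIP_NAME]
# Creates a self-signed key and certificate in the current directory, then
# zips them using the file names Amazon EMR expects inside the archive:
# privateKey.pem, certificateChain.pem and (optionally) trustedCertificates.pem.
make_emr_certs() {
  local cn="${1:-*.ec2.internal}" zip_name="${2:-my-certs.zip}"
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout privateKey.pem -out certificateChain.pem \
    -subj "/C=US/ST=State/L=City/O=MyOrg/OU=MyDept/CN=${cn}" || return 1
  cp certificateChain.pem trustedCertificates.pem || return 1
  zip -r -X "$zip_name" certificateChain.pem privateKey.pem trustedCertificates.pem
}

# Example: build the archive, upload it to S3, and use that S3 path for the
# S3Object value in the security configuration template.
#   make_emr_certs '*.ec2.internal' my-certs.zip
```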

Using AWS CLI CloudFormation Command

If needed, modify the template emr-security-config-template.json below to suit your requirements.

emr-security-config-template.json
JSON
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Create Security Configuration for use by Privacera-Protected EMR Clusters",
  "Resources": {
    "SecurityConfiguration": {
      "Type": "AWS::EMR::SecurityConfiguration",
      "Properties": {
        "Name": "EmrPrivaceraSecurityGroup",
        "SecurityConfiguration": {
          "AuthorizationConfiguration": {
            "EmrFsConfiguration": {
              "RoleMappings": [
                {
                  "Role": "arn:aws:iam::<PLEASE_UPDATE AWS_ACCOUNT_ID>:role/<PLEASE_UPDATE>",
                  "IdentifierType": "User",
                  "Identifiers": [
                    "hive",
                    "presto",
                    "trino"
                  ]
                }
              ]
            }
          },
          "AuthenticationConfiguration": {
            "KerberosConfiguration": {
              "Provider": "ClusterDedicatedKdc",
              "ClusterDedicatedKdcConfiguration": {
                "TicketLifetimeInHours": 24,
                "CrossRealmTrustConfiguration": {
                  "AdminServer": "<PLEASE_UPDATE>",
                  "Domain": "<PLEASE_UPDATE>",
                  "KdcServer": "<PLEASE_UPDATE>",
                  "Realm": "<PLEASE_UPDATE>"
                }
              }
            }
          },
          "EncryptionConfiguration": {
            "EnableInTransitEncryption": true,
            "EnableAtRestEncryption": false,
            "InTransitEncryptionConfiguration": {
              "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": "<PLEASE_UPDATE_PEM_CERT_ZIP_PATH>"
              }
            }
          }
        }
      }
    }
  }
}

To create the security configuration, use the following AWS CLI CloudFormation command:

Bash
aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-security-config-creation --template-body file://emr-security-config-template.json
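
After the stack is submitted, the result can be checked from the CLI. The sketch below assumes the stack name used in the command above and the configuration name EmrPrivaceraSecurityGroup from the template; verify_security_config is an illustrative helper name.

```shell
# verify_security_config REGION [STACK_NAME] [CONFIG_NAME]
# Waits for the CloudFormation stack to finish creating, then prints the
# resulting EMR security configuration.
verify_security_config() {
  local region="$1"
  local stack="${2:-privacera-emr-security-config-creation}"
  local name="${3:-EmrPrivaceraSecurityGroup}"
  aws --region "$region" cloudformation wait stack-create-complete \
    --stack-name "$stack" || return 1
  aws --region "$region" emr describe-security-configuration --name "$name"
}

# Example:
#   verify_security_config us-east-1
```

If the Name property in the template was changed, pass the new name as the third argument.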

Using AWS EMR Console

Follow the steps below to create a security configuration using the AWS EMR console:

  1. Log in to the AWS EMR console.
  2. In the left navigation pane, select Security Configuration, then click Create New Security Configuration.
  3. Enter a name for the security configuration, such as emr_sec_config.
  4. Navigate to the Authentication section, check the box for Enable Kerberos authentication, and provide the Kerberos environment details as follows:
    • Provider: Cluster-dedicated KDC
    • Ticket Lifetime (hours): 24
    • Check the Turn on Cross-realm trust box and enter the following details:
      • Realm: EXAMPLE.COM
      • Domain: example.com
      • Admin Server: server.admin.com
      • KDC Server: server.example.com
  5. Select the option to Use IAM roles for EMRFS requests to Amazon S3.
    • IAM Role: select the Application data access role created in the Setup IAM Roles section using CloudFormation.
    • Under Basis for Access, select the identifier type User from the list and enter the corresponding identifiers: hadoop, hive, presto, and trino.

EMR Bootstrap action

Add the bootstrap action below to set up the Privacera plugins in the EMR cluster:

Hive 'doAs' should be disabled

Disabling Hive impersonation is recommended for Privacera Ranger authorization. By default, the Privacera plugin overrides the property hive.server2.enable.doAs in /etc/hive/conf/hive-site.xml and sets it to false. When hive.server2.enable.doAs=true, HiveServer2 processes a query as the user who submitted it (usually the user you ran kinit as). When the property is set to false, the query runs as the user that the HiveServer2 process runs as, which is typically hive.
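
To confirm the effective setting on a running cluster, the property can be read back from hive-site.xml on the master node. The following is a minimal sketch, assuming the usual Hadoop configuration layout in which each <name> element is followed by its <value> element; hive_doas_value is an illustrative helper name.

```shell
# hive_doas_value FILE
# Prints the value configured for hive.server2.enable.doAs in FILE.
# Prints nothing when the property is not present.
hive_doas_value() {
  awk '
    /<name>hive\.server2\.enable\.doAs<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Example (on the EMR master node):
#   hive_doas_value /etc/hive/conf/hive-site.xml
```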

Spark OLAC, Hive, Trino

privacera-emr-bootstrap-action-spark_olac-hive-trino
JSON
"BootstrapActions": [
{
  "Name": "Install Privacera Plugins on Master Node",
  "ScriptBootstrapAction": {
    "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
    "Args": [
      {
        "Fn::Sub": "instance.isMaster=true"
      },
      {
        "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
      }
    ]
  }
},
{
  "Name": "Install Spark OLAC in Core Node",
  "ScriptBootstrapAction": {
    "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
    "Args": [
      {
        "Fn::Sub": "instance.isMaster=false"
      },
      {
        "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
      }
    ]
  }
}
]

Spark FGAC, Hive, Trino

privacera-emr-bootstrap-action-spark_fgac-hive-trino
JSON
"BootstrapActions": [
{
  "Name": "Install Privacera Plugins on Master Node",
  "ScriptBootstrapAction": {
    "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
    "Args": [
      {
        "Fn::Sub": "instance.isMaster=true"
      },
      {
        "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
      }
    ]
  }
},
{
  "Name": "Install Spark FGAC in Core Node",
  "ScriptBootstrapAction": {
    "Path": "s3://elasticmapreduce/bootstrap-actions/run-if",
    "Args": [
      {
        "Fn::Sub": "instance.isMaster=false"
      },
      {
        "Fn::Sub": "wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
      }
    ]
  }
}
]

Create EMR cluster

Sample EMR template

  • To create an EMR cluster, use the CloudFormation templates provided below. You may customize these templates, but it is recommended to keep the common variables consistent with the previous setup steps.
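
A template from this section can be launched with the same create-stack command used in the earlier steps, overriding parameter defaults on the command line instead of editing them in the file. The sketch below assumes the template was saved as emr-cluster-template.json and uses a hypothetical stack name (privacera-emr-cluster); cfn_param and launch_emr_stack are illustrative helpers. Only two parameters are overridden here, so the remaining parameters and the placeholders outside the Parameters block (such as LogUri and JobFlowRole) still need to be filled in.

```shell
# cfn_param KEY VALUE
# Prints one parameter in the form expected by "create-stack --parameters".
cfn_param() {
  printf 'ParameterKey=%s,ParameterValue=%s\n' "$1" "$2"
}

# launch_emr_stack REGION TEMPLATE_FILE EMR_VERSION SUBNET_ID
# Creates the EMR stack, overriding two of the template parameters.
launch_emr_stack() {
  aws --region "$1" cloudformation create-stack \
    --stack-name privacera-emr-cluster \
    --template-body "file://$2" \
    --parameters "$(cfn_param EMRVersion "$3")" "$(cfn_param Ec2SubnetId "$4")"
}

# Example with hypothetical values:
#   launch_emr_stack us-east-1 emr-cluster-template.json emr-6.11.0 subnet-0123456789abcdef0
```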
EMR Template: Spark_OLAC, Hive, Trino (for EMR versions 6.11.0)
JSON
{
  "Parameters":{
    "CLUSTERNAME":{
      "Description":"Name of the emr cluster",
      "Type":"String",
      "Default":"PCloud-EMR-Spark_OLAC-Hive-Trino"
    },
    "EMRRegion":{
      "Description":"aws region name",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "EMRVersion":{
      "Description":"Emr version",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "MasterSecurityGroup":{
      "Description":"Emr master/edge node security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "SlaveSecurityGroup":{
      "Description":"Emr worker/slave node security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "ServiceAccessSecurityGroup":{
      "Description":"Emr service access security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "Ec2SubnetId":{
      "Description":"Ec2 subnet id",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "HiveMetaStoreS3Path":{
      "Description":"Hive metastore s3 path",
      "Type":"String",
      "Default":"s3://<PLEASE_UPDATE BUCKET_NAME>/emr/hive_warehouse"
    },
    "Ec2KeyName":{
      "Description":"Ec2 keypair name",
      "Type":"String",
      "Default":"<EC2-SSH-KEY_PAIR-NAME>"
    },
    "Market":{
      "Description":"Ec2 Instance market type",
      "Type":"String",
      "Default":"ON_DEMAND"
    },
    "KDCName":{
      "Description":"KDC Name",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "KdcAdminPassword":{
      "Description":"KDC admin user password",
      "Type":"String",
      "Default":"<KDC-USER-PASSWORD>"
    },
    "CrossRealmTrustPrincipalPassword":{
      "Description":"KDC Cross Realm Trust Principal password",
      "Type":"String",
      "Default":"<KDC-PRINCIPAL-PASSWORD>"
    },
    "PrivaceraDownloadUrl":{
      "Description":"Privacera Base Download Url",
      "Type":"String",
      "Default":"https://privaceracloud.com/api/public/get/emr_script/<PLEASE_UPDATE API-KEY>"
    }
  },
  "Resources":{
    "EMRCLUSTER":{
      "Type":"AWS::EMR::Cluster",
      "Properties":{
        "Name":{
          "Ref":"CLUSTERNAME"
        },
        "KerberosAttributes":{
          "Realm":"EC2.INTERNAL",
          "KdcAdminPassword":{
            "Ref":"KdcAdminPassword"
          },
          "CrossRealmTrustPrincipalPassword":{
            "Ref":"CrossRealmTrustPrincipalPassword"
          }
        },
        "SecurityConfiguration":{
          "Ref":"KDCName"
        },
        "VisibleToAllUsers":true,
        "EbsRootVolumeSize":15,
        "Instances":{
          "MasterInstanceGroup":{
            "InstanceCount":1,
            "InstanceType":"m5.xlarge",
            "Market":{
              "Fn::Sub":"${Market}"
            },
            "Name":"Master Instance Group"
          },
          "CoreInstanceGroup":{
            "InstanceCount":"<PLEASE_UPDATE>",
            "InstanceType":"m5.xlarge",
            "Market":{
              "Fn::Sub":"${Market}"
            },
            "Name":"Core Instance Group"
          },
          "Ec2KeyName":{
            "Ref":"Ec2KeyName"
          },
          "EmrManagedSlaveSecurityGroup":{
            "Fn::Sub":"${SlaveSecurityGroup}"
          },
          "EmrManagedMasterSecurityGroup":{
            "Fn::Sub":"${MasterSecurityGroup}"
          },
          "ServiceAccessSecurityGroup":{
            "Fn::Sub":"${ServiceAccessSecurityGroup}"
          },
          "Ec2SubnetId":{
            "Fn::Sub":"${Ec2SubnetId}"
          },
          "TerminationProtected":true
        },
        "BootstrapActions":[
          {
            "Name":"Install Privacera Plugins on Master Node",
            "ScriptBootstrapAction":{
              "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
              "Args":[
                {
                  "Fn::Sub":"instance.isMaster=true"
                },
                {
                  "Fn::Sub":"wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
                }
              ]
            }
          },
          {
            "Name":"Install Spark OLAC in Core Node",
            "ScriptBootstrapAction":{
              "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
              "Args":[
                {
                  "Fn::Sub":"instance.isMaster=false"
                },
                {
                  "Fn::Sub":"wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
                }
              ]
            }
          }
        ],
        "Applications":[
          {
            "Name":"Hive"
          },
          {
            "Name":"Spark"
          },
          {
            "Name":"Trino"
          },
          {
            "Name":"Zeppelin"
          },
          {
            "Name":"Livy"
          },
          {
            "Name":"Hue"
          }
        ],
        "Configurations":[
          {
            "Classification":"spark",
            "ConfigurationProperties":{
              "maximizeResourceAllocation":"true"
            },
            "Configurations":[

            ]
          },
          {
            "Classification":"spark-hive-site",
            "ConfigurationProperties":{
              "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
              "hive.metastore.warehouse.dir":{
                "Ref":"HiveMetaStoreS3Path"
              }
            }
          },
          {
            "Classification":"hive-site",
            "ConfigurationProperties":{
              "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
              "hive.metastore.schema.verification":"false",
              "hive.server2.enable.doAs":"false",
              "parquet.column.index.access":"true",
              "fs.s3a.impl":"com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
              "hive.metastore.warehouse.dir":{
                "Ref":"HiveMetaStoreS3Path"
              }
            }
          },
          {
            "Classification":"trino-config",
            "ConfigurationProperties":{
              "internal-communication.shared-secret":"<PLEASE_UPDATE>"
            }
          },
          {
            "Classification":"trino-connector-hive",
            "ConfigurationProperties":{
              "connector.name":"hive",
              "hive.metastore":"glue",
              "hive.config.resources":"/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
              "hive.s3-file-system-type":"EMRFS",
              "hive.hdfs.impersonation.enabled":"false"
            }
          },
          {
            "Classification":"livy-conf",
            "ConfigurationProperties":{
              "livy.impersonation.enabled":"true"
            }
          },
          {
            "Classification":"core-site",
            "ConfigurationProperties":{
              "hadoop.proxyuser.livy.groups":"*",
              "hadoop.proxyuser.livy.hosts":"*"
            }
          }
        ],
        "LogUri":"s3://<PLEASE_UPDATE BUCKET_NAME>/emr/emr_logs/",
        "JobFlowRole":"<PLEASE_UPDATE NODE_ROLE_NAME>",
        "ServiceRole":"EMR_DefaultRole",
        "ReleaseLabel":{
          "Fn::Sub":"${EMRVersion}"
        }
      }
    }
  }
}
EMR Template: Spark_OLAC, Hive, Trino (for EMR versions 6.4.0 and above)
JSON
{
  "Parameters":{
    "CLUSTERNAME":{
      "Description":"Name of the emr cluster",
      "Type":"String",
      "Default":"PCloud-EMR-Spark_OLAC-Hive-Trino"
    },
    "EMRRegion":{
      "Description":"aws region name",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "EMRVersion":{
      "Description":"Emr version",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "MasterSecurityGroup":{
      "Description":"Emr master/edge node security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "SlaveSecurityGroup":{
      "Description":"Emr worker/slave node security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "ServiceAccessSecurityGroup":{
      "Description":"Emr service access security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "Ec2SubnetId":{
      "Description":"Ec2 subnet id",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "HiveMetaStoreS3Path":{
      "Description":"Hive metastore s3 path",
      "Type":"String",
      "Default":"s3://<PLEASE_UPDATE BUCKET_NAME>/emr/hive_warehouse"
    },
    "Ec2KeyName":{
      "Description":"Ec2 keypair name",
      "Type":"String",
      "Default":"<EC2-SSH-KEY_PAIR-NAME>"
    },
    "Market":{
      "Description":"Ec2 Instance market type",
      "Type":"String",
      "Default":"ON_DEMAND"
    },
    "KDCName":{
      "Description":"KDC Name",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "KdcAdminPassword":{
      "Description":"KDC admin user password",
      "Type":"String",
      "Default":"<KDC-USER-PASSWORD>"
    },
    "CrossRealmTrustPrincipalPassword":{
      "Description":"KDC Cross Realm Trust Principal password",
      "Type":"String",
      "Default":"<KDC-PRINCIPAL-PASSWORD>"
    },
    "PrivaceraDownloadUrl":{
      "Description":"Privacera Base Download Url",
      "Type":"String",
      "Default":"https://privaceracloud.com/api/public/get/emr_script/<PLEASE_UPDATE API-KEY>"
    }
  },
  "Resources":{
    "EMRCLUSTER":{
      "Type":"AWS::EMR::Cluster",
      "Properties":{
        "Name":{
          "Ref":"CLUSTERNAME"
        },
        "KerberosAttributes":{
          "Realm":"EC2.INTERNAL",
          "KdcAdminPassword":{
            "Ref":"KdcAdminPassword"
          },
          "CrossRealmTrustPrincipalPassword":{
            "Ref":"CrossRealmTrustPrincipalPassword"
          }
        },
        "SecurityConfiguration":{
          "Ref":"KDCName"
        },
        "VisibleToAllUsers":true,
        "EbsRootVolumeSize":15,
        "Instances":{
          "MasterInstanceGroup":{
            "InstanceCount":1,
            "InstanceType":"m5.xlarge",
            "Market":{
              "Fn::Sub":"${Market}"
            },
            "Name":"Master Instance Group"
          },
          "CoreInstanceGroup":{
            "InstanceCount":"<PLEASE_UPDATE>",
            "InstanceType":"m5.xlarge",
            "Market":{
              "Fn::Sub":"${Market}"
            },
            "Name":"Core Instance Group"
          },
          "Ec2KeyName":{
            "Ref":"Ec2KeyName"
          },
          "EmrManagedSlaveSecurityGroup":{
            "Fn::Sub":"${SlaveSecurityGroup}"
          },
          "EmrManagedMasterSecurityGroup":{
            "Fn::Sub":"${MasterSecurityGroup}"
          },
          "ServiceAccessSecurityGroup":{
            "Fn::Sub":"${ServiceAccessSecurityGroup}"
          },
          "Ec2SubnetId":{
            "Fn::Sub":"${Ec2SubnetId}"
          },
          "TerminationProtected":true
        },
        "BootstrapActions":[
          {
            "Name":"Install Privacera Plugins on Master Node",
            "ScriptBootstrapAction":{
              "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
              "Args":[
                {
                  "Fn::Sub":"instance.isMaster=true"
                },
                {
                  "Fn::Sub":"wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
                }
              ]
            }
          },
          {
            "Name":"Install Spark OLAC in Core Node",
            "ScriptBootstrapAction":{
              "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
              "Args":[
                {
                  "Fn::Sub":"instance.isMaster=false"
                },
                {
                  "Fn::Sub":"wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
                }
              ]
            }
          }
        ],
        "Applications":[
          {
            "Name":"Hive"
          },
          {
            "Name":"Spark"
          },
          {
            "Name":"Trino"
          },
          {
            "Name":"Zeppelin"
          },
          {
            "Name":"Livy"
          },
          {
            "Name":"Hue"
          }
        ],
        "Configurations":[
          {
            "Classification":"spark",
            "ConfigurationProperties":{
              "maximizeResourceAllocation":"true"
            },
            "Configurations":[

            ]
          },
          {
            "Classification":"spark-hive-site",
            "ConfigurationProperties":{
              "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
              "hive.metastore.warehouse.dir":{
                "Ref":"HiveMetaStoreS3Path"
              }
            }
          },
          {
            "Classification":"hive-site",
            "ConfigurationProperties":{
              "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
              "hive.metastore.schema.verification":"false",
              "hive.server2.enable.doAs":"false",
              "parquet.column.index.access":"true",
              "fs.s3a.impl":"com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
              "hive.metastore.warehouse.dir":{
                "Ref":"HiveMetaStoreS3Path"
              }
            }
          },
          {
            "Classification":"trino-connector-hive",
            "ConfigurationProperties":{
              "hive.metastore":"glue",
              "hive.allow-drop-table":"true",
              "hive.allow-add-column":"true",
              "hive.allow-rename-column":"true",
              "connector.name":"hive-hadoop2",
              "hive.config.resources":"/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
              "hive.s3-file-system-type":"EMRFS",
              "hive.hdfs.impersonation.enabled":"false",
              "hive.allow-drop-column":"true",
              "hive.allow-rename-table":"true"
            }
          },
          {
            "Classification":"livy-conf",
            "ConfigurationProperties":{
              "livy.impersonation.enabled":"true"
            }
          },
          {
            "Classification":"core-site",
            "ConfigurationProperties":{
              "hadoop.proxyuser.livy.groups":"*",
              "hadoop.proxyuser.livy.hosts":"*"
            }
          }
        ],
        "LogUri":"s3://<PLEASE_UPDATE BUCKET_NAME>/emr/emr_logs/",
        "JobFlowRole":"<PLEASE_UPDATE NODE_ROLE_NAME>",
        "ServiceRole":"EMR_DefaultRole",
        "ReleaseLabel":{
          "Fn::Sub":"${EMRVersion}"
        }
      }
    }
  }
}
EMR template for multiple master nodes: Spark_OLAC, Hive, Trino (for EMR version 6.4.0 and above)
JSON
{
  "Parameters":{
    "CLUSTERNAME":{
      "Description":"Name of the emr cluster",
      "Type":"String",
      "Default":"PCloud-Multiple-Master-EMR-Spark_OLAC-Hive-Trino"
    },
    "EMRRegion":{
      "Description":"aws region name",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "EMRVersion":{
      "Description":"Emr version",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "MasterSecurityGroup":{
      "Description":"Emr master/edge node security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "SlaveSecurityGroup":{
      "Description":"Emr worker/slave node security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "ServiceAccessSecurityGroup":{
      "Description":"Emr service access security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "Ec2SubnetId":{
      "Description":"Ec2 subnet id",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "HiveMetaStoreS3Path":{
      "Description":"Hive metastore s3 path",
      "Type":"String",
      "Default":"s3://<PLEASE_UPDATE BUCKET_NAME>/emr/hive_warehouse"
    },
    "Ec2KeyName":{
      "Description":"Ec2 keypair name",
      "Type":"String",
      "Default":"<EC2-SSH-KEY_PAIR-NAME>"
    },
    "Market":{
      "Description":"Ec2 Instance market type",
      "Type":"String",
      "Default":"ON_DEMAND"
    },
    "KDCName":{
      "Description":"KDC Name",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "KdcAdminPassword":{
      "Description":"KDC admin user password",
      "Type":"String",
      "Default":"<PLEASE_UPDATE EXTERNAL-KDC-USER-PASSWORD>"
    },
    "PrivaceraDownloadUrl":{
      "Description":"Privacera Base Download Url",
      "Type":"String",
      "Default":"https://privaceracloud.com/api/public/get/emr_script/<PLEASE_UPDATE API-KEY>"
    }
  },
  "Resources":{
    "EMRCLUSTER":{
      "Type":"AWS::EMR::Cluster",
      "Properties":{
        "Name":{
          "Ref":"CLUSTERNAME"
        },
        "KerberosAttributes":{
          "Realm":"<PLEASE_UPDATE EXTERNAL-KDC-REALM>",
          "KdcAdminPassword":{
            "Ref":"KdcAdminPassword"
          }
        },
        "SecurityConfiguration":{
          "Ref":"KDCName"
        },
        "VisibleToAllUsers":true,
        "EbsRootVolumeSize":15,
        "Instances":{
          "MasterInstanceGroup":{
            "InstanceCount":3,
            "InstanceType":"m5.xlarge",
            "Market":{
              "Fn::Sub":"${Market}"
            },
            "Name":"Master Instance Group"
          },
          "CoreInstanceGroup":{
            "InstanceCount":"<PLEASE_UPDATE>",
            "InstanceType":"m5.xlarge",
            "Market":{
              "Fn::Sub":"${Market}"
            },
            "Name":"Core Instance Group"
          },
          "Ec2KeyName":{
            "Ref":"Ec2KeyName"
          },
          "EmrManagedSlaveSecurityGroup":{
            "Fn::Sub":"${SlaveSecurityGroup}"
          },
          "EmrManagedMasterSecurityGroup":{
            "Fn::Sub":"${MasterSecurityGroup}"
          },
          "ServiceAccessSecurityGroup":{
            "Fn::Sub":"${ServiceAccessSecurityGroup}"
          },
          "Ec2SubnetId":{
            "Fn::Sub":"${Ec2SubnetId}"
          },
          "TerminationProtected":true
        },
        "BootstrapActions":[
          {
            "Name":"Install Privacera Plugins on Master Node",
            "ScriptBootstrapAction":{
              "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
              "Args":[
                {
                  "Fn::Sub":"instance.isMaster=true"
                },
                {
                  "Fn::Sub":"wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
                }
              ]
            }
          },
          {
            "Name":"Install Spark OLAC in Core Node",
            "ScriptBootstrapAction":{
              "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
              "Args":[
                {
                  "Fn::Sub":"instance.isMaster=false"
                },
                {
                  "Fn::Sub":"wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
                }
              ]
            }
          }
        ],
        "Applications":[
          {
            "Name":"Hive"
          },
          {
            "Name":"Spark"
          },
          {
            "Name":"Trino"
          },
          {
            "Name":"Zeppelin"
          },
          {
            "Name":"Livy"
          },
          {
            "Name":"Hue"
          },
          {
            "Name":"Oozie"
          }
        ],
        "Configurations":[
          {
            "Classification":"spark",
            "ConfigurationProperties":{
              "maximizeResourceAllocation":"true"
            },
            "Configurations":[

            ]
          },
          {
            "Classification":"spark-hive-site",
            "ConfigurationProperties":{
              "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
              "hive.metastore.warehouse.dir":{
                "Ref":"HiveMetaStoreS3Path"
              }
            }
          },
          {
            "Classification":"hive-site",
            "ConfigurationProperties":{
              "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
              "hive.metastore.schema.verification":"false",
              "hive.server2.enable.doAs":"false",
              "parquet.column.index.access":"true",
              "fs.s3a.impl":"com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
              "hive.metastore.warehouse.dir":{
                "Ref":"HiveMetaStoreS3Path"
              }
            }
          },
          {
            "Classification":"trino-connector-hive",
            "ConfigurationProperties":{
              "hive.metastore":"glue",
              "hive.allow-drop-table":"true",
              "hive.allow-add-column":"true",
              "hive.allow-rename-column":"true",
              "connector.name":"hive-hadoop2",
              "hive.config.resources":"/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
              "hive.s3-file-system-type":"EMRFS",
              "hive.hdfs.impersonation.enabled":"false",
              "hive.allow-drop-column":"true",
              "hive.allow-rename-table":"true"
            }
          },
          {
            "Classification":"livy-conf",
            "ConfigurationProperties":{
              "livy.impersonation.enabled":"true"
            }
          },
          {
            "Classification":"core-site",
            "ConfigurationProperties":{
              "hadoop.security.credstore.java-keystore-provider.password-file":"",
              "hadoop.proxyuser.livy.groups":"*",
              "hadoop.proxyuser.livy.hosts":"*"
            }
          },
          {
            "Classification":"oozie-site",
            "ConfigurationProperties":{
              "oozie.service.JPAService.jdbc.driver":"org.mariadb.jdbc.Driver",
              "oozie.service.JPAService.jdbc.url":"jdbc:mysql://<PLEASE_UPDATE EXTERNAL_DB_HOST>:<PLEASE_UPDATE EXTERNAL_DB_PORT>/<PLEASE_UPDATE EXTERNAL_DB_NAME>?createDatabaseIfNotExist=true",
              "oozie.service.JPAService.jdbc.username":"<PLEASE_UPDATE EXTERNAL_JDBC_USER>",
              "oozie.service.JPAService.jdbc.password":"<PLEASE_UPDATE EXTERNAL_JDBC_PASSWORD>"
            },
            "Configurations":[

            ]
          },
          {
            "Classification":"hue-ini",
            "ConfigurationProperties":{

            },
            "Configurations":[
              {
                "Classification":"desktop",
                "ConfigurationProperties":{

                },
                "Configurations":[
                  {
                    "Classification":"database",
                    "ConfigurationProperties":{
                      "engine":"mysql",
                      "host":"<PLEASE_UPDATE EXTERNAL_DB_HOST>",
                      "port":"<PLEASE_UPDATE EXTERNAL_DB_PORT>",
                      "user":"<PLEASE_UPDATE EXTERNAL_DB_USER>",
                      "password":"<PLEASE_UPDATE EXTERNAL_DB_PASSWORD>",
                      "name":"<PLEASE_UPDATE EXTERNAL_DB_NAME>"
                    },
                    "Configurations":[

                    ]
                  }
                ]
              }
            ]
          }
        ],
        "LogUri":"s3://<PLEASE_UPDATE BUCKET_NAME>/emr/emr_logs/",
        "JobFlowRole":"<PLEASE_UPDATE NODE_ROLE_NAME>",
        "ServiceRole":"EMR_DefaultRole",
        "ReleaseLabel":{
          "Fn::Sub":"${EMRVersion}"
        }
      }
    }
  }
}
EMR template: Spark_OLAC, Hive, PrestoSQL (for EMR versions 6.x to 6.3.1)
JSON
{
  "Parameters":{
    "CLUSTERNAME":{
      "Description":"Name of the emr cluster",
      "Type":"String",
      "Default":"PCloud-EMR-Spark_OLAC-Hive-PrestoSQL"
    },
    "EMRRegion":{
      "Description":"aws region name",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "EMRVersion":{
      "Description":"Emr version",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "MasterSecurityGroup":{
      "Description":"Emr master/edge node security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "SlaveSecurityGroup":{
      "Description":"Emr worker/slave node security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "ServiceAccessSecurityGroup":{
      "Description":"Emr service access security group",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "Ec2SubnetId":{
      "Description":"Ec2 subnet id",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "HiveMetaStoreS3Path":{
      "Description":"Hive metastore s3 path",
      "Type":"String",
      "Default":"s3://<PLEASE_UPDATE BUCKET_NAME>/emr/hive_warehouse"
    },
    "Ec2KeyName":{
      "Description":"Ec2 keypair name",
      "Type":"String",
      "Default":"<EC2-SSH-KEY_PAIR-NAME>"
    },
    "Market":{
      "Description":"Ec2 Instance market type",
      "Type":"String",
      "Default":"ON_DEMAND"
    },
    "KDCName":{
      "Description":"KDC Name",
      "Type":"String",
      "Default":"<PLEASE_UPDATE>"
    },
    "KdcAdminPassword":{
      "Description":"KDC admin user password",
      "Type":"String",
      "Default":"<KDC-USER-PASSWORD>"
    },
    "CrossRealmTrustPrincipalPassword":{
      "Description":"KDC Cross Realm Trust Principal password",
      "Type":"String",
      "Default":"<KDC-PRINCIPAL-PASSWORD>"
    },
    "PrivaceraDownloadUrl":{
      "Description":"Privacera Base Download Url",
      "Type":"String",
      "Default":"https://privaceracloud.com/api/public/get/emr_script/<PLEASE_UPDATE API-KEY>"
    }
  },
  "Resources":{
    "EMRCLUSTER":{
      "Type":"AWS::EMR::Cluster",
      "Properties":{
        "Name":{
          "Ref":"CLUSTERNAME"
        },
        "KerberosAttributes":{
          "Realm":"EC2.INTERNAL",
          "KdcAdminPassword":{
            "Ref":"KdcAdminPassword"
          },
          "CrossRealmTrustPrincipalPassword":{
            "Ref":"CrossRealmTrustPrincipalPassword"
          }
        },
        "SecurityConfiguration":{
          "Ref":"KDCName"
        },
        "VisibleToAllUsers":true,
        "EbsRootVolumeSize":15,
        "Instances":{
          "MasterInstanceGroup":{
            "InstanceCount":1,
            "InstanceType":"m5.xlarge",
            "Market":{
              "Fn::Sub":"${Market}"
            },
            "Name":"Master Instance Group"
          },
          "CoreInstanceGroup":{
            "InstanceCount":"<PLEASE_UPDATE>",
            "InstanceType":"m5.xlarge",
            "Market":{
              "Fn::Sub":"${Market}"
            },
            "Name":"Core Instance Group"
          },
          "Ec2KeyName":{
            "Ref":"Ec2KeyName"
          },
          "EmrManagedSlaveSecurityGroup":{
            "Fn::Sub":"${SlaveSecurityGroup}"
          },
          "EmrManagedMasterSecurityGroup":{
            "Fn::Sub":"${MasterSecurityGroup}"
          },
          "ServiceAccessSecurityGroup":{
            "Fn::Sub":"${ServiceAccessSecurityGroup}"
          },
          "Ec2SubnetId":{
            "Fn::Sub":"${Ec2SubnetId}"
          },
          "TerminationProtected":true
        },
        "BootstrapActions":[
          {
            "Name":"Install Privacera Plugins on Master Node",
            "ScriptBootstrapAction":{
              "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
              "Args":[
                {
                  "Fn::Sub":"instance.isMaster=true"
                },
                {
                  "Fn::Sub":"wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
                }
              ]
            }
          },
          {
            "Name":"Install Spark OLAC in Core Node",
            "ScriptBootstrapAction":{
              "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
              "Args":[
                {
                  "Fn::Sub":"instance.isMaster=false"
                },
                {
                  "Fn::Sub":"wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-olac"
                }
              ]
            }
          }
        ],
        "Applications":[
          {
            "Name":"Hive"
          },
          {
            "Name":"Spark"
          },
          {
            "Name":"PrestoSQL"
          },
          {
            "Name":"Zeppelin"
          },
          {
            "Name":"Livy"
          },
          {
            "Name":"Hue"
          }
        ],
        "Configurations":[
          {
            "Classification":"spark",
            "ConfigurationProperties":{
              "maximizeResourceAllocation":"true"
            },
            "Configurations":[

            ]
          },
          {
            "Classification":"spark-hive-site",
            "ConfigurationProperties":{
              "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
              "hive.metastore.warehouse.dir":{
                "Ref":"HiveMetaStoreS3Path"
              }
            }
          },
          {
            "Classification":"hive-site",
            "ConfigurationProperties":{
              "hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
              "hive.metastore.schema.verification":"false",
              "hive.server2.enable.doAs":"false",
              "parquet.column.index.access":"true",
              "fs.s3a.impl":"com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
              "hive.metastore.warehouse.dir":{
                "Ref":"HiveMetaStoreS3Path"
              }
            }
          },
          {
            "Classification":"prestosql-connector-hive",
            "ConfigurationProperties":{
              "hive.metastore":"glue",
              "hive.allow-drop-table":"true",
              "hive.allow-add-column":"true",
              "hive.allow-rename-column":"true",
              "connector.name":"hive-hadoop2",
              "hive.config.resources":"/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
              "hive.s3-file-system-type":"EMRFS",
              "hive.hdfs.impersonation.enabled":"false",
              "hive.allow-drop-column":"true",
              "hive.allow-rename-table":"true"
            }
          },
          {
            "Classification":"livy-conf",
            "ConfigurationProperties":{
              "livy.impersonation.enabled":"true"
            }
          },
          {
            "Classification":"core-site",
            "ConfigurationProperties":{
              "hadoop.proxyuser.livy.groups":"*",
              "hadoop.proxyuser.livy.hosts":"*"
            }
          }
        ],
        "LogUri":"s3://<PLEASE_UPDATE BUCKET_NAME>/emr/emr_logs/",
        "JobFlowRole":"<PLEASE_UPDATE NODE_ROLE_NAME>",
        "ServiceRole":"EMR_DefaultRole",
        "ReleaseLabel":{
          "Fn::Sub":"${EMRVersion}"
        }
      }
    }
  }
}
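
The templates above contain angle-bracket placeholders (for example `<PLEASE_UPDATE>` and `<EC2-SSH-KEY_PAIR-NAME>`) that must be replaced before the stack is created. As a minimal sketch, a local check like the one below can list any that remain. The function name `find_placeholders` and the file name in the usage comment are assumptions made for this example; they are not part of Privacera or the AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line of a template file that still
# contains an angle-bracket placeholder such as <PLEASE_UPDATE>.
find_placeholders() {
  local template_file="$1"
  # -n prefixes each match with its line number.
  # The pattern matches "<", an uppercase letter, any text, then ">".
  grep -nE '<[A-Z][^>]*>' "$template_file"
}

# Example usage (the file name is an assumption):
#   if find_placeholders emr-template.json; then
#     echo "Replace the placeholders listed above before creating the stack." >&2
#   fi
```

`grep` exits with status 0 when at least one placeholder is found, so the function can be used directly in an `if` condition.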

To create the EMR cluster, use either of the following approaches:

Using AWS CLI

  1. Run the following command to create the EMR cluster:
    Bash
    aws cloudformation create-stack --stack-name privacera-emr-creation --template-body file://<emr_template_json_file> --region <aws_region>
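
The `create-stack` command returns as soon as the request is accepted, not when the cluster is ready. The sketch below wraps it with two other AWS CLI subcommands, `validate-template` and `wait stack-create-complete`, so that template syntax errors surface early and the script blocks until creation finishes. The function name `create_emr_stack` and its argument order are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch: validate the template, create the stack, then wait for completion.
# The function name and argument order are assumptions for illustration.
create_emr_stack() {
  local stack_name="$1" template_file="$2" region="$3"

  # Catch JSON and template syntax errors before any resources are created.
  aws cloudformation validate-template \
    --template-body "file://${template_file}" --region "$region" || return 1

  # Same command as in the step above, with the values passed as arguments.
  aws cloudformation create-stack --stack-name "$stack_name" \
    --template-body "file://${template_file}" --region "$region" || return 1

  # Exits non-zero if the stack does not reach CREATE_COMPLETE.
  aws cloudformation wait stack-create-complete \
    --stack-name "$stack_name" --region "$region"
}

# Example usage (values are placeholders):
#   create_emr_stack privacera-emr-creation <emr_template_json_file> <aws_region>
```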
    

Using AWS Console

  1. Navigate to the AWS CloudFormation console.
  2. Click Create stack.
  3. Select Upload a template file.
  4. Click Choose file and select the JSON template file.
  5. Click Next.
  6. Enter the stack name and click Next.
  7. Click Next again.
  8. Finally, click Create stack.
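
With either approach, the stack status can also be checked from the command line. The following is a minimal sketch that uses the AWS CLI `describe-stacks` subcommand with a `--query` filter; the function name `stack_status` is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Sketch: print the current status of a CloudFormation stack,
# for example CREATE_IN_PROGRESS or CREATE_COMPLETE.
# The function name is an assumption for illustration.
stack_status() {
  local stack_name="$1" region="$2"
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" --region "$region" \
    --query 'Stacks[0].StackStatus' --output text
}

# Example usage (values are placeholders):
#   stack_status privacera-emr-creation <aws_region>
```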
