Setup for Access Management for EMR Cluster¶
This section outlines the steps to set up the EMR Cluster with the Privacera Plugin. Please ensure that all prerequisites are completed before beginning the setup process.
Perform the following steps to configure the EMR connector:
- SSH to the instance where Privacera is installed.
- Run the following command to navigate to the /config directory.
  Bash
- Run the following command to copy the sample vars:
  Bash
- Run the following command to open the .yml file to be edited.
  Bash
- Modify the following properties:
Variable | Definition |
---|---|
EMR_CLUSTER_NAME | Enter a unique name for the EMR cluster. |
EMR_VERSION | Enter the EMR version (e.g., emr-7.1.0). |
Security group config | |
EMR_MASTER_SG_ID | Set the Security Group ID for the EMR Master Node Group. |
EMR_SLAVE_SG_ID | Set the Security Group ID for the EMR Slave Node Group. |
EMR_SERVICE_ACCESS_SG_ID | Set the Security Group ID for EMR Service Access. |
Instance config | |
EMR_SUBNET_ID | Subnet ID for the instance. |
EMR_KEYPAIR | Existing EC2 key pair to SSH into the master node. |
EMR_EC2_MARKET_TYPE | EC2 instance market type. Supported values: SPOT, ON_DEMAND. |
EMR_EC2_INSTANCE_TYPE | EC2 instance type (e.g., m5.xlarge). |
EMR_MASTER_NODE_COUNT | Number of master node instances in the cluster. |
EMR_CORE_NODE_COUNT | Number of core node instances in the cluster. |
Security config | |
EMR_SECURITY_CONFIG | Name of the Security Configurations created for EMR. |
EMR_KERBEROS_ENABLE | Enables Kerberos; this should be set to 'true'. |
EMR_KDC_ADMIN_PASSWORD | Cluster KDC admin password. |
EMR_CROSS_REALM_PASSWORD | Cross-realm principal password. |
EMR_KERB_REALM | Specifies the Kerberos realm name. |
EMR_KERB_DOMAIN | Specifies the domain name of the other realm. |
EMR_KERB_ADMIN_SERVER | Specifies the FQDN or IP address of the admin server. |
EMR_KERB_KDC_SERVER | Specifies the FQDN or IP address of the KDC server. |
AWS Account & IAM config | |
EMR_AWS_ACCT_ID | AWS Account ID where EMR Cluster will be created. |
EMR_DEFAULT_ROLE | Role attached to EMR Cluster for performing cluster related activities. |
EMR_ROLE_FOR_CLUSTER_NODES | IAM Role which will be attached to each node in the EMR Cluster. |
EMR_ROLE_FOR_APPS | IAM Role name which will be used by all EMR Apps. |
Spark OLAC config | |
EMR_APP_SPARK_OLAC_ENABLE | Set to enable Object-Level Access Control (OLAC) for EMR Spark. |
Trino config | |
EMR_APP_PRESTO_SQL_ENABLE | Set to enable the Trino plugin for EMR Trino. |
EMR_APP_PRESTO_DB_ENABLE | Set to enable the PrestoDB plugin for EMR PrestoDB. |
Hive config | |
EMR_HIVE_METASTORE_PATH | Hive Metastore path. |
Other config | |
EMR_LOGS_PATH | S3 location for storing EMR cluster logs. |
- Once the properties are configured, run the following commands to update your Privacera Manager platform instance:
- Once the update is complete, all CloudFormation JSON template files will be available at the following path:
  Bash
- In PrivaceraCloud, go to Settings -> Applications.
- On the Applications screen, select EMR.
- Enter the application Name and Description, then click Save. The name can be anything you choose, e.g. 'AWS EMR Connector for account 123456'.
- Open the EMR application.
- Enable the Access Management option with the toggle button.
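Before running the update step, the values entered in the properties table can be sanity-checked. The following is a minimal sketch, assuming the variables have already been loaded into a plain Python dict (for example, from the edited .yml file). The function name `check_emr_vars`, the list of variables treated as mandatory, and the choice of checks are illustrative assumptions and are not part of Privacera Manager.

```python
import re

# Values the properties table lists as supported for EMR_EC2_MARKET_TYPE.
SUPPORTED_MARKET_TYPES = {"SPOT", "ON_DEMAND"}

# Variables from the table that this sketch treats as mandatory.
# Assumption: adjust this list to match your own deployment.
REQUIRED_VARS = [
    "EMR_CLUSTER_NAME",
    "EMR_VERSION",
    "EMR_SUBNET_ID",
    "EMR_KEYPAIR",
    "EMR_SECURITY_CONFIG",
    "EMR_AWS_ACCT_ID",
    "EMR_LOGS_PATH",
]


def check_emr_vars(emr_vars):
    """Return a list of problems found; an empty list means no issue was detected."""
    problems = []
    for name in REQUIRED_VARS:
        if not str(emr_vars.get(name, "")).strip():
            problems.append(f"{name} is missing or empty")

    version = str(emr_vars.get("EMR_VERSION", ""))
    # The table gives emr-7.1.0 as an example, so expect the emr-X.Y.Z form.
    if version and not re.fullmatch(r"emr-\d+\.\d+\.\d+", version):
        problems.append(f"EMR_VERSION {version!r} does not look like emr-X.Y.Z")

    market = emr_vars.get("EMR_EC2_MARKET_TYPE")
    if market is not None and market not in SUPPORTED_MARKET_TYPES:
        problems.append("EMR_EC2_MARKET_TYPE must be SPOT or ON_DEMAND")

    # The table states that Kerberos should be enabled.
    if str(emr_vars.get("EMR_KERBEROS_ENABLE", "")).lower() != "true":
        problems.append("EMR_KERBEROS_ENABLE should be set to 'true'")
    return problems
```

Calling `check_emr_vars(my_vars)` and printing each returned message gives a quick pre-flight report; it does not verify that the referenced AWS resources actually exist.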
Configure shared-secret¶
Note: This step is required only for EMR Spark OLAC.
- In PrivaceraCloud, go to Settings -> Applications.
- For more information about how to connect the S3 application, see Connect S3 to PrivaceraCloud.
- On the Applications screen, select S3.
- Click the edit icon and navigate to the 'ADVANCED' tab.
- Add the following property:
  Properties
- Click Save.
Privacera Plugin Script¶
- In PrivaceraCloud, go to Settings -> Applications.
- On the Applications screen, select EMR.
- From the screen, either copy the download URL or download the script.
- If you downloaded the script, follow these steps:
  - Upload it to an S3 bucket location of your choice.
  - Get the object URL of the uploaded script file from S3.
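For the upload route, the object URL can also be derived from the bucket, key, and region instead of being copied from the S3 console. This is a small sketch using the standard virtual-hosted-style S3 URL form; the bucket name, key, and region in the usage lines are hypothetical placeholders.

```python
from urllib.parse import quote


def s3_object_url(bucket, key, region):
    """Build the virtual-hosted-style HTTPS URL for an S3 object."""
    # Keep "/" unescaped so the key's folder structure stays readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


def s3_uri(bucket, key):
    """Build the s3:// URI form of the same object."""
    return f"s3://{bucket}/{key}"


# Hypothetical bucket, key, and region, used only for illustration.
print(s3_object_url("my-privacera-bucket", "scripts/privacera_emr.sh", "us-east-1"))
print(s3_uri("my-privacera-bucket", "scripts/privacera_emr.sh"))
```

Which of the two forms a later step expects is not stated here, so both are shown.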
Setup IAM Roles¶
The following two IAM roles must be created prior to launching the cluster. These roles can be established efficiently with minimal permissions by utilizing the IAM roles template provided below.
- Node role
- Application data access role
These IAM roles are required to access AWS resources. In the template, the Node role is referred to as 'EmrPrivaceraNodeRole' and the Application data access role as 'EmrPrivaceraDataAcessRole'.
privacera-emr-iam-role-template
JSON
To create the roles, use the following AWS CLI CloudFormation command:
Bash
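As an illustration of what such a CloudFormation invocation generally looks like, the sketch below assembles an `aws cloudformation create-stack` argument list in Python. It is not the exact command from this guide: the stack name, template path, and region in the usage lines are hypothetical, and `CAPABILITY_NAMED_IAM` is included on the assumption that the template creates IAM roles with explicit names.

```python
import shlex


def create_stack_command(stack_name, template_path, region, parameters=None):
    """Return the argv list for `aws cloudformation create-stack`."""
    argv = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
        "--region", region,
        # CloudFormation requires this acknowledgement when a template
        # creates IAM resources with custom names (assumed here).
        "--capabilities", "CAPABILITY_NAMED_IAM",
    ]
    if parameters:
        argv.append("--parameters")
        for key, value in parameters.items():
            argv.append(f"ParameterKey={key},ParameterValue={value}")
    return argv


# Hypothetical stack name, template file, and region.
cmd = create_stack_command(
    "privacera-emr-iam-roles",
    "privacera-emr-iam-role-template.json",
    "us-east-1",
)
print(shlex.join(cmd))
```

Building the command as a list keeps the quoting explicit; the list can be passed to `subprocess.run(cmd, check=True)` once the values have been reviewed.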
Setup EMR Security Configuration¶
A new security configuration must be established to integrate the Kerberos server with the EMR cluster.
A security configuration can be created in two ways: with an AWS CLI CloudFormation command or through the AWS EMR console.
In-transit encryption is required
The generation of certificates to enable in-transit encryption is essential. This step is mandated by AWS guidelines, as in-transit encryption serves as a prerequisite for enabling Kerberos for Trino. For more information, click here
Using AWS CLI CloudFormation Command¶
If required, modify the template emr-security-config-template.json below to suit your requirements.
emr-security-config-template.json
To create the security configuration, use the following AWS CLI CloudFormation command:
Bash
Using AWS EMR Console¶
Please follow the steps outlined below to create a security configuration using the AWS EMR console:
- Log in to the AWS EMR console.
- In the left navigation pane, select Security Configuration and then click Create New Security Configuration.
- Enter a name for the security configuration, such as 'emr_sec_config'.
- Navigate to the Authentication section, check the box for Enable Kerberos authentication, and provide the Kerberos environment details as follows:
  - Provider: Cluster-dedicated KDC
  - Ticket Lifetime (hours): 24
  - Check the 'Turn on Cross-realm trust' box and enter the following details:
    - Realm: EXAMPLE.COM
    - Domain: example.com
    - Admin Server: server.admin.com
    - KDC Server: server.example.com
- Select the option to Use IAM roles for EMRFS requests to Amazon S3.
  - IAM Role: select the App data access role created in AWS IAM roles using CloudFormation setup.
  - Under Basis for Access, select the identifier type 'User' from the list and enter the corresponding identifiers: hadoop, hive, presto, and trino.
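The console selections above correspond roughly to an EMR security configuration JSON document. The sketch below builds such a document in Python. The key names follow the EMR security configuration JSON format as commonly documented by AWS and should be verified against the AWS reference before use. The role ARN and the certificate bundle location are hypothetical placeholders, and the result is not necessarily identical to emr-security-config-template.json.

```python
import json


def build_security_configuration(data_access_role_arn, cert_bundle_s3_uri):
    """Mirror the console steps: Kerberos with cross-realm trust plus EMRFS role mapping."""
    return {
        # In-transit encryption is listed as a prerequisite for Kerberos with Trino.
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": cert_bundle_s3_uri,
                }
            },
        },
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": 24,
                    "CrossRealmTrustConfiguration": {
                        "Realm": "EXAMPLE.COM",
                        "Domain": "example.com",
                        "AdminServer": "server.admin.com",
                        "KdcServer": "server.example.com",
                    },
                },
            }
        },
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {
                "RoleMappings": [
                    {
                        "Role": data_access_role_arn,
                        "IdentifierType": "User",
                        "Identifiers": ["hadoop", "hive", "presto", "trino"],
                    }
                ]
            }
        },
    }


# Hypothetical ARN and certificate location, for illustration only.
config = build_security_configuration(
    "arn:aws:iam::123456789012:role/EmrPrivaceraDataAcessRole",
    "s3://my-privacera-bucket/certs/emr-certs.zip",
)
print(json.dumps(config, indent=2))
```

Generating the document from code makes it easy to keep the realm, server names, and role ARN in one place when several clusters share the same settings.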
EMR Bootstrap action¶
Add the bootstrap action below to set up the Privacera plugins in the EMR cluster:
Hive 'doAs' should be disabled
It is recommended to disable Hive impersonation for Privacera Ranger authorization. By default, the Privacera Plugin overrides the property hive.server2.enable.doAs in /etc/hive/conf/hive-site.xml and sets it to false. When hive.server2.enable.doAs=true, HiveServer2 processes the query as the user who submitted it (usually the user you kinit with). When the property is set to false, the query runs as the user that the HiveServer2 process runs as, which is typically hive.
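To confirm the effective value on a running cluster, the property can be read from the Hadoop-style XML file. This is a minimal sketch using only the Python standard library; the helper name `get_hadoop_property` is hypothetical, and the sample document stands in for the real /etc/hive/conf/hive-site.xml.

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(xml_data, name):
    """Return the value of a <property> entry in Hadoop-style XML, or None if absent."""
    root = ET.fromstring(xml_data)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Stand-in for the contents of hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>"""

print(get_hadoop_property(SAMPLE, "hive.server2.enable.doAs"))
```

On a cluster node, the file contents can be read with `open("/etc/hive/conf/hive-site.xml", "rb").read()` and passed to the same function.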
Spark OLAC, Hive, Trino¶
privacera-emr-bootstrap-action-spark_olac-hive-trino
Spark FGAC, Hive, Trino¶
privacera-emr-bootstrap-action-spark_fgac-hive-trino
Create EMR cluster¶
Sample EMR template¶
- To create an EMR cluster, please utilize the CloudFormation templates provided below. Customization of these templates is permitted; however, it is recommended to maintain the same common variables from the previous setup steps.
EMR template: Spark_OLAC, Hive, Trino (for EMR version 6.11.0)
JSON
EMR template: Spark_OLAC, Hive, Trino (for EMR versions 6.4.0 and above)
JSON
EMR template for multiple master nodes: Spark_OLAC, Hive, Trino (for EMR versions 6.4.0 and above)
JSON
EMR template: Spark_OLAC, Hive, PrestoSQL (for EMR versions 6.x to 6.3.1)
JSON
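The template titles above split by EMR release: Trino for 6.4.0 and above, PrestoSQL for 6.x up to 6.3.1. A small helper can make that choice explicit when scripting cluster creation. This is a sketch based only on the version ranges in those titles; the function names are hypothetical.

```python
def parse_emr_version(label):
    """Turn a release label such as 'emr-6.4.0' into the tuple (6, 4, 0)."""
    number = label[len("emr-"):] if label.startswith("emr-") else label
    return tuple(int(part) for part in number.split("."))


def sql_engine_for(label):
    """Pick the SQL engine variant of the template for an EMR release label."""
    version = parse_emr_version(label)
    if version >= (6, 4, 0):
        return "Trino"
    if version >= (6, 0, 0):
        return "PrestoSQL"
    raise ValueError(f"No template is listed for {label}")


print(sql_engine_for("emr-7.1.0"))
print(sql_engine_for("emr-6.3.1"))
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "6.11.0" would sort before "6.4.0".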
To create the EMR cluster, use either of the following approaches:
Using AWS CLI¶
- Run the following command to create the EMR cluster:
Bash
Using AWS Console¶
- Navigate to the AWS CloudFormation console.
- Click Create stack.
- Select Upload a template file.
- Click Choose file and select the JSON template file.
- Click Next.
- Enter the stack name and click Next.
- Click Next again.
- Finally, click Create stack.
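Whichever route is used, a customized template can be checked locally before it is submitted. The sketch below parses the JSON and lists resources of type `AWS::EMR::Cluster`. It assumes the template defines the cluster with that standard CloudFormation resource type; the sample template is a hypothetical stub, not one of the templates from this guide.

```python
import json


def emr_cluster_resources(template_text):
    """Return the logical names of AWS::EMR::Cluster resources in a template."""
    template = json.loads(template_text)  # raises ValueError on malformed JSON
    resources = template.get("Resources", {})
    return [
        name
        for name, body in resources.items()
        if isinstance(body, dict) and body.get("Type") == "AWS::EMR::Cluster"
    ]


# Hypothetical stub standing in for a real template file.
SAMPLE_TEMPLATE = json.dumps({
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "EMRCluster": {"Type": "AWS::EMR::Cluster", "Properties": {}},
    },
})

print(emr_cluster_resources(SAMPLE_TEMPLATE))
```

An empty result, or a JSON parsing error, is a sign that an edit broke the template before CloudFormation ever sees it.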