Privacera Documentation

Connect Elastic MapReduce from Amazon application to PrivaceraCloud

This topic describes how to connect an EMR application to PrivaceraCloud for access control.

Connect EMR application

  1. Go to Settings > Applications.

  2. In the Applications screen, select EMR.

  3. Enter the application Name and Description, and then click Save.

  4. Click the toggle button to enable Access Management for your application.

EMR Spark access control types

EMR Spark supports two types of access control: Fine-Grained Access Control (FGAC) and Object Level Access Control (OLAC). Only one of them can be added during configuration.

EMR Spark OLAC

The advantages of EMR Spark OLAC:

  • It controls access to the existing AWS S3 resource locations that you read or write with Spark.

  • It uses privacera_s3 service for resource-based access control and privacera_tag service for tag-based access control.

  • It uses the signed-authorization implementation from Privacera.

EMR Spark FGAC

When FGAC is installed and enabled, each data user query is parsed by Spark and authorized by the PrivaceraCloud Spark plug-in. The requesting user must be authorized to access every resource that the query refers to.

  • It supports database, table, and column policies, in addition to row filtering and column masking.

  • It uses the privacera_hive, privacera_s3, privacera_adls, and privacera_files services for resource-based access control, and the privacera_tag service for tag-based access control.

  • It uses the plugin implementation from Privacera.

PrivaceraCloud configuration

Obtain shared key

Obtain or determine a character string to serve as a "shared key" between PrivaceraCloud and the AWS EMR cluster. We'll refer to this as {SHARED_KEY} in the configuration steps below.

  1. In PrivaceraCloud portal, go to Settings > Applications.

  2. Select the existing AWS DataServer S3 application, and then click the edit (pen) icon.

  3. In the ADVANCED tab, add the following property, substituting your actual value for {SHARED_KEY}:

    dataserver.shared.secret={SHARED_KEY}

  4. Click Save.
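
If you do not already have a key, any sufficiently random string should serve. The following is a minimal sketch of one way to generate one in a shell. The function name generate_shared_key and the 64-character hex format are illustrative assumptions, not PrivaceraCloud requirements.

```shell
#!/bin/sh
# Hypothetical helper: print a 64-character hex string built from 32 random bytes.
# The length and the hex encoding are illustrative choices, not Privacera requirements.
generate_shared_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

SHARED_KEY="$(generate_shared_key)"

# Print the property line to paste into the ADVANCED tab.
echo "dataserver.shared.secret=${SHARED_KEY}"
```

Keep the generated value somewhere safe, because the same key is shared between PrivaceraCloud and the EMR cluster.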

Obtain EMR script download URL

Obtain the <emr-script-download-url> that is unique to your account. The EMR cluster uses this URL to download additional scripts and setup from PrivaceraCloud:

  1. In PrivaceraCloud portal, go to Settings > Applications.

  2. Use an existing active API key or create a new one. Set Expiry = Never Expires.

  3. Click the API Key Info (i) button.

  4. On the API Key Info page, click the COPY URL button in front of the AWS EMR Setup Script to store the emr-script-download-url.

AWS IAM roles using CloudFormation setup

  1. Before launching the cluster, create the following two IAM roles. The sample CloudFormation template at the end of this procedure creates both with minimal permissions.

    • Node role: EmrPrivaceraNodeRole

    • App data access role: EmrPrivaceraDataAcessRole

    If required, you can modify the template based on your requirements.

  2. To create the roles, use the following AWS CLI CloudFormation command:

    aws --region <AWS-REGION> cloudformation create-stack \
      --stack-name privacera-emr-role-creation \
      --template-body file://emr-roles-creation-template.json \
      --capabilities CAPABILITY_NAMED_IAM

    For more information about how to create a stack using a CloudFormation template, see Create CloudFormation stack.

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Create roles and policies for use by Privacera-Protected EMR Clusters",
  "Resources": {
    "EmrRestrictedRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "RoleName": {
          "Fn::Join": [
            "",
            [
              "EmrPrivaceraNodeRole"
            ]
          ]
        },
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "ec2.amazonaws.com"
                ]
              },
              "Action": [
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "Path": "/"
      }
    },
    "EmrRestrictedPolicy": {
      "Type": "AWS::IAM::ManagedPolicy",
      "Properties": {
        "ManagedPolicyName": {
          "Fn::Join": [
            "",
            [
              "EMRPrivaceraNodePolicy"
            ]
          ]
        },
        "PolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "EmrServiceLimited",
              "Effect": "Allow",
              "Action": [
                "glue:CreateDatabase",
                "glue:UpdateDatabase",
                "glue:DeleteDatabase",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetTableVersions",
                "glue:CreatePartition",
                "glue:BatchCreatePartition",
                "glue:UpdatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition",
                "glue:CreateUserDefinedFunction",
                "glue:UpdateUserDefinedFunction",
                "glue:DeleteUserDefinedFunction",
                "glue:GetUserDefinedFunction",
                "glue:GetUserDefinedFunctions",
                "ec2:Describe*",
                "elasticmapreduce:Describe*",
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:ListInstanceGroups",
                "elasticmapreduce:ListInstances",
                "elasticmapreduce:ListSteps"
              ],
              "Resource": "*"
            },
            {
              "Sid": "EmrS3Limited",
              "Effect": "Allow",
              "Action": "s3:*",
              "Resource": [
                "arn:aws:s3:::*.elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce",
                "arn:aws:s3:::infraqa-test/user/suraj/dev/emr/fgac/privacera_cust_conf.zip"
              ]
            },
            {
              "Sid": "EmrAssumeIAM",
              "Effect": "Allow",
              "Action": "sts:AssumeRole",
              "Resource": [
                "arn:aws:iam::587946681758:role/infraQA_app_data_access_role"
              ]
            }
          ]
        },
        "Roles": [
          {
            "Ref": "EmrRestrictedRole"
          }
        ]
      }
    },
    "EmrRoleForApps": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "RoleName": {
          "Fn::Join": [
            "",
            [
              "EmrPrivaceraDataAcessRole"
            ]
          ]
        },
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "ec2.amazonaws.com"
                ]
              },
              "Action": [
                "sts:AssumeRole"
              ]
            },
            {
              "Effect": "Allow",
              "Principal": {
                "AWS": {
                  "Fn::GetAtt": [
                    "EmrRestrictedRole",
                    "Arn"
                  ]
                }
              },
              "Action": "sts:AssumeRole"
            }
          ]
        },
        "Path": "/"
      }
    },
    "DataAccessPolicy": {
      "Type": "AWS::IAM::ManagedPolicy",
      "Properties": {
        "ManagedPolicyName": {
          "Fn::Join": [
            "",
            [
              "EmrPrivaceraDataAcessPolicy"
            ]
          ]
        },
        "PolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "S3DataAccess",
              "Effect": "Allow",
              "Action": [
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:DeleteBucket",
                "s3:ListBucketMultipartUploads",
                "s3:GetBucketAcl",
                "s3:GetBucketPolicy",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:PutObjectAcl"
              ],
              "Resource": [
                "arn:aws:s3:::*.elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce",
                "arn:aws:s3:::infraqa-test/user/suraj/dev/emr/fgac/privacera_cust_conf.zip"
              ]
            }
          ]
        },
        "Roles": [
          {
            "Ref": "EmrRoleForApps"
          }
        ]
      }
    },
    "EmrRestrictedRoleProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": {
        "InstanceProfileName": {
          "Ref": "EmrRestrictedRole"
        },
        "Roles": [
          {
            "Ref": "EmrRestrictedRole"
          }
        ]
      }
    },
    "EmrRoleForAppsProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": {
        "InstanceProfileName": {
          "Ref": "EmrRoleForApps"
        },
        "Roles": [
          {
            "Ref": "EmrRoleForApps"
          }
        ]
      }
    }
  },
  "Outputs": {
    "EMRRestrictedRole": {
      "Value": {
        "Ref": "EmrRestrictedRole"
      }
    },
    "EmrRoleForApps": {
      "Value": {
        "Ref": "EmrRoleForApps"
      }
    }
  }
}
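
Once the template is saved as emr-roles-creation-template.json, the create-stack command can be combined with a validation step and a wait step. The following is a minimal sketch, not a Privacera-supplied script: the wrapper function name and the AWS_CLI override variable are assumptions added for illustration, while the cloudformation subcommands are standard AWS CLI calls.

```shell
#!/bin/sh
# AWS_CLI can be overridden (for example with "echo") to dry-run the commands.
AWS_CLI="${AWS_CLI:-aws}"
STACK_NAME="privacera-emr-role-creation"
TEMPLATE="file://emr-roles-creation-template.json"

# Hypothetical wrapper: validate the template, create the stack,
# then block until CloudFormation reports that creation is complete.
#   $1 = AWS region
create_privacera_roles() {
  region="$1"
  "$AWS_CLI" --region "$region" cloudformation validate-template \
    --template-body "$TEMPLATE" &&
  "$AWS_CLI" --region "$region" cloudformation create-stack \
    --stack-name "$STACK_NAME" \
    --template-body "$TEMPLATE" \
    --capabilities CAPABILITY_NAMED_IAM &&
  "$AWS_CLI" --region "$region" cloudformation wait stack-create-complete \
    --stack-name "$STACK_NAME"
}

# Usage: create_privacera_roles <AWS-REGION>
```

Validating first surfaces JSON or template syntax errors before any IAM resources are created.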

Create a security configuration

You can create a security configuration using:

  • CloudFormation setup (Recommended)

  • AWS EMR console (Manually)

Manually create a security configuration using AWS EMR console

To create a security configuration using the AWS EMR console:

  1. Log into the AWS EMR console.

  2. In the left navigation, select Security Configuration > Create New Security Configuration.

  3. Enter a Name for Security Configuration. For example, emr_sec_config.

  4. Navigate to the Authentication section, check Enable Kerberos authentication, and enter the Kerberos environment details as follows:

    • Provider: Cluster dedicated KDC

    • TicketLifetime: 24 hours

    • Cross-realm trust

      • Realm: EXAMPLE.COM

      • Domain: example.com

      • Admin server: server.admin.com

      • KDC server: server.example.com

  5. Select Use IAM roles for EMRFS requests to Amazon S3.

    • IAM Role: select the App data access role created in AWS IAM roles using CloudFormation setup.

    • Under Basis for access, select an identifier type (User) from the list and enter the corresponding identifiers (hadoop;hive;presto;trino).
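
The console choices above can also be expressed as a security configuration JSON document and submitted with the AWS CLI. The following is a rough sketch under stated assumptions: the JSON field names reflect the EMR security configuration format as best understood here and should be checked against the AWS documentation, <account-id> is a placeholder, the Kerberos values are example values only, and the function names and the AWS_CLI override variable are hypothetical.

```shell
#!/bin/sh
# AWS_CLI can be overridden (for example with "echo") to dry-run the command.
AWS_CLI="${AWS_CLI:-aws}"
SEC_CONFIG_FILE="${SEC_CONFIG_FILE:-emr_sec_config.json}"

# Write a security configuration that mirrors the console choices.
# All values are examples; replace <account-id> and the Kerberos details.
write_security_configuration() {
  cat > "$SEC_CONFIG_FILE" <<'EOF'
{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24,
        "CrossRealmTrustConfiguration": {
          "Realm": "EXAMPLE.COM",
          "Domain": "example.com",
          "AdminServer": "server.admin.com",
          "KdcServer": "server.example.com"
        }
      }
    }
  },
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::<account-id>:role/EmrPrivaceraDataAcessRole",
          "IdentifierType": "User",
          "Identifiers": ["hadoop", "hive", "presto", "trino"]
        }
      ]
    }
  }
}
EOF
}

# Hypothetical wrapper: write the file, then register it with EMR.
create_security_configuration() {
  write_security_configuration &&
  "$AWS_CLI" emr create-security-configuration \
    --name emr_sec_config \
    --security-configuration "file://$SEC_CONFIG_FILE"
}

# Usage: create_security_configuration
```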

Create EMR cluster

You can create an EMR cluster using:

  • CloudFormation setup (Recommended)

    • Using CloudFormation EMR templates

    • Using CloudFormation AWS CLI

    • Using CloudFormation AWS Console

  • AWS EMR console (Manually)

Manually create EMR cluster using AWS EMR console

Follow these steps to manually create an AWS EMR cluster using the AWS EMR console:

  1. Log in to the AWS Management Console and navigate to the EMR Console.

  2. Click the Create cluster button.

  3. Click Go to advanced, and select the Release, for example, emr-6.4.0.

Configure applications for AWS EMR cluster

Follow these steps to configure individual applications for the AWS EMR cluster:

  1. Select additional applications as per your environment, such as Spark, Hadoop, Hive, Trino or Presto.

  2. In Edit software settings, select Enter configuration and add the following configuration array entries for the individual applications:

{
  "Classification": "spark",
  "ConfigurationProperties": {
    "maximizeResourceAllocation": "true"
  },
  "Configurations": []
},
{
  "Classification": "spark-hive-site",
  "ConfigurationProperties": {
    "hive.metastore.warehouse.dir": "s3://<bucket-name>/<path>"
  }
},
{
  "Classification": "hive-site",
  "ConfigurationProperties": {
    "javax.jdo.option.ConnectionURL": "",
    "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "welcome1",
    "hive.server2.enable.doAs": "false",
    "parquet.column.index.access": "true",
    "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
    "hive.metastore.warehouse.dir": "s3://<bucket-name>/<path>"
  }
}

Note

If the EMR version is 6.4.0 or above, use trino in place of <application> in the connector configurations below. Use presto-sql for older versions.

Presto and Trino/Presto-sql are incompatible. Only one at a time should be used.

  • External Hive Metastore (EHM) configuration:

    {
      "Classification": "<application>-connector-hive",
      "ConfigurationProperties": {
        "hive.allow-drop-table": "true",
        "hive.allow-add-column": "true",
        "hive.allow-rename-column": "true",
        "connector.name": "hive-hadoop2",
        "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
        "hive.s3-file-system-type": "EMRFS",
        "hive.hdfs.impersonation.enabled": "false",
        "hive.allow-drop-column": "true",
        "hive.allow-rename-table": "true"
      }
    }
  • Glue configuration:

    {
      "Classification": "<application>-connector-hive",
      "ConfigurationProperties": {
        "hive.metastore": "glue",
        "hive.allow-drop-table": "true",
        "hive.allow-add-column": "true",
        "hive.allow-rename-column": "true",
        "connector.name": "hive-hadoop2",
        "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
        "hive.s3-file-system-type": "EMRFS",
        "hive.hdfs.impersonation.enabled": "false",
        "hive.allow-drop-column": "true",
        "hive.allow-rename-table": "true"
      }
    }

Presto and Trino/Presto-sql are incompatible. Only one at a time should be used.

{
  "Classification": "presto-connector-hive",
  "ConfigurationProperties": {
    "hive.metastore": "glue",
    "hive.allow-drop-table": "true",
    "hive.allow-add-column": "true",
    "hive.allow-rename-column": "true",
    "connector.name": "hive-hadoop2",
    "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
    "hive.s3-file-system-type": "EMRFS",
    "hive.hdfs.impersonation.enabled": "false",
    "hive.allow-drop-column": "true",
    "hive.allow-rename-table": "true"
  }
}

Bootstrap actions

  1. In Hardware settings, select Networking, Node, and Instance values as appropriate for your environment.

  2. Go to General cluster settings, configure the cluster name, logging, debugging, and termination protection as needed for your environment.

    Configure the General cluster settings by including two scripts that install the Privacera Signing Agent on both the master and worker nodes.

  3. In Additional Options, expand Bootstrap Actions, select bootstrap action Run if, and then click Configure and add.

  4. In the Bootstrap actions dialog, set the name of the master and core node to Privacera Signing Agent.

  5. Copy the following script into Optional Arguments using your own emr-script-download-url script URL. See Obtain EMR script download URL.

  6. Click Add when finished.

  For EMR Spark OLAC:

  1. Optional Arguments for Privacera installation script:

    Master node

    instance.isMaster=true "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"

    Core node

    instance.isMaster=false "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"
  2. Optional Arguments for Privacera installation script with delta.

    Export the following two additional variables in the Bootstrap actions to enable Delta Lake:

    SPARK_DELTA_LAKE_ENABLE
    SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL

    Master node

    instance.isMaster=true "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fbac"

    Core node

    instance.isMaster=false "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fbac"

    Note

    Ensure the following:

    • The Delta Lake core jar depends on the Spark version, so choose the version that matches your EMR release.

    • Obtain the download URL for that Delta Lake core jar and substitute it for the placeholder in the commands above.

    • Confirm that the Delta Lake and Spark versions are compatible. For more information about Delta Lake releases, see Compatibility with Apache Spark. For example, to download delta-core_2.12 version 1.0.1, go to the following URL: repo1.maven.org/delta/delta-core_2.12-1.0.1.jar.

  For EMR Spark FGAC:

  1. Optional Arguments for Privacera installation script:

    Master node

    instance.isMaster=true "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"

    Core node

    instance.isMaster=false "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
  2. Optional Arguments for Privacera installation script with delta.

    Export the following two additional variables in the Bootstrap actions to enable Delta Lake:

    SPARK_DELTA_LAKE_ENABLE
    SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL

    Master node

    instance.isMaster=true "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

    Core node

    instance.isMaster=false "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

    Note

    Ensure the following:

    • The Delta Lake core jar depends on the Spark version, so choose the version that matches your EMR release.

    • Obtain the download URL for that Delta Lake core jar and substitute it for the placeholder in the commands above.

    • Confirm that the Delta Lake and Spark versions are compatible. For more information about Delta Lake releases, see Compatibility with Apache Spark. For example, to download delta-core_2.12 version 1.0.1, go to the following URL: repo1.maven.org/delta/delta-core_2.12-1.0.1.jar.

  3. Optional Arguments for Privacera installation script with custom Hive repository.

    Master node

    instance.isMaster=true "export EMR_HIVE_SERVICE_NAME=<hive_repo_name>; export EMR_TRINO_HIVE_SERVICE_NAME=<trino_hive_repo_name>; export EMR_SPARK_HIVE_SERVICE_NAME=<spark_hive_repo_name>; wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

    Core node

    instance.isMaster=false "export EMR_HIVE_SERVICE_NAME=<hive_repo_name>; export EMR_TRINO_HIVE_SERVICE_NAME=<trino_hive_repo_name>; export EMR_SPARK_HIVE_SERVICE_NAME=<spark_hive_repo_name>; wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

    Note

    Ensure the following:

    • You can customize <hive_repo_name> for the Hive application in EMR.

    • You can customize <spark_hive_repo_name> for the Spark applications in EMR.

    • You can customize <trino_hive_repo_name> for the Trino application in EMR.
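
The Optional Arguments strings above all follow one pattern, so they can be generated rather than typed by hand. The following is a minimal sketch: the helper name privacera_bootstrap_args and its parameter order are hypothetical, and the output simply reproduces the strings shown in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the Optional Arguments string for one node type.
#   $1 = true (master node) or false (core node)
#   $2 = mode passed to privacera_emr.sh (spark-fbac or spark-fgac)
#   $3 = your emr-script-download-url
#   $4 = optional "export NAME=value;" prefix (Delta Lake or custom Hive
#        repository variables)
privacera_bootstrap_args() {
  is_master="$1"
  mode="$2"
  url="$3"
  exports="${4:-}"
  sudo_cmd="sudo"
  if [ -n "$exports" ]; then
    # sudo -E keeps the exported variables visible to privacera_emr.sh.
    sudo_cmd="sudo -E"
    exports="$exports "
  fi
  printf 'instance.isMaster=%s "%swget %s; chmod +x ./privacera_emr.sh ; %s ./privacera_emr.sh %s"\n' \
    "$is_master" "$exports" "$url" "$sudo_cmd" "$mode"
}

# Print the master and core node arguments for the FGAC mode.
privacera_bootstrap_args true spark-fgac "<emr-script-download-url>"
privacera_bootstrap_args false spark-fgac "<emr-script-download-url>"
```

Generating both strings from the same inputs helps keep the master and core node arguments identical apart from the instance.isMaster value.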

Configure security options in EMR cluster

Follow these steps to configure security options in the EMR cluster:

  1. In Security Options, select security options as per your environment.

  2. Open Security Configuration, and select the configuration you created earlier, for example, "PRIVACERA_KDC".

  3. In the following fields, enter values:

    • Realm

    • KDC admin password

  4. Click Create cluster to finish.
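
After the cluster is created, you may want to confirm from the command line that it has reached a running state before submitting jobs. The following is a minimal sketch using standard AWS CLI calls; the helper name wait_for_cluster and the AWS_CLI override variable are assumptions, and the cluster ID is whatever the console shows for your new cluster.

```shell
#!/bin/sh
# AWS_CLI can be overridden (for example with "echo") to dry-run the commands.
AWS_CLI="${AWS_CLI:-aws}"

# Hypothetical helper: block until the cluster is running, then print its state.
#   $1 = cluster ID shown in the EMR console (for example j-XXXXXXXXXXXXX)
wait_for_cluster() {
  cluster_id="$1"
  "$AWS_CLI" emr wait cluster-running --cluster-id "$cluster_id" &&
  "$AWS_CLI" emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.Status.State' --output text
}

# Usage: wait_for_cluster <cluster-id>
```

If the bootstrap actions fail, the wait command returns an error instead of completing, which is a useful early signal that the Privacera installation script did not run.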