Privacera Documentation

Connect an Amazon Elastic MapReduce (EMR) application to PrivaceraCloud

This topic describes how to connect an Amazon Elastic MapReduce (EMR) application to PrivaceraCloud for access control.

Connect EMR application

  1. Go to Settings > Applications.

  2. In the Applications screen, select EMR.

  3. Enter the application Name and Description, and then click Save.

  4. Click the toggle button to enable Access Management for your application.

EMR Spark access control types

EMR Spark supports two types of access control: Fine-Grained Access Control (FGAC) and Object Level Access Control (OLAC). EMR FGAC and OLAC are mutually exclusive.

Kerberos required for EMR FGAC or OLAC

Note

To support Privacera FGAC or OLAC, the EMR application must be configured with Kerberos.

EMR Spark OLAC

The advantages of EMR Spark OLAC:

  • It allows you to access your existing AWS S3 resource locations from Spark.

  • It uses the privacera_s3 service for resource-based access control and the privacera_tag service for tag-based access control.

  • It uses the signed-authorization implementation from Privacera.

EMR Spark FGAC

When FGAC is installed and enabled, each data user query is parsed by Spark and authorized by the PrivaceraCloud Spark Plug-In. All resources referred to by the query must be accessible to the requesting user.

PrivaceraCloud configuration

Obtain shared key

Note

This is only needed for OLAC setup.

Obtain or determine a character string to serve as a "shared key" between PrivaceraCloud and the AWS EMR cluster. We'll refer to this as {SHARED_KEY} in the configuration steps below.

  1. Connect an S3 application to PrivaceraCloud. For more information, see Connect S3 to PrivaceraCloud.

  2. Select the connected S3 application, and then click the edit (pen) icon.

  3. On the ADVANCED tab, add the following property, substituting your actual value for {SHARED_KEY}:

    dataserver.shared.secret={SHARED_KEY}
  4. Click Save.
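Any sufficiently random string can serve as the shared key. As a sketch (assuming openssl is available on your workstation), it could be generated like this:

```shell
# Generate a random 32-character hex string to use as {SHARED_KEY}.
SHARED_KEY=$(openssl rand -hex 16)

# The property to add on the S3 application's ADVANCED tab:
echo "dataserver.shared.secret=${SHARED_KEY}"
```

Keep the value; it is the {SHARED_KEY} referenced in the steps above and must match on the EMR cluster side.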

Obtain EMR script download URL

Obtain your account's unique <emr-script-download-url>, which allows the EMR cluster to download additional scripts and setup from PrivaceraCloud:

  1. In the PrivaceraCloud portal, go to Settings > API Keys.

  2. Use an existing active API key or create a new one. Set Expiry = Never Expires.

  3. Click the API Key Info (i) button.

  4. On the API Key Info page, click the COPY URL button next to AWS EMR Setup Script to save the <emr-script-download-url>.

AWS IAM roles using CloudFormation setup

  1. The following two IAM roles need to be created before launching the cluster. They can be created with minimal permissions using the IAM roles creation template below.

    • Node role: EmrPrivaceraNodeRole

    • App data access role: EmrPrivaceraDataAccessRole

    If required, you can modify the template based on your requirements.

  2. To create the roles, use the following AWS CLI CloudFormation command:

    Note

    This must be executed from an EC2 instance that has permission to create a CloudFormation stack.

    aws --region <AWS-REGION> cloudformation create-stack \
      --stack-name privacera-emr-role-creation \
      --template-body file://emr-roles-creation-template.json \
      --capabilities CAPABILITY_NAMED_IAM

    For more information about how to create a stack using a CloudFormation template, see Create CloudFormation stack.

 {
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description":"Create roles and policies for use by Privacera-Protected EMR Clusters",
  "Resources":{
    "EmrRestrictedRole":{
      "Type":"AWS::IAM::Role",
      "Properties":{
        "RoleName":{
          "Fn::Join":[
            "",
            [
              "EmrPrivaceraNodeRole"
            ]
          ]
        },
        "AssumeRolePolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Effect":"Allow",
              "Principal":{
                "Service":[
                  "ec2.amazonaws.com"
                ]
              },
              "Action":[
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "Path":"/"
      }
    },
    "EmrRestrictedPolicy":{
      "Type":"AWS::IAM::ManagedPolicy",
      "Properties":{
        "ManagedPolicyName":{
          "Fn::Join":[
            "",
            [
              "EMRPrivaceraNodePolicy"
            ]
          ]
        },
        "PolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Sid":"EmrServiceLimited",
              "Effect":"Allow",
              "Action":[
                "glue:CreateDatabase",
                "glue:UpdateDatabase",
                "glue:DeleteDatabase",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetTableVersions",
                "glue:CreatePartition",
                "glue:BatchCreatePartition",
                "glue:UpdatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition",
                "glue:CreateUserDefinedFunction",
                "glue:UpdateUserDefinedFunction",
                "glue:DeleteUserDefinedFunction",
                "glue:GetUserDefinedFunction",
                "glue:GetUserDefinedFunctions",
                "ec2:Describe*",
                "elasticmapreduce:Describe*",
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:ListInstanceGroups",
                "elasticmapreduce:ListInstances",
                "elasticmapreduce:ListSteps"
              ],
              "Resource":"*"
            },
            {
              "Sid":"EmrS3Limited",
              "Effect":"Allow",
              "Action":"s3:*",
              "Resource":[
                "arn:aws:s3:::*.elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce",
                "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_LOG_S3_PATH>",
                "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_HIVE_WAREHOUSE_S3_PATH>"
              ]
            },
            {
              "Sid":"EmrAssumeIAM",
              "Effect":"Allow",
              "Action":"sts:AssumeRole",
              "Resource":[
                "arn:aws:iam::<PLEASE_UPDATE AWS_ACCCOUNT_ID>:role/EmrPrivaceraDataAccessRole"
              ]
            }
          ]
        },
        "Roles":[
          {
            "Ref":"EmrRestrictedRole"
          }
        ]
      }
    },
    "EmrRoleForApps":{
      "Type":"AWS::IAM::Role",
      "Properties":{
        "RoleName":{
          "Fn::Join":[
            "",
            [
              "EmrPrivaceraDataAccessRole"
            ]
          ]
        },
        "AssumeRolePolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Effect":"Allow",
              "Principal":{
                "Service":[
                  "ec2.amazonaws.com"
                ]
              },
              "Action":[
                "sts:AssumeRole"
              ]
            },
            {
              "Effect":"Allow",
              "Principal":{
                "AWS":{
                  "Fn::GetAtt":[
                    "EmrRestrictedRole",
                    "Arn"
                  ]
                }
              },
              "Action":"sts:AssumeRole"
            }
          ]
        },
        "Path":"/"
      }
    },
    "DataAccessPolicy":{
      "Type":"AWS::IAM::ManagedPolicy",
      "Properties":{
        "ManagedPolicyName":{
          "Fn::Join":[
            "",
            [
              "EmrPrivaceraDataAcessPolicy"
            ]
          ]
        },
        "PolicyDocument":{
          "Version":"2012-10-17",
          "Statement":[
            {
              "Sid":"S3DataAccess",
              "Effect":"Allow",
              "Action":[
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:DeleteBucket",
                "s3:ListBucketMultipartUploads",
                "s3:GetBucketAcl",
                "s3:GetBucketPolicy",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:PutObjectAcl"
              ],
              "Resource":[
                "arn:aws:s3:::*.elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce/*",
                "arn:aws:s3:::elasticmapreduce",
                "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_LOG_S3_PATH>",
                "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_HIVE_WAREHOUSE_S3_PATH>"
              ]
            }
          ]
        },
        "Roles":[
          {
            "Ref":"EmrRoleForApps"
          }
        ]
      }
    },
    "EmrRestrictedRoleProfile":{
      "Type":"AWS::IAM::InstanceProfile",
      "Properties":{
        "InstanceProfileName":{
          "Ref":"EmrRestrictedRole"
        },
        "Roles":[
          {
            "Ref":"EmrRestrictedRole"
          }
        ]
      }
    },
    "EmrRoleForAppsProfile":{
      "Type":"AWS::IAM::InstanceProfile",
      "Properties":{
        "InstanceProfileName":{
          "Ref":"EmrRoleForApps"
        },
        "Roles":[
          {
            "Ref":"EmrRoleForApps"
          }
        ]
      }
    }
  },
  "Outputs":{
    "EMRRestrictedRole":{
      "Value":{
        "Ref":"EmrRestrictedRole"
      }
    },
    "EmrRoleForApps":{
      "Value":{
        "Ref":"EmrRoleForApps"
      }
    }
  }
}
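Before running the create-stack command, the <PLEASE_UPDATE ...> placeholders in the template need real values. A sketch, demonstrated on a short excerpt (the bucket paths are examples; run the same substitution over the full emr-roles-creation-template.json):

```shell
# Sketch: fill the <PLEASE_UPDATE ...> placeholders with real values.
# Demonstrated on an excerpt; run the same sed over the full
# emr-roles-creation-template.json before creating the stack.
cat > excerpt.json <<'EOF'
{
  "Resource": [
    "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_LOG_S3_PATH>",
    "arn:aws:s3:::<PLEASE_UPDATE EMR_CLUSTER_HIVE_WAREHOUSE_S3_PATH>"
  ]
}
EOF
sed -e 's|<PLEASE_UPDATE EMR_CLUSTER_LOG_S3_PATH>|my-emr-logs/*|' \
    -e 's|<PLEASE_UPDATE EMR_CLUSTER_HIVE_WAREHOUSE_S3_PATH>|my-warehouse/*|' \
    excerpt.json | python3 -m json.tool
```

Piping through python3 -m json.tool is a cheap local sanity check that the edited template is still valid JSON.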

Create a security configuration

You can create a security configuration using:

  • CloudFormation setup (Recommended)

  • AWS EMR console (Manually)

Manually create a security configuration using AWS EMR console

To create a security configuration using the AWS EMR console:

  1. Log into the AWS EMR console.

  2. In the left navigation, select Security Configuration > Create New Security Configuration.

  3. Enter a Name for Security Configuration. For example, emr_sec_config.

  4. Navigate to the Authentication section, check Enable Kerberos authentication, and enter the Kerberos environment details as follows:

    • Provider: Cluster dedicated KDC

    • TicketLifetime: 24 hours

    • Cross-realm trust

      • Realm: EXAMPLE.COM

        Replace EXAMPLE.COM with your actual Kerberos realm.

      • Domain: example.com

      • Admin server: server.example.com

      • KDC server: server.example.com

  5. Select Use IAM roles for EMRFS requests to Amazon S3.

    • IAM Role: select the App data access role created in AWS IAM roles using CloudFormation setup.

    • Under Basis for access, select an identifier type (User) from the list and enter the corresponding identifiers (hadoop;hive;presto;trino).
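If you prefer the CLI, the same security configuration can be created with aws emr create-security-configuration. The JSON below is a sketch against the EMR security-configuration schema; the realm, server names, and account ID are placeholders to replace with your own values:

```shell
# Sketch of the security configuration from the steps above
# (placeholder realm, servers, and account ID; replace with your own).
cat > emr-sec-config.json <<'EOF'
{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24,
        "CrossRealmTrustConfiguration": {
          "Realm": "EXAMPLE.COM",
          "Domain": "example.com",
          "AdminServer": "server.example.com",
          "KdcServer": "server.example.com"
        }
      }
    }
  },
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::123456789012:role/EmrPrivaceraDataAccessRole",
          "IdentifierType": "User",
          "Identifiers": ["hadoop", "hive", "presto", "trino"]
        }
      ]
    }
  }
}
EOF

# Validate locally, then register it with EMR under the name you will
# select when launching the cluster (requires a configured AWS CLI):
python3 -m json.tool emr-sec-config.json > /dev/null
# aws emr create-security-configuration \
#   --name emr_sec_config \
#   --security-configuration file://emr-sec-config.json
```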

Create EMR cluster

Prerequisites

To enable access control for Trino or PrestoSQL in EMR, the corresponding application needs to be enabled before configuring EMR.

  1. Go to Settings > Applications.

  2. In the Applications screen, select Trino or Presto as per your EMR version.

    Note

    If the EMR version is 6.4.0 or above, then select Trino.

    If the EMR version is 6.x to 6.3.1, then select Presto.

  3. Enter the application Name and Description, and then click Save.

  4. Click the toggle button to enable Access Management for your application.

Kerberos required for EMR FGAC or OLAC

Note

To support Privacera FGAC or OLAC, the EMR application must be configured with Kerberos.

You can create an EMR cluster using:

  • CloudFormation setup (Recommended)

    • Using CloudFormation EMR templates

    • Using CloudFormation with the AWS CLI

    • Using CloudFormation with the AWS Console

  • AWS EMR console (Manually)

Manually create EMR cluster using AWS EMR console

Follow these steps to manually create an AWS EMR cluster using the AWS EMR console:

  1. Log in to the AWS Management Console and navigate to the EMR Console.

  2. Click the Create cluster button.

  3. Click Go to advanced options, and select a Release. For example, emr-6.4.0.

Configure applications for AWS EMR cluster

Follow these steps to configure individual applications for the AWS EMR cluster:

  1. Select additional applications as per your environment, such as Spark, Hadoop, Hive, Trino or Presto.

  2. In Edit software settings, select Enter configuration and add the following configuration array for the selected applications.

{
  "Classification": "spark",
  "ConfigurationProperties": {
    "maximizeResourceAllocation": "true"
  },
  "Configurations": []
},
{
  "Classification": "spark-hive-site",
  "ConfigurationProperties": {
    "hive.metastore.warehouse.dir": "s3://<bucket-name>/<path>"
  }
},
{
  "Classification": "hive-site",
  "ConfigurationProperties": {
    "javax.jdo.option.ConnectionURL": "",
    "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "welcome1",
    "hive.server2.enable.doAs": "false",
    "parquet.column.index.access": "true",
    "fs.s3a.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
    "hive.metastore.warehouse.dir": "s3://<bucket-name>/<path>"
  }
}
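The s3://<bucket-name>/<path> placeholder above must be replaced with your actual warehouse location before the configuration is pasted in. A small sketch (the file name and bucket/path values are examples):

```shell
# Sketch: substitute your warehouse location for the placeholder
# (hive-site-config.json and the bucket/path values are examples).
BUCKET="my-emr-bucket"
WAREHOUSE="hive/warehouse"
cat > hive-site-config.json <<'EOF'
{
  "Classification": "hive-site",
  "ConfigurationProperties": {
    "hive.metastore.warehouse.dir": "s3://<bucket-name>/<path>"
  }
}
EOF
sed "s|s3://<bucket-name>/<path>|s3://${BUCKET}/${WAREHOUSE}|" \
    hive-site-config.json > hive-site-filled.json
grep 'warehouse.dir' hive-site-filled.json
```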

Note

If the EMR version is 6.4.0 or above, use trino in place of <application> in the array below. For older versions, use prestosql.

Replace <application> based on the component you selected. For example:

  • For Trino, use trino-connector-hive

  • For PrestoSQL, use prestosql-connector-hive

Presto and Trino/PrestoSQL are mutually exclusive; use only one at a time.

  • Glue configuration:

    {
      "Classification": "<application>-connector-hive",
      "ConfigurationProperties": {
        "hive.metastore": "glue",
        "hive.allow-drop-table": "true",
        "hive.allow-add-column": "true",
        "hive.allow-rename-column": "true",
        "connector.name": "hive-hadoop2",
        "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
        "hive.s3-file-system-type": "EMRFS",
        "hive.hdfs.impersonation.enabled": "false",
        "hive.allow-drop-column": "true",
        "hive.allow-rename-table": "true"
      }
    }
  • EHM configuration:

    {
      "Classification": "<application>-connector-hive",
      "ConfigurationProperties": {
        "hive.allow-drop-table": "true",
        "hive.allow-add-column": "true",
        "hive.allow-rename-column": "true",
        "connector.name": "hive-hadoop2",
        "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
        "hive.s3-file-system-type": "EMRFS",
        "hive.hdfs.impersonation.enabled": "false",
        "hive.allow-drop-column": "true",
        "hive.allow-rename-table": "true"
      }
    }
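The EMR-version-to-classification mapping described in the note above can be sketched in shell (assuming GNU sort -V is available):

```shell
# Choose trino for EMR >= 6.4.0, prestosql for older releases.
EMR_VERSION="6.4.0"
if [ "$(printf '%s\n' "6.4.0" "$EMR_VERSION" | sort -V | head -n1)" = "6.4.0" ]; then
  APPLICATION="trino"
else
  APPLICATION="prestosql"
fi
echo "${APPLICATION}-connector-hive"   # prints trino-connector-hive for 6.4.0
```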

Bootstrap actions

  1. In Hardware settings, select Networking, Node, and Instance values as appropriate for your environment.

  2. Go to General cluster settings and configure the cluster name, logging, debugging, and termination protection as needed for your environment.

    You will also add two bootstrap scripts that install the Privacera Signing Agent on both the master and worker nodes.

  3. In Additional Options, expand Bootstrap Actions, select bootstrap action Run if, and then click Configure and add.

  4. In the Bootstrap actions dialog, set the name of the master and core node to Privacera Signing Agent.

  5. Copy the appropriate script below into Optional Arguments, using your own <emr-script-download-url>. See Obtain EMR script download URL.

  6. Click Add when finished.

For OLAC (spark-fbac):

  1. Optional Arguments for the Privacera installation script:

    Master node

    instance.isMaster=true "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"

    Core node

    instance.isMaster=false "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fbac"
  2. Optional Arguments for the Privacera installation script with Delta Lake:

    Export the following two additional variables in the Bootstrap actions to enable Delta Lake:

    SPARK_DELTA_LAKE_ENABLE
    SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL

    Master node

    instance.isMaster=true "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fbac"

    Core node

    instance.isMaster=false "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fbac"

    Note

    Ensure the following:

    • The Delta Lake core jar depends on the Spark version; choose the version that matches your EMR release.

    • Obtain the appropriate download URL for the Delta Lake core jar and update the bootstrap arguments with it.

    • Verify Delta Lake and Spark version compatibility. For more information about Delta Lake releases, see Compatibility with Apache Spark.

For FGAC (spark-fgac):

  1. Optional Arguments for the Privacera installation script:

    Master node

    instance.isMaster=true "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"

    Core node

    instance.isMaster=false "wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo ./privacera_emr.sh spark-fgac"
  2. Optional Arguments for the Privacera installation script with Delta Lake:

    Export the following two additional variables in the Bootstrap actions to enable Delta Lake:

    SPARK_DELTA_LAKE_ENABLE
    SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL

    Master node

    instance.isMaster=true "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

    Core node

    instance.isMaster=false "export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake; export SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL=<DELTA_LAKE_CORE_JAR_DOWNLOAD_URL>; wget <emr-script-download-url>; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

    Note

    Ensure the following:

    • The Delta Lake core jar depends on the Spark version; choose the version that matches your EMR release.

    • Obtain the appropriate download URL for the Delta Lake core jar and update the bootstrap arguments with it.

    • Verify Delta Lake and Spark version compatibility. For more information about Delta Lake releases, see Compatibility with Apache Spark.

  3. Create a custom policy repository

    Before adding optional arguments for the Privacera installation script with a custom Hive repository, complete the following steps:

    1. Log in to the PrivaceraCloud portal.

    2. On the PrivaceraCloud home page, expand Access Management, and then click Resource Policies.

    3. Click the three-dot menu on the service for which you want to create a custom policy repository.

    4. Click Add Service.

    5. Add values in the given fields, and then click Save.

      Service Name: the name of your service. For example, <service_name>_dev_policy, where the service name is hive, s3, files, and so on.

      Select Tag Service: select privacera_tag to apply tag-based policies to this custom repository.

      Username: the service user, for example, hive.

      Password: dummy

      jdbc.driverClassName: dummy or org.apache.hive.jdbc.HiveDriver

      jdbc.url: dummy

  4. Optional Arguments for the Privacera installation script with a custom Hive repository:

    Master node

    instance.isMaster=true "export EMR_HIVE_SERVICE_NAME=<hive_repo_name>; export EMR_TRINO_HIVE_SERVICE_NAME=<trino_hive_repo_name>; export EMR_SPARK_HIVE_SERVICE_NAME=<spark_hive_repo_name>; wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

    Core node

    instance.isMaster=false "export EMR_HIVE_SERVICE_NAME=<hive_repo_name>; export EMR_TRINO_HIVE_SERVICE_NAME=<trino_hive_repo_name>; export EMR_SPARK_HIVE_SERVICE_NAME=<spark_hive_repo_name>; wget <emr-script-download-url> ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-fgac"

    Note

    Ensure the following:

    • You can customize <hive_repo_name> for the Hive application in EMR.

    • You can customize <spark_hive_repo_name> for the Spark application in EMR.

    • You can customize <trino_hive_repo_name> for the Trino application in EMR.

Configure security options in EMR cluster

Follow these steps to configure security options in EMR cluster:

  1. In Security Options, select security options as per your environment.

  2. Open Security Configuration and select the configuration you created earlier, for example, emr_sec_config.

  3. In the following fields, enter values:

    • Realm

    • KDC admin password

  4. Click Create cluster to complete.