Setup for Access Management for EMR Serverless

Configure

Self Managed

Perform the following steps to configure the EMR Serverless connector:

SSH into the instance where Privacera Manager is installed.

Navigate to the config directory and copy the sample variables file by running the following commands:

Bash
cd ~/privacera/privacera-manager/config
cp -n sample-vars/vars.emr-serverless.yml custom-vars/
vi custom-vars/vars.emr-serverless.yml

Modify the following properties. You can find the supported versions in AWS EMR Serverless Versions.

Tip

EMR_SERVERLESS_VERSION is the EMR Serverless Spark Docker image tag, which you can get from this link. Use only the image:tag format, for example emr-7.0.0:latest.

YAML
EMR_SERVERLESS_ENABLE: "true"

# EMR serverless version e.g. emr-7.2.0:latest
EMR_SERVERLESS_VERSION: "<PLEASE_CHANGE>"

# The unique name of the EMR Serverless application within your AWS account 
# to avoid conflicts with other applications. Example: privacera1-emr-serverless
EMR_SERVERLESS_APP_NAME: "<PLEASE_CHANGE>"
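
Before running the update, it can help to confirm that no placeholder values remain in the variables file. The following is a minimal sketch of such a check; the function name check_placeholders is hypothetical and is not part of Privacera Manager.

```shell
# Hypothetical helper: fail if any <PLEASE_CHANGE> placeholder is still present.
check_placeholders() {
  local vars_file="$1"
  # grep -n prints each offending line together with its line number
  if grep -n '<PLEASE_CHANGE>' "$vars_file"; then
    echo "Placeholders remain in ${vars_file}" >&2
    return 1
  fi
  echo "No placeholders left in ${vars_file}"
}

# Example usage (run from the config directory):
# check_placeholders custom-vars/vars.emr-serverless.yml
```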

Once the properties are configured, update your Privacera Manager platform instance by running the following commands:

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh post-install

Once the post-install process is complete, you will see an emr-serverless folder in the ~/privacera/privacera-manager/output directory, with the following folder structure:

Bash
output/
├── emr-serverless/
│   ├── olac/
│   │   ├── Dockerfile_Privacera_Spark_OLAC
│   │   ├── setup_emrserverless_spark_olac.sh
│   │   ├── spark_custom_conf
│   │   │   ├── global-truststore.p12
│   │   │   ├── privacera_spark.properties
│   │   ├── spark_custom_conf.zip
│   ├── olac_fgac/
│   │   ├── Dockerfile_Privacera_Spark_OLAC_FGAC
│   │   ├── setup_emrserverless_spark_olac_fgac.sh
│   │   ├── spark_custom_conf
│   │   │   ├── auditserver-secrets-keystore.jks
│   │   │   ├── global-truststore.p12
│   │   │   ├── jwttoken.pub
│   │   │   ├── privacera_spark.properties
│   │   │   ├── ranger-plugin-keystore.p12
│   │   │   ├── ranger.jceks
│   │   ├── spark_custom_conf.zip
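
To confirm that post-install produced the files needed for the image build, a small check along these lines may be useful. This is a sketch only; verify_files is a hypothetical helper name, and the file list is taken from the folder structure above.

```shell
# Hypothetical helper: report any expected file that is missing from a directory.
verify_files() {
  local dir="$1"; shift
  local missing=0 f
  for f in "$@"; do
    if [ ! -e "${dir}/${f}" ]; then
      echo "missing: ${dir}/${f}" >&2
      missing=1
    fi
  done
  # Returns 0 when every file exists, 1 otherwise
  return "$missing"
}

# Example usage for the OLAC output:
# verify_files ~/privacera/privacera-manager/output/emr-serverless/olac \
#     Dockerfile_Privacera_Spark_OLAC \
#     setup_emrserverless_spark_olac.sh \
#     spark_custom_conf.zip
```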

Build custom Docker image (Multi Architecture - AWS Intel/AMD64 and AWS Graviton)

This section is a reference for building a custom Docker image for EMR Serverless. Refer to the latest AWS documentation for customizing Docker images for EMR Serverless.

What is multi-architecture Docker image?

A multi-architecture or multi-platform Docker image is an image that supports multiple CPU architectures, such as Intel/AMD64 and ARM. It looks like a single image with a single tag, but it is a list of images targeting multiple architectures, organized by a manifest list.

For an EMR Serverless custom Docker image, this allows you to build a single Docker image that can run on both AWS Intel/AMD64 and AWS Graviton EC2 instances.

You can refer to GKE Site and Docker Site for more information on multi-architecture Docker images.

Multi-architecture Docker images are not mandatory for Privacera. The other option is to build separate Docker images for each architecture and push them to ECR.

Self Managed

To build and push the Docker image, copy the following Dockerfile and configuration files to the EC2 instance where you will build the Docker image. You can also build on the same EC2 instance where Privacera Manager is installed.
Enable OLAC Plugin in EMR Serverless
- Navigate to the output/emr-serverless/olac directory by running the following command:
  Bash
  cd ~/privacera/privacera-manager/output/emr-serverless/olac
- Copy the following required files to the EC2 instance where the Docker image will be built:
  Bash
  Dockerfile_Privacera_Spark_OLAC
  setup_emrserverless_spark_olac.sh
  spark_custom_conf.zip
Enable OLAC_FGAC Plugin in EMR Serverless
- Navigate to the output/emr-serverless/olac_fgac directory by running the following command:
  Bash
  cd ~/privacera/privacera-manager/output/emr-serverless/olac_fgac
- Copy the following required files to the EC2 instance where the Docker image will be built:
  Bash
  Dockerfile_Privacera_Spark_OLAC_FGAC
  setup_emrserverless_spark_olac_fgac.sh
  spark_custom_conf.zip
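
If the build host is a different machine, one way to move the required files is to bundle them into a single archive first. The sketch below assumes tar is available; bundle_build_files, the archive path, and the host placeholder in the usage comment are hypothetical.

```shell
# Hypothetical helper: pack the listed files from a source directory into one archive.
bundle_build_files() {
  local src_dir="$1" archive="$2"; shift 2
  # -C switches into the source directory so the archive holds bare file names
  tar -czf "$archive" -C "$src_dir" "$@"
}

# Example usage for the OLAC files, followed by a copy to the build host:
# bundle_build_files ~/privacera/privacera-manager/output/emr-serverless/olac \
#     /tmp/emr-olac-build.tar.gz \
#     Dockerfile_Privacera_Spark_OLAC setup_emrserverless_spark_olac.sh spark_custom_conf.zip
# scp /tmp/emr-olac-build.tar.gz <user>@<build-host>:~/
```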

Once the required files are on the EC2 instance where you will build the Docker image, run the commands in the rest of this section.

The commands use the following variables:

| Variable Name | Description | Sample Value |
| --- | --- | --- |
| `aws_account_id` | Your AWS account ID. | `"123456789012"` |
| `region` | The AWS region where your ECR repository is located. | `"us-east-1"` |
| `ecr_repo_name` | The name of your ECR repository where the Docker image will be pushed. | `"privacera/emr-serverless-spark-olac"` |
| `tag` | The tag for the Docker image. | `"v1.0"` |

You can set the following environment variables before running the commands, or replace the values in the commands themselves.

Bash
aws_account_id="<PLEASE_CHANGE>"
region="<PLEASE_CHANGE>"
ecr_repo_name="<PLEASE_CHANGE>"
tag="<PLEASE_CHANGE>"
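
The four variables combine into the image URI that the later commands repeat. As a rough sketch, the helpers below compose that URI and sanity-check the account ID (AWS account IDs are 12 digits). The function names ecr_image_uri and valid_account_id are hypothetical.

```shell
# Compose the full ECR image URI from the four variables set above.
ecr_image_uri() {
  echo "${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}"
}

# Hypothetical sanity check: an AWS account ID consists of exactly 12 digits.
valid_account_id() {
  case "$1" in
    *[!0-9]*|"") return 1 ;;   # reject empty values and any non-digit character
  esac
  [ "${#1}" -eq 12 ]
}

# Example usage:
# valid_account_id "${aws_account_id}" && ecr_image_uri
```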

Ensure that you have the necessary IAM permissions to manage custom Docker images in your Amazon Elastic Container Registry (ECR). You can use the IAM policies on this AWS documentation link to grant the necessary permissions.

Create the ECR repository by running the following command. This is a one-time setup.

Bash
aws ecr create-repository \
    --repository-name ${ecr_repo_name} \
    --region ${region}
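
Because repository creation is a one-time step, rerunning it fails once the repository exists. A possible way to make the step safe to repeat is sketched below; ensure_ecr_repo is a hypothetical wrapper around the aws ecr describe-repositories and aws ecr create-repository commands. Note that the describe call can also fail for other reasons, such as missing permissions, in which case the create call will surface the error.

```shell
# Hypothetical wrapper: create the ECR repository only if it does not exist yet.
ensure_ecr_repo() {
  local repo="$1" reg="$2"
  if aws ecr describe-repositories \
       --repository-names "$repo" --region "$reg" >/dev/null 2>&1; then
    echo "Repository ${repo} already exists"
  else
    aws ecr create-repository --repository-name "$repo" --region "$reg"
  fi
}

# Example usage:
# ensure_ecr_repo "${ecr_repo_name}" "${region}"
```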

Run the following command to log in to the Amazon Elastic Container Registry (ECR) repository that you just created. Make sure to set the environment variables or replace the variables before running the command.

Bash
# Login to ECR repo
aws ecr get-login-password --region ${region} | \
    docker login --username AWS \
    --password-stdin ${aws_account_id}.dkr.ecr.${region}.amazonaws.com

Make sure your Docker CLI supports buildx by running the following command:
Bash
docker buildx --help
If the above command fails, install Docker buildx by following its installation instructions.

Follow these steps to build a multi-arch Docker image that can be used to run on AWS Intel/AMD EC2 instances and AWS Graviton EC2 instances for EMR Serverless.

First, create a builder instance with the following command. A builder that uses the docker-container driver can build images for multiple platforms, not only for the CPU architecture of your build host.

Bash
# Add the --use option to make this builder the default builder
docker buildx create \
    --name multi-arch-builder \
    --driver=docker-container
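
Running docker buildx create twice with the same name fails, so a guarded version can be handy on a shared build host. The following is a sketch under that assumption; ensure_builder is a hypothetical helper name.

```shell
# Hypothetical wrapper: create the named buildx builder only if it is missing.
ensure_builder() {
  local name="$1"
  if docker buildx inspect "$name" >/dev/null 2>&1; then
    echo "Builder ${name} already exists"
  else
    docker buildx create --name "$name" --driver=docker-container
  fi
}

# Example usage, followed by a bootstrap that also lists the supported platforms:
# ensure_builder multi-arch-builder
# docker buildx inspect --bootstrap multi-arch-builder
```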

Then run the following command to build the EMR Serverless Docker image for both the ARM and x86 platforms.

Because this is a multi-architecture image, it cannot be loaded into your local Docker engine and must be pushed to ECR directly, so this step both builds the image and pushes it to the ECR repository. Later you can run docker pull against ECR, which pulls the correct platform image for the architecture of the host.

Bash
docker buildx build \
    --file ./Dockerfile_Privacera_Spark_OLAC \
    --tag ${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag} \
    --platform linux/arm64/v8,linux/amd64 \
    --builder multi-arch-builder \
    --push \
    .
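
The command above uses the OLAC Dockerfile. For the OLAC_FGAC plugin, the corresponding Dockerfile from the olac_fgac output directory would presumably be passed to --file instead. The sketch below maps a plugin type to its Dockerfile name as listed earlier; dockerfile_for_plugin is a hypothetical helper.

```shell
# Hypothetical helper: map a plugin type to the Dockerfile generated for it.
dockerfile_for_plugin() {
  case "$1" in
    olac)      echo "Dockerfile_Privacera_Spark_OLAC" ;;
    olac_fgac) echo "Dockerfile_Privacera_Spark_OLAC_FGAC" ;;
    *)         echo "unknown plugin type: $1" >&2; return 1 ;;
  esac
}

# Example usage: build the OLAC_FGAC image from the directory holding its files.
# docker buildx build \
#     --file "./$(dockerfile_for_plugin olac_fgac)" \
#     --tag "${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}" \
#     --platform linux/arm64/v8,linux/amd64 \
#     --builder multi-arch-builder \
#     --push \
#     .
```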

To verify that the Docker image was created successfully, run the following commands. Make sure to set the environment variables or replace the variables before running the command.

Bash
# this command will show the manifest which should have both the ARM 
# and x86 platforms
docker manifest inspect \
    ${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}
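
To reduce the manifest output to just the architectures it contains, a filter such as the following may help. It is a sketch that relies only on grep, sed, and sort; manifest_architectures is a hypothetical helper name. Depending on your buildx version, the manifest may also contain attestation entries whose architecture is reported as unknown.

```shell
# Hypothetical helper: list the distinct architectures found in an image manifest.
manifest_architectures() {
  docker manifest inspect "$1" \
    | grep -o '"architecture": *"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u
}

# Example usage:
# manifest_architectures \
#     "${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}"
```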

Bash
# this will pull the correct platform image based on the architecture of 
# the host
docker pull \
    ${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}

Bash
# This command will open the bash shell in the docker container.
docker run -it --rm --entrypoint /bin/bash \
    ${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}

Once inside the container, you can inspect the environment to ensure it’s set up correctly. There should be an /opt/privacera folder inside the image. Run exit to leave the shell. This also deletes the container, since the --rm flag was used.
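
The same check can be run without an interactive shell, which is convenient in scripts. The sketch below overrides the entrypoint as the command above does and tests for the /opt/privacera folder; image_has_privacera_dir is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if /opt/privacera exists inside the image.
image_has_privacera_dir() {
  # Arguments after the image name are passed to the /bin/bash entrypoint
  docker run --rm --entrypoint /bin/bash "$1" -c 'test -d /opt/privacera'
}

# Example usage:
# image_has_privacera_dir \
#     "${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}" \
#     && echo "Privacera folder found"
```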

Create Application

With EMR Serverless, you can create one or more applications that use open-source analytics frameworks. To create an application, follow these steps:

Note

Refer to the latest AWS documentation for deploying EMR Serverless applications.

Application settings: Provide a unique name for the application (e.g., emr_serverless_spark_app). Select Spark as the type, and specify the release version that you configured in the vars.emr-serverless.yml file.
Custom Image Settings: Select the image that you uploaded to the ECR repository.
Application Configuration: Add the Privacera-specific Spark configuration properties to the spark-defaults section:

JSON configuration:

Privacera-specific Spark configuration properties need to be added to the spark-defaults classification section as shown below. The rest of the application configuration is shown for completeness, and you can modify it as required. The rootLogger.level property can be set to warn, debug, or trace, depending on the log level you want.

JSON
{
  "runtimeConfiguration": [
    {
      "classification": "spark-driver-log4j2",
      "configurations": [],
      "properties": {
        "rootLogger.level": "<warn | debug | trace>"
      }
    },
    {
      "classification": "spark-executor-log4j2",
      "configurations": [],
      "properties": {
        "rootLogger.level": "<warn | debug | trace>"
      }
    },
    {
      "classification": "spark-defaults",
      "configurations": [],
      "properties": {
        "spark.executor.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-agent.jar",
        "spark.driver.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-agent.jar",
        "spark.sql.hive.metastore.sharedPrefixes": "com.amazonaws.services.dynamodbv2,com.privacera,com.amazonaws",
        "spark.hadoop.fs.s3a.access.key": "P_ACCESS_KEY",
        "spark.hadoop.fs.s3a.secret.key": "P_SECRET_KEY",
        "spark.hadoop.fs.s3a.session.token": "P_SESSION_TOKEN",
        "spark.hadoop.fs.s3a.s3.signing-algorithm": "PrivaceraAwsSdkV2Signer",
        "spark.hadoop.fs.s3a.custom.signers": "PrivaceraAwsSdkV2Signer:com.privacera.spark.agent.signer.PrivaceraAwsSdkV2Signer"
      }
    }
  ]
}
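
If you prefer the AWS CLI to the console, the application settings above could be scripted roughly as follows. This is a sketch, not a verified procedure: it assumes the release version is the EMR_SERVERLESS_VERSION value without its image tag (for example emr-7.2.0 from emr-7.2.0:latest), release_label_from_version is a hypothetical helper, and the create-application options in the usage comment should be checked against the current AWS CLI reference before use.

```shell
# Hypothetical helper: strip the ":tag" suffix from an image:tag version string.
release_label_from_version() {
  echo "${1%%:*}"
}

# Example usage (assumed option names; the runtime configuration shown above
# still needs to be supplied to the application separately):
# aws emr-serverless create-application \
#     --name emr_serverless_spark_app \
#     --type SPARK \
#     --release-label "$(release_label_from_version 'emr-7.2.0:latest')" \
#     --image-configuration "imageUri=${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}" \
#     --region "${region}"
```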
