Setup for Access Management for EMR Serverless
Configure
Perform the following steps to configure the EMR Serverless connector:
- SSH into the instance where Privacera Manager is installed.
- Navigate to the /config directory by running the following command:
- Copy the sample variables by running the following command:
- Open the .yml file for editing by running the following command:
- Modify the following properties. You can find the supported versions on the AWS EMR Serverless Versions page.
Note
EMR_SERVERLESS_VERSION is the EMR Serverless Spark Docker image tag, which you can get from this link.
- Once the properties are configured, update your Privacera Manager platform instance by running the following commands:
- Once the post-install process is complete, you will see an emr-serverless folder in the ~/privacera/privacera-manager/output directory, with the following folder structure:
Build custom Docker image (Multi Architecture - AWS Intel/AMD64 and AWS Graviton)
What is a multi-architecture Docker image?
A multi-architecture (or multi-platform) Docker image is an image that supports multiple CPU architectures, such as Intel/AMD64 and ARM. It looks like a single image with a single tag, but it is actually a list of images targeting different architectures, organized by a manifest list.
For an EMR Serverless custom Docker image, this allows you to build a single image that can run on both AWS Intel/AMD64 and AWS Graviton EC2 instances.
Refer to the GKE and Docker sites for more information on multi-architecture Docker images.
Multi-architecture Docker images are not mandatory for Privacera. The other option is to build separate Docker images for each architecture and push them to ECR.
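As a rough illustration of how one tag can serve two architectures, the sketch below maps a machine architecture to the platform entry that would be chosen from the manifest list. Docker performs this selection internally when pulling; the helper name `platform_for_arch` is hypothetical and exists only for this example.

```shell
# Map a machine architecture (as printed by `uname -m`) to the Docker
# platform string of the image variant that matches it.
# platform_for_arch is a hypothetical helper, not part of Docker or Privacera.
platform_for_arch() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;   # AWS Intel/AMD64 hosts
    aarch64|arm64) echo "linux/arm64" ;;   # AWS Graviton hosts
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Show which variant the current host would receive.
platform_for_arch "$(uname -m)" || true
```

A multi-architecture image for EMR Serverless would therefore carry one entry for `linux/amd64` and one for `linux/arm64` under the same tag.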
- To build and push the Docker image, copy the following Docker files and other configuration files to the EC2 instance where you will build the image. You can also build on the same EC2 instance where Privacera Manager is installed.
- Once the required files are on the EC2 instance where you will build the Docker image, run the following commands.
Here are the global variables that you need to set before running the commands:
| Variable Name | Description | Sample Value |
|---|---|---|
| aws_account_id | Your AWS account ID. | "123456789012" |
| region | The AWS region where your ECR repository is located. | "us-east-1" |
| ecr_repo_name | The name of your ECR repository where the Docker image will be pushed. | "privacera/emr-serverless-spark-olac" |
| tag | The tag for the Docker image. | v1.0 |
You can set these environment variables before running the commands, or replace the variables in the commands themselves.
- Ensure that you have the necessary IAM permissions to manage custom Docker images in Amazon Elastic Container Registry (ECR). You can use the IAM policies on this AWS documentation link to grant the necessary permissions.
- Create the ECR repository by running the following command. This is a one-time setup.
- Run the following command to log in to the Amazon Elastic Container Registry (ECR) repository that you just created. Make sure to set the environment variables or replace the variables before running the command.
- Make sure you have buildx support in your Docker CLI by running the following command. If the command fails, enable buildx by following the Docker buildx installation instructions.
- Follow these steps to build a multi-arch Docker image that can run on both AWS Intel/AMD64 and AWS Graviton EC2 instances for EMR Serverless.
First, create a builder instance with the following command. This builder lets you build the Docker image for multiple CPU architectures, not only the architecture of your build host.
Then run the following command to build the EMR Serverless Docker image for both ARM and x86 platforms.
Because it is a multi-architecture image, it cannot be loaded into your local Docker engine and has to be pushed to ECR directly, so this step also pushes the image to the ECR repository. Later, a docker pull from ECR loads the correct platform image based on the architecture of the host.
- To verify that the Docker image was created successfully, run the following commands. Make sure to set the environment variables or replace the variables before running the commands.
Once inside the container, you can inspect the environment to ensure it’s set up correctly. There should be a /opt/privacera folder inside the image. Run exit to leave the container. This also deletes the container, because the --rm flag was used.
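The ECR and buildx steps above can be sketched as one script. This is a minimal sketch, not Privacera's own script: the helper function names, the `builder_name` value, the derived `ecr_registry` and `image_uri` variables, and the `RUN_ALL` switch are assumptions made for illustration. It also assumes the Dockerfile is in the build context directory you pass in and that the image contains `/bin/sh`. Only standard `aws ecr` and `docker buildx` subcommands are used.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, builder_name and RUN_ALL are hypothetical.

# Sample values from the variables table; override them in your environment.
aws_account_id="${aws_account_id:-123456789012}"
region="${region:-us-east-1}"
ecr_repo_name="${ecr_repo_name:-privacera/emr-serverless-spark-olac}"
tag="${tag:-v1.0}"
builder_name="${builder_name:-privacera-multiarch}"

# Derived values: ECR registry host and the full image URI.
ecr_registry="${aws_account_id}.dkr.ecr.${region}.amazonaws.com"
image_uri="${ecr_registry}/${ecr_repo_name}:${tag}"

# One-time setup: create the repository only if describing it fails.
create_ecr_repo() {
  aws ecr describe-repositories --region "${region}" \
      --repository-names "${ecr_repo_name}" >/dev/null 2>&1 \
    || aws ecr create-repository --region "${region}" \
      --repository-name "${ecr_repo_name}"
}

# Authenticate the Docker CLI against the ECR registry.
ecr_login() {
  aws ecr get-login-password --region "${region}" \
    | docker login --username AWS --password-stdin "${ecr_registry}"
}

# Build for both platforms and push straight to ECR.
# $1 is the build context directory (defaults to the current directory).
build_and_push() {
  docker buildx version >/dev/null 2>&1 \
    || { echo "docker buildx is not available" >&2; return 1; }
  # Reuse the builder if it already exists, otherwise create it.
  docker buildx inspect "${builder_name}" >/dev/null 2>&1 \
    || docker buildx create --name "${builder_name}" >/dev/null \
    || return 1
  docker buildx build --builder "${builder_name}" \
    --platform linux/amd64,linux/arm64 \
    --tag "${image_uri}" \
    --push "${1:-.}"
}

# List the platforms in the pushed manifest list, pull the variant for
# this host, and check non-interactively that /opt/privacera exists.
verify_image() {
  docker buildx imagetools inspect "${image_uri}" \
    && docker pull "${image_uri}" \
    && docker run --rm --entrypoint /bin/sh "${image_uri}" \
         -c 'test -d /opt/privacera && echo "/opt/privacera found"'
}

# Nothing runs unless explicitly requested, e.g. RUN_ALL=1 bash build.sh
if [ "${RUN_ALL:-0}" = "1" ]; then
  create_ecr_repo && ecr_login && build_and_push . && verify_image
fi
```

Keeping each step in its own function lets you run them one at a time while following the list above. On a plain Linux build host, building for the non-native architecture typically also requires QEMU emulation to be registered, which is outside this sketch.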
Create Application
With EMR Serverless, you can create one or more applications that use open-source analytics frameworks. To create an application, follow these steps:
Note
Refer to the latest AWS documentation for deploying EMR Serverless applications.
- Application settings: Provide a unique name for the application (e.g., emr_serverless_spark_app). Select the type as Spark, and specify the release version that you configured in the vars.emr-serverless.yml file.
- Custom Image Settings: Select the image that you uploaded to the ECR repository.
- Application Configuration: Add Privacera-specific Spark configuration properties to the spark-defaults section:
JSON configuration:
Privacera-specific Spark configuration properties need to be added to the spark-defaults classification section. The rest of the application configuration is included for completeness, and you can modify it to suit your requirements. The rootLogger.level property can be set to warn, debug, or trace, depending on the log level you want.
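To illustrate only the log-level part of the configuration, here is a minimal sketch that prints a runtime configuration fragment with `rootLogger.level` set from a variable. It assumes the property lives under the `spark-driver-log4j2` and `spark-executor-log4j2` classifications, which is how AWS documents Log4j2 settings for EMR Serverless Spark. The Privacera-specific `spark-defaults` properties are deliberately not reproduced here, and the `log_level` variable and `runtime_config` function are hypothetical names.

```shell
# Log level to apply: warn, debug, or trace (hypothetical variable name).
log_level="${log_level:-warn}"

# Print a JSON fragment that sets the root log level for the Spark driver
# and executors. The classification names are an assumption based on the
# AWS documentation; add the Privacera spark-defaults entry from your own
# generated configuration alongside these.
runtime_config() {
  cat <<EOF
[
  {
    "classification": "spark-driver-log4j2",
    "properties": { "rootLogger.level": "${log_level}" }
  },
  {
    "classification": "spark-executor-log4j2",
    "properties": { "rootLogger.level": "${log_level}" }
  }
]
EOF
}

runtime_config
```

The printed JSON can be merged into the application configuration in the console, or saved to a file and supplied to the AWS CLI when creating the application.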
Next Steps
To submit a job to the EMR Serverless application, refer to Privacera's User Guide for AWS EMR Serverless.