
Setup for Access Management for EMR Serverless

Configure

Perform the following steps to configure the EMR Serverless connector:

  1. SSH into the instance where Privacera Manager is installed.

  2. Navigate to the /config directory by running the following command:

    Bash
    cd ~/privacera/privacera-manager/config
    

  3. Copy the sample variables by running the following command:

    Bash
    cp sample-vars/vars.emr-serverless.yml custom-vars/
    

  4. Open the .yml file for editing by running the following command:

    Bash
    vi custom-vars/vars.emr-serverless.yml
    

  5. Modify the following properties. For the supported release versions, see AWS EMR Serverless Versions.

    Bash
    EMR_SERVERLESS_ENABLE: "true"
    
    # EMR serverless version e.g. emr-7.2.0:latest
    EMR_SERVERLESS_VERSION: "<PLEASE_CHANGE>"
    
    # The unique name of the EMR Serverless application within your AWS account 
    # to avoid conflicts with other applications.
    EMR_SERVERLESS_APP_NAME: "<PLEASE_CHANGE>"
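
Before moving on, it can help to confirm that no placeholder values remain in the file. The following is a minimal sketch, assuming the file path from step 4; the function name check_placeholders is hypothetical and not part of Privacera Manager.

```shell
# Hypothetical helper: fail if any <PLEASE_CHANGE> placeholder is still present.
check_placeholders() {
    local vars_file="$1"
    if [ ! -f "$vars_file" ]; then
        echo "File not found: $vars_file" >&2
        return 2
    fi
    # grep prints each offending line with its line number and succeeds on a match.
    if grep -n '<PLEASE_CHANGE>' "$vars_file"; then
        echo "Unresolved placeholders found in $vars_file" >&2
        return 1
    fi
    echo "No placeholders left in $vars_file"
}

# Usage (run from ~/privacera/privacera-manager/config):
#   check_placeholders custom-vars/vars.emr-serverless.yml
```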
    

  6. Once the properties are configured, update your Privacera Manager platform instance by running the following commands:

    Bash
    cd ~/privacera/privacera-manager
    ./privacera-manager.sh post-install
    

  7. Once the post-install process is complete, you will see an emr-serverless folder in the ~/privacera/privacera-manager/output directory with the following structure:

    Bash
    output/
    ├── emr-serverless/
       ├── olac/
          ├── Dockerfile_Privacera_Spark_OLAC
          ├── setup_emrserverless_spark_olac.sh
          ├── spark_custom_conf
          ├── spark_custom_conf.zip
    

  8. To build and push the Docker image, copy the following Dockerfile and configuration files to the EC2 instance where you will build the image. Alternatively, you can build the image on the same EC2 instance where Privacera Manager is installed.

    Bash
    ## Files to copy
    Dockerfile_Privacera_Spark_OLAC
    setup_emrserverless_spark_olac.sh
    spark_custom_conf.zip
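
After copying, a quick check that all three files arrived can save a failed build. This is a sketch only; check_olac_files is a hypothetical helper name, and the directory argument is whatever location you copied the files to.

```shell
# Hypothetical helper: confirm the three required files are present in a directory.
check_olac_files() {
    local build_dir="${1:-.}"
    local missing=0
    local f
    for f in Dockerfile_Privacera_Spark_OLAC \
             setup_emrserverless_spark_olac.sh \
             spark_custom_conf.zip; do
        if [ ! -f "$build_dir/$f" ]; then
            echo "Missing: $build_dir/$f" >&2
            missing=1
        fi
    done
    # Returns 0 when every file exists, 1 otherwise.
    return "$missing"
}

# Usage, from the directory that holds the copied files:
#   check_olac_files . && echo "All files present"
```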
    
  9. Once the required files are on the EC2 instance where you will build the Docker image, build the image with the commands below. You can either set the following environment variables before running the commands or replace the values in the commands directly.

    Set the following variables before running the commands:

    Variable Name  | Description                                                            | Sample Value
    aws_account_id | Your AWS account ID.                                                   | "123456789012"
    region         | The AWS region where your ECR repository is located.                   | "us-east-1"
    ecr_repo_name  | The name of your ECR repository where the Docker image will be pushed. | "privacera/emr-serverless-spark-olac"
    tag            | The tag for the Docker image.                                          | v1.0

    Bash
    aws_account_id=
    region=
    ecr_repo_name=
    tag=
    
    Bash
    # Build the Docker image
    docker build . -f Dockerfile_Privacera_Spark_OLAC \
       -t ${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}
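
Because the same image URI is reused in the build, run, and push commands, an empty variable produces a malformed tag. One way to guard against that is sketched below; image_uri is a hypothetical helper, and it only relies on the four variables described in this step.

```shell
# Hypothetical helper: print the full image URI.
# ${var:?message} aborts the current (sub)shell with an error if var is unset or empty.
image_uri() {
    : "${aws_account_id:?set aws_account_id}" "${region:?set region}"
    : "${ecr_repo_name:?set ecr_repo_name}" "${tag:?set tag}"
    echo "${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}"
}

# Usage: the command substitution fails if any variable is missing, so the build is skipped.
#   image=$(image_uri) && docker build . -f Dockerfile_Privacera_Spark_OLAC -t "$image"
```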
    
  10. To verify that the Docker image was created successfully, run the following command. Make sure to set the environment variables or replace the variables before running the command.

    Bash
    # This command will open the bash shell in the docker container.
    docker run -it --rm --entrypoint /bin/bash \
        ${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}
    

    Once inside the container, you can inspect the environment to ensure it is set up correctly. Type exit to leave the shell. Because the --rm flag was used, the container is removed automatically when you exit.

  11. Run the following commands to log in to the Amazon Elastic Container Registry (ECR) and push the Docker image to your repository. Make sure to set the environment variables or replace the variables before running the commands.

    Bash
    # Login to ECR repo
    aws ecr get-login-password --region ${region} | docker login --username AWS \
        --password-stdin ${aws_account_id}.dkr.ecr.${region}.amazonaws.com
    

    Note

    Make sure you have the necessary IAM permissions to manage custom Docker images in your Amazon Elastic Container Registry (ECR).

    Bash
    # Push the docker image
    docker push ${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}
    
  12. Once the Docker image is pushed, it appears in the ECR repository.
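
The pushed image can also be checked from the command line instead of the console. The sketch below wraps the AWS CLI describe-images call; verify_pushed_image is a hypothetical function name, and it assumes the region, ecr_repo_name, and tag variables from step 9 are still set.

```shell
# Hypothetical helper: look up the pushed tag in ECR.
# The AWS CLI exits with a non-zero status if the tag does not exist in the repository.
verify_pushed_image() {
    aws ecr describe-images \
        --region "$region" \
        --repository-name "$ecr_repo_name" \
        --image-ids imageTag="$tag"
}

# Usage:
#   verify_pushed_image && echo "Image ${ecr_repo_name}:${tag} found in ECR"
```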

Create Application

With EMR Serverless, you can create one or more applications that use open-source analytics frameworks. To create an application, follow these steps:

Note

Refer to the latest AWS documentation for deploying EMR Serverless applications.

  • Application settings: Provide a unique name for the application (e.g., emr_serverless_spark_app). Select Spark as the type, and specify the release version that you configured in the vars.emr-serverless.yml file.
  • Custom Image Settings: Select the image that you pushed to the ECR repository.
  • Application Configuration: Edit the JSON with the following configuration:

    JSON configuration:
    JSON
    {
      "classification": "spark-defaults",
      "configurations": null,
       "properties": {
         "spark.hadoop.fs.s3a.custom.signers": "PrivaceraAwsSdkV2Signer:com.privacera.spark.agent.signer.PrivaceraAwsSdkV2Signer",
         "spark.hadoop.fs.s3a.s3.signing-algorithm": "PrivaceraAwsSdkV2Signer",
         "spark.hadoop.fs.s3a.access.key": "P_ACCESS_KEY",
         "spark.hadoop.fs.s3a.secret.key": "P_SECRET_KEY",
         "spark.hadoop.fs.s3a.session.token": "P_SESSION_TOKEN",
         "spark.executor.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-agent.jar",
         "spark.sql.hive.metastore.sharedPrefixes": "com.amazonaws.services.dynamodbv2,com.privacera,com.amazonaws",
         "spark.driver.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-agent.jar"
       }
    }
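
If you prefer the AWS CLI to the console, the same settings can be passed to the create-application command. Treat the following as a sketch under assumptions rather than a verified procedure: create_emr_serverless_app is a hypothetical wrapper, the release label and image URI are whatever you configured earlier, and the configuration file is assumed to contain the JSON object above wrapped in a JSON array, since the runtime configuration is passed as a list. Check the current AWS documentation before relying on it.

```shell
# Hypothetical wrapper around the AWS CLI call that creates a Spark application.
# Arguments: application name, release label, image URI, path to the JSON configuration file.
create_emr_serverless_app() {
    local app_name="$1"
    local release_label="$2"
    local image_uri="$3"
    local config_file="$4"
    aws emr-serverless create-application \
        --name "$app_name" \
        --type SPARK \
        --release-label "$release_label" \
        --image-configuration "imageUri=$image_uri" \
        --runtime-configuration "file://$config_file"
}

# Usage (all values are placeholders to replace with your own):
#   create_emr_serverless_app emr_serverless_spark_app "<release-label>" \
#       "${aws_account_id}.dkr.ecr.${region}.amazonaws.com/${ecr_repo_name}:${tag}" \
#       spark-defaults.json
```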
    

Next Steps

To submit a job to the EMR Serverless application, refer to Privacera's User Guide for AWS EMR Serverless.
