AWS EMR Serverless - Access - Spark OLAC

For AWS EMR Serverless, Privacera supports access control only for AWS S3 objects accessed through Apache Spark.

To use Privacera with AWS EMR Serverless, make sure that the JWT token is passed to the Spark job or the Jupyter Notebook. The reference steps below show how to configure each.

Tip

In all of the use cases below, replace <JWT-TOKEN> with the actual JWT token.

  1. EMR Studio Workspace with Jupyter Notebook

    • Create a Workspace by providing a unique name and an S3 storage path, and enable the Interactive endpoint.
    • Connect to Jupyter Notebook and provide the JWT token in the notebook using the following format:
    Python
    spark.conf.set("spark.hadoop.privacera.jwt.oauth.enable", "true")
    spark.conf.set("spark.hadoop.privacera.jwt.token.str", "<JWT-TOKEN>")
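
Pasting an expired or truncated token is an easy mistake to make in a notebook. As a minimal sketch, the helper below decodes the token's payload (without verifying the signature) and refuses to set the two properties shown above if an `exp` claim is already in the past. The names `jwt_claims` and `apply_privacera_jwt` are hypothetical helpers, not part of Privacera or Spark. The check assumes the token is a standard three-segment JWT whose `exp` claim, if present, is a Unix timestamp.

```python
import base64
import json
import time


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url segments usually omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def apply_privacera_jwt(spark, token: str) -> None:
    """Set the two Privacera JWT properties on a Spark session (hypothetical helper)."""
    exp = jwt_claims(token).get("exp")
    # Assumption: "exp" is a Unix timestamp in seconds when it is present.
    if exp is not None and exp <= time.time():
        raise ValueError("JWT has already expired")
    spark.conf.set("spark.hadoop.privacera.jwt.oauth.enable", "true")
    spark.conf.set("spark.hadoop.privacera.jwt.token.str", token)
```

In a notebook cell this would be called as `apply_privacera_jwt(spark, "<JWT-TOKEN>")`, where `spark` is the session the notebook kernel already provides.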
    
  2. Spark Job

    • Create a Spark Job with the following Privacera-specific Spark properties in the spark-defaults classification.

      JSON configuration:
      JSON
      "spark.hadoop.privacera.jwt.oauth.enable": "true",
      "spark.hadoop.privacera.jwt.token.str": "<JWT-TOKEN>",
      "spark.driver.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-agent.jar",
      "spark.executor.extraJavaOptions": "-javaagent:/usr/lib/spark/jars/privacera-agent.jar",
      "spark.sql.hive.metastore.sharedPrefixes": "com.amazonaws.services.dynamodbv2,com.privacera,com.amazonaws"
      
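
The five properties above belong in the `properties` map of a `spark-defaults` classification object. As a rough sketch of how that object could be assembled programmatically, the hypothetical helper `privacera_spark_defaults` below builds it as a Python dictionary and serializes it to JSON. The `classification`/`properties` wrapper follows the general EMR configuration-object layout and is an assumption here, since this page lists only the properties themselves.

```python
import json

# Agent path copied from the properties listed on this page.
PRIVACERA_AGENT_OPT = "-javaagent:/usr/lib/spark/jars/privacera-agent.jar"


def privacera_spark_defaults(jwt_token: str) -> dict:
    """Build a spark-defaults classification holding the Privacera properties."""
    return {
        "classification": "spark-defaults",
        "properties": {
            "spark.hadoop.privacera.jwt.oauth.enable": "true",
            "spark.hadoop.privacera.jwt.token.str": jwt_token,
            "spark.driver.extraJavaOptions": PRIVACERA_AGENT_OPT,
            "spark.executor.extraJavaOptions": PRIVACERA_AGENT_OPT,
            "spark.sql.hive.metastore.sharedPrefixes": (
                "com.amazonaws.services.dynamodbv2,com.privacera,com.amazonaws"
            ),
        },
    }


def to_json(jwt_token: str) -> str:
    """Serialize the classification as a one-element JSON list."""
    return json.dumps([privacera_spark_defaults(jwt_token)], indent=2)
```

Calling `print(to_json("<JWT-TOKEN>"))` produces JSON text that can be reviewed before it is pasted into the job configuration; every property value is kept as a string, matching the listing above.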
