Connector Guide - AWS EMR - Accessing AWS S3 (OLAC)
When Privacera is enabled on AWS EMR, you can use AWS S3 as the object store. Use the following instructions to connect to the cluster and run Spark jobs. Access is controlled at the level of individual S3 objects, which is known as Object Level Access Control (OLAC).
SSH to your EMR master node.
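A minimal bash sketch of the SSH step, assuming you connect with the EC2 key pair the cluster was created with. `hadoop` is the default login user on EMR nodes. `EMR_KEY_FILE`, `EMR_MASTER_DNS`, and the `emr_ssh` function are hypothetical names used only for illustration.

```bash
# Sketch: open an SSH session on the EMR master (primary) node.
# EMR_KEY_FILE and EMR_MASTER_DNS are hypothetical placeholder variables.
# "hadoop" is the default login user on EMR cluster nodes.
emr_ssh() {
  ssh -i "${EMR_KEY_FILE:?set EMR_KEY_FILE to the cluster key pair (.pem) file}" \
    "hadoop@${EMR_MASTER_DNS:?set EMR_MASTER_DNS to the master node public DNS name}" \
    "$@"
}
```

For example, set `export EMR_KEY_FILE=~/.ssh/<key-pair>.pem EMR_MASTER_DNS=<master-public-dns>` and then run `emr_ssh`. Any extra arguments, such as `-v`, are passed through to `ssh`.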
If you use JWT for authentication, you must pass the JWT token to the EMR cluster. You can do this in either of two ways:
- Pass the JWT token directly as a command-line argument when you connect to the cluster.
- Pass the path of a file that contains the JWT token.
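For the file-path option, the token should sit in a file that only your user can read. A minimal bash sketch follows; `save_jwt_file` and its default path `$HOME/.privacera/jwt.token` are hypothetical, so use whatever location suits your setup.

```bash
# Sketch: write a JWT to a file readable only by the current user.
# save_jwt_file and its default path are hypothetical names for illustration.
save_jwt_file() {
  local token="${1:?JWT token required}"
  local path="${2:-$HOME/.privacera/jwt.token}"
  mkdir -p "$(dirname "$path")"
  # umask 077 gives a newly created file mode 600; chmod also covers an existing file.
  ( umask 077 && printf '%s' "$token" > "$path" )
  chmod 600 "$path"
  # Print the path so the caller can capture it for the Spark command.
  printf '%s\n' "$path"
}
```

For example, `JWT_TOKEN_FILE=$(save_jwt_file "<your-jwt-token>")`. One reason to prefer the file option is that a token passed as a command-line argument can be visible to other users of the node in the process list (for example, through `ps`).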
Connecting to Apache Spark Cluster
Connect to pyspark
Run pyspark on the master node to start the PySpark shell.
If JWT authorization is enabled in the cluster, add the JWT configuration to the pyspark command:
- To pass the JWT token directly, supply the token as a command-line configuration option.
- To use a file that contains the JWT token, supply the file path in the configuration instead.
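A bash sketch of starting pyspark with either JWT option. The Spark property names `spark.hadoop.privacera.jwt.token.str` (token value) and `spark.hadoop.privacera.jwt.token` (token file path) are assumptions that this page does not confirm; verify them against the Privacera documentation for your release. `spark_with_jwt`, `JWT_TOKEN`, and `JWT_TOKEN_FILE` are hypothetical names. Because pyspark, spark-shell, and spark-sql all accept `--conf`, the same function works for each of them.

```bash
# Sketch: launch a Spark CLI (pyspark, spark-shell, spark-sql) and prepend a JWT
# --conf option when JWT_TOKEN or JWT_TOKEN_FILE is set. With neither set, the CLI
# runs with only the arguments you pass.
# ASSUMPTION: the Privacera property names below; verify them for your release.
spark_with_jwt() {
  local tool="${1:?tool name required, e.g. pyspark}"
  shift
  if [ -n "${JWT_TOKEN:-}" ]; then
    # Token passed directly on the command line.
    set -- --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" "$@"
  elif [ -n "${JWT_TOKEN_FILE:-}" ]; then
    # Path to a file that contains the token.
    set -- --conf "spark.hadoop.privacera.jwt.token=${JWT_TOKEN_FILE}" "$@"
  fi
  "$tool" "$@"
}
```

For example, `JWT_TOKEN_FILE=<path-to-jwt-file> spark_with_jwt pyspark`, or plain `spark_with_jwt pyspark` when JWT is not enabled. If both variables are set, the token value takes precedence.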
Run Spark read and write operations on data in S3.
Connect to spark-shell
Run spark-shell on the master node to start the Scala Spark shell.
If JWT authorization is enabled in the cluster, add the JWT configuration to the spark-shell command:
- To pass the JWT token directly, supply the token as a command-line configuration option.
- To use a file that contains the JWT token, supply the file path in the configuration instead.
Run Spark read and write operations on data in S3.
With Spark SQL, a query first reads table metadata from the AWS Glue Data Catalog or the Hive Metastore, and that metadata gives the S3 location of the data. Privacera then controls access to those S3 files.
To run SQL commands, the cluster must have access to the AWS Glue Data Catalog or a Hive Metastore.
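To see which catalog Spark SQL on a node is configured to use, a rough bash sketch can inspect Spark's `hive-site.xml`. It assumes the usual EMR location `/etc/spark/conf/hive-site.xml` and that the Glue Data Catalog is enabled through the `com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory` client factory; `spark_metastore_kind` is a hypothetical name. It only reads configuration and does not test network or IAM access.

```bash
# Sketch: report which metastore Spark SQL on this node is configured to use.
# ASSUMPTION: Spark's Hive settings are in /etc/spark/conf/hive-site.xml (usual on
# EMR); pass another path as $1 if yours differs.
spark_metastore_kind() {
  local site="${1:-/etc/spark/conf/hive-site.xml}"
  if [ ! -r "$site" ]; then
    echo "unknown (cannot read $site)"
    return 1
  fi
  if grep -qF 'AWSGlueDataCatalogHiveClientFactory' "$site"; then
    echo "glue"            # Glue Data Catalog client factory is configured
  elif grep -qF 'hive.metastore.uris' "$site"; then
    echo "hive-metastore"  # a Hive Metastore URI is configured
  else
    echo "unknown"
  fi
}
```

Run `spark_metastore_kind` on the master node; it prints `glue`, `hive-metastore`, or `unknown`.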
Connect to spark-sql
Run spark-sql on the master node to start the Spark SQL CLI.
Run a Spark SQL query.
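Besides the interactive prompt, `spark-sql -e` runs a single statement and exits, which is handy for quick checks. A minimal bash sketch; `run_spark_sql` is a hypothetical helper, and the database and table names in the usage line are placeholders. Since Privacera controls access to the S3 files behind each table, a query on a table whose files you are not allowed to read should be denied.

```bash
# Sketch: run one SQL statement with the Spark SQL CLI, then exit.
# Extra arguments (for example --conf options) are passed to spark-sql before -e.
run_spark_sql() {
  local query="${1:?SQL statement required}"
  shift
  spark-sql "$@" -e "$query"
}
```

For example, `run_spark_sql "SHOW DATABASES;"` or `run_spark_sql "SELECT * FROM <database>.<table> LIMIT 10;"`. Options such as `--conf` settings go after the query argument.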