Exclude S3 Objects from Privacera Access Check
By default, the Dataserver is used to perform access control on all objects. However, if you want to exclude certain objects or entire buckets from Privacera access checks and access them directly through the IAM role attached to the EMR node, you can use one of the methods outlined in this section.
It is also recommended to exclude the event logs bucket from Privacera Access Check.
Prerequisites
- Ensure that the IAM role attached to the EMR nodes has the necessary permissions to access the specified paths.
Setup
There are two ways to exclude S3 objects from Privacera Access Check:
- Pass the paths using the Apache Spark configuration property `privacera.olac.ignore.paths`.
- Use an EMR Bootstrap Action to update the paths in the `privacera_spark_custom.properties` file.
Here are a few additional details about the methods:
a. The property accepts a comma-separated list of paths to be excluded from Privacera Access Check.
b. Paths to be ignored support all S3 file protocols, such as `s3://`, `s3a://`, and `s3n://`.
c. You can use the wildcard character `*` in both the bucket name and the object path.
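As an illustration of points a to c, the following minimal sketch builds a comma-separated value from individual path patterns. The function name `join_ignore_paths` and the bucket names are hypothetical, and the scheme check only mirrors the protocols listed above; it is not part of Privacera.

```shell
# Hypothetical helper: join path patterns into one comma-separated value,
# rejecting anything that does not start with a scheme listed above.
join_ignore_paths() {
  joined=""
  for p in "$@"; do
    case "$p" in
      s3://*|s3a://*|s3n://*) ;;  # supported S3 schemes
      *) echo "unsupported scheme: $p" >&2; return 1 ;;
    esac
    # Append with a comma separator, except before the first entry.
    joined="${joined:+${joined},}${p}"
  done
  printf '%s\n' "$joined"
}

# Placeholder bucket names; the patterns are quoted so the shell
# does not try to expand the * wildcards itself.
join_ignore_paths "s3://example-event-logs/*" "s3a://example-data-*/public/*"
```

Quoting each pattern matters: an unquoted `*` may be expanded by the shell before the value reaches Spark.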
Tip
- If the property is set using both methods, the value passed with the Apache Spark job takes precedence.
- For the event log bucket, ensure that you ignore the entire bucket, not just the specific path where the event logs are stored.
1. Apache Spark Configuration Property while submitting the Spark Job
Pass the paths to be ignored in the `spark.hadoop.privacera.olac.extra.ignore.paths` property when submitting the Spark job, using the `--conf` option.
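A minimal sketch of such a submission, assuming `spark-submit` is on the `PATH`. The wrapper function `submit_with_ignore_paths`, the `SPARK_SUBMIT` override variable, the bucket names, and `my_job.py` are all hypothetical placeholders.

```shell
# Hypothetical wrapper around spark-submit.
# $1: comma-separated ignore paths; remaining arguments: the application
# and its own arguments, passed through unchanged.
submit_with_ignore_paths() {
  ignore_paths="$1"
  shift
  # Quote the whole key=value pair so the * wildcards reach Spark intact.
  "${SPARK_SUBMIT:-spark-submit}" \
    --conf "spark.hadoop.privacera.olac.extra.ignore.paths=${ignore_paths}" \
    "$@"
}

# Usage (placeholder values):
# submit_with_ignore_paths "s3://example-event-logs/*,s3a://example-data/public/*" my_job.py
```

The first pattern covers an entire bucket, which is the form recommended above for the event log bucket.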
2. EMR Bootstrap Action
Follow the steps below to create the script file and add the Bootstrap Action to the EMR template file:
1. Create a script file named `privacera_update_spark_ignore_path.sh` with the content provided and upload it to an S3 location accessible through the IAM role attached to the EMR nodes.
2. Update the EMR template file with the Bootstrap Action. Replace the placeholders with the appropriate values:
   - `<BUCKET>`: S3 bucket where the script is uploaded.
   - `<PATH_TO_SCRIPT>`: Path to the script file.
   - `<IGNORE PATHS SEPERATED BY COMMAS>`: Comma-separated list of paths to be ignored.
3. Save the EMR template and trigger the EMR cluster. The Bootstrap Action is executed during EMR cluster creation, and the ignore paths are added to the `privacera_spark_custom.properties` file.
The above bootstrap action will update the ignore paths in both the Master and Executor nodes.
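The upload in the first step above can be done with the AWS CLI. This is a minimal sketch, assuming the `aws` CLI is installed with credentials that can write to the bucket. The function name `upload_bootstrap_script`, the `AWS_CLI` override variable, and the example bucket and key are hypothetical.

```shell
# Hypothetical helper: copy the script from the current directory to S3.
# $1: bucket name, $2: object key (path to the script inside the bucket)
upload_bootstrap_script() {
  "${AWS_CLI:-aws}" s3 cp privacera_update_spark_ignore_path.sh "s3://$1/$2"
}

# Usage (placeholder values):
# upload_bootstrap_script example-bootstrap-bucket scripts/privacera_update_spark_ignore_path.sh
```

The bucket and key used here correspond to the `<BUCKET>` and `<PATH_TO_SCRIPT>` placeholders in the Bootstrap Action.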