
Exclude S3 Objects from Privacera Access Check

By default, the Dataserver is used to perform access control on all objects. However, if you want to exclude certain objects or entire buckets from Privacera access checks and access them directly through the IAM role attached to the EMR node, you can use one of the methods outlined in this section.

It is also recommended to exclude the event logs bucket from Privacera Access Check.

Prerequisites

  • Ensure that the IAM role attached to the EMR nodes has the necessary permissions to access the specified paths.

Setup

There are two ways to exclude S3 objects from Privacera Access Check:

  1. Pass the paths using Apache Spark configuration property privacera.olac.ignore.paths.
  2. Use EMR Bootstrap Action to update the paths in the privacera_spark_custom.properties file.

Here are a few additional details about these methods:

  • The property accepts a comma-separated list of paths to be excluded from Privacera Access Check.
  • Ignored paths support all S3 file protocols, such as s3://, s3a://, and s3n://.
  • You can use the wildcard character * in both the bucket name and the object path.

Tip

  • If the property is set using both methods, the value passed in the Apache Spark job takes precedence.
  • For the event log bucket, ensure that you ignore the entire bucket, not just the specific path where the event logs are stored.
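To illustrate the matching semantics described above (comma-separated patterns, interchangeable s3://, s3a://, and s3n:// schemes, and * wildcards in both bucket and object path), here is a minimal Python sketch. It is illustrative only, not Privacera's implementation, and the bucket names are made up:

```python
import fnmatch

def is_ignored(uri: str, ignore_paths: str) -> bool:
    """Return True if `uri` matches any pattern in the comma-separated list."""

    def strip_scheme(u: str) -> str:
        # Treat s3://, s3a://, and s3n:// as equivalent by dropping the scheme.
        for scheme in ("s3a://", "s3n://", "s3://"):
            if u.startswith(scheme):
                return u[len(scheme):]
        return u

    target = strip_scheme(uri)
    return any(
        fnmatch.fnmatch(target, strip_scheme(pattern.strip()))
        for pattern in ignore_paths.split(",")
        if pattern.strip()
    )

# Hypothetical ignore list: the entire event log bucket plus a wildcarded bucket prefix.
ignore = "s3://emr-event-logs-bucket/*,s3a://data-*/raw/*"
print(is_ignored("s3a://emr-event-logs-bucket/logs/app-1234", ignore))  # True
print(is_ignored("s3://data-lake/raw/file.csv", ignore))                # True
print(is_ignored("s3://data-lake/curated/file.csv", ignore))            # False
```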

1. Apache Spark Configuration Property while submitting the Spark Job

You can pass the paths to be ignored in the spark.hadoop.privacera.olac.extra.ignore.paths property when submitting the Spark job, using the --conf option.
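For example, a submission might look like the following (the bucket names and application JAR are placeholders):

    spark-submit \
      --conf spark.hadoop.privacera.olac.extra.ignore.paths="s3://emr-event-logs-bucket/*,s3a://scratch-bucket/tmp/*" \
      --class org.example.MyJob \
      s3://my-artifacts/my-job.jar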

2. EMR Bootstrap Action

Follow the steps below to create the Script file and add the Bootstrap Action in the EMR template file:

  1. Create a Script file named privacera_update_spark_ignore_path.sh with the content provided and upload it to an S3 location accessible through the IAM role attached to the EMR nodes.

    privacera_update_spark_ignore_path.sh
    #!/bin/bash
    
    # Check if an argument is provided
    if [ "$#" -eq 0 ]; then
      echo "No argument provided. Usage: ./privacera_update_spark_ignore_path.sh <comma_separated_ignore_s3_uri>"
      exit 1
    fi
    
    
    ignore_path=${1}
    echo "Comma separated ignore path: ${ignore_path}"
    
    
    priv_spark_conf_dir="/opt/privacera/plugin/privacera-spark-plugin/spark-conf"
    priv_spark_conf_file="${priv_spark_conf_dir}/privacera_spark_custom.properties"
    
    
    echo "Creating ${priv_spark_conf_dir}"
    sudo mkdir -p "${priv_spark_conf_dir}"
    
    
    echo "Creating ${priv_spark_conf_file}"
    sudo touch "${priv_spark_conf_file}"
    sudo chown hadoop:hadoop "${priv_spark_conf_file}"
    
    
    echo "Updating ignore path '${ignore_path}' into '${priv_spark_conf_file}'"
    # Use `sudo tee -a` instead of `sudo echo ... >>`: with the latter, the
    # redirection is performed by the non-root shell and can fail with
    # permission denied.
    echo "privacera.olac.ignore.paths=${ignore_path}" | sudo tee -a "${priv_spark_conf_file}" > /dev/null
    
  2. Update the EMR template file with the following Bootstrap Action:

    Replace the placeholders with the appropriate values:

    • <BUCKET>: S3 bucket where the script is uploaded.
    • <PATH_TO_SCRIPT>: Path to the script file within the bucket.
    • <IGNORE_PATHS_SEPARATED_BY_COMMAS>: Comma-separated list of paths to be ignored.

    JSON
    {
      "Name": "Update Spark Ignore Path",
      "ScriptBootstrapAction": {
        "Path": {
          "Fn::Sub": "s3://<BUCKET>/<PATH_TO_SCRIPT>/privacera_update_spark_ignore_path.sh"
        },
        "Args": [
      "<IGNORE_PATHS_SEPARATED_BY_COMMAS>"
        ]
      }
    }
    
  3. Save the EMR template and trigger the EMR cluster. The Bootstrap Action is executed during cluster creation, and the ignore paths are added to the privacera_spark_custom.properties file.

The above bootstrap action will update the ignore paths in both the Master and Executor nodes.
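To confirm that the Bootstrap Action ran, you can SSH to a node and inspect the properties file at the path used by the script above, for example:

    grep "privacera.olac.ignore.paths" \
      /opt/privacera/plugin/privacera-spark-plugin/spark-conf/privacera_spark_custom.properties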
