Enable Delta Lake for AWS EMR

To enable Delta Lake support for AWS EMR, set the EMR_SPARK_DELTA_LAKE_ENABLE property to true.

  1. SSH to the instance where Privacera is installed.

  2. Navigate to the custom-vars plugin directory:

    Bash
    cd ~/privacera/privacera-manager/config/custom-vars
    

  3. Open the vars.emr.yml file and set the following property:

    Bash
    vi vars.emr.yml
    
    EMR_SPARK_DELTA_LAKE_ENABLE: "true"
    

  4. After updating the configuration, follow the setup steps, starting from Create EMR Cluster, to launch an EMR cluster.

  5. For information on how to use and validate Delta Lake in AWS EMR, see the Using Delta Lake with AWS EMR guide.
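Before moving on to cluster creation, it can help to confirm that the property was saved as intended. The following is a minimal sketch, not part of the Privacera tooling: the `check_delta_flag` function and the `VARS_FILE` variable are hypothetical names, and the default path assumes the custom-vars directory and file name used in steps 2 and 3.

```shell
# Sketch: confirm the Delta Lake flag is set to true in vars.emr.yml.
# check_delta_flag and VARS_FILE are illustrative names, not Privacera commands.
check_delta_flag() {
  # $1: path to vars.emr.yml
  # Matches a line such as: EMR_SPARK_DELTA_LAKE_ENABLE: "true"
  grep -Eq '^[[:space:]]*EMR_SPARK_DELTA_LAKE_ENABLE:[[:space:]]*"?true"?[[:space:]]*$' "$1" 2>/dev/null
}

# Assumed default location, taken from the steps above; override VARS_FILE if yours differs.
VARS_FILE="${VARS_FILE:-$HOME/privacera/privacera-manager/config/custom-vars/vars.emr.yml}"

if check_delta_flag "$VARS_FILE"; then
  echo "Delta Lake is enabled in $VARS_FILE"
else
  echo "EMR_SPARK_DELTA_LAKE_ENABLE is not set to true in $VARS_FILE"
fi
```

The check only inspects the file contents. It does not confirm that the setting has been applied to a running cluster.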

To enable Delta Lake support for AWS EMR, set the SPARK_DELTA_LAKE_ENABLE environment variable to enable-spark-deltalake in the EMR bootstrap actions.

  1. To enable Delta Lake support for EMR Spark, update the BootstrapActions configuration in the EMR template as shown below. Then, create a new EMR cluster with this template:

    privacera-emr-bootstrap-actions-delta-lake
    JSON
              "BootstrapActions":[
                {
                  "Name":"Install Spark OLAC in Master Node",
                  "ScriptBootstrapAction":{
                    "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
                    "Args":[
                      {
                        "Fn::Sub":"instance.isMaster=true"
                      },
                      {
                        "Fn::Sub":"export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake ; wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-olac"
                      }
                    ]
                  }
                },
                {
                  "Name":"Install Spark OLAC in Core Node",
                  "ScriptBootstrapAction":{
                    "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
                    "Args":[
                      {
                        "Fn::Sub":"instance.isMaster=false"
                      },
                      {
                        "Fn::Sub":"export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake ; wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-olac"
                      }
                    ]
                  }
                }
              ]
    
  2. Add the following Delta classification to the EMR template to enable Delta support:

    privacera-emr-delta-lake-classfication
    JSON
              {
                "Classification":"delta-defaults",
                "ConfigurationProperties":{
                   "delta.enabled":"true"
                }
              }
    
  3. After updating the configuration, follow the setup steps, starting from Create EMR Cluster, to launch an EMR cluster.

  4. For information on how to use and validate Delta Lake in AWS EMR, see the Using Delta Lake with AWS EMR guide.
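As a quick sanity check before creating the cluster, the edited template can be scanned for both changes described in steps 1 and 2. This is a rough sketch under stated assumptions: `check_emr_template` and the `EMR_TEMPLATE` variable are hypothetical names, and the check relies on plain text matching, so it assumes each bootstrap command sits on its own line as in the snippet above.

```shell
# Sketch: verify that an EMR template carries both Delta Lake changes.
# check_emr_template and EMR_TEMPLATE are illustrative names, not Privacera commands.
check_emr_template() {
  # $1: path to the EMR template (JSON)
  # Count lines that export the flag; expect one for the master and one for the core action.
  count=$(grep -c 'export SPARK_DELTA_LAKE_ENABLE=enable-spark-deltalake' "$1" 2>/dev/null || true)
  if [ "${count:-0}" -lt 2 ]; then
    echo "SPARK_DELTA_LAKE_ENABLE export missing from a master or core bootstrap action"
    return 1
  fi
  # Look for the delta-defaults classification entry.
  if ! grep -Eq '"Classification"[[:space:]]*:[[:space:]]*"delta-defaults"' "$1"; then
    echo "delta-defaults classification missing"
    return 1
  fi
  echo "Template contains both Delta Lake settings"
}

# Runs only when EMR_TEMPLATE points at your template file.
if [ -n "${EMR_TEMPLATE:-}" ]; then
  check_emr_template "$EMR_TEMPLATE"
fi
```

Text matching does not validate the JSON structure itself, so a template that passes this check can still be malformed.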