Enable Iceberg for AWS EMR

To enable Iceberg support for AWS EMR, set the EMR_SPARK_ICEBERG_ENABLE property to true in the Privacera Manager configuration.

  1. SSH to the instance where Privacera is installed.

  2. Navigate to the custom-vars plugin directory:

    Bash
    cd ~/privacera/privacera-manager/config/custom-vars
    

  3. Open the vars.emr.yml file and set the following property:

    Bash
    vi vars.emr.yml
    
    EMR_SPARK_ICEBERG_ENABLE: "true"
    

  4. After updating the configuration, follow the setup steps starting from Create EMR Cluster to launch an EMR cluster.

  5. For information on how to use and validate Iceberg in AWS EMR, see the Using Iceberg with AWS EMR guide.
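
Step 3 can also be done without opening an editor. The following is a minimal sketch, not part of the Privacera tooling: the set_iceberg_flag helper is a hypothetical name, and it assumes the property occupies its own line, starting at column one, in vars.emr.yml. It replaces an existing entry or appends one if none is present, then prints the resulting line so you can confirm the value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Privacera Manager):
# ensure EMR_SPARK_ICEBERG_ENABLE is set to "true" in the given vars file.
set_iceberg_flag() {
  local vars_file="$1"
  local line='EMR_SPARK_ICEBERG_ENABLE: "true"'

  # Create the file if it does not exist yet.
  touch "$vars_file"

  if grep -q '^EMR_SPARK_ICEBERG_ENABLE:' "$vars_file"; then
    # Replace the existing entry in place; sed keeps a .bak copy as a backup.
    sed -i.bak "s/^EMR_SPARK_ICEBERG_ENABLE:.*/${line}/" "$vars_file"
  else
    # No entry yet, so append one.
    printf '%s\n' "$line" >> "$vars_file"
  fi

  # Print the resulting entry for a quick visual check.
  grep '^EMR_SPARK_ICEBERG_ENABLE:' "$vars_file"
}

# Example usage, with the path from steps 2 and 3:
# set_iceberg_flag ~/privacera/privacera-manager/config/custom-vars/vars.emr.yml
```

If the file already holds other settings, review the .bak copy against the updated file before continuing with step 4.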

To enable Iceberg support for AWS EMR, set SPARK_ICEBERG_ENABLE to true in the bootstrap actions of the EMR template.

  1. To enable Iceberg support for EMR Spark, update the BootstrapActions configuration in the EMR template as shown below. Then, create a new EMR cluster with this template:

    privacera-emr-bootstrap-actions-iceberg-lake
    JSON
              "BootstrapActions":[
                {
                  "Name":"Install Spark OLAC in Master Node",
                  "ScriptBootstrapAction":{
                    "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
                    "Args":[
                      {
                        "Fn::Sub":"instance.isMaster=true"
                      },
                      {
                        "Fn::Sub":"export SPARK_ICEBERG_ENABLE=true ; wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-olac"
                      }
                    ]
                  }
                },
                {
                  "Name":"Install Spark OLAC in Core Node",
                  "ScriptBootstrapAction":{
                    "Path":"s3://elasticmapreduce/bootstrap-actions/run-if",
                    "Args":[
                      {
                        "Fn::Sub":"instance.isMaster=false"
                      },
                      {
                        "Fn::Sub":"export SPARK_ICEBERG_ENABLE=true ; wget ${PrivaceraDownloadUrl}/privacera_emr.sh ; chmod +x ./privacera_emr.sh ; sudo -E ./privacera_emr.sh spark-olac"
                      }
                    ]
                  }
                }
              ]
    
  2. Add the following Iceberg classification to the EMR template to enable Iceberg support:

    privacera-emr-iceberg-classfication
    JSON
              {
                "Classification":"iceberg-defaults",
                "ConfigurationProperties":{
                   "iceberg.enabled":"true"
                }
              }
    
  3. After updating the configuration, follow the setup steps starting from Create EMR Cluster to launch an EMR cluster.

  4. For information on how to use and validate Iceberg in AWS EMR, see the Using Iceberg with AWS EMR guide.
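
Before creating the cluster, it can help to confirm that the template contains both changes from steps 1 and 2. The sketch below is an illustrative check only: check_emr_template is a hypothetical helper, the template path is whatever file you edited, and the check uses plain text matching rather than JSON parsing. A pass therefore means the expected strings are present, not that the template is valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: text-level sanity check of an edited EMR template.
check_emr_template() {
  local template="$1"
  local status=0
  local count

  if [ ! -f "$template" ]; then
    echo "template not found: $template"
    return 2
  fi

  # Both bootstrap actions (master and core nodes) should export the flag.
  count="$(grep -c 'export SPARK_ICEBERG_ENABLE=true' "$template")"
  if [ "$count" -lt 2 ]; then
    echo "expected SPARK_ICEBERG_ENABLE=true in 2 bootstrap actions, found $count"
    status=1
  fi

  # The iceberg-defaults classification from step 2 should be present.
  if ! grep -Eq '"Classification"[[:space:]]*:[[:space:]]*"iceberg-defaults"' "$template"; then
    echo "missing iceberg-defaults classification"
    status=1
  fi

  # ...and it should turn Iceberg on.
  if ! grep -Eq '"iceberg\.enabled"[[:space:]]*:[[:space:]]*"true"' "$template"; then
    echo "iceberg.enabled is not set to true"
    status=1
  fi

  if [ "$status" -eq 0 ]; then
    echo "template looks ready for Iceberg"
  fi
  return "$status"
}

# Example usage (the file name is an assumption):
# check_emr_template ./emr-template.json
```

The function returns 0 when all three checks pass, 1 when any check fails, and 2 when the file is missing, so it can gate a cluster-creation script.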