Unset AWS credentials in Spark's environment for Glue Data Catalog calls on EMR

Overview

When Spark accesses the AWS Glue Data Catalog through the AWS SDK v2, the default credential chain checks environment variables and the shared credentials file before the EC2 instance profile. If AWS_* environment variables or AWS_SHARED_CREDENTIALS_FILE point to credentials that are not valid for Glue API calls, Spark metadata calls can fail with errors such as:

Text Only
software.amazon.awssdk.services.glue.model.GlueException: The security token included in the request is invalid

Why this happens

On EMR 7.x, Spark uses AWS SDK v2 to call AWS Glue. The SDK’s default credential provider chain prefers environment variables and the shared credentials file over the EC2 instance profile when those are set. Interactive sessions, automation, or other components on the node may set AWS_* values or AWS_SHARED_CREDENTIALS_FILE for purposes that do not match Glue Data Catalog authentication. Spark then picks up those sources first, which often surfaces as the invalid security token error above.
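This precedence can be inspected directly on a node. The sketch below is an illustrative helper (the `check_aws_env` function and its messages are not part of EMR or Privacera Manager); it lists which of these sources are set in the current shell and would therefore be consulted before the instance profile:

```shell
#!/usr/bin/env bash
# Illustrative helper: report which credential sources the SDK v2
# default chain would consult before the EC2 instance profile.
check_aws_env() {
  local v found=0
  for v in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_SHARED_CREDENTIALS_FILE; do
    # ${!v} is bash indirect expansion: the value of the variable named by $v.
    if [ -n "${!v}" ]; then
      echo "$v is set and takes precedence over the instance profile"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no overriding AWS_* variables set; instance profile will be used"
  fi
}

check_aws_env
```

Running this on an affected node typically shows at least one AWS_* variable set, which explains why Glue calls are not using the instance profile.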

To handle this case, follow the configuration steps below.

Configure

  1. SSH to the instance where Privacera Manager is installed.

  2. Navigate to the Privacera Manager configuration directory:

    Bash
    cd ~/privacera/privacera-manager/config
    

  3. Open the EMR variables file for editing:

    Bash
    vi custom-vars/vars.emr.yml
    

  4. Uncomment and set the following property:

    YAML
    EMR_SPARK_GLUE_DATACATALOG_UNSET_AWS_ENV_ENABLE: "true"
    

    Note

    • Set to "true" to enable Spark-side unsetting of AWS_* and AWS_SHARED_CREDENTIALS_FILE so Glue metadata calls use instance profile credentials.
    • Set to "false" (default) to retain legacy behavior (no additional unset block in spark-env.sh).
  5. Apply the configuration by running Privacera Manager post-install:

    Bash
    cd ~/privacera/privacera-manager
    ./privacera-manager.sh post-install
    

Verification

After the cluster is up, SSH to a node where Spark runs and check:

Bash
grep -n "unset AWS_SHARED_CREDENTIALS_FILE\|unset AWS_ACCESS_KEY_ID\|unset AWS_SECRET_ACCESS_KEY\|unset AWS_SESSION_TOKEN" /etc/spark/conf/spark-env.sh

When enabled, you should see unset lines similar to:

Text Only
unset AWS_SHARED_CREDENTIALS_FILE
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_SESSION_TOKEN

After this, retry a Spark metadata operation (for example, SHOW DATABASES) to confirm Glue access succeeds without invalid-token errors.
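The grep check above can also be scripted. The helper below is a hypothetical sketch (the `verify_unset_block` function and its messages are not part of the product) that confirms all four expected unset lines are present in a given spark-env.sh:

```shell
#!/usr/bin/env bash
# Hypothetical verification helper: confirm every expected unset line
# is present in the spark-env.sh passed as the first argument.
verify_unset_block() {
  local conf="$1" v missing=0
  for v in AWS_SHARED_CREDENTIALS_FILE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    if ! grep -q "unset $v" "$conf"; then
      echo "missing: unset $v"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all unset lines present"
  fi
  return "$missing"
}

# Usage on an EMR node:
# verify_unset_block /etc/spark/conf/spark-env.sh
```

The non-zero return status on a missing line makes the helper usable in bootstrap or health-check scripts as well as interactively.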