Unset AWS credentials in Spark's environment for Glue Data Catalog calls on EMR

Overview

When Spark accesses the AWS Glue Data Catalog through the AWS SDK v2, the default credential provider chain checks environment variables and the shared credentials file before the EC2 instance profile. If AWS_* environment variables or AWS_SHARED_CREDENTIALS_FILE point to credentials that are not valid for Glue API calls, Spark metadata calls can fail with errors such as:

Text Only
software.amazon.awssdk.services.glue.model.GlueException: The security token included in the request is invalid

Why this happens

On EMR 7.x, Spark uses AWS SDK v2 to call AWS Glue. The SDK’s default credential provider chain prefers environment variables and the shared credentials file over the EC2 instance profile when those are set. Interactive sessions, automation, or other components on the node may set AWS_* values or AWS_SHARED_CREDENTIALS_FILE for purposes that do not match Glue Data Catalog authentication. Spark then picks up those sources first, which often surfaces as the invalid security token error above.
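To see whether a node might be affected, it can help to list which of these variables are exported in the environment that launches Spark. The following is a minimal sketch, not part of Privacera Manager: the function name list_aws_credential_overrides is hypothetical, and it only checks the four variables named on this page. It prints variable names, never their values. An interactive SSH shell may not have the same environment as the Spark process, so treat the result as a hint.

```bash
# Hypothetical helper: print the names of credential-related variables that
# are exported in the current environment. These are the sources the SDK's
# default chain would consult before the EC2 instance profile.
list_aws_credential_overrides() (
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_SHARED_CREDENTIALS_FILE; do
    # printenv succeeds only when the variable is exported; discard the value.
    if printenv "$name" >/dev/null; then
      echo "$name"
    fi
  done
)

list_aws_credential_overrides
```

Empty output means none of the four variables is exported in that shell.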

To handle this case, follow the configuration steps below.

Configure

Self Managed and Data Plane

SSH to the instance where Privacera Manager is installed.
Navigate to the Privacera Manager configuration directory:
Bash
cd ~/privacera/privacera-manager/config
Open the EMR variables file for editing:
Bash
vi custom-vars/vars.emr.yml
Uncomment and set the following property:
YAML
EMR_SPARK_GLUE_DATACATALOG_UNSET_AWS_ENV_ENABLE: "true"
Note
- Set to "true" to enable Spark-side unsetting of AWS_* and AWS_SHARED_CREDENTIALS_FILE so Glue metadata calls use instance profile credentials.
- Set to "false" (default) to retain legacy behavior (no additional unset block in spark-env.sh).

Apply the configuration by running Privacera Manager post-install:

Bash
cd ~/privacera/privacera-manager
./privacera-manager.sh post-install

Verification

After the cluster is up, SSH to a node where Spark runs and check:

Bash
grep -n "unset AWS_SHARED_CREDENTIALS_FILE\|unset AWS_ACCESS_KEY_ID\|unset AWS_SECRET_ACCESS_KEY\|unset AWS_SESSION_TOKEN" /etc/spark/conf/spark-env.sh

When enabled, you should see unset lines similar to:

Text Only
unset AWS_SHARED_CREDENTIALS_FILE
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_SESSION_TOKEN
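The check can also be scripted so that it reports which statements are missing instead of listing the ones that are present. This is a rough sketch under two assumptions: each variable is unset on its own line, as in the expected output, and commented-out lines should not count. The function name check_unset_block is hypothetical.

```bash
# Hypothetical helper: verify that a spark-env.sh file contains an active
# "unset" line for each of the four credential-related variables.
# Exit status: 0 if all are present, 1 if any is missing, 2 if unreadable.
check_unset_block() (
  file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    exit 2
  fi
  status=0
  for name in AWS_SHARED_CREDENTIALS_FILE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    # Require "unset NAME" at the start of a line (leading whitespace allowed),
    # so commented-out lines and longer variable names do not match.
    if ! grep -Eq "^[[:space:]]*unset[[:space:]]+${name}([[:space:]]|\$)" "$file"; then
      echo "missing: unset $name"
      status=1
    fi
  done
  exit "$status"
)

# Example invocation on a Spark node:
#   check_unset_block /etc/spark/conf/spark-env.sh
```

No output and a zero exit status indicate that all four unset statements were found.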

After this, retry a Spark metadata operation (for example SHOW DATABASES) to confirm Glue access succeeds without invalid-token errors.
