
Troubleshooting for Access Management for EMR

Accessing S3 Buckets Containing a 'dot' in the Name in EMR 6.x and above

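In EMR 6.x and above, reading from or writing to an S3 bucket whose name contains a dot (.) over the s3a protocol in PySpark or Spark shell can fail with the error shown below. The error is raised by the AWS SDK: with virtual-hosted-style addressing the bucket name becomes part of the request hostname, and the wildcard in the S3 certificate (*.s3.amazonaws.com) covers only a single DNS label, so a dotted bucket name fails TLS hostname verification.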

Text Only
com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <{bucket-name-with-name}.east.us.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <{bucket-name-with-name}.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
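The mismatch reported in this error can be reproduced with a small sketch. It is a simplified illustration of wildcard hostname matching, not the AWS SDK's actual verification code. The subject alternative names are taken from the error message above, while the bucket names `mybucket` and `my.bucket` are hypothetical.

```python
# Simplified wildcard matching: a leading "*" stands in for exactly one
# DNS label, so it cannot absorb the extra labels a dotted bucket name adds.
SUBJECT_ALT_NAMES = ["*.s3.amazonaws.com", "s3.amazonaws.com"]  # from the error above


def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if hostname matches pattern, with '*' covering one label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard label must be non-empty; the rest must match exactly.
        return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def cert_covers(hostname: str) -> bool:
    """Check the hostname against every subject alternative name."""
    return any(wildcard_matches(name, hostname) for name in SUBJECT_ALT_NAMES)


# Virtual-hosted-style hostnames (bucket name in the hostname).
print(cert_covers("mybucket.s3.amazonaws.com"))   # True
print(cert_covers("my.bucket.s3.amazonaws.com"))  # False

# Path-style requests keep the bucket name out of the hostname.
print(cert_covers("s3.amazonaws.com"))            # True
```

With path-style access the bucket name moves into the URL path, so the hostname no longer depends on whether the bucket name contains dots.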

To work around this, enable path-style access for the s3a connector by passing the following property when starting PySpark or Spark shell:

Bash
pyspark --conf "spark.hadoop.fs.s3a.path.style.access=true"
Bash
spark-shell --conf "spark.hadoop.fs.s3a.path.style.access=true"
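For jobs that build their own session instead of using the interactive shells, the same property can be set in code. The following is a minimal sketch, assuming PySpark is available on the cluster. The helper names `s3a_path_style_conf` and `build_session` and the application name are hypothetical, and only the property key comes from the commands above.

```python
# Same property as in the pyspark / spark-shell commands above.
PATH_STYLE_KEY = "spark.hadoop.fs.s3a.path.style.access"


def s3a_path_style_conf() -> dict:
    """Spark configuration entries that enable path-style access for s3a."""
    return {PATH_STYLE_KEY: "true"}


def build_session(app_name: str = "path-style-example"):
    """Create (or reuse) a SparkSession with path-style access enabled."""
    # Imported lazily so this module loads even where PySpark is absent.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in s3a_path_style_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session()` at the start of the job, before any s3a path is read, is the safer choice, since a session that is already running may not pick up the setting.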

Delta Table Creation Fails with S3 Protocol

When you create a Delta table using the s3 protocol on AWS EMR without the required policy applied, table creation fails, as expected. However, after the required permissions are applied, table creation can still fail with the following exception:

Text Only
(pyspark.errors.exceptions.captured.AnalysisException: Cannot create table ('`db`.`table`'). 
The associated location ('s3://<location>') is not empty and also not a Delta table.)

To successfully create a Delta table without encountering exceptions, follow these steps:

  1. Check for Auto-Generated Folders:

    • After running the Delta table creation query, verify whether any _$folder$ directories exist in the specified S3 location.
  2. Manually Delete Unwanted Folders:

    • If any of the following folders are present in AWS S3, work with your administrator to delete them, either directly in AWS S3 or through the Privacera S3 browser.

      Note

      The following folder structure is an example. The actual folders may vary based on the table location used in your query.

      • <hms_database>_$folder$
      • <hms_database>/<delta_tables>_$folder$
      • <hms_database>/<delta_tables>/<table_1>_$folder$
      • <hms_database>/<delta_tables>/<table_1>/_delta_log_$folder$
  3. Retry the Delta Table Creation Query:

    • After removing the unwanted folders, re-run your Delta table creation SQL query.
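The check in step 1 can be scripted. The following is a rough sketch, assuming boto3 is installed and the caller has permission to list the bucket. The function names and the sample keys are hypothetical. It only reports the marker objects and leaves deletion to your administrator, as described in step 2.

```python
MARKER_SUFFIX = "_$folder$"


def marker_keys(keys):
    """Return only the keys that end with the _$folder$ marker suffix."""
    return [key for key in keys if key.endswith(MARKER_SUFFIX)]


def list_marker_keys(bucket: str, prefix: str):
    """List _$folder$ marker objects under a prefix in the given bucket."""
    # Imported lazily so the pure helper above works without boto3.
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return marker_keys(keys)


# Offline illustration with made-up keys shaped like the example structure.
sample_keys = [
    "db_$folder$",
    "db/tables_$folder$",
    "db/tables/t1/_delta_log_$folder$",
    "db/tables/t1/part-0000.parquet",
]
print(marker_keys(sample_keys))
```

Passing the prefix without a trailing slash (for example the database name alone) lets the listing include the top-level marker as well as those nested under it.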

Note

  • The Delta library automatically creates these folders (ending with _$folder$) during the table creation process.
  • If the user lacks the necessary permissions on the S3 bucket, the query halts on the permission error and these folders are left behind instead of being cleaned up.
  • We recommend performing manual cleanup before retrying the query with the required permissions.
  • The same issue can occur even without Privacera if the IAM role has permission to create/delete _$folder$ objects but lacks permission to the actual table location.
