Troubleshooting for Access Management for EMR¶
Accessing S3 Buckets Containing a 'dot' in the Name in EMR 6.x and above¶
In EMR version 6.x and above, you may encounter an error when attempting to read from or write to an S3 bucket that contains a dot (.) in its name using the s3a protocol in PySpark or Spark shell. This issue is caused by a problem with the AWS SDK.
To work around it, enable path-style access for any bucket whose name contains a dot by setting the corresponding S3A properties for your Spark session.
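As a minimal sketch of this workaround, the snippet below builds a spark-shell command line that sets fs.s3a.path.style.access, the standard Hadoop S3A switch for path-style requests, through Spark's spark.hadoop. prefix. Treat this single property as an assumption about what is needed; your environment may require additional S3A settings.

```shell
# Assumption: path-style access is enabled with the standard Hadoop S3A
# property fs.s3a.path.style.access, forwarded by Spark's spark.hadoop. prefix.
S3A_PATH_STYLE_CONF="spark.hadoop.fs.s3a.path.style.access=true"

# Build the command line; swap spark-shell for pyspark if that is what you use.
SPARK_CMD="spark-shell --conf ${S3A_PATH_STYLE_CONF}"

# Print the command for review instead of launching it.
echo "${SPARK_CMD}"
```

Printing the command first makes it easy to confirm the flag before starting a session on the cluster.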
Delta Table Creation Fails with S3 Protocol¶
When you create a Delta table using the s3 protocol in AWS EMR, table creation fails, as expected, when no policy is applied. However, after the required permissions are applied, table creation can still fail with an exception.
To successfully create a Delta table without encountering exceptions, follow these steps:
- Check for Auto-Generated Folders:
  - After running the Delta table creation query, verify whether any _$folder$ directories exist in the specified S3 location.
- Manually Delete Unwanted Folders:
  - If the following folders are present in AWS S3, work with your administrator to delete them in AWS S3 directly or through the Privacera S3 browser.
  Note
  The following folder structure is an example. The actual folders may vary based on the table location used in your query.
  <hms_database>_$folder$
  <hms_database>/<delta_tables>_$folder$
  <hms_database>/<delta_tables>/<table_1>_$folder$
  <hms_database>/<delta_tables>/<table_1>/_delta_log_$folder$
- Retry the Delta Table Creation Query:
  - After removing the unwanted folders, re-run your Delta table creation SQL query.
Note
- The Delta library automatically creates these folders (ending with _$folder$) during the table creation process.
- If the user lacks the necessary permissions to the S3 bucket, these folders are not cleaned up, as the query execution is halted due to the permission issue.
- We recommend performing manual cleanup before retrying the query with the required permissions.
- The same issue can occur even without Privacera if the IAM role has permission to create/delete _$folder$ objects but lacks permission to the actual table location.
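To make the check for leftover folders concrete, here is a small sketch of a filter that keeps only the entries ending in _$folder$ from a recursive S3 listing. The helper name list_folder_markers is hypothetical, and the usage line assumes the AWS CLI is installed and that your role is allowed to list the table location.

```shell
# Hypothetical helper: read a listing on stdin and keep only the lines
# whose key ends with the literal suffix _$folder$.
list_folder_markers() {
  # \$ matches a literal dollar sign; the final unescaped $ anchors the end of line.
  grep '_\$folder\$$'
}

# Example usage (not executed here; <bucket> and <hms_database> are placeholders):
#   aws s3 ls "s3://<bucket>/<hms_database>/" --recursive | list_folder_markers
```

The filter only reports candidates. Deleting them remains a manual step to carry out with your administrator, as described above.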
Enable DEBUG logs for AWS EMR Spark Jobs¶
Follow these steps to enable debug logs for the Spark plugin running in EMR. SSH to the master node of your EMR cluster and complete the steps below.
Add the debug logging configuration to the log4j2.properties file, or modify the existing file.
Restart your spark-shell or spark-sql session to pick up the changes.
Run your use case to generate the debug logs. The logs are written to the /tmp/<user>/privacera.log file.
After your debugging session is over, revert the changes made to the log4j2.properties file.
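Because the changes must be reverted afterwards, it can help to keep a pristine copy of the file before editing it. The following is a sketch with two hypothetical helpers, backup_config and restore_config. The path in the usage comment is an assumption about where Spark keeps its configuration on EMR, so confirm it on your cluster.

```shell
# Hypothetical helper: copy the file to <file>.bak once. An existing backup
# is never overwritten, so the pristine original is preserved across edits.
backup_config() {
  [ -e "$1.bak" ] || cp -p "$1" "$1.bak"
}

# Hypothetical helper: put the pristine copy back in place of the edited file.
restore_config() {
  mv "$1.bak" "$1"
}

# Example usage (path is an assumption, not taken from this page):
#   backup_config /etc/spark/conf/log4j2.properties
#   ... edit the file, reproduce the issue, collect /tmp/<user>/privacera.log ...
#   restore_config /etc/spark/conf/log4j2.properties
```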
Verify access-control.properties in EMR Trino¶
This section explains how to verify the access-control.properties configuration file used by EMR Trino.
- SSH to the master node.
- Check the access-control.properties configuration and confirm that the file contains the expected properties.
Note
Verify that the access-control.name property is set to privacera-ranger to confirm that Privacera Trino Plugin is configured.
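The check described in the note can be scripted. Below is a minimal sketch of a hypothetical helper, check_access_control, that looks for the access-control.name=privacera-ranger line in a given file. The file location in the usage comment is an assumption, so substitute the path used on your cluster.

```shell
# Hypothetical helper: succeed only if the file sets
# access-control.name to privacera-ranger (whitespace around = is tolerated).
check_access_control() {
  if grep -Eq '^[[:space:]]*access-control\.name[[:space:]]*=[[:space:]]*privacera-ranger[[:space:]]*$' "$1"; then
    echo "OK: access-control.name is privacera-ranger"
  else
    echo "MISSING: access-control.name=privacera-ranger not found in $1" >&2
    return 1
  fi
}

# Example usage (path is an assumption, not taken from this page):
#   check_access_control /etc/trino/conf/access-control.properties
```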
Enable Debug Logs for EMR Trino¶
This section explains how to enable debug logging for EMR Trino to help troubleshoot access management issues.
- SSH to the master node.
- Check the existing Trino logging configuration file (log.properties) and note its current contents so that you can restore them later.
- Edit the Trino logging configuration file and add or update the logging configuration. Adjust the log level as needed. You can set it to INFO, DEBUG, TRACE, or ERROR, depending on the level of troubleshooting required.
- Restart the Trino service to apply the changes.
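Trino's log.properties file uses one logger=LEVEL entry per line, so adding or updating an entry can be scripted. The sketch below defines a hypothetical helper, set_log_level, that replaces an existing entry or appends a new one. The logger io.trino is Trino's own top-level logger. The plugin logger name is not given in this section, so com.example.plugin in the usage comment is purely illustrative.

```shell
# Hypothetical helper: set "<logger>=<level>" in a log.properties-style file.
# Assumes entries are written without spaces around "=".
set_log_level() {
  file=$1; logger=$2; level=$3
  tmp=$(mktemp) || return 1
  awk -F= -v k="$logger" -v v="$level" '
    $1 == k { print k "=" v; found = 1; next }   # replace the existing entry
    { print }                                    # keep every other line
    END { if (!found) print k "=" v }            # append when absent
  ' "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # overwrite in place to keep ownership and mode
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example usage (file path and plugin logger name are assumptions):
#   set_log_level /etc/trino/conf/log.properties io.trino DEBUG
#   set_log_level /etc/trino/conf/log.properties com.example.plugin DEBUG
```

Matching on the exact logger name avoids touching entries that merely share a prefix with the one being changed.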
To Verify Trino Server Logs¶
- Navigate to the Trino logs directory.
- Confirm that the updated logging configuration is in effect.
- Monitor the server logs in real time.
After your debugging session is complete, revert the changes made to the log.properties file to prevent excessive log generation.
Deploy Custom Code Build for EMR Trino¶
This section explains how to deploy a custom code build of the Privacera Trino plugin to your EMR cluster.
- SSH to the master node.
- Navigate to the Privacera plugins directory.
- Back up the existing Trino plugin package.
- Download the new custom Trino plugin package.
- Execute the Privacera setup script.
- Restart the Trino service to apply the custom build.
Verification¶
- To confirm that the custom-built Privacera Trino plugin is installed correctly, check the plugin version file.
- The privacera_version.txt file should contain the version information that corresponds to your custom build.
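The version check can be sketched as a small script. The helper name verify_plugin_version is hypothetical, and the exact layout of privacera_version.txt is not described here, so the sketch only looks for the expected version string anywhere in the file.

```shell
# Hypothetical helper: succeed if the expected version string appears
# anywhere in the version file (fixed-string match, no regex interpretation).
verify_plugin_version() {
  version_file=$1; expected=$2
  if grep -qF -- "$expected" "$version_file"; then
    echo "Plugin version matches: $expected"
  else
    echo "Expected version $expected not found in $version_file" >&2
    return 1
  fi
}

# Example usage (directory and version string are placeholders):
#   verify_plugin_version <plugin_dir>/privacera_version.txt "<custom_build_version>"
```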
Rollback Procedure¶
If issues occur with the custom build, you can roll back to the previous version:
- Stop the Trino service.
- Remove the new plugin package and restore the backup.
- Execute the setup script and restart the service.