
Accessing S3 Objects in EMR Trino

The Privacera Trino plugin performs authorization checks to determine whether a user is permitted to access a specified Amazon S3 path. It does not perform S3 authentication or request signing.

All S3 authentication and request signing are handled by Trino running on Amazon EMR, using the IAM role attached to the EMR cluster.

Note

The IAM role referred to throughout this page is the role configured under the EMR cluster's Security Configuration > IAM Roles Mapping > IAM Role. In Privacera deployments, this role is commonly called the app-data-access-role.

The IAM policy attached to the app-data-access-role must explicitly allow the necessary S3 actions (such as s3:GetObject, s3:ListBucket, and s3:PutObject) on the target S3 bucket or object path, depending on the Trino operation being performed.


Prerequisites

To access S3 from Trino on EMR, the IAM policy attached to the app-data-access-role must grant at least the following S3 permissions, which cover the common Trino use cases. Replace your-bucket with the name of the target bucket.

Sample IAM Policy

JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket",
        "arn:aws:s3:::your-bucket/*"
      ],
      "Sid": "EmrTrinoS3Limited"
    }
  ]
}
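Before creating the cluster, you can check a policy offline against the ARN shapes used in the sample above. s3:ListBucket applies to the bucket ARN, while s3:GetObject and s3:PutObject apply to object ARNs (bucket/*). The following is a minimal Python sketch under simplifying assumptions. The helper name missing_s3_permissions is hypothetical. It ignores Deny statements, NotAction/NotResource, and Condition blocks, so it does not replace the AWS IAM policy simulator. It also checks whole-bucket grants only, so a policy scoped to a narrower object path would need the expected ARNs adjusted.

```python
import json
from fnmatch import fnmatchcase

# The sample policy from this page; replace your-bucket with your bucket name.
SAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::your-bucket", "arn:aws:s3:::your-bucket/*"],
      "Sid": "EmrTrinoS3Limited"
    }
  ]
}
"""


def _as_list(value):
    # IAM accepts either a single value or a list for Statement, Action, and Resource.
    return list(value) if isinstance(value, list) else [value]


def missing_s3_permissions(policy: dict, bucket: str) -> list[str]:
    """Return required S3 actions not granted by any Allow statement on the expected ARN.

    Hypothetical, simplified helper: ignores Deny, NotAction/NotResource, and Condition.
    """
    # s3:ListBucket is evaluated against the bucket ARN; object actions against bucket/*.
    required = {
        "s3:GetObject": f"arn:aws:s3:::{bucket}/*",
        "s3:ListBucket": f"arn:aws:s3:::{bucket}",
        "s3:PutObject": f"arn:aws:s3:::{bucket}/*",
    }
    statements = _as_list(policy.get("Statement", []))
    missing = []
    for action, arn in required.items():
        granted = any(
            stmt.get("Effect") == "Allow"
            # fnmatchcase approximates IAM wildcards such as s3:* or arn:aws:s3:::*;
            # action names are case-insensitive, so both sides are lowercased.
            and any(fnmatchcase(action.lower(), a.lower())
                    for a in _as_list(stmt.get("Action", [])))
            and any(fnmatchcase(arn, r) for r in _as_list(stmt.get("Resource", [])))
            for stmt in statements
        )
        if not granted:
            missing.append(action)
    return missing


if __name__ == "__main__":
    policy = json.loads(SAMPLE_POLICY)
    print(missing_s3_permissions(policy, "your-bucket"))  # → []
```

An empty result means every required action is allowed on the matching ARN; otherwise the list names the actions still missing.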

Required Trino Connector Configuration

In addition to the IAM permissions, set the following properties in the trino-connector-hive EMR classification when you create the cluster. These settings make Trino use EMRFS for S3 access and disable HDFS impersonation. This configuration is required for Privacera's authorization model.

| Property | Value | Description |
|---|---|---|
| hive.s3-file-system-type | EMRFS | Instructs Trino to use the EMR File System (EMRFS) for S3 access instead of the native S3 file system. |
| hive.hdfs.impersonation.enabled | false | Disables HDFS impersonation so that Trino accesses S3 using the app-data-access-role rather than impersonating the end user. |

Warning

Do not set fs.native-s3.enabled=true when hive.s3-file-system-type is set to EMRFS. This setting overrides EMRFS and makes Trino use the native S3 file system instead, which bypasses EMRFS-based S3 access and can lead to unexpected behavior.

Sample EMR Classification

JSON
{
  "Classification": "trino-connector-hive",
  "ConfigurationProperties": {
    "hive.metastore": "glue",
    "connector.name": "hive",
    "hive.s3-file-system-type": "EMRFS",
    "hive.hdfs.impersonation.enabled": "false",
    "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml"
  }
}
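The requirements in the table and warning above can be checked mechanically before the cluster is created. The following Python sketch is illustrative only. hive_classification_problems is a hypothetical helper that expects a list of classification objects, so the sample above is wrapped in a list. It reads properties from either ConfigurationProperties, as in the sample, or Properties, the key used by the EMR API and `aws emr create-cluster --configurations`. It then flags any deviation from the required values and any fs.native-s3.enabled=true setting.

```python
import json

# Values this page requires in the trino-connector-hive classification.
REQUIRED = {
    "hive.s3-file-system-type": "EMRFS",
    "hive.hdfs.impersonation.enabled": "false",
}

# The sample classification from this page, wrapped in a list.
SAMPLE_CONFIGURATIONS = """
[
  {
    "Classification": "trino-connector-hive",
    "ConfigurationProperties": {
      "hive.metastore": "glue",
      "connector.name": "hive",
      "hive.s3-file-system-type": "EMRFS",
      "hive.hdfs.impersonation.enabled": "false",
      "hive.config.resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml"
    }
  }
]
"""


def hive_classification_problems(configurations: list[dict]) -> list[str]:
    """Return problems found in the trino-connector-hive classification(s).

    Hypothetical helper for illustration; compares values as exact strings.
    """
    blocks = [c for c in configurations if c.get("Classification") == "trino-connector-hive"]
    if not blocks:
        return ["trino-connector-hive classification not found"]
    problems = []
    for block in blocks:
        # The sample uses ConfigurationProperties; the EMR API and CLI use Properties.
        props = block.get("ConfigurationProperties") or block.get("Properties") or {}
        for key, expected in REQUIRED.items():
            actual = props.get(key)
            if actual != expected:
                problems.append(f"{key} is {actual!r}, expected {expected!r}")
        # fs.native-s3.enabled=true overrides EMRFS, so flag it if present.
        if str(props.get("fs.native-s3.enabled", "")).lower() == "true":
            problems.append("fs.native-s3.enabled=true overrides EMRFS")
    return problems


if __name__ == "__main__":
    print(hive_classification_problems(json.loads(SAMPLE_CONFIGURATIONS)))  # → []
```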