
Access Management for EMR cluster

Introduction

Privacera offers a robust access control solution for Amazon EMR clusters, empowering users to define and enforce Fine-Grained Access Control (FGAC) policies across Spark, Hive, and Trino, as well as Object-Level Access Control (OLAC) specifically for Spark.

Connector Details

Topics Details
Integration methodology Apache Ranger Plugin
Access Tools
  • Spark OLAC
      • pyspark
      • spark-shell
      • spark-submit
  • Hive
      • beeline
  • Trino
      • trino-cli
  • Others
      • Hue
      • Livy
Supported User Identities for Policies
  • LDAP/AD/SCIM Users
  • LDAP/AD/SCIM Groups
  • Privacera Roles
Data Source User Identities
  • Kerberos User
  • JWT (only for Spark)

Supported Access Management Features

Feature                             Spark OLAC   Spark OLAC_FGAC   Hive FGAC   Trino FGAC
🟢 Object Level Access Control      Yes          Yes               No          No
🟢 Database Level Access Control    No           Yes               Yes         Yes
🟢 Table Access Control             No           Yes               Yes         Yes
🟢 View Access Control              No           Yes               Yes         Yes
🟢 Column Access Control            No           Yes               Yes         Yes
🟢 Row Access Control               No           Yes               Yes         Yes
🟢 Dynamic Column Data Masking      No           Yes               Yes         Yes
🟢 Dynamic Column Data Encryption   No           No                Yes         Yes
🟢 Centralized Access Audit         No           Yes               Yes         Yes
🟢 Granular Access Audit Record     No           Yes               Yes         Yes

Supported Runtime Versions

Privacera supports the following EMR runtime versions:

Version     Release Version   End-of-support date
🟢 6.15.0   9.1.0.1           January 25, 2026

Version     Release Version   End-of-support date
🟢 7.5.0    9.0.8.1           November 21, 2026
🟢 7.2.0    9.0.1.1           July 25, 2026
🟢 6.15.0   8.7.1.1           January 25, 2026

Limitations for Access Management Features

  1. Kerberos is required to enforce Privacera access control policies.
  2. JWT is supported only for the Spark plugin.

How it Works

Privacera integrates with EMR clusters through the Apache Ranger plugin. Bootstrap actions deploy the plugin during EMR cluster creation, and it runs as part of the Spark, Hive, or Trino process. The plugin retrieves policies from the Privacera Policy Server and enforces them by intercepting and evaluating user queries in real time. It also enforces, at runtime, any attribute-based access control (ABAC) and tag-based policies configured in Privacera.
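The retrieve-then-enforce flow described above can be illustrated with a minimal Python sketch. It is a conceptual model only, not the Apache Ranger plugin's or Privacera's actual code: the `Policy`, `AccessRequest`, and `PolicyCache` names, the exact-match resource strings, and the deny-by-default behavior are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy: who may do what on which resource."""
    resource: str                           # e.g. "sales.orders" (database.table)
    users: frozenset = frozenset()
    groups: frozenset = frozenset()
    roles: frozenset = frozenset()
    access_types: frozenset = frozenset()   # e.g. {"select"}


@dataclass
class AccessRequest:
    """One intercepted query, reduced to the fields needed for a decision."""
    user: str
    groups: set
    roles: set
    resource: str
    access_type: str


class PolicyCache:
    """Stands in for the plugin's local copy of policies from the policy server."""

    def __init__(self, fetch_policies):
        self._fetch = fetch_policies   # callable that returns an iterable of Policy
        self._policies = []

    def refresh(self):
        # In a real deployment this would be a periodic download; here it is a call.
        self._policies = list(self._fetch())

    def is_allowed(self, req):
        for p in self._policies:
            if p.resource != req.resource or req.access_type not in p.access_types:
                continue
            # A policy matches if it names the user, one of their groups, or one of their roles.
            if req.user in p.users or req.groups & p.groups or req.roles & p.roles:
                return True
        return False   # assumption: no matching policy means the request is denied


def fetch_from_policy_server():
    # Hypothetical stand-in for policies defined in Privacera.
    return [Policy(resource="sales.orders",
                   groups=frozenset({"analysts"}),
                   access_types=frozenset({"select"}))]


cache = PolicyCache(fetch_from_policy_server)
cache.refresh()
allowed = cache.is_allowed(
    AccessRequest("alice", {"analysts"}, set(), "sales.orders", "select"))
denied = cache.is_allowed(
    AccessRequest("bob", {"engineers"}, set(), "sales.orders", "select"))
```

The sketch keeps the decision local to the process that runs the query, which mirrors the idea that the plugin evaluates each request against its retrieved policies rather than calling out for every query.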

User Identity Mapping

Policies in Privacera are configured for users and groups based on Kerberos or JWT, as well as for roles created within Privacera. These identities are mapped to EMR user identities as follows:

Privacera Identity    EMR Identity
LDAP/AD/SCIM User     Kerberos User / JWT
LDAP/AD/SCIM Group    N/A
Privacera Role        N/A

The Apache Ranger plugin, which runs as part of the Spark, Hive, or Trino process on the EMR cluster, maps the user's Kerberos or JWT identity to the corresponding LDAP/AD/SCIM user. The groups and roles associated with the user are fetched dynamically from Privacera and used to enforce group-based and role-based policies within the EMR cluster.
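As a rough illustration of the identity mapping in the table above, the following Python sketch resolves a Kerberos principal to a user name and then looks up that user's groups and roles. It is a simplified model under stated assumptions: the lookup tables, the `resolve_identity` helper, and the rule that roles can be granted through groups are hypothetical, and the principal parsing ignores the rule-based name mapping a real Kerberos setup may apply.

```python
def principal_to_username(principal: str) -> str:
    """Reduce a Kerberos principal such as 'alice/host@EXAMPLE.COM' to 'alice'."""
    primary = principal.split("@", 1)[0]   # drop the realm
    return primary.split("/", 1)[0]        # drop the optional instance part


# Hypothetical directory data standing in for users, groups, and roles known to Privacera.
USER_GROUPS = {"alice": {"analysts"}, "bob": {"engineers"}}
USER_ROLES = {"bob": {"etl_admin"}}
GROUP_ROLES = {"analysts": {"pii_reader"}}   # assumption: roles may be granted via groups


def resolve_identity(principal: str) -> dict:
    """Return the user name plus the groups and roles used for policy checks."""
    user = principal_to_username(principal)
    groups = set(USER_GROUPS.get(user, set()))
    roles = set(USER_ROLES.get(user, set()))
    for group in groups:
        roles |= GROUP_ROLES.get(group, set())
    return {"user": user, "groups": groups, "roles": roles}


alice = resolve_identity("alice@EXAMPLE.COM")
bob = resolve_identity("bob/emr-node-1@EXAMPLE.COM")
```

Here `alice` resolves to the `analysts` group and, through it, the `pii_reader` role, so a policy written for either the group or the role would apply to her queries.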
