
Access Management for EMR Clusters

Introduction

Privacera offers a robust access control solution for Amazon EMR clusters, empowering users to define and enforce Fine-Grained Access Control (FGAC) policies across Spark, Hive, and Trino, as well as Object-Level Access Control (OLAC) specifically for Spark.

Connector Details

Integration methodology: Apache Ranger Plugin

Access Tools:
  • Spark OLAC: pyspark, spark-shell, spark-submit
  • Hive: beeline
  • Trino: trino-cli
  • Others: Hue, Livy

Supported User Identities for Policies:
  • LDAP/AD/SCIM Users
  • LDAP/AD/SCIM Groups
  • Privacera Roles

Data Source User Identities:
  • Kerberos User
  • JWT (only for Spark)

Supported Access Management Features

Feature | Spark OLAC | Hive FGAC | Trino FGAC
🟢 Object Level Access Control | Yes | No | No
🟢 Database Level Access Control | No | Yes | Yes
🟢 Table Access Control | No | Yes | Yes
🟢 View Access Control | No | Yes | Yes
🟢 Column Access Control | No | Yes | Yes
🟢 Row Access Control | No | Yes | Yes
🟢 Dynamic Column Data Masking | No | Yes | Yes
🟢 Dynamic Column Data Encryption | No | Yes | Yes
🟢 Centralized Access Audit | No | Yes | Yes
🟢 Granular Access Audit Record | No | Yes | Yes

Limitations for Access Management Features

  1. To enforce access control policies in Privacera, Kerberos is required.
  2. JWT is supported only for the Spark plugin.

How it Works

Privacera integrates with EMR clusters through the Apache Ranger plugin. When an EMR cluster is created, bootstrap actions deploy the plugin as part of the Spark, Hive, or Trino processes. The plugin retrieves policies from the Privacera Policy Server and enforces them by intercepting and evaluating user queries in real time. Attribute-based access control (ABAC) and tag-based policies configured in Privacera are also enforced by the plugin at runtime.
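To make the enforcement flow concrete, here is a minimal Python sketch of how a plugin might evaluate an intercepted request against a set of policies. It is a simplified model, not the Apache Ranger or Privacera implementation: the `Policy` and `Request` classes, the `is_allowed` function, and the rules that a matching deny entry overrides any allow and that unmatched requests are denied are all assumptions made for illustration. A hard-coded list stands in for the policies the plugin retrieves from the Privacera Policy Server, and the `tag` field models a tag-based policy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy: who may (or may not) do what on a resource or tag."""
    resource: str  # e.g. "sales.orders"; empty string for a tag-based policy
    tag: str       # e.g. "PII"; empty string for a resource-based policy
    users: frozenset = frozenset()
    groups: frozenset = frozenset()
    roles: frozenset = frozenset()
    actions: frozenset = frozenset()
    deny: bool = False


@dataclass
class Request:
    """Hypothetical intercepted query: the resolved identity plus the requested access."""
    user: str
    groups: set
    roles: set
    resource: str
    action: str
    resource_tags: set = field(default_factory=set)


def _applies(policy: Policy, req: Request) -> bool:
    # The policy targets the request if it names the resource or one of the resource's tags.
    targets = (bool(policy.resource) and policy.resource == req.resource) or (
        bool(policy.tag) and policy.tag in req.resource_tags
    )
    # The policy covers the caller via the user, any of their groups, or any of their roles.
    covers = (
        req.user in policy.users
        or bool(policy.groups & req.groups)
        or bool(policy.roles & req.roles)
    )
    return targets and covers and req.action in policy.actions


def is_allowed(policies: list, req: Request) -> bool:
    matching = [p for p in policies if _applies(p, req)]
    if any(p.deny for p in matching):
        return False  # assumed: a matching deny wins over any allow
    return any(not p.deny for p in matching)  # assumed: no matching allow means deny


# Stand-in for policies downloaded from the Privacera Policy Server.
policies = [
    Policy(resource="sales.orders", tag="", groups=frozenset({"analysts"}), actions=frozenset({"select"})),
    Policy(resource="", tag="PII", users=frozenset({"bob"}), actions=frozenset({"select"}), deny=True),
]

alice = Request("alice", {"analysts"}, set(), "sales.orders", "select")
bob = Request("bob", {"analysts"}, set(), "sales.orders", "select", {"PII"})

print(is_allowed(policies, alice))  # True: allowed through the "analysts" group
print(is_allowed(policies, bob))    # False: the tag-based deny on "PII" applies
```

Keeping the evaluation a pure function of the policy list and the request mirrors the idea that the plugin decides locally, at query time, without calling back to the server for every query.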

User Identity Mapping

Policies in Privacera are configured for LDAP/AD/SCIM users and groups, as well as for roles created within Privacera. These identities are mapped to EMR user identities as follows:

Privacera Identity | EMR Identity
LDAP/AD/SCIM User | Kerberos User / JWT
LDAP/AD/SCIM Group | N/A
Privacera Role | N/A

The Apache Ranger plugin, which runs as part of the Spark, Hive, or Trino process on the EMR cluster, maps the Kerberos user (or, for Spark, the JWT identity) to the corresponding LDAP/AD/SCIM user. The groups and roles associated with that user are fetched dynamically from Privacera and are used to enforce group- and role-based policies on the EMR cluster.
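The identity mapping described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not Privacera's implementation: a Kerberos principal is assumed to be reduced to its primary component (similar to Hadoop's default `auth_to_local` behavior), the JWT user name is assumed to come from the `sub` claim, and the hypothetical `DIRECTORY` dictionary stands in for the groups and roles the plugin fetches from Privacera. The JWT helper only decodes the payload; a real deployment must also verify the token's signature and expiry.

```python
import base64
import json

# Hypothetical stand-in for the user, group, and role data fetched from Privacera.
DIRECTORY = {
    "alice": {"groups": {"analysts"}, "roles": {"finance_reader"}},
}


def user_from_kerberos(principal: str) -> str:
    """Assumed mapping: 'alice@EXAMPLE.COM' or 'alice/host@EXAMPLE.COM' -> 'alice'."""
    primary = principal.split("@", 1)[0]  # drop the realm
    return primary.split("/", 1)[0]       # drop any instance component


def user_from_jwt(token: str, claim: str = "sub") -> str:
    """Read the user name from a JWT payload claim. Does NOT verify the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64url padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload[claim]


def resolve_identity(user: str) -> dict:
    """Attach the user's groups and roles; unknown users get none."""
    entry = DIRECTORY.get(user, {"groups": set(), "roles": set()})
    return {"user": user, "groups": set(entry["groups"]), "roles": set(entry["roles"])}


def _b64url(obj: dict) -> str:
    # Helper to build an unsigned demo token for the usage example below.
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


token = _b64url({"alg": "none"}) + "." + _b64url({"sub": "alice"}) + "."

print(user_from_kerberos("alice@EXAMPLE.COM"))  # alice
print(resolve_identity(user_from_jwt(token)))
```

Both identity sources resolve to the same short user name, so the group and role lookup, and therefore the group- and role-based policies, behave the same whether the caller authenticated with Kerberos or JWT.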
