
Coexistence of OLAC and FGAC for Compliance in Apache Spark

This document does not apply to Databricks clusters with Privacera FGAC, Databricks Unity Catalog, or EMR/EMR Serverless secured by AWS Lake Formation.

Privacera’s Object Level Access Control (OLAC) and Fine-Grained Access Control (FGAC) are typically discussed as separate options for securing data in Apache Spark. However, it is also possible, and sometimes desirable, to combine both in the same environment. In such a deployment, OLAC grants or denies access to entire objects or folders and manages credentials to the underlying object store, while FGAC applies row-level filtering and column masking policies. Because the Privacera DataServer provides the object store credentials, the cluster does not need a privileged IAM role.

This approach applies to open source Apache Spark on Kubernetes and to EMR/EMR Serverless without Lake Formation. It is not needed on a Databricks cluster when FGAC is used.

Why Combine OLAC and FGAC?

  1. Layered Security: OLAC ensures only the necessary files are accessible, reducing the attack surface. Meanwhile, FGAC enforces more granular policies (e.g., row/column restrictions) on data that is already permitted at the object level.
  2. Minimized Risk of Bypass: Even if a user gains object-level access credentials, only the files that user is allowed to see will be visible.
  3. Unified Compliance Approach: For compliance-based use cases, combining the two methods helps enforce comprehensive data governance—covering everything from controlling credentials to ensuring sensitive data is masked.
  4. Centralized Credential Provisioning: Through the Privacera DataServer, short-lived credentials for object stores (S3, ADLS, GCS, or MinIO) are provided as needed. This eliminates the requirement for privileged IAM roles within the Spark cluster, further strengthening security by limiting direct access to storage backends.

⚠ Limitations

  1. Configuration Complexity: Managing two sets of policies (OLAC and FGAC) simultaneously can become complex, requiring coordination, thorough testing, and clear documentation.
  2. Secure Cluster Requirement: Even when OLAC and FGAC are in place, unauthorized users might still attempt to bypass Privacera, so the cluster itself must be secured. If a bypass does occur, the user or service user can still access only the files they are allowed to see.
  3. Policy Conflicts: Overlapping or contradictory rules between OLAC and FGAC can result in unexpected behaviors, requiring careful policy design.
  4. Partial Enforcement in Some Tools: While OLAC and FGAC cover most Spark scenarios, some external tools or direct storage access methods might not be fully subject to these combined policies.
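
One kind of policy conflict described above is an FGAC grant on a table whose storage location is not covered by any OLAC grant: the query passes the row/column check and then fails at the object store. The following is a minimal sketch of how such gaps could be detected before deployment. The `TablePolicy` and `ObjectPolicy` classes, the group names, and the paths are all hypothetical; they do not represent Privacera's policy model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TablePolicy:
    """Hypothetical FGAC grant: a group may query a table stored at `location`."""
    group: str
    table: str
    location: str  # object-store prefix backing the table


@dataclass(frozen=True)
class ObjectPolicy:
    """Hypothetical OLAC grant: a group may read everything under `prefix`."""
    group: str
    prefix: str


def find_uncovered_tables(fgac, olac):
    """Return FGAC grants whose storage location no OLAC grant covers."""
    uncovered = []
    for t in fgac:
        covered = any(
            o.group == t.group and t.location.startswith(o.prefix)
            for o in olac
        )
        if not covered:
            uncovered.append(t)
    return uncovered


# Illustrative policies only; names and paths are made up.
fgac_policies = [
    TablePolicy("marketing", "sales.customers", "s3://lake/sales/customers/"),
    TablePolicy("marketing", "sales.orders", "s3://lake/sales/orders/"),
]
olac_policies = [ObjectPolicy("marketing", "s3://lake/sales/customers/")]

conflicts = find_uncovered_tables(fgac_policies, olac_policies)
for c in conflicts:
    print(f"{c.group} can query {c.table} but cannot read {c.location}")
```

A check of this kind could run as part of policy review, so that mismatches surface as a report instead of as failed queries.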

Architecture Overview

The architecture for OLAC and FGAC integration is designed to ensure that both object-level and row/column-level policies are enforced. This requires the cluster to be configured to use both the Privacera DataServer and the Privacera Plugin. The Privacera DataServer is responsible for generating signed URLs or STS tokens for object-level access, while the Privacera Plugin enforces row/column-level policies.

Combined OLAC and FGAC Integration Diagram

Below is a diagram illustrating how a SparkSQL query can be subject to both FGAC (row/column) checks and OLAC (object-level) checks. The Spark engine does not require a privileged IAM role because the Privacera DataServer provides the storage credentials.


sequenceDiagram
participant U as User
participant SS as SparkSQL
participant PP as PrivaceraPlugin
participant SE as SparkEngine (No IAM Role)
participant DS as PrivaceraDataServer
participant PPf as PrivaceraPlatform
participant CS as CloudStorageService
participant AL as AuditLogs

    U->>SS: Submit SparkSQL Query
    Note right of SS: Access & Row & Column Policies
    SS->>PP: Intercept & Check FGAC Policies

    alt Access Allowed by FGAC
        PP->>SS: Return Updated Query
        SS->>DS: Request Ephemeral Credentials
        DS->>PPf: Evaluate OLAC Policy
        PPf-->>DS: Access Approved
        DS->>CS: Generate Signed URL or STS
        CS-->>DS: Return Signed URL or STS
        DS->>SS: Provide Ephemeral Credentials
        SS->>SE: Execute Query with Credentials
        SE->>CS: Fetch Data using Signed URL or STS
        CS-->>SE: Return Data
        SE->>U: Query Results
        DS->>AL: Log Access Request
    else Access Denied by FGAC or OLAC
        PP->>SS: Return Access Denied
        SS->>U: Error
    end
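
The sequence in the diagram can also be traced as a small simulation: the plugin step either denies the request or rewrites the query, and the DataServer step evaluates the object-level policy before issuing short-lived credentials. This is only an illustrative sketch. The policy dictionaries, function names, the 900-second lifetime, and the choice to audit denied requests as well as allowed ones are assumptions made for the example, not Privacera's implementation.

```python
import time
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when either the FGAC or the OLAC check fails."""


@dataclass
class EphemeralCredentials:
    path: str
    expires_at: float  # Unix timestamp


# Hypothetical policy data; in a real deployment policies live in Privacera.
FGAC_ROW_FILTERS = {("alice", "customers"): "region = 'EU'"}
OLAC_ALLOWED_PREFIXES = {"alice": ["s3://lake/customers/"]}
TABLE_LOCATIONS = {"customers": "s3://lake/customers/"}
AUDIT_LOG = []


def plugin_rewrite(user, table):
    """Plugin step: deny, or return the query with the row filter applied."""
    row_filter = FGAC_ROW_FILTERS.get((user, table))
    if row_filter is None:
        raise AccessDenied(f"FGAC: {user} may not query {table}")
    return f"SELECT * FROM {table} WHERE {row_filter}"


def dataserver_credentials(user, path, ttl_seconds=900):
    """DataServer step: evaluate OLAC, then issue short-lived credentials."""
    prefixes = OLAC_ALLOWED_PREFIXES.get(user, [])
    allowed = any(path.startswith(p) for p in prefixes)
    AUDIT_LOG.append((user, path, "allowed" if allowed else "denied"))
    if not allowed:
        raise AccessDenied(f"OLAC: {user} may not read {path}")
    return EphemeralCredentials(path, time.time() + ttl_seconds)


def submit(user, table):
    """Run both checks in the order shown in the diagram."""
    query = plugin_rewrite(user, table)
    creds = dataserver_credentials(user, TABLE_LOCATIONS[table])
    return query, creds
```

Calling `submit("alice", "customers")` returns the rewritten query together with credentials scoped to the table's location, while `submit("bob", "customers")` raises `AccessDenied` at the FGAC step, before any credentials are requested.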

Example Use Cases

Use Case 1: Hybrid Security for Customer Data

  • Scenario: A marketing analytics team requires fine-grained access to customer data (e.g., only certain rows/columns) but also wants to ensure that no unauthorized users can read the raw object files.
  • Implementation:
    1. OLAC manages S3 object credentials through Privacera DataServer, granting or denying file-level access.
    2. FGAC enforces policies that only display masked or limited rows to users in the “marketing” group.
    3. Outcome: Combined, these controls ensure marketing analysts see only the data needed for targeted campaigns, while object-level credentials remain secured.
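
To make the FGAC half of this use case concrete, here is a rough sketch of what a row filter plus a column mask does to a result set. It operates on plain Python dictionaries, not on Spark, and the sample records, the `opted_in` filter, and the hash-based mask are all invented for illustration; the masking options actually available depend on the policies defined in Privacera.

```python
import hashlib


def mask_value(value):
    """Replace a value with a short SHA-256 digest prefix (one possible mask)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def apply_fgac(rows, row_filter, masked_columns):
    """Keep rows that pass row_filter, masking the listed columns."""
    result = []
    for row in rows:
        if not row_filter(row):
            continue
        result.append({
            col: mask_value(val) if col in masked_columns else val
            for col, val in row.items()
        })
    return result


# Made-up sample data standing in for a customer table.
customers = [
    {"id": 1, "email": "ana@example.com", "region": "EU", "opted_in": True},
    {"id": 2, "email": "bo@example.com", "region": "US", "opted_in": False},
    {"id": 3, "email": "cy@example.com", "region": "EU", "opted_in": True},
]

# What a user in the "marketing" group might see under such a policy.
marketing_view = apply_fgac(
    customers,
    row_filter=lambda r: r["opted_in"],
    masked_columns={"email"},
)
```

In this sketch the marketing view contains only the opted-in customers, with the `email` column replaced by a digest and the remaining columns unchanged.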

Use Case 2: ETL Job with Limited Object Store Access

  • Scenario: The ETL job is trusted, but due to governance policies, it should only access specific folders in the object store.
  • Implementation:
    1. OLAC ensures that the ETL job can only read the specific folder or dataset in the object store that it needs.
    2. FGAC masks sensitive columns (PII, financial data) and filters out rows based on the job’s access level.
    3. Outcome: Generated datasets are compliant with data governance policies, and the ETL job can run without needing a privileged IAM role.
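
Folder-scoped access of the kind described in this use case is sensitive to how paths are compared: a naive string prefix test would treat `s3://lake/etl/input-secret/` as being inside `s3://lake/etl/input`. The sketch below shows one way to compare on folder boundaries. The function name and the paths are hypothetical, and this is not a description of how the Privacera DataServer matches resources.

```python
from urllib.parse import urlparse


def is_within_folder(object_url, folder_url):
    """Return True if object_url is the folder itself or lies beneath it."""
    obj, folder = urlparse(object_url), urlparse(folder_url)
    # Scheme and bucket/container must match exactly.
    if (obj.scheme, obj.netloc) != (folder.scheme, folder.netloc):
        return False
    folder_path = folder.path.rstrip("/")
    # Compare on a folder boundary so "input" does not match "input-secret".
    return obj.path.rstrip("/") == folder_path or obj.path.startswith(folder_path + "/")


# Hypothetical folder granted to the ETL job.
etl_folder = "s3://lake/etl/input"

print(is_within_folder("s3://lake/etl/input/part-0.parquet", etl_folder))
print(is_within_folder("s3://lake/etl/input-secret/keys.csv", etl_folder))
```

The first call reports that the object is inside the granted folder; the second does not, even though the two paths share a string prefix.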

FAQ

  1. Is it mandatory to run both OLAC and FGAC together? No. You can run OLAC or FGAC independently. However, using both provides a layered approach: OLAC for object-level access and FGAC for row/column-level filtering.

  2. Will these controls stop malicious users from bypassing Spark altogether? They reduce the risk, but a secure cluster configuration is still essential. For example, if a user has direct credentials for your storage service, they could bypass Spark-level checks.

  3. Can these policies be applied to other compute engines besides SparkSQL? Typically, OLAC handles object-level credentials no matter the engine, but FGAC row/column policies are specifically for $1
