Coexistence of OLAC and FGAC for Compliance in Apache Spark
This document does not apply to Databricks clusters with Privacera FGAC, Databricks Unity Catalog, or EMR/EMR Serverless secured by AWS Lake Formation.
Privacera’s Object Level Access Control (OLAC) and Fine-Grained Access Control (FGAC) are typically discussed as separate options for securing data in Apache Spark. However, it is also possible, and sometimes desirable, to combine both in the same environment. In such a deployment, OLAC grants or denies access to entire objects and folders and manages credentials for the underlying object store, while FGAC applies row-level filtering and column masking policies. Because the Privacera DataServer provides the storage credentials, the cluster does not need a privileged IAM role.
This approach applies to open-source Apache Spark on Kubernetes and to EMR/EMR Serverless without Lake Formation. It is not needed on Databricks clusters when FGAC is used.
Why Combine OLAC and FGAC?
- Layered Security: OLAC ensures only the necessary files are accessible, reducing the attack surface. Meanwhile, FGAC enforces more granular policies (e.g., row/column restrictions) on data that is already permitted at the object level.
- Minimized Risk of Bypass: Even if a user obtains object-level access credentials, only the files that user is allowed to see will be visible.
- Unified Compliance Approach: For compliance-based use cases, combining the two methods helps enforce comprehensive data governance, covering everything from controlling credentials to ensuring sensitive data is masked.
- Centralized Credential Provisioning: Through the Privacera DataServer, short-lived credentials for object stores (S3, ADLS, GCS, or MinIO) are provided as needed. This eliminates the requirement for privileged IAM roles within the Spark cluster, further strengthening security by limiting direct access to storage backends.
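The idea of short-lived, scoped credentials can be illustrated with a small model. This is only a sketch: the `EphemeralCredential` class, the `issue_credential` helper, and the 15-minute lifetime are hypothetical and do not represent the Privacera DataServer API or its actual defaults.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EphemeralCredential:
    """Hypothetical short-lived credential scoped to one object-store prefix."""
    token: str
    scope: str
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        # A credential is usable only strictly before its expiry time.
        return now < self.expires_at


def issue_credential(scope: str, now: datetime, ttl_minutes: int = 15) -> EphemeralCredential:
    """Issue a random token for `scope`; the 15-minute TTL is an assumed value."""
    return EphemeralCredential(
        token=secrets.token_hex(16),
        scope=scope,
        expires_at=now + timedelta(minutes=ttl_minutes),
    )


issued_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
credential = issue_credential("s3://lake/sales/", issued_at)
```

Because the credential expires on its own, a leaked token is only useful for a limited time and only for the scope it was issued for, which is why the cluster itself does not need a standing privileged role.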
Limitations
- Configuration Complexity: Managing two sets of policies (OLAC and FGAC) simultaneously can become complex, requiring coordination, thorough testing, and clear documentation.
- Secure Cluster Requirement: Even when OLAC and FGAC are in place, unauthorized users might still attempt to bypass Privacera, so the cluster itself must be securely configured. However, the user or service user will only be able to access the files they are allowed to see.
- Policy Conflicts: Overlapping or contradictory rules between OLAC and FGAC can result in unexpected behaviors, requiring careful policy design.
- Partial Enforcement in Some Tools: While OLAC and FGAC cover most Spark scenarios, some external tools or direct storage access methods might not be fully subject to these combined policies.
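One way to reduce the policy conflicts mentioned above is to cross-check the two policy sets before users run into them at query time. The sketch below is illustrative only: the dictionaries, the table-to-path mapping, and the `find_conflicts` helper are hypothetical stand-ins, not Privacera APIs or policy formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    user: str
    table: str
    path: str


def find_conflicts(fgac_tables, olac_prefixes, table_locations):
    """Return tables a user may query under FGAC whose files OLAC does not allow.

    fgac_tables:     user -> set of table names allowed by FGAC (hypothetical format)
    olac_prefixes:   user -> list of object-store prefixes allowed by OLAC (hypothetical format)
    table_locations: table name -> object-store path backing the table
    """
    conflicts = []
    for user, tables in sorted(fgac_tables.items()):
        allowed = olac_prefixes.get(user, [])
        for table in sorted(tables):
            path = table_locations[table]
            # The table is readable only if some allowed prefix covers its path.
            if not any(path.startswith(prefix) for prefix in allowed):
                conflicts.append(Conflict(user, table, path))
    return conflicts


fgac_tables = {"analyst": {"sales.customers", "sales.orders"}}
olac_prefixes = {"analyst": ["s3://lake/sales/orders/"]}
table_locations = {
    "sales.customers": "s3://lake/sales/customers/",
    "sales.orders": "s3://lake/sales/orders/",
}
conflicts = find_conflicts(fgac_tables, olac_prefixes, table_locations)
```

In this example the analyst is allowed to query `sales.customers` at the FGAC level but has no OLAC access to its files, so a query would fail at the storage layer even though the table policy permits it.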
Architecture Overview
The architecture for OLAC and FGAC integration is designed to ensure that both object-level and row/column-level policies are enforced. This requires the cluster to be configured to use both the Privacera DataServer and the Privacera Plugin. The Privacera DataServer is responsible for generating signed URLs or STS tokens for object-level access, while the Privacera Plugin enforces row/column-level policies.
Combined OLAC and FGAC Integration Diagram
Below is a diagram illustrating how a SparkSQL query can be subject to both FGAC (row/column) checks and OLAC (object-level) checks. The Spark engine does not require a privileged IAM role because the Privacera DataServer provides the credentials for the object store.
sequenceDiagram
participant U as User
participant SS as SparkSQL
participant PP as PrivaceraPlugin
participant SE as SparkEngine (No IAM Role)
participant DS as PrivaceraDataServer
participant PPf as PrivaceraPlatform
participant CS as CloudStorageService
participant AL as AuditLogs
U->>SS: Submit SparkSQL Query
Note right of SS: Access & Row & Column Policies
SS->>PP: Intercept & Check FGAC Policies
alt Access Allowed by FGAC
PP->>SS: Return Updated Query
SS->>DS: Request Ephemeral Credentials
DS->>PPf: Evaluate OLAC Policy
PPf-->>DS: Access Approved
DS->>CS: Generate Signed URL or STS
CS-->>DS: Return Signed URL or STS
DS->>SS: Provide Ephemeral Credentials
SS->>SE: Execute Query with Credentials
SE->>CS: Fetch Data using Signed URL or STS
CS-->>SE: Return Data
SE->>U: Query Results
DS->>AL: Log Access Request
else Access Denied by FGAC or OLAC
PP->>SS: Return Access Denied
SS->>U: Error
end
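The sequence in the diagram can also be read as a simple two-stage decision. The following Python sketch models that order of checks under stated assumptions: the policy dictionaries, the `mask(...)` SQL function, the credential string, and the `run_query` helper are all hypothetical and only mimic what the Privacera Plugin and DataServer do.

```python
def run_query(user, table, columns, fgac, olac, locations, audit_log):
    """Model the diagram: FGAC check and query rewrite first, then OLAC and credentials."""
    # Stage 1 (plugin): look up the FGAC policy; no policy means access is denied.
    policy = fgac.get((user, table))
    if policy is None:
        return {"status": "denied", "reason": "FGAC"}

    # Rewrite the query so masked columns are wrapped and the row filter is applied.
    projected = [f"mask({c}) AS {c}" if c in policy["mask"] else c for c in columns]
    query = f"SELECT {', '.join(projected)} FROM {table} WHERE {policy['row_filter']}"

    # Stage 2 (DataServer): evaluate OLAC against the path that backs the table.
    path = locations[table]
    approved = any(path.startswith(prefix) for prefix in olac.get(user, []))
    # The diagram logs approved requests; logging denials as well is an assumption here.
    audit_log.append({"user": user, "path": path, "approved": approved})
    if not approved:
        return {"status": "denied", "reason": "OLAC"}

    # Placeholder for a signed URL or STS token returned by the storage service.
    credential = f"ephemeral-credential-for:{path}"
    return {"status": "allowed", "query": query, "credential": credential}


fgac = {
    ("alice", "customers"): {"mask": {"email"}, "row_filter": "region = 'EU'"},
    ("carol", "customers"): {"mask": set(), "row_filter": "1 = 1"},
}
olac = {"alice": ["s3://lake/customers/"]}
locations = {"customers": "s3://lake/customers/"}
audit_log = []

allowed = run_query("alice", "customers", ["id", "email"], fgac, olac, locations, audit_log)
denied_fgac = run_query("bob", "customers", ["id"], fgac, olac, locations, audit_log)
denied_olac = run_query("carol", "customers", ["id"], fgac, olac, locations, audit_log)
```

Here `alice` passes both layers, `bob` is stopped by FGAC before any credential is requested, and `carol` is stopped by OLAC even though a table-level policy exists for her.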
Example Use Cases
Use Case 1: Hybrid Security for Customer Data
- Scenario: A marketing analytics team requires fine-grained access to customer data (e.g., only certain rows/columns) but also wants to ensure that no unauthorized users can read the raw object files.
- Implementation:
  - OLAC manages S3 object credentials through Privacera DataServer, granting or denying file-level access.
  - FGAC enforces policies that only display masked or limited rows to users in the “marketing” group.
- Outcome: Combined, these controls ensure marketing analysts see only the data needed for targeted campaigns, while object-level credentials remain secured.
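To make the FGAC side of this use case concrete, the sketch below applies a row filter and a column mask to a few in-memory records. It is a toy model under stated assumptions: the sample rows, the policy dictionary, and the hashing-based `mask_value` are hypothetical and do not reflect how Privacera defines or evaluates masking policies.

```python
import hashlib


def mask_value(value):
    """Replace a value with a short, irreversible SHA-256 digest (assumed masking style)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def apply_fgac(rows, user_groups, policy):
    """Return only the rows and values the policy exposes to the given groups."""
    if policy["group"] not in user_groups:
        return []  # Users outside the policy's group see nothing in this toy model.
    visible = [row for row in rows if policy["row_filter"](row)]
    return [
        {col: (mask_value(val) if col in policy["masked_columns"] else val)
         for col, val in row.items()}
        for row in visible
    ]


rows = [
    {"customer_id": 1, "email": "ana@example.com", "region": "EU"},
    {"customer_id": 2, "email": "bo@example.com", "region": "US"},
]
marketing_policy = {
    "group": "marketing",
    "masked_columns": {"email"},
    "row_filter": lambda row: row["region"] == "EU",
}
result = apply_fgac(rows, {"marketing"}, marketing_policy)
```

A user in the “marketing” group sees only the EU row, with the email replaced by a digest, while the object-level credentials needed to read the underlying files are handled separately by OLAC.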
Use Case 2: ETL Job with Limited Object Store Access
- Scenario: The ETL job is trusted, but due to governance policies, it should only access specific folders in the object store.
- Implementation:
  - OLAC ensures that the ETL job can only read the specific folder or dataset in the object store that it needs.
  - FGAC masks sensitive columns (PII, financial data) and filters out rows based on the job’s access level.
- Outcome: Generated datasets are compliant with data governance policies, and the ETL job can run without needing a privileged IAM role.
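Folder-scoped access of this kind depends on matching paths against an allowed prefix carefully. The sketch below shows one common pitfall, a sibling folder that merely shares a name prefix, and a way to avoid it. The `is_under_prefix` helper and the bucket paths are hypothetical, and real OLAC policies may use a different matching syntax (for example, wildcards).

```python
def is_under_prefix(path: str, prefix: str) -> bool:
    """Return True if `path` is the folder `prefix` itself or lies inside it."""
    folder = prefix.rstrip("/")
    # Require a "/" boundary so "input-archive" does not match the "input" folder.
    return path.rstrip("/") == folder or path.startswith(folder + "/")


etl_prefix = "s3://lake/etl/input"

inside = is_under_prefix("s3://lake/etl/input/part-0000.parquet", etl_prefix)
sibling = is_under_prefix("s3://lake/etl/input-archive/part-0000.parquet", etl_prefix)
naive_sibling = "s3://lake/etl/input-archive/part-0000.parquet".startswith(etl_prefix)
```

A plain `startswith` check would treat `input-archive` as part of `input`, so the boundary-aware comparison keeps the ETL job limited to the folder it was actually granted.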
FAQ
- Is it mandatory to run both OLAC and FGAC together? No. You can run OLAC or FGAC independently. However, using both provides a layered approach: OLAC for object-level access and FGAC for row/column-level filtering.
- Will these controls stop malicious users from bypassing Spark altogether? They reduce the risk, but a secure cluster configuration is still essential. For example, if a user has direct credentials for your storage service, they could bypass Spark-level checks.
- Can these policies be applied to other compute engines besides SparkSQL? Typically, OLAC handles object-level credentials no matter the engine, but FGAC row/column policies are specifically for $1