Privacera Documentation

Privacera Spark plugin versus Open-source Spark plugin

The following comparison shows how, in a Kubernetes environment, the Privacera Spark plugin is better suited than the open-source Spark plugin for fine-grained and object-level access control.

Fine-grained access control (FGAC)

Privacera Spark plug-in:

  • Supports access management at both the table/column level and the object level (e.g., S3)

  • Requires an IAM role with S3 access on the EKS nodes

Open-source Spark plug-in:

  • Supports fine-grained access at the table/column level for Spark SQL only

Object-level access control (OLAC)

Privacera Spark plug-in:

  • Supports OLAC for object-level access (e.g., S3, ADLS)

  • Authorization through a JWT token or a Privacera token

  • Access control for requests made through the Spark context and also outside of the Spark context

Open-source Spark plug-in:

  • No support for object-level access

  • Only Spark SQL is supported

  • spark.read() and spark.write() operations to read/write S3 objects are not supported

Audits

Privacera Spark plug-in:

  • Access requests are logged in the Audit server, which feeds Solr, S3, and other targets

Open-source Spark plug-in:

  • Access requests are logged to different destinations (e.g., Solr, DB, Log4j, HDFS)

Support

Privacera Spark plug-in:

  • Active development for new releases and features

  • Enterprise-level support with 24x7 on-call

  • 200+ employee company with close to 100 engineers

  • Fully supported on-prem and SaaS options for the Ranger backend

Open-source Spark plug-in:

  • Only supports up to Spark 2.4

  • No active development on the main project in 2 years

  • Only one significant contributor

  • Ranger must be compiled and supported by the user

Fine-grained access control

With fine-grained access control, the Privacera Spark plug-in enforces access at both the file level and the table/column level. Fine-grained access control assumes that the EKS cluster nodes have an IAM role set up that can access S3 objects. Policies in Privacera Cloud can be defined at the file level or the table level to control access. As long as requests are made within the Spark context, the plug-in enforces access control.
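
As an illustration, the two kinds of requests that stay within the Spark context can be sketched in PySpark as follows. This is a minimal sketch rather than Privacera sample code: the table name, column names, and S3 path are hypothetical, and it assumes a Spark session on a cluster where the plug-in is already configured.

```python
# Sketch only: table, column, and bucket names below are hypothetical.

def query_table(spark, table, columns):
    """Table/column-level request: a Spark SQL query on selected columns."""
    column_list = ", ".join(columns)
    return spark.sql(f"SELECT {column_list} FROM {table}")


def read_s3_files(spark, path):
    """File-level request: read Parquet files under an S3 path."""
    return spark.read.parquet(path)


def main():
    # Imported here so the helpers above can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fgac-sketch").getOrCreate()
    query_table(spark, "sales.customers", ["id", "region"]).show()
    read_s3_files(spark, "s3a://example-bucket/sales/").show()
```

Both helpers go through the Spark session, so they are the kind of request that policies defined at the table level or file level are meant to cover.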

[Figure: Spark on EKS]

Limitations

Although the plug-in enforces access control on all Spark-related jobs, it cannot enforce access control in the following cases:

  1. S3 requests made from outside of the Spark context.

  2. Because the IAM role must be granted to the EKS nodes, unauthorized users can bypass Ranger security and access S3 and other AWS resources by using the Python Boto library or other custom JAR libraries.
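
To make the second limitation concrete, the following minimal sketch shows a direct S3 read with Boto3 that never passes through Spark. The bucket and key are hypothetical. Because Boto3 resolves credentials through its default chain, code like this running on an EKS node would typically be able to pick up the node's IAM role, and the plug-in never sees the request.

```python
def read_object_directly(s3_client, bucket, key):
    """Fetch an object straight from S3, outside of the Spark context."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def main():
    # Imported here so the helper can be loaded without Boto3 installed.
    import boto3

    # No explicit credentials: Boto3 falls back to its default credential
    # chain, which on an EC2-backed node can include the node's IAM role.
    s3_client = boto3.client("s3")
    data = read_object_directly(
        s3_client, "example-bucket", "sales/part-0000.parquet"
    )
    print(len(data), "bytes read without any plug-in check")
```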

Object-level Access Control

Object-level access control (OLAC) enforces access control only on the files/objects in S3, whether they are accessed through Spark jobs or from outside of Spark jobs. It requires a Data access server and S3 setup on EKS. OLAC is only supported with a Data access server on Privacera Platform. The Data access server uses a signed URL to provide access to S3 objects. OLAC also supports access control for requests made outside of the Spark context using the Python Boto library or third-party custom libraries.
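
The idea behind a signed URL can be sketched with the Python standard library. This is a simplified illustration, not the AWS Signature Version 4 algorithm and not Privacera's implementation: the secret, host name, and query parameter names are hypothetical. The signing server signs the object path together with an expiry time, and a request is accepted only if the signature matches and the URL has not expired.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key known only to the signing server


def _signature(path, expires):
    """HMAC-SHA256 over the object path and the expiry timestamp."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(host, path, ttl_seconds, now=None):
    """Return a URL granting access to `path` for `ttl_seconds`."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode(
        {"expires": expires, "signature": _signature(path, expires)}
    )
    return f"https://{host}{path}?{query}"


def verify_url(url, now=None):
    """Accept the URL only if it is unexpired and its signature matches."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    return hmac.compare_digest(signature, _signature(parsed.path, expires))
```

In this sketch the client never holds the secret or any long-lived credentials; it only receives a URL that is valid for one object and a limited time, which is the property that makes signed URLs useful for object-level control.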