
Privacera Spark Plug-in versus Open-source Spark Plug-in

The following table shows how the Privacera Spark plug-in is better optimized than the open-source Spark plug-in for fine-grained and object-level access control in a Kubernetes environment.

 

Fine-grained access control (FGAC)

Privacera Spark Plug-in:

  • Supports access management for both table/column level and object level (e.g., S3)
  • IAM role which has S3 access required on the EKS nodes

Open-source Spark Plug-in:

  • Supports fine-grained access for table/column level for Spark SQL only

Object-level access control (OLAC)

Privacera Spark Plug-in:

  • Supports OLAC for object-level access (e.g., S3, ADLS)
  • Authorization through JWT token or Privacera token
  • Access control support for access through Spark context and also outside of Spark context

Open-source Spark Plug-in:

  • No support for object-level access
  • Only Spark SQL is supported
  • spark.read() and spark.write() operations to read/write S3 objects not supported

Audits

Privacera Spark Plug-in:

  • Access requests are logged in Audit server, which feeds Solr, S3, and other targets

Open-source Spark Plug-in:

  • Access requests are logged with different destinations (e.g., Solr, DB, Log4j, HDFS)

Support

Privacera Spark Plug-in:

  • Active development for new releases and features
  • Enterprise-level support with 24x7 on-call
  • 200+ employee company with close to 100 engineers
  • Fully-supported on-prem and SaaS options for Ranger backend

Open-source Spark Plug-in:

  • Only supports up to Spark 2.4
  • No active development on the main project in 2 years
  • Only one significant contributor
  • Ranger must be compiled and supported by the user

Fine-grained access control

With fine-grained access control, the Privacera Spark plug-in enforces access at both the file level and the table/column level. Fine-grained access control assumes that the EKS cluster nodes have an IAM role that can access S3 objects. Policies can be set up in Privacera Cloud at the file level or table level to control access. As long as requests are made within the Spark context, the Spark plug-in enforces access control.
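To make the idea of a table/column-level policy concrete, here is a minimal, purely illustrative sketch of a column-level access check. It is not Privacera's or Ranger's policy model; the `ColumnPolicy` class, the `is_allowed` function, and the user, table, and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPolicy:
    """Hypothetical policy: one user may read some columns of one table."""
    user: str
    table: str
    allowed_columns: frozenset


def is_allowed(policies, user, table, requested_columns):
    """Allow the request only if matching policies cover every requested column."""
    granted = set()
    for policy in policies:
        if policy.user == user and policy.table == table:
            granted |= policy.allowed_columns
    # Deny by default when no policy matches the user and table.
    return bool(granted) and set(requested_columns) <= granted


# Example data (all names are made up for illustration).
policies = [
    ColumnPolicy("alice", "sales.orders", frozenset({"order_id", "amount"})),
]
```

In this sketch a request for `order_id` by `alice` is allowed, while a request that also includes a column outside the policy, or any request by a user with no matching policy, is denied.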

Limitations

Although the plug-in enforces access control on all Spark-related jobs, there are cases where it cannot:

  1. S3 requests made from outside of the Spark context.

  2. Because the IAM role must be granted to the EKS nodes, unauthorized users can bypass Ranger security and access S3 and other AWS resources by using the Python Boto library or other custom JAR libraries.
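The second limitation can be sketched as follows. This is an illustrative example, not taken from the Privacera documentation: the helper `read_object_directly` is a hypothetical name, and the only real API it relies on is the Boto3 S3 client's `get_object(Bucket=..., Key=...)` call. The client is passed in as a parameter so the function can be exercised without AWS access.

```python
def read_object_directly(s3_client, bucket, key):
    """Fetch an S3 object using whatever credentials the client holds.

    No Spark context is involved, so the Spark plug-in never sees this
    request and no Ranger policy is evaluated.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```

On an EKS pod, passing `boto3.client("s3")` as `s3_client` would make the request with the node's IAM role, which is why fine-grained access control alone cannot stop this kind of access.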

Object-level access control

Object-level access control applies only to files/objects on S3, whether they are accessed through Spark jobs or outside of Spark jobs. It requires a Data access server and S3 setup on EKS. OLAC is supported only with a Data access server on Privacera Platform (on-prem). The Data access server uses signed URLs to provide access to S3 objects. OLAC also supports access control for requests made outside of the Spark context with the Python Boto library or third-party custom libraries.
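The signed URL concept can be illustrated with a generic sketch. This is not the Data access server's implementation and not the AWS Signature Version 4 scheme; it is a simplified HMAC-based example, and the secret, host name, query parameter names, and the functions `sign_url` and `verify_url` are all assumptions made for illustration. The idea is that a server holding a secret issues a URL that grants access to one object until an expiry time, and any change to the URL invalidates the signature.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, quote, unquote, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key known only to the issuing server


def _signature(object_key, expires_at, secret):
    """HMAC-SHA256 over the object key and expiry time."""
    message = f"{object_key}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url, object_key, expires_at, secret=SECRET):
    """Build a URL that authorizes access to object_key until expires_at."""
    query = urlencode({
        "expires": expires_at,
        "signature": _signature(object_key, expires_at, secret),
    })
    return f"{base_url}/{quote(object_key)}?{query}"


def verify_url(url, now, secret=SECRET):
    """Accept the URL only if it is unexpired and its signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    object_key = unquote(parsed.path.lstrip("/"))
    expected = _signature(object_key, expires_at, secret)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(signature, expected) and now < expires_at
```

Because the caller receives only a time-limited URL for a specific object, it never needs the underlying storage credentials, which fits the way OLAC is described as covering requests both inside and outside of the Spark context.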


Last update: March 30, 2022