JWT Token User Identity¶

Overview¶

This feature allows the use of JWT tokens to carry the user identity information required by Privacera to enforce access control. This works for certain connectors or use-cases where the data source may not be able to pass the user identity reliably to Privacera.

Connectors¶

The following connectors support the use of JWT tokens to carry user identity information:

OLAC connectors
- AWS EMR (on EC2) Spark OLAC connector without Kerberos - JWT token user identity is the only supported way to enforce access control in a non-Kerberos EMR (on EC2) cluster.
- AWS EMR-Serverless Spark OLAC connector without Lake Formation - JWT token user identity allows you to use Privacera for access control in a non-Lake Formation EMR-Serverless cluster without using IAM roles for user identity.
- Databricks Standard Cluster OLAC connector - JWT token user identity is an additional way to pass user identity to Privacera for access control if you don't want to use the logged-in user identity.
- Apache Spark on EKS OLAC connector - JWT token user identity is the only supported way to enforce access control in an Apache Spark on EKS cluster.
FGAC connectors
- Databricks High Concurrency Cluster FGAC connector - JWT token user identity is an additional way to pass user identity to Privacera for access control if you don't want to use the logged-in user identity.

Supported Deployments¶

PrivaceraCloud
Self Managed Deployment
PrivaceraCloud Data-plane Deployment

Prerequisites¶

Your identity provider (IdP) must be able to generate JWT tokens. The JWT token is signed by your IdP and contains the user identity information. The user must be configured in Privacera with the same username. The public key of the IdP is used to validate the JWT token. It is either configured statically in Privacera or provided dynamically through a JWKS endpoint that is configured in Privacera.

For the OLAC use case, Privacera Dataserver must be configured and running. The additional configuration for validating JWT tokens is added to it.

Sample Flow for OLAC¶

sequenceDiagram
    participant User
    participant IdentityProvider
    participant ComputeEnv as Compute Env + Privacera OLAC Plugin 
    participant PrivaceraDataServer
    participant CloudStorage

    User->>IdentityProvider: 1. Request JWT token
    IdentityProvider-->>User: 2. Provide JWT token
    User->>ComputeEnv: 3. Pass JWT token
    ComputeEnv->>PrivaceraDataServer: 4. Send JWT token
    PrivaceraDataServer->>PrivaceraDataServer: 5. Validate JWT token using static key or key from JWKS endpoint
    PrivaceraDataServer->>PrivaceraDataServer: 6. Generate Signed URL/STS token
    PrivaceraDataServer-->>ComputeEnv: 7. Provide Signed URL/STS token
    ComputeEnv->>CloudStorage: 8. Access data using Signed URL/STS token
    CloudStorage-->>ComputeEnv: 9. Data retrieved

Diagram Explanation

1. Request JWT Token: The user requests a JWT token from the Identity Provider (IdP).
2. Provide JWT Token: The IdP provides the JWT token to the user.
3. Pass JWT Token: The user passes the JWT token to the compute environment.
4. Send JWT Token: The compute environment sends the JWT token to Privacera DataServer.
5. Validate JWT Token: Privacera DataServer validates the JWT token signature using the IdP public key, which is either statically configured or obtained dynamically from the IdP's JWKS endpoint.
6. Generate Signed URL/STS Token: Privacera DataServer generates a Signed URL or STS token.
7. Provide Signed URL/STS Token: Privacera DataServer provides the Signed URL or STS token to the compute environment.
8. Access Data: The compute environment accesses data from cloud storage using the Signed URL or STS token.
9. Data Retrieved: The data is retrieved from the cloud storage and provided to the compute environment.

Sample Flow for FGAC¶

sequenceDiagram
    participant User
    participant IdentityProvider
    participant ComputeEnv as Compute Env + Privacera FGAC Plugin 
    participant CloudStorage

    User->>IdentityProvider: 1. Request JWT token
    IdentityProvider-->>User: 2. Provide JWT token
    User->>ComputeEnv: 3. Pass JWT token
    ComputeEnv->>ComputeEnv: 4. Privacera FGAC plugin validates JWT token using static key or key from JWKS endpoint
    ComputeEnv->>ComputeEnv: 5. Privacera FGAC plugin uses identity to enforce access control
    ComputeEnv->>CloudStorage: 6. Access data using Compute Env native permissions (IAM role)
    CloudStorage-->>ComputeEnv: 7. Data retrieved

Diagram Explanation

1. Request JWT Token: The user requests a JWT token from the Identity Provider (IdP).
2. Provide JWT Token: The IdP provides the JWT token to the user.
3. Pass JWT Token: The user passes the JWT token to the compute environment.
4. Validate JWT Token: The Privacera FGAC plugin validates the JWT token signature using the IdP public key, which is either statically configured or obtained dynamically from the IdP's JWKS endpoint.
5. Enforce Access Control: The Privacera FGAC plugin uses the user identity to enforce access control.
6. Access Data: The compute environment accesses data from cloud storage using the compute environment's native permissions (IAM role).
7. Data Retrieved: The data is retrieved from the cloud storage and provided to the compute environment.

Concepts¶

JWT Token Format¶

A JSON Web Token (JWT) consists of three Base64url-encoded strings separated by dots (.). The three parts are the header, the payload, and the signature. The header and payload are JSON objects, and the signature is computed over the encoded header and payload using the issuer's private key. The signature is used to confirm the identity of the issuer and the integrity of the JWT token.
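The three-part structure can be illustrated with a minimal Python sketch that uses only the standard library. It splits a token and decodes each part, but it does not verify the signature. The helper names (`b64url_decode`, `b64url_encode`, `split_jwt`) and the demo values are assumptions made for illustration and are not part of Privacera.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a Base64url segment, restoring the padding that JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as unpadded Base64url, the form used inside a JWT."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def split_jwt(token: str):
    """Return (header dict, payload dict, signature bytes) without verifying anything."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a signed JWT must have exactly three dot-separated parts")
    header = json.loads(b64url_decode(parts[0]))
    payload = json.loads(b64url_decode(parts[1]))
    signature = b64url_decode(parts[2])
    return header, payload, signature


# Build a demo token from illustrative values; the signature is a placeholder,
# so this token would never pass real signature verification.
demo_token = ".".join([
    b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps({"sub": "infra_test_user"}).encode()),
    b64url_encode(b"placeholder-signature"),
])
header, payload, signature = split_jwt(demo_token)
```

Decoding a token this way is useful for inspection and troubleshooting only. The contents must not be trusted until the signature has been verified.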

The header contains the algorithm used to sign the JWT token. An example JWT header JSON is shown below. All the values are examples and should not be used as is.

JSON
{
  "alg": "RS256",
  "typ": "JWT",
  "kid": "1234567890"
}

The fields in the header are as follows:

The alg field is the algorithm used to sign the JWT token. Privacera supports only RSA with SHA-256 and ECDSA with SHA-256 for JWT token signatures, which correspond to RS256 and ES256 as values of this field.
The typ field is the type of the token. This is a literal value and is always JWT.
The kid field is the key ID that identifies the key pair used to sign the JWT token, so that the matching public key can be selected for verification. This is an optional field. It is present if a JWKS endpoint is used to fetch the public key.
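The header rules above can be sketched as a small check on an already decoded header. This is an illustration, not Privacera's implementation. The function name `check_header` and the `jwks_in_use` flag are hypothetical, and treating a missing typ as acceptable and a missing kid as an error when JWKS is used are assumptions made for the sketch.

```python
SUPPORTED_ALGS = {"RS256", "ES256"}  # the two alg values listed in the text above


def check_header(header: dict, jwks_in_use: bool) -> str:
    """Validate a decoded JWT header and return its alg value.

    jwks_in_use is a hypothetical flag: True when the public key is looked up
    by kid from a JWKS endpoint, False when a static key is configured.
    """
    alg = header.get("alg")
    if alg not in SUPPORTED_ALGS:
        raise ValueError(f"unsupported alg: {alg!r}")
    # Assumption: a missing typ is tolerated, any other value is rejected.
    if header.get("typ", "JWT") != "JWT":
        raise ValueError("typ must be JWT")
    # Assumption: without a kid there is no way to pick a key from the set.
    if jwks_in_use and not header.get("kid"):
        raise ValueError("kid is required when the key comes from a JWKS endpoint")
    return alg
```

For example, `check_header({"alg": "RS256", "typ": "JWT", "kid": "1234567890"}, jwks_in_use=True)` returns `"RS256"`, while a header that declares `HS256` is rejected. Checking alg against an explicit allow-list before verification is a common safeguard against tokens that declare an unexpected algorithm.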

The payload contains the claims. An example JWT payload JSON is shown below. All the values are examples and should not be used as is.

JSON
{
  "iss": "https://testidp.example.com/issuer/websec",
  "sub": "infra_test_user",
  "iat": 1721223184,
  "exp": 1721283133,
  "aud": "https://dataserver.example.com",
  "scope": [
    "infra_test_group"
  ]
}

The fields in the payload are as follows:

The iss field is the issuer of the JWT token. This value is configured in Privacera so that it can be used to obtain the configuration for validating the JWT token. This is a mandatory field. Typically, it is in the format of a URL, but it is a literal value and no connection attempt will be made to this URL.
The sub field is the subject of the JWT token. This is the user identity that Privacera should use to enforce access control. This is a mandatory field. You can configure another key in the payload to be used as the user identity.
The iat field is the issued-at time of the JWT token, expressed in Unix time. This is a mandatory field. The token is rejected if the current time is before this time.
The exp field is the expiration time of the JWT token. This is the expiry time of the token in Unix time. This is a mandatory field. The token is rejected if the current time is after this time.
The aud field is the audience of the JWT token. This is the intended recipient of the token, which is Privacera Dataserver. The expected value is a string configured in Privacera, and Privacera accepts the token only if the aud value matches it. This is an optional field.
The scope field carries an additional list of groups. This is an optional field. You can configure another key in the payload to be used as the group list. The groups can be either space separated or comma separated. These groups can either override the user's groups that are configured in Privacera or be added to them.

All other fields in the payload will be ignored by Privacera.
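The payload rules above can be sketched as a single validation function. This is a minimal illustration under stated assumptions, not Privacera's code. The function name and all parameter names are hypothetical, the audience is compared only when both an expected value and an aud claim are present, and aud is treated as a plain string.

```python
import time


def validate_claims(payload, expected_issuer, expected_audience=None,
                    user_claim="sub", group_claim="scope", now=None):
    """Apply the payload rules described above and return (user, groups)."""
    now = time.time() if now is None else now
    # iss, the user claim, iat and exp are described as mandatory.
    for name in ("iss", user_claim, "iat", "exp"):
        if name not in payload:
            raise ValueError(f"missing mandatory claim: {name}")
    if payload["iss"] != expected_issuer:
        raise ValueError("issuer is not configured")
    # float() accepts both numeric and string timestamps.
    if now < float(payload["iat"]):
        raise ValueError("token used before its issued-at time")
    if now > float(payload["exp"]):
        raise ValueError("token has expired")
    # Assumption: aud is checked only when both sides provide a value.
    if expected_audience is not None and "aud" in payload:
        if payload["aud"] != expected_audience:
            raise ValueError("audience does not match")
    groups = payload.get(group_claim, [])
    if isinstance(groups, str):
        # Groups may be space separated or comma separated.
        groups = groups.replace(",", " ").split()
    return payload[user_claim], list(groups)
```

The `user_claim` and `group_claim` parameters mirror the statement that another payload key can be configured for the user identity or the group list. Passing `now` explicitly makes the time checks easy to test.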

Token Duration

For OLAC jobs, the token duration can be short as it is used only during the startup of the job to pass the identity to the Privacera Dataserver. For FGAC jobs, the token duration should be long enough to cover the duration of the job.

JWT Signature Verification¶

JWT Signature verification is done using the public key of the IdP. The public key can be configured statically in Privacera or dynamically fetched from the IdP's JWKS endpoint.

Privacera supports only the RS256 (RSA with SHA-256) and ES256 (ECDSA with SHA-256) algorithms for JWT token signatures.

In case of dynamic public key configuration, the public key is fetched from the IdP's JWKS (JSON Web Key Set) endpoint using the kid field in the JWT header. The JWKS service returns a set of keys containing public keys used to verify the JWT token. The endpoint could return a set of keys or one specific key given the kid field in the JWT header.

Here is an example of a JWKS with an RSA JWK returned by the JWKS service:

JSON
{
"keys": [
  {
    "alg": "RS256",
    "kty": "RSA",
    "use": "sig",
    "x5c": [
      "your-x509-cert-chain"
    ],
    "n": "your-rsa-public-modulus",
    "e": "your-base64url-encoded-exponent",
    "kid": "your-unique-key-id",
    "x5t#S256": "your-unique-thumbprint-sha256",
    "exp": "your-expiration-time"
  }
]}

The fields are described below:

alg - Algorithm used to sign the JWT token. It is either RS256 or ES256.
kty - Key type. It is either RSA or EC.
use - Use of the key. It is either sig or enc.
x5c - X.509 certificate chain. It is an array of base64 encoded X.509 certificates.
n - RSA modulus. It is a base64url encoded string.
e - RSA exponent. It is a base64url encoded string.
kid - Key ID. It is a string identifier.
x5t#S256 - X.509 certificate SHA-256 thumbprint. It is a base64url encoded string.
exp - Expiration time of the key. It is a Unix time.
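As an illustration of how these fields are used, the following Python sketch selects a key from a JWKS document by kid and decodes the RSA modulus and exponent into integers. The function names are hypothetical, and the sketch stops short of building a key object or verifying a signature, which is normally left to a vetted cryptography library.

```python
import base64


def b64url_to_int(value: str) -> int:
    """Decode an unpadded Base64url string into a big-endian unsigned integer."""
    padded = value + "=" * (-len(value) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


def find_jwk(jwks: dict, kid: str) -> dict:
    """Return the signing key whose kid matches the kid from the JWT header."""
    for key in jwks.get("keys", []):
        # Assumption: a key without a use field is treated as a signing key.
        if key.get("kid") == kid and key.get("use", "sig") == "sig":
            return key
    raise KeyError(f"no signing key with kid {kid!r}")


def rsa_public_numbers(jwk: dict):
    """Return (modulus, exponent) as integers for an RSA JWK."""
    if jwk.get("kty") != "RSA":
        raise ValueError("not an RSA key")
    return b64url_to_int(jwk["n"]), b64url_to_int(jwk["e"])
```

The commonly used public exponent 65537 appears in a JWK as `"e": "AQAB"`, which is the Base64url encoding of the bytes `01 00 01`.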

Here is an example of a JWKS with an ECDSA JWK returned by the JWKS service:

JSON
{
"keys": [
  {
    "kty": "EC",
    "crv": "P-256",
    "x": "your-x-coordinate",
    "y": "your-y-coordinate",
    "kid": "your-unique-key-id",
    "exp": "your-expiration-time"
  }
]}

The fields are described below:

kty - Key type. It is either RSA or EC.
crv - Curve used for the key. It is a string, for example P-256, which is the curve used with ES256.
x - X coordinate of the key. It is a base64url encoded string.
y - Y coordinate of the key. It is a base64url encoded string.
kid - Key ID. It is a string identifier.
exp - Expiration time of the key. It is a Unix time.
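The EC fields can be read in a similar way. The sketch below is an illustration with a hypothetical function name. It decodes the x and y coordinates of a P-256 key and checks that each is 32 bytes long. It does not check that the point lies on the curve, which a cryptography library would do when loading the key.

```python
import base64

# ES256 uses the P-256 curve, whose coordinates are 32 bytes each.
CURVE_COORD_BYTES = {"P-256": 32}


def ec_public_point(jwk: dict):
    """Return the (x, y) coordinates of an EC JWK as integers."""
    if jwk.get("kty") != "EC":
        raise ValueError("not an EC key")
    size = CURVE_COORD_BYTES.get(jwk.get("crv"))
    if size is None:
        raise ValueError(f"unsupported curve: {jwk.get('crv')!r}")
    coords = []
    for name in ("x", "y"):
        text = jwk[name]
        raw = base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))
        if len(raw) != size:
            raise ValueError(f"{name} must be {size} bytes for {jwk['crv']}")
        coords.append(int.from_bytes(raw, "big"))
    return tuple(coords)
```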

The Privacera Dataserver obtains the key from the JWKS service endpoint when a JWT token with a key ID is received. The key is then cached until it expires.
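The caching behavior can be pictured with a minimal sketch. It is not the Dataserver's implementation. The class name `JwkCache`, the `fetch_key` callback, and the `default_ttl` fallback for keys that carry no exp value are all assumptions made for illustration.

```python
import time


class JwkCache:
    """Cache JWKs by kid until the expiration time carried by each key."""

    def __init__(self, fetch_key, default_ttl=3600.0, clock=time.time):
        # fetch_key(kid) is a hypothetical callback that calls the JWKS
        # endpoint and returns the matching JWK as a dict.
        self._fetch_key = fetch_key
        self._default_ttl = default_ttl
        self._clock = clock
        self._entries = {}  # kid -> (jwk, expires_at)

    def get(self, kid):
        now = self._clock()
        entry = self._entries.get(kid)
        if entry is not None and now < entry[1]:
            return entry[0]  # still valid, no network call needed
        jwk = self._fetch_key(kid)
        # Use the key's own exp (Unix time) when present, else a fallback TTL.
        expires_at = float(jwk["exp"]) if "exp" in jwk else now + self._default_ttl
        self._entries[kid] = (jwk, expires_at)
        return jwk
```

With a cache like this, repeated tokens signed with the same key trigger a single lookup, and a key is fetched again only after its expiration time has passed.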

Using JWT Token User Identity Feature in Privacera¶

To use this feature you need to do the following:

For OLAC supported connectors
1. Configure Privacera Dataserver to use JWT tokens
2. Configure EMR, Databricks or Apache Spark plugin to use JWT token
3. At runtime, generate JWT token and pass it to the Spark job
For FGAC supported connectors
1. Configure the Databricks Spark plugin to use JWT token
2. At runtime, generate JWT token and pass it to the Spark job