Privacera Platform

Component services configurations

Access Management

Data Server
AWS
AWS Data Server
Configure Privacera Data Access Server

This section describes how to configure the Privacera Data Access Server.

CLI Configuration Steps
  1. SSH to the instance where Privacera Manager is installed.

  2. Run the following command.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dataserver.aws.yml config/custom-vars/
    
  3. Edit the properties. For property details and description, refer to the Configuration properties below.

    vi config/custom-vars/vars.dataserver.aws.yml
    

    Note

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, click here.

  4. Run Privacera Manager update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration properties

DATASERVER_RANGER_AUTH_ENABLED
  Enable or disable Ranger authorization in DataServer.

DATASERVER_V2_WORKDER_THREADS
  Number of worker threads used to process inbound connections. Example: 20

DATASERVER_V2_CHANNEL_CONNECTION_BACKLOG
  Maximum queue size for inbound connections. Example: 128

DATASERVER_V2_CHANNEL_CONNECTION_POOL
  Enable the connection pool for outbound requests. Disabled by default.

DATASERVER_V2_FRONT_CHANNEL_IDLE_TIMEOUT
  Idle timeout for inbound connections. Example: 60

DATASERVER_V2_BACK_CHANNEL_IDLE_TIMEOUT
  Idle timeout for outbound connections; takes effect only if the connection pool is enabled. Example: 60

DATASERVER_HEAP_MIN_MEMORY_MB
  Minimum Java heap memory in MB used by the Dataserver. Example: 1024

DATASERVER_HEAP_MAX_MEMORY_MB
  Maximum Java heap memory in MB used by the Dataserver. Example: 1024

DATASERVER_USE_REGIONAL_ENDPOINT
  Set this property to enforce a default region for all S3 buckets. Example: true

DATASERVER_AWS_REGION
  Default AWS region for S3 buckets. Example: us-east-1
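Taken together, the properties above go into config/custom-vars/vars.dataserver.aws.yml. The following is an illustrative sketch using the example values from the table; it is not a recommendation, and the exact quoting style in the shipped sample file may differ:

```yaml
# Illustrative vars.dataserver.aws.yml sketch; tune values for your workload.
DATASERVER_RANGER_AUTH_ENABLED: "true"
DATASERVER_V2_WORKDER_THREADS: "20"
DATASERVER_V2_CHANNEL_CONNECTION_BACKLOG: "128"
DATASERVER_V2_FRONT_CHANNEL_IDLE_TIMEOUT: "60"
DATASERVER_HEAP_MIN_MEMORY_MB: "1024"
DATASERVER_HEAP_MAX_MEMORY_MB: "1024"
DATASERVER_USE_REGIONAL_ENDPOINT: "true"
DATASERVER_AWS_REGION: "us-east-1"
```

After editing, run ./privacera-manager.sh update as shown in step 4 so the changes take effect.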

AWS S3 Data Server

This section covers how you can configure access control for AWS S3 through Privacera Data Access Server.

Prerequisites

Ensure that the following prerequisites are met:

  • Create and attach an AWS IAM Policy that allows access to S3 resources.

    Follow AWS IAM Create and Attach Policy instructions, using either "Full S3 Access" or "Limited S3 Access" policy templates, depending on your enterprise requirements.

    Return to this section once the Policy is attached to the Privacera Manager Host VM.

CLI configuration
  1. SSH to the instance where Privacera Manager is installed.

  2. Configure the Privacera Data Server, as described in the CLI Configuration Steps above.

  3. Edit the properties. For property details and description, refer to the Configuration Properties below.

    vi config/custom-vars/vars.dataserver.aws.yml
    

    Note

    • In a Kubernetes environment, enable DATASERVER_USE_POD_IAM_ROLE and DATASERVER_IAM_POLICY_ARN to use a specific IAM role for the Dataserver pod. For property details and description, see S3 properties.

    • You can also add custom properties that are not included by default. See Dataserver.

  4. Run Privacera Manager update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration properties

DATASERVER_USE_POD_IAM_ROLE
  Enables creation of an IAM role that is used for the Dataserver pod. Example: true

DATASERVER_IAM_POLICY_ARN
  Full ARN of the IAM policy to attach to the IAM role associated with the Dataserver pod. Example: arn:aws:iam::aws:policy/AmazonS3FullAccess

DATASERVER_USE_IAM_ROLE
  If you've given an IAM role permission to access the bucket, enable **Use IAM Role**.

DATASERVER_S3_AWS_API_KEY
  If you've used an access key to access the bucket, disable **Use IAM Role** and set the AWS API key. Example: AKIAIOSFODNN7EXAMPLE

DATASERVER_S3_AWS_SECRET_KEY
  If you've used a secret key to access the bucket, disable **Use IAM Role** and set the AWS secret key. Example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

DATASERVER_V2_S3_ENDPOINT_ENABLE
  Enable to use a custom S3 endpoint.

DATASERVER_V2_S3_ENDPOINT_SSL
  Enable or disable depending on whether SSL is enabled on the MinIO server.

DATASERVER_V2_S3_ENDPOINT_HOST
  Endpoint server host. Example: 192.168.12.142

DATASERVER_V2_S3_ENDPOINT_PORT
  Endpoint server port. Example: 9000

DATASERVER_AWS_REQUEST_INCLUDE_USERINFO
  Enables adding the session role to CloudWatch logs for requests going through the Dataserver. The value is available under the **privacera-user** key in the Request Params of CloudWatch logs. Set to true if you want to see **privacera-user** in CloudWatch. Example: true
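As a sketch, a Kubernetes deployment that uses a dedicated pod IAM role might combine the properties above as follows. The ARN, host, and port are placeholders from the examples in the table, and the quoting style may differ from the shipped sample file:

```yaml
# Illustrative sketch only; values are placeholders from the table above.
# Kubernetes: use a dedicated IAM role for the Dataserver pod.
DATASERVER_USE_POD_IAM_ROLE: "true"
DATASERVER_IAM_POLICY_ARN: "arn:aws:iam::aws:policy/AmazonS3FullAccess"
# Log the session role in CloudWatch under the privacera-user key.
DATASERVER_AWS_REQUEST_INCLUDE_USERINFO: "true"
# Alternatively, for a custom S3-compatible endpoint such as MinIO:
# DATASERVER_V2_S3_ENDPOINT_ENABLE: "true"
# DATASERVER_V2_S3_ENDPOINT_SSL: "false"
# DATASERVER_V2_S3_ENDPOINT_HOST: "192.168.12.142"
# DATASERVER_V2_S3_ENDPOINT_PORT: "9000"
```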

AWS Athena Data Server

This section covers how you can configure access control for AWS Athena through Privacera Data Access Server.

Prerequisites

Ensure the following:

  • Create and attach an AWS IAM Policy that allows use of Athena and Glue resources and databases.

    Follow AWS IAM Create and Attach Policy instructions, using the "Athena Access" policy modified as necessary for your enterprise. Return to this section once the Policy is attached to the Privacera Manager Host VM.

CLI configuration
  1. SSH to the instance where Privacera Manager is installed.

  2. Configure the Privacera Data Server, as described in the CLI Configuration Steps above.

  3. Edit the properties. For property details and description, refer to the Configuration Properties below.

    vi config/custom-vars/vars.dataserver.aws.yml
    

    Note

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, click here.

  4. Run Privacera Manager update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration properties

Identify an existing S3 bucket or create one to store the Athena query results.

AWS_ATHENA_RESULT_STORAGE_URL: "s3://${S3_BUCKET_FOR_QUERY_RESULTS}/athena-query-results/"
Azure
Azure ADLS Data Server

This topic covers integration of Azure Data Lake Storage (ADLS) with the Privacera Platform using Privacera Data Access Server.

Prerequisites

Ensure that the following prerequisites are met:

  • You have access to an Azure Storage account along with required credentials.

    For more information on how to set up an Azure storage account, refer to Azure Storage Account Creation.

  • Get the values for the following Azure properties: Application (client) ID and Client secret.

CLI Configuration
  1. Go to the privacera-manager folder in your virtual machine. Open the config folder, copy the sample vars.dataserver.azure.yml file to the custom-vars/ folder, and edit it.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dataserver.azure.yml config/custom-vars/
    vi config/custom-vars/vars.dataserver.azure.yml
    
  2. Edit the Azure-related information. For property details and description, click here.

    1. If you want to use Azure CLI, use the following properties:

      ENABLE_AZURE_CLI: "true"
      AZURE_GEN2_SHARED_KEY_AUTH: "true"
      AZURE_ACCOUNT_NAME: "<PLEASE_CHANGE>"
      AZURE_SHARED_KEY: "<PLEASE_CHANGE>"
      
    2. If you want to access multiple Azure storage accounts with shared key authentication, use the following properties:

      AZURE_GEN2_SHARED_KEY_AUTH: "true"
      AZURE_ACCT_SHARED_KEY_PAIRS: "<PLEASE_CHANGE>"
      

      Note

      Configuring AZURE_GEN2_SHARED_KEY_AUTH property allows you to access the resources in the Azure accounts only through the File Explorer in Privacera Portal.

    3. If you want to access multiple Azure storage accounts with OAuth application-based authentication, use the following properties:

      AZURE_GEN2_SHARED_KEY_AUTH: "false"
      AZURE_TENANTID: "<PLEASE_CHANGE>"
      AZURE_SUBSCRIPTION_ID: "<PLEASE_CHANGE>"
      AZURE_RESOURCE_GROUP: "<PLEASE_CHANGE>"
      DATASERVER_AZURE_APP_CLIENT_CONFIG_LIST:
       - index: 0
         clientId: "<PLEASE_CHANGE>"
         clientSecret: "<PLEASE_CHANGE>"
         storageAccName: "<PLEASE_CHANGE>"


      Note

      You can also add custom properties that are not included by default. See Dataserver.

  3. Run Privacera Manager update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration Properties

ENABLE_AZURE_CLI
  Uncomment to use the Azure CLI. The AZURE_ACCT_SHARED_KEY_PAIRS property does not work with this property, so you must set the AZURE_ACCOUNT_NAME and AZURE_SHARED_KEY properties instead. Example: true

AZURE_GEN2_SHARED_KEY_AUTH
  Set to true to use shared key authentication. To use multiple Azure storage accounts with shared key authentication, set this property to true along with AZURE_ACCT_SHARED_KEY_PAIRS. To use multiple Azure storage accounts with OAuth authentication, set this property to false along with DATASERVER_AZURE_APP_CLIENT_CONFIG_LIST. Example: true

AZURE_ACCOUNT_NAME
  Azure ADLS storage account name. Example: company-qa-dept

AZURE_SHARED_KEY
  Azure ADLS storage account shared access key. Example: =0Ty4br:2BIasz>rXm{cqtP8hA;7|TgZZZuTHJTg40z8E5z4UJ':roeJy=d7*/W"

AZURE_ACCT_SHARED_KEY_PAIRS
  Comma-separated list of storage account names and their shared keys, in the format ${storage_account_name_1}:${secret_key_1},${storage_account_name_2}:${secret_key_2}. Example: accA:sharedKeyA,accB:sharedKeyB

AZURE_TENANTID
  To get the value, go to Azure portal > Azure Active Directory > Properties > Tenant ID. Example: 5a5cxxx-xxxx-xxxx-xxxx-c3172b33xxxx

AZURE_APP_CLIENT_ID
  Get the value by following the Prerequisites section above. Example: 8c08xxxx-xxxx-xxxx-xxxx-6w0c95v0xxxx

AZURE_SUBSCRIPTION_ID
  To get the value, go to Azure portal > Subscriptions (left sidebar) > select the required subscription > Overview > copy the Subscription ID. Example: 27e8xxxx-xxxx-xxxx-xxxx-c716258wxxxx

AZURE_RESOURCE_GROUP
  To get the value, go to Azure portal > Storage accounts > select the storage account you want to configure > Overview > Resource group. Example: privacera

DATASERVER_AZURE_APP_CLIENT_CONFIG_LIST
  Configures multiple OAuth Azure applications and the storage accounts mapped to each configured client ID. **Note**: The clientSecret value must be BASE64-encoded in the YAML file. For example:

    DATASERVER_AZURE_APP_CLIENT_CONFIG_LIST:
     - index: 0
       clientId: "8c08xxxx-xxxx-xxxx-xxxx-6w0c95v0xxxx"
       clientSecret: "WncwSaMpleRZ1ZoLThJYWpZd3YzMkFJNEljZGdVN0FfVAo="
       storageAccName: "storageAccA,storageAccB"
     - index: 1
       clientId: "5d37xxxx-xxxx-xxxx-xxxx-7z0cu7e0xxxx"
       clientSecret: "ZncwSaMpleRZ1ZoLThJYWpZd3YzMkFJNEljZGdVN0FfVAo="
       storageAccName: "storageAccC"
Validation

All access and attempted access (Allowed and Denied) for Azure ADLS resources will now be recorded to the audit stream. This audit stream can be reviewed on the Audit page of Privacera Access Manager. Default access for a data repository is 'Denied', so all data access will be denied until a policy allows it.

To verify Privacera Data Management control, perform the following steps:

  1. Log in to Privacera Portal as a portal administrator, open Data Inventory: Data Explorer, and attempt to view the targeted ADLS files or folders. The data will be hidden, and a Denied status will be registered on the Audit page.

  2. In Privacera Portal, open Access Management: Resource Policies. Open System 'ADLS' and 'application' (data repository) 'privacera_adls'. Create or modify an access policy to allow access to some or all of your ADLS storage.

  3. Return to Data Inventory: Data Explorer and re-attempt to view the data as allowed by your new policy or policy change. Repeat step 1.

    You should be able to view files or folders in the account, and an Allowed status will be registered in the Audit page.

To check the log in the Audit page in Privacera Portal, perform the following steps:

  1. On the Privacera Portal page, expand Access Management and click Audit in the left menu.

  2. The Audit page will be displayed with Ranger Audit details.

GCP Data Server

This topic covers integration of Google Cloud Storage (GCS) and Google BigQuery (GBQ) with the Privacera Platform using Privacera Data Access Server.

Prerequisites

Ensure that the following prerequisites are met:

  • If GCS is being configured, you need access to a Google Cloud Storage account along with the required credentials.

  • If GBQ is being configured, you need access to a Google BigQuery account along with the required credentials.

  • Download the credential file (JSON) associated with the service account.

CLI Configuration
  1. SSH to the instance where Privacera is installed.

  2. Copy the credential file you've downloaded from your machine to a location on your instance where Privacera Manager is configured. Get the file path of the JSON file and add it in the next step.

  3. Run the following commands.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.dataserver.gcp.yml config/custom-vars/
    vi config/custom-vars/vars.dataserver.gcp.yml
  4. Update the following credential file information.

    GCP_CREDENTIAL_FILE_PATH: "/tmp/my_google_credential.json"

    Note

    You can also add custom properties that are not included by default. See Dataserver.

  5. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update

    After the update is completed, Privacera gets installed and a default GCS data source is created.

  6. Add GCS Project ID in the GCS data source.

    1. Navigate to Portal UI > Settings > Data Source Registration and edit GOOGLE_CLOUD_STORAGE.

    2. Click Application Properties and add the following properties:

      • Credential Type: Select Google Credentials Local File Path from the dropdown list.

      • Google Credentials Local File Path: Set value to None.

      • Google Project Id: Enter your Google Project ID.

    3. To view the buckets, navigate to Data Inventory > File Explorer.

      If you cannot view the buckets, restart the Dataserver.

      cd ~/privacera/privacera-manager
      ./privacera-manager.sh restart dataserver

Tip

You can use Google APIs to apply access control on GCS. For more information, click here.

PolicySync
Snowflake

This topic covers how you can configure Snowflake PolicySync access control using Privacera Manager.

Prerequisites

Ensure the following:

  • Create a Snowflake account that is accessible from the instance used for Privacera Manager installation.

  • Create the Snowflake warehouse, database, users, and roles required by PolicySync. For more information, see Snowflake Configuration for PolicySync.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.policysync.snowflake.yml config/custom-vars/
    vi config/custom-vars/vars.policysync.snowflake.yml
  3. Set the properties for your specific installation. For property details and description, see the Configuration Properties section that follows.

    Note

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, see Snowflake Connector.

  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
  5. Validate the installation.

    1. Install snowsql if it's not already installed.

      mkdir -p ~/privacera/downloads
      cd ~/privacera/downloads
      wget https://privacera.s3.amazonaws.com/public/pm-demo-data/snowflake/install_snowsql.sh -O install_snowsql.sh
      chmod +x install_snowsql.sh
      ./install_snowsql.sh
    2. Download the script for access check.

      wget https://privacera.s3.amazonaws.com/public/pm-demo-data/snowflake/snowflake_access_check.sh -O snowflake_access_check.sh
      chmod +x snowflake_access_check.sh
    3. Run the downloaded script. For example: ./snowflake_access_check.sh testsnowflake.prod.us-west-2.aws emily welcome123

      ./snowflake_access_check.sh ${SNOWFLAKE_ACCOUNT} ${USERNAME} ${PASSWORD}
    4. Verify access/denied results by logging in with the Privacera Portal user credentials.

      Navigate to Privacera Portal > Access Management > Audit. Now, access to Snowflake will be shown as Allowed.

Configuration properties

JDBC configuration

Table 1. JDBC configuration

SNOWFLAKE_JDBC_URL (string; required)
  Specifies the JDBC URL for the Snowflake connector.

SNOWFLAKE_JDBC_USERNAME (string; required)
  Specifies the JDBC username to use.

SNOWFLAKE_JDBC_PASSWORD (string; required)
  Specifies the JDBC password to use.

SNOWFLAKE_USE_KEY_PAIR_AUTHENTICATION (boolean; default: false; required)
  Specifies whether PolicySync uses key-pair authentication. Set this property to true to enable key-pair authentication.

SNOWFLAKE_JDBC_PRIVATE_KEY_FILE_NAME (string; optional)
  Specifies the file name of the private key that PolicySync uses for key-pair authentication. This file is placed in the ~/privacera/privacera-manager/config/custom-vars directory. Specify this setting only if SNOWFLAKE_USE_KEY_PAIR_AUTHENTICATION is set to true.

SNOWFLAKE_JDBC_PRIVATE_KEY_PASSWORD (string; optional)
  Specifies the password for the private key. If the private key does not have a password, do not specify this setting. Specify this setting only if SNOWFLAKE_USE_KEY_PAIR_AUTHENTICATION is set to true.

SNOWFLAKE_WAREHOUSE_TO_USE (string; required)
  Specifies the JDBC warehouse that PolicySync establishes a connection to, which is used to run SQL queries.

SNOWFLAKE_ROLE_TO_USE (string; required)
  Specifies the role that PolicySync uses when it runs SQL queries.

JDBC_MAX_POOL_SIZE (integer; default: 15; optional)
  Specifies the maximum size of the JDBC connection pool.

JDBC_MIN_IDLE_CONNECTION (integer; default: 3; optional)
  Specifies the minimum size of the JDBC connection pool.

JDBC_LEAK_DETECTION_THRESHOLD (string; default: 900000L; optional)
  Specifies the duration in milliseconds that a connection can be outside the connection pool before PolicySync logs a possible connection leak message. If set to 0, leak detection is disabled.
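A minimal JDBC section of vars.policysync.snowflake.yml might look like the following sketch. The account URL, username, warehouse, and role names here are placeholders, not values from your environment:

```yaml
# Illustrative sketch; account, user, warehouse, and role names are placeholders.
SNOWFLAKE_JDBC_URL: "jdbc:snowflake://mycompany.snowflakecomputing.com"
SNOWFLAKE_JDBC_USERNAME: "policysync_user"
SNOWFLAKE_JDBC_PASSWORD: "<PLEASE_CHANGE>"
SNOWFLAKE_WAREHOUSE_TO_USE: "PRIVACERA_POLICYSYNC_WH"
SNOWFLAKE_ROLE_TO_USE: "PRIVACERA_POLICYSYNC_ROLE"
# Connection pool sizing (defaults shown).
JDBC_MAX_POOL_SIZE: 15
JDBC_MIN_IDLE_CONNECTION: 3
```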



Resource management

Table 2. Resource management

SNOWFLAKE_OWNER_ROLE (string; optional)
  Specifies the role that owns the resources managed by PolicySync. You must ensure that this role exists; PolicySync does not create it.

  • If a value is not specified, resources are owned by the creating user, and the owner of the resource has all access to it.

  • If a value is specified, ownership of the resource is changed to the specified role.

  The following resource types are supported: databases, schemas, tables, and views.

SNOWFLAKE_HANDLE_PIPE_OWNERSHIP (boolean; default: false; optional)
  Specifies whether PolicySync changes the ownership of a pipe to the role specified by SNOWFLAKE_OWNER_ROLE.

SNOWFLAKE_MANAGE_WAREHOUSE_LIST (string; optional)
  Specifies a comma-separated list of warehouse names for which PolicySync manages access control. If unset, access control is managed for all warehouses. You can use wildcards. Names are case-sensitive. For example: testdb1warehouse,testdb2warehouse,sales_dbwarehouse*

SNOWFLAKE_MANAGE_DATABASE_LIST (string; optional)
  Specifies a comma-separated list of database names for which PolicySync manages access control. If unset, access control is managed for all databases. You can use wildcards. Names are case-sensitive. For example: testdb1,testdb2,sales_db*
  If specified, SNOWFLAKE_IGNORE_DATABASE_LIST takes precedence over this setting.

SNOWFLAKE_MANAGE_SCHEMA_LIST (string; optional)
  Specifies a comma-separated list of schema names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. Use the format <DATABASE_NAME>.<SCHEMA_NAME>. If specified, SNOWFLAKE_IGNORE_SCHEMA_LIST takes precedence over this setting. A wildcard such as <DATABASE_NAME>.* manages all schemas in the database. If unset, access control is managed for all schemas; if set to none, no schemas are managed.

SNOWFLAKE_MANAGE_TABLE_LIST (string; optional)
  Specifies a comma-separated list of table names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. Use the format <DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>. If specified, SNOWFLAKE_IGNORE_TABLE_LIST takes precedence over this setting. A wildcard such as <DATABASE_NAME>.<SCHEMA_NAME>.* manages all matched tables. If unset, access control is managed for all tables; if set to none, no tables are managed.

SNOWFLAKE_MANAGE_STREAM_LIST (string; optional)
  Specifies a comma-separated list of stream names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. For example: testdb1.schema1.stream1,testdb2.schema2.stream*
  If unset, access control is managed for all streams.

SNOWFLAKE_MANAGE_FUNCTION_LIST (string; optional)
  Specifies a comma-separated list of function names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. For example: testdb1.schema1.fn1,testdb2.schema2.fn*
  If unset, access control is managed for all functions.

SNOWFLAKE_MANAGE_PROCEDURE_LIST (string; optional)
  Specifies a comma-separated list of procedure names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. For example: testdb1.schema1.procedureA,testdb2.schema2.procedure*
  If unset, access control is managed for all procedures.

SNOWFLAKE_MANAGE_SEQUENCE_LIST (string; optional)
  Specifies a comma-separated list of sequence names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. For example: testdb1.schema1.seq1,testdb2.schema2.seq*
  If unset, access control is managed for all sequences.

SNOWFLAKE_MANAGE_FILE_FORMAT_LIST (string; optional)
  Specifies a comma-separated list of file format names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. For example: testdb1.schema1.fileFmtA,testdb2.schema2.fileFmt*
  If unset, access control is managed for all file formats.

SNOWFLAKE_MANAGE_PIPE_LIST (string; optional)
  Specifies a comma-separated list of pipe names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. For example: testdb1.schema1.pipeA,testdb2.schema2.pipe*
  If unset, access control is managed for all pipes.

SNOWFLAKE_MANAGE_EXTERNAL_STAGE_LIST (string; optional)
  Specifies a comma-separated list of external stage names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. For example: testdb1.schema1.externalStage1,testdb2.schema2.extStage*
  If unset, access control is managed for all external stages.

SNOWFLAKE_MANAGE_INTERNAL_STAGE_LIST (string; optional)
  Specifies a comma-separated list of internal stage names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. For example: testdb1.schema1.internalStage1,testdb2.schema2.intStage*
  If unset, access control is managed for all internal stages.

SNOWFLAKE_IGNORE_WAREHOUSE_LIST (string; optional)
  Specifies a comma-separated list of warehouse names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all warehouses are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_WAREHOUSE_LIST.

SNOWFLAKE_IGNORE_DATABASE_LIST (string; default: DEMO_DB,SNOWFLAKE,UTIL_DB,SNOWFLAKE_SAMPLE_DATA; optional)
  Specifies a comma-separated list of database names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all databases are subject to access control. For example: testdb1,testdb2,sales_db*
  This setting supersedes any values specified by SNOWFLAKE_MANAGE_DATABASE_LIST.

SNOWFLAKE_IGNORE_SCHEMA_LIST (string; default: *.INFORMATION_SCHEMA; optional)
  Specifies a comma-separated list of schema names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all schemas are subject to access control. For example: testdb1.schema1,testdb2.schema2,sales_db*.sales*
  This setting supersedes any values specified by SNOWFLAKE_MANAGE_SCHEMA_LIST.

SNOWFLAKE_IGNORE_TABLE_LIST (string; optional)
  Specifies a comma-separated list of table names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all tables are subject to access control. Specify tables using the format <DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>.
  This setting supersedes any values specified by SNOWFLAKE_MANAGE_TABLE_LIST.

SNOWFLAKE_IGNORE_STREAM_LIST (string; optional)
  Specifies a comma-separated list of stream names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all streams are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_STREAM_LIST.

SNOWFLAKE_IGNORE_FUNCTION_LIST (string; optional)
  Specifies a comma-separated list of function names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all functions are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_FUNCTION_LIST.

SNOWFLAKE_IGNORE_PROCEDURE_LIST (string; optional)
  Specifies a comma-separated list of procedure names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all procedures are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_PROCEDURE_LIST.

SNOWFLAKE_IGNORE_SEQUENCE_LIST (string; optional)
  Specifies a comma-separated list of sequence names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all sequences are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_SEQUENCE_LIST.

SNOWFLAKE_IGNORE_FILE_FORMAT_LIST (string; optional)
  Specifies a comma-separated list of file format names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all file formats are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_FILE_FORMAT_LIST.

SNOWFLAKE_IGNORE_PIPE_LIST (string; optional)
  Specifies a comma-separated list of pipe names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all pipes are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_PIPE_LIST.

SNOWFLAKE_IGNORE_EXTERNAL_STAGE_LIST (string; optional)
  Specifies a comma-separated list of external stage names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all external stages are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_EXTERNAL_STAGE_LIST.

SNOWFLAKE_IGNORE_INTERNAL_STAGE_LIST (string; optional)
  Specifies a comma-separated list of internal stage names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all internal stages are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_INTERNAL_STAGE_LIST.
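Because each ignore list supersedes its manage list, a name matched by both is ignored. The following sketch illustrates that interaction; the database and schema names are placeholders, and the defaults for the ignore lists are taken from the table above:

```yaml
# Illustrative sketch: manage two databases, but always skip the defaults below.
# Ignore lists supersede manage lists, so a name matched by both is ignored.
SNOWFLAKE_MANAGE_DATABASE_LIST: "salesdb,hrdb"
SNOWFLAKE_IGNORE_DATABASE_LIST: "DEMO_DB,SNOWFLAKE,UTIL_DB,SNOWFLAKE_SAMPLE_DATA"
SNOWFLAKE_MANAGE_SCHEMA_LIST: "salesdb.*,hrdb.public"
SNOWFLAKE_IGNORE_SCHEMA_LIST: "*.INFORMATION_SCHEMA"
```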



User, group, and role creation

Table 3. User, group, and role creation

SNOWFLAKE_CREATE_USER (boolean; default: true; optional)
  Specifies whether PolicySync creates local users for each user in Privacera.

SNOWFLAKE_CREATE_USER_ROLE (boolean; default: true; optional)
  Specifies whether PolicySync creates local roles for each user in Privacera.

SNOWFLAKE_USER_LOGIN_NAME_USE_EMAIL (boolean; default: false; optional)
  Specifies whether PolicySync uses the user's email address as the login name when creating a new user in Snowflake.

SNOWFLAKE_DEFAULT_USER_PASSWORD (string; required)
  Specifies the password to use when PolicySync creates new users.

SNOWFLAKE_ENTITY_ROLE_PREFIX (string; default: priv_; optional)

SNOWFLAKE_USER_ROLE_PREFIX (string; optional)
  Specifies the prefix that PolicySync uses when creating roles for local users. For example, if you have a user named <USER> defined in Privacera and the role prefix is priv_user_, the local role is named priv_user_<USER>.

SNOWFLAKE_GROUP_ROLE_PREFIX (string; optional)
  Specifies the prefix that PolicySync uses when creating local roles for groups. For example, if you have a group named etl_users defined in Privacera and the role prefix is prefix_, the local role is named prefix_etl_users.

SNOWFLAKE_ROLE_ROLE_PREFIX (string; optional)
  Specifies the prefix that PolicySync uses when creating roles from Privacera in the Snowflake data source. For example, if you have a role named finance defined in Privacera and the role prefix is role_prefix_, the local role is named role_prefix_finance.

SNOWFLAKE_MANAGE_ENTITIES (boolean; default: true; optional)

SNOWFLAKE_MANAGE_USERS (boolean; optional)
  Specifies whether PolicySync maintains user membership in roles in the Snowflake data source.

SNOWFLAKE_MANAGE_GROUPS (boolean; optional)
  Specifies whether PolicySync creates groups from Privacera in the Snowflake data source.

SNOWFLAKE_MANAGE_ROLES (boolean; optional)
  Specifies whether PolicySync creates roles from Privacera in the Snowflake data source.

SNOWFLAKE_MANAGE_USER_LIST (string; optional)
  Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. If not specified, PolicySync manages access control for all users. If specified, SNOWFLAKE_IGNORE_USER_LIST takes precedence over this setting. For example: user1,user2,dev_user*

SNOWFLAKE_MANAGE_GROUP_LIST (string; optional)
  Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive. If specified, SNOWFLAKE_IGNORE_GROUP_LIST takes precedence over this setting. For example: group1,group2,dev_group*

SNOWFLAKE_MANAGE_ROLE_LIST (string; optional)
  Specifies a comma-separated list of role names for which PolicySync manages access control. If unset, access control is managed for all roles. You can use wildcards. Names are case-sensitive. If specified, SNOWFLAKE_IGNORE_ROLE_LIST takes precedence over this setting. For example: role1,role2,dev_role*

SNOWFLAKE_IGNORE_USER_LIST (string; optional)
  Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_USER_LIST.

SNOWFLAKE_IGNORE_GROUP_LIST (string; optional)
  Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_GROUP_LIST.

SNOWFLAKE_IGNORE_ROLE_LIST (string; optional)
  Specifies a comma-separated list of role names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all roles are subject to access control. This setting supersedes any values specified by SNOWFLAKE_MANAGE_ROLE_LIST.

SNOWFLAKE_USER_NAME_REPLACE_FROM_REGEX (string; optional)
  Default: [~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]
  Specifies a regular expression applied to a username; each matching character is replaced with the value specified by the SNOWFLAKE_USER_NAME_REPLACE_TO_STRING setting. If not specified, no find-and-replace operation is performed.

SNOWFLAKE_USER_NAME_REPLACE_TO_STRING (string; default: _; optional)
  Specifies the string that replaces the characters matched by the regex specified by the SNOWFLAKE_USER_NAME_REPLACE_FROM_REGEX setting. If not specified, no find-and-replace operation is performed.

SNOWFLAKE_GROUP_NAME_REPLACE_FROM_REGEX (string; optional)
  Default: [~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]
  Specifies a regular expression applied to a group name; each matching character is replaced with the value specified by the SNOWFLAKE_GROUP_NAME_REPLACE_TO_STRING setting. If not specified, no find-and-replace operation is performed.

SNOWFLAKE_GROUP_NAME_REPLACE_TO_STRING (string; default: _; optional)
  Specifies the string that replaces the characters matched by the regex specified by the SNOWFLAKE_GROUP_NAME_REPLACE_FROM_REGEX setting. If not specified, no find-and-replace operation is performed.

If not specified, no find and replace operation is performed.

SNOWFLAKE_ROLE_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a role name and replaces each matching character with the value specified by the SNOWFLAKE_ROLE_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

SNOWFLAKE_ROLE_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the SNOWFLAKE_ROLE_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

SNOWFLAKE_USER_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts user names to lowercase when creating local users. If set to true, case sensitivity is preserved.

SNOWFLAKE_GROUP_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts group names to lowercase when creating local groups. If set to true, case sensitivity is preserved.

SNOWFLAKE_ROLE_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts role names to lowercase when creating local roles. If set to true, case sensitivity is preserved.

SNOWFLAKE_USER_NAME_CASE_CONVERSION

string

lower

No

Specifies how user name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if SNOWFLAKE_USER_NAME_PERSIST_CASE_SENSITIVITY is set to true.

SNOWFLAKE_GROUP_NAME_CASE_CONVERSION

string

lower

No

Specifies how group name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if SNOWFLAKE_GROUP_NAME_PERSIST_CASE_SENSITIVITY is set to true.

SNOWFLAKE_ROLE_NAME_CASE_CONVERSION

string

lower

No

Specifies how role name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if SNOWFLAKE_ROLE_NAME_PERSIST_CASE_SENSITIVITY is set to true.

SNOWFLAKE_USER_FILTER_WITH_EMAIL

boolean

false

No

Set this property to true if you only want to manage users who have an email address associated with them in the portal.

SNOWFLAKE_MANAGE_USER_FILTERBY_GROUP

boolean

false

No

Specifies whether to manage only the users that are members of groups specified by SNOWFLAKE_MANAGE_GROUP_LIST. The default value is false.

SNOWFLAKE_MANAGE_USER_FILTERBY_ROLE

boolean

false

No

Specifies whether to manage only users that are members of the roles specified by SNOWFLAKE_MANAGE_ROLE_LIST. The default value is false.

SNOWFLAKE_USER_ROLE_USE_UPPERCASE

boolean

false

No

Specifies whether PolicySync converts a user role name to uppercase when performing operations.

SNOWFLAKE_GROUP_ROLE_USE_UPPERCASE

boolean

false

No

Specifies whether PolicySync converts a group name to uppercase when performing operations.

SNOWFLAKE_ROLE_ROLE_USE_UPPERCASE

boolean

false

No

Specifies whether PolicySync converts a role name to uppercase when performing operations.
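Taken together, the principal-management properties above might be combined in the PolicySync Snowflake custom-vars file. The following is a minimal sketch; all list values and toggles are illustrative, not recommended settings:

```yaml
# Manage only these principals; the IGNORE lists win where they overlap.
SNOWFLAKE_MANAGE_USER_LIST: "user1,user2,dev_user*"
SNOWFLAKE_MANAGE_GROUP_LIST: "group1,group2,dev_group*"
SNOWFLAKE_MANAGE_ROLE_LIST: "role1,role2,dev_role*"
SNOWFLAKE_IGNORE_USER_LIST: "svc_user*"

# Characters matched by the FROM_REGEX default are replaced with "_".
SNOWFLAKE_USER_NAME_REPLACE_TO_STRING: "_"

# Preserve case instead of lowercasing created principals.
SNOWFLAKE_USER_NAME_PERSIST_CASE_SENSITIVITY: "true"
SNOWFLAKE_USER_NAME_CASE_CONVERSION: "none"
```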



Grant updates

Table 4. Grant updates

Name

Type

Default

Required

Description

SNOWFLAKE_GRANT_UPDATES

boolean

true

No

Specifies whether PolicySync performs grants and revokes for access control and runs create, update, and delete queries for users, groups, and roles. The default value is true.

string

No

Specifies whether PolicySync applies grants and revokes in batches. If enabled, this behavior improves overall performance of applying permission changes.

SNOWFLAKE_GRANT_UPDATES_MAX_RETRY_ATTEMPTS

integer

2

No

Specifies the maximum number of attempts that PolicySync makes to execute a grant query if it is unable to do so successfully. The default value is 2.

SNOWFLAKE_ENABLE_PRIVILEGES_BATCHING

boolean

false

No

Specifies whether PolicySync applies privileges described in Access Manager policies.
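As a sketch, the grant-update properties above could be set in custom-vars form as follows; the retry count of 3 is an illustrative override of the default:

```yaml
# Allow PolicySync to execute grant/revoke statements (default: true).
SNOWFLAKE_GRANT_UPDATES: "true"
# Retry a failed grant query up to 3 times instead of the default 2.
SNOWFLAKE_GRANT_UPDATES_MAX_RETRY_ATTEMPTS: "3"
SNOWFLAKE_ENABLE_PRIVILEGES_BATCHING: "false"
```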



Column level access control

Table 5. Column level access control

Name

Type

Default

Required

Description

SNOWFLAKE_ENABLE_COLUMN_ACCESS_EXCEPTION

boolean

true

No

Specifies whether an access denied exception is displayed if a user does not have access to a table column and attempts to access that column.

If enabled, you must set SNOWFLAKE_ENABLE_MASKING to true.



Native masking

Table 6. Native masking

Name

Type

Default

Required

Description

SNOWFLAKE_ENABLE_MASKING

boolean

true

No

Specifies whether PolicySync enables native masking policy creation functionality.

SNOWFLAKE_MASKING_POLICY_DB_NAME

string

No

Specifies the name of the database where PolicySync creates custom masking policies.

SNOWFLAKE_MASKING_POLICY_SCHEMA_NAME

string

PUBLIC

No

Specifies the name of the schema where PolicySync creates all native masking policies. If not specified, the resource schema is used as the masking policy schema.

SNOWFLAKE_MASKING_POLICY_NAME_TEMPLATE

string

{database}{separator}{schema}{separator}{table}

No

Specifies a naming template that PolicySync uses when creating native masking policies. For example, given the following values:

  • {database}: customer_db

  • {schema}: customer_schema

  • {table}: customer_data

  • {separator} _priv_

With the default naming template, the following name is used when creating a native masking policy. The {column} field is replaced by the column name.

customer_db_priv_customer_schema_priv_customer_data_{column}
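The native masking properties above might be set as follows; PRIVACERA_DB is a hypothetical database name chosen only for illustration:

```yaml
SNOWFLAKE_ENABLE_MASKING: "true"
# Hypothetical database/schema to hold the generated masking policies.
SNOWFLAKE_MASKING_POLICY_DB_NAME: "PRIVACERA_DB"
SNOWFLAKE_MASKING_POLICY_SCHEMA_NAME: "PUBLIC"
# Default naming template, shown explicitly.
SNOWFLAKE_MASKING_POLICY_NAME_TEMPLATE: "{database}{separator}{schema}{separator}{table}"
```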



Native row filter

Table 7. Native row filter

Name

Type

Default

Required

Description

SNOWFLAKE_ENABLE_ROW_FILTER

boolean

false

No

Specifies whether to use the data source native row filter functionality. This setting is disabled by default. When enabled, you can create row filters only on tables, but not on views.

SNOWFLAKE_ROW_FILTER_POLICY_DB_NAME

string

No

Specifies the name of the database where PolicySync creates native row-filter policies. If not specified, the resource database is considered the same as the row-filter policy database.

SNOWFLAKE_ROW_FILTER_POLICY_SCHEMA_NAME

string

PUBLIC

No

Specifies the name of the schema where PolicySync creates all native row-filter policies. If not specified, the resource schema is considered the same as the row-filter policy schema.

SNOWFLAKE_ROW_FILTER_POLICY_NAME_TEMPLATE

string

{database}{separator}{schema}{separator}{table}

No

Specifies a template for the name that PolicySync uses when creating a row filter policy. For example, given a table named data in the schema schema of the db database, the row filter policy name might resemble the following:

db_priv_schema_priv_data_<ROW_FILTER_ITEM_NUMBER>
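A corresponding custom-vars sketch for the native row filter properties above; PRIVACERA_DB is a hypothetical database name used for illustration:

```yaml
SNOWFLAKE_ENABLE_ROW_FILTER: "true"
# Hypothetical database/schema to hold the generated row filter policies.
SNOWFLAKE_ROW_FILTER_POLICY_DB_NAME: "PRIVACERA_DB"
SNOWFLAKE_ROW_FILTER_POLICY_SCHEMA_NAME: "PUBLIC"
```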



View based masking/row filter

Table 8. View based masking/row filter

Name

Type

Default

Required

Description

SNOWFLAKE_ENABLE_VIEW_BASED_ROW_FILTER

boolean

false

No

Specifies whether to use secure view based row filtering. The default value is false.

While Snowflake supports native filtering, PolicySync provides additional functionality that is not available natively. Enabling this setting is recommended.

SNOWFLAKE_ENABLE_VIEW_BASED_MASKING

boolean

false

No

Specifies whether to use secure view based masking. The default value is false.

SNOWFLAKE_SECURE_VIEW_SCHEMA_NAME_PREFIX

string

No

Specifies a prefix string to apply to a secure schema name. By default view-based row filter and masking-related secure views have the same schema name as the table schema name.

If you want to change the secure view schema name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view schema name for a schema named example1 is dev_example1.

SNOWFLAKE_SECURE_VIEW_SCHEMA_NAME_POSTFIX

string

No

Specifies a postfix string to apply to a secure view schema name. By default view-based row filter and masking-related secure views have the same schema name as the table schema name.

If you want to change the secure view schema name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view name for a schema named example1 is example1_dev.

SNOWFLAKE_SECURE_VIEW_NAME_PREFIX

string

No

Specifies a prefix string for secure view names. By default, view-based row filter and masking-related secure views have the same name as the underlying table.

If you want to change the secure view name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view name for a table named example1 is dev_example1.

SNOWFLAKE_SECURE_VIEW_NAME_POSTFIX

string

_SECURE

No

Specifies a postfix string for secure view names. By default, view-based row filter and masking-related secure views have the same name as the underlying table.

If you want to change the secure view name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view name for a table named example1 is example1_dev.

SNOWFLAKE_SECURE_VIEW_SCHEMA_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a schema name. For example, if a schema is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes.

SNOWFLAKE_SECURE_VIEW_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a table or view name. For example, if the table is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes.

SNOWFLAKE_SECURE_VIEW_CREATE_FOR_ALL

boolean

false

No

Specifies whether to create secure views for all tables and views that are created by users. If enabled, PolicySync creates secure views for resources regardless of whether masking or filtering policies are enabled.
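A sketch combining the view-based masking/row filter toggles with the naming properties above. The prefix and postfix values are illustrative, and the comment assumes the schema and view affixes compose as described in the individual property descriptions:

```yaml
SNOWFLAKE_ENABLE_VIEW_BASED_ROW_FILTER: "true"
SNOWFLAKE_ENABLE_VIEW_BASED_MASKING: "true"
# Secure views for schema example1 would land in schema example1_dev,
# and table example1 would get a secure view named example1_SECURE.
SNOWFLAKE_SECURE_VIEW_SCHEMA_NAME_POSTFIX: "_dev"
SNOWFLAKE_SECURE_VIEW_NAME_POSTFIX: "_SECURE"
```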



Masking/Row filter policy name separator

Table 9. Masking/Row filter policy name separator

Name

Type

Default

Required

Description

SNOWFLAKE_POLICY_NAME_SEPARATOR

string

_PRIV_

No

Specifies a string to use as part of the name of native row filter and masking policies.

SNOWFLAKE_ROW_FILTER_ALIAS_TOKEN

string

obj

No

Specifies an identifier that PolicySync uses to identify columns from the main table and parse each correctly.



Masked Value for Masking

Table 10. Masked Value for Masking

Name

Type

Default

Required

Description

SNOWFLAKE_MASKED_NUMBER_VALUE

integer

0

No

Specifies the default masking value for numeric column types.

SNOWFLAKE_MASKED_DOUBLE_VALUE

integer

0

No

Specifies the default masking value for DOUBLE column types.

SNOWFLAKE_MASKED_TEXT_VALUE

string

<MASKED>

No

Specifies the default masking value for text and string column types.

POLICYSYNC_V2_MASKED_DATE_VALUE

string

No

Specifies the default masking value for date column types.
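The masked-value defaults above can be overridden as follows; the date value shown is purely illustrative, since the expected date format is not documented here:

```yaml
SNOWFLAKE_MASKED_NUMBER_VALUE: "0"
SNOWFLAKE_MASKED_DOUBLE_VALUE: "0"
SNOWFLAKE_MASKED_TEXT_VALUE: "<MASKED>"
# Illustrative date value; confirm the expected format for your deployment.
POLICYSYNC_V2_MASKED_DATE_VALUE: "1970-01-01"
```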



PEG integration

Table 11. PEG integration

Name

Type

Default

Required

Description

SNOWFLAKE_PEG_FUNCTION_DB

string

No

Specifies the name of the database where the PEG encryption functions reside.

SNOWFLAKE_PEG_FUNCTION_SCHEMA

string

public

No

Specifies the schema name where the PEG encryption functions reside.



Load sql queries from system config json file

Table 12. Load sql queries from system config json file

Name

Type

Default

Required

Description

SNOWFLAKE_LOAD_RESOURCES_KEY

string

load_md_from_account_columns

No

Specifies how PolicySync loads resources from Snowflake. The following values are allowed:

  • load_md: Load the resources using metadata queries.

  • load_md_from_account_columns: Load resources by directly running SHOW QUERIES on the account. This mode is preferred when you want to manage an entire Snowflake account.

  • load_md_from_database_columns: Load the resources by directly running SHOW QUERIES only on managed databases. This mode is preferred when you want to manage only a few databases.
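The resource-loading modes above can be selected in custom-vars form; the sketch below shows the account-wide mode, with the per-database alternative commented out:

```yaml
# Manage an entire Snowflake account: run SHOW queries at account scope.
SNOWFLAKE_LOAD_RESOURCES_KEY: "load_md_from_account_columns"
# Or, when only a few databases are managed:
# SNOWFLAKE_LOAD_RESOURCES_KEY: "load_md_from_database_columns"
```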



Audit integration

Table 13. Audit integration

Name

Type

Default

Required

Description

SNOWFLAKE_AUDIT_ENABLE

boolean

true

Yes

Specifies whether Privacera fetches access audit data from the data source.

SNOWFLAKE_ENABLE_AUDIT_SOURCE_SIMPLE

boolean

true

No

Specifies whether to enable simple auditing. When enabled, PolicySync gathers the following audit information from the database:

  • RequestData (query text)

  • AccessResult (execute status)

  • AccessType (query type)

  • User (username)

  • ResourcePath (database_name.schema_name)

  • EventTime (query time)

  • AclEnforcer (connector name)

If you enable this setting, do not enable SNOWFLAKE_ENABLE_AUDIT_SOURCE_ADVANCE.

SNOWFLAKE_ENABLE_AUDIT_SOURCE_ADVANCE

boolean

false

No

Specifies whether to enable advanced auditing. When enabled, PolicySync gathers the following audit information from the database:

  • AccessResult (execute status)

  • AccessType (query type)

  • User (username)

  • ResourcePath (database_name.schema_name.column_names)

  • EventTime (query time)

  • AclEnforcer (connector name)

If you enable this setting, do not enable SNOWFLAKE_ENABLE_AUDIT_SOURCE_SIMPLE.

SNOWFLAKE_AUDIT_ENABLE_RESOURCE_FILTER

boolean

No

Specifies whether PolicySync filters access audit information by managed resources, such as databases, schemas, and so forth.

SNOWFLAKE_AUDIT_INITIAL_PULL_MINUTES

string

30

No

Specifies the initial delay, in minutes, before PolicySync retrieves access audits from Snowflake.

SNOWFLAKE_AUDIT_SOURCE_ADVANCE_DB_NAME

string

PRIVACERA_ACCESS_LOGS_DB

No

Specifies the database that PolicySync retrieves access audits from. This setting applies only if you set SNOWFLAKE_ENABLE_AUDIT_SOURCE_ADVANCE to true.
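A sketch enabling simple auditing from the properties above; note that the simple and advanced audit sources are mutually exclusive, so only one is set to true:

```yaml
SNOWFLAKE_AUDIT_ENABLE: "true"
# Enable exactly one audit source; simple and advanced are mutually exclusive.
SNOWFLAKE_ENABLE_AUDIT_SOURCE_SIMPLE: "true"
SNOWFLAKE_ENABLE_AUDIT_SOURCE_ADVANCE: "false"
SNOWFLAKE_AUDIT_INITIAL_PULL_MINUTES: "30"
```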



Load intervals

Table 14. Load intervals

Name

Type

Default

Required

Description

SNOWFLAKE_RESOURCE_SYNC_INTERVAL

integer

60

No

Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.

SNOWFLAKE_PRINCIPAL_SYNC_INTERVAL

integer

420

No

Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.

SNOWFLAKE_PERMISSION_SYNC_INTERVAL

integer

60

No

Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions in the data source accordingly.

SNOWFLAKE_AUDIT_SYNC_INTERVAL

integer

30

No

Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.
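For reference, the interval defaults above written out explicitly in custom-vars form; all values are in seconds:

```yaml
# All intervals are in seconds.
SNOWFLAKE_RESOURCE_SYNC_INTERVAL: "60"
SNOWFLAKE_PRINCIPAL_SYNC_INTERVAL: "420"
SNOWFLAKE_PERMISSION_SYNC_INTERVAL: "60"
SNOWFLAKE_AUDIT_SYNC_INTERVAL: "30"
```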



Object Permission Mapping

For more information about object permission mapping, see Snowflake Documentation.

Global

  • CreateWarehouse: Enables creating a new virtual warehouse.

  • CreateDatabase: Enables creating a new database in the system.

Warehouse

  • UseWarehouse: Enables using a virtual warehouse and, as a result, executing queries on the warehouse.

  • Operate: Enables changing the state of a warehouse (stop, start, suspend, resume).

  • Monitor: Enables viewing current and past queries executed on a warehouse as well as usage statistics on that warehouse.

  • Modify: Enables altering any properties of a warehouse, including changing its size.

Database

  • UseDB: Enables using a database, including returning the database details in the SHOW DATABASES command output.

  • CreateSchema: Enables creating a new schema in a database, including cloning a schema.

Schema

  • UseSchema: Enables using a schema, including returning the schema details in the SHOW SCHEMAS command output.

  • CreateTable: Enables creating a new table in a schema, including cloning a table.

  • CreateProcedure: Enables creating a new stored procedure in a schema.

  • CreateFunction: Enables creating a new UDF or external function in a schema.

  • CreateStream: Enables creating a new stream in a schema, including cloning a stream.

  • CreateSequence: Enables creating a new sequence in a schema, including cloning a sequence.

  • CreateFileFormat: Enables creating a new file format in a schema, including cloning a file format.

  • CreateStage: Enables creating a new stage in a schema, including cloning a stage.

  • CreatePipe: Enables creating a new pipe in a schema.

  • CreateExternalTable: Enables creating a new external table in a schema.

Table

  • Select: Enables executing a SELECT statement on a table.

  • Insert: Enables executing an INSERT command on a table.

  • Update: Enables executing an UPDATE command on a table.

  • Delete: Enables executing a DELETE command on a table.

  • Truncate: Enables executing a TRUNCATE TABLE command on a table.

  • References: Enables referencing a table as the unique/primary key table for a foreign key constraint.

View

  • Select: Enables executing a SELECT statement on a view.

Procedure

  • Usage: Enables calling a stored procedure.

Function

  • Usage: Enables calling a function.

Stream

  • Select: Enables executing a SELECT statement on a stream.

File_Format

  • Usage: Enables using a file format in a SQL statement.

Sequence

  • Usage: Enables using a sequence in a SQL statement.

Internal_Stage

  • Read: Enables performing any operations that require reading from an internal stage (GET, LIST, COPY INTO <table>).

  • Write: Enables performing any operations that require writing to an internal stage (PUT, REMOVE, COPY INTO <location>).

External_Stage

  • Usage: Enables using an external stage object in a SQL statement.

Pipe

  • Operate: Enables viewing details for the pipe (using DESCRIBE PIPE or SHOW PIPES), pausing or resuming the pipe, and refreshing the pipe.

  • Monitor: Enables viewing details for the pipe (using DESCRIBE PIPE or SHOW PIPES).

Changing the owner of a table

By default, Privacera’s PolicySync changes the ownership of all resources (databases and tables) to Privacera’s Admin Roles. The reasoning behind this is that there must be a single entity to manage the privileges for the resource. If the owner is not changed, then the user who created the table could also modify the privileges. This could cause inconsistencies in the privileges and even lead to cases where the owner might involuntarily drop security policies like column masking/row-level filtering or provide excessive permissions to unauthorized users.

In Snowflake, there is a limitation in its privilege model, where DROP privileges can’t be given to specific users. Instead, to drop tables, the user must be the owner of the table or must have an Account Admin Role.

When a user creates a table in Snowflake, the owner defaults to the database role of the creating user. After the table is created, Privacera forcibly changes the owner to the PRIVACERA_POLICYSYNC_ROLE role, and all users associated with this role become owners of the table.

To change the ownership, do the following:

  1. Edit the following file:

    vi config/custom-vars/vars.PolicySync.snowflake.yml
  2. Add the SNOWFLAKE_OWNER_ROLE property and enter the PRIVACERA_POLICYSYNC_ROLE role.

    SNOWFLAKE_OWNER_ROLE: "PRIVACERA_POLICYSYNC_ROLE"

    If you do not want to change the ownership, leave it blank.

  3. Run the update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update

    Note

When a new object is created by a managed user/group/role and is detected by PolicySync, PolicySync changes the ownership of that object to the role specified in the SNOWFLAKE_OWNER_ROLE property.

Redshift

This topic covers how you can configure PolicySync Redshift access control using Privacera Manager.

CLI Configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.policysync.redshift.yml config/custom-vars/
    vi config/custom-vars/vars.policysync.redshift.yml
  3. Set the properties for your specific installation. For property details and description, see the Configuration Properties section that follows.

    Note

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, see Redshift Connector.

  4. Run the following commands.

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
Configuration Properties

JDBC configuration

Table 15. JDBC configuration

Name

Type

Default

Required

Description

REDSHIFT_JDBC_URL

string

Yes

Specifies the JDBC URL for the Amazon Redshift connector.

REDSHIFT_JDBC_USERNAME

string

Yes

Specifies the JDBC username to use.

For PolicySync to push policies to Amazon Redshift, this user must have superuser privileges.

REDSHIFT_JDBC_PASSWORD

string

Yes

Specifies the JDBC password to use.

REDSHIFT_JDBC_DB

string

Yes

Specifies the name of the JDBC database to use.

PolicySync also uses the connection to this database to load metadata and create principals such as users and groups.

REDSHIFT_DEFAULT_USER_PASSWORD

string

Yes

Specifies the password to use when PolicySync creates new users.

The password must meet the following requirements:

  • It must be between 8 and 64 characters long.

  • It must contain at least one uppercase letter, one lowercase letter, and one number.

  • It can use any ASCII character with the ASCII codes 33–126, except the single quote ('), double quote ("), comma (,), slash (/), and at sign (@).

REDSHIFT_OWNER_ROLE

string

No

Specifies the role that owns the resources managed by PolicySync. You must ensure that this role exists, as PolicySync does not create it.

  • If a value is not specified, resources are owned by the creating user. In this case, the owner of the resource will have all access to the resource.

  • If a value is specified, the owner of the resource will be changed to the specified value.

The following resource types are supported:

  • Database

  • Schemas

  • Tables

  • Views
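A sketch of the JDBC properties above for the Redshift custom-vars file; the endpoint, database, user, password placeholders, and role name are all hypothetical:

```yaml
# Hypothetical endpoint; the JDBC user must have superuser privileges.
REDSHIFT_JDBC_URL: "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439"
REDSHIFT_JDBC_USERNAME: "policysync_admin"
REDSHIFT_JDBC_PASSWORD: "<JDBC_PASSWORD>"
REDSHIFT_JDBC_DB: "dev"
# Must satisfy the password rules listed above.
REDSHIFT_DEFAULT_USER_PASSWORD: "<DEFAULT_USER_PASSWORD>"
# Pre-existing role; PolicySync does not create it.
REDSHIFT_OWNER_ROLE: "privacera_owner_role"
```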



Load keys and intervals

Table 16. Load keys and intervals

Name

Type

Default

Required

Description

REDSHIFT_LOAD_RESOURCES_KEY

string

load_from_database_columns

No

Specifies how PolicySync loads resources from Amazon Redshift. The following values are allowed:

  • load_md: Load resources from Amazon Redshift with a top-down approach; that is, it first loads the databases, then the schemas, followed by tables and their columns.

  • load_from_database_columns: Load resources one by one for each resource type; that is, it loads all databases first, then all schemas in all databases, followed by all tables in all schemas and their columns. This mode is recommended because it is faster than load_md.

REDSHIFT_RESOURCE_SYNC_INTERVAL

integer

60

No

Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.

REDSHIFT_PRINCIPAL_SYNC_INTERVAL

integer

420

No

Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.

REDSHIFT_PERMISSION_SYNC_INTERVAL

integer

540

No

Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions in the data source accordingly.

REDSHIFT_AUDIT_SYNC_INTERVAL

integer

30

No

Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.



Resources management

Table 17. Resources management

Name

Type

Default

Required

Description

REDSHIFT_MANAGE_DATABASE_LIST

string

No

Specifies a comma-separated list of database names for which PolicySync manages access control. If unset, access control is managed for all databases. You can use wildcards. Names are case-sensitive.

An example list of databases might resemble the following: testdb1,testdb2,sales_db*.

If REDSHIFT_IGNORE_DATABASE_LIST is also specified, it takes precedence over this setting.

REDSHIFT_MANAGE_SCHEMA_LIST

string

No

Specifies a comma-separated list of schema names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

Use the following format when specifying a schema:

<DATABASE_NAME>.<SCHEMA_NAME>

If REDSHIFT_IGNORE_SCHEMA_LIST is also specified, it takes precedence over this setting.

If you specify a wildcard, such as in the following example, all schemas are managed:

<DATABASE_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all schemas.

  • If set to none, no schemas are managed.

REDSHIFT_MANAGE_TABLE_LIST

string

No

Specifies a comma-separated list of table names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

Use the following format when specifying a table:

<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>

If REDSHIFT_IGNORE_TABLE_LIST is also specified, it takes precedence over this setting.

If you specify a wildcard, such as in the following example, all matched tables are managed:

<DATABASE_NAME>.<SCHEMA_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all tables.

  • If set to none, no tables are managed.

REDSHIFT_IGNORE_DATABASE_LIST

string

No

Specifies a comma-separated list of database names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all databases are subject to access control.

For example:

testdb1,testdb2,sales_db*

This setting supersedes any values specified by REDSHIFT_MANAGE_DATABASE_LIST.

REDSHIFT_IGNORE_SCHEMA_LIST

string

No

Specifies a comma-separated list of schema names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all schemas are subject to access control.

For example:

testdb1.schema1,testdb2.schema2,sales_db*.sales*

This setting supersedes any values specified by REDSHIFT_MANAGE_SCHEMA_LIST.

REDSHIFT_IGNORE_TABLE_LIST

string

No

Specifies a comma-separated list of table names that PolicySync does not provide access control for. You can specify wildcards. If not specified, all tables are subject to access control. Names are case-sensitive. Specify tables using the following format:

<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>

This setting supersedes any values specified by REDSHIFT_MANAGE_TABLE_LIST.
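Combining the resource-scoping properties above, a custom-vars sketch for Redshift; the database, schema, and table names are illustrative:

```yaml
# Scope access control to specific resources; the IGNORE lists win on overlap.
REDSHIFT_MANAGE_DATABASE_LIST: "testdb1,testdb2,sales_db*"
# Schemas use <DATABASE_NAME>.<SCHEMA_NAME> format.
REDSHIFT_MANAGE_SCHEMA_LIST: "testdb1.schema1,sales_db*.sales*"
# Tables use <DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME> format.
REDSHIFT_MANAGE_TABLE_LIST: "testdb1.schema1.*"
REDSHIFT_IGNORE_SCHEMA_LIST: "testdb1.temp_schema"
```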



Users/Groups/Roles management

Table 18. Users/Groups/Roles management

Name

Type

Default

Required

Description

REDSHIFT_USER_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a username and replaces each matching character with the value specified by the REDSHIFT_USER_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

REDSHIFT_USER_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the REDSHIFT_USER_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

REDSHIFT_GROUP_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a group and replaces each matching character with the value specified by the REDSHIFT_GROUP_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

REDSHIFT_GROUP_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the REDSHIFT_GROUP_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

REDSHIFT_ROLE_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a role name and replaces each matching character with the value specified by the REDSHIFT_ROLE_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

REDSHIFT_ROLE_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the REDSHIFT_ROLE_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

REDSHIFT_USER_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether Amazon Redshift supports case sensitivity for users. Because case sensitivity in Amazon Redshift is global, enabling this enables case sensitivity for users, groups, roles, and resources.

REDSHIFT_GROUP_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether Amazon Redshift supports case sensitivity for groups. Because case sensitivity in Amazon Redshift is global, enabling this enables case sensitivity for users, groups, roles, and resources.

REDSHIFT_ROLE_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether Amazon Redshift supports case sensitivity for roles. Because case sensitivity in Amazon Redshift is global, enabling this enables case sensitivity for users, groups, roles, and resources.

REDSHIFT_ENABLE_CASE_SENSITIVE_IDENTIFIER

boolean

false

No

Specifies whether Amazon Redshift preserves case for user, group, role, and resource names. By default, Amazon Redshift converts all user, group, role, and resource names to lowercase. If set to true, PolicySync enables case sensitivity on a per connection basis.

REDSHIFT_ENABLE_CASE_SENSITIVE_IDENTIFIER_QUERY

string

SET enable_case_sensitive_identifier=true;

No

Specifies a query for Amazon Redshift that enables case sensitivity per connection. If you enable REDSHIFT_ENABLE_CASE_SENSITIVE_IDENTIFIER, then this setting defines the query that PolicySync runs.

REDSHIFT_USER_NAME_CASE_CONVERSION

string

lower

No

Specifies how user name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if REDSHIFT_USER_NAME_PERSIST_CASE_SENSITIVITY is set to true.

REDSHIFT_GROUP_NAME_CASE_CONVERSION

string

lower

No

Specifies how group name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if REDSHIFT_GROUP_NAME_PERSIST_CASE_SENSITIVITY is set to true.

REDSHIFT_ROLE_NAME_CASE_CONVERSION

string

lower

No

Specifies how role name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if REDSHIFT_ROLE_NAME_PERSIST_CASE_SENSITIVITY is set to true.
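Taken together with the persist-case-sensitivity flags, the case-conversion logic can be sketched as follows. This is an illustration of the documented behavior, not the actual PolicySync implementation; the default-lowercasing branch reflects the Redshift behavior described under REDSHIFT_ENABLE_CASE_SENSITIVE_IDENTIFIER:

```python
def convert_case(name: str, conversion: str = "lower",
                 persist_case_sensitivity: bool = False) -> str:
    """Apply a *_CASE_CONVERSION option; it takes effect only when the
    matching *_PERSIST_CASE_SENSITIVITY flag is true."""
    if not persist_case_sensitivity:
        # Default behavior: Amazon Redshift lowercases identifiers.
        return name.lower()
    if conversion == "lower":
        return name.lower()
    if conversion == "upper":
        return name.upper()
    return name  # "none": preserve case

print(convert_case("Sales_Team", "none", persist_case_sensitivity=True))   # Sales_Team
print(convert_case("Sales_Team", "upper", persist_case_sensitivity=True))  # SALES_TEAM
```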

REDSHIFT_CREATE_USER

boolean

true

No

Specifies whether PolicySync creates local users for each user in Privacera.

REDSHIFT_CREATE_USER_ROLE

boolean

true

No

Specifies whether PolicySync creates local roles for each user in Privacera.

REDSHIFT_MANAGE_USERS

boolean

true

No

Specifies whether PolicySync maintains user membership in roles in the Amazon Redshift data source.

REDSHIFT_MANAGE_GROUPS

boolean

true

No

Specifies whether PolicySync creates groups from Privacera in the Amazon Redshift data source.

REDSHIFT_MANAGE_ROLES

boolean

true

No

Specifies whether PolicySync creates roles from Privacera in the Amazon Redshift data source.

REDSHIFT_MANAGE_USER_LIST

string

No

Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

If not specified, PolicySync manages access control for all users.

If specified, REDSHIFT_IGNORE_USER_LIST takes precedence over this setting.

An example user list might resemble the following: user1,user2,dev_user*.

REDSHIFT_MANAGE_GROUP_LIST

string

No

Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive.

An example group list might resemble the following: group1,group2,dev_group*.

If specified, REDSHIFT_IGNORE_GROUP_LIST takes precedence over this setting.

REDSHIFT_MANAGE_ROLE_LIST

string

No

Specifies a comma-separated list of role names for which PolicySync manages access control. If unset, access control is managed for all roles. You can use wildcards. Names are case-sensitive.

An example role list might resemble the following: role1,role2,dev_role*.

If specified, REDSHIFT_IGNORE_ROLE_LIST takes precedence over this setting.

REDSHIFT_IGNORE_USER_LIST

string

No

Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control.

This setting supersedes any values specified by REDSHIFT_MANAGE_USER_LIST.

REDSHIFT_IGNORE_GROUP_LIST

string

No

Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control.

This setting supersedes any values specified by REDSHIFT_MANAGE_GROUP_LIST.

REDSHIFT_IGNORE_ROLE_LIST

string

No

Specifies a comma-separated list of role names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all roles are subject to access control.

This setting supersedes any values specified by REDSHIFT_MANAGE_ROLE_LIST.
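The manage/ignore precedence described above can be sketched with fnmatch-style wildcards. This is an illustration of the documented rules, not PolicySync's actual matcher:

```python
from fnmatch import fnmatchcase

def is_managed(name: str, manage_list: str = "", ignore_list: str = "") -> bool:
    """Decide whether PolicySync manages a principal, per the documented
    precedence: the ignore list supersedes the manage list, and an empty
    manage list means "manage everything". Matching is case-sensitive."""
    ignore = [p for p in ignore_list.split(",") if p]
    if any(fnmatchcase(name, p) for p in ignore):
        return False
    manage = [p for p in manage_list.split(",") if p]
    return not manage or any(fnmatchcase(name, p) for p in manage)

print(is_managed("dev_user1", manage_list="user1,user2,dev_user*"))               # True
print(is_managed("dev_user1", manage_list="dev_user*", ignore_list="dev_user1"))  # False
```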

REDSHIFT_USER_ROLE_PREFIX

string

priv_user_

No

Specifies the prefix that PolicySync uses when creating local users. For example, if you have a user named <USER> defined in Privacera and the role prefix is priv_user_, the local role is named priv_user_<USER>.

REDSHIFT_GROUP_ROLE_PREFIX

string

priv_group_

No

Specifies the prefix that PolicySync uses when creating local roles. For example, if you have a group named etl_users defined in Privacera and the role prefix is prefix_, the local role is named prefix_etl_users.

REDSHIFT_ROLE_ROLE_PREFIX

string

priv_role_

No

Specifies the prefix that PolicySync uses when creating roles from Privacera in the Amazon Redshift data source.

For example, if you have a role named finance defined in Privacera and the role prefix is role_prefix_, the local role is named role_prefix_finance.

REDSHIFT_USE_NATIVE_PUBLIC_GROUP

boolean

true

No

Specifies whether PolicySync uses the Amazon Redshift native public group for access grants whenever a policy refers to a public group. The default value is true.

REDSHIFT_MANAGE_USER_FILTERBY_GROUP

boolean

false

No

Specifies whether to manage only the users that are members of groups specified by REDSHIFT_MANAGE_GROUP_LIST. The default value is false.

REDSHIFT_MANAGE_USER_FILTERBY_ROLE

boolean

false

No

Specifies whether to manage only users that are members of the roles specified by REDSHIFT_MANAGE_ROLE_LIST. The default value is false.



Access control management

Table 19. Access control management

Name

Type

Default

Required

Description

REDSHIFT_ENABLE_VIEW_BASED_MASKING

boolean

true

No

Specifies whether to use secure view based masking. The default value is true.

REDSHIFT_ENABLE_VIEW_BASED_ROW_FILTER

boolean

true

No

Specifies whether to use secure view based row filtering. The default value is true.

While Amazon Redshift supports native filtering, PolicySync provides additional functionality that is not available natively. Enabling this setting is recommended.

REDSHIFT_SECURE_VIEW_CREATE_FOR_ALL

boolean

true

No

Specifies whether to create secure views for all tables and views that are created by users. If enabled, PolicySync creates secure views for resources regardless of whether masking or filtering policies are enabled.

REDSHIFT_MASKED_NUMBER_VALUE

integer

0

No

Specifies the default masking value for numeric column types.

REDSHIFT_MASKED_TEXT_VALUE

string

<MASKED>

No

Specifies the default masking value for text and string column types.

REDSHIFT_SECURE_VIEW_NAME_PREFIX

string

No

Specifies a prefix string for secure view names. By default, a view-based row filter or masking secure view has the same name as the underlying table or view.

To change the secure view name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view name for a table named example1 is dev_example1.

REDSHIFT_SECURE_VIEW_NAME_POSTFIX

string

_secure

No

Specifies a postfix string for secure view names. By default, a view-based row filter or masking secure view has the same name as the underlying table or view.

To change the secure view name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view name for a table named example1 is example1_dev.

REDSHIFT_SECURE_VIEW_SCHEMA_NAME_PREFIX

string

No

Specifies a prefix string to apply to a secure schema name. By default view-based row filter and masking-related secure views have the same schema name as the table schema name.

If you want to change the secure view schema name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view schema name for a schema named example1 is dev_example1.

REDSHIFT_SECURE_VIEW_SCHEMA_NAME_POSTFIX

string

No

Specifies a postfix string to apply to a secure view schema name. By default view-based row filter and masking-related secure views have the same schema name as the table schema name.

If you want to change the secure view schema name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view schema name for a schema named example1 is example1_dev.

REDSHIFT_SECURE_VIEW_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a table or view name. For example, if the table is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma separated list of suffixes.

REDSHIFT_SECURE_VIEW_SCHEMA_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a schema name. For example, if a schema is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma separated list of suffixes.
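A sketch of how the naming settings above combine, under the assumption that suffix removal runs before the prefix and postfix are applied (as the REMOVE_SUFFIX_LIST description states):

```python
def secure_view_name(table: str, prefix: str = "", postfix: str = "_secure",
                     remove_suffixes: str = "") -> str:
    """Compose a secure view name: strip the first matching listed suffix,
    then apply the configured prefix/postfix (defaults mirror the table above)."""
    for suffix in filter(None, remove_suffixes.split(",")):
        if table.endswith(suffix):
            table = table[: -len(suffix)]
            break
    return f"{prefix}{table}{postfix}"

print(secure_view_name("example1"))                                   # example1_secure
print(secure_view_name("example_suffix", remove_suffixes="_suffix"))  # example_secure
```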

REDSHIFT_GRANT_UPDATES

boolean

false

No

Specifies whether PolicySync executes grant and revoke statements for access control, as well as the create, update, and delete queries for users, groups, and roles. The default value is false.

REDSHIFT_GRANT_UPDATES_MAX_RETRY_ATTEMPTS

integer

2

No

Specifies the maximum number of attempts that PolicySync makes to execute a grant query if it is unable to do so successfully. The default value is 2.

REDSHIFT_ENABLE_DATA_ADMIN

boolean

true

No

Specifies whether the data admin feature is enabled. With this feature enabled, you create all policies on native tables/views, and the corresponding grants are made on the secure views of those native tables/views. These secure views support row filtering and masking. If you need to grant permission on a native table/view itself, select the desired permission plus data admin in the policy; the permissions are then granted on both the native table/view and its secure view.



Access audits management

Table 20. Access audits management

Name

Type

Default

Required

Description

REDSHIFT_AUDIT_ENABLE

boolean

false

No

Specifies whether Privacera fetches access audit data from the data source.

REDSHIFT_AUDIT_EXCLUDED_USERS

string

REDSHIFT_JDBC_USERNAME

No

Specifies a comma separated list of users to exclude when fetching access audits. For example: "user1,user2,user3".

REDSHIFT_AUDIT_INITIAL_PULL_MINUTES

integer

30

No

Specifies the initial delay, in minutes, before PolicySync retrieves access audits from Amazon Redshift.



Note

Because Amazon Redshift has no concept of roles, PolicySync internally creates a group for each role and grants permissions to that group.

For example, when you add a role in the Privacera Portal, you will notice that CREATE GROUP priv_role_rol1 is created instead of CREATE ROLE priv_role_rol1.

Limitations with Dynamic Masking and Row Filter
  • Updating Group/Role will not update Dynamic Row Filter or Dynamic Masking.

  • In case of dynamic view, you must have Usage permission on both VIEW Schema as well as Table Schema.

  • If row filtering is enabled, the RocksDB cache has been cleared, and PolicySync is running, you cannot disable the row filter.

Redshift Spectrum

This topic describes how to configure access control for Redshift Spectrum PolicySync using Privacera Manager.

Privacera supports access control for Redshift Spectrum only on the following:

  • Create Database

  • Usage Schema

Prerequisites

The following prerequisites must be met to use Redshift Spectrum:

  1. You will require an Amazon Redshift cluster and a SQL client connected to the cluster.

  2. The AWS Region in which the Amazon Redshift cluster and Amazon S3 bucket are located must be the same.

Configuration

Redshift Spectrum configuration is similar to Redshift configuration. For more information about Redshift configuration, see Redshift.

Getting started

Redshift Spectrum supports the creation of external tables within a Redshift cluster.

Major security concern

Redshift does not support access control lists (ACLs) on EXTERNAL TABLES; to gain access to the data in an EXTERNAL TABLE, you must grant the USAGE permission on its EXTERNAL SCHEMA.

Limitations

The following are the limitations with Redshift Spectrum:

  • If the USAGE permission is granted to EXTERNAL SCHEMA, the user gains access to all of its tables.

  • Access to any of the external tables cannot be explicitly granted or revoked.

  • The creation of Redshift managed tables (not EXTERNAL TABLES) is not permitted within an EXTERNAL SCHEMA.

  • The creation of secure views is not permitted within an EXTERNAL SCHEMA.

Because of the limitations listed above, Privacera does not manage external tables. By default, permissions for external schemas are managed at the schema level.

Row Level Filter and Column Masking support based on secure views over an EXTERNAL SCHEMA is possible, but only with the user's consent, because users also retain direct access to the EXTERNAL TABLE. If they query the table's data directly, neither the Row Level Filter nor the Column Masking is applied.

Note

We do not recommend this solution, but if you agree that users will not query the data directly (via external tables), we can enable it by adding a REDSHIFT_ENABLE_EXTERNAL_SCHEMA_SUPPORT property (default behavior is set to false).

Proposed solution

On an EXTERNAL TABLE, we support Row Level Filter and Column Masking to a limited extent.

  • Instead of creating a table, we create a secure view with the _secure postfix added to the schema name (as we cannot create Redshift views inside external schemas).

  • To grant access to a secure view, USAGE permission must be granted on the source schema, because the secure view schema is separate from the EXTERNAL SCHEMA. As a result, permission is also granted on the source (actual) table.

  • Only the SELECT permission on the EXTERNAL TABLE is supported. The DataAdmin permission is ineffective because USAGE permission on the EXTERNAL SCHEMA allows direct access to the EXTERNAL TABLE.

Property configuration

Note

Due to limitations, EXTERNAL SCHEMA support for Row Level Filter and Column Masking is not recommended.

The following property should not be enabled without consent and without first reading the limitations above.

Table 21. Redshift property

Property

Description

Default Value

Example

REDSHIFT_ENABLE_EXTERNAL_SCHEMA_SUPPORT

Set this property to true to enable Row Level Filter and Column Masking policies on secure views after reading the limitations.

false

true/false



The values of the following properties must be left blank:

REDSHIFT_SECURE_VIEW_NAME_PREFIX: ""
REDSHIFT_SECURE_VIEW_NAME_POSTFIX: ""

The values of the following properties must be set:

REDSHIFT_SECURE_VIEW_SCHEMA_NAME_PREFIX: ""
REDSHIFT_SECURE_VIEW_SCHEMA_NAME_POSTFIX: "_secure"
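Put together, the properties above form a configuration fragment like the following. The filename vars.policysync.redshift.yml is an assumption; use whichever custom-vars file holds your Redshift PolicySync settings:

```yaml
# Sketch of a custom-vars fragment for external schema support.
REDSHIFT_ENABLE_EXTERNAL_SCHEMA_SUPPORT: "true"
REDSHIFT_SECURE_VIEW_NAME_PREFIX: ""
REDSHIFT_SECURE_VIEW_NAME_POSTFIX: ""
REDSHIFT_SECURE_VIEW_SCHEMA_NAME_PREFIX: ""
REDSHIFT_SECURE_VIEW_SCHEMA_NAME_POSTFIX: "_secure"
```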

For more information about these properties, see the Redshift custom properties section.

PostgreSQL

This topic covers how you can configure PostgreSQL PolicySync access control using Privacera Manager. Privacera supports the following PostgreSQL implementations:

  • Amazon RDS PostgreSQL

  • Amazon Aurora in PostgreSQL mode

  • Google Cloud SQL PostgreSQL

  • PostgreSQL

Prerequisites
  • Create a database in PostgreSQL, and get the database name and its URL.

  • Create a database user granting all privileges to fully access the database, and then get the user credentials to connect to the database.

If you choose to enable audits for PolicySync, ensure the following prerequisites are met:

CLI Configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.policysync.postgres.yml custom-vars/
    vi custom-vars/vars.policysync.postgres.yml
  3. Set the properties for your specific installation. For property details and description, see the Configuration Properties section that follows.

    Note

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, see PostgreSQL Connector.

  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
Configuration Properties

JDBC configuration

Table 22. JDBC configuration

Name

Type

Default

Required

Description

POSTGRES_JDBC_URL

string

Yes

Specifies the JDBC URL for the PostgreSQL connector.

Use the following format for the JDBC string:

jdbc:postgresql://<PG_SERVER_HOST>:<PG_SERVER_PORT>

POSTGRES_JDBC_USERNAME

string

Yes

Specifies the JDBC username to use.

POSTGRES_JDBC_PASSWORD

string

Yes

Specifies the JDBC password to use.

POSTGRES_JDBC_DB

string

privacera_db

Yes

Specifies the name of the JDBC database to use.

POSTGRES_DEFAULT_USER_PASSWORD

string

Yes

Specifies the password to use when PolicySync creates new users.

POSTGRES_OWNER_ROLE

string

No

Specifies the role that owns the resources managed by PolicySync. You must ensure that this user exists as PolicySync does not create this user.

  • If a value is not specified, resources are owned by the creating user. In this case, the owner of the resource will have all access to the resource.

  • If a value is specified, the owner of the resource will be changed to the specified value.

The following resource types are supported:

  • Database

  • Schemas

  • Tables

  • Views
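A minimal vars.policysync.postgres.yml fragment covering the JDBC settings above might look like the following; the host, port, and credential values are placeholders:

```yaml
# Sketch of the JDBC section of config/custom-vars/vars.policysync.postgres.yml.
POSTGRES_JDBC_URL: "jdbc:postgresql://<PG_SERVER_HOST>:<PG_SERVER_PORT>"
POSTGRES_JDBC_USERNAME: "<JDBC_USERNAME>"
POSTGRES_JDBC_PASSWORD: "<JDBC_PASSWORD>"
POSTGRES_JDBC_DB: "privacera_db"
POSTGRES_DEFAULT_USER_PASSWORD: "<DEFAULT_USER_PASSWORD>"
```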



Load keys and intervals

Table 23. Load keys and intervals

Name

Type

Default

Required

Description

POSTGRES_LOAD_RESOURCES_KEY

string

load_from_database_columns

No

Specifies how PolicySync loads resources from PostgreSQL. The following values are allowed:

  • load_md: Loads resources from PostgreSQL with a top-down approach; it first loads the databases, then the schemas, followed by the tables and their columns.

  • load_from_database_columns: Loads resources one resource type at a time; it loads all databases first, then all schemas in all databases, followed by all tables in all schemas and their columns. This mode is recommended because it is faster than load_md.

POSTGRES_RESOURCE_SYNC_INTERVAL

integer

60

No

Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.

POSTGRES_PRINCIPAL_SYNC_INTERVAL

integer

420

No

Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.

POSTGRES_PERMISSION_SYNC_INTERVAL

integer

540

No

Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions on the data source accordingly.

POSTGRES_AUDIT_SYNC_INTERVAL

integer

30

No

Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.



Resources management

Table 24. Resources management

Name

Type

Default

Required

Description

POSTGRES_MANAGE_DATABASE_LIST

string

No

Specifies a comma-separated list of database names for which PolicySync manages access control. If unset, access control is managed for all databases. You can use wildcards. Names are case-sensitive.

An example list of databases might resemble the following: testdb1,testdb2,sales_db*.

If specified, POSTGRES_IGNORE_DATABASE_LIST takes precedence over this setting.

POSTGRES_MANAGE_SCHEMA_LIST

string

No

Specifies a comma-separated list of schema names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

Use the following format when specifying a schema:

<DATABASE_NAME>.<SCHEMA_NAME>

If specified, POSTGRES_IGNORE_SCHEMA_LIST takes precedence over this setting.

If you specify a wildcard, such as in the following example, all schemas are managed:

<DATABASE_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all schemas.

  • If set to none, no schemas are managed.

POSTGRES_MANAGE_TABLE_LIST

string

No

Specifies a comma-separated list of table names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

Use the following format when specifying a table:

<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>

If specified, POSTGRES_IGNORE_TABLE_LIST takes precedence over this setting.

If you specify a wildcard, such as in the following example, all matched tables are managed:

<DATABASE_NAME>.<SCHEMA_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all tables.

  • If set to none, no tables are managed.

POSTGRES_IGNORE_DATABASE_LIST

string

No

Specifies a comma-separated list of database names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all databases are subject to access control.

For example:

testdb1,testdb2,sales_db*

This setting supersedes any values specified by POSTGRES_MANAGE_DATABASE_LIST.

POSTGRES_IGNORE_SCHEMA_LIST

string

No

Specifies a comma-separated list of schema names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all schemas are subject to access control.

For example:

testdb1.schema1,testdb2.schema2,sales_db*.sales*

This setting supersedes any values specified by POSTGRES_MANAGE_SCHEMA_LIST.

POSTGRES_IGNORE_TABLE_LIST

string

No

Specifies a comma-separated list of table names that PolicySync does not provide access control for. You can specify wildcards. If not specified, all tables are subject to access control. Names are case-sensitive. Specify tables using the following format:

<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>

This setting supersedes any values specified by POSTGRES_MANAGE_TABLE_LIST.
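One plausible reading of the <DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME> patterns above is component-wise wildcard matching, sketched below. This is an illustration, not PolicySync's actual matcher:

```python
from fnmatch import fnmatchcase

def table_matches(qualified: str, pattern: str) -> bool:
    """Match a <db>.<schema>.<table> name against a pattern of the same
    shape, component by component (case-sensitive, wildcards allowed)."""
    parts, pats = qualified.split("."), pattern.split(".")
    return len(parts) == len(pats) and all(
        fnmatchcase(part, pat) for part, pat in zip(parts, pats)
    )

print(table_matches("sales_db.sales.orders", "sales_db*.sales*.*"))  # True
print(table_matches("testdb1.schema1.t1", "testdb2.*.*"))            # False
```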



Users/Groups/Roles management

Table 25. Users/Groups/Roles management

Name

Type

Default

Required

Description

POSTGRES_USER_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a username and replaces each matching character with the value specified by the POSTGRES_USER_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

POSTGRES_USER_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the POSTGRES_USER_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

POSTGRES_GROUP_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a group and replaces each matching character with the value specified by the POSTGRES_GROUP_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

POSTGRES_GROUP_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the POSTGRES_GROUP_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

POSTGRES_ROLE_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a role name and replaces each matching character with the value specified by the POSTGRES_ROLE_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

POSTGRES_ROLE_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the POSTGRES_ROLE_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

POSTGRES_USER_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts user names to lowercase when creating local users. If set to true, case sensitivity is preserved.

POSTGRES_GROUP_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts group names to lowercase when creating local groups. If set to true, case sensitivity is preserved.

POSTGRES_ROLE_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts role names to lowercase when creating local roles. If set to true, case sensitivity is preserved.

POSTGRES_USER_NAME_CASE_CONVERSION

string

lower

No

Specifies how user name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if POSTGRES_USER_NAME_PERSIST_CASE_SENSITIVITY is set to true.

POSTGRES_GROUP_NAME_CASE_CONVERSION

string

lower

No

Specifies how group name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if POSTGRES_GROUP_NAME_PERSIST_CASE_SENSITIVITY is set to true.

POSTGRES_ROLE_NAME_CASE_CONVERSION

string

lower

No

Specifies how role name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if POSTGRES_ROLE_NAME_PERSIST_CASE_SENSITIVITY is set to true.

POSTGRES_CREATE_USER

boolean

true

No

Specifies whether PolicySync creates local users for each user in Privacera.

POSTGRES_CREATE_USER_ROLE

boolean

true

No

Specifies whether PolicySync creates local roles for each user in Privacera.

POSTGRES_MANAGE_USERS

boolean

true

No

Specifies whether PolicySync maintains user membership in roles in the PostgreSQL data source.

POSTGRES_MANAGE_GROUPS

boolean

true

No

Specifies whether PolicySync creates groups from Privacera in the PostgreSQL data source.

POSTGRES_MANAGE_ROLES

boolean

true

No

Specifies whether PolicySync creates roles from Privacera in the PostgreSQL data source.

POSTGRES_MANAGE_USER_LIST

string

No

Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

If not specified, PolicySync manages access control for all users.

If specified, POSTGRES_IGNORE_USER_LIST takes precedence over this setting.

An example user list might resemble the following: user1,user2,dev_user*.

POSTGRES_MANAGE_GROUP_LIST

string

No

Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive.

An example group list might resemble the following: group1,group2,dev_group*.

If specified, POSTGRES_IGNORE_GROUP_LIST takes precedence over this setting.

POSTGRES_MANAGE_ROLE_LIST

string

No

Specifies a comma-separated list of role names for which PolicySync manages access control. If unset, access control is managed for all roles. You can use wildcards. Names are case-sensitive.

An example role list might resemble the following: role1,role2,dev_role*.

If specified, POSTGRES_IGNORE_ROLE_LIST takes precedence over this setting.

POSTGRES_IGNORE_USER_LIST

string

No

Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control.

This setting supersedes any values specified by POSTGRES_MANAGE_USER_LIST.

POSTGRES_IGNORE_GROUP_LIST

string

No

Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control.

This setting supersedes any values specified by POSTGRES_MANAGE_GROUP_LIST.

POSTGRES_IGNORE_ROLE_LIST

string

No

Specifies a comma-separated list of role names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all roles are subject to access control.

This setting supersedes any values specified by POSTGRES_MANAGE_ROLE_LIST.

POSTGRES_USER_ROLE_PREFIX

string

priv_user_

No

Specifies the prefix that PolicySync uses when creating local users. For example, if you have a user named <USER> defined in Privacera and the role prefix is priv_user_, the local role is named priv_user_<USER>.

POSTGRES_GROUP_ROLE_PREFIX

string

priv_group_

No

Specifies the prefix that PolicySync uses when creating local roles. For example, if you have a group named etl_users defined in Privacera and the role prefix is prefix_, the local role is named prefix_etl_users.

POSTGRES_ROLE_ROLE_PREFIX

string

priv_role_

No

Specifies the prefix that PolicySync uses when creating roles from Privacera in the PostgreSQL data source.

For example, if you have a role named finance defined in Privacera and the role prefix is role_prefix_, the local role is named role_prefix_finance.

POSTGRES_USE_NATIVE_PUBLIC_GROUP

boolean

true

No

Specifies whether PolicySync uses the PostgreSQL native public group for access grants whenever a policy refers to a public group. The default value is true.

POSTGRES_MANAGE_USER_FILTERBY_GROUP

boolean

false

No

Specifies whether to manage only the users that are members of groups specified by POSTGRES_MANAGE_GROUP_LIST. The default value is false.

POSTGRES_MANAGE_USER_FILTERBY_ROLE

boolean

false

No

Specifies whether to manage only users that are members of the roles specified by POSTGRES_MANAGE_ROLE_LIST. The default value is false.



Access control management

Table 26. Access control management

Name

Type

Default

Required

Description

POSTGRES_POLICY_NAME_SEPARATOR

string

_priv_

No

Specifies a string to use as part of the name of native row filter and masking policies.

POSTGRES_ROW_FILTER_POLICY_NAME_TEMPLATE

string

{database}{separator}{schema}{separator}{table}

No

Specifies a template for the name that PolicySync uses when creating a row filter policy. For example, given a table named data in a schema named schema that resides in the db database, the row filter policy name might resemble the following:

db_priv_schema_priv_data_<ROW_FILTER_ITEM_NUMBER>
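The template expansion can be sketched as a straightforward string format. This is an illustration; the <ROW_FILTER_ITEM_NUMBER> suffix that PolicySync appends is omitted:

```python
def row_filter_policy_name(template: str, database: str, schema: str,
                           table: str, separator: str = "_priv_") -> str:
    """Expand the documented row filter policy name template."""
    return template.format(database=database, schema=schema,
                           table=table, separator=separator)

name = row_filter_policy_name(
    "{database}{separator}{schema}{separator}{table}", "db", "schema", "data"
)
print(name)  # db_priv_schema_priv_data
```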

POSTGRES_ENABLE_ROW_FILTER

boolean

false

No

Specifies whether to use the data source native row filter functionality. This setting is disabled by default. When enabled, you can create row filters only on tables, but not on views.

POSTGRES_ENABLE_VIEW_BASED_MASKING

boolean

true

No

Specifies whether to use secure view based masking. The default value is true.

Because PostgreSQL does not support native masking, enabling this setting is recommended.

POSTGRES_ENABLE_VIEW_BASED_ROW_FILTER

boolean

true

No

Specifies whether to use secure view based row filtering. The default value is true.

While PostgreSQL supports native filtering, PolicySync provides additional functionality that is not available natively. Enabling this setting is recommended.

POSTGRES_SECURE_VIEW_CREATE_FOR_ALL

boolean

true

No

Specifies whether to create secure views for all tables and views that are created by users. If enabled, PolicySync creates secure views for resources regardless of whether masking or filtering policies are enabled.

POSTGRES_MASKED_NUMBER_VALUE

integer

0

No

Specifies the default masking value for numeric column types.

POSTGRES_MASKED_TEXT_VALUE

string

<MASKED>

No

Specifies the default masking value for text and string column types.

POSTGRES_SECURE_VIEW_NAME_PREFIX

string

No

Specifies a prefix string for secure view names. By default, a secure view has the same base name as the table or view it is based on.

If you want to change the secure view name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view name for a table named example1 is dev_example1.

POSTGRES_SECURE_VIEW_NAME_POSTFIX

string

_secure

No

Specifies a postfix string for secure view names. By default, a secure view name is the table or view name followed by this postfix.

If you want to change the secure view name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view name for a table named example1 is example1_dev.

POSTGRES_SECURE_VIEW_SCHEMA_NAME_PREFIX

string

No

Specifies a prefix string to apply to a secure schema name. By default view-based row filter and masking-related secure views have the same schema name as the table schema name.

If you want to change the secure view schema name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view schema name for a schema named example1 is dev_example1.

POSTGRES_SECURE_VIEW_SCHEMA_NAME_POSTFIX

string

No

Specifies a postfix string to apply to a secure view schema name. By default, view-based row filter and masking-related secure views have the same schema name as the table schema name.

If you want to change the secure view schema name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view schema name for a schema named example1 is example1_dev.

POSTGRES_SECURE_VIEW_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a table or view name. For example, if the table is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma separated list of suffixes.

POSTGRES_SECURE_VIEW_SCHEMA_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a schema name. For example, if a schema is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma separated list of suffixes.

POSTGRES_GRANT_UPDATES

boolean

true

No

Specifies whether PolicySync performs grants and revokes for access control and creates, updates, and deletes queries for users, groups, and roles. The default value is true.

POSTGRES_GRANT_UPDATES_MAX_RETRY_ATTEMPTS

integer

2

No

Specifies the maximum number of attempts that PolicySync makes to execute a grant query if it is unable to do so successfully. The default value is 2.

POSTGRES_ENABLE_DATA_ADMIN

boolean

true

No

Specifies whether to enable the data admin feature. When this feature is enabled, you create policies on native tables and views, and the corresponding grants are applied to the secure views of those tables and views. The secure views provide row filter and masking capability. If you also need permissions granted on the native table or view itself, select the desired permissions plus Data Admin in the policy; those permissions are then granted on both the native table or view and its secure view.
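The naming rules described in the table above (the policy-name template, and suffix removal applied before the custom prefix/postfix) can be sketched in a few lines of Python. The helper names are illustrative only; this is not the connector's actual code:

```python
def row_filter_policy_name(database, schema, table, separator="_priv_",
                           template="{database}{separator}{schema}{separator}{table}"):
    """Render a POSTGRES_ROW_FILTER_POLICY_NAME_TEMPLATE-style policy name."""
    return template.format(database=database, schema=schema,
                           table=table, separator=separator)

def secure_view_name(table_name, prefix="", postfix="_secure",
                     remove_suffixes=""):
    """Derive a secure view name. Suffix removal runs before the custom
    prefix/postfix, as described for
    POSTGRES_SECURE_VIEW_NAME_REMOVE_SUFFIX_LIST."""
    name = table_name
    # remove_suffixes is a comma-separated list; strip the first match
    for suffix in filter(None, (s.strip() for s in remove_suffixes.split(","))):
        if name.endswith(suffix):
            name = name[:-len(suffix)]
            break
    return f"{prefix}{name}{postfix}"

print(row_filter_policy_name("db", "schema", "data"))           # db_priv_schema_priv_data
print(secure_view_name("example1", prefix="dev_", postfix=""))  # dev_example1
print(secure_view_name("example_suffix", remove_suffixes="_suffix"))  # example_secure
```

This mirrors the documented behavior that a table named example_suffix with `_suffix` in the removal list and the default `_secure` postfix yields example_secure.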



Access audits management

Table 27. Access audits management

Name

Type

Default

Required

Description

POSTGRES_AUDIT_ENABLE

boolean

false

Yes

Specifies whether Privacera fetches access audit data from the data source.

POSTGRES_AUDIT_EXCLUDED_USERS

string

POSTGRES_JDBC_USERNAME

No

Specifies a comma separated list of users to exclude when fetching access audits. For example: "user1,user2,user3".

POSTGRES_AUDIT_SOURCE

string

sqs

No

Specifies the source for audit information. The following values are supported:

  • sqs

  • gcp_pgaudit

The default value is: sqs



AWS SQS Postgres audit properties

Table 28. AWS SQS Postgres audit properties

Name

Type

Default

Required

Description

POSTGRES_AWS_ACCESS_KEY

string

No

Specifies the Amazon Web Services (AWS) access key that PolicySync uses to create an IAM client role to access the SQS queue to retrieve access audit information.

Specify this only if your deployment machine lacks an IAM role with the necessary permissions.

POSTGRES_AWS_SECRET_KEY

string

No

Specifies the Amazon Web Services (AWS) secret key that PolicySync uses to create an IAM client role to access the SQS queue to retrieve access audit information.

Specify this only if your deployment machine lacks an IAM role with the necessary permissions.

POSTGRES_AWS_REGION

string

POSTGRES_AUDIT_SQS_QUEUE_REGION

No

Specifies the Amazon Web Services (AWS) SQS queue region.

POSTGRES_AUDIT_SQS_QUEUE_REGION

string

us-east-1

No

Specifies the Amazon Web Services (AWS) SQS queue region.

POSTGRES_AWS_SQS_QUEUE_ENDPOINT

string

No

Specifies the SQS endpoint URL on Amazon Web Services (AWS). You must specify this value if you use a private VPC in your AWS account that is not available on the Internet.

POSTGRES_AWS_SQS_QUEUE_NAME

string

POSTGRES_AUDIT_SQS_QUEUE_NAME

No

Specifies the Amazon Web Services (AWS) SQS queue name that PolicySync uses to retrieve access audit information.

POSTGRES_AWS_SQS_QUEUE_MAX_POLL_MESSAGES

integer

100

No

Specifies the number of messages to retrieve from the SQS queue at one time for audit information.



GCP PostgreSQL audit properties

Table 29. GCP PostgreSQL audit properties

Name

Type

Default

Required

Description

POSTGRES_GCP_AUDIT_SOURCE_INSTANCE_ID

string

No

Specifies the Google Cloud Platform SQL instance ID for the PostgreSQL server. PolicySync uses this instance ID for retrieving access audit information.

The instance ID must be provided in the following format:

<PROJECT_ID>:<DB_INSTANCE_ID>

POSTGRES_OAUTH_PRIVATE_KEY_FILE_NAME

string

policysync-postgres-gcp-audit-service-account.json

No

Specifies the name of the JSON file that contains your service account credentials. This setting applies only to PostgreSQL on Google Cloud Platform.
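Because the instance ID must follow the <PROJECT_ID>:<DB_INSTANCE_ID> format described above, a quick validation step can catch misconfiguration early. This helper is a hypothetical sketch, not part of the product:

```python
def parse_gcp_instance_id(value):
    """Split a POSTGRES_GCP_AUDIT_SOURCE_INSTANCE_ID value of the form
    <PROJECT_ID>:<DB_INSTANCE_ID>, rejecting malformed values."""
    project_id, sep, db_instance_id = value.partition(":")
    if not sep or not project_id or not db_instance_id:
        raise ValueError("expected <PROJECT_ID>:<DB_INSTANCE_ID>, got: " + value)
    return project_id, db_instance_id

print(parse_gcp_instance_id("demo-project:postgres-demo-server"))
# ('demo-project', 'postgres-demo-server')
```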



Accessing PostgreSQL Audits in GCP

Prerequisites

Ensure the following prerequisites are met:

Configuration

  1. In GCP:

    1. Run the following commands on Google Cloud's shell (gcloud) by providing GCP_PROJECT_ID and INSTANCE_NAME.

      gcloud sql instances patch {INSTANCE_NAME} --database-flags=cloudsql.enable_pgaudit=on,pgaudit.log=all --project {GCP_PROJECT_ID}
    2. Run a SQL command using a compatible psql client to create the pgAudit extension.

      CREATE EXTENSION pgaudit;
    3. Create a service account and private key JSON file, which will be used by PolicySync to pull access audits. See Setting up authentication and edit the following fields:

      • Service account name: Enter any user-defined name. For example, policysync-postgres-gcp-audit-service-account.

      • Select a role: Select Private Logs Viewer role.

      • Create new key: Create a service account key and download the JSON file in the custom-vars folder.

  2. In Privacera Manager:

    Add the following properties in vars.policysync.postgres.yml file:

    POSTGRES_AUDIT_SOURCE: "gcp_pgaudit"
    POSTGRES_GCP_AUDIT_SOURCE_INSTANCE_ID: "<PLEASE_CHANGE>"
    POSTGRES_OAUTH_PRIVATE_KEY_FILE_NAME: "<PLEASE_CHANGE>"

    Property

    Mandatory

    Description

    Default Value

    Example

    POSTGRES_AUDIT_SOURCE

    Yes

    Supported audit sources are sqs and gcp_pgaudit. Default is set to sqs.

    gcp_pgaudit

    POSTGRES_GCP_AUDIT_SOURCE_INSTANCE_ID

    Yes

    This property is used to specify the GCP Cloud SQL instance id for the PostgreSQL server, which will be used to retrieve access audits.

    The value for this instance id must be in the format:

    project_id:db_instance_id.

    demo-project:postgres-demo-server

    POSTGRES_OAUTH_PRIVATE_KEY_FILE_NAME

    Yes

    This property is used to specify the name of the JSON file containing the service account credential that was downloaded from the Google service account keys section.

    policysync-postgres-gcp-audit-service-account.json

Configure AWS RDS PostgreSQL instance for access audits

You can configure your AWS account to allow Privacera to access your RDS PostgreSQL instance audit logs through Amazon CloudWatch Logs. To enable this functionality, you must make the following changes in your account:

  • Update the AWS RDS parameter group for the database

  • Create an AWS SQS queue

  • Specify an AWS Lambda function

  • Create an IAM role for an EC2 instance

Update the AWS RDS parameter group for the database

To expose access audit logs, you must update configuration for the data source.

Procedure

  1. Log in to your AWS account.

  2. To create a role for audits, run the following SQL query with a user with administrative credentials for your data source:

    CREATE ROLE rds_pgaudit;
  3. Create a new parameter group for your database and specify the following values:

    • Parameter group family: Select a database from either the aurora-postgresql or postgres families.

    • Type: Select DB Parameter Group.

    • Group name: Specify a group name for the parameter group.

    • Description: Specify a description for the parameter group.

  4. Edit the parameter group that you created in the previous step and set the following values:

    • pgaudit.log: Specify all, overwriting any existing value.

    • shared_preload_libraries: Specify pg_stat_statements,pgaudit.

    • pgaudit.role: Specify rds_pgaudit.

  5. Associate the parameter group that you created with your database. Modify the configuration for the database instance and make the following changes:

    • DB parameter group: Specify the parameter group you created in this procedure.

    • PostgreSQL log: Ensure this option is set to enable logging to Amazon CloudWatch Logs.

  6. When prompted, choose the option to immediately apply the changes you made in the previous step.

  7. Restart the database instance.

Verification

To verify that your database instance logs are available, complete the following steps:

  1. From the Amazon RDS console, view the logs for your database instance.

  2. From the CloudWatch console, complete the following steps:

    1. Find the /aws/rds/cluster/* log group that corresponds to your database instance.

    2. Click the log group name to confirm that a log stream exists for the database instance, and then click on a log stream name to confirm that log messages are present.

Create an AWS SQS queue

To create an SQS queue used by an AWS Lambda function that you will create later, complete the following steps.

  1. From the AWS console, create a new Amazon SQS queue with the default settings. Use the following format when specifying a value for the Name field:

    privacera-postgres-<RDS_CLUSTER_NAME>-audits

    where:

    • RDS_CLUSTER_NAME: Specifies the name of your RDS cluster.

  2. After the queue is created, save the queue URL for later use.

Specify an AWS Lambda function

To create an AWS Lambda function to interact with the SQS queue, complete the following steps. In addition to creating the function, you must create a new IAM policy and associate a new IAM role with the function. You need to know your AWS account ID and AWS region to complete this procedure.

  1. From the IAM console, create a new IAM policy and input the following JSON:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": "arn:aws:logs:<REGION>:<ACCOUNT_ID>:*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": [
                    "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/aws/lambda/<LAMBDA_FUNCTION_NAME>:*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": "sqs:SendMessage",
                "Resource": "arn:aws:sqs:<REGION>:<ACCOUNT_ID>:<SQS_QUEUE_NAME>"
            }
        ]
    }

    where:

    • REGION: Specify your AWS region.

    • ACCOUNT_ID: Specify your AWS account ID.

    • LAMBDA_FUNCTION_NAME: Specify the name of the AWS Lambda function, which you will create later. For example: privacera-postgres-cluster1-audits

    • SQS_QUEUE_NAME: Specify the name of the AWS SQS Queue.

  2. Specify a name for the IAM policy, such as privacera-postgres-audits-lambda-execution-policy, and then create the policy.

  3. From the IAM console, create a new IAM role and choose for the Use case the Lambda option.

  4. Search for the IAM policy that you just created with a name that might be similar to privacera-postgres-audits-lambda-execution-policy and select it.

  5. Specify a Role name for the IAM policy, such as privacera-postgres-audits-lambda-execution-role, and then create the role.

  6. From the AWS Lambda console, create a new function and specify the following fields:

    • Function name: Specify a name for the function, such as privacera-postgres-cluster1-audits.

    • Runtime: Select Node.js 12.x from the list.

    • Permissions: Select Use an existing role and choose the role created earlier in this procedure, such as privacera-postgres-audits-lambda-execution-role.

  7. Add a trigger to the function you created in the previous step and select CloudWatch Logs from the list, and then specify the following values:

    • Log group: Select the log group path for your Amazon RDS database instance, such as /aws/rds/cluster/database-1/postgresql.

    • Filter name: Specify auditTrigger.

  8. In the Lambda source code editor, provide the following JavaScript code in the index.js file, which is open by default in the editor:

    var zlib = require('zlib');
    
    // CloudWatch logs encoding
    var encoding = process.env.ENCODING || 'utf-8';  // default is utf-8
    var awsRegion = process.env.REGION || 'us-east-1';
    var sqsQueueURL = process.env.SQS_QUEUE_URL;
    // Default to empty strings so the split() calls below are safe
    var ignoreDatabase = process.env.IGNORE_DATABASE || '';
    var ignoreUsers = process.env.IGNORE_USERS || '';
    
    var ignoreDatabaseArray = ignoreDatabase.split(',');
    var ignoreUsersArray = ignoreUsers.split(',');
    
    // Import the AWS SDK
    const AWS = require('aws-sdk');
    
    // Configure the region
    AWS.config.update({region: awsRegion});
    
    exports.handler = function (event, context, callback) {
    
        var zippedInput = Buffer.from(event.awslogs.data, 'base64');
    
            zlib.gunzip(zippedInput, function (e, buffer) {
            if (e) {
                callback(e);
                return;
            }
    
            var awslogsData = JSON.parse(buffer.toString(encoding));
    
            // Create an SQS service object
            const sqs = new AWS.SQS({apiVersion: '2012-11-05'});
    
            console.log(awslogsData);
            if (awslogsData.messageType === 'DATA_MESSAGE') {
    
                // Chunk log events before posting
                awslogsData.logEvents.forEach(function (log) {
    
                    //// Remove any trailing \n
                    console.log(log.message)
    
                    // Checking if message falls under ignore users/database
                    var sendToSQS = true;
    
                    if(sendToSQS) {
    
                        for(var i = 0; i < ignoreDatabaseArray.length; i++) {
                           if(log.message.toLowerCase().indexOf("@" + ignoreDatabaseArray[i]) !== -1) {
                                sendToSQS = false;
                                break;
                           }
                        }
                    }
    
                    if(sendToSQS) {
    
                        for(var i = 0; i < ignoreUsersArray.length; i++) {
                           if(log.message.toLowerCase().indexOf(ignoreUsersArray[i] + "@") !== -1) {
                                sendToSQS = false;
                                break;
                           }
                        }
                    }
    
                    if(sendToSQS) {
                    
                        let sqsOrderData = {
                            MessageBody: JSON.stringify(log),
                            MessageDeduplicationId: log.id,
                            MessageGroupId: "Audits",
                            QueueUrl: sqsQueueURL
                        };
    
                        // Send the order data to the SQS queue
                        let sendSqsMessage = sqs.sendMessage(sqsOrderData).promise();
    
                        sendSqsMessage.then((data) => {
                            console.log("Sent to SQS");
                        }).catch((err) => {
                            console.log("Error in Sending to SQS = " + err);
                        });
    
                    }
                });
            }
        });
    };
  9. For the Lambda function, edit the environment variables and create the following environment variables:

    • REGION: Specify your AWS region.

    • SQS_QUEUE_URL: Specify your AWS SQS queue URL.

    • IGNORE_DATABASE: Specify privacera_db.

    • IGNORE_USERS: Specify your database administrative user, such as privacera.
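The filtering in the Lambda function above keys off the user@database token that pgaudit writes into each log line: a message is dropped if it contains "@" followed by an ignored database, or an ignored user followed by "@". A minimal Python rendering of the same check (the log lines and helper name are illustrative only):

```python
def should_forward(message, ignore_databases, ignore_users):
    """Mirror the Lambda's filter: drop messages whose user@database token
    matches an ignored database ("@db") or an ignored user ("user@")."""
    msg = message.lower()
    if any("@" + db in msg for db in ignore_databases):
        return False
    if any(user + "@" in msg for user in ignore_users):
        return False
    return True

# The connector's own activity is excluded; application audits pass through.
print(should_forward(":privacera@privacera_db:LOG: SELECT 1",
                     ["privacera_db"], ["privacera"]))  # False
print(should_forward(":alice@sales:LOG: SELECT * FROM t",
                     ["privacera_db"], ["privacera"]))  # True
```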

Create an IAM role for an EC2 instance

To create an IAM role for the AWS EC2 instance where you installed Privacera so that Privacera can read the AWS SQS queue, complete the following steps:

  1. From the IAM console, create a new IAM policy and input the following JSON:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:DeleteMessage",
                    "sqs:GetQueueUrl",
                    "sqs:ListDeadLetterSourceQueues",
                    "sqs:ReceiveMessage",
                    "sqs:GetQueueAttributes"
                ],
                "Resource": "<SQS_QUEUE_ARN>"
            },
            {
                "Effect": "Allow",
                "Action": "sqs:ListQueues",
                "Resource": "*"
            }
        ]
    }
    

    where:

    • SQS_QUEUE_ARN: Specifies the AWS SQS queue ARN for the SQS queue you created earlier.

  2. Specify a name for the IAM policy, such as postgres-audits-sqs-read-policy, and create the policy.

  3. Attach the IAM policy to the AWS EC2 instance where you installed Privacera.

Accessing Cross Account SQS Queue for PostgreSQL Audits

Prerequisites

Ensure the following prerequisites are met:

  • Access to AWS account with EC2 instance where Privacera Manager is configured.

  • Access to AWS account where SQS Queue is configured.

Configuration

  1. Get the ARN of the account where the EC2 instance is running.

    1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

    2. In the navigation pane, choose Instances.

    3. Search for your instance and select it.

    4. In the Security tab, click the link in the IAM Role.

      ec2_iam_arn.jpg
    5. Copy the ARN of the IAM Role.

  2. Get the ARN of the account where the SQS Queue instance is configured.

    1. Open the Amazon SQS console at https://console.aws.amazon.com/sqs/.

    2. From the left navigation pane, choose Queues. From the queue list, select the queue that you created.

    3. In the Details section, copy the ARN of the queue.

  3. Add the policy in the AWS SQS account to grant permissions to the AWS EC2 account.

    1. Open the Amazon SQS console at https://console.aws.amazon.com/sqs/.

    2. In the navigation pane, choose Queues.

    3. Choose a queue and choose Edit.

    4. Scroll to the Access policy section.

      sqs_queue_access_policy.jpg
    5. Add the access policy statements in the input box.

      {
        "Version": "2012-10-17",
        "Id": "PolicyAllowSQS",
        "Statement": [
          {
            "Sid": "StmtAllowSQS",
            "Effect": "Allow",
            "Principal": {
              "AWS": "${EC2_INSTANCE_ROLE_ARN}"
            },
            "Action": [
              "sqs:DeleteMessage",
              "sqs:GetQueueUrl",
              "sqs:ListDeadLetterSourceQueues",
              "sqs:ReceiveMessage",
              "sqs:GetQueueAttributes"
            ],
            "Resource": "${SQS_QUEUE_ARN}"
          }
        ]
      }
                
    6. When you finish configuring the access policy, choose Save.

    7. After saving, copy the SQS queue URL in the Details section.

  4. Add the SQS queue URL.

    Run the following command.

    cd ~/privacera/privacera-manager/
    vi config/custom-vars/vars.policysync.postgres.yml

    Add the URL in the following property.

    POSTGRES_AUDIT_SQS_QUEUE_NAME: "${SQS_QUEUE_URL}"
Microsoft SQL Server

These instructions enable and configure the Privacera Microsoft SQL (MS SQL) database connector for an existing MS SQL database running on the Azure cloud platform or on the AWS Relational Database Service (RDS). This connector uses the PolicySync method, in which access policies defined in Privacera are mapped and synchronized to the native access controls in MS SQL.

The PolicySync approach has several benefits and advantages:

  • Fine-grained access control - at the database, schema, table, view, and column levels.

  • Column level masking

  • Dynamic row-level filters on tables and views

Prerequisites

The MS SQL Server must already be installed and running.

If you are installing an evaluation, you may need to install and configure an MS SQL Server with one or more databases to test against.

1) Target Database Access

The MS SQL database server must be accessible from the Privacera Platform host(s). The standard inbound port for MS SQL Server access is TCP 1433. Make sure that port is open outbound from the Privacera Platform host(s) and inbound to your target MS SQL server.

2) Access Control by Privacera Service Account

Privacera Platform requires access to the target database, and the service account must be granted the 'loginmanager' role. This can be configured in three ways: (1) access control on Azure AD users; (2) access control on local database users; or (3) access control on both Azure AD and local database users.

Access Control on Azure AD Users

  1. Confirm the MS SQL Server is configured to work with Azure AD Users.

  2. In your Azure AD, create a Privacera 'service' user to be used by Privacera for the Policy access control synchronization. For this example we'll assume the name is 'privacera_policysync@example.com' but set the value appropriately for your domain(s). Keep note of the username and password as we'll use both later.

  3. For each targeted database:

    1. Log on to the target database with an Admin role account.

    2. Execute the following:

      IF DATABASE_PRINCIPAL_ID('privacera_policysync@example.com') IS NULL BEGIN
        CREATE USER [privacera_policysync@example.com] FROM EXTERNAL PROVIDER;
        END;
      
      -- Grant full control on database to privacera_policysync@example.com user
      GRANT CONTROL ON DATABASE::${YOUR_DATABASE} TO [privacera_policysync@example.com];
      

Access Control on Local Database Users

  1. Create a Privacera 'service' user in the master database to be used by Privacera for the Policy access control synchronization. For this example, we'll assume the name is 'privacera_policysync' but set the value appropriately for your domain(s). Keep note of the username and password as we'll use both later.

    IF NOT EXISTS (SELECT name  FROM sys.sql_logins WHERE name = 'privacera_policysync') BEGIN
      CREATE LOGIN [privacera_policysync] WITH PASSWORD = '${PASSWORD}'
      END;
    
    IF DATABASE_PRINCIPAL_ID('privacera_policysync') IS NULL BEGIN
      CREATE USER [privacera_policysync] FROM LOGIN [privacera_policysync];
      END;
    
    EXEC sp_addrolemember [loginmanager], [privacera_policysync];
    
  2. For each targeted database:

    1. Log on to the target database with an Admin role account.

    2. Execute the following:

      IF DATABASE_PRINCIPAL_ID('privacera_policysync') IS NULL BEGIN
        CREATE USER [privacera_policysync] FROM LOGIN [privacera_policysync];
        END;
      
      -- Grant full control on database to privacera_policysync user
      GRANT CONTROL ON DATABASE::${YOUR_DATABASE} TO [privacera_policysync];
      

Access Control on Azure AD and Local Database Users

  1. Confirm the MS SQL Server is configured to work with Azure AD Users.

  2. In your Azure AD, create a Privacera 'service' user to be used by Privacera for the Policy access control synchronization. For this example, we'll assume the name is 'privacera_policysync@example.com' but set the value appropriately for your domain(s). Keep note of the username and password as we'll use both later.

  3. Create a Privacera 'service' user in the master database to be used by Privacera for the Policy access control synchronization. For this example, we'll assume the name is 'privacera_policysync@example.com' but set the value appropriately for your domain(s). Keep note of the username and password as we'll use both later.

    IF DATABASE_PRINCIPAL_ID('privacera_policysync@example.com') IS NULL BEGIN
      CREATE USER [privacera_policysync@example.com] FROM EXTERNAL PROVIDER;
      END;
    
    EXEC sp_addrolemember [loginmanager], [privacera_policysync@example.com];
    
  4. For each targeted database:

    1. Log on to the target database with an Admin role account.

    2. Execute the following:

      IF DATABASE_PRINCIPAL_ID('privacera_policysync@example.com') IS NULL BEGIN
        CREATE USER [privacera_policysync@example.com] FROM EXTERNAL PROVIDER;
        END;
      
      -- Grant full control on database to privacera_policysync@example.com user
      GRANT CONTROL ON DATABASE::${YOUR_DATABASE} TO [privacera_policysync@example.com];
      

3) Create or Identify an ADLS Gen2 storage used to store MS SQL Server Audits

  1. Consult the following article How to Configure MS SQL Server for Database Synapse Audits.

  2. Using information from that article, obtain the audit storage URL, which is used in the Privacera MS SQL PolicySync configuration.

4) Create an MSSQL server in AWS RDS to store MSSQL Server Audits

  1. Consult the following article SQL Server Audit.

  2. Using information from that article, obtain the audit storage URL, which is used in the Privacera MSSQL PolicySync configuration.

CLI Configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following command.

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.policysync.mssql.yml custom-vars/
    vi custom-vars/vars.policysync.mssql.yml
    
  3. Set the properties for your specific installation. For property details and description, see the Configuration Properties section that follows.

    Note

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, see .

    There are two properties that establish the type of 'masking' that will be supported for this connector: 'native masking', and 'view-based masking'.

    Native Masking - in MS SQL known as 'Dynamic Data Masking' - is supported directly by MS SQL Server. This level of masking has low granularity and only supports the ability to mask by database for each user. See Microsoft documentation Dynamic Data Masking for more background.

    Privacera Platform supports 'View-based Masking', which for MS SQL supports Row-level filtering and masking. View-based masking is the default and is recommended.

    Property Name

    Value/Comments

    MSSQL_ENABLE_MASKING

    Set to true to enable MS SQL 'native' masking and disable View-based masking functionality.

    MSSQL_ENABLE_VIEW_BASED_MASKING

    Set to true to enable View-based masking functionality.

    MSSQL_UNMASKED_DATA_ROLE

    A comma-separated list of MS SQL roles authorized to see unmasked data. All other users see masked data.

  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
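To make view-based masking concrete, the following sketch shows the general shape of a CASE-based secure view for MS SQL. The SQL it emits is illustrative only: the column names, role check, and `_secure` view naming are assumptions for this example, not the connector's actual output:

```python
def masking_view_sql(schema, table, columns, masked_columns,
                     unmasked_role="privacera_unmasked", masked_text="<MASKED>"):
    """Build a secure view in which members of the unmasked role see real
    values and everyone else sees the mask literal."""
    items = []
    for col in columns:
        if col in masked_columns:
            # T-SQL IS_ROLEMEMBER returns 1 when the current user is in the role
            items.append(
                f"CASE WHEN IS_ROLEMEMBER('{unmasked_role}') = 1 "
                f"THEN [{col}] ELSE '{masked_text}' END AS [{col}]"
            )
        else:
            items.append(f"[{col}]")
    return (f"CREATE VIEW [{schema}].[{table}_secure] AS\n"
            f"SELECT {', '.join(items)}\n"
            f"FROM [{schema}].[{table}];")

print(masking_view_sql("dbo", "customers", ["id", "email"], {"email"}))
```

Queries then target the secure view instead of the base table, which is how row filtering and masking can be applied without MS SQL's coarser native Dynamic Data Masking.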
    
Configuration Properties

JDBC configuration properties

Table 30. JDBC configuration properties

Name

Type

Default

Required

Description

MSSQL_JDBC_URL

string

Yes

Specifies the JDBC URL for the Microsoft SQL Server connector.

Use the following format for the JDBC string:

jdbc:sqlserver://<JDBC_SQLSERVER_URL_WITH_PORT_NUMBER>

MSSQL_JDBC_USERNAME

string

Yes

Specifies the JDBC username to use.

MSSQL_JDBC_PASSWORD

string

Yes

Specifies the JDBC password to use.

MSSQL_MASTER_DB

string

master

Yes

Specifies the name of the JDBC master database that PolicySync establishes an initial connection to.

MSSQL_AUTHENTICATION_TYPE

string

SqlPassword

Yes

Specifies the authentication type for the database engine. The following types are supported:

  • If the user specified by MSSQL_JDBC_USERNAME is a local user, specify: SqlPassword

  • If the user specified by MSSQL_JDBC_USERNAME is a Microsoft Azure Active Directory user, specify: ActiveDirectoryPassword

MSSQL_DEFAULT_USER_PASSWORD

string

Yes

Specifies the password to use when PolicySync creates new users.

MSSQL_OWNER_ROLE

string

No

Specifies the role that owns the resources managed by PolicySync. You must ensure that this user exists as PolicySync does not create this user.

  • If a value is not specified, resources are owned by the creating user. In this case, the owner of the resource will have all access to the resource.

  • If a value is specified, the owner of the resource will be changed to the specified value.

The following resource types are supported:

  • Database

  • Schemas

  • Tables

  • Views



Load keys and intervals

Table 31. Load keys and intervals

Name

Type

Default

Required

Description

MSSQL_LOAD_RESOURCES_KEY

string

load_from_database_columns

No

Specifies how PolicySync loads resources from Microsoft SQL Server. The following values are allowed:

  • load: Load resources from Microsoft SQL Server with a single SQL query.

MSSQL_RESOURCE_SYNC_INTERVAL

integer

60

No

Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.

MSSQL_PRINCIPAL_SYNC_INTERVAL

integer

420

No

Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.

MSSQL_PERMISSION_SYNC_INTERVAL

integer

540

No

Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions on the data source accordingly.

MSSQL_AUDIT_SYNC_INTERVAL

integer

30

No

Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.
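
As a quick reference, the four interval settings can be overridden together. A sketch with the documented defaults spelled out:

```yaml
# Hypothetical override of the sync intervals (values shown are the documented defaults).
MSSQL_RESOURCE_SYNC_INTERVAL: 60     # seconds between resource scans
MSSQL_PRINCIPAL_SYNC_INTERVAL: 420   # seconds between principal reconciliations
MSSQL_PERMISSION_SYNC_INTERVAL: 540  # seconds between policy reconciliations
MSSQL_AUDIT_SYNC_INTERVAL: 30        # seconds between audit pulls
```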



Resources management

Table 32. Resources management

Name

Type

Default

Required

Description

MSSQL_MANAGE_DATABASE_LIST

string

No

Specifies a comma-separated list of database names for which PolicySync manages access control. If unset, access control is managed for all databases. You can use wildcards. Names are case-sensitive.

An example list of databases might resemble the following: testdb1,testdb2,sales_db*.

If specified, MSSQL_IGNORE_DATABASE_LIST takes precedence over this setting.

MSSQL_MANAGE_SCHEMA_LIST

string

No

Specifies a comma-separated list of schema names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

Use the following format when specifying a schema:

<DATABASE_NAME>.<SCHEMA_NAME>

If specified, MSSQL_IGNORE_SCHEMA_LIST takes precedence over this setting.

If you specify a wildcard, such as in the following example, all schemas are managed:

<DATABASE_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all schemas.

  • If set to none, no schemas are managed.

MSSQL_MANAGE_TABLE_LIST

string

No

Specifies a comma-separated list of table names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

Use the following format when specifying a table:

<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>

If specified, MSSQL_IGNORE_TABLE_LIST takes precedence over this setting.

If you specify a wildcard, such as in the following example, all matched tables are managed:

<DATABASE_NAME>.<SCHEMA_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all tables.

  • If set to none, no tables are managed.

MSSQL_IGNORE_DATABASE_LIST

string

No

Specifies a comma-separated list of database names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all databases are subject to access control.

For example:

testdb1,testdb2,sales_db*

This setting supersedes any values specified by MSSQL_MANAGE_DATABASE_LIST.

MSSQL_IGNORE_SCHEMA_LIST

string

No

Specifies a comma-separated list of schema names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all schemas are subject to access control.

For example:

testdb1.schema1,testdb2.schema2,sales_db*.sales*

This setting supersedes any values specified by MSSQL_MANAGE_SCHEMA_LIST.

MSSQL_IGNORE_TABLE_LIST

string

No

Specifies a comma-separated list of table names that PolicySync does not provide access control for. You can specify wildcards. If not specified, all tables are subject to access control. Names are case-sensitive. Specify tables using the following format:

<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>

This setting supersedes any values specified by MSSQL_MANAGE_TABLE_LIST.
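
The manage and ignore lists compose as described above: ignore lists take precedence over manage lists. A hypothetical scoping example, using the list values shown in this table:

```yaml
# Hypothetical scoping sketch: manage two databases plus a wildcard match,
# but exclude anything matching sales_db*. Ignore lists win over manage lists.
MSSQL_MANAGE_DATABASE_LIST: "testdb1,testdb2,sales_db*"
MSSQL_MANAGE_SCHEMA_LIST: "testdb1.*"             # all schemas in testdb1
MSSQL_IGNORE_DATABASE_LIST: "sales_db*"           # overrides the manage list above
```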



Users/Groups/Roles management

Table 33. Users/Groups/Roles management

Name

Type

Default

Required

Description

MSSQL_USER_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a username and replaces each matching character with the value specified by the MSSQL_USER_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

MSSQL_USER_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the MSSQL_USER_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

MSSQL_GROUP_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a group and replaces each matching character with the value specified by the MSSQL_GROUP_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

MSSQL_GROUP_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the MSSQL_GROUP_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

MSSQL_ROLE_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a role name and replaces each matching character with the value specified by the MSSQL_ROLE_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

MSSQL_ROLE_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the MSSQL_ROLE_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

MSSQL_USER_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts user names to lowercase when creating local users. If set to true, case sensitivity is preserved.

MSSQL_GROUP_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts group names to lowercase when creating local groups. If set to true, case sensitivity is preserved.

MSSQL_ROLE_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts role names to lowercase when creating local roles. If set to true, case sensitivity is preserved.

MSSQL_USER_NAME_CASE_CONVERSION

string

lower

No

Specifies how user name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if MSSQL_USER_NAME_PERSIST_CASE_SENSITIVITY is set to true.

MSSQL_GROUP_NAME_CASE_CONVERSION

string

lower

No

Specifies how group name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if MSSQL_GROUP_NAME_PERSIST_CASE_SENSITIVITY is set to true.

MSSQL_ROLE_NAME_CASE_CONVERSION

string

lower

No

Specifies how role name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if MSSQL_ROLE_NAME_PERSIST_CASE_SENSITIVITY is set to true.

MSSQL_USER_FILTER_WITH_EMAIL

string

No

Set this property to true to manage only users that have an email address associated with them in the portal.

MSSQL_MANAGE_USERS

boolean

false

No

Specifies whether PolicySync maintains user membership in roles in the Microsoft SQL Server data source.

MSSQL_MANAGE_GROUPS

boolean

false

No

Specifies whether PolicySync creates groups from Privacera in the Microsoft SQL Server data source.

MSSQL_MANAGE_ROLES

boolean

false

No

Specifies whether PolicySync creates roles from Privacera in the Microsoft SQL Server data source.

MSSQL_MANAGE_USER_LIST

string

No

Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

If not specified, PolicySync manages access control for all users.

If specified, MSSQL_IGNORE_USER_LIST takes precedence over this setting.

An example user list might resemble the following: user1,user2,dev_user*.

MSSQL_MANAGE_GROUP_LIST

string

No

Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive.

An example list of groups might resemble the following: group1,group2,dev_group*.

If specified, MSSQL_IGNORE_GROUP_LIST takes precedence over this setting.

MSSQL_MANAGE_ROLE_LIST

string

No

Specifies a comma-separated list of role names for which PolicySync manages access control. If unset, access control is managed for all roles. You can use wildcards. Names are case-sensitive.

An example list of roles might resemble the following: role1,role2,dev_role*.

If specified, MSSQL_IGNORE_ROLE_LIST takes precedence over this setting.

MSSQL_IGNORE_USER_LIST

string

No

Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control.

This setting supersedes any values specified by MSSQL_MANAGE_USER_LIST.

MSSQL_IGNORE_GROUP_LIST

string

No

Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control.

This setting supersedes any values specified by MSSQL_MANAGE_GROUP_LIST.

MSSQL_IGNORE_ROLE_LIST

string

No

Specifies a comma-separated list of role names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all roles are subject to access control.

This setting supersedes any values specified by MSSQL_MANAGE_ROLE_LIST.

MSSQL_USER_ROLE_PREFIX

string

priv_user_

No

Specifies the prefix that PolicySync uses when creating local roles for users. For example, if you have a user named <USER> defined in Privacera and the role prefix is priv_user_, the local role is named priv_user_<USER>.

MSSQL_GROUP_ROLE_PREFIX

string

priv_group_

No

Specifies the prefix that PolicySync uses when creating local roles. For example, if you have a group named etl_users defined in Privacera and the role prefix is prefix_, the local role is named prefix_etl_users.

MSSQL_ROLE_ROLE_PREFIX

string

priv_role_

No

Specifies the prefix that PolicySync uses when creating roles from Privacera in the Microsoft SQL Server data source.

For example, if you have a role in Privacera named finance defined in Privacera and the role prefix is role_prefix_, the local role is named role_prefix_finance.

MSSQL_USE_NATIVE_PUBLIC_GROUP

boolean

false

No

Specifies whether PolicySync uses the Microsoft SQL Server native public group for access grants whenever a policy refers to a public group. The default value is false.

MSSQL_MANAGE_USER_FILTERBY_GROUP

boolean

false

No

Specifies whether to manage only the users that are members of groups specified by MSSQL_MANAGE_GROUP_LIST. The default value is false.

MSSQL_MANAGE_USER_FILTERBY_ROLE

boolean

false

No

Specifies whether to manage only users that are members of the roles specified by MSSQL_MANAGE_ROLE_LIST. The default value is false.
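
The *_NAME_REPLACE_FROM_REGEX, *_NAME_REPLACE_TO_STRING, and case-conversion settings above combine into a simple normalization pipeline. The sketch below approximates it in Python; the function name is hypothetical, and the regex is the documented default with YAML escaping removed:

```python
import re

# Documented default for MSSQL_*_NAME_REPLACE_FROM_REGEX, with YAML escaping
# removed. Each matching character is replaced with the REPLACE_TO string,
# then the configured case conversion is applied.
REPLACE_FROM = r"[~`$&+:;=?@#|'<>.^*()_%\[\]!\-\/\\{}]"
REPLACE_TO = "_"  # documented default for MSSQL_*_NAME_REPLACE_TO_STRING

def normalize_name(name: str, case_conversion: str = "lower") -> str:
    """Hypothetical sketch of PolicySync's principal-name normalization."""
    replaced = re.sub(REPLACE_FROM, REPLACE_TO, name)
    if case_conversion == "lower":
        return replaced.lower()
    if case_conversion == "upper":
        return replaced.upper()
    return replaced  # "none": preserve case

print(normalize_name("John.Doe@Corp"))  # john_doe_corp
```

This is only an illustration of the replace-then-convert behavior the table describes, not the connector's actual implementation.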



Native Row filter

Table 34. Native Row filter

Name

Type

Default

Required

Description

MSSQL_ENABLE_ROW_FILTER

boolean

false

No

Specifies whether to use the data source native row filter functionality. This setting is disabled by default. When enabled, you can create row filters only on tables, but not on views.

MSSQL_ENABLE_VIEW_BASED_MASKING

boolean

true

No

Specifies whether to use secure view based masking. The default value is true.

MSSQL_ENABLE_VIEW_BASED_ROW_FILTER

boolean

true

No

Specifies whether to use secure view based row filtering. The default value is true.

While Microsoft SQL Server supports native filtering, PolicySync provides additional functionality that is not available natively. Enabling this setting is recommended.

MSSQL_SECURE_VIEW_CREATE_FOR_ALL

boolean

true

No

Specifies whether to create secure views for all tables and views that are created by users. If enabled, PolicySync creates secure views for resources regardless of whether masking or filtering policies are enabled.

MSSQL_MASKED_NUMBER_VALUE

integer

0

No

Specifies the default masking value for numeric column types.

MSSQL_MASKED_TEXT_VALUE

string

<MASKED>

No

Specifies the default masking value for text and string column types.

MSSQL_MASKED_DATE_VALUE

string

null

No

Specifies the default masking value for date column types.

MSSQL_SECURE_VIEW_NAME_PREFIX

string

No

Specifies a prefix string for secure views. By default, view-based row filter and masking secure views have the same name as the underlying table.

To change the secure view name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view name for a table named example1 is dev_example1.

MSSQL_SECURE_VIEW_NAME_POSTFIX

string

_secure

No

Specifies a postfix string for secure views. By default, view-based row filter and masking secure views have the same name as the underlying table.

To change the secure view name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view name for a table named example1 is example1_dev.

MSSQL_SECURE_VIEW_SCHEMA_NAME_PREFIX

string

No

Specifies a prefix string to apply to a secure schema name. By default view-based row filter and masking-related secure views have the same schema name as the table schema name.

If you want to change the secure view schema name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view schema name for a schema named example1 is dev_example1.

MSSQL_SECURE_VIEW_SCHEMA_NAME_POSTFIX

string

No

Specifies a postfix string to apply to a secure view schema name. By default view-based row filter and masking-related secure views have the same schema name as the table schema name.

If you want to change the secure view schema name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view schema name for a schema named example1 is example1_dev.

MSSQL_SECURE_VIEW_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a table or view name. For example, if the table is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes.

MSSQL_SECURE_VIEW_SCHEMA_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a schema name. For example, if a schema is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes.

MSSQL_GRANT_UPDATES

boolean

true

Yes

Specifies whether PolicySync performs grants and revokes for access control and creates, updates, and deletes queries for users, groups, and roles. The default value is true.

MSSQL_GRANT_UPDATES_MAX_RETRY_ATTEMPTS

integer

2

No

Specifies the maximum number of attempts that PolicySync makes to execute a grant query if it is unable to do so successfully. The default value is 2.

MSSQL_ENABLE_DATA_ADMIN

boolean

true

No

Specifies whether the data admin feature is enabled. With this feature enabled, you create policies on native tables and views, and the corresponding grants are applied to the secure views of those tables and views; the secure views provide row filtering and masking. To also grant permissions on the native table or view, select the desired permissions plus data admin in the policy. Those permissions are then granted on both the native table or view and its secure view.
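
A sketch of the view-based masking and filtering settings from this table; the values shown are the documented defaults except the secure view postfix, which is repeated here for illustration:

```yaml
# Hypothetical sketch of view-based row filtering and masking.
MSSQL_ENABLE_VIEW_BASED_ROW_FILTER: "true"
MSSQL_ENABLE_VIEW_BASED_MASKING: "true"
MSSQL_SECURE_VIEW_NAME_POSTFIX: "_secure"   # table example1 -> secure view example1_secure
MSSQL_MASKED_TEXT_VALUE: "<MASKED>"         # default mask for text columns
MSSQL_MASKED_NUMBER_VALUE: 0                # default mask for numeric columns
```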



Access audits management

Table 35. Access audits management

Name

Type

Default

Required

Description

MSSQL_AUDIT_ENABLE

boolean

false

Yes

Specifies whether Privacera fetches access audit data from the data source.

If set to true, you must also specify a value for the MSSQL_AUDIT_STORAGE_URL setting.

MSSQL_AUDIT_STORAGE_URL

string

No

Specifies the URL for the audit logs provided by the Azure SQL Auditing service. For example: https://test.blob.core.windows.net/sqldbauditlogs/test

MSSQL_AUDIT_INITIAL_PULL_MINUTES

integer

30

No

Specifies the initial delay, in minutes, before PolicySync retrieves access audits from Microsoft SQL Server.

MSSQL_AUDIT_LOAD_KEY

string

load

No

Specifies the method that PolicySync uses to load access audit information.

The following values are valid:

  • load: If your data source is Microsoft SQL Server, set this value to use SQL queries to load access audit information.

  • load_aws: If your data source is Amazon RDS for SQL Server, set this value to load access audit information.

  • load_synapse: If your data source is Microsoft Azure Synapse, set this value to load access audit information.

MSSQL_USER_LOAD_KEY

string

load

No

Specifies how PolicySync loads users from Microsoft SQL Server. The following values are valid:

  • load

  • load_db

MSSQL_EXTERNAL_USER_AS_INTERNAL

boolean

false

No

Specifies whether PolicySync creates local users for external users.

MSSQL_AUDIT_EXCLUDED_USERS

string

No

Specifies a comma-separated list of users to exclude when fetching access audits. For example: "user1,user2,user3".

MSSQL_MANAGE_GROUP_POLICY_ONLY

boolean

false

No

Specifies whether access policies apply to only groups. If enabled, any policies that apply to users or roles are ignored.
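
Putting the audit settings together, a minimal sketch for Azure SQL Auditing; the storage URL is a placeholder and must point at your own audit blob container:

```yaml
# Hypothetical audit configuration sketch.
MSSQL_AUDIT_ENABLE: "true"
MSSQL_AUDIT_STORAGE_URL: "https://<account>.blob.core.windows.net/sqldbauditlogs/<server>"
MSSQL_AUDIT_LOAD_KEY: "load"   # load_aws for Amazon RDS, load_synapse for Azure Synapse
```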



Databricks SQL

This topic shows how to configure access control in Databricks SQL.

Prerequisites

Ensure the following prerequisites are met:

  • Create an endpoint in Databricks SQL with a user having admin privileges. For more information, refer to Create an endpoint in Databricks SQL.

  • As you configure the endpoint using the link provided above, get the following values:

    • Host URL

    • JDBC URL

    • SQL endpoint token

    • Database List

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.policysync.databricks.sql.analytics.yml custom-vars/
    vi custom-vars/vars.policysync.databricks.sql.analytics.yml
    
  3. Set the properties for your specific installation. For property details and description, see the Configuration Properties section that follows.

    Note

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, see Databricks SQL Connector.

  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration properties

JDBC configuration properties

Table 36. JDBC configuration properties

Name

Type

Default

Required

Description

DATABRICKS_SQL_ANALYTICS_JDBC_URL

string

Yes

Specifies the JDBC URL for the Databricks SQL connector.

Use the following format for the JDBC URL:

jdbc:spark://<WORKSPACE_URL>:443/<DATABASE>;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/1234567890

The workspace URL and the database name are derived from your Databricks SQL configuration.

DATABRICKS_SQL_ANALYTICS_JDBC_USERNAME

string

Yes

Specifies the JDBC username to use.

DATABRICKS_SQL_ANALYTICS_JDBC_PASSWORD

string

Yes

Specifies the access token of the SQL endpoint to use.

DATABRICKS_SQL_ANALYTICS_JDBC_DB

string

Yes

Specifies the name of the JDBC database to use.

DATABRICKS_SQL_ANALYTICS_OWNER_ROLE

string

No

Specifies the role that owns the resources managed by PolicySync. You must ensure that this role exists; PolicySync does not create it.

  • If a value is not specified, resources are owned by the creating user. In this case, the owner of the resource will have all access to the resource.

  • If a value is specified, the owner of the resource will be changed to the specified value.

The following resource types are supported:

  • Database

  • Schemas

  • Tables

  • Views

DATABRICKS_SQL_ANALYTICS_HOST_URL

string

Yes

Specifies the base URL for the Databricks SQL instance.
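
The JDBC settings above can be sketched in the custom-vars file as follows; every angle-bracket value is a placeholder to be taken from your Databricks SQL endpoint configuration:

```yaml
# Hypothetical snippet for vars.policysync.databricks.sql.analytics.yml.
DATABRICKS_SQL_ANALYTICS_HOST_URL: "https://<WORKSPACE_URL>"
DATABRICKS_SQL_ANALYTICS_JDBC_URL: "jdbc:spark://<WORKSPACE_URL>:443/<DATABASE>;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<ENDPOINT_ID>"
DATABRICKS_SQL_ANALYTICS_JDBC_USERNAME: "<JDBC_USERNAME>"
DATABRICKS_SQL_ANALYTICS_JDBC_PASSWORD: "<SQL_ENDPOINT_ACCESS_TOKEN>"
DATABRICKS_SQL_ANALYTICS_JDBC_DB: "<DATABASE>"
```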



Load keys and intervals

Table 37. Load keys and intervals

Name

Type

Default

Required

Description

DATABRICKS_SQL_ANALYTICS_LOAD_RESOURCES_KEY

string

load_like

No

Specifies how PolicySync loads resources from Databricks SQL. The following values are allowed:

  • load_like: Default value for loading resources, to be used in production.

  • load: Load resources from Databricks SQL with a top-down approach: it first loads databases, then schemas, then tables and their columns. This mode is intended for development purposes only.

DATABRICKS_SQL_ANALYTICS_RESOURCE_SYNC_INTERVAL

integer

60

No

Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.

DATABRICKS_SQL_ANALYTICS_PRINCIPAL_SYNC_INTERVAL

integer

420

No

Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.

DATABRICKS_SQL_ANALYTICS_PERMISSION_SYNC_INTERVAL

integer

540

No

Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions on the data source accordingly.

DATABRICKS_SQL_ANALYTICS_AUDIT_SYNC_INTERVAL

integer

30

No

Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.



Resources management

Table 38. Resources management

Name

Type

Default

Required

Description

DATABRICKS_SQL_ANALYTICS_MANAGE_DATABASE_LIST

string

No

Specifies a comma-separated list of database names for which PolicySync manages access control. If unset, access control is managed for all databases. You can use wildcards. Names are case-sensitive.

An example list of databases might resemble the following: testdb1,testdb2,sales_db*.

If specified, DATABRICKS_SQL_ANALYTICS_IGNORE_DATABASE_LIST takes precedence over this setting.

DATABRICKS_SQL_ANALYTICS_MANAGE_TABLE_LIST

string

No

Specifies a comma-separated list of table names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

Use the following format when specifying a table:

<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>

If specified, DATABRICKS_SQL_ANALYTICS_IGNORE_TABLE_LIST takes precedence over this setting.

If you specify a wildcard, such as in the following example, all matched tables are managed:

<DATABASE_NAME>.<SCHEMA_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all tables.

  • If set to none, no tables are managed.

DATABRICKS_SQL_ANALYTICS_IGNORE_DATABASE_LIST

string

No

Specifies a comma-separated list of database names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all databases are subject to access control.

For example:

testdb1,testdb2,sales_db*

This setting supersedes any values specified by DATABRICKS_SQL_ANALYTICS_MANAGE_DATABASE_LIST.

DATABRICKS_SQL_ANALYTICS_IGNORE_TABLE_LIST

string

No

Specifies a comma-separated list of table names that PolicySync does not provide access control for. You can specify wildcards. If not specified, all tables are subject to access control. Names are case-sensitive. Specify tables using the following format:

<DATABASE_NAME>.<SCHEMA_NAME>.<TABLE_NAME>

This setting supersedes any values specified by DATABRICKS_SQL_ANALYTICS_MANAGE_TABLE_LIST.



Users/Groups/Roles management

Table 39. Users/Groups/Roles management

Name

Type

Default

Required

Description

DATABRICKS_SQL_ANALYTICS_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a user name and replaces each matching character with the value specified by the DATABRICKS_SQL_ANALYTICS_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

DATABRICKS_SQL_ANALYTICS_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the DATABRICKS_SQL_ANALYTICS_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

DATABRICKS_SQL_ANALYTICS_USER_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a username and replaces each matching character with the value specified by the DATABRICKS_SQL_ANALYTICS_USER_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

DATABRICKS_SQL_ANALYTICS_USER_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the DATABRICKS_SQL_ANALYTICS_USER_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

DATABRICKS_SQL_ANALYTICS_GROUP_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a group and replaces each matching character with the value specified by the DATABRICKS_SQL_ANALYTICS_GROUP_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

DATABRICKS_SQL_ANALYTICS_GROUP_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the DATABRICKS_SQL_ANALYTICS_GROUP_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

DATABRICKS_SQL_ANALYTICS_ROLE_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a role name and replaces each matching character with the value specified by the DATABRICKS_SQL_ANALYTICS_ROLE_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

DATABRICKS_SQL_ANALYTICS_ROLE_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the DATABRICKS_SQL_ANALYTICS_ROLE_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

DATABRICKS_SQL_ANALYTICS_USER_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts user names to lowercase when creating local users. If set to true, case sensitivity is preserved.

DATABRICKS_SQL_ANALYTICS_GROUP_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts group names to lowercase when creating local groups. If set to true, case sensitivity is preserved.

DATABRICKS_SQL_ANALYTICS_ROLE_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts role names to lowercase when creating local roles. If set to true, case sensitivity is preserved.

DATABRICKS_SQL_ANALYTICS_USER_NAME_CASE_CONVERSION

string

lower

No

Specifies how user name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if DATABRICKS_SQL_ANALYTICS_USER_NAME_PERSIST_CASE_SENSITIVITY is set to true.

DATABRICKS_SQL_ANALYTICS_GROUP_NAME_CASE_CONVERSION

string

lower

No

Specifies how group name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if DATABRICKS_SQL_ANALYTICS_GROUP_NAME_PERSIST_CASE_SENSITIVITY is set to true.

DATABRICKS_SQL_ANALYTICS_ROLE_NAME_CASE_CONVERSION

string

lower

No

Specifies how role name conversions are performed. The following options are valid:

  • lower: Convert to lowercase

  • upper: Convert to uppercase

  • none: Preserve case

This setting applies only if DATABRICKS_SQL_ANALYTICS_ROLE_NAME_PERSIST_CASE_SENSITIVITY is set to true.

DATABRICKS_SQL_ANALYTICS_CREATE_USER

boolean

true

No

Specifies whether PolicySync creates local users for each user in Privacera.

DATABRICKS_SQL_ANALYTICS_MANAGE_USERS

boolean

true

No

Specifies whether PolicySync maintains user membership in roles in the Databricks SQL data source.

DATABRICKS_SQL_ANALYTICS_MANAGE_GROUPS

boolean

true

No

Specifies whether PolicySync creates groups from Privacera in the Databricks SQL data source.

DATABRICKS_SQL_ANALYTICS_MANAGE_ROLES

boolean

true

No

Specifies whether PolicySync creates roles from Privacera in the Databricks SQL data source.

DATABRICKS_SQL_ANALYTICS_MANAGE_USER_LIST

string

No

Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

If not specified, PolicySync manages access control for all users.

If specified, DATABRICKS_SQL_ANALYTICS_IGNORE_USER_LIST takes precedence over this setting.

An example user list might resemble the following: user1,user2,dev_user*.

DATABRICKS_SQL_ANALYTICS_MANAGE_GROUP_LIST

string

No

Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive.

An example list of groups might resemble the following: group1,group2,dev_group*.

If specified, DATABRICKS_SQL_ANALYTICS_IGNORE_GROUP_LIST takes precedence over this setting.

DATABRICKS_SQL_ANALYTICS_MANAGE_ROLE_LIST

string

No

Specifies a comma-separated list of role names for which PolicySync manages access control. If unset, access control is managed for all roles. You can use wildcards. Names are case-sensitive.

An example list of roles might resemble the following: role1,role2,dev_role*.

If specified, DATABRICKS_SQL_ANALYTICS_IGNORE_ROLE_LIST takes precedence over this setting.

DATABRICKS_SQL_ANALYTICS_IGNORE_USER_LIST

string

No

Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control.

This setting supersedes any values specified by DATABRICKS_SQL_ANALYTICS_MANAGE_USER_LIST.

DATABRICKS_SQL_ANALYTICS_IGNORE_GROUP_LIST

string

No

Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control.

This setting supersedes any values specified by DATABRICKS_SQL_ANALYTICS_MANAGE_GROUP_LIST.

DATABRICKS_SQL_ANALYTICS_IGNORE_ROLE_LIST

string

No

Specifies a comma-separated list of role names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all roles are subject to access control.

This setting supersedes any values specified by DATABRICKS_SQL_ANALYTICS_MANAGE_ROLE_LIST.

DATABRICKS_SQL_ANALYTICS_GROUP_ROLE_PREFIX

string

priv_group_

No

Specifies the prefix that PolicySync uses when creating local roles. For example, if you have a group named etl_users defined in Privacera and the role prefix is prefix_, the local role is named prefix_etl_users.

DATABRICKS_SQL_ANALYTICS_ROLE_ROLE_PREFIX

string

priv_role_

No

Specifies the prefix that PolicySync uses when creating roles from Privacera in the Databricks SQL data source.

For example, if you have a role in Privacera named finance defined in Privacera and the role prefix is role_prefix_, the local role is named role_prefix_finance.

DATABRICKS_SQL_ANALYTICS_USE_NATIVE_PUBLIC_GROUP

boolean

true

No

Specifies whether PolicySync uses the Databricks SQL native public group for access grants whenever a policy refers to a public group. The default value is true.

DATABRICKS_SQL_ANALYTICS_MANAGE_USER_FILTERBY_GROUP

boolean

false

No

Specifies whether to manage only the users that are members of groups specified by DATABRICKS_SQL_ANALYTICS_MANAGE_GROUP_LIST. The default value is false.

DATABRICKS_SQL_ANALYTICS_MANAGE_USER_FILTERBY_ROLE

boolean

false

No

Specifies whether to manage only users that are members of the roles specified by DATABRICKS_SQL_ANALYTICS_MANAGE_ROLE_LIST. The default value is false.

DATABRICKS_SQL_ANALYTICS_USER_USE_EMAIL_AS_SERVICE_NAME

boolean

true

No

Specifies whether PolicySync maps the username to the email address when granting or revoking access.



Access control management

Table 40. Access control management

Name

Type

Default

Required

Description

DATABRICKS_SQL_ANALYTICS_ENABLE_VIEW_BASED_MASKING

boolean

true

No

Specifies whether to use secure view-based masking. The default value is true.

DATABRICKS_SQL_ANALYTICS_ENABLE_VIEW_BASED_ROW_FILTER

boolean

true

No

Specifies whether to use secure view-based row filtering. The default value is true.

While Databricks SQL supports native filtering, PolicySync provides additional functionality that is not available natively. Enabling this setting is recommended.

DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_CREATE_FOR_ALL

boolean

true

No

Specifies whether to create secure views for all tables and views that are created by users. If enabled, PolicySync creates secure views for resources regardless of whether masking or filtering policies are enabled.

DATABRICKS_SQL_ANALYTICS_MASKED_NUMBER_VALUE

integer

0

No

Specifies the default masking value for numeric column types.

DATABRICKS_SQL_ANALYTICS_MASKED_TEXT_VALUE

string

<MASKED>

No

Specifies the default masking value for text and string column types.

DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_NAME_PREFIX

string

No

Specifies a prefix string for secure view names. By default, view-based row filter and masking-related secure views have the same name as the source table.

If you want to add a secure view name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view name for a table named example1 is dev_example1.

DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_NAME_POSTFIX

string

No

Specifies a postfix string for secure view names. By default, view-based row filter and masking-related secure views have the same name as the source table.

If you want to add a secure view name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view name for a table named example1 is example1_dev.

DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_DATABASE_NAME_PREFIX

string

No

Specifies a prefix string for secure view database names. By default, view-based row filter and masking-related secure views are created in a database with the same name as the source table's database.

For example, if the prefix is priv_, then the secure view name for a database named example1 is priv_example1.

DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_DATABASE_NAME_POSTFIX

string

_secure

No

Specifies a postfix string for secure view database names. By default, view-based row filter and masking-related secure views are created in a database with the same name as the source table's database.

For example, if the postfix is _sec, then the secure view name for a database named example1 is example1_sec.
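
The four naming properties above might be sketched together as follows (illustrative values taken from the examples in the descriptions; typically you would set either a prefix or a postfix for each name, not both):

```yaml
DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_NAME_PREFIX: "dev_"
DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_NAME_POSTFIX: "_dev"
DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_DATABASE_NAME_PREFIX: "priv_"
DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_DATABASE_NAME_POSTFIX: "_sec"
```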

DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a table or view name. For example, if the table is named example_suffix, you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes.

DATABRICKS_SQL_ANALYTICS_SECURE_VIEW_DATABASE_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a database name. For example, if the database is named example_suffix, you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes.

DATABRICKS_SQL_ANALYTICS_GRANT_UPDATES

boolean

true

Yes

Specifies whether PolicySync performs grants and revokes for access control and creates, updates, and deletes queries for users, groups, and roles. The default value is true.

DATABRICKS_SQL_ANALYTICS_ENABLE_DATA_ADMIN

boolean

true

No

Specifies whether to enable the data admin feature. With this feature enabled, you create all policies on native tables and views, and the corresponding grants are made on the secure views of those tables and views; the secure views provide row filtering and masking. If you also need to grant permissions on a native table or view, select the desired permissions plus data admin in the policy. Those permissions are then granted on both the native table or view and its secure view.

DATABRICKS_SQL_ANALYTICS_USE_HIVE_ACCESS_POLICIES

boolean

false

No

Set this property to true to use the privacera_hive service instead of privacera_databricks_sql_analytics. After you set the Privacera Manager property, PolicySync starts syncing the policies from the privacera_hive service. Be aware that deny and exclude policies from privacera_hive will not work, as only allow policies from Hive are supported this way.



Access audits management

Table 41. Access audits management

Name

Type

Default

Required

Description

DATABRICKS_SQL_ANALYTICS_AUDIT_ENABLE

boolean

true

Yes

Specifies whether Privacera fetches access audit data from the data source.

DATABRICKS_SQL_ANALYTICS_AUDIT_INITIAL_PULL_MINUTES

integer

30

No

Specifies the initial delay, in minutes, before PolicySync retrieves access audits from Databricks SQL.

DATABRICKS_SQL_ANALYTICS_AUDIT_EXCLUDED_USERS

string

{{DATABRICKS_SQL_ANALYTICS_JDBC_USERNAME}}

No

Specifies a comma-separated list of users to exclude when fetching access audits. For example: "user1,user2,user3".
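
A minimal audit configuration sketch, assuming the JDBC service principal plus a hypothetical reporting user (svc_reporting) should be excluded:

```yaml
DATABRICKS_SQL_ANALYTICS_AUDIT_ENABLE: "true"
DATABRICKS_SQL_ANALYTICS_AUDIT_INITIAL_PULL_MINUTES: "30"
DATABRICKS_SQL_ANALYTICS_AUDIT_EXCLUDED_USERS: "{{DATABRICKS_SQL_ANALYTICS_JDBC_USERNAME}},svc_reporting"
```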



RocksDB

This topic shows how to configure the RocksDB key-value store so you can tune the performance settings for PolicySync.

Configuration
  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.policysync.rocksdb.tuning.yml custom-vars/
    vi custom-vars/vars.policysync.rocksdb.tuning.yml
  3. Edit the properties as required. For property details and description, refer to the Configuration Properties below.

  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
Configure the maximum log size and number of logs retained

By default, each log file grows to a maximum size of 100 MB (104,857,600 bytes), and the number of log files retained is unlimited.

To ensure that RocksDB log files do not consume too much disk space, you can configure the maximum log size and the number of files that Privacera retains by setting the following properties:

  • pscontext.rocksdb.max.log.file.size: Specifies the maximum size of a log file in bytes. The default is 104857600 bytes.

  • pscontext.rocksdb.keep.log.file.num: Specifies the maximum number of log files to retain. When this number is exceeded, older log files are automatically deleted.
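
For example, a rangersync-custom-v2.properties that caps each log at 50 MB and keeps ten files might look like this (the values are illustrative, not recommendations):

```properties
# Maximum size of a single RocksDB log file, in bytes (50 MB)
pscontext.rocksdb.max.log.file.size=52428800
# Keep at most 10 log files; older files are deleted automatically
pscontext.rocksdb.keep.log.file.num=10
```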

Procedure

  1. Log in to the system where Privacera Manager is installed, and then change to the ~/privacera/privacera-manager directory.

  2. Create a file in the config/custom-properties/ directory with the rangersync-custom-v2.properties file name.

  3. Edit the rangersync-custom-v2.properties file and specify values for the previously described logging properties as appropriate.

  4. Run Privacera Manager to update your configuration:

    ./privacera-manager.sh update
Configuration Properties

Property

Description

Example

ROCKSDB_MAX_BACKGROUND_JOBS

Specifies the maximum number of concurrent background jobs (both flushes and compactions combined).

ROCKSDB_MAX_BACKGROUND_JOBS: "2"

ROCKSDB_ALLOW_CONCURRENT_MEMTABLE_WRITE

If true, allow multiple writers to update memtables in parallel. Only some memtable factories support concurrent writes; currently this is implemented only for SkipListFactory. Concurrent memtable writes are not compatible with inplace_update_support or filter_deletes.

ROCKSDB_ALLOW_CONCURRENT_MEMTABLE_WRITE: "true"

ROCKSDB_ENABLE_PIPELINED_WRITE

By default, a single write thread queue is maintained. The thread at the head of the queue becomes the write batch group leader and is responsible for writing to the WAL and the memtable for the batch group. If enablePipelinedWrite() is true, separate write thread queues are maintained for WAL writes and memtable writes. A write thread first enters the WAL writer queue and then the memtable writer queue, so a thread pending on the WAL writer queue only waits for previous writers to finish their WAL writes, not their memtable writes. Enabling this feature may improve write throughput and reduce the latency of the prepare phase of two-phase commit.

ROCKSDB_ENABLE_PIPELINED_WRITE: "false"

ROCKSDB_DB_WRITE_BUFFER_SIZE

Amount of data to build up in memtables across all column families before writing to disk. This is distinct from ColumnFamilyOptions.writeBufferSize(), which enforces a limit for a single memtable. This feature is disabled by default. Specify a non-zero value to enable it.

ROCKSDB_DB_WRITE_BUFFER_SIZE: "0"

ROCKSDB_RANDOM_ACCESS_MAX_BUFFER_SIZE

This is the maximum buffer size used by WinMmapReadableFile in unbuffered disk I/O mode. An aligned buffer must be maintained for reads; the buffer is allowed to grow until the specified value, and one-shot buffers are allocated for bigger requests. In unbuffered mode, the read-ahead buffer at ReadaheadRandomAccessFile is always bypassed. When read-ahead is required, the MutableDBOptionsInterface.compactionReadaheadSize() value is used and read-ahead is always attempted; with read-ahead, the buffer is pre-allocated to that size instead of growing up to a limit. This option is currently honored only on Windows. Default: 1 MB. Special value: 0 means do not maintain a per-instance buffer; allocate a per-request buffer and avoid locking.

ROCKSDB_RANDOM_ACCESS_MAX_BUFFER_SIZE: "0"

ROCKSDB_WRITABLE_FILE_MAX_BUFFER_SIZE

This is the maximum buffer size used by WritableFileWriter. An aligned buffer needs to be maintained for writes; the buffer is allowed to grow until its size hits this limit in buffered I/O, and the buffer size is fixed when using direct I/O to ensure alignment of write requests. Default: 1 MB.

ROCKSDB_WRITABLE_FILE_MAX_BUFFER_SIZE: "0"

ROCKSDB_ALLOW_MMAP_READS

Allow the OS to mmap file for reading sst tables.

ROCKSDB_ALLOW_MMAP_READS: "false"

ROCKSDB_ALLOW_MMAP_WRITES

Allow the OS to mmap file for writing.

ROCKSDB_ALLOW_MMAP_WRITES: "false"

ROCKSDB_BYTES_PER_SYNC

Allows OS to incrementally sync files to disk while they are being written, asynchronously, in the background. Issue one request for every bytes_per_sync written.

ROCKSDB_BYTES_PER_SYNC: "0"

ROCKSDB_WAL_BYTES_PER_SYNC

Same as setBytesPerSync(long), but applies to WAL files.

ROCKSDB_WAL_BYTES_PER_SYNC: "0"

ROCKSDB_RATELIMITER_RATE_BYTES_PER_SEC

This is usually the only rate limiter parameter you need to set. It controls the total write rate of compaction and flush, in bytes per second. Currently, RocksDB does not enforce the rate limit for anything other than flush and compaction, e.g. writes to the WAL.

ROCKSDB_RATELIMITER_RATE_BYTES_PER_SEC: "0"

ROCKSDB_MAX_OPEN_FILES

Number of open files that can be used by the DB. You may need to increase this if your database has a large working set. A value of -1 means files that are opened are always kept open. You can estimate the number of files based on target_file_size_base and target_file_size_multiplier for level-based compaction. For universal-style compaction, you can usually set it to -1.

ROCKSDB_MAX_OPEN_FILES: "0"

ROCKSDB_CF_WRITE_BUFFER_SIZE

Amount of data to build up in memory (backed by an unsorted log on disk) before converting to a sorted on-disk file. Larger values increase performance, especially during bulk loads. Up to max_write_buffer_number write buffers may be held in memory at the same time, so you may wish to adjust this parameter to control memory usage. Also, a larger write buffer will result in a longer recovery time the next time the database is opened.

ROCKSDB_CF_WRITE_BUFFER_SIZE: "0"

ROCKSDB_CF_COMPRESSIONTYPE_LZ4COMPRESSION

ROCKSDB_CF_COMPRESSIONTYPE_ZSTDCOMPRESSION

ROCKSDB_CF_COMPRESSIONTYPE_ZLIBCOMPRESSION

Compress blocks using the specified compression algorithm. This parameter can be changed dynamically. Default: SNAPPY_COMPRESSION, which gives lightweight but fast compression.

ROCKSDB_CF_COMPRESSIONTYPE_LZ4COMPRESSION: "false"

ROCKSDB_CF_COMPRESSIONTYPE_ZSTDCOMPRESSION: "false"

ROCKSDB_CF_COMPRESSIONTYPE_ZLIBCOMPRESSION: "false"

ROCKSDB_CF_LEVEL_COMPACTION_DYNAMIC_LEVEL_BYTES

With this option on, from an empty DB, we make the last level the base level, which means merging L0 data into the last level, until it exceeds max_bytes_for_level_base. Then we make the second-to-last level the base level and start merging L0 data into it, with its target size set to 1/max_bytes_for_level_multiplier of the last level's actual size. As more data accumulates, the base level moves to the third-to-last level, and so on.

ROCKSDB_CF_LEVEL_COMPACTION_DYNAMIC_LEVEL_BYTES: "false"

ROCKSDB_CF_BLOOMLOCALITY

Control the locality of bloom filter probes to improve cache miss rates. This option only applies to the memtable prefix bloom and the plaintable prefix bloom. It essentially limits the maximum number of cache lines each bloom filter check can touch. This optimization is turned off when set to 0. The number should never be greater than the number of probes. This option can boost performance for in-memory workloads but should be used with care, since it can cause a higher false positive rate.

ROCKSDB_CF_BLOOMLOCALITY: "0"

ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL

Sets the compaction style for the DB to universal.

ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL: "false"

ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_SIZERATIO

Percentage flexibility while comparing file size. If the candidate file(s) size is 1% smaller than the next file's size, then include the next file in this candidate set.

ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_SIZERATIO: "1"

ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MINMERGEWIDTH

The minimum number of files in a single compaction run.

ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MINMERGEWIDTH: "2"

ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MAXSIZEAMPPERCENT

The size amplification is defined as the amount (in percentage) of additional storage needed to store a single byte of data in the database. For example, a size amplification of 2% means that a database that contains 100 bytes of user data may occupy up to 102 bytes of physical storage. By this definition, a fully compacted database has a size amplification of 0%. RocksDB uses the following heuristic to calculate size amplification: it assumes that all files excluding the earliest file contribute to the size amplification. Default: 200, which means that a 100 byte database could require up to 300 bytes of storage.

ROCKSDB_CF_COMPRESSIONSTYLE_UNIVERSAL_MAXSIZEAMPPERCENT: "200"

ROCKSDB_CF_COMPRESSIONSTYLE_FIFO

ROCKSDB_CF_COMPRESSIONSTYLE_FIFO: "false"

ROCKSDB_CF_COMPRESSIONSTYLE_FIFO_ALLOWCOMPACTION

If true, try to do compaction to compact smaller files into larger ones. Minimum files to compact follows options.level0_file_num_compaction_trigger and compaction won't trigger if average compact bytes per del file is larger than options.write_buffer_size. This is to protect large files from being compacted again.

ROCKSDB_CF_COMPRESSIONSTYLE_FIFO_ALLOWCOMPACTION: "false"

ROCKSDB_CF_COMPRESSIONSTYLE_FIFO_MAXTABLEFILESIZE

Once the total sum of table files reaches this value, the oldest table file is deleted.

ROCKSDB_CF_COMPRESSIONSTYLE_FIFO_MAXTABLEFILESIZE: "1024"

ROCKSDB_CF_COMPRESSIONSTYLE_NONE

ROCKSDB_CF_COMPRESSIONSTYLE_NONE: "false"

ROCKSDB_CF_LEVEL0FILENUMCOMPACTIONTRIGGER

Number of files to trigger level-0 compaction. A value < 0 means that level-0 compaction will not be triggered by number of files at all.

ROCKSDB_CF_LEVEL0FILENUMCOMPACTIONTRIGGER: "0"

ROCKSDB_CF_LEVEL0SLOWDOWNWRITESTRIGGER

Soft limit on number of level-0 files. We start slowing down writes at this point. A value < 0 means that no writing slow down will be triggered by number of files in level-0.

ROCKSDB_CF_LEVEL0SLOWDOWNWRITESTRIGGER: "0"

ROCKSDB_CF_LEVEL0STOPWRITESTRIGGER

Hard limit on the number of level-0 files. Writes are stopped at this point. A value < 0 means that no write stop will be triggered by the number of files in level-0.

ROCKSDB_CF_LEVEL0STOPWRITESTRIGGER: "0"

ROCKSDB_CF_MAX_WRITE_BUFFER_NUMBER

The maximum number of write buffers that are built up in memory. The default and minimum number is 2, so that when one write buffer is being flushed to storage, new writes can continue to the other write buffer.

ROCKSDB_CF_MAX_WRITE_BUFFER_NUMBER: "0"

ROCKSDB_CF_MAX_WRITE_BUFFER_NUMBER_TO_MAINTAIN

The total maximum number of write buffers to maintain in memory including copies of buffers that have already been flushed. Unlike AdvancedMutableColumnFamilyOptionsInterface.maxWriteBufferNumber(), this parameter does not affect flushing. This controls the minimum amount of write history that will be available in memory for conflict checking when Transactions are used. When using an OptimisticTransactionDB: If this value is too low, some transactions may fail at commit time due to not being able to determine whether there were any write conflicts. When using a TransactionDB: If Transaction::SetSnapshot is used, TransactionDB will read either in-memory write buffers or SST files to do write-conflict checking. Increasing this value can reduce the number of reads to SST files done for conflict detection. Setting this value to 0 will cause write buffers to be freed immediately after they are flushed. If this value is set to -1, AdvancedMutableColumnFamilyOptionsInterface.maxWriteBufferNumber() will be used. Default: If using a TransactionDB/OptimisticTransactionDB, the default value will be set to the value of AdvancedMutableColumnFamilyOptionsInterface.maxWriteBufferNumber() if it is not explicitly set by the user. Otherwise, the default is 0.

ROCKSDB_CF_MAX_WRITE_BUFFER_NUMBER_TO_MAINTAIN: "0"

ROCKSDB_CF_NUMLEVEL

Set the number of levels for this database. If level-styled compaction is used, then this number determines the total number of levels.

ROCKSDB_CF_NUMLEVEL: "0"

ROCKSDB_CF_TARGETFILESIZEBASE

The target file size for compaction. This targetFileSizeBase determines a level-1 file size. Target file size for level L can be calculated by targetFileSizeBase * (targetFileSizeMultiplier ^ (L-1)) For example, if targetFileSizeBase is 2MB and target_file_size_multiplier is 10, then each file on level-1 will be 2MB, and each file on level 2 will be 20MB, and each file on level-3 will be 200MB.

ROCKSDB_CF_TARGETFILESIZEBASE: "0"

ROCKSDB_CF_MAXBYTESFORLEVELBASE

The upper-bound of the total size of level-1 files in bytes. Maximum number of bytes for level L can be calculated as (maxBytesForLevelBase) * (maxBytesForLevelMultiplier ^ (L-1)) For example, if maxBytesForLevelBase is 20MB, and if max_bytes_for_level_multiplier is 10, total data size for level-1 will be 200MB, total file size for level-2 will be 2GB, and total file size for level-3 will be 20GB.

ROCKSDB_CF_MAXBYTESFORLEVELBASE: "0"

ROCKSDB_CF_MULTIPLIER

The ratio between the total size of level-(L+1) files and the total size of level-L files for all L.

ROCKSDB_CF_MULTIPLIER: "0"

ROCKSDB_CF_TABLECONFIG_ENABLE

Enables the table config for the column family.

ROCKSDB_CF_TABLECONFIG_ENABLE: "false"

ROCKSDB_CF_TABLECONFIG_BLOCKSIZE

Approximate size of user data packed per block. Note that the block size specified here corresponds to uncompressed data. The actual size of the unit read from disk may be smaller if compression is enabled. This parameter can be changed dynamically.

ROCKSDB_CF_TABLECONFIG_BLOCKSIZE: "4000"

ROCKSDB_CF_TABLECONFIG_CACHEINDEXANDFILTERBLOCKS

Indicates whether to put index/filter blocks in the block cache. If not specified, each "table reader" object will pre-load the index/filter block during table initialization.

ROCKSDB_CF_TABLECONFIG_CACHEINDEXANDFILTERBLOCKS: "false"

ROCKSDB_CF_TABLECONFIG_FORMATVERSION

We currently have five versions:

#0 - This version is currently written out by all RocksDB's versions by default. Can be read by really old RocksDB's. Doesn't support changing checksum (default is CRC32).

#1 - Can be read by RocksDB's versions since 3.0. Supports non-default checksum, like xxHash. It is written by RocksDB when BlockBasedTableOptions::checksum is something other than kCRC32c. (version 0 is silently upconverted)

#2 - Can be read by RocksDB's versions since 3.10. Changes the way we encode compressed blocks with LZ4, BZip2 and Zlib compression. If you don't plan to run RocksDB before version 3.10, you should probably use this.

#3 - Can be read by RocksDB's versions since 5.15. Changes the way we encode the keys in index blocks. If you don't plan to run RocksDB before version 5.15, you should probably use this. This option only affects newly written tables. When reading existing tables, the information about version is read from the footer.

#4 - Can be read by RocksDB's versions since 5.16. Changes the way we encode the values in index blocks. If you don't plan to run RocksDB before version 5.16 and you are using index_block_restart_interval > 1, you should probably use this as it would reduce the index size.

This option only affects newly written tables. When reading existing tables, the version information is read from the footer.

ROCKSDB_CF_TABLECONFIG_FORMATVERSION: "0"

ROCKSDB_CF_TABLECONFIG_PINL0FILTERANDINDEXBLOCKSINCACHE

Indicates whether to pin L0 index/filter blocks in the block cache. If not specified, defaults to false.

ROCKSDB_CF_TABLECONFIG_PINL0FILTERANDINDEXBLOCKSINCACHE: "false"

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KHASHSEARCH

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KBINARYSEARCH

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KTWOLEVELINDEXSEARCH

Sets the index type to be used with this table.

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KHASHSEARCH: "false"

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KBINARYSEARCH: "false"

ROCKSDB_CF_TABLECONFIG_INDEXTYPE_KTWOLEVELINDEXSEARCH: "false"
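
As a sketch, a vars.policysync.rocksdb.tuning.yml for a write-heavy workload might combine a few of the properties above (illustrative values only; benchmark against your own workload before adopting them):

```yaml
ROCKSDB_MAX_BACKGROUND_JOBS: "4"
ROCKSDB_ALLOW_CONCURRENT_MEMTABLE_WRITE: "true"
# Sync SST files to disk in 1 MB increments to smooth out I/O spikes
ROCKSDB_BYTES_PER_SYNC: "1048576"
# Switch block compression from the default Snappy to LZ4
ROCKSDB_CF_COMPRESSIONTYPE_LZ4COMPRESSION: "true"
```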

Google BigQuery

Google BigQuery provides fine-grained access control on BigQuery datasets. This includes:

  • Table-level Access Control

  • Column-level Access Control

  • Native/dynamic secure view-based row filters

  • Masking with dynamic secure views created using PolicySync

Privacera access control for BigQuery relies on the privacera_bigquery connector service.

Prerequisites
Create PrivaceraPolicySyncRole IAM Role

You need to give Privacera PolicySync basic access to GCP. To grant that access, create the PrivaceraPolicySyncRole IAM role in your GCP project or GCP organization by running the following commands in Google Cloud Shell (gcloud). The shell can be installed and accessed locally or through the Google Console.

Run the following command to create the file containing the permissions required for the PrivaceraPolicySyncRole role:

ROLE_NAME="PrivaceraPolicySyncRole"

cat << EOF > ${ROLE_NAME}.yaml
title: "${ROLE_NAME}"
description: "${ROLE_NAME}"
stage: "ALPHA"
includedPermissions:
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
- resourcemanager.projects.setIamPolicy
- iam.roles.list
- iam.roles.get
- iam.roles.create
- iam.roles.update
- bigquery.jobs.create
- bigquery.datasets.get
- bigquery.datasets.create
- bigquery.datasets.update
- bigquery.datasets.delete
- bigquery.datasets.getIamPolicy
- bigquery.datasets.setIamPolicy
- bigquery.tables.list
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.create
- bigquery.tables.update
- bigquery.tables.delete
- bigquery.tables.getIamPolicy
- bigquery.tables.setIamPolicy
- bigquery.rowAccessPolicies.list
- bigquery.rowAccessPolicies.create
- bigquery.rowAccessPolicies.update
- bigquery.rowAccessPolicies.delete
- bigquery.rowAccessPolicies.getIamPolicy
- bigquery.rowAccessPolicies.setIamPolicy

EOF
GCP Project-level access

Note

If you have multiple projects in your GCP organization and want them to be managed by a single BigQuery connector, then repeat the steps below for each project. Assign the role to the same service account which will be used across multiple projects.

  1. Run the following command. Replace <GCP_PROJECT_ID> with your GCP project ID.

    PROJECT_ID="<GCP_PROJECT_ID>"
              
  2. To create the PrivaceraPolicySyncRole role in your GCP project, run the following command.

    gcloud iam roles create ${ROLE_NAME} --project=${PROJECT_ID} --file=${ROLE_NAME}.yaml         
GCP Organization-level access
  1. Run the following command. Replace <GCP_ORGANIZATION_ID> with your GCP organization ID.

    ORGANIZATION_ID="<GCP_ORGANIZATION_ID>"                                                                     
  2. To create the PrivaceraPolicySyncRole role in your GCP organization, run the following command.

    gcloud iam roles create ${ROLE_NAME} --organization=${ORGANIZATION_ID} --file=${ROLE_NAME}.yaml
Attach IAM Role to Service Account

To attach the PrivaceraPolicySyncRole IAM role created above, do the following steps:

  1. Log in to your GCP console.

  2. Select IAM & admin > Service accounts and click + CREATE SERVICE ACCOUNT.

  3. Enter values in the fields and click CREATE.

  4. In Grant this service account access to project, select the role as PrivaceraPolicySyncRole.

  5. On the Service Accounts page, find the newly created service account and copy its email address for use in a later step.

    Note

    This email will be the Service Account Email for configuring PolicySync in Privacera Manager.

  6. If you are using a Google Cloud VM to configure GBQ for PolicySync, you can attach the service account created above to the VM and skip the steps below.

  7. On the Service Accounts page, go to the Keys tab, click Add Key, and select Create New Key.

  8. Select the JSON key type, and click CREATE. A JSON key file downloads to your system. Store the file at an accessible location. It will be used for configuring PolicySync in Privacera Manager.

Refer to the Google documentation for detailed information on creating a service account.

Configure Logs for Auditing

A sink is required to collect all the logs from GBQ. To create a sink, do the following steps:

  1. In the search bar, search for Logging, and then click Logs Router, and click Create Sink.

  2. Enter the sink name as PolicySyncBigQueryAuditSink, and then click Next.

  3. Enter the sink destination.

    1. In the Select sink service, select BigQuery.

    2. In Select BigQuery dataset, click Create new BigQuery dataset.

    3. Enter the Dataset ID as bigquery_audits and click Create Dataset.

    4. Click Next.

  4. Add the BigQuery logs in the sink:

    In the Build an inclusion filter, add the following line:

    resource.type="bigquery_resource"
  5. Click Create Sink.

Refer to the Google documentation for detailed information on creating a sink.

CLI Configuration
  1. SSH to the instance where Privacera is installed.

  2. Do the following if you are not using VM-attached credentials:

    1. Access the JSON file of the service account you downloaded using the steps above.

    2. Copy the JSON to the config/custom-vars folder.

  3. Run the following commands.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.policysync.bigquery.yml config/custom-vars/
    vi config/custom-vars/vars.policysync.bigquery.yml
  4. Set the properties for your specific installation. For property details and descriptions, see the Configuration Properties section that follows.

    Notice

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, see PolicySync properties.

  5. Run the update.

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
Configuration Properties

JDBC configuration properties

Table 42. JDBC configuration properties

Name

Type

Default

Required

Description

BIGQUERY_PROJECT_LOCATION

string

us

Yes

Specifies the geographical region where the taxonomy for PolicySync should be created.

BIGQUERY_PROJECT_ID

string

Yes

Specifies the Google project ID where your Google BigQuery data source resides. For example: privacera-demo-project.

BIGQUERY_JDBC_URL

string

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443

No

Specifies the JDBC URL for the Google BigQuery connector.

BIGQUERY_USE_VM_CREDENTIALS

boolean

false

No

Specifies whether PolicySync uses the service account attached to your virtual machine for the credentials to connect to the data source.

When set to true, you do not need to specify a value for BIGQUERY_OAUTH_PRIVATE_KEY_PATH.

BIGQUERY_OAUTH_SERVICE_ACCOUNT_EMAIL

string

Yes

Specifies the service account email address that PolicySync uses. You must specify this value if you are not using a Google Cloud Platform (GCP) virtual machine attached service account.

BIGQUERY_OAUTH_PRIVATE_KEY_PATH

string

/workdir/policysync/cust_conf/policysync-gbq-service-account.json

Yes

Specifies the path of the service account credentials JSON file that you downloaded from your Google Cloud Platform (GCP) account. You must specify this property if BIGQUERY_USE_VM_CREDENTIALS is set to false.

BIGQUERY_OAUTH_PRIVATE_KEY_FILE_NAME

string

Yes

Specifies the name of the JSON file that contains your Google Cloud Platform service account credentials. If specified, this value is combined with BIGQUERY_OAUTH_PRIVATE_KEY_PATH to provide a full path to the credentials JSON file. If you specify BIGQUERY_OAUTH_PRIVATE_KEY_PATH then you must specify this value as well.
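
Putting the JDBC properties together, a vars.policysync.bigquery.yml sketch for a service-account key file might look like the following (the project ID and service account email are placeholders):

```yaml
BIGQUERY_PROJECT_LOCATION: "us"
BIGQUERY_PROJECT_ID: "privacera-demo-project"
BIGQUERY_USE_VM_CREDENTIALS: "false"
BIGQUERY_OAUTH_SERVICE_ACCOUNT_EMAIL: "policysync@privacera-demo-project.iam.gserviceaccount.com"
BIGQUERY_OAUTH_PRIVATE_KEY_FILE_NAME: "policysync-gbq-service-account.json"
```

If the connector runs on a GCP virtual machine with the service account attached, setting BIGQUERY_USE_VM_CREDENTIALS to true lets you omit the key file properties.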



Custom IAM roles

Table 43. Custom IAM roles

Name

Type

Default

Required

Description

BIGQUERY_CREATE_CUSTOM_IAM_ROLES

boolean

true

No

Specifies whether PolicySync automatically creates custom IAM roles in your Google Cloud Platform project or organization for fine-grained access control (FGAC). If set to false, you must create all required custom IAM roles manually in your GCP project or organization. The default value is true.

BIGQUERY_CUSTOM_IAM_ROLES_SCOPE

string

project

No

Specifies whether PolicySync creates and uses custom IAM roles at the project or organizational level in Google Cloud Platform (GCP). The following values are allowed:

  • project: Create and use custom IAM roles at the individual project level.

  • org: Create and use custom IAM roles at the organizational level.

BIGQUERY_ORGANIZATION_ID

string

No

Specifies the Google Cloud Platform (GCP) organization ID. Specify this only if you configured PolicySync to use custom IAM roles at the organization level.

BIGQUERY_CUSTOM_IAM_ROLES_NAME_MAPPING

string

No

Specifies a list of mappings between PolicySync custom IAM role names and your custom role names. Use the following format when specifying your custom role names:

<PRIVACERA_DEFAULT_ROLE_NAME_1>:<CUSTOM_ROLE_NAME_1>
<PRIVACERA_DEFAULT_ROLE_NAME_2>:<CUSTOM_ROLE_NAME_2>

The following is a list of the default custom role names:

  • PrivaceraGBQProjectListRole

  • PrivaceraGBQJobListRole

  • PrivaceraGBQJobListAllRole

  • PrivaceraGBQJobCreateRole

  • PrivaceraGBQJobGetRole

  • PrivaceraGBQJobUpdateRole

  • PrivaceraGBQJobDeleteRole

  • PrivaceraGBQDatasetCreateRole

  • PrivaceraGBQDatasetGetMetadataRole

  • PrivaceraGBQDatasetUpdateRole

  • PrivaceraGBQDatasetDeleteRole

  • PrivaceraGBQTableListRole

  • PrivaceraGBQTableCreateRole

  • PrivaceraGBQTableGetMetadataRole

  • PrivaceraGBQTableQueryRole

  • PrivaceraGBQTableExportRole

  • PrivaceraGBQTableUpdateMetadataRole

  • PrivaceraGBQTableUpdateRole

  • PrivaceraGBQTableSetCategoryRole

  • PrivaceraGBQTableDeleteRole

  • PrivaceraGBQTransferUpdateRole

  • PrivaceraGBQTransferGetRole
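
As an illustration of the mapping format, the following hypothetical snippet maps two of the default role names listed above to organization-specific names (the YAML block-scalar layout is an assumption; the replacement role names are invented):

```yaml
# Hypothetical mapping of default PolicySync role names to custom names
BIGQUERY_CUSTOM_IAM_ROLES_NAME_MAPPING: |
  PrivaceraGBQProjectListRole:AcmeGBQProjectListRole
  PrivaceraGBQJobCreateRole:AcmeGBQJobCreateRole
```
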



Load keys and intervals

Table 44. Load keys and intervals

Name

Type

Default

Required

Description

BIGQUERY_LOAD_RESOURCES_KEY

string

load_from_dataset_columns

No

Specifies how PolicySync loads resources from Google BigQuery. The following values are allowed:

  • load_md: Load resources from Google BigQuery using a top-down approach: it first loads the project, then the datasets, followed by tables and their columns.

  • load_from_dataset_columns: Load resources one resource type at a time: it loads all projects first, then all datasets in those projects, followed by all tables and their columns. This mode is recommended because it is faster than load_md.

BIGQUERY_RESOURCE_SYNC_INTERVAL

integer

60

No

Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.

BIGQUERY_PRINCIPAL_SYNC_INTERVAL

integer

420

No

Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.

BIGQUERY_PERMISSION_SYNC_INTERVAL

integer

540

No

Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions on data source accordingly.

BIGQUERY_AUDIT_SYNC_INTERVAL

integer

30

No

Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.



Resources management

Table 45. Resources management

Name

Type

Default

Required

Description

BIGQUERY_MANAGE_PROJECT_LIST

string

Yes

Specifies a comma-separated list of project names for which PolicySync manages access control. If unset, PolicySync manages all projects. You can use wildcards. Names are case-sensitive.

The list of projects to ignore takes precedence over any projects specified by this setting.

An example list of projects might resemble the following: testproject1,testproject2,sales_project*.

BIGQUERY_MANAGE_DATASET_LIST

string

Yes

Specifies a comma-separated list of dataset names for which PolicySync manages access control. You can use wildcards in the value. Names are case-sensitive. If you want to manage all datasets, do not set a value. For example:

testproject1.dataset1,testproject2.dataset2,sales_project*.sales*

You can configure the postfix by specifying BIGQUERY_SECURE_VIEW_DATASET_NAME_POSTFIX.

If specified, the BIGQUERY_IGNORE_DATASET_LIST setting takes precedence over this setting.

BIGQUERY_MANAGE_TABLE_LIST

string

No

Specifies a comma-separated list of table names for which PolicySync manages access control. You can use wildcards.

Use the following format when specifying a table:

<PROJECT_NAME>.<DATASET_NAME>.<TABLE_NAME>

If specified, BIGQUERY_IGNORE_TABLE_LIST takes precedence over this setting.

If you specify a wildcard, such as in the following example, all matched tables are managed:

<PROJECT_NAME>.<DATASET_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all tables.

  • If set to none, no tables are managed.

BIGQUERY_IGNORE_PROJECT_LIST

string

No

Specifies a comma-separated list of project names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all projects are subject to access control.

For example: testproject1,testproject2,sales_project*.

This setting supersedes any values specified by BIGQUERY_MANAGE_PROJECT_LIST.

BIGQUERY_IGNORE_DATASET_LIST

string

No

Specifies a comma-separated list of dataset names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all datasets are subject to access control.

For example: testproject1.dataset1,testproject2.dataset2,sales_project*.sales*.

This setting supersedes any values specified by BIGQUERY_MANAGE_DATASET_LIST.

BIGQUERY_IGNORE_TABLE_LIST

string

No

Specifies a comma-separated list of table names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all tables are subject to access control. Specify tables using the following format:

<PROJECT_NAME>.<DATASET_NAME>.<TABLE_NAME>

This setting supersedes any values specified by BIGQUERY_MANAGE_TABLE_LIST.
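
The interplay between the manage lists and ignore lists can be sketched as follows — a minimal Python model of the documented precedence (ignore wins, empty manage list means "manage everything", matching is case-sensitive with wildcards); the function name and use of `fnmatch` are illustrative assumptions, not the connector's actual implementation:

```python
from fnmatch import fnmatchcase

def is_managed(name, manage_list, ignore_list):
    """Return True if PolicySync should manage access control for `name`.

    Mirrors the documented precedence: the ignore list always wins,
    an unset manage list means all resources are managed, and
    matching is case-sensitive with wildcard support.
    """
    if any(fnmatchcase(name, pat) for pat in ignore_list):
        return False
    if not manage_list:  # unset => manage all resources
        return True
    return any(fnmatchcase(name, pat) for pat in manage_list)

manage = ["testproject1", "testproject2", "sales_project*"]
ignore = ["sales_project_archive"]

print(is_managed("sales_project_eu", manage, ignore))       # True
print(is_managed("sales_project_archive", manage, ignore))  # False: ignore wins
print(is_managed("otherproject", manage, ignore))           # False: not in manage list
```
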



Users, groups, and roles management

Table 46. Users, groups, and roles management

Name

Type

Default

Required

Description

BIGQUERY_USER_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a username and replaces each matching character with the value specified by the BIGQUERY_USER_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

BIGQUERY_USER_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the BIGQUERY_USER_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

BIGQUERY_GROUP_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a group and replaces each matching character with the value specified by the BIGQUERY_GROUP_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

BIGQUERY_GROUP_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the BIGQUERY_GROUP_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.
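
The find-and-replace behavior of these regex settings can be illustrated with a short Python sketch. The pattern below is the documented default written as a plain regex (the config value shows it with extra escaping layers); the `sanitize` helper is a hypothetical stand-in for what PolicySync does before creating local principals:

```python
import re

# Documented default pattern, written as a plain Python regex
FROM_REGEX = r"[~`$&+:;=?@#|'<>.^*()_%\[\]!\-/\\{}]"
TO_STRING = "_"

def sanitize(name):
    """Replace every character matched by FROM_REGEX with TO_STRING."""
    return re.sub(FROM_REGEX, TO_STRING, name)

print(sanitize("john.doe@example.com"))  # john_doe_example_com
```
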

BIGQUERY_MANAGE_USER_LIST

string

No

Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

If not specified, PolicySync manages access control for all users.

If specified, BIGQUERY_IGNORE_USER_LIST takes precedence over this setting.

An example user list might resemble the following: user1,user2,dev_user*.

BIGQUERY_MANAGE_GROUP_LIST

string

No

Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive.

An example list of groups might resemble the following: group1,group2,dev_group*.

If specified, BIGQUERY_IGNORE_GROUP_LIST takes precedence over this setting.

BIGQUERY_IGNORE_USER_LIST

string

No

Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control.

This setting supersedes any values specified by BIGQUERY_MANAGE_USER_LIST.

BIGQUERY_IGNORE_GROUP_LIST

string

No

Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control.

This setting supersedes any values specified by BIGQUERY_MANAGE_GROUP_LIST.

BIGQUERY_NATIVE_PUBLIC_GROUP_IDENTITY_NAME

string

Yes

Specifies the native public group that PolicySync uses for access grants whenever a policy refers to the public group. The following values are allowed:

  • ALL_AUTHENTICATED_USERS: All authenticated users of the GCP project.

  • ALL_USERS: All Google authenticated users.

BIGQUERY_MANAGE_USER_FILTERBY_GROUP

boolean

false

No

Specifies whether to manage only the users that are members of groups specified by BIGQUERY_MANAGE_GROUP_LIST. The default value is false.



Access control management

Table 47. Access control management

Name

Type

Default

Required

Description

BIGQUERY_COLUMN_ACCESS_CONTROL_TYPE

string

view

No

Specifies how PolicySync manages column-level access control. The following values are allowed:

  • view: Use view-based column-level access control. Any columns that a user cannot access appear as null in the secure view of the table or the secure view of the native view.

BIGQUERY_POLICY_NAME_SEPARATOR

string

_

No

Specifies a string to use as part of the name of native row filter and masking policies.

BIGQUERY_ROW_FILTER_POLICY_NAME_TEMPLATE

string

row_filter_item_

No

Specifies a template for the name that PolicySync uses when creating a row filter policy. For example, given a table data from the ds dataset that resides in the proj project, the row filter policy name might resemble the following:

proj_priv_ds_priv_data_<ROW_FILTER_ITEM_NUMBER>

BIGQUERY_ENABLE_ROW_FILTER

boolean

false

No

Specifies whether to use the data source's native row filter functionality. This setting is disabled by default. When enabled, you can create row filters only on tables, not on views.

BIGQUERY_ENABLE_VIEW_BASED_MASKING

boolean

true

No

Specifies whether to use secure view based masking. The default value is true.

BIGQUERY_ENABLE_VIEW_BASED_ROW_FILTER

boolean

true

No

Specifies whether to use secure view based row filtering. The default value is true.

While Google BigQuery supports native filtering, PolicySync provides additional functionality that is not available natively. Enabling this setting is recommended.

BIGQUERY_SECURE_VIEW_CREATE_FOR_ALL

boolean

true

No

Specifies whether to create secure views for all tables and views that are created by users. If enabled, PolicySync creates secure views for resources regardless of whether masking or filtering policies are enabled.



Access audits management

Table 48. Access audits management

Name

Type

Default

Required

Description

BIGQUERY_MASKING_FUNCTIONS_DATASET

string

privacera_dataset

No

Specifies the name of the dataset where PolicySync creates custom masking functions.

BIGQUERY_MASKED_NUMBER_VALUE

integer

0

No

Specifies the masking value used for numeric data types.

BIGQUERY_MASKED_TEXT_VALUE

string

<MASKED>

No

Specifies the masking value used for text or string data types.

BIGQUERY_SECURE_VIEW_NAME_PREFIX

string

No

Specifies a prefix string for secure views. By default, secure views created for view-based row filters and masking have the same name as the underlying table.

If you want to change the secure view name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view name for a table named example1 is dev_example1.

BIGQUERY_SECURE_VIEW_NAME_POSTFIX

string

No

Specifies a postfix string for secure views. By default, secure views created for view-based row filters and masking have the same name as the underlying table.

If you want to change the secure view name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view name for a table named example1 is example1_dev.

BIGQUERY_SECURE_VIEW_DATASET_NAME_PREFIX

string

No

Specifies a prefix string for secure views. By default view-based row filter and masking-related secure views have the same dataset name as the table dataset name.

If you want to change the secure view dataset name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view dataset name for a dataset named example1 is dev_example1.

BIGQUERY_SECURE_VIEW_DATASET_NAME_POSTFIX

string

_secure

No

Specifies a postfix string for secure views. By default view-based row filter and masking-related secure views have the same dataset name as the table dataset name.

If you want to change the secure view dataset name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view dataset name for a dataset named example1 is example1_dev.

BIGQUERY_SECURE_VIEW_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a table or view name. For example, if the table is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes.

BIGQUERY_SECURE_VIEW_DATASET_NAME_REMOVE_SUFFIX_LIST

string

No

Specifies a suffix to remove from a secure view dataset name. For example, if the dataset is named some_name_ds you can remove the _ds string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes, such as _raw,_qa,_prod.
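
Taken together, the prefix, postfix, and suffix-removal settings appear to compose as follows. This Python sketch is an illustrative model under the assumption that a matching suffix is stripped before the prefix and postfix are applied, as the descriptions above suggest; it is not the connector's actual code:

```python
def secure_view_name(table, prefix="", postfix="", remove_suffixes=()):
    """Sketch: derive a secure view name from a table name by
    stripping the first matching suffix, then applying the
    configured prefix and postfix (assumed order)."""
    for suffix in remove_suffixes:
        if table.endswith(suffix):
            table = table[: -len(suffix)]
            break
    return f"{prefix}{table}{postfix}"

print(secure_view_name("example_raw", postfix="_dev", remove_suffixes=["_raw"]))  # example_dev
print(secure_view_name("example1", prefix="dev_"))  # dev_example1
```
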

BIGQUERY_AUTHORIZED_VIEW_ACL_UPDATER_INTERVAL

integer

10

No

Specifies the interval at which the authorized view ACL updater thread updates permissions in the dataset when any permission updates are pending.

BIGQUERY_GRANT_UPDATES

boolean

true

Yes

Specifies whether PolicySync performs grants and revokes for access control and creates, updates, and deletes queries for users, groups, and roles. The default value is true.

BIGQUERY_GRANT_UPDATES_MAX_RETRY_ATTEMPTS

integer

2

No

Specifies the maximum number of attempts that PolicySync makes to execute a grant query if it is unable to do so successfully. The default value is 2.

BIGQUERY_GRANT_UPDATES_BATCH

boolean

true

No

Specifies whether PolicySync applies grants and revokes in batches. If enabled, this behavior improves overall performance of applying permission changes.

BIGQUERY_ENABLE_DATA_ADMIN

boolean

true

No

Specifies whether the data admin feature is enabled. With this feature enabled, you create policies on native tables and views, and the corresponding grants are made on the secure views of those tables and views. These secure views provide row filter and masking capability. If you also need to grant permissions on the native table or view, select the desired permissions plus data admin in the policy; the permissions are then granted on both the native table or view and its secure view.

BIGQUERY_AUDIT_ENABLE

boolean

false

Yes

Specifies whether Privacera fetches access audit data from the data source.

BIGQUERY_AUDIT_EXCLUDED_USERS

string

No

Specifies a comma-separated list of users to exclude when fetching access audits. For example: "user1,user2,user3".

BIGQUERY_AUDIT_PROJECT_ID

string

No

Specifies the project ID where Google BigQuery stores audit log data.

BIGQUERY_AUDIT_DATASET_NAME

string

No

Specifies the name of the dataset where Google BigQuery logs audit data. Privacera uses this data for running audit queries.

BIGQUERY_AUDIT_LOAD_MAX_INTERVAL_MINUTES

integer

30

No

Specifies the maximum interval, in minutes, of the time window that SQL queries use to retrieve access audit information. If there are a large number of audit records, narrowing the window interval improves performance.

For example, if the interval is set to 30, SQL queries similar to the following are executed:

SELECT * FROM audits where time_from=00:01 and time_to=00:30;
SELECT * FROM audits where time_from=00:31 and time_to=01:00;
SELECT * FROM audits where time_from=01:01 and time_to=01:30;
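
The windowing behind those example queries can be modeled with a short Python sketch — splitting an audit time range into chunks of at most the configured number of minutes. The function name is illustrative; it is not part of the product:

```python
from datetime import datetime, timedelta

def audit_windows(start, end, max_interval_minutes=30):
    """Split [start, end] into consecutive windows no longer than
    max_interval_minutes, mirroring the chunked audit queries above."""
    windows = []
    cursor = start
    step = timedelta(minutes=max_interval_minutes)
    while cursor < end:
        windows.append((cursor, min(cursor + step, end)))
        cursor += step
    return windows

start = datetime(2023, 1, 1, 0, 0)
end = datetime(2023, 1, 1, 1, 30)
for time_from, time_to in audit_windows(start, end):
    print(time_from.strftime("%H:%M"), "->", time_to.strftime("%H:%M"))
```
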



Validation

Let's test the access control using BigQuery by defining some test policies for a test user (emily).

  1. Prepare test data in GCP. For detailed information on running queries in BigQuery, refer to the Google documentation - click here.

    1. Log in to GCP console, navigate to BigQuery and then click Compose New Query.

    2. In the Query editor text area, copy the following query:

      -- Create Dataset
      CREATE SCHEMA customer_dataset;
      
      -- Create Table
      CREATE TABLE IF NOT EXISTS customer_dataset.customer_data (
      id INT64, 
      person_name string,
      domain string,
      ssn string,
      country string,
      us_phone string, 
      address string, 
      account_id string, 
      zipcode string);
      
      -- Insert Data into table
      insert into customer_dataset.customer_data values (1, 'Nancy','nancy@yahoo.com','201-99-5532','US','856-232-9702','939 Park Avenue','159635478','33317');
      insert into customer_dataset.customer_data values (2,'Gene','gene@google.us','202-99-5532','UK','954-583-0575','303 Johnston Blvd','236854569','95202');
      insert into customer_dataset.customer_data values (3,'Edward','edward@facebook.com','203-99-5532','US','209-626-9041','130 Hollister','365412985','60173');
      insert into customer_dataset.customer_data values (4,'Pearlene','pearlene@gmail.com','204-99-5532','US','708-471-6810','17 Warren Rd','452189732','90017');
      insert into customer_dataset.customer_data values (5,'James','james@cuvox.de','205-99-5532','US','661-338-6787','898 Newport Gray Rd','517836427','94041');
      insert into customer_dataset.customer_data values (6,'Pamela','pamela@cuvox.de','206-99-5532','UK','650-526-5259','861 Strick Rd','685231473','80214');
      insert into customer_dataset.customer_data values (7,'Donna','donna@fleckens.hu','207-99-5532','US','303-239-4282','1784 S Shore Dr','789563258','1730');
      insert into customer_dataset.customer_data values (8,'Amy','amy@gustr.com','208-99-5532','US','774-553-4736','9522 Apple Valley Dr','854126945','55102');
      insert into customer_dataset.customer_data values (9,'Adam','adam@teleworm.us','209-99-5532','UK','651-297-1448','745 Old Springville Rd','965412381','43201');
      insert into customer_dataset.customer_data values (10,'Lucille','lucille@armyspy.com','210-99-5532','US','740-320-1270','4223  Midway Road','785651236','89102');
      insert into customer_dataset.customer_data values (11,'Edard','edu@gustr.com','211-99-5532','UK','702-257-8796','3659  Dye Street','965121354','53207');
      insert into customer_dataset.customer_data values (12,'Nick','nick@jourrapide.com','212-99-5532','US','414-483-8638','2966  Nutters Barn Lane','563515264','72764');
      insert into customer_dataset.customer_data values (13,'Brian','brian@einrot.com','213-99-5532','US','479-872-9783','3300  Worthington Drive','654621233','91303');
      insert into customer_dataset.customer_data values (14,'Stella','stella@jourrapide.com','214-99-5532','US','818-596-6681','1893  Ingram Road','261613654','35816');
      insert into customer_dataset.customer_data values (15,'Leona','leona@dayrep.com','215-99-5532','UK','256-250-5413','4244  Burnside Court','986513211','75069');
      
      -- Verify table by running select query
      SELECT * FROM customer_dataset.customer_data;
    3. Click Run.

  2. Create test user in Privacera Portal as emily. For more information see User Management.

    bigquery_portal_user.jpg
  3. In the GCP console, verify that the user emily was added after you created the user in step 2.

    1. Check that the user emily can list the projects inside your organization.

    2. Check that user emily does not have Full Admin or Full Reader access on BigQuery.

  4. Create a policy for emily to run queries and list dataset and tables.

    bigquery_portal_policy_list_datasets_tables.jpg
  5. Check the access control on the test data in GCP.

    A) Table-level Access Control

    1. In Privacera Portal, create a policy Customer Data Full Access for accessing table.

      bigquery_portal_policy_table_access.jpg
    2. Log in to GCP console with credentials of the test user emily.

    3. Navigate to BigQuery.

    4. Run the following query.

      SELECT * FROM customer_dataset.customer_data;
      SELECT * FROM customer_dataset_secure.customer_data;

      User emily can access and view the data.

    5. In Privacera Portal, disable the policy.

    6. In GCP, run the following query.

      SELECT * FROM customer_dataset.customer_data;
      SELECT * FROM customer_dataset_secure.customer_data;

      User emily cannot access or view the data.

    B) View-based Column-level Access Control

    With view-based column-level access control, you create a column-level policy on the table. Columns that are not permitted by that policy appear as NULL in the secure view of the table.

    1. In Privacera Portal, do the following:

      Create a policy Customer Data Column Level Access granting access to a few columns.

      bigquery_portal_policy_column_secure_view_access.jpg
    2. Log in to GCP console with credentials of the test user emily.

    3. Navigate to BigQuery.

    4. Run the following queries.

      User emily will not see the person_name column in the secure view of the customer_data table.

      SELECT * FROM customer_dataset_secure.customer_data;
    5. In Privacera Portal, disable the Customer Data Column Level Access policy.

    C) View-based Row-level Filter

    1. In Privacera Portal, enable Customer Data Full Access policy created above.

      emily can view customer_data for US and UK from table and in secure view.

    2. Log in to GCP console with credentials of the test user emily.

    3. Navigate to BigQuery.

    4. Use the following query to view data from the customer data secure view. It shows data from the countries US and UK.

      SELECT * FROM customer_dataset_secure.customer_data;

      User emily can access and view the data.

    5. In Privacera Portal, create a Customer Access by Country policy to access data only from UK.

      bigquery_portal_policy_row_level_access.jpg
    6. Use the following query to view data from the customer data secure view. The row filter policy is applied, showing only data from the country UK.

      SELECT * FROM customer_dataset_secure.customer_data;

    D) View-based Masking

    1. Log in to GCP console with credentials of the test user emily.

    2. Navigate to BigQuery.

    3. Run the following query.

      SELECT * FROM customer_dataset_secure.customer_data;

      User emily can view the SSN values.

    4. In Privacera Portal, create a Mask SSN policy to mask SSN values for emily.

      bigquery_portal_policy_mask_ssn.jpg
    5. Run the following query.

      SELECT * FROM customer_dataset_secure.customer_data;

      User emily cannot view the SSN values, since they are masked using an MD5 hash.

Create custom IAM roles

By default, PolicySync automatically creates all the IAM roles listed in the table below to perform access control in GBQ. If you want to create the custom IAM roles manually, disable BIGQUERY_CREATE_CUSTOM_IAM_ROLES by setting its value to false.

In the GCP console, you need to map the roles to their appropriate permissions. The role-to-permission mappings are given in the table below. For creating a custom role in GCP and adding its corresponding permissions, see the Google documentation.

Role Name

GCP Permissions

PrivaceraGBQProjectListRole

resourcemanager.projects.get

PrivaceraGBQJobListRole

bigquery.jobs.list

PrivaceraGBQJobListAllRole

bigquery.jobs.listAll

PrivaceraGBQJobCreateRole

bigquery.jobs.create

PrivaceraGBQJobGetRole

bigquery.jobs.get

PrivaceraGBQJobUpdateRole

bigquery.jobs.update

PrivaceraGBQJobDeleteRole

bigquery.jobs.delete

PrivaceraGBQDatasetCreateRole

bigquery.datasets.create

PrivaceraGBQDatasetGetMetadataRole

bigquery.datasets.get

PrivaceraGBQDatasetUpdateRole

bigquery.datasets.update

PrivaceraGBQDatasetDeleteRole

bigquery.datasets.delete

PrivaceraGBQTableListRole

bigquery.tables.list

PrivaceraGBQTableCreateRole

bigquery.tables.create

PrivaceraGBQTableGetMetadataRole

bigquery.tables.get

PrivaceraGBQTableQueryRole

bigquery.tables.getData

PrivaceraGBQTableExportRole

bigquery.tables.export

PrivaceraGBQTableUpdateMetadataRole

bigquery.tables.update

PrivaceraGBQTableUpdateRole

bigquery.tables.updateData

PrivaceraGBQTableSetCategoryRole

bigquery.tables.setCategory

PrivaceraGBQTableDeleteRole

bigquery.tables.delete

PrivaceraGBQTransferUpdateRole

bigquery.transfers.update

PrivaceraGBQTransferGetRole

bigquery.transfers.get

Based on the GCP resource hierarchy, you can create the roles above at the organization or project level. If you have multiple projects in your GCP organization, it is recommended to create all the roles at the organization level instead of at the project level. Once the roles are defined at the organization level, all projects under that organization inherit them.

Power BI

This section covers how to enable and configure the Privacera Power BI connector for fine-grained workspace access control on Power BI running in Azure. You can set permissions in a Privacera policy based on the workspace roles: Admin, Member, Contributor, and Viewer. Only users and groups from Azure Active Directory are allowed in Azure Power BI.

Prerequisites

Ensure that the following prerequisites are met:

  1. Create a service principal and application secret for Power BI, and get the following information from the Azure Portal. For more information, refer to the Microsoft Azure documentation - click here.

    • Application (client) ID

    • Directory (tenant) ID

    • Client Secret

  2. Create a group and assign your Power BI application to it. This is required because the Power BI Admin API allows a service principal only as a member of an Azure AD group. For more information, refer to the Microsoft Azure documentation - click here.

    Follow the steps in the link given above, and configure the following to create a group and add Power BI as a member:

    1. On the New Group dialog, select Security as the Group type, and then add the required group details.

    2. Click Create.

    3. On the +Add members dialog, select your Power BI application.

  3. Configure the Power BI tenant to allow Power BI service principals to read the REST API. For more information, refer to the Microsoft Azure documentation - click here.

    Follow the steps in the link given above and configure the following:

    1. In the Developer settings, enable Allow service principals to use Power BI APIs.

    2. Select Specific security groups (Recommended), and then add the Power BI group you created above.

    3. In the Admin API Settings, enable Allow service principals to use read-only Power BI admin APIs (Preview). For more information, refer to the Microsoft Azure documentation - click here.

    4. Select Specific security groups, and then add the Power BI group you created above.

  4. Enable Privacera UserSync for AAD to pull groups attribute ID. For more details, refer to the topic Azure Active Directory - Data Access User Synchronization.

CLI Configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following command.

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.policysync.powerbi.yml custom-vars/
    vi custom-vars/vars.policysync.powerbi.yml
  3. Set the properties for your specific installation. For property details and description, see the Configuration Properties section that follows.

    Note

    Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, see Power BI Connector.

  4. Run the following command:

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
Configuration Properties

Connection configuration related properties

Table 49. Connection configuration related properties

Name

Type

Default

Required

Description

POWER_BI_USERNAME

string

Yes

Specifies the authentication username. If you do not specify this value, you must specify a secret for POWER_BI_CLIENT_SECRET.

POWER_BI_PASSWORD

string

Yes

Specifies the authentication password. If you do not specify this value, you must specify a secret for POWER_BI_CLIENT_SECRET.

POWER_BI_TENANT_ID

string

Yes

Specifies the tenant ID associated with your Microsoft Azure account.

POWER_BI_CLIENT_ID

string

Yes

Specifies the principal ID for authentication.

POWER_BI_CLIENT_SECRET

string

Yes

Specifies a client secret for authentication.

If you do not specify this value, you must specify both POWER_BI_USERNAME and POWER_BI_PASSWORD.
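
As with the other connectors, these properties go in the custom-vars YAML file edited during the CLI configuration steps above. A minimal sketch using client-secret authentication might look like the following (all values are placeholders for your Azure environment):

```yaml
# Sketch: config/custom-vars/vars.policysync.powerbi.yml
# Client-secret (service principal) authentication
POWER_BI_TENANT_ID: "<directory-tenant-id>"
POWER_BI_CLIENT_ID: "<application-client-id>"
POWER_BI_CLIENT_SECRET: "<client-secret>"
# Alternatively, username/password authentication:
# POWER_BI_USERNAME: "<user@example.com>"
# POWER_BI_PASSWORD: "<password>"
```
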



Load keys and intervals

Table 50. Load keys and intervals

Name

Type

Default

Required

Description

POWER_BI_RESOURCE_SYNC_INTERVAL

integer

60

No

Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.

POWER_BI_PRINCIPAL_SYNC_INTERVAL

integer

420

No

Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.

POWER_BI_PERMISSION_SYNC_INTERVAL

integer

540

No

Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions on data source accordingly.

POWER_BI_AUDIT_SYNC_INTERVAL

integer

30

No

Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.



Resources management

Table 51. Resources management

Name

Type

Default

Required

Description

POWER_BI_MANAGE_WORKSPACE_LIST

string

No

Specifies a comma-separated list of workspace names for which PolicySync manages access control. If unset, access control is managed for all workspaces. You can use wildcards. Names are case-sensitive.

An example list of workspaces might resemble the following: demo1,demo2,sales*.

If specified, POWER_BI_IGNORE_WORKSPACE_LIST takes precedence over this setting.

POWER_BI_IGNORE_WORKSPACE_LIST

string

No

Specifies a comma-separated list of workspace names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all workspaces are subject to access control.

This setting supersedes any values specified by POWER_BI_MANAGE_WORKSPACE_LIST.



Users/Groups/Roles management

Table 52. Users/Groups/Roles management

Name

Type

Default

Required

Description

POWER_BI_USER_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a username and replaces each matching character with the value specified by the POWER_BI_USER_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

POWER_BI_USER_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the POWER_BI_USER_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

POWER_BI_GROUP_NAME_REPLACE_FROM_REGEX

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

No

Specifies a regular expression to apply to a group and replaces each matching character with the value specified by the POWER_BI_GROUP_NAME_REPLACE_TO_STRING setting.

If not specified, no find and replace operation is performed.

POWER_BI_GROUP_NAME_REPLACE_TO_STRING

string

_

No

Specifies a string to replace the characters matched by the regex specified by the POWER_BI_GROUP_NAME_REPLACE_FROM_REGEX setting.

If not specified, no find and replace operation is performed.

POWER_BI_USER_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts user names to lowercase when creating local users. If set to true, case sensitivity is preserved.

POWER_BI_GROUP_NAME_PERSIST_CASE_SENSITIVITY

boolean

false

No

Specifies whether PolicySync converts group names to lowercase when creating local groups. If set to true, case sensitivity is preserved.

POWER_BI_MANAGE_USER_LIST

string

No

Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

If not specified, PolicySync manages access control for all users.

If specified, POWER_BI_IGNORE_USER_LIST takes precedence over this setting.

An example user list might resemble the following: user1,user2,dev_user*.

POWER_BI_MANAGE_GROUP_LIST

string

No

Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive.

An example list of groups might resemble the following: group1,group2,dev_group*.

If specified, POWER_BI_IGNORE_GROUP_LIST takes precedence over this setting.

POWER_BI_IGNORE_USER_LIST

string

No

Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control.

This setting supersedes any values specified by POWER_BI_MANAGE_USER_LIST.

POWER_BI_IGNORE_GROUP_LIST

string

No

Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control.

This setting supersedes any values specified by POWER_BI_MANAGE_GROUP_LIST.

POWER_BI_USER_FILTER_WITH_EMAIL

boolean

false

No

Set this property to true if you only want to manage users who have an email address associated with them in the portal.

POWER_BI_MANAGE_USER_FILTERBY_GROUP

boolean

false

No

Specifies whether to manage only the users that are members of groups specified by POWER_BI_MANAGE_GROUP_LIST. The default value is false.
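The FROM_REGEX/TO_STRING settings in the table above describe a find-and-replace applied to user and group names. The following sketch uses a simplified character class, not the exact escaped default shown in the table, and is not Privacera's actual implementation:

```python
import re

# Illustrative sketch of the REPLACE_FROM_REGEX / REPLACE_TO_STRING
# name-sanitization behavior (simplified character class; assumption,
# not Privacera's actual code).
REPLACE_FROM_REGEX = r"[~`$&+:;=?@#|'<>.^*()%\[\]!\-/\\{}]"
REPLACE_TO_STRING = "_"

def sanitize_name(name):
    # Each character matched by the regex is replaced with the TO string.
    return re.sub(REPLACE_FROM_REGEX, REPLACE_TO_STRING, name)

print(sanitize_name("dev-user@example"))  # dev_user_example
```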



Access control management

Table 53. Access control management

Name

Type

Default

Required

Description

POWER_BI_GRANT_UPDATES

boolean

true

No

Specifies whether PolicySync performs grants and revokes for access control and creates, updates, and deletes queries for users, groups, and roles. The default value is true.



Access audits management

Table 54. Access audits management

Name

Type

Default

Required

Description

POWER_BI_AUDIT_ENABLE

boolean

false

Yes

Specifies whether Privacera fetches access audit data from the data source.

POWER_BI_AUDIT_INITIAL_PULL_MINUTES

integer

30

No

Specifies the initial delay, in minutes, before PolicySync retrieves access audits from Microsoft Power BI.



Limitations
  • The role in a resource policy of Access Management is not supported.

  • Only AAD users/groups are supported in a resource policy of Access Management. Local users/groups (created manually in Access Management) are not supported.

UserSync
Privacera UserSync
Privacera Data Access User Synchronization

Learn how you can synchronize users and groups from different connectors.

LDAP
  1. Run the following command to enable Privacera UserSync:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.yml config/custom-vars/
  2. Enable the LDAP connector:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.ldap.yml config/custom-vars/ 
    vi config/custom-vars/vars.privacera-usersync.ldap.yml

    Edit the following properties:

    Property

    Description

    Example

    A) LDAP Connector Info

    LDAP_CONNECTOR

    Name of the connector.

    ad

    LDAP_ENABLED

    Enabled status of connector: true or false

    true

    LDAP_SERVICE_TYPE

    Set a service type: ldap or ad

    ad

    LDAP_DATASOURCE_NAME

    Name of the datasource: ldap or ad

    ad

    LDAP_URL

    URL of source LDAP.

    ldap://example.us:389

    LDAP_BIND_DN

    Bind DN used to connect to LDAP and query for users and groups.

    CN=Example User,OU=sales,DC=ad,DC=sales,DC=us

    LDAP_BIND_PASSWORD

    LDAP bind password for the bind DN specified above.

    LDAP_AUTH_TYPE

    Authentication type, the default is simple

    simple

    LDAP_REFERRAL

    Set the LDAP context referral: ignore or follow.

    Default value is follow.

    follow

    LDAP_SYNC_INTERVAL

    Frequency of usersync pulls and audit records in seconds. Default value is 3600, minimum value is 300.

    3600

    B) Enable SSL for LDAP Server

    Note

    Support Chain SSL - Preview Functionality

    Previously, Privacera services used only one SSL certificate from the LDAP server even when a chain of certificates was available. Now, as a preview functionality, all certificates available in the certificate chain are imported into the truststore. This applies to the Privacera usersync, Ranger usersync, and portal SSL certificates.

    PRIVACERA_USERSYNC_SYNC_LDAP_SSL_ENABLED

    Set this property to enable/disable SSL for Privacera Usersync.

    true

    PRIVACERA_USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS

    Set this property if you want Privacera Manager to generate a truststore for your SSL-enabled LDAP server.

    true

    PRIVACERA_USERSYNC_AUTH_SSL_ENABLED

    Set this property if the other Privacera services are not SSL enabled and you are using SSL-enabled LDAP server.

    true

    C) LDAP Search

    LDAP_SEARCH_GROUP_FIRST

    Enable searching for groups first, before searching for users.

    true

    LDAP_SEARCH_BASE

    Search base for users and groups.

    DC=ad,DC=sales,DC=us

    LDAP_SEARCH_USER_BASE

    Search base for users.

    ou=example,dc=ad,dc=sales,dc=us

    LDAP_SEARCH_USER_SCOPE

    Set the value for search scope for the users: base, one or sub.

    Default value is sub.

    sub

    LDAP_SEARCH_USER_FILTER

    Optional additional filter constraining the users selected for syncing.

    LDAP_SEARCH_USER_GROUPONLY

    Boolean to load only users that are members of groups.

    false

    LDAP_ATTRIBUTE_ONLY

    Sync only the attributes of users already synced from other services.

    false

    LDAP_SEARCH_INCREMENTAL_ENABLED

    Enable incremental search, syncing only changes since the last search.

    false

    LDAP_PAGED_RESULTS_ENABLED

    Enable paged results control for LDAP Searches. Default is true.

    true

    LDAP_PAGED_CONTROL_CRITICAL

    Set paged results control criticality to CRITICAL. Default is true.

    true

    LDAP_SEARCH_GROUP_BASE

    Search base for groups.

    ou=example,dc=ad,dc=sales,dc=us

    LDAP_SEARCH_GROUP_SCOPE

    Set the value for search scope for the groups: base, one or sub.

    Default value is sub.

    sub

    LDAP_SEARCH_GROUP_FILTER

    Optional additional filter constraining the groups selected for syncing.

    LDAP_SEARCH_CYCLES_BETWEEN_DELETED_DETECTION

    Number of cycles between deleted-entity searches. Default value is 6.

    6

    LDAP_SEARCH_DETECT_DELETED_USERS_GROUPS

    Enables both user and group deleted searches. Default is false.

    false

    LDAP_SEARCH_DETECT_DELETED_USERS

    Override setting for user deleted search. Default value is LDAP_SEARCH_DETECT_DELETED_USERS_GROUPS.

    LDAP_SEARCH_DETECT_DELETED_USERS_GROUPS

    LDAP_SEARCH_DETECT_DELETED_GROUPS

    Override setting for group deleted search. Default value is LDAP_SEARCH_DETECT_DELETED_USERS_GROUPS.

    LDAP_SEARCH_DETECT_DELETED_USERS_GROUPS

    D) LDAP Manage/Ignore List of Users/Groups

    LDAP_MANAGE_USER_LIST

    List of users to manage from sync results. If this list is defined, all users not on this list will be ignored.

    LDAP_IGNORE_USER_LIST

    List of users to ignore from sync results.

    LDAP_MANAGE_GROUP_LIST

    List of groups to manage from sync results. If this list is defined, all groups not on this list will be ignored.

    LDAP_IGNORE_GROUP_LIST

    List of groups to ignore from sync results.

    E) LDAP Object Users/Groups Class

    LDAP_OBJECT_USER_CLASS

    Objectclass to identify user entries.

    user

    LDAP_OBJECT_GROUP_CLASS

    Objectclass to identify group entries.

    group

    F) LDAP User/Group Attributes

    LDAP_ATTRIBUTE_USERNAME

    Attribute from user entry that would be treated as user name.

    SAMAccountName

    LDAP_ATTRIBUTE_FIRSTNAME

    Attribute of a user’s first name. The default is givenName.

    givenName

    LDAP_ATTRIBUTE_LASTNAME

    Attribute of a user’s last name.

    LDAP_ATTRIBUTE_EMAIL

    Attribute from user entry that would be treated as email address.

    mail

    LDAP_ATTRIBUTE_GROUPNAMES

    List of attributes from group entry that would be treated as group name.

    LDAP_ATTRIBUTE_GROUPNAME

    Attribute from group entry that would be treated as group name.

    name

    LDAP_ATTRIBUTE_GROUP_MEMBER

    Attribute from group entry that is list of members.

    member

    G) Username/Group name Attribute Modification

    LDAP_ATTRIBUTE_USERNAME_VALUE_EXTRACTFROMEMAIL

    Extract username from an email address. (e.g. username@domain.com -> username) Default is false.

    false

    LDAP_ATTRIBUTE_USERNAME_VALUE_PREFIX

    Prefix to prepend to the username. Default is blank.

    LDAP_ATTRIBUTE_USERNAME_VALUE_POSTFIX

    Postfix to append to the username. Default is blank.

    LDAP_ATTRIBUTE_USERNAME_VALUE_TOLOWER

    Convert the username to lowercase. Default is false.

    false

    LDAP_ATTRIBUTE_USERNAME_VALUE_TOUPPER

    Convert the username to uppercase. Default is false.

    false

    LDAP_ATTRIBUTE_USERNAME_VALUE_REGEX

    Regex applied to modify the username. Default is blank.

    LDAP_ATTRIBUTE_GROUPNAME_VALUE_EXTRACTFROMEMAIL

    Extract the group name from an email address. Default is false.

    false

    LDAP_ATTRIBUTE_GROUPNAME_VALUE_PREFIX

    Prefix to prepend to the group's name. Default is blank.

    LDAP_ATTRIBUTE_GROUPNAME_VALUE_POSTFIX

    Postfix to append to the group's name. Default is blank.

    LDAP_ATTRIBUTE_GROUPNAME_VALUE_TOLOWER

    Convert the group's name to lowercase. Default is false.

    false

    LDAP_ATTRIBUTE_GROUPNAME_VALUE_TOUPPER

    Convert the group's name to uppercase. Default is false.

    false

    LDAP_ATTRIBUTE_GROUPNAME_VALUE_REGEX

    Regex applied to modify the group's name. Default is blank.

    H) Group Attribute Configuration

    LDAP_GROUP_ATTRIBUTE_LIST

    The list of attribute keys to get from synced groups.

    LDAP_GROUP_ATTRIBUTE_VALUE_PREFIX

    Prefix to prepend to the values of group attributes, such as the group name.

    LDAP_GROUP_ATTRIBUTE_KEY_PREFIX

    Prefix to prepend to the keys of group attributes, such as the group name.

    LDAP_GROUP_LEVELS

    Configure Privacera usersync with AD/LDAP nested group membership.

  3. Run the following command:

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
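The username/group-name modification settings in section G above compose into a simple pipeline. The order of operations in this sketch is an assumption for illustration, not documented behavior:

```python
# Hedged sketch of the username-modification pipeline suggested by the
# EXTRACTFROMEMAIL / PREFIX / POSTFIX / TOLOWER settings (assumed order
# of operations; not Privacera's actual code).
def transform_username(name, extract_from_email=False,
                       prefix="", postfix="", to_lower=False):
    if extract_from_email and "@" in name:
        name = name.split("@", 1)[0]   # username@domain.com -> username
    name = prefix + name + postfix
    if to_lower:
        name = name.lower()
    return name

print(transform_username("JDoe@example.com",
                         extract_from_email=True, to_lower=True))  # jdoe
```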
LDAP/AD deleted entity detection

When enabled, LDAP/AD deleted entity detection will perform a soft delete of users or groups in Privacera Portal. A soft delete removes all memberships of the group/user and marks them as “hidden”. Hidden users will not appear in auto completion when modifying access policies. References to users/groups in policies will remain, until manually removed or the user/group is fully deleted from Privacera Portal. Hidden users can be fully deleted by using the Privacera Portal UI or REST APIs.

Properties:

  • Boolean: usersync.connector.0.search.deleted.group.enabled (default: false)

  • Boolean: usersync.connector.0.search.deleted.user.enabled (default: false)

  • Numeric: usersync.connector.#.search.deleted.cycles (default: 6)

Privacera Manager Variables:

In the LDAP connector properties table above, see LDAP Search (section C).
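The soft-delete behavior described above can be sketched with an assumed in-memory data model; this is an illustration of the semantics, not Privacera's actual code:

```python
# Minimal sketch of soft delete: deletion detection removes the entity's
# memberships and marks it hidden rather than deleting it outright, so
# policy references survive until the entity is fully deleted.
def soft_delete(portal, username):
    for group in portal["groups"].values():
        group["members"].discard(username)      # remove all memberships
    portal["users"][username]["hidden"] = True  # drops out of autocomplete

portal = {
    "users": {"jdoe": {"hidden": False}},
    "groups": {"sales": {"members": {"jdoe", "asmith"}}},
}
soft_delete(portal, "jdoe")
print(portal["users"]["jdoe"]["hidden"])               # True
print("jdoe" in portal["groups"]["sales"]["members"])  # False
```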

Azure Active Directory (AAD)
  1. Run the following command to enable Privacera UserSync:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.yml config/custom-vars/
  2. Enable the AAD connector:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.azuread.yml config/custom-vars/ 
    vi config/custom-vars/vars.privacera-usersync.azuread.yml

    Edit the following properties:

    Property

    Description

    Example

    A) AAD Basic Info

    AZURE_AD_CONNECTOR

    Name of the connector.

    AAD1

    AZURE_AD_ENABLED

    Enabled status of connector. (true/false)

    true

    AZURE_AD_SERVICE_TYPE

    Service Type

    AZURE_AD_DATASOURCE_NAME

    Name of the datasource.

    AZURE_AD_ATTRIBUTE_ONLY

    Sync only the attributes of users already synced from other services.

    false

    AZURE_AD_SYNC_INTERVAL

    Frequency of usersync pulls and audit records in seconds. Default value is 3600, minimum value is 300.

    3600

    B) Azure AD Info (get the following information from the Azure Portal)

    AZURE_AD_TENANT_ID

    Azure Active Directory Id (Tenant ID)

    1a2b3c4d-azyd-4755-9638-e12xa34p56le

    AZURE_AD_CLIENT_ID

    Azure Active Directory application client ID which will be used for accessing Microsoft Graph API.

    11111111-1111-1111-1111-111111111111

    AZURE_AD_CLIENT_SECRET

    Azure Active Directory application client secret which will be used for accessing Microsoft Graph API.

    AZURE_AD_USERNAME

    Azure account username used to obtain an access token on behalf of the Azure AD application.

    AZURE_AD_PASSWORD

    Azure account password used to obtain an access token on behalf of the Azure AD application.

    C) AAD Manage/Ignore List of Users/Groups

    AZURE_AD_MANAGER_USER_LIST

    List of users to manage from sync results. If this list is defined, all users not on this list will be ignored.

    AZURE_AD_IGNORE_USER_LIST

    List of users to ignore from sync results.

    AZURE_AD_MANAGE_GROUP_LIST

    List of groups to manage from sync results. If this list is defined, all groups not on this list will be ignored.

    AZURE_AD_IGNORE_GROUP_LIST

    List of groups to ignore from sync results.

    D) AAD Search

    AZURE_AD_SEARCH_SCOPE

    Azure AD Application Access Scope

    AZURE_AD_SEARCH_USER_GROUPONLY

    Boolean to load only users that are members of groups.

    false

    AZURE_AD_SEARCH_INCREMENTAL_ENABLED

    Enable incremental search, syncing only changes since the last search.

    false

    AZURE_AD_SEARCH_DETECT_DELETED_USERS_GROUPS

    Enables both user and group deleted searches. Default is false.

    false

    AZURE_AD_SEARCH_DETECT_DELETED_USERS

    Override setting for user deleted search. Default value is AZURE_AD_SEARCH_DETECT_DELETED_USERS_GROUPS.

    AZURE_AD_SEARCH_DETECT_DELETED_USERS_GROUPS

    AZURE_AD_SEARCH_DETECT_DELETED_GROUPS

    Override setting for group deleted search. Default value is AZURE_AD_SEARCH_DETECT_DELETED_USERS_GROUPS.

    AZURE_AD_SEARCH_DETECT_DELETED_USERS_GROUPS

    E) Azure Service Principal

    Note

    If Sync Service Principals as Users is enabled, AAD does not require that the displayName of a service principal be unique. In this case, a different attribute (such as appId) should be used as the service principal username.

    AZURE_AD_SERVICEPRINCIPAL_ENABLED

    Sync Azure service principals to Ranger user entities. Default is false.

    false

    AZURE_AD_SERVICEPRINCIPAL_USERNAME

    Specifies the attribute used as the username when a service principal is mapped to a Ranger user entity.

    displayName

    F) AAD User/Group Attributes

    AZURE_AD_ATTRIBUTE_USERNAME

    Attribute of a user’s name (default: userPrincipalName)

    AZURE_AD_ATTRIBUTE_FIRSTNAME

    Attribute of a user’s first name (default: givenName)

    AZURE_AD_ATTRIBUTE_LASTNAME

    Attribute of a user’s last name (default: surname)

    AZURE_AD_ATTRIBUTE_EMAIL

    Attribute from user entry that would be treated as email address.

    AZURE_AD_ATTRIBUTE_GROUPNAME

    Attribute from group entry that would be treated as group name.

    AZURE_AD_SERVICEPRINCIPAL_USERNAME

    Attribute of service principal name.

    G) Username/Group name Attribute Modification

    AZURE_AD_ATTRIBUTE_USERNAME_VALUE_EXTRACTFROMEMAIL

    Extract username from an email address. (e.g. username@domain.com -> username) Default is false.

    false

    AZURE_AD_ATTRIBUTE_USERNAME_VALUE_PREFIX

    Prefix to prepend to the username. Default is blank.

    AZURE_AD_ATTRIBUTE_USERNAME_VALUE_POSTFIX

    Postfix to append to the username. Default is blank.

    AZURE_AD_ATTRIBUTE_USERNAME_VALUE_TOLOWER

    Convert the username to lowercase. Default is false.

    false

    AZURE_AD_ATTRIBUTE_USERNAME_VALUE_TOUPPER

    Convert the username to uppercase. Default is false.

    false

    AZURE_AD_ATTRIBUTE_USERNAME_VALUE_REGEX

    Regex applied to modify the username. Default is blank.

    AZURE_AD_ATTRIBUTE_GROUPNAME_VALUE_EXTRACTFROMEMAIL

    Extract the group name from an email address. Default is false.

    false

    AZURE_AD_ATTRIBUTE_GROUPNAME_VALUE_PREFIX

    Prefix to prepend to the group's name. Default is blank.

    AZURE_AD_ATTRIBUTE_GROUPNAME_VALUE_POSTFIX

    Postfix to append to the group's name. Default is blank.

    AZURE_AD_ATTRIBUTE_GROUPNAME_VALUE_TOLOWER

    Convert the group's name to lowercase. Default is false.

    false

    AZURE_AD_ATTRIBUTE_GROUPNAME_VALUE_TOUPPER

    Convert the group's name to uppercase. Default is false.

    false

    AZURE_AD_ATTRIBUTE_GROUPNAME_VALUE_REGEX

    Regex applied to modify the group's name. Default is blank.

    H) Group Attribute Configuration

    AZURE_AD_GROUP_ATTRIBUTE_LIST

    The list of attribute keys to get from synced groups.

    AZURE_AD_GROUP_ATTRIBUTE_VALUE_PREFIX

    Prefix to prepend to the values of group attributes, such as the group name.

    AZURE_AD_GROUP_ATTRIBUTE_KEY_PREFIX

    Prefix to prepend to the keys of group attributes, such as the group name.

    I) Filter Properties

    AZURE_AD_FILTER_USER_LIST

    Filter the AAD user list; supported for non-incremental search. When incremental search is enabled, delta search does not support filter properties.

    abc.def@privacera.com

    AZURE_AD_FILTER_SERVICEPRINCIPAL_LIST

    Filter the AAD service principal list; supported for non-incremental search. When incremental search is enabled, delta search does not support filter properties.

    abc-testapp

    AZURE_AD_FILTER_GROUP_LIST

    Filter the AAD group list; supported for non-incremental search. When incremental search is enabled, delta search does not support filter properties.

    PRIVACERA-AB-GROUP-00

    J) Domain Properties

    AZURE_AD_MANAGE_DOMAIN_LIST

    Only users in manage domain list will be synced.

    Privacera.US

    AZURE_AD_IGNORE_DOMAIN_LIST

    Users in ignore domain list will not be synced.

    Privacera.US

    AZURE_AD_DOMAIN_ATTRIBUTE

    Specify the attribute used to determine the user's domain; email and username are supported. Default is email.

    username

  3. Run the following command:

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
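The domain filtering described by the section J properties above can be sketched as follows; the precedence between manage and ignore lists is an assumption based on the descriptions, not Privacera's actual code:

```python
# Hedged sketch of AAD domain filtering: a user is synced only if the
# domain of the chosen attribute is in the manage list and not in the
# ignore list (assumed semantics).
def domain_of(value):
    return value.rsplit("@", 1)[-1] if "@" in value else ""

def should_sync(user, manage=None, ignore=None, attribute="email"):
    domain = domain_of(user[attribute])
    if ignore and domain in ignore:
        return False   # ignore list always wins
    if manage:
        return domain in manage
    return True

user = {"email": "abc.def@Privacera.US", "username": "abc.def"}
print(should_sync(user, manage=["Privacera.US"]))  # True
print(should_sync(user, ignore=["Privacera.US"]))  # False
```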
Azure Active Directory (AAD) deleted entity detection

When enabled, AAD deleted entity detection will perform a soft delete of users or groups in Privacera Portal. A soft delete removes all memberships of the group/user and marks them as “hidden”. Hidden users will not appear in auto completion when modifying access policies. References to users/groups in policies will remain, until manually removed or the user/group is fully deleted from Privacera Portal. Hidden users can be fully deleted by using the Privacera Portal UI or REST APIs.

Properties:

  • Boolean: usersync.connector.3.search.deleted.group.enabled (default: false)

  • Boolean: usersync.connector.3.search.deleted.user.enabled (default: false)

Privacera Manager Variables:

In the AAD connector properties table above, see under AAD Search (section D).

SCIM
  1. Run the following command to enable Privacera UserSync:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.yml config/custom-vars/
  2. Enable the SCIM connector:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.scim.yml config/custom-vars/ 
    vi config/custom-vars/vars.privacera-usersync.scim.yml

    Edit the following properties:

    Property

    Description

    Example

    A) SCIM Connector Info

    SCIM_CONNECTOR

    Name of connector.

    DB1

    SCIM_ENABLED

    Enabled status of connector. (true/false)

    true

    SCIM_SERVICETYPE

    Service Type

    scim

    SCIM_DATASOURCE_NAME

    Name of the datasource.

    databricks1

    SCIM_URL

    Connector URL

    ADMIN_USER_BEARER_TOKEN

    Bearer token

    SCIM_SYNC_INTERVAL

    Frequency of usersync pulls and audit records in seconds. Default value is 3600, minimum value is 300.

    3600

    B) SCIM Manage/Ignore List of Users/Groups

    SCIM_MANAGE_USER_LIST

    List of users to manage from sync results. If this list is defined, all users not on this list will be ignored

    SCIM_IGNORE_USER_LIST

    List of users to ignore from sync results.

    SCIM_MANAGE_GROUP_LIST

    List of groups to manage from sync results. If this list is defined, all groups not on this list will be ignored.

    SCIM_IGNORE_GROUP_LIST

    List of groups to ignore from sync results.

    C) SCIM User/Group Attributes

    SCIM_ATTRIBUTE_USERNAME

    Attribute from user entry that would be treated as user name.

    userName

    SCIM_ATTRIBUTE_FIRSTNAME

    Attribute from user entry that would be treated as firstname.

    name.givenName

    SCIM_ATTRIBUTE_LASTNAME

    Attribute from user entry that would be treated as lastname.

    name.familyName

    SCIM_ATTRIBUTE_EMAIL

    Attribute from user entry that would be treated as email address.

    emails[primary-true].value

    SCIM_ATTRIBUTE_ONLY

    Sync only the attributes of users already synced from other services. (true/false)

    false

    SCIM_ATTRIBUTE_GROUPS

    Attribute of user’s group list.

    groups

    SCIM_ATTRIBUTE_GROUPNAME

    Attribute from group entry that would be treated as group name.

    displayName

    SCIM_ATTRIBUTE_GROUP_MEMBER

    Attribute from group entry that is list of members.

    members

    D) SCIM Server Username Attribute Modifications

    SCIM_ATTRIBUTE_USERNAME_VALUE_EXTRACTFROMEMAIL

    Extract the user’s username from an email address. (e.g. username@domain.com -> username) The default is false.

    false

    SCIM_ATTRIBUTE_USERNAME_VALUE_PREFIX

    Prefix to prepend to username. The default is blank.

    SCIM_ATTRIBUTE_USERNAME_VALUE_POSTFIX

    Postfix to append to the username. The default is blank.

    SCIM_ATTRIBUTE_USERNAME_VALUE_TOLOWER

    Convert the user’s username to lowercase. The default is false.

    false

    SCIM_ATTRIBUTE_USERNAME_VALUE_TOUPPER

    Convert the user’s username to uppercase. The default is false.

    false

    SCIM_ATTRIBUTE_USERNAME_VALUE_REGEX

    Regex applied to modify the username. The default is blank.

    E) SCIM Server Group Name Attribute Modifications

    SCIM_ATTRIBUTE_GROUPNAME_VALUE_EXTRACTFROMEMAIL

    Extract the group’s name from an email address (e.g. groupname@domain.com -> groupname). The default is false.

    false

    SCIM_ATTRIBUTE_GROUPNAME_VALUE_PREFIX

    Prefix to prepend to the group's name. The default is blank.

    SCIM_ATTRIBUTE_GROUPNAME_VALUE_POSTFIX

    Postfix to append to the group's name. The default is blank.

    SCIM_ATTRIBUTE_GROUPNAME_VALUE_TOLOWER

    Convert group's name to lowercase. The default is false.

    false

    SCIM_ATTRIBUTE_GROUPNAME_VALUE_TOUPPER

    Convert the group's name to uppercase. The default is false.

    false

    SCIM_ATTRIBUTE_GROUPNAME_VALUE_REGEX

    Regex applied to modify the group's name. The default is blank.

    F) Group Attribute Configuration

    SCIM_GROUP_ATTRIBUTE_LIST

    The list of attribute keys to get from synced groups.

    SCIM_GROUP_ATTRIBUTE_VALUE_PREFIX

    Prefix to prepend to the values of group attributes, such as the group name.

    SCIM_GROUP_ATTRIBUTE_KEY_PREFIX

    Prefix to prepend to the keys of group attributes, such as the group name.

  3. Run the following command:

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
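The SCIM attribute settings above use dotted paths such as name.givenName to point into a SCIM user resource. The following simplified sketch shows how such a path might be resolved (assumed semantics; the bracketed filter form emails[primary-true].value is not handled here):

```python
# Simplified sketch of resolving dotted SCIM attribute paths against a
# SCIM user resource (illustration only, not Privacera's actual code).
def get_attr(resource, path):
    value = resource
    for part in path.split("."):
        value = value[part]
    return value

user = {"userName": "jdoe",
        "name": {"givenName": "Jane", "familyName": "Doe"}}
print(get_attr(user, "name.givenName"))   # Jane
print(get_attr(user, "name.familyName"))  # Doe
```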
SCIM Server

Note

SCIM Server exposes privacera-usersync service externally on a Public/Internet-facing LB.

  1. Run the following command to enable Privacera UserSync:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.yml config/custom-vars/
  2. Enable the SCIM Server connector:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.scimserver.yml config/custom-vars/ 
    vi config/custom-vars/vars.privacera-usersync.scimserver.yml

    Edit the following properties:

    Property

    Description

    Example

    A) SCIM Server Connector Info

    SCIM_SERVER_CONNECTOR

    Identifying name of this connector.

    DB1

    SCIM_SERVER_ENABLED

    Enabled status of connector. (true/false)

    true

    SCIM_SERVER_SERVICETYPE

    Type of service/connector.

    scimserver

    SCIM_SERVER_DATASOURCE_NAME

    Unique datasource name. Used for identifying source of data and configuring priority list. (Optional)

    databricks1

    SCIM_SERVER_ATTRIBUTE_ONLY

    Sync only the attributes of users already synced from other services. (true/false)

    SCIM_SERVER_BEARER_TOKEN

    Bearer token for auth to SCIM API. When set, SCIM requests with this token will be allowed access.

    SCIM_SERVER_USERNAME

    Basic auth username; when set, SCIM requests with this username will be allowed access. (Password also required.)

    SCIM_SERVER_PASSWORD

    Basic auth password; when set, SCIM requests with this password will be allowed access. (Username also required.)

    SCIM_SERVER_SYNC_INTERVAL

    Frequency of usersync audit records in seconds. Default value is 3600, minimum value is 300.

    3600

    B) SCIM Server Manage/Ignore List of Users/Groups

    SCIM_SERVER_MANAGE_USER_LIST

    List of users to manage from sync results. If this list is defined, all users not on this list will be ignored.

    SCIM_SERVER_IGNORE_USER_LIST

    List of users to ignore from sync results.

    SCIM_SERVER_MANAGE_GROUP_LIST

    List of groups to manage from sync results. If this list is defined, all groups not on this list will be ignored.

    SCIM_SERVER_IGNORE_GROUP_LIST

    List of groups to ignore from sync results.

    C) SCIM Server Attributes

    SCIM_SERVER_ATTRIBUTE_USERNAME

    Attribute of a user's name.

    userName

    SCIM_SERVER_ATTRIBUTE_FIRSTNAME

    Attribute of a user's first name.

    name.givenName

    SCIM_SERVER_ATTRIBUTE_LASTNAME

    Attribute of a user's last/family name.

    name.familyName

    SCIM_SERVER_ATTRIBUTE_EMAIL

    Attribute of a user’s email.

    emails[primary-true].value

    SCIM_SERVER_ATTRIBUTE_GROUPS

    Attribute of a user’s group list.

    groups

    SCIM_SERVER_ATTRIBUTE_GROUPNAME

    Attribute of a group's name.

    displayName

    SCIM_SERVER_ATTRIBUTE_GROUP_MEMBER

    Attribute from group entry that is the list of members.

    members

    D) SCIM Server Username Attribute Modifications

    SCIM_SERVER_ATTRIBUTE_USERNAME_VALUE_EXTRACTFROMEMAIL

    Extract the user’s username from an email address. (e.g. username@domain.com -> username) The default is false.

    false

    SCIM_SERVER_ATTRIBUTE_USERNAME_VALUE_PREFIX

    Prefix to prepend to username. The default is blank.

    SCIM_SERVER_ATTRIBUTE_USERNAME_VALUE_POSTFIX

    Postfix to append to the username. The default is blank.

    SCIM_SERVER_ATTRIBUTE_USERNAME_VALUE_TOLOWER

    Convert the user’s username to lowercase. The default is false.

    false

    SCIM_SERVER_ATTRIBUTE_USERNAME_VALUE_TOUPPER

    Convert the user’s username to uppercase. The default is false.

    false

    SCIM_SERVER_ATTRIBUTE_USERNAME_VALUE_REGEX

    Regex applied to modify the username. The default is blank.

    E) SCIM Server Group Name Attribute Modifications

    SCIM_SERVER_ATTRIBUTE_GROUPNAME_VALUE_EXTRACTFROMEMAIL

    Extract the group’s name from an email address (e.g. groupname@domain.com -> groupname). The default is false.

    false

    SCIM_SERVER_ATTRIBUTE_GROUPNAME_VALUE_PREFIX

    Prefix to prepend to the group's name. The default is blank.

    SCIM_SERVER_ATTRIBUTE_GROUPNAME_VALUE_POSTFIX

    Postfix to append to the group's name. The default is blank.

    SCIM_SERVER_ATTRIBUTE_GROUPNAME_VALUE_TOLOWER

    Convert group's name to lowercase. The default is false.

    false

    SCIM_SERVER_ATTRIBUTE_GROUPNAME_VALUE_TOUPPER

    Convert the group's name to uppercase. The default is false.

    false

    SCIM_SERVER_ATTRIBUTE_GROUPNAME_VALUE_REGEX

    Regex applied to modify the group's name. The default is blank.

    F) Group Attribute Configuration

    SCIM_SERVER_GROUP_ATTRIBUTE_LIST

    The list of attribute keys to get from synced groups.

    SCIM_SERVER_GROUP_ATTRIBUTE_VALUE_PREFIX

    Prefix to prepend to the values of group attributes, such as the group name.

    SCIM_SERVER_GROUP_ATTRIBUTE_KEY_PREFIX

    Prefix to prepend to the keys of group attributes, such as the group name.

  3. If NGINX Ingress is enabled and the NGINX controller is running on an internal LB, disable the ingress for Usersync so that it uses a Public/Internet-facing LB by adding the following variable:

    vi config/custom-vars/vars.kubernetes.nginx-ingress.yml
    
    PRIVACERA_USERSYNC_K8S_NGINX_INGRESS_ENABLE: "false"
  4. Run the following command:

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
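The SCIM Server auth options above (bearer token, or basic-auth username and password) can be sketched as follows. Header formats follow RFC 6750 (Bearer) and RFC 7617 (Basic); this is an illustration of the described behavior, not Privacera's implementation:

```python
import base64

# Hedged sketch of SCIM Server request authorization: either a matching
# bearer token or matching basic-auth credentials grant access.
def is_authorized(headers, bearer_token=None, username=None, password=None):
    auth = headers.get("Authorization", "")
    if bearer_token and auth == f"Bearer {bearer_token}":
        return True
    if username and password:
        creds = base64.b64encode(f"{username}:{password}".encode()).decode()
        return auth == f"Basic {creds}"
    return False

print(is_authorized({"Authorization": "Bearer s3cret"},
                    bearer_token="s3cret"))  # True
```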
OKTA
  1. Run the following command to enable Privacera UserSync:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.yml config/custom-vars/
  2. Enable the OKTA connector:

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.privacera-usersync.okta.yml config/custom-vars/ 
    vi config/custom-vars/vars.privacera-usersync.okta.yml

    Edit the following properties:

    Property

    Description

    Example

    A) OKTA Connector Info

    OKTA_CONNECTOR

    Name of the connector.

    OKTA

    OKTA_ENABLED

    Enabled status of connector. (true/false)

    true

    OKTA_SERVICETYPE

    Type of service/connector.

    okta

    OKTA_DATASOURCE_NAME

    Unique datasource name, used for identifying source of data and configuring priority list. (Optional)

    OKTA_SERVICE_URL

    Connector URL

    https://{myOktaDomain}.okta.com

    OKTA_API_TOKEN

    API token

    A8b2c84d-895a-4fea-82dc-401397b8e50c

    OKTA_SYNC_INTERVAL

    Frequency of usersync pulls and audit records in seconds. Default value is 3600, minimum value is 300.

    3600

    B) OKTA Manage/Ignore List of Users/Groups

    OKTA_USER_LIST

    List of users to manage from sync results. If this list is defined, all users not on this list will be ignored.

    OKTA_IGNORE_USER_LIST

    List of users to ignore from sync results.

    OKTA_USER_LIST_STATUS

    List of users to manage whose status equals one of: STAGED, PROVISIONED, ACTIVE, RECOVERY, PASSWORD_EXPIRED, LOCKED_OUT, or DEPROVISIONED. If this list is defined, all users not on this list will be ignored.

    ACTIVE,STAGED

    OKTA_USER_LIST_LOGIN

    List of users to manage with user login name (can contain ). If this list is defined, all users not on this list will be ignored.

    sw;mon,san

    OKTA_USER_LIST_PROFILE_FIRSTNAME

    List of users to manage with user first name (can contain ). If this list is defined, all users not on this list will be ignored.

    sw;mon,san

    OKTA_USER_LIST_PROFILE_LASTNAME

    List of users to manage with user last name (can contain ). If this list is defined, all users not on this list will be ignored.

    sw;mon,san

    OKTA_LIST_PROFILE_EMAIL

    List of users to manage with user email (can contain ). If this list is defined, all users not on this list will be ignored.

    sw;mon,san

    OKTA_LIST_TYPE

    List of groups to manage with group type. If this list is defined, all groups not on this list will be ignored.

    APP_GROUP,BUILT_IN,OKTA_GROUP

    OKTA_GROUP_LIST

    List of groups to manage from sync results. If this list is defined, all groups not on this list will be ignored.

    OKTA_IGNORE_GROUP_LIST

    List of groups to ignore from sync results.

    OKTA_GROUP_LIST_SOURCE_ID

    List of groups to manage with group source id. If this list is defined, all groups not on this list will be ignored.

    0oa2v0el0gP90aqjJ0g7,0oa2v0el0gP90aqjJ0g8,0oa2v0el0gP90aqjJ0g0

    OKTA_GROUP_LIST_PROFILE_NAME

    List of groups to manage with group name. If this list is defined, all groups not on this list will be ignored.

    group1,testGroup,testGroup2
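
The manage/ignore list semantics described above can be sketched as follows. This is an illustrative model, not Privacera's actual implementation, and the user names are hypothetical: a manage list, when defined, keeps only the listed users, and an ignore list always drops its entries.

```python
def filter_users(synced_users, manage_list=None, ignore_list=None):
    """Illustrative model of OKTA_USER_LIST / OKTA_IGNORE_USER_LIST behavior."""
    result = []
    for user in synced_users:
        # If a manage list is defined, all users not on it are ignored.
        if manage_list is not None and user not in manage_list:
            continue
        # Users on the ignore list are dropped from sync results.
        if ignore_list is not None and user in ignore_list:
            continue
        result.append(user)
    return result

# Example: the manage list restricts the sync to two users,
# and the ignore list then drops one of them.
print(filter_users(["alice", "bob", "carol"],
                   manage_list=["alice", "bob"],
                   ignore_list=["bob"]))  # ['alice']
```

The group list properties (OKTA_GROUP_LIST, OKTA_IGNORE_GROUP_LIST) follow the same pattern for groups.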

    C) OKTA Search

    OKTA_SEARCH_USER_GROUPONLY

    Boolean to load only those users that belong to groups.

    false

    OKTA_SEARCH_INCREMENTAL_ENABLED

    Boolean to enable incremental search, syncing only changes since the last search.

    false

    D) OKTA User/Group Attributes

    OKTA_ATTRIBUTE_USERNAME

    Attribute from user entry that would be treated as user name.

    login

    OKTA_ATTRIBUTE_FIRSTNAME

    Attribute from user entry that would be treated as firstname.

    firstName

    OKTA_ATTRIBUTE_LASTNAME

    Attribute from user entry that would be treated as lastname.

    lastName

    OKTA_ATTRIBUTE_EMAIL

    Attribute from user entry that would be treated as email address.

    email

    OKTA_ATTRIBUTE_GROUPS

    Attribute of user’s group list.

    groups

    OKTA_ATTRIBUTE_GROUPNAME

    Attribute of a group’s name.

    name

    OKTA_ATTRIBUTE_ONLY

    Sync only the attributes of users already synced from other services. (true/false)

    false

    E) OKTA Username Attribute Modifications

    OKTA_ATTRIBUTE_USERNAME_VALUE_EXTRACTFROMEMAIL

    Extract the user's username from an email address (e.g. username@domain.com -> username). The default is false.

    false

    OKTA_ATTRIBUTE_USERNAME_VALUE_PREFIX

    Prefix to prepend to username. The default is blank.

    OKTA_ATTRIBUTE_USERNAME_VALUE_POSTFIX

    Postfix to append to the username. The default is blank.

    OKTA_ATTRIBUTE_USERNAME_VALUE_TOLOWER

    Convert the user’s username to lowercase. The default is false.

    false

    OKTA_ATTRIBUTE_USERNAME_VALUE_TOUPPER

    Convert the user’s username to uppercase. The default is false.

    false

    OKTA_ATTRIBUTE_USERNAME_VALUE_REGEX

    Regex used to modify the username by pattern matching and replacement. The default is blank.

    F) OKTA Group Name Attribute Modifications

    OKTA_ATTRIBUTE_GROUPNAME_VALUE_EXTRACTFROMEMAIL

    Extract the group’s name from an email address (e.g. groupname@domain.com -> groupname). The default is false.

    false

    OKTA_ATTRIBUTE_GROUPNAME_VALUE_PREFIX

    Prefix to prepend to the group's name. The default is blank.

    OKTA_ATTRIBUTE_GROUPNAME_VALUE_POSTFIX

    Postfix to append to the group's name. The default is blank.

    OKTA_ATTRIBUTE_GROUPNAME_VALUE_TOLOWER

    Convert group's name to lowercase. The default is false.

    false

    OKTA_ATTRIBUTE_GROUPNAME_VALUE_TOUPPER

    Convert the group's name to uppercase. The default is false.

    false

    OKTA_ATTRIBUTE_GROUPNAME_VALUE_REGEX

    Regex used to modify the group's name by pattern matching and replacement. The default is blank.
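
The username and group-name modifications in sections E and F can be pictured as a small pipeline. The ordering below (extract-from-email first, then prefix/postfix, then case conversion) is an assumption for illustration, and the prefix value is hypothetical:

```python
def apply_name_modifications(name, extract_from_email=False, prefix="",
                             postfix="", to_lower=False, to_upper=False):
    # ..._VALUE_EXTRACTFROMEMAIL: username@domain.com -> username
    if extract_from_email and "@" in name:
        name = name.split("@", 1)[0]
    # ..._VALUE_PREFIX / ..._VALUE_POSTFIX: prepend/append fixed strings
    name = prefix + name + postfix
    # ..._VALUE_TOLOWER / ..._VALUE_TOUPPER: case conversion
    if to_lower:
        name = name.lower()
    elif to_upper:
        name = name.upper()
    return name

print(apply_name_modifications("John.Doe@example.com",
                               extract_from_email=True,
                               prefix="okta_", to_lower=True))  # okta_john.doe
```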

  3. Run the following command:

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
Privacera UserSync REST endpoints

When enabled, Privacera UserSync has REST API endpoints available to allow administrators to push users and groups that already exist in the UserSync cache to Privacera Portal.

Push users
POST - <UserSync_Host>:6086/api/pus/public/cache/load/users

The request body should contain a userList and/or connectorList. If no users or connectors are passed, all users will be pushed to Ranger.

Example request:

curl -X 'POST' \
  '<UserSync_Host>:6086/api/pus/public/cache/load/users' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "userList": ["User1", "User2"],
    "connectorList": ["AAD1","OKTA"]
}'

Parameter

Type

Description

userList

string array

List of users to be added to Privacera Portal.

connectorList

string array

All users associated with provided connector(s) will be pushed.

Responses:
  • 200 OK

  • 404 Not Found: If one or more Users or Connectors are not found, JSON response contains error message.
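
The same push-users request can be built from Python. This sketch only constructs the request, it does not send it; the host name is a placeholder, and any authentication required by your deployment is not shown:

```python
import json
from urllib import request

def build_push_users_request(host, user_list=None, connector_list=None):
    # An empty body pushes all users; otherwise only the listed
    # users and/or connectors are pushed.
    body = {}
    if user_list:
        body["userList"] = user_list
    if connector_list:
        body["connectorList"] = connector_list
    return request.Request(
        f"{host}:6086/api/pus/public/cache/load/users",
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_push_users_request("http://usersync.internal",
                               user_list=["User1", "User2"],
                               connector_list=["AAD1", "OKTA"])
print(req.full_url)          # http://usersync.internal:6086/api/pus/public/cache/load/users
print(json.loads(req.data))  # {'userList': ['User1', 'User2'], 'connectorList': ['AAD1', 'OKTA']}
```

To actually send it, pass the request to `urllib.request.urlopen`. The push-groups endpoint below takes the same shape with a groupList instead.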

Push groups
POST - <UserSync_Host>:6086/api/pus/public/cache/load/groups

The request body should contain a groupList and/or connectorList. If no groups or connectors are passed, all groups will be pushed to Ranger.

Example request:

curl -X 'POST' \
  '<UserSync_Host>:6086/api/pus/public/cache/load/groups' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "groupList": ["Group1", "Group2"],
    "connectorList": ["AAD1","OKTA"]
}'

Parameter

Type

Description

groupList

string array

List of groups to be added to Privacera Portal.

connectorList

string array

All groups associated with provided connector(s) will be pushed.

Responses:
  • 200 OK

  • 404 Not Found: If one or more Groups or Connectors are not found, JSON response contains error message.

Migration from Apache Ranger UserSync to Privacera UserSync

Privacera generally recommends using its own version of UserSync (called Privacera UserSync) over the open-source Apache Ranger UserSync. Privacera has rewritten the Ranger UserSync to improve performance and features.

By default, all PrivaceraCloud customers are provisioned to use Privacera Usersync for improved performance capabilities and feature availability over Ranger UserSync. Below are the steps for platform customers to migrate.

All customers must migrate to use Privacera Usersync by March 31, 2024.

Migration steps

For Privacera Platform customers seeking to transition from Apache Ranger UserSync to Privacera UserSync, there are required manual steps to change the configuration.

  1. Navigate to the privacera-manager/config/custom-vars folder.

    cd privacera-manager/config/custom-vars 
  2. Rename the vars.usersync.ldaps.yml file to have a different extension (e.g. vars.usersync.ldaps.yml.bak).

  3. Ensure that the Ranger UserSync POD/Image has stopped.

    ./privacera-manager.sh stop usersync
  4. Copy the following files:

    • ../sample-vars/vars.privacera-usersync.yml

    • ../sample-vars/vars.privacera-usersync.ldap.yml

  5. Edit the vars.privacera-usersync.ldap.yml file with the desired configurations.

    Ranger UserSync Variable

    Privacera UserSync Variable

    USERSYNC_SYNC_LDAP_URL

    LDAP_URL

    USERSYNC_SYNC_LDAP_BIND_DN

    LDAP_BIND_DN

    USERSYNC_SYNC_LDAP_BIND_PASSWORD

    LDAP_BIND_PASSWORD

    USERSYNC_SYNC_LDAP_SEARCH_BASE

    LDAP_SEARCH_BASE

    USERSYNC_SYNC_LDAP_USER_SEARCH_BASE

    LDAP_SEARCH_USER_BASE

    USERSYNC_SYNC_LDAP_USER_SEARCH_FILTER

    LDAP_SEARCH_USER_FILTER

    USERSYNC_SYNC_GROUP_SEARCH_BASE

    LDAP_SEARCH_GROUP_BASE

    USERSYNC_SYNC_LDAP_GROUP_SEARCH_FILTER

    LDAP_SEARCH_GROUP_FILTER

    USERSYNC_SYNC_LDAP_OBJECT_CLASS

    LDAP_OBJECT_USER_CLASS

    USERSYNC_SYNC_GROUP_OBJECT_CLASS

    LDAP_OBJECT_GROUP_CLASS

    USERSYNC_SYNC_LDAP_SSL_ENABLED

    PRIVACERA_USERSYNC_SYNC_LDAP_SSL_ENABLED

    USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS

    PRIVACERA_USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS
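
As an illustration of the mapping above, an existing Ranger UserSync LDAP configuration would be carried over into vars.privacera-usersync.ldap.yml with the new variable names like this (all values are placeholders; adjust them for your directory):

```yaml
# vars.privacera-usersync.ldap.yml -- placeholder values
LDAP_URL: "ldaps://dir.ldap.us:636"                              # was USERSYNC_SYNC_LDAP_URL
LDAP_BIND_DN: "CN=Bind User,OU=example,DC=ad,DC=example,DC=com"  # was USERSYNC_SYNC_LDAP_BIND_DN
LDAP_BIND_PASSWORD: "<PLEASE_CHANGE>"                            # was USERSYNC_SYNC_LDAP_BIND_PASSWORD
LDAP_SEARCH_BASE: "OU=example,DC=ad,DC=example,DC=com"           # was USERSYNC_SYNC_LDAP_SEARCH_BASE
LDAP_SEARCH_USER_BASE: "OU=example,DC=ad,DC=example,DC=com"      # was USERSYNC_SYNC_LDAP_USER_SEARCH_BASE
PRIVACERA_USERSYNC_SYNC_LDAP_SSL_ENABLED: "true"                 # was USERSYNC_SYNC_LDAP_SSL_ENABLED
PRIVACERA_USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS: "true"               # was USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS
```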

  6. Run PM update to deploy Privacera-UserSync:

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update

For more information, see Privacera UserSync.

LDAP/LDAP-S
LDAP / LDAP-S

This topic covers how you can configure the Privacera Platform to attach and import users and groups defined in an external Active Directory (AD), LDAP, or LDAPS (LDAP over SSL) directory as data access users and groups.

Prerequisites

Before starting these steps, prepare the following. You need to configure various Privacera properties with these values, as detailed in Configuration.

Determine the following LDAP values:

  • The FQDN and protocol (http or https) of your LDAP server

  • DN

  • Complete Bind DN

  • Bind DN password

  • Top-level search base

  • User search base

To configure an SSL-enabled LDAP-S server, Privacera requires an SSL certificate. Set the Privacera property USERSYNC_SYNC_LDAP_SSL_ENABLED: "true", and then choose one of these alternatives:

  • Allow Privacera Manager to download and create the certificate based on the LDAP-S server URL. Set the Privacera property USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS: "true".

  • Manually configure a truststore on the Privacera server that contains the certificate of the LDAP-S server. Set the Privacera property USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS: "false".

Configuration
  1. SSH to the instance as ${USER}.

  2. Edit the following properties. For Access Manager LDAP-related property details and descriptions, see the Configuration Properties below.

    USERSYNC_SYNC_LDAP_URL: "<PLEASE_CHANGE>"
    USERSYNC_SYNC_LDAP_BIND_DN: "<PLEASE_CHANGE>"
    USERSYNC_SYNC_LDAP_BIND_PASSWORD: "<PLEASE_CHANGE>"
    USERSYNC_SYNC_LDAP_SEARCH_BASE: "<PLEASE_CHANGE>"
    USERSYNC_SYNC_LDAP_USER_SEARCH_BASE: "<PLEASE_CHANGE>"
    USERSYNC_SYNC_LDAP_SSL_ENABLED: "true"
    USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS: "true"
    
  3. Run Privacera Manager update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration Properties

Property

Description

Example

USERSYNC_SYNC_LDAP_URL

URL of the LDAP server, including protocol and port.

"ldap://dir.ldap.us:389" (when non-SSL)

or

"ldaps://dir.ldap.us:636" (when SSL)

USERSYNC_SYNC_LDAP_BIND_DN

Complete bind DN used to connect to the LDAP server.

CN=Bind User,OU=example,DC=ad,DC=example,DC=com

USERSYNC_SYNC_LDAP_BIND_PASSWORD

Password for the bind DN.

USERSYNC_SYNC_LDAP_SEARCH_BASE

Top-level search base.

OU=example,DC=ad,DC=example,DC=com

USERSYNC_SYNC_LDAP_USER_SEARCH_BASE

User search base.

USERSYNC_SYNC_LDAP_SSL_ENABLED

Set this to true if SSL is enabled on the LDAP server.

true

USERSYNC_SYNC_LDAP_SSL_PM_GEN_TS

Set this to true if you want Privacera Manager to generate the truststore certificate.

Set this to false if you want to manually provide the truststore certificate. To learn how to upload SSL certificates, [click here](../pm-ig/upload_custom_cert.md).

true

Azure Active Directory (AAD)
Azure Active Directory - Data Access User Synchronization

This topic covers how you can synchronize users, groups, and service principals from your existing Azure Active Directory (AAD) domain.

Pre-requisites

Ensure the following pre-requisites are met:

  • Create an Azure AD application.

  • Get the values for the following Azure properties: Application (client) ID and client secret.

CLI Configuration
  1. SSH to the instance as ${USER}.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.usersync.azuread.yml config/custom-vars/
    vi config/custom-vars/vars.usersync.azuread.yml
    
  3. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    USERSYNC_AZUREAD_TENANT_ID: "<PLEASE_CHANGE>"
    USERSYNC_AZUREAD_CLIENT_ID: "<PLEASE_CHANGE>"
    USERSYNC_AZUREAD_CLIENT_SECRET: "<PLEASE_CHANGE>"
    USERSYNC_AZUREAD_DOMAINS: "<PLEASE_CHANGE>"
    USERSYNC_AZUREAD_GROUPS: "<PLEASE_CHANGE>"
    USERSYNC_ENABLE: "true"
    USERSYNC_SOURCE: "azuread"
    USERSYNC_AZUREAD_USE_GROUP_LOOKUP_FIRST: "true"
    USERSYNC_SYNC_AZUREAD_USERNAME_RETRIVAL_FROM: "userPrincipalName"
    USERSYNC_SYNC_AZUREAD_EMAIL_RETRIVAL_FROM: "userPrincipalName"
    USERSYNC_SYNC_AZUREAD_GROUP_RETRIVAL_FROM: "displayName"
    SYNC_AZUREAD_USER_SERVICE_PRINCIPAL_ENABLED: "false"
    SYNC_AZUREAD_USER_SERVICE_PRINCIPAL_USERNAME_RETRIVAL_FROM: "appId"
    
  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration Properties

Property Name

Description

Example

USERSYNC_AZUREAD_TENANT_ID

To get the value for this property, go to Azure portal > Azure Active Directory > Properties > Tenant ID.

5a5cxxx-xxxx-xxxx-xxxx-c3172b33xxxx

USERSYNC_AZUREAD_CLIENT_ID

Get the value by following the Pre-requisites section above.

8a08xxxx-xxxx-xxxx-xxxx-6c0c95a0xxxx

USERSYNC_AZUREAD_CLIENT_SECRET

Get the value by following the Pre-requisites section above.

${CLIENT_SECRET}

USERSYNC_AZUREAD_DOMAINS

To get the value for this property, go to Azure portal > Azure Active Directory > Domains.

companydomain1.com,companydomain2.com

USERSYNC_AZUREAD_GROUPS

To get the value for this property, go to Azure portal > Azure Active Directory > Groups.

GROUP1,GROUP2,GROUP3

USERSYNC_ENABLE

Set to true to enable usersync.

true

USERSYNC_SOURCE

Source from which users/groups are synced.

Values: unix, ldap, azuread

azuread

USERSYNC_AZUREAD_USE_GROUP_LOOKUP_FIRST

Set to true if you want to first sync all groups and then all the users within those groups.

true

USERSYNC_SYNC_AZUREAD_USERNAME_RETRIVAL_FROM

Azure provides the user info in a JSON format.

Assign a JSON attribute that is unique. This would be the name of the user in Ranger.

userPrincipalName

USERSYNC_SYNC_AZUREAD_EMAIL_RETRIVAL_FROM

Azure provides the user info in a JSON format.

Set the email from the JSON attribute of the Azure user entity.

userPrincipalName

USERSYNC_SYNC_AZUREAD_GROUP_RETRIVAL_FROM

Azure provides the user info in a JSON format.

Use the JSON attribute to retrieve group information for the user.

displayName

SYNC_AZUREAD_USER_SERVICE_PRINCIPAL_ENABLED

Set to true to sync Azure service principals to the Ranger user entity.

false

SYNC_AZUREAD_USER_SERVICE_PRINCIPAL_USERNAME_RETRIVAL_FROM

Azure provides the service principal info in a JSON format.

Assign a JSON attribute that is unique. This would be the name of the user in Ranger.

appId
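
As a quick sanity check before running the update, note that the tenant ID and client ID copied from the Azure portal are standard GUIDs. The sketch below (an illustration, not part of Privacera Manager) flags a malformed value, which usually indicates a copy-paste error:

```python
import re

# Standard GUID shape: 8-4-4-4-12 hexadecimal characters.
GUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$")

def looks_like_guid(value):
    """True if value has the GUID shape used by Azure tenant/client IDs."""
    return bool(GUID_RE.match(value))

print(looks_like_guid("8a081234-5678-90ab-cdef-6c0c95a01234"))  # True
print(looks_like_guid("not-a-guid"))                            # False
```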

Privacera Plugin
Databricks
Privacera Plugin in Databricks
Databricks

Privacera provides two types of plugin solutions for access control in Databricks clusters. The two plugins are mutually exclusive and cannot be enabled on the same cluster.

Databricks Spark Fine-Grained Access Control (FGAC) Plugin

  • Recommended for SQL, Python, R language notebooks.

  • Provides FGAC on databases with row filtering and column masking features.

  • Uses privacera_hive, privacera_s3, privacera_adls, privacera_files services for resource-based access control, and privacera_tag service for tag-based access control.

  • Uses the plugin implementation from Privacera.

Databricks Spark Object Level Access Control (OLAC) Plugin

The OLAC plugin was introduced as an alternative for Scala clusters, since using the Scala language on Databricks Spark has some security concerns.

  • Recommended for Scala language notebooks.

  • Provides OLAC on S3 locations which you are trying to access via Spark.

  • Uses privacera_s3 service for resource-based access control and privacera_tag service for tag-based access control.

  • Uses the signed-authorization implementation from Privacera.

Databricks cluster deployment matrix with Privacera plugin

Job/Workflow use-case for automated cluster:

Run-Now creates a new cluster based on the definition in the job description.

Table 55. 

| Job Type | Languages | FGAC/DBX version | OLAC/DBX version |
| --- | --- | --- | --- |
| Notebook | Python/R/SQL | Supported [7.3, 9.1, 10.4] | |
| JAR | Java/Scala | Not supported | Supported [7.3, 9.1, 10.4] |
| spark-submit | Java/Scala/Python | Not supported | Supported [7.3, 9.1, 10.4] |
| Python | Python | Supported [7.3, 9.1, 10.4] | |
| Python wheel | Python | Supported [9.1, 10.4] | |
| Delta Live Tables pipeline | | Not supported | Not supported |



Job on existing cluster:

Run-Now uses the existing cluster that is mentioned in the job description.

Table 56. 

| Job Type | Languages | FGAC/DBX version | OLAC |
| --- | --- | --- | --- |
| Notebook | Python/R/SQL | Supported [7.3, 9.1, 10.4] | Not supported |
| JAR | Java/Scala | Not supported | Not supported |
| spark-submit | Java/Scala/Python | Not supported | Not supported |
| Python | Python | Not supported | Not supported |
| Python wheel | Python | Supported [9.1, 10.4] | Not supported |
| Delta Live Tables pipeline | | Not supported | Not supported |



Interactive use-case

The interactive use-case is running a SQL/Python notebook on an interactive cluster.

Table 57. 

| Cluster Type | Languages | FGAC | OLAC |
| --- | --- | --- | --- |
| Standard clusters | Scala/Python/R/SQL | Not supported | Supported [7.3, 9.1, 10.4] |
| High Concurrency clusters | Python/R/SQL | Supported [7.3, 9.1, 10.4] | Supported [7.3, 9.1, 10.4] |
| Single Node | Scala/Python/R/SQL | Not supported | Supported [7.3, 9.1, 10.4] |



Databricks Spark Fine-Grained Access Control Plugin [FGAC] [Python, SQL]
Configuration
  1. Run the following commands:

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.databricks.plugin.yml config/custom-vars/
    vi config/custom-vars/vars.databricks.plugin.yml
    
  2. Edit the following properties to allow Privacera Platform to connect to your Databricks host. For property details and description, refer to the Configuration Properties below.

    DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
    DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
    DATABRICKS_WORKSPACES_LIST:
      - alias: DEFAULT
        databricks_host_url: "{{DATABRICKS_HOST_URL}}"
        token: "{{DATABRICKS_TOKEN}}"
    DATABRICKS_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_ENABLE: "true"
    

    You can also add custom properties that are not included by default.

  3. Run the following commands:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
  4. (Optional) By default, policies under the default service name, privacera_hive, are enforced. You can customize a different service name and enforce policies defined in the new name. See Configure Service Name for Databricks Spark Plugin.

Configuration properties

Property Name

Description

Example Values

DATABRICKS_HOST_URL

Enter the URL where the Databricks environment is hosted.

For Azure Databricks:

DATABRICKS_HOST_URL: "https://xdx-66506xxxxxxxx.2.azuredatabricks.net/?o=665066931xxxxxxx"

For AWS Databricks:

DATABRICKS_HOST_URL: "https://xxx-7xxxfaxx-xxxx.cloud.databricks.com"

DATABRICKS_TOKEN

Enter the token.

To generate the token,

1. Log in to your Databricks account.

2. Click the user profile icon in the upper right corner of your Databricks workspace.

3. Click User Settings.

4. Click the Generate New Token button.

5. Optionally enter a description (comment) and expiration period.

6. Click the Generate button.

7. Copy the generated token.

DATABRICKS_TOKEN: "xapid40xxxf65xxxxxxe1470eayyyyycdc06"

DATABRICKS_WORKSPACES_LIST

Add multiple Databricks workspaces to connect to Ranger.

  1. To add a single workspace, add the following default JSON in the text area to define the host URL and token of the Databricks workspace. The text area should not be left empty and should at least contain the default JSON.

    Note

    Do not edit any of the values in the default JSON.

    [{"alias":"DEFAULT",
    "databricks_host_url":"{{DATABRICKS_HOST_URL}}",
    "token":"{{DATABRICKS_TOKEN}}"}]
    
  2. To add two workspaces, use the following JSON.

    Note

    {{var}} is an Ansible variable. Such a variable re-uses the value of a predefined variable. Hence, do not edit the properties, databricks_host_url and token of the alias: DEFAULT as they are set by DATABRICKS_HOST_URL and DATABRICKS_TOKEN respectively.

    [{"alias":"DEFAULT",
    "databricks_host_url":"{{DATABRICKS_HOST_URL}}",
    "token":"{{DATABRICKS_TOKEN}}"},
    {"alias":"<workspace-2-alias>","databricks_host_url":"<workspace-2-url>",
    "token":"<dbx-token-for-workspace-2>"}]
    

DATABRICKS_ENABLE

If set to 'true', Privacera Manager will create the Databricks cluster init script ranger_enable.sh at:

~/privacera/privacera-manager/output/databricks/ranger_enable.sh

"true"

"false"

DATABRICKS_MANAGE_INIT_SCRIPT

If set to 'true' Privacera Manager will upload Init script ('ranger_enable.sh') to the identified Databricks Host.

If set to 'false', upload the following two files to the DBFS location. The files can be found at ~/privacera/privacera-manager/output/databricks.

  • privacera_spark_plugin_job.conf

  • privacera_spark_plugin.conf

"true"

"false"

DATABRICKS_SPARK_PLUGIN_AGENT_JAR

Use the Java agent to assign a string of extra JVM options to pass to the Spark driver.

-javaagent:/databricks/jars/privacera-agent.jar

DATABRICKS_SPARK_PRIVACERA_CUSTOM_CURRENT_USER_UDF_NAME

Property to map logged-in user to Ranger user for row-filter policy.

It is mapped with the Databricks cluster-level property spark.hadoop.privacera.custom.current_user.udf.names. See Spark Properties. Check if this property is set in your Databricks cluster. If it is being used, then set its value similar to the PM property. If the value of the PM property and Databricks cluster-level property differ, then it can cause an unexpected behavior.

current_user()

DATABRICKS_SPARK_PRIVACERA_VIEW_LEVEL_MASKING_ROWFILTER_EXTENSION_ENABLE

Property to enable masking, row-filter, and data_admin access on views. This property is a Privacera Manager (PM) property.

It is mapped with the Databricks cluster-level property spark.hadoop.privacera.spark.view.levelmaskingrowfilter.extension.enable. See Spark Properties. Check if this property is set in your Databricks cluster. If it is being used, then set its value similar to the PM property. If the value of the PM property and Databricks cluster-level property differ, then it can cause an unexpected behavior.

false

DATABRICKS_SQL_CLUSTER_POLICY_SPARK_CONF

Configure Databricks Cluster policy.

Add the following JSON in the text area:

[{"Note":"First spark conf","key":"spark.hadoop.first.spark.test","value":"test1"},{"Note":"Second spark conf","key":"spark.hadoop.second.spark.test","value":"test2"}]

DATABRICKS_POST_PLUGIN_COMMAND_LIST

This property is not part of the default YAML file, but can be added if required.

Use this property, if you want to run a specific set of commands in the Databricks init script.

The following example will be added to the cluster init script to allow Athena JDBC via data access server.

DATABRICKS_POST_PLUGIN_COMMAND_LIST:

- sudo iptables -I OUTPUT 1 -p tcp -m tcp --dport 8181 -j ACCEPT

- sudo curl -k -u user:password {{PORTAL_URL}}/api/dataserver/cert?type=dataserver_jks -o /etc/ssl/certs/dataserver.jks

- sudo chmod 755 /etc/ssl/certs/dataserver.jks

DATABRICKS_SPARK_PYSPARK_ENABLE_PY4J_SECURITY

This property allows you to blacklist APIs to enable security. This property is a Privacera Manager (PM) property.

It is mapped with the Databricks cluster-level property spark.databricks.pyspark.enablePy4JSecurity. See Spark Properties. Check if this property is set in your Databricks cluster. If it is being used, then set its value similar to the PM property. If the value of the PM property and Databricks cluster-level property differ, then it can cause an unexpected behavior.


Managing init script

Automatic upload

If DATABRICKS_ENABLE is 'true' and DATABRICKS_MANAGE_INIT_SCRIPT is 'true', then the Init script will be uploaded automatically to your Databricks host. The init script will be uploaded to dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME mentioned in vars.privacera.yml.

Manual upload

If DATABRICKS_ENABLE is 'true' and DATABRICKS_MANAGE_INIT_SCRIPT is 'false', then the Init script must be uploaded to your Databricks host.

To avoid the manual steps below, you should set DATABRICKS_MANAGE_INIT_SCRIPT=true and follow the instructions outlined in Automatic Upload.

  1. Open a terminal and connect to Databricks account using your Databricks login credentials/token.

    Connect using login credentials:

    1. If you're using login credentials, then run the following command:

      databricks configure --profile privacera
    2. Enter the Databricks URL:

      Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
    3. Enter the username and password:
      Username: email-id@example.com
      Password:

    Connect using Databricks token:

    1. If you don't have a Databricks token, you can generate one. For more information, refer to Generate a personal access token.

    2. If you're using token, then run the following command:

      databricks configure --token --profile privacera
    3. Enter the Databricks URL:

      Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
    4. Enter the token:

      Token:
  2. To check if the connection to your Databricks account is established, run the following command:

    dbfs ls dbfs:/ --profile privacera

    You should see the list of files in the output, if you are connected to your account.

  3. Upload files manually to Databricks:

    1. Copy the following files to DBFS, which are available in the PM host at the location, ~/privacera/privacera-manager/output/databricks:

      • ranger_enable.sh

      • privacera_spark_plugin.conf

      • privacera_spark_plugin_job.conf

      • privacera_custom_conf.zip

    2. Run the following commands. You can get the value of <DEPLOYMENT_ENV_NAME> from the file ~/privacera/privacera-manager/config/vars.privacera.yml.

      export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
      dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
      dbfs cp ranger_enable.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_custom_conf.zip dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
    3. Verify the files have been uploaded.

      dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera

      The Init Script will be uploaded to dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME mentioned in vars.privacera.yml.

Configure Databricks Cluster
  1. Once the update completes successfully, log on to the Databricks console with your account and open the target cluster, or create a new target cluster.

  2. Open the Cluster dialog and enter Edit mode.

  3. In the Configuration tab, select Advanced Options > Spark.

  4. Add the following content to the Spark Config edit box. For more information on the Spark config properties, click here.

    New Properties

    Note

    • From Privacera 5.0.6.1 Release onwards, it is recommended to replace the Old Properties with the New Properties. However, the Old Properties will also continue to work.

    • For Databricks versions < 7.3, only the Old Properties should be used, since those versions are in extended support.

    spark.databricks.cluster.profile serverless
    spark.databricks.isv.product privacera
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.databricks.repl.allowedLanguages sql,python,r
    

    Old Properties

    spark.databricks.cluster.profile serverless
    spark.databricks.repl.allowedLanguages sql,python,r
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
    spark.databricks.isv.product privacera
    spark.databricks.pyspark.enableProcessIsolation true
  5. In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and set the init script path. For the <DEPLOYMENT_ENV_NAME> variable, enter the deployment name as defined for the DEPLOYMENT_ENV_NAME variable in vars.privacera.yml.

    dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable.sh
    
  6. In the Table Access Control section, clear both checkboxes: Enable table access control and only allow Python and SQL commands, and Enable credential passthrough for user-level data access and only allow Python and SQL commands.

  7. Save (Confirm) this configuration.

  8. Start (or Restart) the selected Databricks Cluster.

Validation

To help evaluate the use of Privacera with Databricks, Privacera provides a set of Privacera Manager 'demo' notebooks. These can be downloaded from the Privacera S3 repository using either your browser or a command-line 'wget'. Use the notebook/SQL sequence that matches your cluster.

  1. Download using your browser (click the correct file for your cluster below):

    https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPlugin.sql

    If AWS S3 is configured from your Databricks cluster: https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginS3.sql

    If ADLS Gen2 is configured from your Databricks cluster: https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginADLS.sql

    Or, if you are working from a Linux command line, use the 'wget' command to download:

    wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPlugin.sql -O PrivaceraSparkPlugin.sql

    wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginS3.sql -O PrivaceraSparkPluginS3.sql

    wget https://privacera.s3.amazonaws.com/public/pm-demo-data/databricks/PrivaceraSparkPluginADLS.sql -O PrivaceraSparkPluginADLS.sql

  2. Import the Databricks notebook:

    1. Log in to the Databricks Console

    2. Select Workspace > Users > Your User.

    3. From the drop-down menu, select Import and choose the downloaded file.

  3. Follow the suggested steps in the text of the notebook to exercise and validate Privacera with Databricks.

Databricks Spark Object-level Access Control Plugin [OLAC] [Scala]
Prerequisites

Ensure the following prerequisites are met:

Configuration
  1. Run the following commands.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.databricks.scala.yml config/custom-vars/
    vi config/custom-vars/vars.databricks.scala.yml
    
  2. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    DATASERVER_DATABRICKS_ALLOWED_URLS: "<PLEASE_UPDATE>"
    DATASERVER_AWS_STS_ROLE: "<PLEASE_CHANGE>"
    
  3. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration properties

Property

Description

Example

DATABRICKS_SCALA_ENABLE

Set the property to enable/disable Databricks Scala. This is found under the Databricks Signed URL Configuration For Scala Clusters section.

DATASERVER_DATABRICKS_ALLOWED_URLS

Add a URL or comma-separated URLs.

Privacera Dataserver serves only those URLs mentioned in this property.

https://xxx-7xxxfaxx-xxxx.cloud.databricks.com

DATASERVER_AWS_STS_ROLE

Add the instance profile ARN of the AWS role, which can access Delta Files in Databricks.

arn:aws:iam::111111111111:role/assume-role

DATABRICKS_MANAGE_INIT_SCRIPT

Manage the init script.

If enabled, Privacera Manager will upload the init script ('ranger_enable_scala.sh') to the identified Databricks host.

If disabled, Privacera Manager will take no action regarding the init script for the Databricks File System.

DATABRICKS_SCALA_CLUSTER_POLICY_SPARK_CONF

Configure Databricks Cluster policy.

Add the following JSON in the text area:

[{"Note":"First spark conf",
"key":"spark.hadoop.first.spark.test",
"value":"test1"},
{"Note":"Second spark conf",
"key":"spark.hadoop.second.spark.test",
"value":"test2"}]
Managing init script
Automatic Upload

If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "true", the init script will be uploaded automatically to your Databricks host at dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME in vars.privacera.yml.

Manual Upload

If DATABRICKS_ENABLE is "true" and DATABRICKS_MANAGE_INIT_SCRIPT is "false", the init script must be uploaded manually to your Databricks host.

  1. Open a terminal and connect to Databricks account using your Databricks login credentials/token.

    • Connect using login credentials:

      1. If you're using login credentials, then run the following command.

        databricks configure --profile privacera
        
      2. Enter the Databricks URL.

        Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
        
      3. Enter the username and password.

        Username: email-id@yourdomain.com
        Password:
        
    • Connect using Databricks token:

      1. If you don't have a Databricks token, you can generate one. For more information, refer to Generate a personal access token.

      2. If you're using a token, run the following command.

        databricks configure --token --profile privacera
        
      3. Enter the Databricks URL.

        Databricks Host (should begin with https://): https://dbc-xxxxxxxx-xxxx.cloud.databricks.com/
        
      4. Enter the token.

        Token:
        
  2. To check if the connection to your Databricks account is established, run the following command.

    dbfs ls dbfs:/ --profile privacera
    

    You should see the list of files in the output, if you are connected to your account.

  3. Upload files manually to Databricks.

    1. Copy the following files to DBFS, which are available in the PM host at the location, ~/privacera/privacera-manager/output/databricks:

      • ranger_enable_scala.sh

      • privacera_spark_scala_plugin.conf

      • privacera_spark_scala_plugin_job.conf

    2. Run the following command. For the value of <DEPLOYMENT_ENV_NAME>, you can get it from the file, ~/privacera/privacera-manager/config/vars.privacera.yml.

      export DEPLOYMENT_ENV_NAME=<DEPLOYMENT_ENV_NAME>
      dbfs mkdirs dbfs:/privacera/${DEPLOYMENT_ENV_NAME} --profile privacera
      dbfs cp ranger_enable_scala.sh dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_scala_plugin.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      dbfs cp privacera_spark_scala_plugin_job.conf dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      
    3. Verify the files have been uploaded.

      dbfs ls dbfs:/privacera/${DEPLOYMENT_ENV_NAME}/ --profile privacera
      

      The Init Script is uploaded to dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh, where <DEPLOYMENT_ENV_NAME> is the value of DEPLOYMENT_ENV_NAME mentioned in vars.privacera.yml.
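The manual upload steps can be consolidated into a single reviewable script. The following is a sketch only: it prints the dbfs commands rather than executing them, so you can review the output and pipe it to sh once the 'privacera' CLI profile is configured and the three files are in the working directory:

```shell
# Sketch: print the dbfs commands for the manual upload described above.
DEPLOYMENT_ENV_NAME="${DEPLOYMENT_ENV_NAME:-dev}"   # value from vars.privacera.yml

dbfs_target_dir() {  # map a deployment env name to its DBFS init-script directory
  printf 'dbfs:/privacera/%s' "$1"
}

TARGET="$(dbfs_target_dir "$DEPLOYMENT_ENV_NAME")"
printf '%s\n' "dbfs mkdirs $TARGET --profile privacera"
for f in ranger_enable_scala.sh \
         privacera_spark_scala_plugin.conf \
         privacera_spark_scala_plugin_job.conf; do
  printf '%s\n' "dbfs cp $f $TARGET/ --profile privacera"
done
printf '%s\n' "dbfs ls $TARGET/ --profile privacera"
```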

Configure Databricks cluster
  1. Once the update completes successfully, log on to the Databricks console with your account and open the target cluster, or create a new target cluster.

  2. Open the Cluster dialog and enter Edit mode.

  3. In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and then select the Spark tab.

  4. Add the following content to the Spark Config edit box. For more information on the Spark config properties, click here.

    New Properties

    spark.databricks.isv.product privacera
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.executor.extraJavaOptions -javaagent:/databricks/jars/privacera-agent.jar
    spark.databricks.repl.allowedLanguages sql,python,r,scala
    spark.databricks.delta.formatCheck.enabled false
    

    Old Properties

    spark.databricks.cluster.profile serverless
    spark.databricks.delta.formatCheck.enabled false
    spark.driver.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
    spark.executor.extraJavaOptions -javaagent:/databricks/jars/ranger-spark-plugin-faccess-2.0.0-SNAPSHOT.jar
    spark.databricks.isv.product privacera
    spark.databricks.repl.allowedLanguages sql,python,r,scala
    

    Note

    • From Privacera 5.0.6.1 Release onwards, it is recommended to replace the Old Properties with the New Properties. However, the Old Properties will also continue to work.

    • For Databricks versions < 7.3, only the Old Properties should be used, since those versions are in extended support.

  5. (Optional) To use regional endpoint for S3 access, add the following content to the Spark Config edit box.

    spark.hadoop.fs.s3a.endpoint https://s3.<region>.amazonaws.com
    spark.hadoop.fs.s3.endpoint https://s3.<region>.amazonaws.com
    spark.hadoop.fs.s3n.endpoint https://s3.<region>.amazonaws.com
    
  6. In the Configuration tab, in Edit mode, open Advanced Options (at the bottom of the dialog) and then set the init script path. For the <DEPLOYMENT_ENV_NAME> variable, enter the deployment name as defined for the DEPLOYMENT_ENV_NAME variable in vars.privacera.yml.

    dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh
    
  7. Save (Confirm) this configuration.

  8. Start (or Restart) the selected Databricks Cluster.
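As a hedged alternative to editing the Spark Config box by hand, the same settings can be kept as a cluster-spec fragment for the Databricks Clusters API. This is a sketch only: <CLUSTER_ID> and <DEPLOYMENT_ENV_NAME> are placeholders, and the clusters edit API expects a full cluster spec, so merge this fragment into your existing spec rather than submitting it alone:

```shell
# Sketch: keep the New Properties and init-script path as a JSON fragment.
cat > cluster-patch.json <<'EOF'
{
  "cluster_id": "<CLUSTER_ID>",
  "spark_conf": {
    "spark.databricks.isv.product": "privacera",
    "spark.driver.extraJavaOptions": "-javaagent:/databricks/jars/privacera-agent.jar",
    "spark.executor.extraJavaOptions": "-javaagent:/databricks/jars/privacera-agent.jar",
    "spark.databricks.repl.allowedLanguages": "sql,python,r,scala",
    "spark.databricks.delta.formatCheck.enabled": "false"
  },
  "init_scripts": [
    {"dbfs": {"destination": "dbfs:/privacera/<DEPLOYMENT_ENV_NAME>/ranger_enable_scala.sh"}}
  ]
}
EOF
```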

Related information

For further reading, see:

Spark standalone
Privacera plugin in Spark standalone

This section covers how you can use Privacera Manager to generate the setup script and Spark custom configuration for SSL/TLS to install the Privacera plugin in an open-source Spark environment.

The steps outlined below are only applicable to the Spark 3.x version.

Prerequisites

Ensure the following prerequisites are met:

  • A working Spark environment.

  • Privacera services must be up and running.

Configuration
  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.spark-standalone.yml config/custom-vars/
    vi config/custom-vars/vars.spark-standalone.yml
  3. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    SPARK_STANDALONE_ENABLE: "true"
    SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
    SPARK_HOME: "<PLEASE_CHANGE>"
    SPARK_USER_HOME: "<PLEASE_CHANGE>"
    
  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    

    After the update is complete, the setup scripts (privacera_setup.sh, standalone_spark_FGAC.sh, standalone_spark_OLAC.sh) and Spark custom configurations (spark_custom_conf.zip) for SSL will be generated at ~/privacera/privacera-manager/output/spark-standalone.

  5. You can enable either FGAC or OLAC in your Spark environment.

    Enable FGAC

    To enable Fine-grained access control (FGAC), do the following:

    1. Copy standalone_spark_FGAC.sh and spark_custom_conf.zip, placing both files in the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_FGAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_FGAC.sh

    Enable OLAC

    To enable Object level access control (OLAC), do the following:

    1. Copy standalone_spark_OLAC.sh and spark_custom_conf.zip, placing both files in the same folder.

    2. Add permissions to execute the script.

      chmod +x standalone_spark_OLAC.sh
      
    3. Run the script to install the Privacera plugin in your Spark environment.

      ./standalone_spark_OLAC.sh
      
Configuration properties

Property

Description

Example

SPARK_STANDALONE_ENABLE

Property to enable generating setup script and configs for Spark standalone plugin installation.

true

SPARK_ENV_TYPE

Set the environment type. It can be any user-defined type.

For example, if you're working in an environment that runs locally, you can set the type as local; for a production environment, set it as prod.

local

SPARK_HOME

Home path of your Spark installation.

~/privacera/spark/spark-3.1.1-bin-hadoop3.2

SPARK_USER_HOME

User home directory of your Spark installation.

/home/ec2-user

SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED

Use the property to enable/disable fallback to the privacera_files and privacera_hive services. The fallback determines whether the user should be allowed or denied access to the resource files.

To enable the fallback, set to true; to disable, set to false.

true

Validations

To verify the successful installation of Privacera plugin, do the following:

  1. Create an S3 bucket ${S3_BUCKET} for sample testing.

  2. Download sample data using the following link and put it in the ${S3_BUCKET} at location (s3://${S3_BUCKET}/customer_data).

    wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv
    
  3. (Optional) Add AWS JARs to Spark. Download the JARs according to the Spark/Hadoop version in your environment.

    cd <SPARK_HOME>/jars
    

    For Spark 3.1.1 with Hadoop 3.2:

    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
    
  4. Run the following command.

    cd <SPARK_HOME>/bin
    
  5. Run the spark-shell to execute Scala commands.

    ./spark-shell
    
Validations with JWT Token
  1. Run the following command.

    cd <SPARK_HOME>/bin
    
  2. Set the JWT_TOKEN.

    JWT_TOKEN="<JWT_TOKEN>"
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}"  --conf "spark.hadoop.privacera.jwt.oauth.enable=true"
Validations with JWT token and public key
  1. Create a local file with the public key, if the JWT token is generated by private/public key combination.

  2. Set the following according to the payload of JWT Token.

    JWT_TOKEN="<JWT_TOKEN>"
    # The following variables are optional; set them only if the token contains them, otherwise leave them empty.
    JWT_TOKEN_ISSUER="<JWT_TOKEN_ISSUER>"
    JWT_TOKEN_PUBLIC_KEY_FILE="<JWT_TOKEN_PUBLIC_KEY_FILE_PATH>"
    JWT_TOKEN_USER_KEY="<JWT_TOKEN_USER_KEY>"
    JWT_TOKEN_GROUP_KEY="<JWT_TOKEN_GROUP_KEY>"
    JWT_TOKEN_PARSER_TYPE="<JWT_TOKEN_PARSER_TYPE>"
  3. Run the following command to start spark-shell with parameters.

    ./spark-shell \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.privacera.jwt.oauth.enable=true" \
    --conf "spark.hadoop.privacera.jwt.token.publickey=${JWT_TOKEN_PUBLIC_KEY_FILE}" \
    --conf "spark.hadoop.privacera.jwt.token.issuer=${JWT_TOKEN_ISSUER}" \
    --conf "spark.hadoop.privacera.jwt.token.parser.type=${JWT_TOKEN_PARSER_TYPE}" \
    --conf "spark.hadoop.privacera.jwt.token.userKey=${JWT_TOKEN_USER_KEY}" \
    --conf "spark.hadoop.privacera.jwt.token.groupKey=${JWT_TOKEN_GROUP_KEY}"
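The optional-variable handling above can be sketched as a small helper that emits only the --conf flags whose variables are actually set. This is a sketch, not part of the product; the conf keys are the ones shown in the command above:

```shell
# Sketch: build the JWT-related --conf flags, skipping unset/empty variables.
jwt_confs() {
  [ -n "${JWT_TOKEN:-}" ]                 && printf '%s\n' "--conf spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
                                          && printf '%s\n' "--conf spark.hadoop.privacera.jwt.oauth.enable=true"
  [ -n "${JWT_TOKEN_PUBLIC_KEY_FILE:-}" ] && printf '%s\n' "--conf spark.hadoop.privacera.jwt.token.publickey=${JWT_TOKEN_PUBLIC_KEY_FILE}"
  [ -n "${JWT_TOKEN_ISSUER:-}" ]          && printf '%s\n' "--conf spark.hadoop.privacera.jwt.token.issuer=${JWT_TOKEN_ISSUER}"
  [ -n "${JWT_TOKEN_PARSER_TYPE:-}" ]     && printf '%s\n' "--conf spark.hadoop.privacera.jwt.token.parser.type=${JWT_TOKEN_PARSER_TYPE}"
  [ -n "${JWT_TOKEN_USER_KEY:-}" ]        && printf '%s\n' "--conf spark.hadoop.privacera.jwt.token.userKey=${JWT_TOKEN_USER_KEY}"
  [ -n "${JWT_TOKEN_GROUP_KEY:-}" ]       && printf '%s\n' "--conf spark.hadoop.privacera.jwt.token.groupKey=${JWT_TOKEN_GROUP_KEY}"
  :   # always succeed, even when the last variable is empty
}

JWT_TOKEN="example.jwt"
jwt_confs   # prints only the two token-related flags
```

./spark-shell $(jwt_confs) would then pass exactly the configured flags; note that values must not contain spaces for this unquoted expansion to be safe.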
Use cases
  1. Add a policy in Access Manager with read permission to ${S3_BUCKET}.

    val file_path = "s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv"
    val df=spark.read.csv(file_path)
    df.show(5)
    
  2. Add a policy in Access Manager with delete and write permission to ${S3_BUCKET}.

    df.write.format("csv").mode("overwrite").save("s3a://${S3_BUCKET}/csv/customer_data.csv")
    
Spark on EKS
Privacera plugin in Spark on EKS

This section covers how you can use Privacera Manager to generate the setup script and Spark custom configuration for SSL to install the Privacera plugin in Spark on an EKS cluster.

Prerequisites

Ensure the following prerequisites are met:

Configuration
  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.spark-standalone.yml config/custom-vars/
    vi config/custom-vars/vars.spark-standalone.yml
    
  3. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    SPARK_STANDALONE_ENABLE: "true"
    SPARK_ENV_TYPE: "<PLEASE_CHANGE>"
    SPARK_HOME: "<PLEASE_CHANGE>"
    SPARK_USER_HOME: "<PLEASE_CHANGE>"
  4. Run the following commands:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    

    After the update is complete, the Spark custom configuration (spark_custom_conf.zip) for SSL will be generated at ~/privacera/privacera-manager/output/spark-standalone.

  5. Create the Spark Docker Image

    1. Run the following commands to export PRIVACERA_BASE_DOWNLOAD_URL:

      export PRIVACERA_BASE_DOWNLOAD_URL=<PRIVACERA_BASE_DOWNLOAD_URL>
      
    2. Create a folder.

      mkdir -p ~/privacera-spark-plugin
      cd ~/privacera-spark-plugin
      
    3. Download and extract package using wget.

      wget ${PRIVACERA_BASE_DOWNLOAD_URL}/spark-plugin/k8s-spark-pkg.tar.gz -O k8s-spark-pkg.tar.gz
      tar xzf k8s-spark-pkg.tar.gz
      rm -r k8s-spark-pkg.tar.gz
      
    4. Copy spark_custom_conf.zip file from the Privacera Manager output folder into the files folder.

      cp ~/privacera/privacera-manager/output/spark-standalone/spark_custom_conf.zip files/spark_custom_conf.zip
      
    5. You can build either the OLAC or the FGAC Docker image.

      OLAC

      To build the OLAC Docker image, use the following command:

      ./build_image.sh ${PRIVACERA_BASE_DOWNLOAD_URL} OLAC
      

      FGAC

      To build the FGAC Docker image, use the following command:

      ./build_image.sh ${PRIVACERA_BASE_DOWNLOAD_URL} FGAC
      
  6. Test the Spark Docker image.

    1. Create an S3 bucket ${S3_BUCKET} for sample testing.

    2. Download sample data using the following link and put it in the ${S3_BUCKET} at location (s3://${S3_BUCKET}/customer_data).

      wget https://privacera-demo.s3.amazonaws.com/data/uploads/customer_data_clear/customer_data_without_header.csv
      
    3. Start Docker in an interactive mode.

      IMAGE=privacera-spark-plugin:latest
      docker run  --rm -i -t ${IMAGE} bash
      
    4. Start spark-shell inside the Docker container.

      JWT_TOKEN="<PLEASE_CHANGE>"
      cd /opt/privacera/spark/bin
      ./spark-shell \
      --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
      --conf "spark.hadoop.privacera.jwt.oauth.enable=true"
    5. Run the following command to read the S3 file:

      val df= spark.read.csv("s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv")
    6. Exit the Docker shell.

      exit
  7. Publish the Spark Docker Image into your Docker Registry.

    • For HUB, HUB_USERNAME, and HUB_PASSWORD, use the Docker hub URL and login credentials.

    • For ENV_TAG, use a value that reflects your deployment environment, such as development, production, or test. For example, ENV_TAG=dev can be used for a development environment.

    HUB=<PLEASE_CHANGE>
    HUB_USERNAME=<PLEASE_CHANGE>
    HUB_PASSWORD=<PLEASE_CHANGE>
    ENV_TAG=<PLEASE_CHANGE>
    DEST_IMAGE=${HUB}/privacera-spark-plugin:${ENV_TAG}
    SOURCE_IMAGE=privacera-spark-plugin:latest
    docker login -u ${HUB_USERNAME} -p ${HUB_PASSWORD} ${HUB}
    docker tag ${SOURCE_IMAGE} ${DEST_IMAGE}
    docker push ${DEST_IMAGE}
  8. Deploy Spark Plugin on EKS cluster.

    1. SSH to the EKS cluster where you want to deploy the Spark plugin.

    2. Run the following commands to export PRIVACERA_BASE_DOWNLOAD_URL:

      export PRIVACERA_BASE_DOWNLOAD_URL=<PRIVACERA_BASE_DOWNLOAD_URL>
      
    3. Create a folder.

      mkdir ~/privacera-spark-plugin
      cd ~/privacera-spark-plugin
      
    4. Download and extract package using wget.

      wget ${PRIVACERA_BASE_DOWNLOAD_URL}/plugin/spark/k8s-spark-deploy.tar.gz -O k8s-spark-deploy.tar.gz
      tar xzf k8s-spark-deploy.tar.gz
      rm -r k8s-spark-deploy.tar.gz
      cd k8s-spark-deploy/
      
    5. Open the penv.sh file and substitute the values of the following properties; refer to the table below:

      Property

      Description

      Example

      SPARK_NAME_SPACE

      Kubernetes namespace

      privacera-spark-plugin-test

      SPARK_PLUGIN_ROLE_BINDING

      Spark role Binding

      privacera-sa-spark-plugin-role-binding

      SPARK_PLUGIN_SERVICE_ACCOUNT

      Spark services account

      privacera-sa-spark-plugin

      SPARK_PLUGN_ROLE

      Spark services account role

      privacera-sa-spark-plugin-role

      SPARK_PLUGIN_APP_NAME

      Spark application name

      privacera-spark-plugin

      SPARK_PLUGIN_IMAGE

      Docker image with hub

      myhub.docker.com/privacera-spark-plugin:prod-olac

      SPARK_DOCKER_PULL_SECRET

      Secret for docker-registry

      spark-plugin-docker-hub

    6. Run the following command to replace the properties value in the Kubernetes deployment .yml file:

      mkdir -p backup
      cp *.yml backup/
      ./replace.sh
      
    7. Run the following command to create Kubernetes resources:

      kubectl apply -f namespace.yml
      kubectl apply -f service-account.yml
      kubectl apply -f role.yml
      kubectl apply -f role-binding.yml
      
    8. Run the following command to create secret for docker-registry:

      kubectl create secret docker-registry spark-plugin-docker-hub --docker-server=<PLEASE_CHANGE> --docker-username=<PLEASE_CHANGE>  --docker-password='<PLEASE_CHANGE>' --namespace=<PLEASE_CHANGE>
      
    9. Run the following command to deploy a sample Spark application:

      Note

      This is a sample file used for deployment. You can create your own Spark deployment file and deploy a Docker image as per your use case.

      kubectl apply -f privacera-spark-examples.yml -n ${SPARK_NAME_SPACE}

      This deploys the Spark application in a Kubernetes pod with the Privacera plugin and keeps the pod running so that you can use it in interactive mode.

Configuration properties

Property

Description

Example

SPARK_STANDALONE_ENABLE

Property to enable generating setup script and configs for Spark standalone plugin installation.

true

SPARK_ENV_TYPE

Set the environment type. It can be any user-defined type.

For example, if you're working in an environment that runs locally, you can set the type as local; for a production environment, set it as prod.

local

SPARK_HOME

Home path of your Spark installation.

~/privacera/spark/spark-3.1.1-bin-hadoop3.2

SPARK_USER_HOME

User home directory of your Spark installation.

/home/ec2-user

SPARK_STANDALONE_RANGER_IS_FALLBACK_SUPPORTED

Use the property to enable/disable fallback to the privacera_files and privacera_hive services. The fallback determines whether the user should be allowed or denied access to the resource files.

To enable the fallback, set to true; to disable, set to false.

true

Validation
  1. Get all the resources.

    kubectl get all -n ${SPARK_NAME_SPACE}

    Copy the pod ID; you will need it to connect to the Spark master.

  2. Get the cluster info.

    kubectl cluster-info
    

    Copy the Kubernetes control plane URL from the above output; you will need it for the spark-shell command, for example https://xxxxxxxxxxxxxxxxxxxxxxx.yl4.us-east-1.eks.amazonaws.com.

    When using the URL for EKS_SERVER property in step 4, prefix the property value with k8s://. The following is an example of the property:

    EKS_SERVER="k8s://https://xxxxxxxxxxxxxxxxxxxxxxx.yl4.us-east-1.eks.amazonaws.com"
  3. Connect to Kubernetes master node.

    kubectl -n ${SPARK_NAME_SPACE} exec -it <POD_ID> -- bash
    
  4. Set the following properties:

    SPARK_NAME_SPACE="<PLEASE_CHANGE>"
    SPARK_PLUGIN_SERVICE_ACCOUNT="<PLEASE_CHANGE>"
    SPARK_PLUGIN_IMAGE="<PLEASE_CHANGE>"
    SPARK_DOCKER_PULL_SECRET="spark-plugin-docker-hub"
    EKS_SERVER="<PLEASE_CHANGE>"
    JWT_TOKEN="<PLEASE_CHANGE>"
  5. Run the following commands to open the spark-shell; they contain all the setup required.

    cd /opt/privacera/spark/bin
    ./spark-shell --master ${EKS_SERVER} \
    --deploy-mode client \
    --conf spark.kubernetes.authenticate.serviceAccountName=${SPARK_PLUGIN_SERVICE_ACCOUNT} \
    --conf spark.kubernetes.namespace=${SPARK_NAME_SPACE} \
    --conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=${SPARK_PLUGIN_SERVICE_ACCOUNT} \
    --conf spark.kubernetes.container.image=${SPARK_PLUGIN_IMAGE} \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=${SPARK_DOCKER_PULL_SECRET} \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf "spark.hadoop.privacera.jwt.oauth.enable=true" \
    --conf spark.driver.bindAddress='0.0.0.0' \
    --conf spark.driver.host=$SPARK_PLUGIN_POD_IP \
    --conf spark.port.maxRetries=4 \
    --conf spark.kubernetes.driver.pod.name=$SPARK_PLUGIN_POD_NAME
  6. Run the following command using spark-submit with JWT authentication.

    ./spark-submit \
    --master ${EKS_SERVER} \
    --name spark-cloud-new \
    --deploy-mode cluster \
    --conf spark.kubernetes.authenticate.serviceAccountName=${SPARK_PLUGIN_SERVICE_ACCOUNT} \
    --conf spark.kubernetes.namespace=${SPARK_NAME_SPACE} \
    --conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=${SPARK_PLUGIN_SERVICE_ACCOUNT} \
    --conf spark.kubernetes.container.image=${SPARK_PLUGIN_IMAGE} \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.kubernetes.container.image.pullSecrets=${SPARK_DOCKER_PULL_SECRET} \
    --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" \
    --conf spark.driver.bindAddress='0.0.0.0' \
    --conf spark.driver.host=$SPARK_PLUGIN_POD_IP \
    --conf spark.port.maxRetries=4 \
    --conf spark.kubernetes.driver.pod.name=$SPARK_PLUGIN_POD_NAME \
    --class com.privacera.spark.poc.SparkSample \
    <your-code-jar/file>
    
  7. To check the read access on the S3 file, run the following command in the open spark-shell:

    val df= spark.read.csv("s3a://${S3_BUCKET}/customer_data/customer_data_without_header.csv")
    df.show()
  8. To check the write access on the S3 file, run the following command in the open spark-shell:

    df.write.format("csv").mode("overwrite").save("s3a://${S3_BUCKET}/output/k8s/sample/csv")
  9. Check the Audit logs on the Privacera Portal.

  10. To verify the spark-shell setup, open another SSH connection to the Kubernetes cluster and run the following command to check the running pods:

    kubectl get pods -n ${SPARK_NAME_SPACE}

    You will see the Spark executor pods with names ending in -exec-x, for example spark-shell-xxxxxxxxxxxxxxxx-exec-1 and spark-shell-xxxxxxxxxxxxxxxx-exec-2.

Portal SSO with PingFederate

Privacera Portal leverages the PingIdentity platform for authentication via SAML. For this integration, there are configuration steps in both Privacera Portal and PingIdentity.

Configuration steps for PingIdentity
  1. Sign in to your PingIdentity account.

  2. Under Your Environments, click Administrators.

  3. Select Connections from the left menu.

  4. In the Applications section, click on the + button to add a new application.

  5. Enter an Application Name (such as Privacera Portal SAML) and provide a description (optionally add an icon). For the Application Type, select SAML Application. Then click Configure.

  6. On the SAML Configuration page, under "Provide Application Metadata", select Manually Enter.

  7. Enter the ACS URLs:

    https://<portal_hostname>:<PORT>/saml/SSO

    Enter the Entity ID:

    privacera-portal

    Click the Save button.

  8. On the Overview page for the new application, click on the Attributes edit button. Add the attribute mapping:

    user.login: Username

    Set as Required.

    Note

    If the user's login ID is not the same as the username (for example, if the login ID is an email address), this attribute is treated as the username in the portal. In that case, the username is the email address with the domain (for example, @company.com) removed: for "john.joe@company.com", the username would be "john.joe". If another attribute can serve as the username, this value will hold that attribute.

  9. You can optionally add additional attribute mappings:

    user.email: Email Address 
    user.firstName: Given Name
    user.lastName: Family Name
  10. Click the Save button.

  11. Next in your application, select Configuration and then the edit icon.

  12. Set the SLO Endpoint:

    https://<portal_hostname>:<PORT>/login.html

    Click the Save button.

  13. In the Configuration section, under Connection Details, click on Download Metadata button.

  14. Once this file is downloaded, rename it to:

    privacera-portal-aad-saml.xml

    This file will be used in the Privacera Portal configuration.
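The username derivation described in the step 8 note can be sketched as follows (illustrative only, not part of the product):

```shell
# Sketch: when the login ID is an email address, keep only the local part.
derive_username() {
  printf '%s\n' "${1%%@*}"   # strip the first '@' and everything after it
}

derive_username "john.joe@company.com"   # prints: john.joe
```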

Configuration steps in Privacera Portal

Now we will configure Privacera Portal using privacera-manager to use the privacera-portal-aad-saml.xml file created in the above steps.

  1. Run the following commands:

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.portal.saml.aad.yml config/custom-vars/
  2. Edit the vars.portal.saml.aad.yml file:

    vi config/custom-vars/vars.portal.saml.aad.yml

    Add the following properties:

    SAML_ENTITY_ID: "privacera-portal"
    SAML_BASE_URL: "https://{{app_hostname}}:{port}"
    PORTAL_UI_SSO_ENABLE: "true"
    PORTAL_UI_SSO_URL: "saml/login"
    PORTAL_UI_SSO_BUTTON_LABEL: "Single Sign On"
    AAD_SSO_ENABLE: "true"
  3. Copy the privacera-portal-aad-saml.xml file to the following folder:

    ~/privacera/privacera-manager/ansible/privacera-docker/roles/templates/custom
  4. Edit the vars.portal.yml file:

    cd ~/privacera/privacera-manager/
    vi config/custom-vars/vars.portal.yml

    Add the following properties and assign your values.

    SAML_EMAIL_ATTRIBUTE: "user.email"
    SAML_USERNAME_ATTRIBUTE: "user.login"
    SAML_LASTNAME_ATTRIBUTE: "user.lastName"
    SAML_FIRSTNAME_ATTRIBUTE: "user.firstName"
  5. Run the following to update privacera-manager:

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update

    You should now be able to use Single Sign-on to Privacera using PingFederate.

Trino Open Source
Privacera Plugin in Trino Open Source

Learn how you can use Privacera Manager to generate the setup script and Trino custom configuration for SSL to install Privacera Plugin in an open-source Trino environment.

Privacera Trino supports Trino Open Source with the following catalogs:

  • Hive

  • PostgreSQL DB

  • Redshift

Prerequisites
  • A working Trino environment

  • Privacera services must be up and running.

Configuration
  1. SSH to the instance as USER.

  2. Run the following commands:

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.trino.opensource.yml config/custom-vars/
    vi config/custom-vars/vars.trino.opensource.yml
  3. Edit the following properties. For property details and descriptions, see Table 58, “Trino Open Source Properties”.

    TRINO_STANDALONE_ENABLE: "true"
    TRINO_USER_HOME: "<PLEASE_CHANGE>"
    TRINO_INSTALL_DIR_NAME: "<PLEASE_CHANGE>"
  4. Run the following commands:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update

    After the update is complete, the setup script (privacera_trino_setup.sh) and Trino custom configurations (privacera_trino_plugin_conf.zip) for SSL will be generated at ~/privacera/privacera-manager/output/trino-opensource/.

  5. In your Trino environment, do the following:

    1. Copy privacera_trino_setup.sh and privacera_trino_plugin_conf.zip, placing both files in the same folder.

    2. Add permissions to execute the script.

      chmod +x privacera_trino_setup.sh
    3. Run the script to install the Privacera plugin in your Trino environment.

      ./privacera_trino_setup.sh

Note

To learn more about Trino, see Trino User Guide.

Table Properties for Trino Open Source
Table 58. Trino Open Source Properties

Property

Description

Example

TRINO_OPENSOURCE_ENABLE

Property to enable/disable Trino.

true

TRINO_USER_HOME

Property to set the path to the Trino home directory.

/home/ec2-user

TRINO_INSTALL_DIR_NAME

Property to set the path to the directory where Trino is installed.

/etc/trino

TRINO_RANGER_SERVICE_REPO

Property to indicate Trino Ranger policy.

privacera_trino

TRINO_AUDITS_URL_EXTERNAL

Solr audit URL or audit server URL.

http://10.100.10.10:8983/solr/ranger_audits

TRINO_RANGER_EXTERNAL_URL

This is a Ranger Admin URL.

http://10.100.10.10:6080

XAAUDIT.SOLR.ENABLE

Enable/disable Solr audit. Set the value to true to enable Solr audit.

true

TRINO_HIVE_POLICY_AUTHZ_ENABLED

Enable/disable Hive policy authorization for the Hive catalog. Set the value to true to use Hive policies to authorize Hive catalog queries.

true

TRINO_HIVE_POLICY_REPO_CATALOG_MAPPING

Indicates Hive policy repository and Hive catalog mapping.

Use the following format:

{hive_policy_repo-1}:{comma_separated_hive_catalogs};{hive_policy_repo-2}:{comma_separated_hive_catalogs}

privacera_hive:hive;privacera_hive2:hivecatalog1

TRINO_RANGER_AUTH_ENABLED

Set the value to true to disable authorization for show catalog query.

true
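The mapping format for TRINO_HIVE_POLICY_REPO_CATALOG_MAPPING can be parsed as follows. This is a sketch only; the repo and catalog names are illustrative:

```shell
# Sketch: resolve the catalogs mapped to a policy repository in the
# {repo}:{catalogs};{repo}:{catalogs} format described above.
catalogs_for_repo() {  # $1 = mapping string, $2 = policy repo name
  printf '%s\n' "$1" | tr ';' '\n' | awk -F: -v r="$2" '$1 == r { print $2 }'
}

catalogs_for_repo "privacera_hive:hive;privacera_hive2:hivecatalog1,hivecatalog2" "privacera_hive2"
# prints: hivecatalog1,hivecatalog2
```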



Migrating from PrestoSQL to Trino

To migrate your existing policies from PrestoSQL to Trino, see Migrating Steps.

Dremio
Introduction

This section covers how you can integrate Dremio with Privacera. You can use Dremio for table-level access control with the native Ranger plugin.

Integrating Dremio with Privacera provides comprehensive data lake security and fine-grained access control across multi-cloud environments. Dremio works directly with data lake storage; using Dremio's query engine and its ability to democratize data access, Privacera implements fine-grained access control policies, then automatically enforces and audits them at enterprise scale.

Dremio is supported with the following data sources:

  • S3

  • ADLS

  • Hive

  • Redshift

Prerequisites

Ensure the following prerequisites are met:

  • A Privacera Manager host where Privacera services are running.

  • A Dremio host where Dremio Enterprise Edition is installed. (The Community Edition is not supported.)

Configuration

To configure Dremio:

Note

There are limitations in the Dremio native Hive plugin because Dremio uses Ranger 1.1.0.

  • Audit Server basic auth needs to be disabled because it's not supported.

  • Dremio does not support Solr audits over SSL if SSL is enabled on the audit server.

  1. Run the following commands:

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dremio.yml config/custom-vars/
    
  2. Update the following properties:

    AUDITSERVER_ENABLE: "true"
    AUDITSERVER_AUTH_TYPE: "none"
    AUDITSERVER_SSL_ENABLE: "false"
  3. Run the following commands to configure the audit server for Dremio native Hive Ranger-based authorization.

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.auditserver.yml config/custom-vars/ 
    vi config/custom-vars/vars.auditserver.yml

    After the update is completed, the Dremio plugin installation script privacera_dremio.sh and the custom configuration archive privacera_custom_conf.tar.gz are generated at ~/privacera/privacera-manager/output/dremio.

  4. Configure the Privacera plugin depending on how you have installed Dremio on your instance.

    For a new or existing data source configured in Dremio Data Lake, ensure the Enable external authorization plugin checkbox under Settings > Advanced Options of the data source is selected in the Dremio UI.

  5. Restart the Dremio service.

Kubernetes

Depending on your cloud provider, you can set up Dremio in a Kubernetes container. For more information, see the Dremio Kubernetes deployment documentation for your cloud provider.

After setting up Dremio, perform the following steps to deploy the Privacera plugin. The steps assume that your Privacera Manager host instance is separate from your Dremio Kubernetes instance. If both are configured on the same instance, modify the steps accordingly.

  1. SSH to the instance where Dremio is installed that contains the Dremio Kubernetes artifacts, and change to the dremio-cloud-tools/charts/dremio_v2/ directory.

  2. Copy the privacera_dremio.sh and privacera_custom_conf.tar.gz files from your Privacera Manager host instance to the dremio_v2 folder in your Dremio Kubernetes instance.

  3. Run the following commands:

    mkdir -p privacera_config 
    mv privacera_dremio.sh privacera_config/ 
    mv privacera_custom_conf.tar.gz privacera_config/
  4. Update dremio-configmap.yaml to add a new ConfigMap for the Privacera configuration.

    vi templates/dremio-configmap.yaml
  5. Add the following configuration at the start of the file:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: dremio-privacera-install
    data:
      privacera_dremio.sh: |- {{ .Files.Get "privacera_config/privacera_dremio.sh" | nindent 4 }}
    binaryData:
      privacera_custom_conf.tar.gz: {{ .Files.Get "privacera_config/privacera_custom_conf.tar.gz" | b64enc | nindent 4 }}
    ---
  6. Update dremio-env to add Privacera jars and configuration in the Dremio classpath.

    vi config/dremio-env
  7. Add the following variable, or update it if it already exists:

    DREMIO_EXTRA_CLASSPATH=/opt/privacera/conf:/opt/privacera/dremio-ext-jars/*
  8. Update values.yaml.

    vi values.yaml
            
  9. Add the following configuration for extraInitContainers inside the coordinator section:

    extraInitContainers: |
      - name: install-privacera-dremio-plugin
        image: {{.Values.image}}:{{.Values.imageTag}}
        imagePullPolicy: IfNotPresent
        securityContext:
          runAsUser: 0
        volumeMounts:
          - name: dremio-privacera-plugin-volume
            mountPath: /opt/dremio/plugins/authorizer
          - name: dremio-ext-jars-volume
            mountPath: /opt/privacera/dremio-ext-jars
          - name: dremio-privacera-config
            mountPath: /opt/privacera/conf/
          - name: dremio-privacera-install
            mountPath: /opt/privacera/install/
        command:
          - "bash"
          - "-c"
          - "cd /opt/privacera/install/ && cp * /tmp/ && cd /tmp && ./privacera_dremio.sh"
  10. Update or uncomment the extraVolumes section inside the coordinator section and add the following configuration:

    extraVolumes:
      - name: dremio-privacera-install
        configMap:
          name: dremio-privacera-install
          defaultMode: 0777
      - name: dremio-privacera-plugin-volume
        emptyDir: {}
      - name: dremio-ext-jars-volume
        emptyDir: {}
      - name: dremio-privacera-config
        emptyDir: {}
  11. Update or uncomment the extraVolumeMounts section inside the coordinator section and add the following configuration:

    extraVolumeMounts:
      - name: dremio-ext-jars-volume
        mountPath: /opt/privacera/dremio-ext-jars
      - name: dremio-privacera-plugin-volume
        mountPath: /opt/dremio/plugins/authorizer
      - name: dremio-privacera-config
        mountPath: /opt/privacera/conf
  12. Upgrade your Helm release. Get the release name by running the helm list command; the text under the Name column is your Helm release name.

    helm upgrade -f values.yaml <release-name>
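
    After the upgrade, you can optionally verify that the init container installed the plugin. This is a sketch; the coordinator pod name below is a placeholder that depends on your deployment.

    helm list
    kubectl get pods
    kubectl logs <dremio-coordinator-pod> -c install-privacera-dremio-plugin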
RPM

To deploy RPM:

  1. SSH to your instance where Dremio RPM is installed.

  2. Copy the privacera_dremio.sh and privacera_custom_conf.tar.gz files from your Privacera Manager host instance to the Home folder in your Dremio instance.

  3. Run the following commands:

    mkdir -p ~/privacera/install
    mv privacera_dremio.sh ~/privacera/install
    mv privacera_custom_conf.tar.gz ~/privacera/install
  4. Launch the privacera_dremio.sh script.

    cd ~/privacera/install
    chmod +x privacera_dremio.sh
    sudo ./privacera_dremio.sh
  5. Update dremio-env to add Privacera jars and configuration in the Dremio classpath.

    vi ${DREMIO_HOME}/conf/dremio-env
  6. Add the following variable, or update it if it already exists:

    DREMIO_EXTRA_CLASSPATH=/opt/privacera/conf:/opt/privacera/dremio-ext-jars/*
  7. Restart Dremio.

    sudo service dremio restart
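
    Optionally, verify the deployment after the restart. This is a sketch; the paths follow the steps above.

    grep DREMIO_EXTRA_CLASSPATH ${DREMIO_HOME}/conf/dremio-env
    ls /opt/privacera/dremio-ext-jars
    sudo service dremio status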
AWS EMR

This topic shows how to configure AWS EMR with Privacera using Privacera Manager.

Configuration

  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.emr.yml config/custom-vars/
    vi config/custom-vars/vars.emr.yml
  3. Edit the following properties.

    Property

    Description

    Example

    EMR_ENABLE

    Enable EMR template creation.

    true

    EMR_CLUSTER_NAME

    Define a unique name for the EMR cluster.

    Privacera-EMR

    EMR_CREATE_SG

    Set this to true if you don't have existing security groups and want Privacera Manager to add the security group creation steps to the EMR CF template.

    false

    EMR_MASTER_SG_ID

    If EMR_CREATE_SG is false, set this property. Security Group ID for EMR Master Node Group.

    sg-xxxxxxx

    EMR_SLAVE_SG_ID

    If EMR_CREATE_SG is false, set this property. Security Group ID for EMR Slave Node Group.

    sg-xxxxxxx

    EMR_SERVICE_ACCESS_SG_ID

    If EMR_CREATE_SG is false, set this property. Security Group ID for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a Private Network.

    sg-xxxxxxx

    EMR_SG_VPC_ID

    If EMR_CREATE_SG is true, set this property. VPC ID in which you want to create the EMR Cluster.

    vpc-xxxxxxxxxxx

    EMR_MASTER_SG_NAME

    If EMR_CREATE_SG is true, set this property. Security Group Name for EMR Master Node Group. The security group name will be added to the emr-template.json.

    priv-master-sg

    EMR_SLAVE_SG_NAME

    If EMR_CREATE_SG is true, set this property. Security Group Name for EMR Slave Node Group. The security group name will be added to the emr-template.json.

    priv-slave-sg

    EMR_SERVICE_ACCESS_SG_NAME

    If EMR_CREATE_SG is true, set this property. Security Group Name for EMR ServiceAccessSecurity. The security group name will be added to the emr-template.json. Fill this property only if you are creating EMR in a Private Network.

    priv-private-sg

    EMR_SUBNET_ID

    Subnet ID

    EMR_KEYPAIR

    An existing EC2 key pair to SSH into the master node of the cluster.

    privacera-test-pair

    EMR_EC2_MARKET_TYPE

    Set market type as SPOT or ON_DEMAND.

    SPOT

    EMR_EC2_INSTANCE_TYPE

    Set the instance type. Instances can be of different types such as m5.xlarge, r5.xlarge and so on.

    m5.large

    EMR_MASTER_NODE_COUNT

    Node count for Master. The number of nodes can be 1, 2 and so on.

    1

    EMR_CORE_NODE_COUNT

    Node count for Core. The number of cores can be 1, 2 and so on.

    1

    EMR_VERSION

    Version of EMR.

    emr-x.xx.x

    EMR_EC2_DOMAIN

    Domain used by the nodes. It depends on EMR Region, for example, ".ec2.internal" is for us-east-1.

    .ec2.internal

    EMR_USE_STS_REGIONAL_ENDPOINTS

    Set the property to enable/disable regional endpoints for S3 requests.

    Default value is false.

    true

    EMR_TERMINATION_PROTECT

    Set to enable/disable termination protection.

    true

    EMR_LOGS_PATH

    S3 location for storing EMR logs.

    s3://privacera-logs-bucket/

    EMR_KERBEROS_ENABLE

    Set to true if you want to enable kerberization on EMR.

    false

    EMR_KDC_ADMIN_PASSWORD

    If EMR_KERBEROS_ENABLE is true, set this property. The password used within the cluster for the kadmin service.

    EMR_CROSS_REALM_PASSWORD

    If EMR_KERBEROS_ENABLE is true, set this property. The cross-realm trust principal password, which must be identical across realms.

    EMR_SECURITY_CONFIG

    Name of the Security Configurations created for EMR. This can be a pre-created configuration, or Privacera Manager can generate a template through which you can create this configuration.

    EMR_KERB_TICKET_LIFETIME

    Set this property if you want Privacera Manager to create CF template for creating security configuration and EMR_KERBEROS_ENABLE is true. The period for which a Kerberos ticket issued by the cluster’s KDC is valid. Cluster applications and services auto-renew tickets after they expire.

    EMR_KERB_TICKET_LIFETIME: 24

    EMR_KERB_REALM

    Set this property if you want Privacera Manager to create CF template for creating security configuration and EMR_KERBEROS_ENABLE is true. The Kerberos realm name for the other realm in the trust relationship.

    EMR_KERB_DOMAIN

    Set this property if you want Privacera Manager to create CF template for creating security configuration and EMR_KERBEROS_ENABLE is true. The domain name of the other realm in the trust relationship.

    EMR_KERB_ADMIN_SERVER

    Set this property if you want Privacera Manager to create CF template for creating security configuration and EMR_KERBEROS_ENABLE is true. The fully qualified domain name (FQDN) and an optional port for the Kerberos admin server in the other realm. If a port is not specified, 749 is used.

    EMR_KERB_KDC_SERVER

    Set this property if you want Privacera Manager to create CF template for creating security configuration and EMR_KERBEROS_ENABLE is true. The fully qualified domain name (FQDN) and an optional port for the KDC in the other realm. If a port is not specified, 88 is used.

    EMR_AWS_ACCT_ID

    AWS Account ID where the EMR Cluster resides.

    9999999

    EMR_DEFAULT_ROLE

    Default role attached to EMR Cluster for performing cluster-related activities. This should be a pre-created role.

    EMR_DefaultRole

    EMR_ROLE_FOR_CLUSTER_NODES

    The IAM Role that will be attached to each node in the EMR Cluster.

    This role should have only minimal permissions: downloading privacera_cust_conf.zip and basic EMR capabilities. It can be an existing role; if not, you can use the IAM role CF template to generate it after the Privacera Manager update.

    restricted_node_role

    EMR_USE_SINGLE_ROLE_FOR_APPS

    If you want Privacera Manager to generate a CF template for IAM roles configuration, set this property. It creates a single IAM role to be used by all EMR applications.

    true

    EMR_ROLE_FOR_APPS

    If you want Privacera Manager to generate a CF template for IAM roles configuration, set this property. IAM Role name which will be used by all EMR applications.

    app_data_access_role

    EMR_ROLE_FOR_SPARK

    If you want Privacera Manager to generate a CF template for IAM roles configuration, set this property. To create multiple IAM roles to be used by specific applications, set EMR_USE_SINGLE_ROLE_FOR_APPS to false. IAM Role name which will be used by the Spark application (Dataserver) for data access.

    spark_data_access_role

    EMR_ROLE_FOR_HIVE

    If you want Privacera Manager to generate a CF template for IAM roles configuration, set this property. IAM Role name which will be used by Hive Application for data access.

    hive_data_access_role

    EMR_ROLE_FOR_PRESTO

    If you want Privacera Manager to generate a CF template for IAM roles configuration, set this property. IAM Role name which will be used by Presto Application for data access.

    presto_data_access_role

    EMR_HIVE_METASTORE

    Metastore type, e.g. "glue" or "hive" (for an external Hive metastore).

    glue

    EMR_HIVE_METASTORE_PATH

    S3 location for hive metastore

    s3://hive-warehouse

    EMR_HIVE_METASTORE_CONNECTION_URL

    If EMR_HIVE_METASTORE is hive, set this property. JDBC Connection URL for connecting to hive.

    jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true

    EMR_HIVE_METASTORE_CONNECTION_DRIVER

    If EMR_HIVE_METASTORE is hive, set this property. JDBC Driver Name

    org.mariadb.jdbc.Driver

    EMR_HIVE_METASTORE_CONNECTION_USERNAME

    If EMR_HIVE_METASTORE is hive, set this property. JDBC UserName

    hive

    EMR_HIVE_METASTORE_CONNECTION_PASSWORD

    If EMR_HIVE_METASTORE is hive, set this property. JDBC Password

    StRong@PassW0rd

    EMR_HIVE_SERVICE_NAME

    Custom hive service name for hive application in EMR

    teamA_policy

    EMR_TRINO_HIVE_SERVICE_NAME

    Custom hive service name for trino application in EMR

    teamB_policy

    EMR_SPARK_HIVE_SERVICE_NAME

    Custom hive access service name for spark applications in EMR

    teamC_policy

    EMR_APP_SPARK_OLAC_ENABLE

    To install Spark application with Privacera plugin, set the property to true. OLAC is known as Object Level Access Control.

    Note:

    • Recommended when complete access control on the objects in AWS S3 is required.

    • When the property is set to true, s3 and s3n protocols will not be supported on EMR clusters while running Spark queries.

    true

    EMR_APP_SPARK_FGAC_ENABLE

    To install Spark application with Privacera plugin, set the property to true. FGAC is known as Fine Grained Access Control for Table and Column.

    Note: Recommended for compliance purposes, since the whole cluster will still have direct access to AWS S3 data.

    false

    EMR_APP_PRESTO_DB_ENABLE

    To install PrestoDB application with Privacera plugin, set the property to true.

    PrestoDB and Trino are mutually exclusive. Only one should be enabled at a time.

    false

    EMR_APP_PRESTO_SQL_ENABLE

    To install Trino application with Privacera plugin, set the property to true.

    PrestoDB and Trino are mutually exclusive. Only one should be enabled at a time.

    Note: Trino is supported for EMR versions 6.1.0 and higher.

    Note: If the EMR version is 6.4.0, setting this flag installs the Trino plugin.

    false

    EMR_APP_HIVE_ENABLE

    To install Hive application with Privacera plugin, set the property to true.

    true

    EMR_APP_ZEPPELIN_ENABLE

    To install Zeppelin application, set the property to true.

    true

    EMR_APP_LIVY_ENABLE

    To install Livy application, set the property to true.

    true

    EMR_CUST_CONF_ZIP_PATH

    The path where the privacera_cust_conf.zip file will be placed. Privacera Manager generates privacera_cust_conf.zip under the ~/privacera/privacera-manager/output/emr folder; place it at an S3 or HTTPS location from which the EMR cluster can download it.

    s3://privacera-artifacts/

    EMR_SPARK_ENABLE_VIEW_LEVEL_ACCESS_CONTROL

    Set the property to true to enable view-level column masking and row filter for SparkSQL. The property can be used only when you set EMR_APP_SPARK_FGAC_ENABLE to true.

    To learn how to use view-level access control in Spark, click here.

    false

    EMR_RANGER_IS_FALLBACK_SUPPORTED

    Use the property to enable/disable the fallback behavior to the privacera_files and privacera_hive services, which determines whether users are allowed or denied access to resource files.

    To enable the fallback, set to true; to disable, set to false.

    true

    EMR_SPARK_DELTA_LAKE_ENABLE

    Set this property to true to enable Delta Lake on EMR Spark.

    true

    EMR_SPARK_DELTA_LAKE_CORE_JAR_DOWNLOAD_URL

    Download URL of Delta Lake core JAR. The Delta Lake core JAR has dependency with Spark version.

    You have to find the appropriate version for your EMR. See Delta Lake compatibility with Apache Spark.

    Get the appropriate Delta Lake core JAR download link and update the property. See Delta Core.

    For example, for Spark version 3.1.x, the download URL is https://repo1.maven.org/maven2/io/delta/delta-core_2.12/1.0.1/delta-core_2.12-1.0.1.jar.

    https://repo1.maven.org/maven2/io/delta/delta-core_2.12/1.0.1/delta-core_2.12-1.0.1.jar
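
    For reference, the following is a minimal sketch of config/custom-vars/vars.emr.yml that uses existing security groups and a Glue metastore. All values are illustrative assumptions and must be replaced with your own.

    EMR_ENABLE: "true"
    EMR_CLUSTER_NAME: "Privacera-EMR"
    EMR_CREATE_SG: "false"
    EMR_MASTER_SG_ID: "sg-xxxxxxx"
    EMR_SLAVE_SG_ID: "sg-xxxxxxx"
    EMR_SUBNET_ID: "subnet-xxxxxxx"
    EMR_KEYPAIR: "privacera-test-pair"
    EMR_VERSION: "emr-x.xx.x"
    EMR_HIVE_METASTORE: "glue"
    EMR_APP_SPARK_OLAC_ENABLE: "true"
    EMR_APP_HIVE_ENABLE: "true"
    EMR_CUST_CONF_ZIP_PATH: "s3://privacera-artifacts/"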

  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update

    After the update is finished, all the cloud-formation JSON template files and privacera_cust_conf.zip will be available at the path, ~/privacera/privacera-manager/output/emr.

  5. Configure and run the following in AWS instance where Privacera is installed.

    1. (Optional) Create IAM roles using the emr-roles-creation-template.json template. Run the following command.

      aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-role-creation --template-body file://emr-roles-creation-template.json --capabilities CAPABILITY_NAMED_IAM

      Note

      This will create IAM roles with minimal permissions. You can add bucket permissions into respective IAM roles as per your requirements.

    2. (Optional) Create Security Configurations using the emr-security-config-template.json template. Run the following command.

      aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-security-config-creation  --template-body file://emr-security-config-template.json
    3. Confirm the privacera_cust_conf.zip file has been copied to the location specified in EMR_CUST_CONF_ZIP_PATH.

    4. Create EMR using the emr-template.json template. Run the following command.

      aws --region <AWS-REGION> cloudformation create-stack --stack-name privacera-emr-creation  --template-body file://emr-template.json

      Note

      If you are upgrading EMR from version 6.3 or lower to version 6.4 or higher to use the Trino plug-in, you must re-create the EMR security configuration based on the new template generated by Privacera Manager, since the new security configuration adds the trino user.
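
      Optionally, wait for stack creation to complete and inspect the stack events if creation fails. This is a sketch using standard AWS CLI CloudFormation commands; the stack name matches the command above.

      aws --region <AWS-REGION> cloudformation wait stack-create-complete --stack-name privacera-emr-creation
      aws --region <AWS-REGION> cloudformation describe-stack-events --stack-name privacera-emr-creation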

Note

  • For PrestoDB, secrets encryption of the Solr authentication password is not supported. However, the properties file where the password resides is readable only by the presto service user, which limits its exposure.

  • If your cluster was running while External Hive Metastore was down, and you are unable to connect to it, restart the following three servers:

    sudo systemctl restart hive-hcatalog-server
    sudo systemctl restart hive-server2
    sudo systemctl restart presto-server
    
AWS EMR with Native Apache Ranger

AWS EMR provides native Apache Ranger integration with the open source Apache Ranger plugins for Apache Spark and Hive. Connecting EMR’s native Ranger with Privacera’s Ranger-based data access governance provides the following key advantages:

  • Companies can sync their existing policies with their EMR solution.

  • Extend Apache Ranger’s open source capabilities to take advantage of Privacera’s centralized enterprise-ready solution.

Note

Supported EMR version: 5.32 and above in EMR 5.x series.

Prerequisites

AWS Secrets are required for the following to store the Ranger Admin and Ranger plugin certificates.

  • ranger-admin-pub-cert

  • ranger-plugin-private-keypair

To create the two secrets in AWS Secret Manager, do the following:

  1. Log in to the AWS console, navigate to Secrets Manager, and then click the Store a new secret option.

  2. Select secret type as Other type of secrets and then go to the Plaintext tab. Keep the Default value unchanged. The actual value for this secret will be obtained after the installation is done.

  3. Select the encryption key as per your requirement.

  4. Click Next.

  5. Under Secret name, type a name for the secret in the text field. For example: ranger-admin-pub-cert, ranger-plugin-private-keypair.

  6. Click Next. The Configure automatic rotation page is displayed.

  7. Click Next.

  8. On the Review page, you can check your secret settings and then click Store to save your changes.

    The Secret is stored successfully.
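
Alternatively, you can create the two placeholder secrets with the AWS CLI. This is a sketch; the placeholder value "default" is an assumption and is replaced with the actual certificate contents after installation.

    aws secretsmanager create-secret --name ranger-admin-pub-cert --secret-string "default"
    aws secretsmanager create-secret --name ranger-plugin-private-keypair --secret-string "default"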

Configuration

  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.emr.native.ranger.yml config/custom-vars/
    vi config/custom-vars/vars.emr.native.ranger.yml
    
  3. Edit the following properties.

    Property

    Description

    Example

    EMR_NATIVE_ENABLE

    Property to enable EMR native Ranger integration.

    EMR_NATIVE_ENABLE: "true"

    Properties for EMR Specifications

    EMR_NATIVE_CLUSTER_NAME

    Name of the EMR Cluster.

    EMR_NATIVE_CLUSTER_NAME: "Privacera-EMR-Native-Ranger"

    EMR_NATIVE_AWS_REGION

    AWS Region where the cluster will reside.

    EMR_NATIVE_AWS_REGION: "{{AWS_REGION}}"

    EMR_NATIVE_AWS_ACCT_ID

    AWS Account ID where the EMR Cluster and its resources will reside.

    EMR_NATIVE_AWS_ACCT_ID: "587946681758"

    EMR_NATIVE_SUBNET_ID

    Subnet ID where the EMR Cluster nodes will reside.

    EMR_NATIVE_SUBNET_ID: ""

    EMR_NATIVE_KEYPAIR

    An existing EC2 key pair to SSH into the node of cluster

    EMR_NATIVE_KEYPAIR: "privacera-test-pair"

    EMR_NATIVE_EC2_MARKET_TYPE

    Market Type for the EMR Cluster nodes. For example, SPOT or ON_DEMAND.

    EMR_NATIVE_EC2_MARKET_TYPE: "SPOT"

    EMR_NATIVE_EC2_INSTANCE_TYPE

    Instance Type for the EMR Cluster nodes.

    EMR_NATIVE_EC2_INSTANCE_TYPE: "m5.2xlarge"

    EMR_NATIVE_MASTER_NODE_COUNT

    Node count for Master.

    EMR_NATIVE_MASTER_NODE_COUNT: "1"

    EMR_NATIVE_CORE_NODE_COUNT

    Node count for Core.

    EMR_NATIVE_CORE_NODE_COUNT: "1"

    EMR_NATIVE_VERSION

    EMR Native Ranger integration is supported on 5.32 and above.

    EMR_NATIVE_VERSION: "emr-5.32.0"

    EMR_NATIVE_TERMINATION_PROTECT

    To enable termination protection.

    EMR_NATIVE_TERMINATION_PROTECT: "true"

    EMR_NATIVE_LOGS_PATH

    S3 location for EMR logs storage.

    EMR_NATIVE_LOGS_PATH: "s3://privacera-emr/logs"

    Properties to configure EMR Security Group

    EMR_NATIVE_CREATE_SG

    Set this to true if you don't have existing security groups and want Privacera Manager to add the security group creation steps to the EMR CloudFormation template.

    EMR_NATIVE_CREATE_SG: "false"

    If EMR_NATIVE_CREATE_SG is false, fill the following properties with existing security group ids:

    EMR_NATIVE_MASTER_SG_ID

    Security Group ID for EMR Master Node Group.

    EMR_NATIVE_MASTER_SG_ID: "sg-xxxxxxx"

    EMR_NATIVE_SLAVE_SG_ID

    Security Group ID for EMR Slave Node Group.

    EMR_NATIVE_SLAVE_SG_ID: "sg-xxxxxxx"

    EMR_NATIVE_SERVICE_ACCESS_SG_ID

    Security Group ID for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a private network.

    EMR_NATIVE_SERVICE_ACCESS_SG_ID: "sg-xxxxxxx"

    If EMR_NATIVE_CREATE_SG is true, fill the following properties to give security group names for new groups, which will be added in emr-template.json:

    EMR_NATIVE_SG_VPC_ID

    VPC ID in which you want to create the EMR Cluster.

    EMR_NATIVE_SG_VPC_ID: "vpc-xxxxxxxxxxx"

    EMR_NATIVE_MASTER_SG_NAME

    Security Group Name for EMR Master Node Group.

    EMR_NATIVE_MASTER_SG_NAME: "priv-master-sg"

    EMR_NATIVE_SLAVE_SG_NAME

    Security Group Name for EMR Slave Node Group.

    EMR_NATIVE_SLAVE_SG_NAME: "priv-slave-sg"

    EMR_NATIVE_SERVICE_ACCESS_SG_NAME

    Security Group Name for EMR ServiceAccessSecurity. Fill this property only if you are creating EMR in a private network.

    EMR_NATIVE_SERVICE_ACCESS_SG_NAME: "priv-private-sg"

    EMR_NATIVE_SECURITY_CONFIG

    Name of the security configurations created for EMR. This can be an existing configuration or Privacera Manager can generate a template through which new configurations can be created. The new template will be available at ~/privacera/privacera-manager/output/emr/emr-native-sec-config-template.json after you run the Privacera Manager update command.

    EMR_NATIVE_SECURITY_CONFIG: ""

    Properties for EMR Hive Metastore

    EMR_NATIVE_HIVE_METASTORE

    Metastore type. For example, internal, hive (For external hive-metastore)

    EMR_NATIVE_HIVE_METASTORE: "hive"

    EMR_NATIVE_HIVE_METASTORE_WAREHOUSE_PATH

    S3 location for Hive metastore warehouse

    EMR_NATIVE_HIVE_METASTORE_WAREHOUSE_PATH: "s3://hive-warehouse"

    Fill the following properties, if EMR_NATIVE_HIVE_METASTORE is hive:

    EMR_NATIVE_METASTORE_CONNECTION_URL

    JDBC Connection URL for connecting to Hive Metastore.

    EMR_NATIVE_METASTORE_CONNECTION_URL: jdbc:mysql://<jdbc-host>:3306/<hive-db-name>?createDatabaseIfNotExist=true

    EMR_NATIVE_METASTORE_CONNECTION_DRIVER

    JDBC Driver Name

    EMR_NATIVE_METASTORE_CONNECTION_DRIVER: "org.mariadb.jdbc.Driver"

    EMR_NATIVE_METASTORE_CONNECTION_USERNAME

    JDBC UserName

    EMR_NATIVE_METASTORE_CONNECTION_USERNAME: "hive"

    EMR_NATIVE_METASTORE_CONNECTION_PASSWORD

    JDBC Password

    EMR_NATIVE_METASTORE_CONNECTION_PASSWORD: "StRong@PassWord"

    Properties of Kerberos Server

    EMR_NATIVE_KDC_ADMIN_PASSWORD

    The password used within the cluster for the kadmin service.

    EMR_NATIVE_KDC_ADMIN_PASSWORD: ""

    EMR_NATIVE_CROSS_REALM_PASSWORD

    The cross-realm trust principal password, which must be identical across realms.

    EMR_NATIVE_CROSS_REALM_PASSWORD: ""

    EMR_NATIVE_KERB_TICKET_LIFETIME

    The period for which a Kerberos ticket issued by the cluster’s KDC is valid. Cluster applications and services auto-renew tickets after they expire.

    EMR_NATIVE_KERB_TICKET_LIFETIME: 24

    EMR_NATIVE_KERB_REALM

    The Kerberos realm name for the other realm in the trust relationship.

    EMR_NATIVE_KERB_REALM: ""

    EMR_NATIVE_KERB_DOMAIN

    The domain name of the other realm in the trust relationship.

    EMR_NATIVE_KERB_DOMAIN: ""

    EMR_NATIVE_KERB_ADMIN_SERVER

    The fully qualified domain name (FQDN) and optional port for the Kerberos admin server in the other realm. If a port is not specified, 749 is used.

    EMR_NATIVE_KERB_ADMIN_SERVER: ""

    EMR_NATIVE_KERB_KDC_SERVER

    The fully qualified domain name (FQDN) and optional port for the KDC in the other realm. If a port is not specified, 88 is used.

    EMR_NATIVE_KERB_KDC_SERVER: ""

    Properties of Certificates Secrets

    EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN

    Full ARN of AWS secret [stored in AWS Secrets Manager] for Ranger plugin key-pair. This is the secret created in the Prerequisites step above.

    EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:99999999999:secret:ranger-plugin-key-pair-ixZbO2"

    EMR_NATIVE_RANGER_ADMIN_SECRET_ARN

    Full ARN of AWS secret [stored in AWS Secrets Manager] for Ranger admin public certificate. This is the secret created in the Prerequisites step above.

    EMR_NATIVE_RANGER_ADMIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:99999999999:secret:ranger-admin-public-cert-ixfCO5"

    Properties of EMR application

    EMR_NATIVE_APP_SPARK_ENABLE

    Installs Spark application with EMR native Ranger plugin, if set to true.

    EMR_NATIVE_APP_SPARK_ENABLE: "true"

    EMR_NATIVE_APP_HIVE_ENABLE

    Installs Hive application with EMR native Ranger plugin, if set to true.

    EMR_NATIVE_APP_HIVE_ENABLE: "true"

    EMR_NATIVE_APP_ZEPPELIN_ENABLE

    Installs Zeppelin application, if set to true.

    EMR_NATIVE_APP_ZEPPELIN_ENABLE: "true"

    EMR_NATIVE_APP_LIVY_ENABLE

    Installs Livy application, if set to true.

    EMR_NATIVE_APP_LIVY_ENABLE: "true"

    Properties of IAM Role Configuration

    EMR_NATIVE_DEFAULT_ROLE

    Default role attached to EMR cluster for performing cluster related activities. This should be an existing role.

    EMR_NATIVE_DEFAULT_ROLE: "EMR_DefaultRole"

    EMR_NATIVE_INSTANCE_ROLE

    The IAM Role which will be attached to each node in the EMR Cluster. This should have only minimal permissions for basic EMR functionalities.

    EMR_NATIVE_INSTANCE_ROLE: "restricted_instance_role"

    EMR_NATIVE_DATA_ACCESS_ROLE

    This role provides credentials for trusted execution engines, such as Apache Hive and the AWS EMR Record Server, to access AWS S3 data. Use this role only to access AWS S3 data, including any KMS keys if you are using S3 SSE-KMS.

    EMR_NATIVE_DATA_ACCESS_ROLE: "emr_native_data_access_role"

    EMR_NATIVE_USER_ACCESS_ROLE

    This role provides users who are not trusted execution engines with credentials to interact with AWS services, if needed. Do not use this IAM role to allow access to AWS S3 data, unless it is data that should be accessible by all users.

    EMR_NATIVE_USER_ACCESS_ROLE: "emr_native_user_access_role"

    Properties to send EMR Ranger Engines Audits to Solr

    EMR_NATIVE_ENABLE_SOLR_AUDITS

    Enable audits to Solr.

    EMR_NATIVE_ENABLE_SOLR_AUDITS: "true"

    AUDITSERVER_AUTH_TYPE

    The EMR Native Ranger audits framework does not support basic authentication, hence it needs to be disabled. This property needs to be changed in vars.auditserver.yml, if it already exists.

    AUDITSERVER_AUTH_TYPE: "none"

    AUDITSERVER_SSL_ENABLE

    In the case of a self-signed SSL certificate, EMR native Ranger does not support SSL for Solr audits. Hence, AuditServer SSL should be disabled.

    AUDITSERVER_SSL_ENABLE: "false"

    EMR_NATIVE_CLOUDWATCH_GROUPNAME

    Add a CloudWatch LogGroup to push Ranger Audits. This should be an existing Group.

    EMR_NATIVE_CLOUDWATCH_GROUPNAME: "emr_privacera_native_logs"

    Note

    You can also add custom properties that are not included by default. See EMR.
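
    For reference, the following is a minimal sketch of config/custom-vars/vars.emr.native.ranger.yml using existing security groups. All values are illustrative assumptions; replace them with your own, including the full secret ARNs created in the Prerequisites section.

    EMR_NATIVE_ENABLE: "true"
    EMR_NATIVE_CLUSTER_NAME: "Privacera-EMR-Native-Ranger"
    EMR_NATIVE_AWS_REGION: "us-east-1"
    EMR_NATIVE_VERSION: "emr-5.32.0"
    EMR_NATIVE_CREATE_SG: "false"
    EMR_NATIVE_MASTER_SG_ID: "sg-xxxxxxx"
    EMR_NATIVE_SLAVE_SG_ID: "sg-xxxxxxx"
    EMR_NATIVE_RANGER_PLUGIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-plugin-private-keypair-xxxxxx"
    EMR_NATIVE_RANGER_ADMIN_SECRET_ARN: "arn:aws:secretsmanager:us-east-1:999999999999:secret:ranger-admin-pub-cert-xxxxxx"
    EMR_NATIVE_APP_SPARK_ENABLE: "true"
    EMR_NATIVE_APP_HIVE_ENABLE: "true"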

  4. Run the following commands.

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
    
  5. Once the update is done, all the CloudFormation JSON template files will be available at the path ~/privacera/privacera-manager/output/emr-native-ranger.

  6. Run the following command in the AWS instance where Privacera is installed.

    cd ~/privacera/privacera-manager/output/emr-native-ranger
    
  7. Create the certificates that need to be added in AWS Secrets Manager.

    You will get multiple prompts to enter the keystore password. Use the property value of RANGER_PLUGIN_SSL_KEYSTORE_PASSWORD set in ~/privacera/privacera-manager/config/custom-vars/vars.ssl.yml for each prompt.

    1. Run the following command.

      ./emr-native-create-certs.sh
      

      This will create the following two files. You need to update the secrets created in the Prerequisites section above with the contents of these files:

      • ranger-admin-pub-cert.pem

      • ranger-plugin-keypair.pem

    2. Display the contents of the ranger-admin-pub-cert.pem file.

      cat ranger-admin-pub-cert.pem
      

      Select the file contents and then right-click in the terminal to copy the contents.

    3. Log in to the AWS console, navigate to Secrets Manager, and then click ranger-admin-pub-cert.

    4. Navigate to Secret value section and then go to Retrieve Secret Value > Edit > Plaintext.

    5. Replace the secrets with the new value, which you copied in step 2.

    6. Similarly, repeat steps 2-5 above to display the contents of ranger-plugin-keypair.pem and use them to replace the value of the ranger-plugin-private-keypair secret in AWS Secrets Manager.

  8. (Optional) Create IAM roles using the emr-native-role-creation-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-role-creation --template-body file://emr-native-role-creation-template.json --capabilities CAPABILITY_NAMED_IAM
    

    Note

    To give the Apache Hive and Apache Spark services access to data, navigate to IAM Management in your AWS Console and add the required S3 policies to the EMR_NATIVE_DATA_ACCESS_ROLE.

  9. (Optional) Create Security Configurations using the emr-native-sec-config-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-security-config-creation  --template-body file://emr-native-sec-config-template.json
    
  10. Create EMR using the emr-native-template.json template.

    aws --region <AWS_REGION> cloudformation create-stack --stack-name privacera-emr-native-creation  --template-body file://emr-native-template.json
    
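For reference, the Solr audit properties described above span two custom-vars files: the EMR-native entries go in the EMR vars file you edited earlier, while the AuditServer entries belong in vars.auditserver.yml. A combined sketch, with values copied from the examples above:

```yaml
# EMR-native vars file (the file edited in the configuration steps above)
EMR_NATIVE_ENABLE_SOLR_AUDITS: "true"
EMR_NATIVE_CLOUDWATCH_GROUPNAME: "emr_privacera_native_logs"   # must be an existing log group

# config/custom-vars/vars.auditserver.yml
AUDITSERVER_AUTH_TYPE: "none"     # basic auth is not supported by the EMR Native Ranger audit framework
AUDITSERVER_SSL_ENABLE: "false"   # disable when AuditServer uses a self-signed certificate
```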
GCP Dataproc
Privacera plugin in Dataproc

This section covers how you can use Privacera Manager to generate the setup script and Dataproc custom configuration to install Privacera Plugin in the GCP Dataproc environment.

Prerequisites

Ensure the following prerequisites are met:

  • A working Dataproc environment.

  • Privacera services must be up and running.

Configuration

  1. SSH to the instance where Privacera is installed.

  2. Run the following command:

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dataproc.yml config/custom-vars/
    vi config/custom-vars/vars.dataproc.yml                          
  3. Edit the following properties:

    Property

    Description

    Example

    DATAPROC_ENABLE

    Enable Dataproc template creation.

    true

    DATAPROC_MANAGE_INIT_SCRIPT

    Set this property to upload the init script to GCP Cloud Storage.

    If the value is set to true, then Privacera will upload the init script to the GCP bucket.

    If the value is set to false, then manually upload the init script to a GCP bucket.

    false

    DATAPROC_PRIVACERA_GS_BUCKET

    Enter the GCP bucket name where the init script will be uploaded.

    gs://privacera-bucket

    DATAPROC_RANGER_IS_FALLBACK_SUPPORTED

    Use the property to enable/disable the fallback to the privacera_files and privacera_hive services. It determines whether the user should be allowed or denied access to the resource files.

    To enable the fallback, set to true; to disable, set to false.

    true

  4. Run the update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update                           

    After the update is complete, the setup script setup_dataproc.sh and Dataproc custom configurations privacera_cust_conf.zip will be generated at the path, ~/privacera/privacera-manager/output/dataproc.

  5. If DATAPROC_MANAGE_INIT_SCRIPT is set to false, then copy setup_dataproc.sh and privacera_cust_conf.zip to your GCP bucket. Both files must be placed in the same folder.

    cd ~/privacera/privacera-manager/output/dataproc
    GS_BUCKET=<PLEASE_CHANGE>
    gsutil cp setup_dataproc.sh gs://${GS_BUCKET}/privacera/dataproc/init/
    gsutil cp privacera_cust_conf.zip gs://${GS_BUCKET}/privacera/dataproc/init/                          
  6. SSH to the instance where the master node of the Dataproc cluster is installed. Then, set the GCP bucket name and run the setup script.

    sudo su - 
    mkdir -p /opt/privacera/downloads
    cd /opt/privacera/downloads
    GS_BUCKET=privacera-dev
    gsutil cp gs://${GS_BUCKET}/privacera/dataproc/init/setup_dataproc.sh .
    chmod +x setup_dataproc.sh
    ./setup_dataproc.sh                           
Starburst Enterprise
Starburst Enterprise with Privacera

Using Privacera in Starburst Enterprise LTS, you can enforce system-wide access control. The following information provides an expedient way of configuring Starburst Enterprise on port 8443 for TLS/HTTPS so that username/password authentication is possible. Self-signed certificates work well for testing purposes, but should not be used for production deployments.

Prerequisites

The following items need to be enabled/shared prior to deploying a Starburst Docker image:

  • A licensed version of Starburst

  • Docker-ce 18+ must be installed

  • JDK 11 (to generate the Java keystore)

  • Privacera Manager version 4.7 or higher

  • JDBC URL to connect to the Starburst Enterprise instance to access the catalogs and schemas

  • CA-signed SSL certificate for production deployment.

Configuring Privacera Plugin with Starburst Enterprise

Summary of steps:

  1. Generate an access-control file for Starburst.

  2. Generate an access-control file for Hive catalogs [optional].

  3. Generate a Ranger Audit XML file.

  4. Generate a Ranger SSL XML file required for TLS secure Privacera installations.

To configure Privacera plugin:

  1. To enable Privacera for authorization, you need to update the etc/config.properties with one of the following entries:

    # privacera auth for hive and system access control
    access-control.config-files=/etc/starburst/access-control-privacera.properties,/etc/starburst/access-control-priv-hive.properties
    

    Or

    # privacera auth for only system access control
    access-control.config-files=/etc/starburst/access-control-privacera.properties
    
  2. Edit etc/access-control-privacera.properties. The following is an example of the properties. You need to configure the properties in the file so that they point to the instance where Privacera is installed. Replace <PRIVACERA_HOST_INSTANCE_IP> with the IP address of the Privacera host.

    access-control.name=privacera-starburst
    ranger.policy-rest-url=http://<PRIVACERA_HOST_INSTANCE_IP>:6080
    ranger.service-name=privacera_starburstenterprise
    ranger.username=admin
    ranger.password=welcome1
    ranger.policy-refresh-interval=3s
    ranger.config-resources=/etc/starburst/ranger-hive-audit.xml
    ranger.policy-cache-dir=/etc/starburst/tmp/ranger
    

    To install this file into the Docker container, you can add the following option to your container creation script:

    -v $DOCKER_HOME/$STARBURST_VERSION/etc/access-control-privacera.properties:$STARBURST_TGT/access-control-privacera.properties \
  3. Edit etc/access-control-priv-hive.properties. The following is an example of the properties. You need to configure the properties in the file so that they point to the instance where Privacera is installed. Replace <PRIVACERA_HOST_INSTANCE_IP> with the IP address of the Privacera host. Similarly, configure the properties for the comma-separated catalogs such as Hive, Glue, Delta, and so on.

    This file is optional if you are not configuring Hive catalogs with privacera_hive policies.

    access-control.name=privacera
    ranger.policy-rest-url=http://<PRIVACERA_HOST_INSTANCE_IP>:6080
    ranger.service-name=privacera_hive
    privacera.catalogs=hive,glue
    ranger.username=admin
    ranger.password=welcome1
    ranger.policy-refresh-interval=3s
    ranger.config-resources=/etc/starburst/ranger-hive-audit.xml
    ranger.policy-cache-dir=/etc/starburst/tmp/ranger
    privacera.fallback-access-control=allow-all
    
  4. To install this file into the Docker container, you can add the following option to your container creation script:

    -v $DOCKER_HOME/$STARBURST_VERSION/etc/access-control-priv-hive.properties:$STARBURST_TGT/access-control-priv-hive.properties \
  5. Edit etc/ranger-hive-audit.xml. This file describes how access is audited from Starburst to Privacera Ranger and Solr. The example below is for unsecured Privacera Ranger deployments only. Replace <PRIVACERA_HOST_INSTANCE_IP> with the IP address of the Privacera host.

        <?xml version="1.0" encoding="UTF-8"?>
        <configuration>
        <property>
        <name>ranger.plugin.hive.service.name</name>
        <value>privacera_hive</value>
        </property>
        <property>
        <name>ranger.plugin.hive.policy.pollIntervalMs</name>
        <value>5000</value>
        </property>
        <property>
        <name>ranger.service.store.rest.url</name>
        <value>http://<PRIVACERA_HOST_INSTANCE_IP>:6080</value>
        </property>
        <property>
        <name>ranger.plugin.hive.policy.rest.url</name>
        <value>http://<PRIVACERA_HOST_INSTANCE_IP>:6080</value>
        </property>
        <property>
        <name>xasecure.audit.destination.solr</name>
        <value>true</value>
        </property>
        <property>
        <name>xasecure.audit.destination.solr.batch.filespool.dir</name>
        <value>/opt/presto/logs/audits/solr/</value>
        </property>
        <property>
        <name>xasecure.audit.destination.solr.urls</name>
        <value>http://<PRIVACERA_HOST_INSTANCE_IP>:8983/solr/ranger_audits</value>
        </property>
        <property>
        <name>xasecure.audit.is.enabled</name>
        <value>true</value>
        </property>
        </configuration>
    
  6. To install this file into the Docker container, you can add the following option to your container creation script:

    -v $DOCKER_HOME/$STARBURST_VERSION/etc/ranger-hive-audit.xml:$STARBURST_TGT/ranger-hive-audit.xml \
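If you maintain several environments, the ranger-hive-audit.xml shown above can be generated rather than edited by hand. A minimal Python sketch (illustrative only, not part of Privacera; the property names and values are taken from the example above):

```python
import xml.etree.ElementTree as ET

def ranger_audit_config(host_ip: str) -> str:
    """Render ranger-hive-audit.xml content for the given Privacera host IP."""
    props = {
        "ranger.plugin.hive.service.name": "privacera_hive",
        "ranger.plugin.hive.policy.pollIntervalMs": "5000",
        "ranger.service.store.rest.url": f"http://{host_ip}:6080",
        "ranger.plugin.hive.policy.rest.url": f"http://{host_ip}:6080",
        "xasecure.audit.destination.solr": "true",
        "xasecure.audit.destination.solr.batch.filespool.dir": "/opt/presto/logs/audits/solr/",
        "xasecure.audit.destination.solr.urls": f"http://{host_ip}:8983/solr/ranger_audits",
        "xasecure.audit.is.enabled": "true",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to etc/ranger-hive-audit.xml produces the same configuration as the hand-edited example.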
Privacera services (Data Assets)
Privacera services

This topic covers how you can enable/disable the Data Sets menu on Privacera Portal.

Data Sets allows you to create logical data assets from various data sources such as Snowflake, PostgreSQL, and so on, and share the data assets with users, groups, or roles. You can assign an owner to a data asset who has the privileges to control access to the data within the data asset.

CLI configuration
  1. Run the following command.

    cd privacera/privacera-manager/
    cp config/sample-vars/vars.privacera-services.yml  config/custom-vars/
    vi config/custom-vars/vars.privacera-services.yml
  2. Enable/Disable the property.

    PRIVACERA_SERVICES_ENABLE: "true"
  3. Run the following command.

    cd privacera/privacera-manager/
    ./privacera-manager.sh update
Audit Fluentd

Prerequisites

Ensure the following prerequisites are met:

  • AuditServer must be up and running. For more information, refer to AuditServer.

  • If you're configuring Fluentd for an Azure environment and want to configure User Managed Service Identity (MSI), assign the following two IAM roles to the Azure Storage account for the User Managed Service Identity where the audits will be stored.

    • Owner or Contributor

    • Storage Blob Data Owner or Storage Blob Data Contributor

    Note

    If your Azure environment is Docker-based, then configure MSI on a virtual machine, whereas for a Kubernetes-based environment, configure MSI on a virtual machine scale set (VMSS).

This topic covers how you can store the audits from AuditServer locally or in cloud storage such as AWS S3, Azure Blob, or Azure ADLS Gen 2. You can also send application logs to the same location as the audit logs.

Procedure

  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.audit-fluentd.yml config/custom-vars/
    vi config/custom-vars/vars.audit-fluentd.yml
  3. Modify the properties below. For property details and description, refer to the Configuration Properties below.

    You can also add custom properties that are not included by default. See Audit Fluentd.

  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
Configuration properties

Property

Description

Example

AUDIT_FLUENTD_AUDIT_DESTINATION

Set the audit destination where the audits will be saved. If the value is set to s3, the audits are stored in AWS S3. For S3, the default time interval to publish the audits is 3600s (1 hour).

Local storage should be used only for development and testing purposes. All the audits received are stored in the same container/pod.

Value: local, s3, azure-blob, azure-adls

s3

AUDIT_FLUENTD_EXPORT_APP_LOGS_ENABLE

Specifies whether application logs and PolicySync logs are sent to Fluentd. The default value is false.

true

When the destination is local, edit the following property:

AUDIT_FLUENTD_LOCAL_FILE_TIME_INTERVAL

This is the time interval after which the audits will be pushed to the local destination.

3600s

When the destination is s3, edit the following properties:

AUDIT_FLUENTD_S3_BUCKET

Set the bucket name, if you set the audit destination above to S3.

Leave unchanged, if you set the audit destination to local.

bucket_1

AUDIT_FLUENTD_S3_REGION

Set the bucket region, if you set the audit destination above to S3.

Leave unchanged, if you set the audit destination to local.

us-east-1

AUDIT_FLUENTD_S3_FILE_TIME_INTERVAL

This is the time interval after which the audits will be pushed to the S3 destination.

3600s

AUDIT_FLUENTD_S3_ACCESS_KEY

AUDIT_FLUENTD_S3_SECRET_KEY

Set the access and secret key, if you set the audit destination above to S3.

Leave unchanged, if you set the audit destination to local, or if you are using an AWS IAM instance role.

AUDIT_FLUENTD_S3_ACCESS_KEY: "AKIAIOSFODNN7EXAMPLE"

AUDIT_FLUENTD_S3_SECRET_KEY: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

AUDIT_FLUENTD_S3_BUCKET_ENCRYPTION_TYPE

Property to encrypt an S3 bucket. You can use the property, if you have set S3 as the audit destination in the property, AUDIT_FLUENTD_AUDIT_DESTINATION.

You can assign one of the following values as the encryption types:

  • SSE-S3

  • SSE-KMS

  • SSE-C

  • NONE

SSE-S3 and SSE-KMS are encryptions managed by AWS. You need to enable the server-side encryption for the S3 bucket. For more information on how to enable SSE-S3 or SSE-KMS encryption types, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-bucket-encryption.html

SSE-C is the custom encryption type, where the encryption key and its MD5 hash have to be generated separately.

NONE

AUDIT_FLUENTD_S3_BUCKET_ENCRYPTION_KEY

If you have set SSE-C encryption type in the AUDIT_FLUENTD_S3_BUCKET_ENCRYPTION_TYPE property, then the encryption key is mandatory. It is optional for SSE-KMS encryption type.

AUDIT_FLUENTD_S3_BUCKET_ENCRYPTION_KEY_MD5

If you have set SSE-C encryption type in the AUDIT_FLUENTD_S3_BUCKET_ENCRYPTION_TYPE property, then the MD5 encryption key is mandatory.

To get the MD5 hash for the encryption key, run the following command:

echo -n "<generated-key>"|  openssl dgst -md5 -binary | openssl enc -base64
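If openssl is unavailable, the same base64-encoded MD5 digest can be computed with a short Python helper (illustrative; it mirrors the command above by hashing the key string):

```python
import base64
import hashlib

def sse_c_key_md5(generated_key: str) -> str:
    """Base64-encoded MD5 digest of the key, equivalent to the openssl pipeline above."""
    digest = hashlib.md5(generated_key.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")
```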

When the destination is azure-blob or azure-adls, edit the following properties:

AUDIT_FLUENTD_AZURE_STORAGE_ACCOUNT

AUDIT_FLUENTD_AZURE_CONTAINER

Set the storage account and the container, if you set the audit destination above to Azure Blob or Azure ADLS.

To know how to get the ADLS properties, see Get ADLS properties.

Leave unchanged, if you set the audit destination to local.

Note

Currently, it supports Azure blob storage only.

AUDIT_FLUENTD_AZURE_STORAGE_ACCOUNT: "storage_account_1"

AUDIT_FLUENTD_AZURE_CONTAINER: "container_1"

AUDIT_FLUENTD_AZURE_FILE_TIME_INTERVAL

This is the time interval after which the audits will be pushed to the Azure ADLS/Blob destination.

3600s

AUDIT_FLUENTD_AUTH_TYPE

Select an authentication type from the dropdown list.

AUDIT_FLUENTD_AZURE_STORAGE_ACCOUNT_KEY

AUDIT_FLUENTD_AZURE_STORAGE_SAS_TOKEN

Configure this property, if you have selected SAS Key in the property, AUDIT_FLUENTD_AUTH_TYPE.

Set the storage account key and the SAS token, if you set the audit destination above to Azure Blob.

Leave unchanged, if you're using Azure's Managed Identity Service.

AUDIT_FLUENTD_AZURE_OAUTH_TENANT_ID

AUDIT_FLUENTD_AZURE_OAUTH_APP_ID

AUDIT_FLUENTD_AZURE_OAUTH_SECRET

Set the OAuth tenant ID, application ID, and secret, if you set the audit destination above to Azure ADLS.

Configure this property, if you have selected OAUTH in the property, AUDIT_FLUENTD_AUTH_TYPE.

Leave unchanged, if you're using Azure's Managed Identity Service.

AUDIT_FLUENTD_AZURE_USER_MANAGED_IDENTITY_ENABLE

AUDIT_FLUENTD_AZURE_USER_MANAGED_IDENTITY

Configure this property, if you have selected MSI (UserManaged) in the property, AUDIT_FLUENTD_AUTH_TYPE.
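As a quick sanity check before running the update, the destination-specific property groupings in the table above can be expressed as a small lookup. This is an illustrative sketch, not part of Privacera Manager; the S3 keys and Azure credentials are omitted because they depend on the chosen authentication method:

```python
# Required properties per AUDIT_FLUENTD_AUDIT_DESTINATION value (from the table above).
REQUIRED_BY_DESTINATION = {
    "local": ["AUDIT_FLUENTD_LOCAL_FILE_TIME_INTERVAL"],
    "s3": ["AUDIT_FLUENTD_S3_BUCKET", "AUDIT_FLUENTD_S3_REGION",
           "AUDIT_FLUENTD_S3_FILE_TIME_INTERVAL"],
    "azure-blob": ["AUDIT_FLUENTD_AZURE_STORAGE_ACCOUNT", "AUDIT_FLUENTD_AZURE_CONTAINER",
                   "AUDIT_FLUENTD_AZURE_FILE_TIME_INTERVAL"],
    "azure-adls": ["AUDIT_FLUENTD_AZURE_STORAGE_ACCOUNT", "AUDIT_FLUENTD_AZURE_CONTAINER",
                   "AUDIT_FLUENTD_AZURE_FILE_TIME_INTERVAL"],
}

def missing_properties(config: dict) -> list:
    """Return required property names absent for the configured audit destination."""
    destination = config.get("AUDIT_FLUENTD_AUDIT_DESTINATION")
    if destination not in REQUIRED_BY_DESTINATION:
        raise ValueError(f"unknown audit destination: {destination!r}")
    return [name for name in REQUIRED_BY_DESTINATION[destination] if name not in config]
```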


Grafana
How to configure Grafana with Privacera

Privacera allows you to use Grafana as a metrics and monitoring system. Grafana dashboards are pre-built in Privacera for services such as Dataserver, PolicySync, and Usersync to monitor the health of the services. Grafana uses the time-series data from the Privacera services and turns it into graphs and visualizations.

Grafana uses Graphite queries to pull the time-series data and creates charts and graphs based on this data.

Supported services

The following services are supported on Grafana:

  • Dataserver

  • PolicySync

  • Usersync

Configuration steps

  1. To enable Grafana, run the following command. This will enable both Grafana and Graphite.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.grafana.yml config/custom-vars/
  2. Run the update.

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update

Note

After configuring Grafana, if the data does not appear on the dashboard, see Grafana service.

Ranger Tagsync

This topic shows how you can configure Ranger TagSync to synchronize the Ranger tag store with Atlas.

Configuration

  1. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.ranger-tagsync.yml config/custom-vars/
    vi config/custom-vars/vars.ranger-tagsync.yml
  2. Edit the following properties.

    Property

    Description

    Example

    RANGER_TAGSYNC_ENABLE

    Property to enable/disable the Ranger TagSync.

    true

    TAGSYNC_TAG_SOURCE_ATLAS_KAFKA_BOOTSTRAP_SERVERS

    Kafka bootstrap server where Atlas publishes the entities. TagSync listens and pushes the mapping of Atlas entities and tags to Ranger.

    kafka:9092

    TAGSYNC_TAG_SOURCE_ATLAS_KAFKA_ZOOKEEPER_CONNECT

    Zookeeper URL for Kafka.

    zoo-1:2181

    TAGSYNC_ATLAS_CLUSTER_NAME

    Atlas cluster name.

    privacera

    TAGSYNC_TAGSYNC_ATLAS_TO_RANGER_SERVICE_MAPPING

    (Optional) To map from Atlas Hive cluster-name to Ranger service-name, the following format is used:

    clusterName,componentType,serviceName;clusterName2,componentType2,serviceName2

    Note: There are no spaces in the above format.

    For Hive, the notifications from Atlas include the names of the entities in the following formats:

    dbName@clusterName
    dbName.tblName@clusterName
    dbName.tblName.colName@clusterName

    Ranger Tagsync needs to derive the name of the Hive service (in Ranger) from the above entity names. By default, Ranger computes the Hive service name as: clusterName + "_hive".

    If the name of the Hive service (in Ranger) is different in your environment, use the following property to enable Ranger Tagsync to derive the correct Hive service name.

    TAGSYNC_ATLAS_TO_RANGER_SERVICE_MAPPING = clusterName,hive,rangerServiceName

    {{TAGSYNC_ATLAS_CLUSTER_NAME}},hive,privacera_hive;{{TAGSYNC_ATLAS_CLUSTER_NAME}},s3,privacera_s3
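The mapping format above (comma-separated triples joined by semicolons, with no spaces) can also be built programmatically. A small illustrative helper, not part of Ranger TagSync:

```python
def build_service_mapping(mappings) -> str:
    """Join (clusterName, componentType, serviceName) triples into the TagSync mapping string."""
    for entry in mappings:
        if len(entry) != 3 or any("," in part or ";" in part for part in entry):
            raise ValueError(f"invalid mapping entry: {entry!r}")
    return ";".join(",".join(entry) for entry in mappings)
```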

    TAGSYNC_TAGSYNC_ATLAS_DEFAULT_CLUSTER_NAME

    (Optional) Default cluster name configured for Atlas.

    {{TAGSYNC_ATLAS_CLUSTER_NAME}}

    TAGSYNC_TAG_SOURCE_ATLAS_KAFKA_ENTITIES_GROUP_ID

    (Optional) Consumer Group Name to be used to consume Kafka events.

    privacera_ranger_entities_consumer

    Note

    You can also add custom properties that are not included by default. See Ranger TagSync.

  3. Run the following command.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update

Discovery

Discovery in Kubernetes
Discovery (Kubernetes Mode)

This section provides setup instructions for Privacera Discovery in a Kubernetes-based deployment.

Prerequisites

Ensure the following prerequisites are met:

  • Privacera services must be deployed using Kubernetes.

  • Embedded Spark must be used.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.kubernetes.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.kubernetes.yml
    
  3. Set value for the following. For property details and description, refer to the Configuration Properties below.

    DISCOVERY_K8S_SPARK_MASTER: "${PLEASE_CHANGE}"
Configuration properties

To get the value of the variable, do the following:

  1. Get the URL of the Kubernetes master by executing the kubectl cluster-info command.

  2. Copy the Kubernetes control plane URL and use it as the value of DISCOVERY_K8S_SPARK_MASTER.

Discovery on Databricks
Discovery on Databricks

This topic covers the installation of Privacera Discovery on Databricks.

Configuration
  1. SSH to the instance as USER.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.databricks.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.databricks.yml
    
  3. If the Databricks plugin is not enabled, add the following details in the config/custom-vars/vars.discovery.databricks.yml file. To configure the Databricks plugin, see Configuration in Databricks Spark Fine-Grained Access Control Plugin (FGAC) (Python, SQL).

    DATABRICKS_HOST_URL: "<PLEASE_UPDATE>"
    DATABRICKS_TOKEN: "<PLEASE_UPDATE>"
    
    DATABRICKS_WORKSPACES_LIST:
      - alias: DEFAULT
        databricks_host_url: "{{DATABRICKS_HOST_URL}}"
        token: "{{DATABRICKS_TOKEN}}"
    
  4. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    AWS

    DATABRICKS_DRIVER_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_INSTANCE_TYPE: "m5.xlarge"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"
    DATABRICKS_DISCOVERY_INSTANCE_PROFILE: "arn:aws:iam::<ACCOUNT_ID>:instance-profile/<DATABRICKS_CLUSTER_IAM_ROLE>"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE: "true"
    DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN: "arn:aws:iam::<ACCOUNT_ID>:role/<DISCOVERY_IAM_ROLE>"
    

    Azure

    DATABRICKS_DRIVER_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_INSTANCE_TYPE: "Standard_DS3_v2"
    DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT: "true"
    DATABRICKS_DISCOVERY_SPARK_VERSION: "7.3.x-scala2.12"

Note

PRIVACERA_DISCOVERY_DATABRICKS_DOWNLOAD_URL is no longer in use. The Discovery Databricks packages will be downloaded from PRIVACERA_BASE_DOWNLOAD_URL.

Configuration properties

Property

Description

Example

DATABRICKS_DRIVER_INSTANCE_TYPE

For AWS, the driver's instance type can be "m5.xlarge" or "m5.2xlarge".

For Azure, the driver's instance type can be "Standard_DS3_v2".

m5.xlarge

DATABRICKS_INSTANCE_TYPE

For AWS, the instance type can be "m5.xlarge" or "m5.2xlarge".

For Azure, the instance type can be "Standard_DS3_v2".

m5.xlarge

SETUP_DATABRICKS_JAR

USE_DATABRICKS_SPARK

DATABRICKS_ELASTIC_DISK

DATABRICKS_DISCOVERY_MANAGE_INIT_SCRIPT

Set to true if you want Privacera Manager to create the Databricks init script.

false

DATABRICKS_DISCOVERY_WORKERS

DATABRICKS_DISCOVERY_JOB_NAME

DATABRICKS_DISCOVERY_SPARK_VERSION

Spark version can be as follows:

  • 6.4.x-scala2.11 (Spark 2.4)

  • 7.3.x-scala2.12 (Spark 3.0)

  • 7.4.x-scala2.12 (Spark 3.0)

  • 7.5.x-scala2.12 (Spark 3.0)

  • 7.6.x-scala2.12 (Spark 3.0)

7.3.x-scala2.12

DATABRICKS_DISCOVERY_INSTANCE_PROFILE

The instance profile (IAM role) used by the Databricks cluster nodes where Discovery will run.

arn:aws:iam::1234564835:instance-profile/privacera_databricks_cluster_iam_role

DISCOVERY_AWS_CLOUD_ASSUME_ROLE

Property to grant Discovery access to AWS services to perform the scanning operation.

true

DISCOVERY_AWS_CLOUD_ASSUME_ROLE_ARN

ARN of the AWS IAM Role

arn:aws:iam::12345671758:role/DiscoveryCrossAccAssumeRole_k

Discovery in AWS
Discovery

This topic describes the AWS configuration for installing Privacera Discovery in a Docker or Kubernetes (EKS) environment.

IAM policies

To use the Privacera Discovery service, ensure the following IAM policies are attached to the Privacera_PM_Role role to access the AWS services.

Policy to create AWS resources

This policy is required only during installation or when Discovery is updated through Privacera Manager. It gives Privacera Manager permissions to create AWS resources such as DynamoDB, Kinesis, SQS, and S3 using Terraform.

  • ${AWS_REGION}: AWS region where the resources will get created.

     {
    "Version":"2012-10-17",
    "Statement":[
        {
            "Sid":"CreateDynamodb",
            "Effect":"Allow",
            "Action":[
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:ListTables",
                "dynamodb:TagResource",
                "dynamodb:UntagResource",
                "dynamodb:UpdateTable",
                "dynamodb:UpdateTableReplicaAutoScaling",
                "dynamodb:UpdateTimeToLive",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:ListTagsOfResource",
                "dynamodb:DescribeContinuousBackups"
            ],
            "Resource":"arn:aws:dynamodb:${AWS_REGION}:*:table/privacera*"
        },
        {
            "Sid":"CreateKinesis",
            "Effect":"Allow",
            "Action":[
                "kinesis:CreateStream",
                "kinesis:ListStreams",
                "kinesis:UpdateShardCount"
            ],
            "Resource":"arn:aws:kinesis:${AWS_REGION}:*:stream/privacera*"
        },
        {
            "Sid":"CreateS3Bucket",
            "Effect":"Allow",
            "Action":[
                "s3:CreateBucket",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
                
            ],
            "Resource":[
                "arn:aws:s3:::*"
            ]
        },
        {
            "Sid":"CreateSQSMessages",
            "Effect":"Allow",
            "Action":[
                "sqs:CreateQueue",
                "sqs:ListQueues"
            ],
            "Resource":[
                "arn:aws:sqs:${AWS_REGION}:${ACCOUNT_ID}:privacera*"
            ]
        }
    ]
    }
 
CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Configure your environment.

    • Configure Discovery for a Kubernetes environment. You need to set the Kubernetes cluster name. For more information, see Discovery (Kubernetes Mode)

    • For a Docker environment, you can skip this step.

  3. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.aws.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.aws.yml
    
  4. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    DISCOVERY_BUCKET_NAME: "<PLEASE_CHANGE>"
    

    To configure a bucket, add the property as follows, where bucket-1 is the name of the bucket:

    DISCOVERY_BUCKET_NAME: "bucket-1"
    

    To configure a bucket containing a folder, add the property as follows:

    DISCOVERY_BUCKET_NAME: "bucket-1/folder1"
    
  5. Uncomment/Add the following variable to enable autoscaling of the executor pods:

    DISCOVERY_K8S_SPARK_DYNAMIC_ALLOCATION_ENABLED: "true"
    
  6. (Optional) If you want to customize Discovery configuration further, you can add custom Discovery properties. For more information, refer to Discovery Custom Properties.

    For example, by default, the username and password for the Discovery service are padmin/padmin. If you choose to change them, refer to Add Custom Properties.

  7. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration properties

Property

Description

Example

DISCOVERY_BUCKET_NAME

Set the bucket name where Discovery will store its metadata files

container1

Properties of Topic and Table names

Topic and Table names are assigned by default in Privacera Discovery. To customize a topic or table name, see Customize Topic and Table Names.
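The DISCOVERY_BUCKET_NAME value described above is interpreted as a bucket name with an optional folder prefix. A small Python sketch of that split (illustrative only, not Privacera's actual parsing logic):

```python
def split_discovery_bucket(value: str):
    """Split a DISCOVERY_BUCKET_NAME value into (bucket, folder); folder is None for a bare bucket."""
    bucket, _, folder = value.partition("/")
    return bucket, (folder or None)
```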

Enable realtime scan

An AWS SQS queue is required, if you want to enable realtime scan on the S3 bucket.

After running the PM update command, an SQS queue will be created for you automatically with the name, privacera_bucket_sqs_{{DEPLOYMENT_ENV_NAME}}, where {{DEPLOYMENT_ENV_NAME}} is the environment name you set in the vars.privacera.yml file. This queue name will appear in the list of queues of your AWS SQS account.

If you have an SQS queue which you want to use, add the DISCOVERY_BUCKET_SQS_NAME property in the vars.discovery.aws.yml file and assign your SQS queue name.

If you want to enable realtime scan on the bucket, click here.
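The default-versus-override behavior for the SQS queue name described above can be summarized as follows (an illustrative sketch; the actual naming is done by Privacera Manager):

```python
def realtime_scan_queue_name(deployment_env_name: str, custom_queue_name: str = None) -> str:
    """Return the SQS queue used for realtime scan: the DISCOVERY_BUCKET_SQS_NAME
    override when provided, otherwise the auto-created default queue name."""
    return custom_queue_name or f"privacera_bucket_sqs_{deployment_env_name}"
```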

Discovery in Azure
Azure Discovery

This topic describes the Azure configuration for installing Privacera Discovery.

Prerequisites

Ensure the following prerequisites are met:

Azure storage account

Azure Cosmos DB account

  • Create an Azure Cosmos DB account. For more information, refer to Microsoft's Cosmos DB documentation.

  • Get the URI from the Overview section.

  • Get the Primary Key from the Settings > Keys section.

  • Set the consistency to Strong in the Settings > Default Consistency section.

For Terraform

  • Assign permissions to create Azure resources using managed identity. For more information, refer to Create Azure Resources.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Configure your environment.

    • Configure Discovery for a Kubernetes environment. You need to set the Kubernetes cluster name. For more information, see Discovery (Kubernetes Mode)

    • For a Docker environment, you can skip this step.

  3. Run the following commands.

    cd ~/privacera/privacera-manager  
    cp config/sample-vars/vars.kafka.yml config/custom-vars
    vi config/custom-vars/vars.kafka.yml
    
  4. Run the following commands.

    cd ~/privacera/privacera-manager  
    cp config/sample-vars/vars.discovery.azure.yml config/custom-vars
    vi config/custom-vars/vars.discovery.azure.yml
    
  5. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    DISCOVERY_FS_PREFIX: "<PLEASE_CHANGE>"
    DISCOVERY_AZURE_STORAGE_ACCOUNT_NAME: "<PLEASE_CHANGE>"
    DISCOVERY_COSMOSDB_URL: "<PLEASE_CHANGE>"
    DISCOVERY_COSMOSDB_KEY: "<PLEASE_CHANGE>"
    DISCOVERY_AZURE_STORAGE_ACCOUNT_KEY: "<PLEASE_CHANGE>"
    CREATE_AZURE_RESOURCES: "false"
    DISCOVERY_AZURE_RESOURCE_GROUP: "<PLEASE_CHANGE>"
    DISCOVERY_AZURE_COSMOS_DB_ACCOUNT: "<PLEASE_CHANGE>"
    DISCOVERY_AZURE_LOCATION: "<PLEASE_CHANGE>"
    
  6. (Optional) If you want to customize Discovery configuration further, you can add custom Discovery properties. For more information, refer to Discovery Custom Properties.

    For example, by default, the username and password for the Discovery service are padmin/padmin. If you choose to change them, refer to Add Custom Properties.

  7. To configure real-time scan for audits, refer to Pkafka.

  8. Run the following commands.

    cd ~/privacera/privacera-manager  
    ./privacera-manager.sh update
    
Configuration properties

Property

Description

Example

DISCOVERY_ENABLE

In the **Basic** tab, enable/disable Privacera Discovery.

DISCOVERY_REALTIME_ENABLE

In the **Basic** tab, enable/disable real-time scan in Privacera Discovery.

For real-time scan to work, ensure the following:

  • If you want to scan the default ADLS app registered by the system at the time of installation, keep its app properties unchanged in Privacera Portal.

  • If you want to scan a user-registered app, the app properties in Privacera Portal and its corresponding discovery.yml should be the same.

  • Only one app can be scanned at a time.

DISCOVERY_FS_PREFIX

Enter the container name. Get it from the Prerequisites section.

container1

DISCOVERY_AZURE_STORAGE_ACCOUNT_NAME

Enter the name of the Azure Storage account. Get it from the Prerequisites section.

azurestorage

DISCOVERY_COSMOSDB_URL

DISCOVERY_COSMOSDB_KEY

Enter the Cosmos DB URL and Primary Key. Get it from the Prerequisites section.

DISCOVERY_COSMOSDB_URL: "https://url1.documents.azure.com:443/"

DISCOVERY_COSMOSDB_KEY: "xavosdocof"

DISCOVERY_AZURE_STORAGE_ACCOUNT_KEY

Enter the Access Key of the storage account. Get it from the Prerequisites section.

GMi0xftgifp==

[Properties of Topic and Table names](../pm-ig/customize_topic_and_tables_names.md)

Topic and Table names are assigned by default in Privacera Discovery. To customize any topic or table name, refer to the link.

PKAFKA_EVENT_HUB

In the **Advanced > Pkafka Configuration** section, enter the Event Hub name. Get it from the Prerequisites section.

eventhub1

PKAFKA_EVENT_HUB_NAMESPACE

In the **Advanced > Pkafka Configuration** section, enter the name of the Event Hub namespace. Get it from the Prerequisites section.

eventhubnamespace1

PKAFKA_EVENT_HUB_CONSUMER_GROUP

In the **Advanced > Pkafka Configuration** section, enter the name of the Consumer Group. Get it from the Prerequisites section.

congroup1

PKAFKA_EVENT_HUB_CONNECTION_STRING

In the **Advanced > Pkafka Configuration** section, enter the connection string. Get it from the Prerequisites section.

Endpoint=sb://eventhub1.servicebus.windows.net/;

SharedAccessKeyName=RootManageSharedAccessKey;

SharedAccessKey=sAmPLEP/8PytEsT=

CREATE_AZURE_RESOURCES

For Terraform usage, set the value to true. The default value is false.

true

DISCOVERY_AZURE_RESOURCE_GROUP

Get the value from the Prerequisite section.

resource1

DISCOVERY_AZURE_COSMOS_DB_ACCOUNT

Get the value from the Prerequisite section.

database1

Discovery in GCP
Discovery

This topic describes how to set up the GCP configuration for installing Privacera Discovery in a Docker or Kubernetes environment.

Prerequisites

Ensure the following prerequisites are met:

  • Create a service account and add the following roles. For more information, refer to Creating a new service account.

    • Editor

    • Owner

    • Private Logs Viewer

    • Kubernetes Engine Admin (Required only for a Kubernetes environment)

  • Create a Bigtable instance and get the Bigtable Instance ID. For more information, refer to Creating a Cloud Bigtable instance.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Configure your environment.

    • Configure Discovery for a Kubernetes environment. You need to set the Kubernetes cluster name. For more information, see Discovery (Kubernetes Mode).

    • For a Docker environment, you can skip this step.

  3. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.discovery.gcp.yml config/custom-vars/
    vi config/custom-vars/vars.discovery.gcp.yml
    
  4. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    BIGTABLE_INSTANCE_ID: "<PLEASE_CHANGE>"
    DISCOVERY_BUCKET_NAME: "<PLEASE_CHANGE>"
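A filled-in sketch using the example values from the Configuration properties table (both values are illustrative placeholders):

```yaml
BIGTABLE_INSTANCE_ID: "table_1"
DISCOVERY_BUCKET_NAME: "bucket_1"
```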
    
  5. (Optional) If you want to customize Discovery configuration further, you can add custom Discovery properties. For more information, refer to Discovery Custom Properties.

    For example, by default, the username and password for the Discovery service are padmin/padmin. If you choose to change them, refer to Add Custom Properties.

  6. For real-time scanning, run the following.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.pkafka.gcp.yml config/custom-vars/
    

    Note

    • Recommended: Use Google Sink based approach to enable real-time scan of applications on different projects, click here.

    • Optional: Use Google Logging API based approach to enable real-time scan of applications on different projects, click here.

  7. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration properties

Property

Description

Example

BIGTABLE_INSTANCE_ID

Get the value by navigating to **Navigation Menu > Databases > Bigtable** and checking the Instance ID column.

BIGTABLE_INSTANCE_ID: "table_1"

DISCOVERY_BUCKET_NAME

Enter a name for the bucket where Discovery will store its metadata files.

DISCOVERY_BUCKET_NAME: "bucket_1"

Pkafka

This topic describes how to enable Pkafka for real-time audits in Privacera Discovery.

Prerequisites

Ensure the following prerequisites are met:

  • Create an Event Hubs namespace in the same region as the Storage Account you want to monitor. For more information, refer to Microsoft's documentation Create an Event Hubs namespace.

  • Create Event Hub in the Event Hub namespace. For more information, refer to Microsoft's documentation Create an event hub.

  • Create a consumer group in the Event Hub.

    Azure Portal > Event Hubs namespace > Event Hub > Consumer Groups > +Consumer Group. The Consumer Groups tab will be under Entities of the Event Hub page.

  • Get the connection string of the Event Hubs namespace. For more information, refer to Microsoft's documentation Get connection string from the portal.

  • Create an Event Subscription for the Event Hubs namespace with the Event Type as Blob Created and Blob Deleted. For more information, refer to Microsoft's documentation Create an Event Grid subscription.

    Note

    When you create an event grid subscription, clear the checkbox Enable subject filtering.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.pkafka.azure.yml config/custom-vars/
    vi config/custom-vars/vars.pkafka.azure.yml
  3. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    PKAFKA_EVENT_HUB: "<PLEASE_CHANGE>"
    PKAFKA_EVENT_HUB_NAMESPACE: "<PLEASE_CHANGE>"
    PKAFKA_EVENT_HUB_CONSUMER_GROUP: "<PLEASE_CHANGE>"
    PKAFKA_EVENT_HUB_CONNECTION_STRING: "<PLEASE_CHANGE>"
    DISCOVERY_REALTIME_ENABLE: "true"
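A filled-in sketch using the example values from the Configuration properties table below (all values are illustrative placeholders, not real credentials):

```yaml
PKAFKA_EVENT_HUB: "eventhub1"
PKAFKA_EVENT_HUB_NAMESPACE: "eventhubnamespace1"
PKAFKA_EVENT_HUB_CONSUMER_GROUP: "congroup1"
PKAFKA_EVENT_HUB_CONNECTION_STRING: "Endpoint=sb://eventhub1.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=sAmPLEP/8PytEsT="
DISCOVERY_REALTIME_ENABLE: "true"
```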
  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
Configuration properties

Property

Description

Example

PKAFKA_EVENT_HUB

Enter the Event Hub name. Get it from the Prerequisites section above.

eventhub1

PKAFKA_EVENT_HUB_NAMESPACE

Enter the name of the Event Hub namespace. Get it from the Prerequisites section above.

eventhubnamespace1

PKAFKA_EVENT_HUB_CONSUMER_GROUP

Enter the name of the Consumer Group. Get it from the Prerequisites section above.

congroup1

PKAFKA_EVENT_HUB_CONNECTION_STRING

Enter the connection string. Get it from the Prerequisites section above.

Endpoint=sb://eventhub1.servicebus.windows.net/;

SharedAccessKeyName=RootManageSharedAccessKey;

SharedAccessKey=sAmPLEP/8PytEsT=

DISCOVERY_REALTIME_ENABLE

Add this property to enable/disable real-time scan. By default, it is set to false.

Note: This is a custom property, and has to be added separately to the YAML file.

For real-time scan to work, ensure the following:

  • If you want to scan the default ADLS app registered by the system at the time of installation, keep its app properties unchanged in Privacera Portal.

  • If you want to scan a user-registered app, the app properties in Privacera Portal and its corresponding discovery.yml should be the same.

  • Only one app can be scanned at a time.

true

Portal SSO with PingFederate

Privacera Portal integrates with PingIdentity for authentication via SAML. This integration requires configuration steps in both Privacera Portal and PingIdentity.

Configuration steps for PingIdentity
  1. Sign in to your PingIdentity account.

  2. Under Your Environments, click Administrators.

  3. Select Connections from the left menu.

  4. In the Applications section, click on the + button to add a new application.

  5. Enter an Application Name (such as Privacera Portal SAML) and provide a description (optionally add an icon). For the Application Type, select SAML Application. Then click Configure.

  6. On the SAML Configuration page, under "Provide Application Metadata", select Manually Enter.

  7. Enter the ACS URLs:

    https://<portal_hostname>:<PORT>/saml/SSO

    Enter the Entity ID:

    privacera-portal

    Click the Save button.

  8. On the Overview page for the new application, click on the Attributes edit button. Add the attribute mapping:

    user.login: Username

    Set as Required.

    Note

    If a user's login ID is not the same as the username (for example, if the login ID is an email address), this attribute is treated as the username in the portal. The username value is the email address with the domain name (such as @company.com) removed; for example, for "john.joe@company.com" the username is "john.joe". If another attribute can be used as the username, this value should hold that attribute.

  9. You can optionally add additional attribute mappings:

    user.email: Email Address 
    user.firstName: Given Name
    user.lastName: Family Name
  10. Click the Save button.

  11. Next in your application, select Configuration and then the edit icon.

  12. Set the SLO Endpoint:

    https://<portal_hostname>:<PORT>/login.html

    Click the Save button.

  13. In the Configuration section, under Connection Details, click on Download Metadata button.

  14. Once this file is downloaded, rename it to:

    privacera-portal-aad-saml.xml

    This file will be used in the Privacera Portal configuration.

Configuration steps in Privacera Portal

Now we will configure Privacera Portal using privacera-manager to use the privacera-portal-aad-saml.xml file created in the above steps.

  1. Run the following commands:

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.portal.saml.aad.yml config/custom-vars/
  2. Edit the vars.portal.saml.aad.yml file:

    vi config/custom-vars/vars.portal.saml.aad.yml

    Add the following properties:

    SAML_ENTITY_ID: "privacera-portal"
    SAML_BASE_URL: "https://{{app_hostname}}:<PORT>"
    PORTAL_UI_SSO_ENABLE: "true"
    PORTAL_UI_SSO_URL: "saml/login"
    PORTAL_UI_SSO_BUTTON_LABEL: "Single Sign On"
    AAD_SSO_ENABLE: "true"
  3. Copy the privacera-portal-aad-saml.xml file to the following folder:

    ~/privacera/privacera-manager/ansible/privacera-docker/roles/templates/custom
  4. Edit the vars.portal.yml file:

    cd ~/privacera/privacera-manager/
    vi config/custom-vars/vars.portal.yml

    Add the following properties and assign your values.

    SAML_EMAIL_ATTRIBUTE: "user.email"
    SAML_USERNAME_ATTRIBUTE: "user.login"
    SAML_LASTNAME_ATTRIBUTE: "user.lastName"
    SAML_FIRSTNAME_ATTRIBUTE: "user.firstName"
  5. Run the following to update privacera-manager:

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update

    You should now be able to use Single Sign-on to Privacera using PingFederate.

Encryption & Masking

Privacera Encryption Gateway (PEG) and Cryptography with Ranger KMS

This topic covers how you can set up and use Privacera Cryptography and Privacera Encryption Gateway (PEG) using Ranger KMS.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Create a 'crypto' configuration file, and set the value of the Ranger KMS Master Key Password.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.crypto.yml config/custom-vars/
    vi config/custom-vars/vars.crypto.yml

    Assign a password to the RANGER_KMS_MASTER_KEY_PASSWORD such as "Str0ngP@ssw0rd".

    RANGER_KMS_MASTER_KEY_PASSWORD: "<PLEASE_CHANGE>"
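If you need a strong random value for the master key password, one option (an assumption for illustration, not a requirement of Privacera Manager) is to generate it with openssl:

```shell
# Generate a random 24-character base64 password suitable for
# RANGER_KMS_MASTER_KEY_PASSWORD (18 random bytes -> 24 base64 characters).
openssl rand -base64 18
```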
  3. Run the following command.

    cp config/sample-vars/vars.peg.yml config/custom-vars/
  4. (Optional) If you want to customize PEG configuration further, you can add custom PEG properties. For more information, refer to PEG Custom Properties.

    For example, by default, the username and password for the PEG service are padmin/padmin. If you choose to change them, refer to Add Custom Properties.

  5. Run Privacera Manager to update the Privacera Platform configuration:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update

    If this is a Kubernetes deployment, update all Privacera services:

    ./privacera-manager.sh update
AWS S3 bucket encryption

You can set up server-side encryption for an AWS S3 bucket to encrypt the resources in the bucket. The supported encryption types are Amazon S3-managed keys (SSE-S3), AWS Key Management Service keys (SSE-KMS), and customer-provided keys (SSE-C). An encryption key is mandatory for SSE-C, optional for SSE-KMS, and not required for SSE-S3. For more information, see Protecting data using server-side encryption in the AWS documentation.

Configure bucket encryption in dataserver
  1. SSH to EC2 instance where Privacera Dataserver is installed.

  2. Enable use of bucket encryption configuration in Privacera Dataserver.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.dataserver.aws.yml config/custom-vars/
    vi config/custom-vars/vars.dataserver.aws.yml
    

    Add the new property.

    DATA_SERVER_AWS_S3_ENCRYPTION_ENABLE: "true"
    DATA_SERVER_AWS_S3_ENCRYPTION_MAPPING:
      - "bucketA|<encryption-type>|<base64encodedssekey>"
      - "bucketB*,BucketC|<encryption-type>|<base64encodedssekey>"
    

    Property

    Description

    DATA_SERVER_AWS_S3_ENCRYPTION_ENABLE

    Property to enable or disable the AWS S3 bucket encryption support.

    DATA_SERVER_AWS_S3_ENCRYPTION_MAPPING

    Property to set the mapping of S3 buckets, encryption SSE type, and SSE key (base64 encoded). For example, "bucketC*,BucketD|SSE-KMS|<base64 encoded sse key>".

    The base64-encoded encryption key must be set in the following cases: 1) the encryption type is SSE-KMS and a customer-managed CMK is used for encryption; 2) the encryption type is SSE-C.

Server-Side encryption with Amazon S3-Managed Keys (SSE-S3)

Supported S3 APIs for SSE-S3 Encryption:

  • PUT Object

  • PUT Object - Copy

  • POST Object

  • Initiate Multipart Upload

Bucket policy
{
  "Version": "2012-10-17",
  "Id": "PutObjectPolicy",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::{{sse-s3-encrypted-bucket}}/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Sid": "DenyUnencryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::{{sse-s3-encrypted-bucket}}/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
  • Upload a test file.

    aws s3 cp myfile.txt s3://{{sse-s3-encrypted-bucket}}/
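Before applying a bucket policy with `aws s3api put-bucket-policy`, you can sanity-check that the file is well-formed JSON. This sketch assumes you saved the policy above as policy.json:

```shell
# Validate that policy.json parses as JSON before uploading it.
python3 -m json.tool policy.json > /dev/null && echo "policy.json is valid JSON"
```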
    
Server-Side encryption with CMKs stored in AWS Key Management Service (SSE-KMS)

Supported APIs for SSE-KMS Encryption:

  • PUT Object

  • PUT Object - Copy

  • POST Object

  • Initiate Multipart Upload

Your IAM role should have kms:Decrypt permission when you upload or download an Amazon S3 object encrypted with an AWS KMS CMK. This is in addition to the kms:ReEncrypt, kms:GenerateDataKey, and kms:DescribeKey permissions.

AWS Managed CMKs (SSE-KMS)

Bucket Policy

{
  "Version": "2012-10-17",
  "Id": "PutObjectPolicy",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::{{sse-kms-encrypted-bucket}}/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    },
    {
      "Sid": "DenyUnencryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::{{sse-kms-encrypted-bucket}}/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
  • Upload a test file.

    aws s3 cp myfile.txt s3://{{sse-kms-encrypted-bucket}}/
    
Customer Managed CMKs (SSE-KMS)

Bucket Policy

{
  "Version": "2012-10-17",
  "Id": "PutObjectPolicy",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::{{sse-kms-encrypted-bucket}}/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    },
    {
      "Sid": "RequireKMSEncryption",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::{{sse-kms-encrypted-bucket}}/*",
      "Condition": {
        "StringNotLikeIfExists": {
          "s3:x-amz-server-side-encryption-aws-kms-key-id": "{{aws-kms-key}}"
        }
      }
    },
    {
      "Sid": "DenyUnencryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::{{sse-kms-encrypted-bucket}}/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
  • Upload a test file.

    aws s3 cp privacera_aws.sh s3://{{sse-kms-encrypted-bucket}}/
    
Server-Side encryption with Customer-Provided Keys (SSE-C)

Supported APIs for SSE-C Encryption:

  • PUT Object

  • PUT Object - Copy

  • POST Object

  • Initiate Multipart Upload

  • Upload Part

  • Upload Part - Copy

  • Complete Multipart Upload

  • Get Object

  • Head Object

  • Update the privacera_aws_config.json file with bucket and SSE-C encryption key.

    • Run AWS S3 upload.

      aws s3 cp myfile.txt s3://{{sse-c-encrypted-bucket}}/
      
    • Run head-object.

      aws s3api head-object --bucket {{sse-c-encrypted-bucket}} --key myfile.txt
      

Sample keys:

Key

Value

AES256-bit key

E1AC89EFB167B29ECC15FF75CC5C2C3A

Base64-encoded encryption key (sseKey)

echo -n "E1AC89EFB167B29ECC15FF75CC5C2C3A" | openssl enc -base64

Base64-encoded 128-bit MD5 digest of the encryption key

echo -n "E1AC89EFB167B29ECC15FF75CC5C2C3A" | openssl dgst -md5 -binary | openssl enc -base64
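The two openssl commands in the table above can be run as follows. The 32-character string is the sample key from the table, not a key you should use in production:

```shell
KEY="E1AC89EFB167B29ECC15FF75CC5C2C3A"

# Base64-encoded encryption key (sseKey)
echo -n "$KEY" | openssl enc -base64
# -> RTFBQzg5RUZCMTY3QjI5RUNDMTVGRjc1Q0M1QzJDM0E=

# Base64-encoded 128-bit MD5 digest of the encryption key
echo -n "$KEY" | openssl dgst -md5 -binary | openssl enc -base64
```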

Ranger KMS
Integrate with Azure key vault

This topic shows how to configure Ranger Key Management Service (KMS) with Azure Key Vault to enable data encryption. The master key for encryption is created in Ranger KMS and stored in Azure Key Vault instead of the Ranger database.

Note: You can manually migrate the Ranger KMS master key from the Ranger database to Azure Key Vault. For more information, refer to Migrate Ranger KMS Master Key.

Prerequisites
CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.crypto.azurekeyvault.yml config/custom-vars/
    vi config/custom-vars/vars.crypto.azurekeyvault.yml
  3. Edit the following properties. For property details and description, refer to the Configuration Properties below.

    AZURE_KEYVAULT_SSL_ENABLED: "<PLEASE_CHANGE>"
    AZURE_KEYVAULT_CLIENT_ID: "<PLEASE_CHANGE>"
    AZURE_KEYVAULT_CLIENT_SECRET: "<PLEASE_CHANGE>"
    AZURE_KEYVAULT_CERT_FILE: "<PLEASE_CHANGE>"
    AZURE_KEYVAULT_CERTIFICATE_PASSWORD: "<PLEASE_CHANGE>"
    AZURE_KEYVAULT_MASTERKEY_NAME: "<PLEASE_CHANGE>"
    AZURE_KEYVAULT_MASTER_KEY_TYPE: "<PLEASE_CHANGE>"
    AZURE_KEYVAULT_ZONE_KEY_ENCRYPTION_ALGO: "<PLEASE_CHANGE>"
    AZURE_KEYVAULT_URL: "<PLEASE_CHANGE>"
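A filled-in sketch using the example values from the Configuration properties table below (all values are illustrative placeholders, not real credentials):

```yaml
AZURE_KEYVAULT_SSL_ENABLED: "true"
AZURE_KEYVAULT_CLIENT_ID: "50fd7ca6-xxxx-xxxx-a13f-1xxxxxxxx"
AZURE_KEYVAULT_CLIENT_SECRET: "<AzureKeyVaultPassword>"
AZURE_KEYVAULT_CERT_FILE: "azure-key-vault.pem"
AZURE_KEYVAULT_CERTIFICATE_PASSWORD: "certPass"
AZURE_KEYVAULT_MASTERKEY_NAME: "RangerMasterKey"
AZURE_KEYVAULT_MASTER_KEY_TYPE: "RSA"
AZURE_KEYVAULT_ZONE_KEY_ENCRYPTION_ALGO: "RSA_OAEP"
AZURE_KEYVAULT_URL: "https://keyvault.vault.azure.net/"
```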
  4. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
Configuration properties

Property

Description

Example

AZURE_KEYVAULT_SSL_ENABLED

Activate Azure Key Vault.

true

AZURE_KEYVAULT_CLIENT_ID

Get the ID by following the Pre-requisites section above.

50fd7ca6-xxxx-xxxx-a13f-1xxxxxxxx

AZURE_KEYVAULT_CLIENT_SECRET

Get the client secret by following the Pre-requisites section above.

<AzureKeyVaultPassword>

AZURE_KEYVAULT_CERT_FILE

Get the file by following the Pre-requisites section above.

Ensure the file is copied in the config/ssl folder, and give it a name.

azure-key-vault.pem

AZURE_KEYVAULT_CERTIFICATE_PASSWORD

Get the value by following the Pre-requisites section above.

certPass

AZURE_KEYVAULT_MASTERKEY_NAME

Enter the name of the master key. A key with this name will be created in Azure Key Vault.

RangerMasterKey

AZURE_KEYVAULT_MASTER_KEY_TYPE

Enter a type of master key.

Values: RSA, RSA_HSM, EC, EC_HSM, OCT

RSA

AZURE_KEYVAULT_ZONE_KEY_ENCRYPTION_ALGO

Enter an encryption algorithm for the master key.

Values: RSA_OAEP, RSA_OAEP_256, RSA1_5

RSA_OAEP

AZURE_KEYVAULT_URL

Get the URL by following the Pre-requisites section above.

https://keyvault.vault.azure.net/

AuthZ / AuthN

LDAP / LDAP-S for Privacera Portal access

This configuration sequence sets up Privacera Portal to authenticate user logins against an external LDAP or LDAP-over-SSL (LDAPS) directory.

Prerequisites

Before starting these steps, prepare the following. You need to configure various Privacera properties with these values, as detailed in Configuration.

Determine the following LDAP values:

  • The FQDN and protocol (ldap or ldaps) of your LDAP server

  • Complete Bind DN

  • Bind DN password

  • Top-level search base

  • User search base

  • Group search base

  • Username attribute

  • DN attribute

To configure an SSL-enabled LDAP server, Privacera requires the server's SSL certificate. Set the Privacera property PORTAL_LDAP_SSL_ENABLED: "true", and then choose one of these alternatives:

  • Allow Privacera Manager to download the certificate and create the truststore based on the LDAP server URL. Set the Privacera property PORTAL_LDAP_SSL_PM_GEN_TS: "true".

  • Manually configure a truststore on the Privacera server that contains the certificate of the LDAP server. Set the Privacera property PORTAL_LDAP_SSL_PM_GEN_TS: "false".

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the commands below.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.portal.ldaps.yml config/custom-vars/
    vi config/custom-vars/vars.portal.ldaps.yml
    
  3. Uncomment the properties and edit the configurations as required. For property details and description, refer to the Configuration Properties below.

    PORTAL_LDAP_ENABLE: "true"
    PORTAL_LDAP_URL: "<PLEASE_CHANGE>"
    PORTAL_LDAP_BIND_DN: "<PLEASE_CHANGE>"
    PORTAL_LDAP_BIND_PASSWORD: "<PLEASE_CHANGE>"
    PORTAL_LDAP_SEARCH_BASE: "<PLEASE_CHANGE>"
    PORTAL_LDAP_USER_SEARCH_BASE: "<PLEASE_CHANGE>"
    PORTAL_LDAP_GROUP_SEARCH_BASE: "<PLEASE_CHANGE>"
    PORTAL_LDAP_USERNAME_ATTRIBUTE: "<PLEASE_CHANGE>"
    PORTAL_LDAP_DN_ATTRIBUTE: "<PLEASE_CHANGE>"
    PORTAL_LDAP_BIND_ANONYMOUSLY: "false"
    PORTAL_LDAP_SSL_ENABLED: "true"
    PORTAL_LDAP_SSL_PM_GEN_TS: "true"
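A filled-in sketch using the example values from the Configuration properties table below (the bind DN, search bases, and password are illustrative placeholders for your own directory):

```yaml
PORTAL_LDAP_ENABLE: "true"
PORTAL_LDAP_URL: "xxx.example.com:983"
PORTAL_LDAP_BIND_DN: "CN=Bind User,OU=example,DC=ad,DC=example,DC=com"
PORTAL_LDAP_BIND_PASSWORD: "<bind-password>"
PORTAL_LDAP_SEARCH_BASE: "ou=example,dc=ad,dc=example,dc=com"
PORTAL_LDAP_USER_SEARCH_BASE: "ou=example,dc=ad,dc=example,dc=com"
PORTAL_LDAP_GROUP_SEARCH_BASE: "OU=example_services,OU=example,DC=ad,DC=example,DC=com"
PORTAL_LDAP_USERNAME_ATTRIBUTE: "sAMAccountName"
PORTAL_LDAP_DN_ATTRIBUTE: "dc"
PORTAL_LDAP_BIND_ANONYMOUSLY: "false"
PORTAL_LDAP_SSL_ENABLED: "true"
PORTAL_LDAP_SSL_PM_GEN_TS: "true"
```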
    
  4. Run Privacera Manager update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Configuration properties

Property

Description

Example

PORTAL_LDAP_URL

Add the value in the form LDAP_HOST:LDAP_PORT.

xxx.example.com:983

PORTAL_LDAP_BIND_DN

CN=Bind User,OU=example,DC=ad,DC=example,DC=com

PORTAL_LDAP_BIND_PASSWORD

Add the password for the LDAP bind DN.

PORTAL_LDAP_SEARCH_BASE

ou=example,dc=ad,dc=example,dc=com

PORTAL_LDAP_USER_SEARCH_BASE

ou=example,dc=ad,dc=example,dc=com

PORTAL_LDAP_GROUP_SEARCH_BASE

OU=example_services,OU=example,DC=ad,DC=example,DC=com

PORTAL_LDAP_USERNAME_ATTRIBUTE

sAMAccountName

PORTAL_LDAP_DN_ATTRIBUTE

PORTAL_LDAP_DN_ATTRIBUTE: dc

PORTAL_LDAP_SSL_ENABLED

For SSL enabled LDAP server, set this value to true.

true

PORTAL_LDAP_SSL_PM_GEN_TS

Set this to true if you want Privacera Manager to generate the truststore for your ldaps server.

Set this to false if you want to manually provide the truststore certificate. To learn how to upload SSL certificates, [click here](../pm-ig/upload_custom_cert.md).

true

Map LDAP roles with the existing Privacera roles

You can map LDAP user roles to Privacera roles using Privacera LDAP Role Mapping. This lets you apply Privacera Portal access control to LDAP user roles.

  1. Log in to Privacera Portal using padmin user credentials or as a user with Privacera ROLE_SYSADMIN role.

  2. Go to Settings > System Configurations.

  3. Select the Custom Properties checkbox.

  4. Click on Add Property and enter the new property, auth.ldap.enabled=true.

  5. Click Save.

  6. Go to Settings > LDAP Role Mapping.

  7. Add the appropriate role mappings.

  8. When you log back in as an LDAP user, you will see the new user role. LDAP user login is possible only after the LDAP setup with Privacera Manager is completed.

Portal SSO with AAD using SAML

Privacera supports SAML, which allows you to authenticate users using single sign-on (SSO) technology and provides a way to access Privacera services.

Using the Azure Active Directory (AAD) SAML Toolkit, you can set up single sign-on (SSO) in Privacera Manager for Active Directory users. After setting up the SSO, you will be provided with an SSO button on the login page of Privacera Portal.

Prerequisites

To configure SSO with Azure Active Directory, you need to configure and enable SSL for the Privacera Portal. See Enable CA Signed Certificates or Enable Self Signed Certificates.

Configuring SAML in Azure AD

The following steps describe how to configure SAML in Azure AD application:

  1. Log in to Azure portal.

  2. On the left navigation pane, select the Azure Active Directory service.

  3. Navigate to Enterprise Applications and then select All Applications.

  4. To add a new application, select New application.

    Note

    If you have an existing Azure AD SAML Toolkit application, select it, and then go to step 8 to continue with the rest of the configuration.

  5. In the Add from the gallery section, type Azure AD SAML Toolkit in the search box.

  6. Select Azure AD SAML Toolkit from the results panel and then add the app.

  7. On the Azure AD SAML Toolkit application integration page, go to the Manage section and select Single sign-on.

  8. On the Select a single sign-on method page, select SAML.

  9. Click the pen icon for Basic SAML Configuration to edit the settings.

  10. On the Basic SAML Configuration page, enter the values for the following fields, and then click Save. You can assign a unique name for the Entity ID.

    • Entity ID = privacera-portal

    • Reply URL = https://${APP_HOSTNAME}:6868/saml/SSO

    • Sign-on URL = https://${APP_HOSTNAME}:6868/login.html

  11. In the SAML Signing Certificate section, find Federation Metadata XML and select Download to download the certificate and save it on your virtual machine.

  12. On the Set up Azure AD SAML Toolkit section, copy the Azure AD Identifier URL.

  13. In the Manage section, select Users and groups.

  14. In the Users and groups dialog, select the user or user group who should be allowed to log in with SSO, then click Select.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following command:

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.portal.saml.aad.yml config/custom-vars/
  3. Edit the vars.portal.saml.aad.yml file.

    vi config/custom-vars/vars.portal.saml.aad.yml

    Modify the SAML_ENTITY_ID. Set it to the Entity ID value you configured in the section above. For property details and description, refer to the Configuration Properties below.

    SAML_ENTITY_ID: "privacera-portal"
    SAML_BASE_URL: "https://{{app_hostname}}:6868"
    PORTAL_UI_SSO_ENABLE: "true"
    PORTAL_UI_SSO_URL: "saml/login"
    PORTAL_UI_SSO_BUTTON_LABEL: "Azure AD Login"
    AAD_SSO_ENABLE: "true"
  4. Rename the downloaded Federation Metadata XML file as privacera-portal-aad-saml.xml. Copy this file to the ~/privacera/privacera-manager/ansible/privacera-docker/roles/templates/custom folder.

  5. Run the following command:

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
  6. If you are configuring SSL in an Azure Kubernetes environment, run the following command.

     ./privacera-manager.sh restart portal
Configuration properties

Property

Description

Example

AAD_SSO_ENABLE

Enabled by default.

SAML_ENTITY_ID

Get the value from the Prerequisites section.

privacera-portal

SAML_BASE_URL

https://{{app_hostname}}:6868

PORTAL_UI_SSO_BUTTON_LABEL

Azure AD Login

PORTAL_UI_SSO_URL

saml/login

SAML_GLOBAL_LOGOUT

Enabled by default. Once a logout is initiated, all sessions accessed from the browser are terminated by the Identity Provider (IdP).

META_DATA_XML

Browse and select the Federation Metadata XML, which you downloaded in the Prerequisites section.

Validation

Go to the login page of the Privacera Portal. You will see the Azure AD Login button.

Configure SAML assertion attributes

By default, the following assertion attributes are configured with pre-defined values:

  • Email

  • Username

  • Firstname

  • Lastname

To customize the values of the assertion attributes, do the following:

  1. Run the following commands.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.portal.yml config/custom-vars/
    vi config/custom-vars/vars.portal.yml
  2. Add the following properties and assign your values. For more information on custom properties and their values, click here.

    SAML_EMAIL_ATTRIBUTE: ""
    SAML_USERNAME_ATTRIBUTE: ""
    SAML_LASTNAME_ATTRIBUTE: ""
    SAML_FIRSTNAME_ATTRIBUTE: ""
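For example, if your IdP sends the common user.* attribute names (an assumption; match the attribute mapping configured in your IdP), the filled-in properties might look like:

```yaml
SAML_EMAIL_ATTRIBUTE: "user.email"
SAML_USERNAME_ATTRIBUTE: "user.login"
SAML_LASTNAME_ATTRIBUTE: "user.lastName"
SAML_FIRSTNAME_ATTRIBUTE: "user.firstName"
```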
  3. Run Privacera Manager update to apply the changes.

     cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
Portal SSO with Okta using SAML

Okta is a third-party identity provider, offering single sign-on (SSO) authentication and identity validation services for a large number of Software-as-a-Service providers. Privacera works with Okta's SAML (Security Assertion Markup Language) interface to provide SSO/Okta login authentication to the Privacera portal. For more information, see CLI configuration.

Integration with Okta begins with configuration steps in the Okta administrator console. These steps also generate a Privacera portal account-specific identity_provider_metadata.xml file and an Identity Provider URL that are used in the Privacera CLI configuration steps.

Prerequisites

To configure SSO with Okta , you need to configure and enable SSL for the Privacera Portal. See Enable CA Signed Certificates or Enable Self Signed Certificates.

Note

To use Okta SSO with Privacera portal, you must have already established an Okta SSO service account. The following procedures require Okta SSO administrative login credentials.

Generate an Okta Identity Provider Metadata File and URL
  1. Log in to your Okta account as the Okta SSO account administrator.

  2. Select Applications from the left navigation panel, then click Applications subcategory.

  3. From the Applications page, click Create App Integration.

    Note

    In addition to creating new applications you can also edit existing apps with new configuration values.

  4. Select SAML 2.0, then click Next.

  5. In General Settings, provide a short descriptive app name in the App name text box. For example, enter Privacera Portal SAML.

  6. Click Next.

  7. In the SAML Settings configuration page, enter the values as shown in the following table:

    Field

    Value

    Single sign on URL

    https://<portal_hostname>:6868/saml/SSO

    Audience URI (SP Entity ID)

    privacera_portal

    Default RelayState

    The value identifies a specific application resource in an IDP initiated SSO scenario. In most cases this field will be left blank.

    Name ID format

    Unspecified

    Application username

    Okta username

    UserID

    user.login

    Email

    user.email

    Firstname

    user.firstName

    LastName

    user.lastName

    Note

    If the user’s login id is not the same as the username (for example, if the login id is an email address), this attribute is used as the username in the portal. The username is the email with the domain part removed: for "john.joe@company.com", the username is "john.joe". If another attribute should serve as the username, map that attribute here instead.

  8. Click Next.

  9. Select the Feedback tab and click I'm an Okta customer adding an internal app.

  10. Click Finish.

  11. From the General tab, scroll down to the App Embed Link section. Copy the Embed Link (Identity Provider URL) for PrivaceraCloud.
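The username derivation described in the note for the SAML Settings above (a login id with the email domain stripped) can be sketched as follows. This is a minimal illustration, not Privacera's actual implementation:

```python
def derive_portal_username(login_id: str) -> str:
    """Derive the portal username from a SAML login id.

    If the login id is an email address, the domain part is
    stripped; otherwise the login id is used unchanged.
    """
    if "@" in login_id:
        return login_id.split("@", 1)[0]
    return login_id

print(derive_portal_username("john.joe@company.com"))  # john.joe
print(derive_portal_username("jdoe"))                  # jdoe
```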

IdP provider metadata

In this topic, you will learn how to generate and save IdP provider metadata in XML format.

  1. Go to the Sign On tab. Under Settings, select the Identity Provider metadata link located at the bottom of the Sign on methods area. The configuration file will open in a separate window.

  2. In the SAML Signing Certificates section, click the Generate new certificate button.

  3. In the list, click the Actions dropdown and select View IdP metadata.

    The XML file will be opened in a new tab.

    Note

    Make sure that the certificate you are downloading has an active status.

  4. Save the file in XML format.

IdP-initiated SSO
  1. From Applications, log in to the Okta Home Page Dashboard as a user by selecting the Okta Dashboard icon.

  2. Log in to the Privacera Portal by selecting the newly added app icon.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following command:

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.portal.saml.aad.yml config/custom-vars/
  3. Edit the vars.portal.saml.aad.yml file.

    vi config/custom-vars/vars.portal.saml.aad.yml

    Modify the SAML_ENTITY_ID. Assign the value of the Entity ID obtained in the section above. For property details and descriptions, refer to the Configuration properties below.

    SAML_ENTITY_ID: "privacera-portal"
    SAML_BASE_URL: "https://{{app_hostname}}:6868"
    PORTAL_UI_SSO_ENABLE: "true"
    PORTAL_UI_SSO_URL: "saml/login"
    PORTAL_UI_SSO_BUTTON_LABEL: "Okta Login"
    AAD_SSO_ENABLE: "true"
  4. Rename the downloaded Federation Metadata XML file as privacera-portal-aad-saml.xml. Copy this file to the ~/privacera/privacera-manager/ansible/privacera-docker/roles/templates/custom folder.

  5. Run the following command:

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
  6. If you are configuring SSL in an Azure Kubernetes environment, run the following command.

    ./privacera-manager.sh restart portal
Configuration properties

Property

Description

Example

AAD_SSO_ENABLE

Enabled by default.

SAML_ENTITY_ID

Get the value from the Prerequisites section.

privacera-portal

SAML_BASE_URL

https://{{app_hostname}}:6868

PORTAL_UI_SSO_BUTTON_LABEL

Okta Login

PORTAL_UI_SSO_URL

saml/login

SAML_GLOBAL_LOGOUT

Enabled by default. The global logout for SAML is enabled. Once a logout is initiated, all the sessions accessed from the browser are terminated at the Identity Provider (IdP).

META_DATA_XML

Browse and select the Federation Metadata XML, which you downloaded in the Prerequisites section.

Validation

Go to the login page of the Privacera Portal. You will see the Okta Login button.

Configure SAML assertion attributes

By default, the following assertion attributes are configured with pre-defined values:

  • Email

  • Username

  • Firstname

  • Lastname

You can customize the values for the assertion attributes. To do that, do the following:

  1. Run the following commands.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.portal.yml config/custom-vars/
    vi config/custom-vars/vars.portal.yml
  2. Add the following properties and assign your values. For more information on custom properties and their values, click here.

    SAML_EMAIL_ATTRIBUTE: ""
    SAML_USERNAME_ATTRIBUTE: ""
    SAML_LASTNAME_ATTRIBUTE: ""
    SAML_FIRSTNAME_ATTRIBUTE: ""
  3. Run Privacera Manager update to apply the properties.

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update
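To illustrate what the four SAML_*_ATTRIBUTE properties control, here is a hedged sketch of mapping a parsed SAML assertion onto portal user fields. The function, dictionary shapes, and sample values are illustrative, not Privacera's code:

```python
# Illustrative only: the mapping mirrors the vars.portal.yml properties,
# and the assertion dict stands in for a parsed SAML response.
ATTRIBUTE_CONFIG = {
    "email": "user.email",         # SAML_EMAIL_ATTRIBUTE
    "username": "user.login",      # SAML_USERNAME_ATTRIBUTE
    "lastname": "user.lastName",   # SAML_LASTNAME_ATTRIBUTE
    "firstname": "user.firstName", # SAML_FIRSTNAME_ATTRIBUTE
}

def map_assertion(assertion: dict) -> dict:
    """Pick the configured attributes out of a SAML assertion."""
    return {field: assertion.get(attr)
            for field, attr in ATTRIBUTE_CONFIG.items()}

assertion = {
    "user.login": "john.joe",
    "user.email": "john.joe@company.com",
    "user.firstName": "John",
    "user.lastName": "Joe",
}
print(map_assertion(assertion))
```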
Portal SSO with Okta using OAuth

This topic covers how you can integrate Okta SSO with Privacera Portal using Privacera Manager. Privacera Portal supports Okta as a login provider using OpenID, OAuth, or SAML. For more information about SAML configuration, see Portal SSO with Okta using SAML.

Prerequisites

Before you begin, ensure the following prerequisites are met:

  • Set up an Okta authorization server and note the values of the following for use in the Configuration section below.

  • authorization_endpoint

  • token_endpoint

  • Client ID

  • Client Secret

  • User Info URI

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following commands.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.okta.yml  config/custom-vars/
    vi config/custom-vars/vars.okta.yml

    Edit the values for the following. For property details and description, refer to the Configuration Properties below.

    OAUTH_CLIENT_CLIENTSECRET: "<PLEASE_CHANGE>"
    OAUTH_CLIENT_CLIENTID: "<PLEASE_CHANGE>"
    OAUTH_CLIENT_TOKEN_URI: "<PLEASE_CHANGE>"
    OAUTH_CLIENT_AUTH_URI: "<PLEASE_CHANGE>"
    OAUTH_RESOURCE_USER_INFO_URI: "<PLEASE_CHANGE>"
    PORTAL_UI_SSO_ENABLE: "true"
  3. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
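To see how the configured values fit together, the sketch below builds the authorization-code redirect from OAUTH_CLIENT_AUTH_URI and OAUTH_CLIENT_CLIENTID. This is illustrative only; the portal issues this redirect itself, and the callback URL shown is hypothetical:

```python
from urllib.parse import urlencode

def build_authorize_url(auth_uri: str, client_id: str,
                        redirect_uri: str) -> str:
    """Assemble the OAuth 2.0 authorization-code request the browser
    is sent to when the user clicks SSO Login (illustrative)."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid profile email",
    }
    return f"{auth_uri}?{urlencode(params)}"

url = build_authorize_url(
    "https://dev-396511.okta.com/oauth2/default/v1/authorize",  # OAUTH_CLIENT_AUTH_URI
    "0oa63edjkaoNHGYTS357",                                     # OAUTH_CLIENT_CLIENTID
    "https://portal-host:6868/login",                           # hypothetical callback
)
print(url)
```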
Configuration properties

Property

Description

Example

OAUTH_CLIENT_CLIENTSECRET

Get it from the Prerequisites section above.

OAUTH_CLIENT_CLIENTSECRET: "4hb88P9UZmxxxxxxxxm1WtqsaQRv1FZDZiaOT0Gm"

OAUTH_CLIENT_CLIENTID

Get it from the Prerequisites section above.

0oa63edjkaoNHGYTS357

OAUTH_CLIENT_TOKEN_URI

Get it from the Prerequisites section above.

https://dev-396511.okta.com/oauth2/default/v1/token

OAUTH_CLIENT_AUTH_URI

Get it from the Prerequisites section above.

https://dev-396511.okta.com/oauth2/default/v1/authorize

OAUTH_RESOURCE_USER_INFO_URI

Get it from the Prerequisites section above.

https://dev-396511.okta.com/oauth2/default/v1/userinfo

PORTAL_UI_SSO_ENABLE

Property to enable/disable Okta SSO login.

true

Validation
Log in to Privacera Portal using Okta SSO Login
  1. Log in to Privacera Portal.

  2. Click the SSO Login button.

    The Okta login page is displayed.

  3. Enter the Okta user login credentials. The Privacera Portal page is displayed.

Login to Privacera Portal using Privacera user credentials
  1. Log in to Privacera Portal.

  2. Enter the user credentials (padmin).

  3. Click Login button. The Privacera Portal page is displayed.

Portal SSO with PingFederate

Privacera portal leverages PingIdentity’s Platform Portal for authentication via SAML. For this integration, there are configuration steps in both Privacera portal and PingIdentity.

Configuration steps for PingIdentity
  1. Sign in to your PingIdentity account.

  2. Under Your Environments, click Administrators.

  3. Select Connections from the left menu.

  4. In the Applications section, click on the + button to add a new application.

  5. Enter an Application Name (such as Privacera Portal SAML) and provide a description (optionally add an icon). For the Application Type, select SAML Application. Then click Configure.

  6. On the SAML Configuration page, under "Provide Application Metadata", select Manually Enter.

  7. Enter the ACS URLs:

    https://<portal_hostname>:<PORT>/saml/SSO

    Enter the Entity ID:

    privacera-portal

    Click the Save button.

  8. On the Overview page for the new application, click on the Attributes edit button. Add the attribute mapping:

    user.login: Username

    Set as Required.

    Note

    If the user’s login id is not the same as the username (for example, if the login id is an email address), this attribute is used as the username in the portal. The username is the email with the domain part removed: for "john.joe@company.com", the username is "john.joe". If another attribute should serve as the username, map that attribute here instead.

  9. You can optionally add additional attribute mappings:

    user.email: Email Address 
    user.firstName: Given Name
    user.lastName: Family Name
  10. Click the Save button.

  11. Next in your application, select Configuration and then the edit icon.

  12. Set the SLO Endpoint:

    https://<portal_hostname>:<PORT>/login.html

    Click the Save button.

  13. In the Configuration section, under Connection Details, click on Download Metadata button.

  14. Once this file is downloaded, rename it to:

    privacera-portal-aad-saml.xml

    This file will be used in the Privacera Portal configuration.

Configuration steps in Privacera Portal

Now we will configure Privacera Portal using privacera-manager to use the privacera-portal-aad-saml.xml file created in the above steps.

  1. Run the following commands:

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.portal.saml.aad.yml config/custom-vars/
  2. Edit the vars.portal.saml.aad.yml file:

    vi config/custom-vars/vars.portal.saml.aad.yml

    Add the following properties:

    SAML_ENTITY_ID: "privacera-portal"
    SAML_BASE_URL: "https://{{app_hostname}}:{port}"
    PORTAL_UI_SSO_ENABLE: "true"
    PORTAL_UI_SSO_URL: "saml/login"
    PORTAL_UI_SSO_BUTTON_LABEL: "Single Sign On"
    AAD_SSO_ENABLE: "true"
  3. Copy the privacera-portal-aad-saml.xml file to the following folder:

    ~/privacera/privacera-manager/ansible/privacera-docker/roles/templates/custom
  4. Edit the vars.portal.yml file:

    cd ~/privacera/privacera-manager/
    vi config/custom-vars/vars.portal.yml

    Add the following properties and assign your values.

    SAML_EMAIL_ATTRIBUTE: "user.email"
    SAML_USERNAME_ATTRIBUTE: "user.login"
    SAML_LASTNAME_ATTRIBUTE: "user.lastName"
    SAML_FIRSTNAME_ATTRIBUTE: "user.firstName"
  5. Run the following to update privacera-manager:

    cd ~/privacera/privacera-manager/
    ./privacera-manager.sh update

    You should now be able to use Single Sign-on to Privacera using PingFederate.

JSON Web Tokens (JWT)

This topic shows how to authenticate Privacera services using JSON web tokens (JWT).

Supported services:

Prerequisites

Ensure the following prerequisites are met:

  • Get the identity provider URL that is allowed in the issuer claim of a JWT.

  • Get the public key from the provider that Privacera services can use to validate JWT.

Configuration
  1. SSH to the instance as USER.

  2. Copy the public key to the ~/privacera/privacera-manager/config/custom-properties folder. If you are configuring more than one JWT, copy all the public keys associated with the JWT tokens to the same path.

  3. Run the following commands.

    cd ~/privacera/privacera-manager/config
    cp sample-vars/vars.jwt-auth.yaml custom-vars
    vi custom-vars/vars.jwt-auth.yaml
  4. Edit the properties.

    Table 59. JWT Properties

    Property

    Description

    Example

    JWT_OAUTH_ENABLE

    Property to enable JWT auth in Privacera services.

    TRUE

    JWT_CONFIGURATION_LIST

    Property to set multiple JWT configurations.

    • issuer: URL of the identity provider.

    • subject: Subject of the JWT (the user).

    • secret: The shared secret, if the JWT is signed with one.

    • publickey: JWT file name that you copied in step 2 above.

    • userKey: Define a unique userkey.

    • groupKey: Define a unique group key.

    • parserType: Assign one of the following values.

      • PING_IDENTITY: When scope/group is an array.

      • KEYCLOAK: When scope/group is a space-separated string.

    JWT_CONFIGURATION_LIST:
      - index: 0
        issuer: "https://your-idp-domain.com/websec"
        subject: "api-token"
        secret: "tprivacera-api"
        publickey: "jwttoken.pub"
        userKey: "client_id"
        groupKey: "scope"
        parserType: "KEYCLOAK"
      - index: 1
        issuer: "https://your-idp-domain.com/websec2"
        publickey: "jwttoken2.pub"
        parserType: "PING_IDENTITY"
      - index: 2
        issuer: "https://your-idp-domain.com/websec3"
        publickey: "jwttoken3.pub"


  5. Run the update.

    cd ~/privacera/privacera-manager/
    
    ./privacera-manager.sh update
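At runtime, a service holding several entries in JWT_CONFIGURATION_LIST must match an incoming token to one of them. Below is a hedged sketch of issuer-based matching, assuming the service compares the token's iss claim against each configured issuer; signature verification with the configured public key would follow and is omitted here:

```python
import base64
import json

# Trimmed-down mirror of the JWT_CONFIGURATION_LIST example above.
JWT_CONFIGURATION_LIST = [
    {"index": 0, "issuer": "https://your-idp-domain.com/websec",
     "publickey": "jwttoken.pub", "parserType": "KEYCLOAK"},
    {"index": 1, "issuer": "https://your-idp-domain.com/websec2",
     "publickey": "jwttoken2.pub", "parserType": "PING_IDENTITY"},
]

def select_config(token: str):
    """Decode the (unverified) payload segment and return the
    configuration whose issuer matches the token's iss claim."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return next((c for c in JWT_CONFIGURATION_LIST
                 if c["issuer"] == claims.get("iss")), None)

# Build an unsigned sample token just to exercise the matching logic.
header = base64.urlsafe_b64encode(
    json.dumps({"alg": "RS256"}).encode()).rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(
    json.dumps({"iss": "https://your-idp-domain.com/websec2"}).encode()
).rstrip(b"=").decode()
token = f"{header}.{payload}.sig"

print(select_config(token)["index"])  # → 1
```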
    
JWT for Databricks
Configure

To configure JWT for Databricks, do the following:

  1. Enable JWT. To enable JWT, refer to Configuration above.

  2. (Optional) Create a JWT if you do not have one. Skip this step if you already have an existing token.

    To create a token, see JWT and use the following details. For more details, refer to the JWT docs.

    • Algorithm=RS256

    • When JWT_PARSER_TYPE is KEYCLOAK (scope/group is a space-separated string)

      {
        "scope": "jwt:role1 jwt:role2",
        "client_id": "privacera-test-jwt-user",
        "iss": "privacera",
        "exp": <PLEASE_UPDATE>
      }
    • When JWT_PARSER_TYPE is PING_IDENTITY (scope/group is an array)

      {
        "scope": [
          "jwt:role1",
          "jwt:role2"
        ],
        "client_id": "privacera-test-jwt-user",
        "iss": "privacera",
        "exp": <PLEASE_UPDATE>
      }
    • Paste public/private key in input box.

    • Copy the generated JWT Token.

  3. Log in to the Databricks portal and write the JWT to a file on the cluster, so that the Privacera plugin can read it and perform access control based on the token user.

    %python
    JWT_TOKEN="<PLEASE_UPDATE>"
    TOKEN_LOCAL_FILE="/tmp/ptoken.dat"
    f = open(TOKEN_LOCAL_FILE, "w")
    f.write(JWT_TOKEN)
    f.close()
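For a concrete picture of the token being written above, the sketch below assembles a minimal JWT from the KEYCLOAK-style payload. It uses HS256 with a shared secret because that needs only the standard library; the examples in this topic use RS256 public/private keys, so treat this purely as an illustration of the header.payload.signature structure:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """base64url-encode without trailing padding, per the JWT format."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_hs256_jwt(payload: dict, secret: str) -> str:
    """Build header.payload.signature with HMAC-SHA256.
    Illustrative only: the Privacera examples use RS256 keys."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret.encode(), f"{header}.{body}".encode(),
                   hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

token = make_hs256_jwt(
    {"scope": "jwt:role1 jwt:role2",
     "client_id": "privacera-test-jwt-user",
     "iss": "privacera"},
    secret="tprivacera-api",
)
print(token.count("."))  # → 2 (three dot-separated segments)
```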
Use case

Reading files from the cloud using JWT token

  1. From your notebook, read files stored with your cloud provider. Replace <path-to-your-cloud-files> with the location of your cloud files.

    %python
    spark.read.csv("<path-to-your-cloud-files>").show()
  2. Check the audits. To learn how to check the audits, click here.

    You should get JWT user (privacera-test-jwt-user) which was specified in the payload while creating the JWT.

  3. To give permissions on a resource, create a group in Privacera Portal matching the scope of the JWT payload and give access to the group. It is not necessary to create a user.

    The Privacera plugin extracts the JWT payload and passes the group during the access check. In other words, the user-group mapping is taken from the JWT payload itself, so it is not required to maintain user-group mappings in Privacera.
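The payload-to-group mapping described above can be sketched as follows, assuming the parserType semantics from the configuration table (KEYCLOAK: scope is a space-separated string; PING_IDENTITY: scope is an array). Illustrative only, not the plugin's actual code:

```python
def extract_groups(payload: dict, parser_type: str) -> list:
    """Return the group list from the scope claim.
    KEYCLOAK: scope is a space-separated string.
    PING_IDENTITY: scope is already an array."""
    scope = payload.get("scope", [])
    if parser_type == "KEYCLOAK":
        return scope.split()
    return list(scope)

keycloak_payload = {"scope": "jwt:role1 jwt:role2",
                    "client_id": "privacera-test-jwt-user"}
ping_payload = {"scope": ["jwt:role1", "jwt:role2"],
                "client_id": "privacera-test-jwt-user"}

print(extract_groups(keycloak_payload, "KEYCLOAK"))       # ['jwt:role1', 'jwt:role2']
print(extract_groups(ping_payload, "PING_IDENTITY"))      # ['jwt:role1', 'jwt:role2']
```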

JWT for EMR FGAC Spark
Prerequisite
Configuration Steps
  1. First, enable JWT. See Configuration above.

  2. Open the vars.emr.yml file.

    cd ~/privacera/privacera-manager
    vi config/custom-vars/vars.emr.yml
  3. Add the following property to enable JWT for EMR.

    EMR_JWT_OAUTH_ENABLE: "true"
  4. Run the update.

    cd ~/privacera/privacera-manager/ 
    
    ./privacera-manager.sh update
Validations with JWT Token
  1. Create a JWT, see Step 2 above.

  2. SSH to the EMR master node.

  3. Configure the Spark application as follows:

    JWT_TOKEN=eyJhbGciOiJSU-XXXXXX–X2BAIGWTbywHkfTxxw
    spark-sql --conf "spark.hadoop.privacera.jwt.token.str=${JWT_TOKEN}" --conf "spark.hadoop.privacera.jwt.oauth.enable=true"

Security

Enable self signed certificates with Privacera Platform

This topic provides instructions for use of Self-Signed Certificates with Privacera services including Privacera Portal, Apache Ranger, Apache Ranger KMS, and Privacera Encryption Gateway. It establishes a secure connection between internal Privacera components (Dataserver, Ranger KMS, Discovery, PolicySync, and UserSync) and SSL-enabled servers.

Note

Support Chain SSL - Preview Functionality

Previously, Privacera services used only one SSL certificate of the LDAP server even if a chain of certificates was available. Now, as preview functionality, all certificates available in the certificate chain are imported into the truststore. This applies to Privacera usersync, Ranger usersync, and portal SSL certificates.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following command.

    cd ~/privacera/privacera-manager 
    cp config/sample-vars/vars.ssl.yml config/custom-vars/ 
    vi config/custom-vars/vars.ssl.yml
  3. Set the passwords for the following configuration. The passwords must be at least six characters long and should include alphabetic, numeric, and symbol characters.

    SSL_DEFAULT_PASSWORD: "<PLEASE_CHANGE>" 
    RANGER_PLUGIN_SSL_KEYSTORE_PASSWORD: "<PLEASE_CHANGE>" 
    RANGER_PLUGIN_SSL_TRUSTSTORE_PASSWORD: "<PLEASE_CHANGE>"

    Note

    You can enable/disable SSL for specific Privacera services. For more information, refer to Configure SSL for Privacera Services.

  4. Run Privacera Manager update.

    cd ~/privacera/privacera-manager
    
    ./privacera-manager.sh update
    
  5. For Kubernetes based deployments, restart services:

    cd ~/privacera/privacera-manager
    
    ./privacera-manager.sh restart
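The password rule from step 3 above (at least six characters, with alphabetic, numeric, and symbol characters) can be sanity-checked before running the update. This helper is illustrative; Privacera Manager does not ship it:

```python
def meets_ssl_password_policy(password: str) -> bool:
    """At least six characters, containing alphabetic, numeric,
    and symbol characters (illustrative check)."""
    return (len(password) >= 6
            and any(c.isalpha() for c in password)
            and any(c.isdigit() for c in password)
            and any(not c.isalnum() for c in password))

print(meets_ssl_password_policy("Str0ng@Pass"))  # True
print(meets_ssl_password_policy("short"))        # False
```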
Enable CA signed certificates with Privacera Platform

This topic provides instructions for use of CA Signed Certificates with Privacera services including Privacera Portal, Apache Ranger, Apache Ranger KMS, and Privacera Encryption Gateway. It establishes a secure connection between internal Privacera components (Dataserver, Ranger KMS, Discovery, PolicySync, and UserSync) and SSL-enabled servers.

Certificate Authority (CA) or third-party generated certificates must be created for the specific hostname subdomain.

Privacera supports signed certificates as 'pem' files.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Copy the public (ssl_cert_full_chain.pem) and private key (ssl_cert_private_key.pem) files to the ~/privacera/privacera-manager/config/ssl/ location.

  3. Create and open the vars.ssl.yml file.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.ssl.yml config/custom-vars/
    vi config/custom-vars/vars.ssl.yml
    
  4. Set values for the following properties:

    • SSL_SELF_SIGNED: "false"

    • SSL_DEFAULT_PASSWORD (use a strong password with uppercase and lowercase letters, symbols, and numbers)

    • Uncomment Property/Value pairs and set the appropriate value for:

      #PRIVACERA_PORTAL_KEYSTORE_ALIAS
      
      #PRIVACERA_PORTAL_KEYSTORE_PASSWORD
      
      #PRIVACERA_PORTAL_TRUSTSTORE_PASSWORD
      
      #RANGER_ADMIN_KEYSTORE_ALIAS
      
      #RANGER_ADMIN_KEYSTORE_PASSWORD
      
      #RANGER_ADMIN_TRUSTSTORE_PASSWORD
      
      #DATASERVER_SSL_TRUSTSTORE_PASSWORD
      
      #USERSYNC_AUTH_SSL_TRUSTSTORE_PASSWORD
      

      If KMS is enabled, uncomment and set the following:

      #RANGER_KMS_KEYSTORE_ALIAS

      #RANGER_KMS_KEYSTORE_PASSWORD: "<PLEASE_CHANGE>"

      #RANGER_KMS_TRUSTSTORE_PASSWORD: "<PLEASE_CHANGE>"
      

      If PEG is enabled, uncomment and set the following:

      #PEG_KEYSTORE_ALIAS
      
      #PEG_KEYSTORE_PASSWORD
      
      #PEG_TRUSTSTORE_PASSWORD
      
      SSL_SELF_SIGNED: "false"
      SSL_DEFAULT_PASSWORD: "<PLEASE_CHANGE>"
      #SSL_SIGNED_PEM_FULL_CHAIN: "ssl_cert_full_chain.pem"
      #SSL_SIGNED_PEM_PRIVATE_KEY: "ssl_cert_private_key.pem"
      SSL_SIGNED_CERT_FORMAT: "pem"
      
      #PRIVACERA_PORTAL_KEYSTORE_ALIAS: "<PLEASE_CHANGE>"
      #PRIVACERA_PORTAL_KEYSTORE_PASSWORD: "<PLEASE_CHANGE>"
      #PRIVACERA_PORTAL_TRUSTSTORE_PASSWORD: "<PLEASE_CHANGE>"
      
      #RANGER_ADMIN_KEYSTORE_ALIAS: "<PLEASE_CHANGE>"
      #RANGER_ADMIN_KEYSTORE_PASSWORD: "<PLEASE_CHANGE>"
      #RANGER_ADMIN_TRUSTSTORE_PASSWORD: "<PLEASE_CHANGE>"
      
      #DATASERVER_SSL_TRUSTSTORE_PASSWORD: "<PLEASE_CHANGE>"
      
      #USERSYNC_AUTH_SSL_TRUSTSTORE_PASSWORD: "<PLEASE_CHANGE>"
      
      #Below is needed only if you have KMS enabled
      #RANGER_KMS_KEYSTORE_ALIAS: "<PLEASE_CHANGE>"
      #RANGER_KMS_KEYSTORE_PASSWORD: "<PLEASE_CHANGE>"
      #RANGER_KMS_TRUSTSTORE_PASSWORD: "<PLEASE_CHANGE>"
      
      #Below is needed only if you have PEG enabled
      #PEG_KEYSTORE_ALIAS: "<PLEASE_CHANGE>"
      #PEG_KEYSTORE_PASSWORD: "<PLEASE_CHANGE>"
      #PEG_TRUSTSTORE_PASSWORD: "<PLEASE_CHANGE>"
      
  5. Add domain names for the Privacera services. See Add Domain Names for Privacera Service URLs.

  6. Run the following commands.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
  7. For Kubernetes based deployments, restart services:

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh restart
    
Add domain names for Privacera service URLs

Note

If you have Nginx ingress enabled in your environment, then the configuration described below would not be required. For more information on Nginx ingress, see Externalize Access to Privacera Services - Nginx Ingress.

You can expose Privacera services such as Portal, Ranger, AuditServer, DataServer and PEG to be accessed externally and configure a domain name to point to them. You can use DNS service to host DNS records needed for them.

Configuration
  1. Create a vars.service_hostname.yml file.

    vi config/custom-vars/vars.service_hostname.yml
    
  2. Depending on the services you want to expose, add the properties in the file. Replace <PLEASE_CHANGE> with a hostname.

    PORTAL_HOST_NAME: "<PLEASE_CHANGE>"
    DATASERVER_HOST_NAME: "<PLEASE_CHANGE>"
    RANGER_HOST_NAME: "<PLEASE_CHANGE>"
    PEG_HOST_NAME: "<PLEASE_CHANGE>"
    AUDITSERVER_HOST_NAME: "<PLEASE_CHANGE>"
    
  3. Create CNAME records to point them to the service load balancer URLs. If you are installing Privacera and its services for the first time, you must complete the installation and then return to this step to create CNAME records.

    1. Run the following command to get the service URL. Replace <name_space> with your Kubernetes namespace.

      kubectl get svc -n <name_space>
      
    2. Using your DNS service, create CNAME records that point to the service URLs.

  4. Run the update.

    cd ~/privacera/privacera-manager
    ./privacera-manager.sh update
    
Enable password encryption for Privacera services

This topic covers how you can enable encryption of secrets for Privacera services such as Privacera Portal, Privacera Dataserver, Privacera Ranger, Ranger Usersync, Privacera Discovery, Ranger KMS, Crypto, PEG, and Privacera PolicySync. The passwords will be stored safely in keystores, instead of being exposed in plaintext.

By default, all the sensitive data of the Privacera services are encrypted.

CLI configuration
  1. SSH to the instance where Privacera is installed.

  2. Run the following command.

    cd ~/privacera/privacera-manager
    cp config/sample-vars/vars.encrypt.secrets.yml config/custom-vars/
    vi config/custom-vars/vars.encrypt.secrets.yml
    
  3. In this file set values for the following:

    Enter a password for the keystore that will hold all the secrets, for example, Str0ngP@ssw0rd.

    GLOBAL_DEFAULT_SECRETS_KEYSTORE_PASSWORD: "<PLEASE_CHANGE>"

    If you want to encrypt data of a Privacera service, you can enter the name of the property.

    Examples

    To encrypt properties used by Privacera Portal:

    PORTAL_ADD_ENCRYPT_PROPS_LIST:
      - PRIVACERA_PORTAL_DATASOURCE_URL
      - PRIVACERA_PORTAL_DATASOURCE_USERNAME
    

    To encrypt properties used by Dataserver:

    DATASERVER_ADD_ENCRYPT_PROPS_LIST:
      - DATASERVER_MAC_ALGORITHM

    To encrypt new properties used by PolicySync, add the property in the vars.encrypt.secrets.yml file displayed in the New tab. To encrypt the old properties, uncomment the property in the vars.encrypt.secrets.yml file displayed in the Old tab. For more information on the new and old properties, see PolicySync - Redshift.

    New

    POLICYSYNC_V2_ADD_ENCRYPT_PROPS_LIST:
      - REDSHIFT_PASSWORD
    

    Old

    POLICYSYNC_ADD_ENCRYPT_PROPS_LIST:
      - REDSHIFT_PASSWORD
    

    To encrypt properties used by Encryption:

    #Additional properties to be encrypted for Crypto
    CRYPTO_ENCRYPT_PROPS_LIST:
      -

  4. Run the following command.

    ./privacera-manager.sh update
    

    For a Kubernetes configuration, you also need to run the following command:

    ./privacera-manager.sh restart
    
  5. To check the keystores generated for the respective services, run the following command.

    ls ~/privacera/privacera-manager/config/keystores