Skip to main content

Privacera Documentation

Table of Contents

Add JDBC-based systems as data sources for Discovery on Privacera Platform

The following systems can be connected to Privacera Discovery as data sources via Java Database Connectivity (JDBC):

  • Amazon Aurora

  • Microsoft SQL Server

  • MySQL

  • Oracle

  • Postgres

  • PrestoSQL

    Note

    Starburst PrestoSQL versions are supported through version 350-e.

  • Redshift

  • Snowflake

  • Spark SQL

  • Synapse

  • Trino

  • Starburst

  • Databricks Unity Catalog

  • Vertica

  • EMR Hive

The general process to connect the JDBC-based systems as data sources for Discovery is as follows:

  1. Create or identify a service user in the target system with read/write privileges.

  2. Determine the JDBC connection string to the data and database name in that target.

  3. Define these details as properties in the Privacera Platform.

Prerequisites

Ensure the following details are available before starting the data source definition in Privacera:

  • A username and password in the target system that has read/write permission.

  • The name of the JDBC driver you need.

  • A JDBC connection string to communicate with the target data source.

  • Keytab and krb5.conf file (Applicable only for EMR Hive)

    Note

    For EMR Hive, the Username and Password are not required. You can type any dummy or random values in Username and Password fields only for the system acceptance purpose.

Required properties for JBDC data sources on Privacera Platform

Values for the following properties are described in Required Name of JDBC Driver per Target System, Username and Password, and Required JDBC Connection String.

Note

The format of the jdbc.url value varies by target system. Not all systems require databaseName.

jdbc.driver.class=<jdbc_driver_name>
jdbc.username=<user_with_readwrite_permission>
jdbc.password=<login_credentials_of_identified_user>
jdbc.url=jdbc:<protocol>://<hostname>:<port>;databaseName=<database_name>

Note

Additionally, following properties are required only for EMR Hive:

kerberos.principal = <principle>
Enable Kerberos = true

Required name of JDBC Driver per target system

Depending on the target system, for the jdbc.driver.class definition you enter in the Privacera properties, use one of the JDBC drivers shown as follows:

  • Amazon Aurora: org.mariadb.jdbc.Driver

  • Microsoft SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver

  • MySQL: com.mysql.jdbc.Driver

  • Oracle: oracle.jdbc.driver.OracleDriver

  • Postgres: org.postgresql.Driver

  • PrestoSQL: org.apache.hive.jdbc.HiveDriver

  • Redshift: com.amazon.redshift.jdbc.Driver

  • Snowflake: net.snowflake.client.jdbc.SnowflakeDriver

  • Spark SQL (Databricks): org.apache.hive.jdbc.HiveDriver

  • Synapse: com.microsoft.sqlserver.jdbc.SQLServerDriver

  • Trino: io.trino.jdbc.TrinoDriver

  • Starburst: io.trino.jdbc.TrinoDriver

  • Databricks Unity Catalog: com.databricks.client.jdbc.Driver

  • Vertica: com.vertica.jdbc.Driver

  • EMR Hive: org.apache.hive.jdbc.HiveDriver

Username and password

Identify the name of a user who must have read/write permission in your data source and the login credentials for that user. These values are needed for jdbc.username and jdbc.password properties in Privacera.

Note

For EMR Hive, type any dummy values in the Username and Password fields only for the system acceptance purpose. Authentication for the EMR is done using keytab file.

Required JDBC connection string

The jdbc.url value you enter in the Privacera properties must be one of the following, where <domainName>, <port>, and other variables are for your specific environment:

  • Amazon Aurora: jdbc:mysql://<domainName>:<port>/<dbname>

  • Microsoft SQL Server: jdbc:sqlserver://<domainName>:<port>;databaseName=<db_name>

  • MySQL: jdbc:mysql://<domainName>:<port>/<dbname>

  • Oracle: jdbc:oracle:thin:@//<domainName>:<port>/<dbname>.localdomain

  • Postgres: jdbc:postgresql://<domainName>:<port>/<dbname>

  • PrestoSQL: jdbc:presto://<domainName>:<port>/<catalog_name>

  • Redshift: jdbc:postgresql://<domainName>:<port>/<dbname>

  • Snowflake: jdbc:snowflake://<domainName>:<port>/?warehouse=<name_of_policysync_warehouse>

  • Databricks Spark SQL: jdbc:hive2://<domainName>:<port>/default;transportMode=http;ssl=true;httpPath=sql/protocolv1/o/0/xxxx-xxxxxx-xxxxxxxx;AuthMech=3;

  • Synapse: jdbc:sqlserver://<domainName>:<port>;databaseName=<dbname>

  • Trino: jdbc:trino://<host>:<port>/<catalog>

  • Starburst: jdbc:trino://<host>:<port>/<catalog>

    Note

    The following three databases can be added as catalog on Trino and Starburst server:

    • MySQL

    • Oracle

    • PostgreSQL

  • Databricks Unity Catalog:jdbc:databricks://hostname:<port>/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/xxxxxxxx;

  • Vertica: jdbc:vertica://<host>:<port>/<database_name>

  • EMR Hive: jdbc:hive2://<host>:<port>/<database>;principal=<principal>

Add JDBC-Based data source on Privacera Platform

These are the setup and steps to add a JDBC-based data source. Have the details listed in the planning sections above ready to enter into the data source definition in Privacera

To add a JDBC-based data source in Privacera Platform:

  1. From the navigation menu, select Settings > Data Source Registration.

  2. (Optional) Click Add System or modify an existing data source.

  3. Enter a name and description for this data source.

  4. Click Save.

  5. Locate the new data source system name and from the wrench icon on the right, select Add Data Source.

  6. In the Add Data Source dialog, on the Choose tab, select JDBC APPLICATION.

  7. On the Configure tab:

  8. Enter a required Application Name of your choice.

  9. Enter a required Application Code of your choice. This is an identifier for your own use.

  10. If you have prepared a properties file in JSON format, click Import Properties and load the file.

  11. Scroll to find the following properties and enter the values you prepared:

    Note

    For EMR Hive, provide dummy values for jdbc.username and jdbc.password only for the system acceptance purpose. Authentication for the EMR is done using keytab file. Refer to the section Provide Keytab values to the Privacera Platform (Applicable for EMR Hive) to provide the keytab values to the Privacera Platform.

    • jdbc.username

    • jdbc.password

    • jdbc.driver.class

    • jdbc.url

  12. Accept the default values for all other properties or modify them, if needed.

  13. To verify the properties, click Test Connection.

  14. Click Next to save the data source or Back to return to the Choose tab.

Provide Keytab values to the Privacera Platform (Applicable for EMR Hive)

Perform following steps to provide the keytab values to the Privacera Platform for the EMR Hive:

  1. Run the following command to change directory to privacera manager:

    cd ~/privacera/privacera-manager/
  2. Run the following commands to copy the keytab and krb5.conf files:

    cp <keytab file full path> config/custom-properties/
    cp <krb5.conf file full path> config/custom-properties/
  3. Run the following command to copy the vars.discovery.emr.yml file to custom-vars location:

    cp config/sample-vars/vars.discovery.emr.yml config/custom-vars/
  4. Run the following command to edit the vars.discovery.emr.yml file:

    vi config/custom-vars/vars.discovery.emr.yml
  5. Update the following properties in vars.discovery.emr.yml file:

    DISCOVERY_EMR_KEYTAB: "<file name>"
    DISCOVERY_EMR_KRB5_FILENAME: "<file name>"
  6. Run the following command to update the privacera manager:

    ./privacera-manager update