Privacera Documentation

Connect Google BigQuery to PrivaceraCloud

This topic describes how to connect a BigQuery application to PrivaceraCloud for access management and Data Discovery.

Enable Privacera Access Management for BigQuery

  1. Go to Settings > Applications.

  2. On the Applications screen, select BigQuery.

  3. Enter the application Name and Description, and then click SAVE.

    The Edit application page appears.

  4. In the Access Management section, click the toggle button.

  5. In the BASIC tab, enter values in the required (*) fields, and then click SAVE. For more information, see the Basic fields table.

  6. In the ADVANCED tab, you can add advanced properties and custom properties. For more information, see the Advanced fields and Custom fields tables.

    Caution

    Advanced properties should be modified in consultation with Privacera.

  7. Click the IMPORT PROPERTIES link to browse and import application properties.

Google BigQuery connector properties on PrivaceraCloud

Table 5. Basic fields

Field name

Type

Default

Required

Description

BigQuery project location

string

us

Yes

Specifies the geographical region where the taxonomy for PolicySync should be created.

BigQuery project id

string

Yes

Specifies the Google project ID where your Google BigQuery data source resides. For example: privacera-demo-project.

Service account email

string

Yes

Specifies the service account email address that PolicySync uses. You must specify this value if you are not using a Google Cloud Platform (GCP) virtual machine attached service account.

BigQuery private key content

string

No

Specifies the Google Cloud Platform (GCP) account credential key JSON content. PolicySync uses this data to connect to Google BigQuery.

Projects to set access control policies

string

Yes

Specifies a comma-separated list of project names for which PolicySync manages access control. If unset, PolicySync manages all projects. You can use wildcards. Names are case-sensitive.

The list of projects to ignore takes precedence over any projects specified by this setting.

An example list of projects might resemble the following: testproject1,testproject2,sales_project*.

Native public group identity name

string

Yes

Specifies the native public group that PolicySync uses for access grants whenever a policy refers to the public group. The following values are allowed:

  • ALL_AUTHENTICATED_USERS: All GCP project authenticated users.

  • ALL_USERS: All Google authenticated users.

Enable audit

boolean

false

Yes

Specifies whether Privacera fetches access audit data from the data source.
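
The interaction between Projects to set access control policies and the list of projects to ignore can be sketched as follows. This is a minimal illustration, not PolicySync's actual implementation: the function names are hypothetical, and it assumes shell-style wildcard matching as provided by Python's fnmatchcase (which also gives ? and [...] special meaning, something the documentation does not mention).

```python
from fnmatch import fnmatchcase


def parse_list(value):
    """Split a comma-separated setting into trimmed, non-empty patterns."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_managed(name, include="", ignore=""):
    """Return True if `name` would be managed under the assumed rules."""
    # The ignore list takes precedence over the include list.
    if any(fnmatchcase(name, pattern) for pattern in parse_list(ignore)):
        return False
    include_patterns = parse_list(include)
    # An unset include list means that every project is managed.
    if not include_patterns:
        return True
    # fnmatchcase compares case-sensitively, matching the documented behavior.
    return any(fnmatchcase(name, pattern) for pattern in include_patterns)


include = "testproject1,testproject2,sales_project*"
ignore = "sales_project_archive"  # hypothetical ignore list
print(is_managed("sales_project_eu", include, ignore))       # True
print(is_managed("sales_project_archive", include, ignore))  # False
print(is_managed("SALES_PROJECT_EU", include, ignore))       # False
```

The same include and ignore logic would apply to the dataset, table, user, and group settings, with the names written in their respective formats.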



Table 6. Advanced fields

Field name

Type

Default

Required

Description

Create custom iam roles in gcp

boolean

true

No

Specifies whether PolicySync automatically creates custom IAM roles in your Google Cloud Platform project or organization for fine-grained access control (FGAC). If set to false, you must create all required custom IAM roles manually in your GCP project or organization. The default value is true.

GCP custom iam roles scope

string

project

No

Specifies whether PolicySync creates and uses custom IAM roles at the project or organizational level in Google Cloud Platform (GCP). The following values are allowed:

  • project: Create and use custom IAM roles from each individual project level.

  • org: Create and use custom IAM roles at the organizational level.

GCP organization id

string

No

Specifies the Google Cloud Platform (GCP) organizational ID. Specify this only if you configured PolicySync to use custom IAM roles at the organizational level.

Datasets to set access control policies

string

Yes

Specifies a comma-separated list of datasets for which PolicySync manages access control. You can use wildcards in the value. Names are case-sensitive. If you want to manage all datasets, do not set a value. For example:

testproject1.dataset1,testproject2.dataset2,sales_project*.sales*

You can configure the postfix by specifying Secure view dataset name postfix.

If specified, the Datasets to ignore while setting access control policies setting takes precedence over this setting.

Tables to set access control policies

string

No

Specifies a comma-separated list of table names for which PolicySync manages access control. You can use wildcards.

Use the following format when specifying a table:

<PROJECT_NAME>.<DATASET_NAME>.<TABLE_NAME>

If specified, Tables to ignore while setting access control policies takes precedence over this setting.

If you specify a wildcard, such as in the following example, all matched tables are managed:

<PROJECT_NAME>.<DATASET_NAME>.*

The specified value, if any, is interpreted in the following ways:

  • If unset, access control is managed for all tables.

  • If set to none, no tables are managed.

Projects to ignore while setting access control policies

string

No

Specifies a comma-separated list of project names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all projects are subject to access control.

For example: testproject1,testproject2,sales_project*.

This setting supersedes any values specified by Projects to set access control policies.

Datasets to ignore while setting access control policies

string

No

Specifies a comma-separated list of dataset names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all datasets are subject to access control.

For example: testproject1.dataset1,testproject2.dataset2,sales_project*.sales*.

This setting supersedes any values specified by Datasets to set access control policies.

Tables to ignore while setting access control policies

string

No

Specifies a comma-separated list of table names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all tables are subject to access control. Specify tables using the following format:

<PROJECT_NAME>.<DATASET_NAME>.<TABLE_NAME>

This setting supersedes any values specified by Tables to set access control policies.

Users to set access control policies

string

No

Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive.

If not specified, PolicySync manages access control for all users.

If specified, Users to be ignored by access control policies takes precedence over this setting.

An example user list might resemble the following: user1,user2,dev_user*.

Groups to set access control policies

string

No

Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive.

An example list of groups might resemble the following: group1,group2,dev_group*.

If specified, Groups to be ignored by access control policies takes precedence over this setting.

Users to be ignored by access control policies

string

No

Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control.

This setting supersedes any values specified by Users to set access control policies.

Groups to be ignored by access control policies

string

No

Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control.

This setting supersedes any values specified by Groups to set access control policies.

Set access control policies only on the users from managed groups

boolean

false

No

Specifies whether to manage only the users that are members of groups specified by Groups to set access control policies. The default value is false.

Enforce bigquery native row filter

boolean

false

No

Specifies whether to use the data source native row filter functionality. This setting is disabled by default. When enabled, you can create row filters only on tables, but not on views.

Enforce masking policies using secure views

boolean

true

No

Specifies whether to use secure view based masking. The default value is true.

Enforce row filter policies using secure views

boolean

true

No

Specifies whether to use secure view based row filtering. The default value is true.

While Google BigQuery supports native filtering, PolicySync provides additional functionality that is not available natively. Enabling this setting is recommended.

Create secure view for all tables/views

boolean

true

No

Specifies whether to create secure views for all tables and views that are created by users. If enabled, PolicySync creates secure views for resources regardless of whether masking or filtering policies are enabled.

Default masking value for numeric datatype

integer

0

No

Specifies the masking value used for numeric data types.

Default masking value for text/string datatype

string

<MASKED>

No

Specifies the masking value used for text or string data types.

Secure view name prefix

string

No

Specifies a prefix string for secure view names. By default, view-based row filter and masking-related secure views have the same name as the table.

If you want to change the secure view name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view name for a table named example1 is dev_example1.

Secure view name postfix

string

No

Specifies a postfix string for secure view names. By default, view-based row filter and masking-related secure views have the same name as the table.

If you want to change the secure view name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view name for a table named example1 is example1_dev.

Secure view dataset name prefix

string

No

Specifies a prefix string for secure view dataset names. By default, view-based row filter and masking-related secure views have the same dataset name as the table dataset name.

If you want to change the secure view dataset name prefix, specify a value for this setting. For example, if the prefix is dev_, then the secure view dataset name for a dataset named example1 is dev_example1.

Secure view dataset name postfix

string

_secure

No

Specifies a postfix string for secure view dataset names. By default, view-based row filter and masking-related secure views have the same dataset name as the table dataset name.

If you want to change the secure view dataset name postfix, specify a value for this setting. For example, if the postfix is _dev, then the secure view dataset name for a dataset named example1 is example1_dev.

Enable this for policy enforcements and user/group/role management.

boolean

true

Yes

Specifies whether PolicySync performs grants and revokes for access control and runs create, update, and delete queries for users, groups, and roles. The default value is true.

Enable to use data admin functionality.

boolean

true

No

Specifies whether to enable the data admin feature. When enabled, you create policies on native tables and views, and PolicySync makes the corresponding grants on the secure views of those tables and views. These secure views support row filtering and masking. To grant a permission on a native table or view as well, select that permission plus data admin in the policy. The permission is then granted on both the native table or view and its secure view.

ignore audit for users

string

No

Specifies a comma-separated list of users to exclude when fetching access audits. For example: "user1,user2,user3".

project id used to fetch BigQuery audits

string

No

Specifies the project ID where Google BigQuery stores audit log data.

dataset used to fetch BigQuery audits

string

No

Specifies the name of the dataset where Google BigQuery logs audit data. Privacera uses this data for running audit queries.
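
The secure view naming settings in the table above can be illustrated with a short sketch. It assumes that the prefix and postfix are simply concatenated around the original name and that resources are addressed as <PROJECT_NAME>.<DATASET_NAME>.<TABLE_NAME>. The function names are hypothetical and the exact way PolicySync combines these settings may differ.

```python
def apply_affixes(name, prefix="", postfix=""):
    """Wrap a name with an optional prefix and postfix."""
    return f"{prefix}{name}{postfix}"


def secure_view_path(project, dataset, table,
                     view_prefix="", view_postfix="",
                     dataset_prefix="", dataset_postfix="_secure"):
    """Build the assumed fully qualified name of a secure view.

    The `_secure` dataset postfix mirrors the documented default of
    "Secure view dataset name postfix"; the other settings have no default.
    """
    secure_dataset = apply_affixes(dataset, dataset_prefix, dataset_postfix)
    secure_view = apply_affixes(table, view_prefix, view_postfix)
    return f"{project}.{secure_dataset}.{secure_view}"


print(secure_view_path("proj", "sales", "example1"))
# proj.sales_secure.example1
print(secure_view_path("proj", "sales", "example1", view_prefix="dev_"))
# proj.sales_secure.dev_example1
```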



Table 7. Custom fields

Canonical name

Type

Default

Description

use.vm.credentials

boolean

false

Specifies whether PolicySync uses the service account attached to your virtual machine for the credentials to connect to the data source.

custom.iam.roles.name.mapping

string

Specifies a list of mappings between PolicySync custom IAM role names and your custom role names. Use the following format when specifying your custom role names:

<PRIVACERA_DEFAULT_ROLE_NAME_1>:<CUSTOM_ROLE_NAME_1>
<PRIVACERA_DEFAULT_ROLE_NAME_2>:<CUSTOM_ROLE_NAME_2>

The following is a list of the default custom role names:

  • PrivaceraGBQProjectListRole

  • PrivaceraGBQJobListRole

  • PrivaceraGBQJobListAllRole

  • PrivaceraGBQJobCreateRole

  • PrivaceraGBQJobGetRole

  • PrivaceraGBQJobUpdateRole

  • PrivaceraGBQJobDeleteRole

  • PrivaceraGBQDatasetCreateRole

  • PrivaceraGBQDatasetGetMetadataRole

  • PrivaceraGBQDatasetUpdateRole

  • PrivaceraGBQDatasetDeleteRole

  • PrivaceraGBQTableListRole

  • PrivaceraGBQTableCreateRole

  • PrivaceraGBQTableGetMetadataRole

  • PrivaceraGBQTableQueryRole

  • PrivaceraGBQTableExportRole

  • PrivaceraGBQTableUpdateMetadataRole

  • PrivaceraGBQTableUpdateRole

  • PrivaceraGBQTableSetCategoryRole

  • PrivaceraGBQTableDeleteRole

  • PrivaceraGBQTransferUpdateRole

  • PrivaceraGBQTransferGetRole
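
As a rough illustration of the mapping format, the following sketch renders a dictionary of role names as <PRIVACERA_DEFAULT_ROLE_NAME>:<CUSTOM_ROLE_NAME> pairs. It assumes one pair per line, as in the format shown above. The helper name and the custom role names (AcmeProjectList, AcmeTableQuery) are hypothetical.

```python
def format_role_mapping(mapping):
    """Render {default_role: custom_role} as one `default:custom` pair per line."""
    lines = []
    for default_role, custom_role in mapping.items():
        # A colon inside a role name would make the pair ambiguous.
        if ":" in default_role or ":" in custom_role:
            raise ValueError("role names must not contain ':'")
        lines.append(f"{default_role}:{custom_role}")
    return "\n".join(lines)


mapping = {
    "PrivaceraGBQProjectListRole": "AcmeProjectList",  # hypothetical custom name
    "PrivaceraGBQTableQueryRole": "AcmeTableQuery",    # hypothetical custom name
}
print(format_role_mapping(mapping))
```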

load.resources

string

load_from_dataset_columns

Specifies how PolicySync loads resources from Google BigQuery. The following values are allowed:

  • load_md: Load resources from Google BigQuery with a top-down approach: PolicySync first loads the project, then the dataset, followed by its tables and their columns.

  • load_from_dataset_columns: Load resources one resource type at a time: PolicySync loads all projects first, then all datasets in all projects, followed by all tables in all datasets and their columns. This mode is recommended because it is faster than the load_md mode.

sync.interval.sec

integer

60

Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.

sync.serviceuser.interval.sec

integer

420

Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.

sync.servicepolicy.interval.sec

integer

540

Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions on the data source accordingly.

audit.interval.sec

integer

30

Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.

user.name.replace.from.regex

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

Specifies a regular expression to apply to a user name. Each matching character is replaced with the value specified by the user.name.replace.to.string setting.

If not specified, no find and replace operation is performed.

user.name.replace.to.string

string

_

Specifies a string to replace the characters matched by the regex specified by the user.name.replace.from.regex setting.

If not specified, no find and replace operation is performed.

group.name.replace.from.regex

string

[~`$&+:;=?@#|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]

Specifies a regular expression to apply to a group name. Each matching character is replaced with the value specified by the group.name.replace.to.string setting.

If not specified, no find and replace operation is performed.

group.name.replace.to.string

string

_

Specifies a string to replace the characters matched by the regex specified by the group.name.replace.from.regex setting.

If not specified, no find and replace operation is performed.
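
The find and replace behavior of the user and group name settings can be approximated as follows. This sketch uses Python's re module and a deliberately simplified character class instead of the documented default. PolicySync's regular expression dialect may differ, and the function name is hypothetical.

```python
import re


def normalize_name(name, from_regex=None, to_string=None):
    """Replace every match of `from_regex` in `name` with `to_string`.

    If either setting is unset (None or an empty pattern), the name is
    returned unchanged, mirroring "no find and replace operation is performed".
    """
    if not from_regex or to_string is None:
        return name
    return re.sub(from_regex, to_string, name)


# Simplified pattern for illustration only: matches '@', '.', and '-'.
SIMPLE_REGEX = r"[@.\-]"

print(normalize_name("john.doe@example.com", SIMPLE_REGEX, "_"))
# john_doe_example_com
print(normalize_name("john.doe@example.com"))
# john.doe@example.com
```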

column.access.control.type

string

view

Specifies how PolicySync manages column-level access control. The following values are allowed:

  • view: Use view-based column-level access control. Any columns that a user cannot access appear as null in the secure view of the table or the secure view of the native view.

policy.name.separator

string

_

Specifies a string to use as part of the name of native row filter and masking policies.

row.filter.policy.name.template

string

row_filter_item_

Specifies a template for the name that PolicySync uses when creating a row filter policy. For example, given a table named data in the ds dataset that resides in the proj project, the row filter policy name might resemble the following:

proj_priv_ds_priv_data_<ROW_FILTER_ITEM_NUMBER>

masking.functions.dataset.name

string

privacera_dataset

Specifies the name of the dataset where PolicySync creates custom masking functions.

secure.view.name.remove.suffix.list

string

Specifies a suffix to remove from a table or view name. For example, if the table is named example_suffix you can remove the _suffix string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes.

secure.view.dataset.name.remove.suffix.list

string

Specifies a suffix to remove from a secure view dataset name. For example, if the dataset is named some_name_ds you can remove the _ds string. This transformation is applied before any custom prefix or postfix is applied.

You can specify a single suffix or a comma-separated list of suffixes, such as _raw,_qa,_prod.
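
The suffix removal described for secure.view.name.remove.suffix.list and secure.view.dataset.name.remove.suffix.list can be sketched as follows. The sketch assumes that only the first matching suffix in the list is removed, and only once. The documentation does not state how multiple matches are handled, and the function name is hypothetical.

```python
def remove_suffix(name, suffix_list):
    """Strip the first suffix from a comma-separated list that `name` ends with."""
    for suffix in (item.strip() for item in suffix_list.split(",")):
        if suffix and name.endswith(suffix):
            return name[: -len(suffix)]
    return name


print(remove_suffix("some_name_ds", "_ds"))           # some_name
print(remove_suffix("orders_raw", "_raw,_qa,_prod"))  # orders
print(remove_suffix("orders", "_raw,_qa,_prod"))      # orders

# The suffix is removed before any custom prefix or postfix is applied.
print(remove_suffix("some_name_ds", "_ds") + "_secure")  # some_name_secure
```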

authorized.view.acl.updater.interval.sec

integer

10

Specifies the interval, in seconds, at which the authorized view ACLs updater thread updates the permissions in the dataset if any permission updates are pending.

perform.grant.updates.max.retry.attempts

integer

2

Specifies the maximum number of attempts that PolicySync makes to execute a grant query if it is unable to do so successfully. The default value is 2.

perform.grant.updates.batch

boolean

true

Specifies whether PolicySync applies grants and revokes in batches. If enabled, this behavior improves overall performance of applying permission changes.

audit.log.load.max.interval.minutes

integer

30

Specifies the maximum interval, in minutes, of the time window that SQL queries use to retrieve access audit information. If there are a large number of audit records, narrowing the window interval improves performance.

For example, if the interval is set to 30, SQL queries similar to the following are executed:

SELECT * FROM audits WHERE time_from = '00:01' AND time_to = '00:30';
SELECT * FROM audits WHERE time_from = '00:31' AND time_to = '01:00';
SELECT * FROM audits WHERE time_from = '01:01' AND time_to = '01:30';
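
The windowing idea behind audit.log.load.max.interval.minutes can be sketched as a generator that splits a time range into chunks no longer than the configured maximum. This is an illustration under assumptions: it uses half-open windows (each window starts where the previous one ends) rather than the minute-offset boundaries in the queries above, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def audit_windows(start, end, max_minutes=30):
    """Yield (window_start, window_end) pairs that cover [start, end)."""
    if max_minutes <= 0:
        raise ValueError("max_minutes must be positive")
    step = timedelta(minutes=max_minutes)
    current = start
    while current < end:
        # The last window is shortened so that it never runs past `end`.
        window_end = min(current + step, end)
        yield current, window_end
        current = window_end


start = datetime(2024, 1, 1, 0, 0)
end = datetime(2024, 1, 1, 1, 30)
for window_start, window_end in audit_windows(start, end, 30):
    print(window_start.strftime("%H:%M"), "->", window_end.strftime("%H:%M"))
# 00:00 -> 00:30
# 00:30 -> 01:00
# 01:00 -> 01:30
```

A smaller max_minutes value produces more queries, each scanning fewer audit records.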


Enable Data Discovery for BigQuery

  1. In the Data Discovery section, click the toggle button.

  2. On the BASIC tab, paste the contents of the credential JSON file into the Google Cloud Service Account Credential field.

  3. On the ADVANCED tab, you can add custom properties.

  4. Using IMPORT PROPERTIES, you can browse and import application properties.

  5. Click the TEST CONNECTION button to check if the connection is successful, and then click SAVE.
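
Before pasting the credential JSON in step 2, it can help to confirm locally that the file is well-formed. The following sketch checks for a few fields that Google Cloud service account key files normally contain. It is an optional, hypothetical pre-check that is not part of PrivaceraCloud, and it does not verify that the key is valid or has the required permissions.

```python
import json

# Fields normally present in a Google Cloud service account key file.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(text):
    """Return a list of problems found in the credential JSON (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if not data.get(key)]
    if data.get("type") and data["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems


# Placeholder values for illustration only.
sample = json.dumps({
    "type": "service_account",
    "project_id": "privacera-demo-project",
    "private_key": "<PRIVATE_KEY>",
    "client_email": "<SERVICE_ACCOUNT_EMAIL>",
})
print(check_service_account_json(sample))       # []
print(check_service_account_json("{}"))         # lists the four missing fields
```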

Add and scan resources

Go to PrivaceraCloud > Privacera Discovery > Data Source to add resources that use this connection as Discovery targets. See Privacera Discovery scan targets for quick start steps.