Connect Google BigQuery to PrivaceraCloud

This topic describes how to connect a BigQuery application to PrivaceraCloud for access management and Data Discovery.

Enable Privacera Access Management for BigQuery

Go to Settings > Applications.
On the Applications screen, select BigQuery.
Enter the application Name and Description, and then click SAVE.
The Edit Application page appears.
In the Access Management section, click the toggle button.
In the BASIC tab, enter values in the required (*) fields and click SAVE. For more information, see the Basic fields table.
In the ADVANCED tab, you can add advanced and custom properties. For more information, see the Advanced fields and Custom fields tables.
Caution
Advanced properties should be modified in consultation with Privacera.
Click the IMPORT PROPERTIES link to browse and import application properties.

Google BigQuery connector properties on PrivaceraCloud

Table 5. Basic fields

Field name	Type	Default	Required	Description
BigQuery project location	`string`	`us`	Yes	Specifies the geographical region where PolicySync creates the taxonomy.
BigQuery project id	`string`		Yes	Specifies the Google project ID where your Google BigQuery data source resides. For example: `privacera-demo-project`.
Service account email	`string`		Yes	Specifies the service account email address that PolicySync uses. You must specify this value if you are not using a service account attached to a Google Cloud Platform (GCP) virtual machine.
BigQuery private key content	`string`		No	Specifies the Google Cloud Platform (GCP) account credential key JSON content. PolicySync uses this data to connect to Google BigQuery.
Projects to set access control policies	`string`		Yes	Specifies a comma-separated list of project names for which PolicySync manages access control. If unset, PolicySync manages all projects. You can use wildcards. Names are case-sensitive. The list of projects to ignore takes precedence over any projects specified by this setting. For example: `testproject1,testproject2,sales_project*`.
Native public group identity name	`string`		Yes	Specifies the native public group that PolicySync uses for access grants whenever a policy refers to the public group. The following values are allowed: `ALL_AUTHENTICATED_USERS`: All GCP project authenticated users. `ALL_USERS`: All Google authenticated users.
Enable audit	`boolean`	`false`	Yes	Specifies whether Privacera fetches access audit data from the data source.
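
The include and ignore lists described above can be illustrated with a short Python sketch. It assumes that wildcards behave like shell-style `*` patterns and that matching is case-sensitive. The actual matching logic inside PolicySync is not documented here, and the function names `parse_list` and `is_managed` are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_list(value):
    """Split a comma-separated setting into trimmed, non-empty patterns."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_managed(name, include="", ignore=""):
    """Return True if `name` would be managed under the two lists.

    Assumption: `*` is a glob-style wildcard and names are case-sensitive.
    """
    # The ignore list takes precedence over the include list.
    if any(fnmatchcase(name, pattern) for pattern in parse_list(ignore)):
        return False
    include_patterns = parse_list(include)
    # An unset include list means that everything is managed.
    if not include_patterns:
        return True
    return any(fnmatchcase(name, pattern) for pattern in include_patterns)


include = "testproject1,testproject2,sales_project*"
ignore = "sales_project_archive"  # hypothetical ignore list

print(is_managed("sales_project_eu", include, ignore))       # True
print(is_managed("sales_project_archive", include, ignore))  # False
print(is_managed("Testproject1", include, ignore))           # False
print(is_managed("anything"))                                # True
```

A sketch like this can help you check a proposed include or ignore list before you save it in the connector settings.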

Table 6. Advanced fields

Field name	Type	Default	Required	Description
Create custom iam roles in gcp	`boolean`	`true`	No	Specifies whether PolicySync automatically creates custom IAM roles in your Google Cloud Platform project or organization for fine-grained access control (FGAC). If set to `false`, you must create all required custom IAM roles manually in your GCP project or organization. The default value is `true`.
GCP custom iam roles scope	`string`	`project`	No	Specifies whether PolicySync creates and uses custom IAM roles at the project or organizational level in Google Cloud Platform (GCP). The following values are allowed: `project`: Create and use custom IAM roles from each individual project level. `org`: Create and use custom IAM roles at the organizational level.
GCP organization id	`string`		No	Specifies the Google Cloud Platform (GCP) organizational ID. Specify this only if you configured PolicySync to use custom IAM roles at the organizational level.
Datasets to set access control policies	`string`		Yes	Specifies a comma-separated list of datasets for which PolicySync manages access control. You can use wildcards in the value. Names are case-sensitive. If you want to manage all datasets, do not set a value. For example: `testproject1.dataset1,testproject2.dataset2,sales_project.sales`. You can configure the secure view dataset postfix by specifying Secure view dataset name postfix. If specified, the Datasets to ignore while setting access control policies setting takes precedence over this setting.
Tables to set access control policies	`string`		No	Specifies a comma-separated list of table names for which PolicySync manages access control. You can use wildcards. Use the following format when specifying a table: `<PROJECT_NAME>.<DATASET_NAME>.<TABLE_NAME>`. If you specify a wildcard, such as in the following example, all matched tables are managed: `<PROJECT_NAME>.<DATASET_NAME>.*`. If specified, Tables to ignore while setting access control policies takes precedence over this setting. The specified value, if any, is interpreted in the following ways: If unset, access control is managed for all tables. If set to `none`, no tables are managed.
Projects to ignore while setting access control policies	`string`		No	Specifies a comma-separated list of project names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all projects are subject to access control. For example: `testproject1,testproject2,sales_project`. This setting supersedes any values specified by Projects to set access control policies.
Datasets to ignore while setting access control policies	`string`		No	Specifies a comma-separated list of dataset names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all datasets are subject to access control. For example: `testproject1.dataset1,testproject2.dataset2,sales_project.sales`. This setting supersedes any values specified by Datasets to set access control policies.
Tables to ignore while setting access control policies	`string`		No	Specifies a comma-separated list of table names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all tables are subject to access control. Specify tables using the following format: `<PROJECT_NAME>.<DATASET_NAME>.<TABLE_NAME>`. This setting supersedes any values specified by Tables to set access control policies.
Users to set access control policies	`string`		No	Specifies a comma-separated list of user names for which PolicySync manages access control. You can use wildcards. Names are case-sensitive. If not specified, PolicySync manages access control for all users. If specified, Users to be ignored by access control policies takes precedence over this setting. An example user list might resemble the following: `user1,user2,dev_user*`.
Groups to set access control policies	`string`		No	Specifies a comma-separated list of group names for which PolicySync manages access control. If unset, access control is managed for all groups. You can use wildcards. Names are case-sensitive. An example list of groups might resemble the following: `group1,group2,dev_group`. If specified, Groups to be ignored by access control policies takes precedence over this setting.
Users to be ignored by access control policies	`string`		No	Specifies a comma-separated list of user names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all users are subject to access control. This setting supersedes any values specified by Users to set access control policies.
Groups to be ignored by access control policies	`string`		No	Specifies a comma-separated list of group names that PolicySync does not provide access control for. You can specify wildcards. Names are case-sensitive. If not specified, all groups are subject to access control. This setting supersedes any values specified by Groups to set access control policies.
Set access control policies only on the users from managed groups	`boolean`	`false`	No	Specifies whether to manage only the users that are members of groups specified by Groups to set access control policies. The default value is false.
Enforce bigquery native row filter	`boolean`	`false`	No	Specifies whether to use the data source native row filter functionality. This setting is disabled by default. When enabled, you can create row filters only on tables, but not on views.
Enforce masking policies using secure views	`boolean`	`true`	No	Specifies whether to use secure view based masking. The default value is `true`.
Enforce row filter policies using secure views	`boolean`	`true`	No	Specifies whether to use secure view based row filtering. The default value is `true`. While Google BigQuery supports native filtering, PolicySync provides additional functionality that is not available natively. Enabling this setting is recommended.
Create secure view for all tables/views	`boolean`	`true`	No	Specifies whether to create secure views for all tables and views that are created by users. If enabled, PolicySync creates secure views for resources regardless of whether masking or filtering policies are enabled.
Default masking value for numeric datatype	`integer`	`0`	No	Specifies the masking value used for numeric data types.
Default masking value for text/string datatype	`string`	`<MASKED>`	No	Specifies the masking value used for text or string data types.
Secure view name prefix	`string`		No	Specifies a prefix string for secure view names. By default, view-based row filter and masking-related secure views have the same name as the table. If you want to change the secure view name prefix, specify a value for this setting. For example, if the prefix is `dev_`, then the secure view name for a table named `example1` is `dev_example1`.
Secure view name postfix	`string`		No	Specifies a postfix string for secure view names. By default, view-based row filter and masking-related secure views have the same name as the table. If you want to change the secure view name postfix, specify a value for this setting. For example, if the postfix is `_dev`, then the secure view name for a table named `example1` is `example1_dev`.
Secure view dataset name prefix	`string`		No	Specifies a prefix string for secure view dataset names. By default, view-based row filter and masking-related secure views have the same dataset name as the table dataset name. If you want to change the secure view dataset name prefix, specify a value for this setting. For example, if the prefix is `dev_`, then the secure view dataset name for a dataset named `example1` is `dev_example1`.
Secure view dataset name postfix	`string`	`_secure`	No	Specifies a postfix string for secure view dataset names. By default, view-based row filter and masking-related secure views have the same dataset name as the table dataset name. If you want to change the secure view dataset name postfix, specify a value for this setting. For example, if the postfix is `_dev`, then the secure view dataset name for a dataset named `example1` is `example1_dev`.
Enable this for policy enforcements and user/group/role management.	`boolean`	`true`	Yes	Specifies whether PolicySync performs grants and revokes for access control and creates, updates, and deletes queries for users, groups, and roles. The default value is `true`.
Enable to use data admin functionality.	`boolean`	`true`	No	Specifies whether to enable the data admin feature. With this feature enabled, you can create all policies on native tables and views, and the respective grants are made on the secure views of those native tables and views. These secure views have row filter and masking capability. If you need to grant permissions on the native tables or views, select the permission you want plus data admin in the policy. Those permissions are then granted on both the native table or view and its secure view.
ignore audit for users	`string`		No	Specifies a comma-separated list of users to exclude when fetching access audits. For example: `"user1,user2,user3"`.
project id used to fetch BigQuery audits	`string`		No	Specifies the project ID where Google BigQuery stores audit log data.
dataset used to fetch BigQuery audits	`string`		No	Specifies the name of the dataset where Google BigQuery logs audit data. Privacera uses this data for running audit queries.
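
The secure view naming settings in the table above can be sketched in Python as follows. This is a minimal illustration that assumes the prefix and postfix values are simply concatenated around the original dataset and table names, as the `dev_example1` and `example1_dev` examples suggest. The function name `secure_view_location` is hypothetical, and the real PolicySync behavior may include additional transformations.

```python
def secure_view_location(dataset, table,
                         dataset_prefix="", dataset_postfix="_secure",
                         view_prefix="", view_postfix=""):
    """Return the (dataset, view) names a secure view would receive.

    Assumption: each prefix and postfix is concatenated as-is around the
    original name. The `_secure` dataset postfix is the documented default.
    """
    secure_dataset = f"{dataset_prefix}{dataset}{dataset_postfix}"
    secure_view = f"{view_prefix}{table}{view_postfix}"
    return secure_dataset, secure_view


# Defaults: only the dataset name changes.
print(secure_view_location("sales", "orders"))
# ('sales_secure', 'orders')

# Custom dataset prefix and view postfix, with no dataset postfix.
print(secure_view_location("example1", "example1",
                           dataset_prefix="dev_", dataset_postfix="",
                           view_postfix="_dev"))
# ('dev_example1', 'example1_dev')
```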

Table 7. Custom fields

Canonical name	Type	Default	Description
`use.vm.credentials`	`boolean`	`false`	Specifies whether PolicySync uses the service account attached to your virtual machine for the credentials to connect to the data source.
`custom.iam.roles.name.mapping`	`string`		Specifies a list of mappings between PolicySync custom IAM role names and your custom role names. Use the following format when specifying your custom role names: <PRIVACERA_DEFAULT_ROLE_NAME_1>:<CUSTOM_ROLE_NAME_1> <PRIVACERA_DEFAULT_ROLE_NAME_2>:<CUSTOM_ROLE_NAME_2> The following is a list of the default custom role names: `PrivaceraGBQProjectListRole` `PrivaceraGBQJobListRole` `PrivaceraGBQJobListAllRole` `PrivaceraGBQJobCreateRole` `PrivaceraGBQJobGetRole` `PrivaceraGBQJobUpdateRole` `PrivaceraGBQJobDeleteRole` `PrivaceraGBQDatasetCreateRole` `PrivaceraGBQDatasetGetMetadataRole` `PrivaceraGBQDatasetUpdateRole` `PrivaceraGBQDatasetDeleteRole` `PrivaceraGBQTableListRole` `PrivaceraGBQTableCreateRole` `PrivaceraGBQTableGetMetadataRole` `PrivaceraGBQTableQueryRole` `PrivaceraGBQTableExportRole` `PrivaceraGBQTableUpdateMetadataRole` `PrivaceraGBQTableUpdateRole` `PrivaceraGBQTableSetCategoryRole` `PrivaceraGBQTableDeleteRole` `PrivaceraGBQTransferUpdateRole` `PrivaceraGBQTransferGetRole`
`load.resources`	`string`	`load_from_dataset_columns`	Specifies how PolicySync loads resources from Google BigQuery. The following values are allowed: `load_md`: Load resources from Google BigQuery with a top-down approach, that is, first the project, then its datasets, followed by tables and their columns. `load_from_dataset_columns`: Load resources one by one for each resource type, that is, first all projects, then all datasets in all projects, followed by all tables in all datasets and their columns. This mode is recommended because it is faster than the `load_md` mode.
`sync.interval.sec`	`integer`	`60`	Specifies the interval in seconds for PolicySync to wait before checking for new resources or changes to existing resources.
`sync.serviceuser.interval.sec`	`integer`	`420`	Specifies the interval in seconds for PolicySync to wait before reconciling principals with those in the data source, such as users, groups, and roles. When differences are detected, PolicySync updates the principals in the data source accordingly.
`sync.servicepolicy.interval.sec`	`integer`	`540`	Specifies the interval in seconds for PolicySync to wait before reconciling Apache Ranger access control policies with those in the data source. When differences are detected, PolicySync updates the access control permissions on the data source accordingly.
`audit.interval.sec`	`integer`	`30`	Specifies the interval in seconds to elapse before PolicySync retrieves access audits and saves the data in Privacera.
`user.name.replace.from.regex`	`string`	[~`$&+:;=?@#\|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]	Specifies a regular expression to apply to a username and replaces each matching character with the value specified by the `user.name.replace.to.string` setting. If not specified, no find and replace operation is performed.
`user.name.replace.to.string`	`string`	`_`	Specifies a string to replace the characters matched by the regex specified by the `user.name.replace.from.regex` setting. If not specified, no find and replace operation is performed.
`group.name.replace.from.regex`	`string`	[~`$&+:;=?@#\|'<>.^*()_%\\\\[\\\\]!\\\\-\\\\/\\\\\\\\{}]	Specifies a regular expression to apply to a group and replaces each matching character with the value specified by the `group.name.replace.to.string` setting. If not specified, no find and replace operation is performed.
`group.name.replace.to.string`	`string`	`_`	Specifies a string to replace the characters matched by the regex specified by the `group.name.replace.from.regex` setting. If not specified, no find and replace operation is performed.
`column.access.control.type`	`string`	`view`	Specifies how PolicySync manages column-level access control. The following values are allowed: `view`: Use view-based column-level access control. Any columns that a user cannot access appear as null in the secure view of the table or the secure view of the native view.
`policy.name.separator`	`string`	`_`	Specifies a string to use as part of the name of native row filter and masking policies.
`row.filter.policy.name.template`	`string`	`row_filter_item_`	Specifies a template for the name that PolicySync uses when creating a row filter policy. For example, given a table `data` from the `ds` dataset that resides in the `proj` project, the row filter policy name might resemble the following: proj_priv_ds_priv_data_<ROW_FILTER_ITEM_NUMBER>
`masking.functions.dataset.name`	`string`	`privacera_dataset`	Specifies the name of the dataset where PolicySync creates custom masking functions.
`secure.view.name.remove.suffix.list`	`string`		Specifies a suffix to remove from a table or view name. For example, if the table is named `example_suffix` you can remove the `_suffix` string. This transformation is applied before any custom prefix or postfix is applied. You can specify a single suffix or a comma-separated list of suffixes.
`secure.view.dataset.name.remove.suffix.list`	`string`		Specifies a suffix to remove from a secure view dataset name. For example, if the dataset is named `some_name_ds` you can remove the `_ds` string. This transformation is applied before any custom prefix or postfix is applied. You can specify a single suffix or a comma-separated list of suffixes, such as `_raw,_qa,_prod`.
`authorized.view.acl.updater.interval.sec`	`integer`	`10`	Specifies the interval, in seconds, at which the authorized view ACLs updater thread updates the permissions in the dataset if any permission updates are pending.
`perform.grant.updates.max.retry.attempts`	`integer`	`2`	Specifies the maximum number of attempts that PolicySync makes to execute a grant query if it is unable to do so successfully. The default value is `2`.
`perform.grant.updates.batch`	`boolean`	`true`	Specifies whether PolicySync applies grants and revokes in batches. If enabled, this behavior improves overall performance of applying permission changes.
`audit.log.load.max.interval.minutes`	`integer`	`30`	Specifies the maximum interval, in minutes, of the time window that SQL queries use to retrieve access audit information. If there are a large number of audit records, narrowing the window interval improves performance. For example, if the interval is set to `30`, SQL queries similar to the following are executed: `SELECT * FROM audits where time_from=00:01 and time_to=00:30;` `SELECT * FROM audits where time_from=00:31 and time_to=01:00;` `SELECT * FROM audits where time_from=01:01 and time_to=01:30;`
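
The effect of the `user.name.replace.from.regex` and `user.name.replace.to.string` settings (and their group equivalents) can be sketched with Python's `re` module. The character class below is a simplified stand-in for the default value, not an exact copy, because the exact escaping that PolicySync expects is not reproduced here. The function name `normalize_name` is hypothetical.

```python
import re

# Assumption: a simplified version of the default special-character class.
SPECIAL_CHARS = r"[~`$&+:;=?@#|'<>.^*()_%\[\]!\-/\\{}]"


def normalize_name(name, from_regex=SPECIAL_CHARS, to_string="_"):
    """Replace every character matched by `from_regex` with `to_string`."""
    # If no regex is specified, no find and replace operation is performed.
    if not from_regex:
        return name
    return re.sub(from_regex, to_string, name)


print(normalize_name("john.doe@example.com"))     # john_doe_example_com
print(normalize_name("data-eng/team"))            # data_eng_team
print(normalize_name("john.doe", from_regex=""))  # john.doe
```

Running candidate user and group names through a sketch like this shows whether two distinct names would collapse into the same normalized name.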

Enable Data Discovery for BigQuery

In the Data Discovery section, click the toggle button.
On the BASIC tab, paste the contents of the credential JSON file in the Google Cloud Service Account Credential field.
On the ADVANCED tab, you can add custom properties.
Using IMPORT PROPERTIES, you can browse and import application properties.
Click the TEST CONNECTION button to check whether the connection is successful, and then click SAVE.
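
Before pasting the credential JSON, it can be useful to confirm that it is well formed. The following Python sketch checks for the fields that a GCP service account key file normally contains (`type`, `project_id`, `private_key`, `client_email`). It is an optional local check and not part of PrivaceraCloud. The function name `check_service_account_json` and the sample values are hypothetical placeholders.

```python
import json

REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(text):
    """Return a list of problems found in pasted key JSON (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {key}"
                for key in REQUIRED_KEYS if not data.get(key)]
    # Service account key files carry the literal type "service_account".
    if data.get("type") not in (None, "", "service_account"):
        problems.append("type is not 'service_account'")
    return problems


# Placeholder values only; never commit a real private key.
sample = json.dumps({
    "type": "service_account",
    "project_id": "privacera-demo-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nPLACEHOLDER\n-----END PRIVATE KEY-----\n",
    "client_email": "example-sa@privacera-demo-project.iam.gserviceaccount.com",
})

print(check_service_account_json(sample))       # []
print(check_service_account_json('{"type": "user"}'))
```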

Add and scan resources

Go to PrivaceraCloud > Privacera Discovery > Data Source to add resources using this connection as Discovery targets. See Privacera Discovery scan targets for quick start steps.
