Google BigQuery (GBQ)#
Privacera provides fine-grained access control for Google BigQuery datasets. This includes:
- Table-level access control
- Column-level access control
- Native/dynamic secure view-based row filters
- Masking with dynamic secure views created using PolicySync
Privacera access control for BigQuery relies on the privacera_bigquery connector service.
Prerequisites#
Create PrivaceraPolicySyncRole IAM Role#
You need to grant Privacera PolicySync basic access to GCP. To do so, create the PrivaceraPolicySyncRole IAM role in your GCP project or GCP organization using the following commands in Google Cloud Shell (gcloud). The shell can be installed and used locally or accessed through the Google Cloud Console.
Run the following command to create the file containing the permissions required for the PrivaceraPolicySyncRole role:
ROLE_NAME="PrivaceraPolicySyncRole"
cat << EOF > ${ROLE_NAME}.yaml
title: "${ROLE_NAME}"
description: "${ROLE_NAME}"
stage: "ALPHA"
includedPermissions:
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
- resourcemanager.projects.setIamPolicy
- iam.roles.list
- iam.roles.get
- iam.roles.create
- iam.roles.update
- bigquery.jobs.create
- bigquery.datasets.get
- bigquery.datasets.create
- bigquery.datasets.update
- bigquery.datasets.delete
- bigquery.datasets.getIamPolicy
- bigquery.datasets.setIamPolicy
- bigquery.tables.list
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.create
- bigquery.tables.update
- bigquery.tables.delete
- bigquery.tables.getIamPolicy
- bigquery.tables.setIamPolicy
- bigquery.rowAccessPolicies.list
- bigquery.rowAccessPolicies.create
- bigquery.rowAccessPolicies.update
- bigquery.rowAccessPolicies.delete
- bigquery.rowAccessPolicies.getIamPolicy
- bigquery.rowAccessPolicies.setIamPolicy
EOF
GCP Project-level access#
Note
If you have multiple projects in your GCP organization and want them to be managed by a single BigQuery connector, repeat the steps below for each project, assigning the role to the same service account that will be used across all the projects.
- Run the following command, replacing <GCP_PROJECT_ID> with your GCP project ID:
  PROJECT_ID="<GCP_PROJECT_ID>"
- To create the PrivaceraPolicySyncRole role in your GCP project, run the following command:
  gcloud iam roles create ${ROLE_NAME} --project=${PROJECT_ID} --file=${ROLE_NAME}.yaml
GCP Organization-level access#
- Run the following command, replacing <GCP_ORGANIZATION_ID> with your GCP organization ID:
  ORGANIZATION_ID="<GCP_ORGANIZATION_ID>"
- To create the PrivaceraPolicySyncRole role in your GCP organization, run the following command:
  gcloud iam roles create ${ROLE_NAME} --organization=${ORGANIZATION_ID} --file=${ROLE_NAME}.yaml
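After creating the role at either level, you can confirm that it exists and lists the expected permissions. This is a minimal verification sketch; use whichever of the project or organization variants applies to your setup.
# Verify the custom role and its included permissions.
gcloud iam roles describe ${ROLE_NAME} --project=${PROJECT_ID}
# Or, for an organization-level role:
gcloud iam roles describe ${ROLE_NAME} --organization=${ORGANIZATION_ID}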
Attach IAM Role to Service Account#
To attach the PrivaceraPolicySyncRole IAM role created above to a service account, follow these steps (an equivalent gcloud sketch appears after this list):
- Log in to your GCP console.
- Select IAM & admin > Service accounts and click + CREATE SERVICE ACCOUNT.
- Enter values in the fields and click CREATE.
- In Grant this service account access to project, select the PrivaceraPolicySyncRole role.
- On the Service Accounts page, find the newly created service account and copy its email address for use in a later step.
  Note
  This email will be the Service Account Email for configuring PolicySync in Privacera Manager.
- If you are using a Google VM to configure GBQ for PolicySync, you can attach the service account created above to your VM and skip the steps below.
- On the Service Accounts page, go to the Keys tab, click Add Key, and select Create New Key.
- Select the JSON key type and click CREATE. A JSON key file downloads to your system. Store the file in an accessible location; it will be used for configuring PolicySync in Privacera Manager.
Related Information
Refer to the Google documentation for detailed information on creating a service account.
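If you prefer the command line, the console steps above can also be performed with gcloud. The sketch below assumes a project-level role and a service account named privacera-policysync; both names are examples only, so adjust them to your environment.
# Create the service account (the name "privacera-policysync" is only an example).
gcloud iam service-accounts create privacera-policysync \
  --display-name="Privacera PolicySync" --project=${PROJECT_ID}

SA_EMAIL="privacera-policysync@${PROJECT_ID}.iam.gserviceaccount.com"

# Grant the custom role to the service account on the project.
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="projects/${PROJECT_ID}/roles/${ROLE_NAME}"

# Download a JSON key for the service account. Skip this step if you attach the
# service account directly to the VM that runs PolicySync.
gcloud iam service-accounts keys create policysync-gbq-service-account.json \
  --iam-account=${SA_EMAIL}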
Configure Logs for Auditing#
A sink is required to collect all the logs from GBQ. To create a sink, follow these steps (an equivalent gcloud sketch appears after this list):
- In the search bar, search for Logging, click Logs Router, and then click Create Sink.
- Enter the sink name as PolicySyncBigQueryAuditSink, and then click Next.
- Enter the sink destination:
  - In Select sink service, select BigQuery.
  - In Select BigQuery dataset, click Create new BigQuery dataset.
  - Enter the Dataset ID as bigquery_audits and click Create Dataset.
  - Click Next.
- Add the BigQuery logs to the sink. In Build an inclusion filter, add the following line:
  resource.type="bigquery_resource"
- Click Create Sink.
Related Information
Refer to the Google documentation for detailed information on creating a sink.
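The same sink can also be created from the command line. This is a sketch under the assumption that the bigquery_audits dataset already exists in ${PROJECT_ID}; after creating the sink, grant its writer identity write access on that dataset (for example, the BigQuery Data Editor role).
# Route BigQuery audit logs into the bigquery_audits dataset.
gcloud logging sinks create PolicySyncBigQueryAuditSink \
  bigquery.googleapis.com/projects/${PROJECT_ID}/datasets/bigquery_audits \
  --project=${PROJECT_ID} \
  --log-filter='resource.type="bigquery_resource"'

# Show the sink's writer identity, which needs write access on bigquery_audits.
gcloud logging sinks describe PolicySyncBigQueryAuditSink \
  --project=${PROJECT_ID} --format='value(writerIdentity)'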
CLI Configuration#
- SSH to the instance where Privacera is installed.
- If you are not using VM-attached credentials, do the following:
  - Locate the JSON key file of the service account that you downloaded in the steps above.
  - Copy the JSON file to the config/custom-vars folder.
- Run the following commands:
  cd ~/privacera/privacera-manager/
  cp config/sample-vars/vars.policysync.bigquery.yml config/custom-vars/
  vi config/custom-vars/vars.policysync.bigquery.yml
- Set the properties for your specific installation. For property details and descriptions, see the Configuration Properties section that follows.
  Note
  Along with the above properties, you can add custom properties that are not included by default. For more information about these properties, see BigQuery Connector.
- Run the update:
  cd ~/privacera/privacera-manager
  ./privacera-manager.sh update
Configuration Properties#
Property | Mandatory | Description | Default Value | Example |
---|---|---|---|---|
BIGQUERY_V2_ENABLE | | Property to enable/disable the BigQuery connector. | true | |
BIGQUERY_PROJECT_ID | Yes | Sets the Google project ID used for the initial interaction with the BigQuery APIs to explore other projects and datasets. | | privacera-demo-project |
BIGQUERY_PROJECT_LOCATION | Yes | Set this property to the geographical region where PolicySync's taxonomy will be created. | us | |
BIGQUERY_JDBC_URL | | Sets the JDBC URL for connecting to the BigQuery server. | | jdbc:bigquery:example.com |
BIGQUERY_USE_VM_CREDENTIALS | Yes | Set this property if you want to connect to BigQuery using the service account attached to your virtual machine. | true | |
BIGQUERY_OAUTH_SERVICE_ACCOUNT_EMAIL | Yes | Specifies the email address of the service account that PolicySync uses to connect to BigQuery. | | |
BIGQUERY_OAUTH_PRIVATE_KEY_PATH | | Sets the path of the service account credential JSON file downloaded from the Google service account keys section. This value is required when VM-attached credentials are not used. | /workdir/policysync/cust_conf/policysync-gbq-service-account.json | |
BIGQUERY_OAUTH_PRIVATE_KEY_FILE_NAME | | Sets the name of the service account credential JSON file downloaded from the Google service account keys section. | | |
BIGQUERY_AUDIT_ENABLE | Yes | Enables access audit retrieval from BigQuery. | false | |
BIGQUERY_AUDIT_EXCLUDED_USERS | Yes | Specifies a comma-separated list of email addresses of users whose access audits should be ignored. | | |
BIGQUERY_MANAGE_DATASET_LIST | Yes | Comma-separated list of dataset FQDNs on which access control is managed by PolicySync. Leave the value blank to manage all datasets. Wildcards are supported. The ignore dataset list takes precedence over the manage dataset list. | | testproject1.dataset1,testproject2.dataset2,sales_project*.sales* |
BIGQUERY_MANAGE_PROJECT_LIST | Yes | Comma-separated list of project names on which access control is managed by PolicySync. Leave the value blank to manage all projects. Wildcards are supported. The ignore projects list takes precedence over the manage projects list. | | testproject1,testproject2,sales_project* |
BIGQUERY_GRANT_UPDATES | Yes | Determines whether actual grant/revoke operations run on BigQuery. | true | |
BIGQUERY_NATIVE_PUBLIC_GROUP_IDENTITY_NAME | Yes | Set this property to your preferred value; PolicySync uses this native public group for access grants whenever a policy referencing the public group is created. | | All authenticated users: all authenticated users of the GCP project. All users: all Google-authenticated users. |
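For orientation, the snippet below shows one way some of these properties might look once set. It is a sketch only: the project ID, service account email, and key file name are placeholders, the property syntax is assumed to follow the sample vars file, and in practice you set the values by editing config/custom-vars/vars.policysync.bigquery.yml with vi as described above. The heredoc simply writes the example to a scratch file for reference.
# Illustrative values only; do not append blindly to the real vars file.
cat << EOF > /tmp/bigquery-vars-example.yml
BIGQUERY_V2_ENABLE: "true"
BIGQUERY_PROJECT_ID: "privacera-demo-project"
BIGQUERY_PROJECT_LOCATION: "us"
BIGQUERY_USE_VM_CREDENTIALS: "false"
BIGQUERY_OAUTH_SERVICE_ACCOUNT_EMAIL: "privacera-policysync@privacera-demo-project.iam.gserviceaccount.com"
BIGQUERY_OAUTH_PRIVATE_KEY_FILE_NAME: "policysync-gbq-service-account.json"
BIGQUERY_AUDIT_ENABLE: "true"
BIGQUERY_GRANT_UPDATES: "true"
EOF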
Limitations#
Policy Permissions | Limitation |
---|---|
View-based Row Filter | Only the first satisfied condition from the row filter policy items is evaluated. |
Native Row Filter | All row filter policy items from the row filter policy are evaluated, because PolicySync translates each policy item into a single row access policy on the BigQuery table and BigQuery handles the evaluation. |
Lag on Access Control Enforcement | PolicySync only processes policy changes in Ranger and submits them as requests to BigQuery; BigQuery then takes additional time to enforce the policies. |
User/Group/Role Policies | The BigQuery connector does not create any users, groups, or roles in Google. Only user- and group-level policies will work. |
Validation#
Let's test the access control in BigQuery by defining some test policies for a test user (emily).
- Prepare test data in GCP. Refer to the Google documentation for detailed information on running queries in BigQuery.
  - Log in to the GCP console, navigate to BigQuery, and then click Compose New Query.
  - In the Query editor text area, copy the following query:
    -- Create Dataset
    CREATE SCHEMA customer_dataset;
    -- Create Table
    CREATE TABLE IF NOT EXISTS customer_dataset.customer_data (
      id INT64,
      person_name string,
      domain string,
      ssn string,
      country string,
      us_phone string,
      address string,
      account_id string,
      zipcode string);
    -- Insert Data into table
    insert into customer_dataset.customer_data values (1,'Nancy','nancy@yahoo.com','201-99-5532','US','856-232-9702','939 Park Avenue','159635478','33317');
    insert into customer_dataset.customer_data values (2,'Gene','gene@google.us','202-99-5532','UK','954-583-0575','303 Johnston Blvd','236854569','95202');
    insert into customer_dataset.customer_data values (3,'Edward','edward@facebook.com','203-99-5532','US','209-626-9041','130 Hollister','365412985','60173');
    insert into customer_dataset.customer_data values (4,'Pearlene','pearlene@gmail.com','204-99-5532','US','708-471-6810','17 Warren Rd','452189732','90017');
    insert into customer_dataset.customer_data values (5,'James','james@cuvox.de','205-99-5532','US','661-338-6787','898 Newport Gray Rd','517836427','94041');
    insert into customer_dataset.customer_data values (6,'Pamela','pamela@cuvox.de','206-99-5532','UK','650-526-5259','861 Strick Rd','685231473','80214');
    insert into customer_dataset.customer_data values (7,'Donna','donna@fleckens.hu','207-99-5532','US','303-239-4282','1784 S Shore Dr','789563258','1730');
    insert into customer_dataset.customer_data values (8,'Amy','amy@gustr.com','208-99-5532','US','774-553-4736','9522 Apple Valley Dr','854126945','55102');
    insert into customer_dataset.customer_data values (9,'Adam','adam@teleworm.us','209-99-5532','UK','651-297-1448','745 Old Springville Rd','965412381','43201');
    insert into customer_dataset.customer_data values (10,'Lucille','lucille@armyspy.com','210-99-5532','US','740-320-1270','4223 Midway Road','785651236','89102');
    insert into customer_dataset.customer_data values (11,'Edard','edu@gustr.com','211-99-5532','UK','702-257-8796','3659 Dye Street','965121354','53207');
    insert into customer_dataset.customer_data values (12,'Nick','nick@jourrapide.com','212-99-5532','US','414-483-8638','2966 Nutters Barn Lane','563515264','72764');
    insert into customer_dataset.customer_data values (13,'Brian','brian@einrot.com','213-99-5532','US','479-872-9783','3300 Worthington Drive','654621233','91303');
    insert into customer_dataset.customer_data values (14,'Stella','stella@jourrapide.com','214-99-5532','US','818-596-6681','1893 Ingram Road','261613654','35816');
    insert into customer_dataset.customer_data values (15,'Leona','leona@dayrep.com','215-99-5532','UK','256-250-5413','4244 Burnside Court','986513211','75069');
    -- Verify table by running select query
    SELECT * FROM customer_dataset.customer_data;
  - Click Run.
- Create a test user named emily in Privacera Portal. For more information, refer to the Privacera Portal documentation on adding users.
  - In the GCP console, the user emily gets added after you create the user in Privacera Portal.
  - Check that the user emily can list the projects inside your organization.
  - Check that the user emily does not have Full Admin or Full Reader access on BigQuery.
- Create a policy for emily to run queries and list datasets and tables.
- Check the access control on the test data in GCP, as described in sections A-D below.
A) Table-level Access Control
- In Privacera Portal, create a policy Customer Data Full Access for accessing the table.
- Log in to the GCP console with the credentials of the test user emily.
- Navigate to BigQuery.
- Run the following queries:
  SELECT * FROM customer_dataset.customer_data;
  SELECT * FROM customer_dataset_secure.customer_data;
  User emily can access and view the data.
- In Privacera Portal, disable the policy.
- In GCP, run the following queries:
  SELECT * FROM customer_dataset.customer_data;
  SELECT * FROM customer_dataset_secure.customer_data;
  User emily can no longer access or view the data.
B) View-based Column-level Access Control
With view-based column-level access control, you create a column-level policy on the table. Columns that are not permitted by the policy are shown as NULL in the secure view of the table.
- In Privacera Portal, create a policy Customer Data Column Level Access granting access to a few columns.
- Log in to the GCP console with the credentials of the test user emily.
- Navigate to BigQuery.
- Run the following query. User emily will not see the person_name column values in the secure view of the customer_data table.
  SELECT * FROM customer_dataset_secure.customer_data;
- In Privacera Portal, disable the Customer Data Column Level Access policy.
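While the Customer Data Column Level Access policy is still enabled, the same check can optionally be run from a terminal with the bq CLI. This is a sketch; it assumes gcloud on that machine is authenticated as the test user emily and that the dataset names match the ones created above.
# Expectation while the column-level policy is enabled: person_name comes back NULL.
bq query --use_legacy_sql=false \
  'SELECT id, person_name, country FROM customer_dataset_secure.customer_data LIMIT 5'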
C) View-based Row-level Filter
- In Privacera Portal, enable the Customer Data Full Access policy created above. User emily can view customer_data for the US and UK in the table and in the secure view.
- Log in to the GCP console with the credentials of the test user emily.
- Navigate to BigQuery.
- Run the following query to view data from the customer data secure view. It shows data from both the US and UK.
  SELECT * FROM customer_dataset_secure.customer_data;
  User emily can access and view the data.
- In Privacera Portal, create a Customer Access by Country policy that grants access only to data from the UK.
- Run the following query to view data from the customer data secure view. The row filter policy is applied, showing only data from the UK.
  SELECT * FROM customer_dataset_secure.customer_data;
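The effect of the row filter can also be observed from a terminal with the bq CLI (a sketch; it assumes gcloud is authenticated as emily).
# With the Customer Access by Country policy active, only country = 'UK' rows are expected.
bq query --use_legacy_sql=false \
  'SELECT country, COUNT(*) AS row_count FROM customer_dataset_secure.customer_data GROUP BY country'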
D) View-based Masking
- Log in to the GCP console with the credentials of the test user emily.
- Navigate to BigQuery.
- Run the following query. User emily can view the SSN values.
  SELECT * FROM customer_dataset_secure.customer_data;
- In Privacera Portal, create a Mask SSN policy to mask SSN values for emily.
- Run the following query. User emily can no longer view the SSN values, since they are masked using MD5 hashing.
  SELECT * FROM customer_dataset_secure.customer_data;
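As with the other checks, the masking behavior can be confirmed from a terminal with the bq CLI (a sketch; assumes gcloud is authenticated as emily).
# With the Mask SSN policy active, ssn is expected to contain MD5-hashed values
# instead of the raw numbers.
bq query --use_legacy_sql=false \
  'SELECT id, ssn FROM customer_dataset_secure.customer_data LIMIT 5'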
Create custom IAM roles#
By default, PolicySync automatically creates all the IAM roles listed in the table below to perform access control in GBQ. If you want to create the custom IAM roles manually, disable automatic creation by setting BIGQUERY_CREATE_CUSTOM_IAM_ROLES to false.
In the GCP console, you then need to map each role to its appropriate permissions. The role-to-permissions mapping is given in the table below. For creating a custom role in GCP and adding its corresponding permissions, see Creating a custom role.
Role Name | GCP Permissions |
---|---|
PrivaceraGBQProjectListRole | resourcemanager.projects.get |
PrivaceraGBQJobListRole | bigquery.jobs.list |
PrivaceraGBQJobListAllRole | bigquery.jobs.listAll |
PrivaceraGBQJobCreateRole | bigquery.jobs.create |
PrivaceraGBQJobGetRole | bigquery.jobs.get |
PrivaceraGBQJobUpdateRole | bigquery.jobs.update |
PrivaceraGBQJobDeleteRole | bigquery.jobs.delete |
PrivaceraGBQDatasetCreateRole | bigquery.datasets.create |
PrivaceraGBQDatasetGetMetadataRole | bigquery.datasets.get |
PrivaceraGBQDatasetUpdateRole | bigquery.datasets.update |
PrivaceraGBQDatasetDeleteRole | bigquery.datasets.delete |
PrivaceraGBQTableListRole | bigquery.tables.list |
PrivaceraGBQTableCreateRole | bigquery.tables.create |
PrivaceraGBQTableGetMetadataRole | bigquery.tables.get |
PrivaceraGBQTableQueryRole | bigquery.tables.getData |
PrivaceraGBQTableExportRole | bigquery.tables.export |
PrivaceraGBQTableUpdateMetadataRole | bigquery.tables.update |
PrivaceraGBQTableUpdateRole | bigquery.tables.updateData |
PrivaceraGBQTableSetCategoryRole | bigquery.tables.setCategory |
PrivaceraGBQTableDeleteRole | bigquery.tables.delete |
PrivaceraGBQTransferUpdateRole | bigquery.transfers.update |
PrivaceraGBQTransferGetRole | bigquery.transfers.get |
Based on the GCP resource hierarchy, you can create the roles above at the organization level or at the project level. If you have multiple projects in your GCP organization, it is recommended to create all the roles at the organization level rather than at the project level; once the roles are defined at the organization level, all projects under that organization inherit them. A gcloud sketch for creating one of these roles manually follows.
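If you create the roles manually, each one can be defined with gcloud as well as in the console. The sketch below creates a single role from the table at the organization level; the title and description strings are example values only. Repeat the command for the remaining roles with their corresponding permissions, or use --project=${PROJECT_ID} for project-level roles.
# Example: create PrivaceraGBQTableQueryRole with its permission from the table above.
gcloud iam roles create PrivaceraGBQTableQueryRole \
  --organization=${ORGANIZATION_ID} \
  --title="PrivaceraGBQTableQueryRole" \
  --description="Allows reading table data for Privacera PolicySync access control" \
  --permissions=bigquery.tables.getData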