Skip to content

Google BigQuery (GBQ)#

Google BigQuery provides fine-grained access control on BigQuery Datasets. This includes

  • Table-level Access Control
  • Column-level Access Control
  • Native/Dynamic secure view-based Row Filter
  • Masking With Dynamic secure views created using PolicySync

Prerequisites#

Create IAM Role#

Create PrivaceraPolicySyncRole IAM role in your GCP project using the following commands on Google Cloud's shell (gcloud). The shell can be installed and accessed locally or through Google Console.

  1. Run the following command. Replace <GCP_PROJECT_ID> with your GCP project ID.

    PROJECT_ID="<GCP_PROJECT_ID>"
    
  2. To create PrivaceraPolicySyncRole role in your GCP project, run the following command.

    ROLE_NAME="PrivaceraPolicySyncRole"
    
    cat << EOF > ${ROLE_NAME}.yaml
    title: "${ROLE_NAME}"
    description: "${ROLE_NAME}"
    stage: "ALPHA"
    includedPermissions:
    - bigquery.datasets.create
    - bigquery.datasets.delete
    - bigquery.datasets.get
    - bigquery.datasets.getIamPolicy
    - bigquery.datasets.setIamPolicy
    - bigquery.datasets.update
    - bigquery.datasets.updateTag
    - bigquery.jobs.create
    - bigquery.jobs.get
    - bigquery.jobs.list
    - bigquery.jobs.listAll
    - bigquery.models.create
    - bigquery.models.delete
    - bigquery.models.export
    - bigquery.models.getData
    - bigquery.models.getMetadata
    - bigquery.models.list
    - bigquery.models.updateData
    - bigquery.models.updateMetadata
    - bigquery.models.updateTag
    - bigquery.routines.create
    - bigquery.routines.delete
    - bigquery.routines.get
    - bigquery.routines.list
    - bigquery.routines.update
    - bigquery.tables.create
    - bigquery.tables.delete
    - bigquery.tables.export
    - bigquery.tables.get
    - bigquery.tables.getData
    - bigquery.tables.getIamPolicy
    - bigquery.tables.list
    - bigquery.tables.setCategory
    - bigquery.tables.setIamPolicy
    - bigquery.tables.update
    - bigquery.tables.updateData
    - bigquery.tables.updateTag
    - resourcemanager.projects.get
    - resourcemanager.projects.getIamPolicy
    - resourcemanager.projects.setIamPolicy
    - bigquery.rowAccessPolicies.create
    - bigquery.rowAccessPolicies.delete
    - bigquery.rowAccessPolicies.getIamPolicy
    - bigquery.rowAccessPolicies.list
    - bigquery.rowAccessPolicies.setIamPolicy
    - bigquery.rowAccessPolicies.update
    EOF
    
    gcloud iam roles create ${ROLE_NAME} --project=${PROJECT_ID} --file=${ROLE_NAME}.yaml
    

Attach IAM Role to Service Account#

To attach the PrivaceraPolicySyncRole IAM role created above, do the following steps:

  1. Log in to your GCP console.

  2. Select IAM & admin > Service accounts and click + CREATE SERVICE ACCOUNT.

  3. Enter values in the fields and click CREATE.

  4. In Grant this service account access to project, select the role as PrivaceraPolicySyncRole and Role Administrator.

  5. On the Services Account Page, find the newly created service account and copy the email address of the new service account for use in a later step.

    Note

    This email will be the Service Account Email for configuring PolicySync in Privacera Manager.

  6. If you are using a Google VM machine to configure GBQ for PolicySync, then you can attach the service account created above to your VM machine and skip below steps.

  7. On the Services Account Page, go to the Keys tab and click Add Key and select Create New Key.

  8. Select the JSON key type, and click CREATE. A JSON key file downloads to your system. Store the file at an accessible location. It will be used for configuring PolicySync in Privacera Manager.

Related Information

Refer the Google documentation for a detailed information on creating a service account - Click here.

Configure Logs for Auditing#

A sink is required to collect all the logs from GBQ. To create a sink, do the following steps:

  1. In the search bar, search for Logging, and then click Logs Router, and click Create Sink.

  2. Enter the sink name as PolicySyncBigQueryAuditSink, and then click Next.

  3. Enter the sink destination.

    1. In the Select sink service, select BigQuery.

    2. In Select BigQuery dataset, click Create new BigQuery dataset.

    3. Enter the Dataset ID as bigquery_audits and click Create Dataset.

    4. Click Next.

  4. Add the BigQuery logs in the sink:

    In the Build an inclusion filter, add the following line:

    resource.type="bigquery_resource"
    
  5. Click Create Sink.

Related Information

Refer the Google documentation for a detailed information on creating a sink - Click here.

CLI Configuration#

  1. SSH to the instance where Privacera is installed.

  2. Do the following, if you’re not using VM attached credentials:

    1. Access the JSON file of the service account you downloaded using the steps above.

    2. Copy the JSON to the config/custom-vars folder.

  3. Run the following commands.

    cd ~/privacera/privacera-manager/
    cp config/sample-vars/vars.policysync.bigquery.yml config/custom-vars/
    vi config/custom-vars/vars.policysync.bigquery.yml
    
  4. Edit the properties. For property details and description, refer to the Configuration Properties below.

  5. Run the update.

    cd ~/privacera/privacera-manager 
    ./privacera-manager.sh update
    

Configuration Properties#

Property Description Example
BIGQUERY_PROJECT_ID Set this property to specify a Google project ID. test-project-12345
BIGQUERY_PROJECT_LOCATION Set this property to specify the geographical region where the taxonomy for the PolicySync should be created. us
BIGQUERY_USE_VM_CREDENTIALS

This is property is used if you want to use the service account attached to your virtual machine for getting credentials to connect to BigQuery.

When this property is used, it is not required to set BIGQUERY_OAUTH_PRIVATE_KEY_PATH.

false
BIGQUERY_OAUTH_SERVICE_ACCOUNT_EMAIL Set this property to specify service account email that you want to use for PolicySync. This needs to be specified if you are not using a Google VM attached service account.  
BIGQUERY_OAUTH_PRIVATE_KEY_PATH

This property sets the path of the service account credential JSON file downloaded from the Google Service account keys section. This property value is needed when BIGQUERY_OAUTH_PRIVATE_KEY_PATH is set to false.

/workdir/policysync/cust_conf/policysync-gbq-service-account.json
BIGQUERY_OAUTH_PRIVATE_KEY_FILE_NAME

This property sets the name of the service account credential JSON file downloaded from the Google Service account keys section. Then BIGQUERY_OAUTH_PRIVATE_KEY_PATH will be auto-populated based on the filename. This property value is needed when BIGQUERY_OAUTH_PRIVATE_KEY_PATH is set to false.

 
BIGQUERY_MANAGE_PROJECT_LIST

This property is used to set comma-separated project names on which access control will be managed by PolicySync. If you want to manage all projects then keep the value blank. It supports wildcards. The ignore projects list has precedence over the manage projects list.

For example, testproject1,testproject2,sales_project*

BIGQUERY_MANAGE_DATASET_LIST This property is used to set comma-separated dataset FQDN on which access control will be managed by PolicySync. If you want to manage all datasets then keep the value blank. It supports wildcards. The ignore dataset list has precedence over manage dataset list. testproject1.dataset1,testproject2.dataset2,sales_project*.sales*
BIGQUERY_MANAGE_TABLE_LIST This property is used to set comma-separated table/view FQDN on which access control will be managed by PolicySync. If you want to manage all tables/views then keep the value blank. It supports wildcards. The ignore table list has precedence over manage table list. testproject1.dataset1.table1,testproject2.dataset2.view2,sales_project*.sales*.*
BIGQUERY_MANAGE_USER_LIST This property is used to set comma-separated user names on which access control will be managed by PolicySync. If you want to manage all users then keep the value blank. It supports wildcards. The ignore users list has precedence over manage users list. user1,user2,user3
BIGQUERY_MANAGE_GROUP_LIST This property is used to set comma-separated group names on which access control will be managed by PolicySync. If you want to manage all group then keep the value blank. It supports wildcards. The ignore group list has precedence over manage group list. group1,group2,group3
BIGQUERY_MANAGE_ROLE_LIST Add the roles to be managed. Only these roles will be provided with access control in a BigQuery policy. role1,role2,role3
BIGQUERY_COLUMN_ACCESS_CONTROL_TYPE

Set this property to specify a way to handle column-level access control by PolicySync.

  • view - PolicySync will create a secure view for the table and if any column is restricted from the table then that column will be shown as null to the user in the secure view. This is recommended by privacera as tags approach has limitations.
view
BIGQUERY_ENABLE_VIEW_BASED_ROW_FILTER Set this property to true, if you want to enable secure view based row filter in BigQuery PolicySync.

Note: BigQuery supports native row filters, but due to some limitations, it is recommended to use view based row filter.
true
BIGQUERY_ENABLE_VIEW_BASED_MASKING

Set this property to true, if you want to enable secure view-based masking.

Note:- Bigquery doesn't support native masking, so it is recommended to use view-based masking.

true
BIGQUERY_IGNORE_USER_LIST This property is used to set comma-separated user names on which you don't want access control to be managed by PolicySync. If you want to manage all the users then keep the value blank. It supports wildcards. This has precedence over manage users list. user1,user2,dev_user*
BIGQUERY_IGNORE_GROUP_LIST This property is used to set comma-separated group names on which you don't access control to be managed by PolicySync. If you want to manage all the groups then keep the value blank. It supports wildcards. This has precedence over manage groups list. group1,group2,dev_group*
BIGQUERY_IGNORE_ROLE_LIST Add the roles to be ignored. These roles will not provided with access control in a BigQuery policy. role1,role2,role3
BIGQUERY_MANAGE_USER_FILTERBY_GROUP Set this property to true, if you want to manage only the users who belong to the groups defined in `BIGQUERY_MANAGE_GROUP_LIST` property. false
BIGQUERY_ENABLE_ROW_FILTER Set this property to true, if you want to enable native row filter functionality. This is not recommended, since the native row filters can only be created on tables, they can't be created on views. false

Note

  • You can also add custom properties that are not included by default. See PolicySync.

Limitations#

Policy Permissions Limitation
View-based Row Filter Only the first satisfied condition is evaluated from row filter policy items.
Native Row Filter All the row filter policy items from the row filter policy will be evaluated as policysync translates each policy item as single row access policy on the BigQuery table and then the BigQuery handles evaluation part.
Lag on Access Control Enforcement PolicySync only process any changes in the policy in Ranger and submit it as a request to BigQuery. BigQuery takes its time to enforce the policies.

For more information, see the following:
User/Group/Role Policies BigQuery connector won’t create any user/group/roles in Google. Only user/group level policies will work.

Validation#

Let's test the access control using BigQuery by defining some test policies for a test user (emily).

  1. Prepare test data in GCP. Refer Google documentation for a detailed information on running queries in BigQuery - Click here.

    1. Log in to GCP console, navigate to BigQuery and then click Compose New Query.

    2. In the Query editor text area, copy the following query:

      -- Create Dataset
      CREATE SCHEMA customer_dataset;
      
      -- Create Table
      CREATE TABLE IF NOT EXISTS customer_dataset.customer_data (
      id INT64, 
      person_name string,
      domain string,
      ssn string,
      country string,
      us_phone string, 
      address string, 
      account_id string, 
      zipcode string);
      
      -- Insert Data into table
      insert into customer_dataset.customer_data values (1, 'Nancy','nancy@yahoo.com','201-99-5532','US','856-232-9702','939 Park Avenue','159635478','33317');
      insert into customer_dataset.customer_data values (2,'Gene','gene@google.us','202-99-5532','UK','954-583-0575','303 Johnston Blvd','236854569','95202');
      insert into customer_dataset.customer_data values (3,'Edward','edward@facebook.com','203-99-5532','US','209-626-9041','130 Hollister','365412985','60173');
      insert into customer_dataset.customer_data values (4,'Pearlene','pearlene@gmail.com','204-99-5532','US','708-471-6810','17 Warren Rd','452189732','90017');
      insert into customer_dataset.customer_data values (5,'James','james@cuvox.de','205-99-5532','US','661-338-6787','898 Newport Gray Rd','517836427','94041');
      insert into customer_dataset.customer_data values (6,'Pamela','pamela@cuvox.de','206-99-5532','UK','650-526-5259','861 Strick Rd','685231473','80214');
      insert into customer_dataset.customer_data values (7,'Donna','donna@fleckens.hu','207-99-5532','US','303-239-4282','1784 S Shore Dr','789563258','1730');
      insert into customer_dataset.customer_data values (8,'Amy','amy@gustr.com','208-99-5532','US','774-553-4736','9522 Apple Valley Dr','854126945','55102');
      insert into customer_dataset.customer_data values (9,'Adam','adam@teleworm.us','209-99-5532','UK','651-297-1448','745 Old Springville Rd','965412381','43201');
      insert into customer_dataset.customer_data values (10,'Lucille','lucille@armyspy.com','210-99-5532','US','740-320-1270','4223  Midway Road','785651236','89102');
      insert into customer_dataset.customer_data values (11,'Edard','edu@gustr.com','211-99-5532','UK','702-257-8796','3659  Dye Street','965121354','53207');
      insert into customer_dataset.customer_data values (12,'Nick','nick@jourrapide.com','212-99-5532','US','414-483-8638','2966  Nutters Barn Lane','563515264','72764');
      insert into customer_dataset.customer_data values (13,'Brian','brian@einrot.com','213-99-5532','US','479-872-9783','3300  Worthington Drive','654621233','91303');
      insert into customer_dataset.customer_data values (14,'Stella','stella@jourrapide.com','214-99-5532','US','818-596-6681','1893  Ingram Road','261613654','35816');
      insert into customer_dataset.customer_data values (15,'Leona','leona@dayrep.com','215-99-5532','UK','256-250-5413','4244  Burnside Court','986513211','75069');
      
      -- Verify table by running select query
      SELECT * FROM customer_dataset.customer_data;
      
    3. Click Run.

  2. Create test user in Privacera Portal as emily. For more information, click here.

  3. In GCP console, the user emily gets added after you create the user in step 2.

    1. Check the user emily can list the projects inside your organization.

    2. Check user emily does not have Full Admin or Full Reader access on BigQuery.

  4. Create a policy for emily to run queries and list dataset and tables.

  5. Check the access control on the test data in GCP.

    A) Table-level Access Control

    1. In Privacera Portal, create a policy Customer Data Full Access for accessing table.

    2. Log in to GCP console with credentials of the test user emily.

    3. Navigate to BigQuery.

    4. Run the following query.

      SELECT * FROM customer_dataset.customer_data;
      SELECT * FROM customer_dataset_secure.customer_data;
      

      User emily can access and view the data.

    5. In Privacera Portal, disable the policy.

    6. In GCP, run the following query.

      SELECT * FROM customer_dataset.customer_data;
      SELECT * FROM customer_dataset_secure.customer_data;
      

      User emily can not access and view the data.

    B) View-based Column-level Access Control

    In the view-based column-level access control, you have to create a column-level policy on the table. The columns which are not permitted from that policy will be shown as NULL in the secure view of the table.

    1. In Privacera Portal, do the following:

      Create a policy Customer Data Column Level Access granting access to a few columns.

    2. Log in to GCP console with credentials of the test user emily.

    3. Navigate to BigQuery.

    4. Run the following queries.

      User emily will not see person_name column in the secure view of customer_data table..

      SELECT * FROM customer_dataset_secure.customer_data;
      
    5. In Privacera Portal, disable the Customer Data Column Level Access policy.

    C) View-based Row-level Filter

    1. In Privacera Portal, enable Customer Data Full Access policy created above.

      emily can view customer_data for US and UK from table and in secure view.

    2. Log in to GCP console with credentials of the test user emily.

    3. Navigate to BigQuery.

    4. Run the following queries.

      Query to view data from customer data secure view. It will show data from countries US and UK.

      SELECT * FROM customer_dataset_secure.customer_data;
      

      User emily can access and view the data.

    5. In Privacera Portal, create a Customer Access by Country policy to access data only from UK.

    6. Run the following queries.

      Query to view data from customer data secure view. The row filter policy gets applied showing data from country UK.

      SELECT * FROM customer_dataset_secure.customer_data;
      

    D) View-based Masking

    1. Log in to GCP console with credentials of the test user emily.

    2. Navigate to BigQuery.

    3. Run the following query.

      SELECT * FROM customer_dataset_secure.customer_data;
      

      User emily can view the SSN values.

    4. In Privacera Portal, create a Mask SSN policy to mask SSN values for emily.

    5. Run the following query.

      SELECT * FROM customer_dataset_secure.customer_data;
      

      User emily cannot view the the SSN values, since it is masked using md5 encryption.