PrivaceraCloud Documentation

Where is the sample dataset in my Databricks Workspace?

In your Databricks Workspace, select the Data menu option in the sidebar to launch the Data Explorer. Then select your catalog and expand it. You should see the sales_schema schema containing the sales_data table. The sales_schema schema name might have a timestamp suffix to make it unique in the catalog.

In the Databricks Workspace, select the SQL Editor from the sidebar and run the following query:

select count(*) from <your-catalog>.sales_schema.sales_data

You will get the following error message because access to this table is managed by PrivaceraCloud access policies:

User does not have USE SCHEMA on Schema `<catalog>.sales_schema`

You will also notice another schema named sales_schema_secure, which contains a sales_data view. Because PrivaceraCloud now manages access to sales_schema, you can no longer query it directly. From now on, you will work with the secure views under sales_schema_secure instead of the tables under the original sales_schema schema.
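If you would rather confirm this from the SQL Editor than from the Data Explorer, a statement along these lines should list the views in the secure schema. This is a minimal sketch: it assumes your user is allowed to use sales_schema_secure and that the schema name carries no timestamp suffix in your catalog.

```sql
-- List the views under the secure schema; sales_data should appear in the result.
-- Replace <your-catalog> with your catalog name, and append the timestamp
-- suffix to the schema name if your workspace uses one.
show views in <your-catalog>.sales_schema_secure
```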

You can run a query against this view:

select count(*) from <your-catalog>.sales_schema_secure.sales_data

This query should return a result, because the PrivaceraCloud policy grants you SQL select access to this view. You can explore the data by running:

select * from <your-catalog>.sales_schema_secure.sales_data

The table shows the sales (sales_amount column) generated by sales team members (name column) in their sales territories (country, region, city columns). Several access scenarios are possible for this table: deny access to anyone who is not a sales team member; hide the salesperson's name from analysts who only need statistics based on sales_amount; de-identify the city column to prevent sales team members from competing over the same territory; or, for compliance, restrict each user to rows belonging to the same country as their location.
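As an illustration of the analyst scenario, a query like the one below computes statistics from sales_amount without reading the name column at all. It is a sketch only: the column names come from the description above, sales_amount is assumed to be numeric, and the aliases total_sales and average_sale are made up for this example.

```sql
-- Total and average sales per country and region, largest totals first.
-- The name column is not referenced, so this query would still work
-- if a policy hides or masks the salesperson's name.
select country,
       region,
       sum(sales_amount) as total_sales,
       avg(sales_amount) as average_sale
from <your-catalog>.sales_schema_secure.sales_data
group by country, region
order by total_sales desc
```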

If you selected hive_metastore as the catalog to connect to Privacera, you can still follow the tutorial, but to see the effects of the Privacera policies you need a Databricks Workspace user who does not have administrator privileges. With hive_metastore, an administrator has access to all tables, so the policies do not take effect for that user.