Privacera Documentation

Where is the sample dataset in my Databricks Workspace?

In your Databricks Workspace, select the Data menu option in the sidebar to launch the Data Explorer. Then select your catalog and expand it. You should see the newly created privacera_sales_schema_<date_timestamp> schema containing the sales_data table. The date timestamp is added as a suffix to make the schema name unique in the catalog. In addition to the base table, a corresponding secure view was created automatically. Business users generally access the dataset through the secure view, where all fine-grained policies are applied dynamically.

In the Databricks Workspace, select the SQL Editor from the sidebar. Run the following query:

select count(*) from <your-catalog>.privacera_sales_schema_<date_timestamp>.sales_data

You will get the following error message because access to the base table is managed by PrivaceraCloud access policies and general users are not allowed to access it:

User does not have USE SCHEMA on Schema `<catalog>.privacera_sales_schema_<date_timestamp>`

You will also notice another schema named privacera_sales_schema_<date_timestamp>_secure which has a sales_data view under it. From now on, you will be interacting with secure views under the privacera_sales_schema_<date_timestamp>_secure schema instead of accessing the tables under the original privacera_sales_schema_<date_timestamp> schema.

You can run a query against this view:

select count(*) from <your-catalog>.privacera_sales_schema_<date_timestamp>_secure.sales_data
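The base table and the secure view differ only in the `_secure` suffix on the schema name. As a minimal sketch of that naming pattern, the helper below builds both fully qualified names from a catalog name and a timestamp suffix. The function names and the example values (`main`, `20240101_120000`) are hypothetical; the actual timestamp format in your workspace may differ, so copy the suffix from the Data Explorer.

```python
def qualified_name(catalog: str, date_timestamp: str, secure: bool = True) -> str:
    """Return the fully qualified name of sales_data.

    secure=True  -> the secure view (what users are expected to query)
    secure=False -> the base table (blocked for general users)
    """
    schema = f"privacera_sales_schema_{date_timestamp}"
    if secure:
        schema += "_secure"
    return f"{catalog}.{schema}.sales_data"


def count_query(catalog: str, date_timestamp: str, secure: bool = True) -> str:
    """Build the row-count query used in this walkthrough."""
    return f"select count(*) from {qualified_name(catalog, date_timestamp, secure)}"


if __name__ == "__main__":
    # Hypothetical catalog and suffix, for illustration only.
    print(count_query("main", "20240101_120000", secure=False))
    print(count_query("main", "20240101_120000", secure=True))
```

Paste the generated statements into the SQL Editor; the first is expected to fail with the USE SCHEMA error described above, and the second targets the secure view.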

The query should return a result because a PrivaceraCloud policy grants you SQL SELECT access to this view. You can explore the data by running:

select * from <your-catalog>.privacera_sales_schema_<date_timestamp>_secure.sales_data

The view shows the sales (sales_amount column) generated by various sales team members (name column) in various sales territories (country, region, and city columns). Several access scenarios are possible for this dataset: denying access to anyone who is not a sales team member; hiding the salesperson's name from analysts who only need to compute statistics on sales_amount; de-identifying the city column to prevent sales team members from competing in the same territory; or, for compliance, restricting access to rows for the same country as the user's location.
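To make the masking and row-filter scenarios concrete, here is a purely conceptual sketch in plain Python. It is not how PrivaceraCloud implements policies; it only illustrates the effect each kind of policy would have on query results. The sample rows, the mask string, and the function names are all invented for illustration and are not taken from the real sales_data table.

```python
from typing import Dict, List

Row = Dict[str, object]

# Invented rows for illustration only; not real sales_data content.
SAMPLE_ROWS: List[Row] = [
    {"name": "Alice", "country": "US", "region": "West", "city": "Seattle", "sales_amount": 1200.0},
    {"name": "Bob", "country": "DE", "region": "Bavaria", "city": "Munich", "sales_amount": 800.0},
    {"name": "Carol", "country": "US", "region": "East", "city": "Boston", "sales_amount": 500.0},
]


def mask_column(rows: List[Row], column: str, mask: str = "***") -> List[Row]:
    """Column masking: return copies of the rows with one column hidden."""
    return [{**row, column: mask} for row in rows]


def filter_by_country(rows: List[Row], country: str) -> List[Row]:
    """Row-level filtering: keep only rows matching the user's country."""
    return [row for row in rows if row["country"] == country]


if __name__ == "__main__":
    # An analyst can still aggregate sales_amount without seeing names.
    analyst_view = mask_column(SAMPLE_ROWS, "name")
    print(sum(row["sales_amount"] for row in analyst_view))

    # A user located in the US sees only US rows.
    for row in filter_by_country(SAMPLE_ROWS, "US"):
        print(row)
```

Both helpers return new rows and leave the input untouched, which mirrors the idea that policies change what a user sees through the secure view without altering the base table.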

If you selected hive_metastore as the catalog to connect to Privacera, you can still follow the tutorial. However, to see the effects of the Privacera policies, you need a Databricks Workspace user who does not have administrator privileges. With hive_metastore, an administrator has access to all tables, so the policies do not take effect.