- Platform Release 6.5
Data Source Scanning
Register Data Sources
Before you can use Privacera Discovery productively, make sure you have registered your data sources, including any JDBC-based systems you have.
Behavior of trailing '/' in data source URL/URIs
If a data source URL/URI has a trailing /, then Privacera Discovery will scan the folders in the bucket individually. If the data source URL/URI does not have a trailing /, then the folders in the bucket will be scanned together.
For example, say the following three folders are in an S3 bucket:
A
A_1
A_1_2
If these three folders need to be scanned individually, then the URL/URI in the data source should be listed as:
s3://bucket/A/
s3://bucket/A_1/
s3://bucket/A_1_2/
If the three folders need to be scanned together, then the URL/URI in the data source should be listed as:
s3://bucket/A
Or, if you want to scan A_1 and A_1_2, then the URL/URI should be listed as:
s3://bucket/A_1
This will scan both s3://bucket/A_1 and s3://bucket/A_1_2.
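The trailing-slash behavior described above is consistent with plain prefix matching on the URL/URI. The following Python sketch illustrates that reading using the three example folders. The function name folders_scanned is hypothetical, and treating the URL/URI as a simple string prefix is an assumption made for illustration, not a description of how Privacera Discovery is implemented.

```python
from typing import List


def folders_scanned(data_source_uri: str, folder_uris: List[str]) -> List[str]:
    """Return the folders that a data source URL/URI would cover.

    Assumption: the data source URL/URI acts as a plain string prefix.
    """
    matched = []
    for uri in folder_uris:
        # Normalize each folder to end with "/" so that a data source
        # URL/URI with a trailing "/" matches only that exact folder.
        if (uri.rstrip("/") + "/").startswith(data_source_uri):
            matched.append(uri)
    return matched


folders = ["s3://bucket/A", "s3://bucket/A_1", "s3://bucket/A_1_2"]

only_a = folders_scanned("s3://bucket/A/", folders)      # trailing "/": folder A alone
all_three = folders_scanned("s3://bucket/A", folders)    # no trailing "/": A, A_1, A_1_2
a1_and_a12 = folders_scanned("s3://bucket/A_1", folders)  # A_1 and A_1_2
```

Under this reading, adding the trailing / is what keeps s3://bucket/A from also picking up A_1 and A_1_2.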
Adjusting default scan depth
Privacera Discovery operations are computationally intensive. Therefore, Discovery defaults to scanning only a sample of targeted data in order to determine whether sensitive information is present.
Individual customers are responsible for determining what level of scanning is necessary to meet their regulatory requirements. You can adjust the sampling size by setting the DISCOVERY*MAX* variables detailed in Discovery Custom Properties.
Scan setup
Using Privacera Discovery, you can configure scans and set threshold scores to determine if a resource should be reviewed for non-compliance. This is done from the Scan Setup page.
To view the Scan Setup page, select Discovery > Scan Setup from the navigation menu.
The Scan Setup page displays the following information:
Application Status: The total number of enabled and disabled applications.
System Classification: Sets the global match percentage that causes a scanned resource to be classified. To classify the associated tags automatically, enable the auto-classification feature using the enable/disable toggle.
Minimum Review: Sets the global minimum score that sends tagged resources to the Pending Review status under Classification for manual verification. Tag scores below the review score are ignored.
Reduce Score: If a column has empty data but is meta-tagged with a 100% score, the score is reduced to the value set here. For example, if it is set to 50, the final score for that column tag is 50, and the tag is re-evaluated against the auto-classification and review score thresholds.
If you enable the Reduce Score toggle, the score of the associated meta tag is reduced. If you disable the Reduce Score feature, the meta tags will not be auto-classified.
Rescan Type: For file system and database applications, scanning options include:
Incremental: Only scans resources that have been modified since the previous scan.
Scan: Rescans the resource completely regardless of previous scans.
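The way the System Classification, Minimum Review, and Reduce Score settings interact can be sketched in a few lines of Python. This is only an illustration of the description above: the function name, the status labels, and the use of "greater than or equal to" for the thresholds are assumptions, not the product's actual logic.

```python
def resolve_tag_status(score, classify_at, review_at,
                       column_is_empty=False, reduce_to=None):
    """Return (final_score, status) for one tag on one resource.

    Hypothetical helper. Assumes scores and thresholds are percentages
    and that a score equal to a threshold meets it.
    """
    # Reduce Score: an empty column that was meta-tagged at 100% has its
    # score replaced by the configured value before re-evaluation.
    if reduce_to is not None and column_is_empty and score == 100:
        score = reduce_to

    if score >= classify_at:
        return score, "classified"       # meets System Classification
    if score >= review_at:
        return score, "pending_review"   # meets Minimum Review only
    return score, "ignored"              # below the review score


# Empty column meta-tagged at 100%, Reduce Score set to 50,
# System Classification at 80, Minimum Review at 40.
result = resolve_tag_status(100, 80, 40, column_is_empty=True, reduce_to=50)
```

With these example values, the reduced score of 50 falls between the two thresholds, so the tag would go to manual review rather than being classified automatically.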
S3
Create privacera_tags in the Ranger Tag Based Policy
Associate the privacera_tags to S3 Service.
Create a JSON file where you can add tags.
vi s3_tag.json
{"op":"add_or_update","serviceName":"${S3_Service_Name}","tagVersion":0,"tagDefinitions":{"0":{"name":"${Tag_Name}","source":"Atlas","attributeDefs":[],"id":0,"isEnabled":true}},"tags":{"0":{"type":"${Tag_Type}","owner":0,"attributes":{},"id":0,"isEnabled":true}},"serviceResources":[{"serviceName":"${S3_Service_Name}","resourceElements":{"bucketname":{"values":["${Bucket_Name}"],"isExcludes":false,"isRecursive":false},"objectpath":{"values":["${Resource_Path_Name}"],"isExcludes":false,"isRecursive":false}},"id":0,"isEnabled":true}],"resourceToTagIds":{"0":[0]}}
Sample JSON:
{"op":"add_or_update","serviceName":"privacera_s3","tagVersion":0,"tagDefinitions":{"0":{"name":"SSN","source":"Atlas","attributeDefs":[],"id":0,"isEnabled":true}},"tags":{"0":{"type":"SSN","owner":0,"attributes":{},"id":0,"isEnabled":true}},"serviceResources":[{"serviceName":"privacera_s3","resourceElements":{"bucketname":{"values":["pscanzone"],"isExcludes":false,"isRecursive":false},"objectpath":{"values":["finance/finance_us.csv"],"isExcludes":false,"isRecursive":false}},"id":0,"isEnabled":true}],"resourceToTagIds":{"0":[0]}}
Push the tag to Ranger.
curl -i -L -k -u admin:welcome1 -H "Content-type: application/json" -d @s3_tag.json -X PUT http://${RANGER_HOST}.privacera.com:6080/service/tags/importservicetags
Response:
HTTP/1.1 204 No Content
Set-Cookie: RANGERADMINSESSIONID=517FD2032481415D188C6925FA96E7E3; Path=/; HttpOnly
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000; includeSubDomains
Content-Security-Policy: default-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; connect-src 'self'; img-src 'self'; style-src 'self' 'unsafe-inline';font-src 'self'
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Content-Type-Options: nosniff
Content-Type: application/json
Date: Sun, 08 Mar 2020 18:55:44 GMT
Server: Apache Ranger
Get the list of tagged resources.
curl -i -L -k -u admin:welcome1 -H "Content-type: application/json" -X GET http://${RANGER_HOST}.privacera.com:6080/service/tags/resources
Response:
[{"id":5,"guid":"6b9234f1-69d9-40b0-9865-fe5bec45b469","isEnabled":true,"createdBy":"Admin","updatedBy":"Admin","createTime":1581570409000,"updateTime":1581570409000,"version":2,"serviceName":"privacera_hive","resourceElements":{"database":{"values":["sales"],"isExcludes":false,"isRecursive":false},"column":{"values":["name"],"isExcludes":false,"isRecursive":false},"table":{"values":["sales_data"],"isExcludes":false,"isRecursive":false}},"resourceSignature":"82a4eb3e2148ee77686538a653dc6d8e027e9b3443b5b09494af6a38db815a64"},{"id":7,"guid":"76ef1384-8432-4ed5-9778-c305bfb6d4c0","isEnabled":true,"createdBy":"Admin","updatedBy":"Admin","createTime":1583715849000,"updateTime":1583715849000,"version":2,"serviceName":"privacera_s3","resourceElements":{"bucketname":{"values":["pscanzone"],"isExcludes":false,"isRecursive":false},"objectpath":{"values":["finance/finance_us.csv"],"isExcludes":false,"isRecursive":false}},"resourceSignature":"02d7ffe3fc9065ed63c935faec14268cc6f3823aa68b2b81a030e5c93cb60843"}]
Test the Tag-Based Policies for S3 with the sample given above:
Create the user <kate> in EC2 and grant read, metaread, write, and metawrite permissions on the S3 bucket ${Bucket_Name} in the privacera_s3 service.
Create a deny tag-based policy for user <kate> with tag = SSN, Component = S3, and permissions = read, write.
Now try to access ${Bucket_Name} as user <kate>.
A Denied audit entry with the SSN tag appears in the audits.
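If you tag many S3 objects, writing s3_tag.json by hand becomes error-prone. The following Python sketch generates the same payload structure as the template shown in the steps above. The function name build_s3_tag_payload and its parameters are hypothetical; the field names and fixed values are copied from the template, and nothing beyond that template is assumed about the Ranger import format.

```python
import json


def build_s3_tag_payload(service_name, tag_name, bucket_name, object_path,
                         tag_type=None):
    """Build the tag import payload for one S3 object (hypothetical helper)."""
    # The sample above uses the same value for the tag name and tag type,
    # so the type defaults to the name when it is not given.
    tag_type = tag_type if tag_type is not None else tag_name

    def resource_element(value):
        # Each resource element carries a single, non-recursive value.
        return {"values": [value], "isExcludes": False, "isRecursive": False}

    return {
        "op": "add_or_update",
        "serviceName": service_name,
        "tagVersion": 0,
        "tagDefinitions": {
            "0": {"name": tag_name, "source": "Atlas", "attributeDefs": [],
                  "id": 0, "isEnabled": True}
        },
        "tags": {
            "0": {"type": tag_type, "owner": 0, "attributes": {},
                  "id": 0, "isEnabled": True}
        },
        "serviceResources": [
            {
                "serviceName": service_name,
                "resourceElements": {
                    "bucketname": resource_element(bucket_name),
                    "objectpath": resource_element(object_path),
                },
                "id": 0,
                "isEnabled": True,
            }
        ],
        # Maps service resource 0 to tag 0, as in the template.
        "resourceToTagIds": {"0": [0]},
    }


payload = build_s3_tag_payload(
    "privacera_s3", "SSN", "pscanzone", "finance/finance_us.csv"
)
print(json.dumps(payload))
```

The printed JSON can be saved as s3_tag.json and pushed with the curl command shown in the steps above.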
Start offline and realtime scans
There are two ways to scan resources in Privacera Discovery:
Start offline scanning
You can manually scan resources (offline scanning) from the Data Source page.
To start offline scanning, follow these steps:
From the navigation menu, select Discovery > Data Sources.
Select a resource from the Applications list.
Note
Ensure that the application is enabled.
Under the Include Resource tab, select the Rescan checkbox of the resource to be scanned.
The Info and Success dialog is displayed.
Start realtime scanning
By default, Privacera Discovery scans resources that you add to an application (realtime scanning). When a new file is added to the Include Resource tab of the Data Source page, realtime scanning occurs.
To scan a resource in realtime, the application must be enabled and the resource must be added to the Include Resource tab in the application. For example, to copy a file from the cluster to HDFS, use the following command:
hdfs dfs -put -f <local-src> … <HDFS_dest_path>
For AWS S3, you can fetch S3 tags. For more information, see Configure S3 for Real-Time Scanning.
View classification results
You can view scan results on the Classification page. For more information, see Classification.