Discovery Scan API User Guide
Getting started
This guide shows you how to use Privacera Discovery APIs for offline scanning. You'll learn to configure resources, check scan status, and retrieve classification results using simple REST API calls.
The Discovery service scans your resources from different connectors and classifies sensitive information like PII, financial data, and other protected content.
Offline Scan API overview
To perform an offline scan using the Privacera Discovery API, the process typically involves the following steps:
- Configure resource for discovery scanning - Register the resources to be scanned by creating a landing zone configuration
- Start offline scan - Initiate offline scan for the configured resources
- Monitor scan requests - Query scan requests to check scan status and progress
- Retrieve scan results - Retrieve resource classification information from the completed scans
- Retrieve classification by application name - Retrieve classification information filtered by application name
API endpoints
Step | Endpoint | Method | Description |
---|---|---|---|
Prerequisites | /api/data/dic/systems | GET | Get application ID, code, and name from the system |
1 | /api/data/dic/application/config/bulk | POST | Add resources to be scanned by creating landing zone configuration |
2 | /api/data/dic/application/config/bulk | POST | Start the actual scanning process for configured resources |
3 | /api/discovery/scan_request | GET | Check the status and progress of running or completed scans |
4 | /api/discovery/resourceinfo | GET | Get detailed results from completed scans including classified tags |
5 | /api/discovery/resourceinfo | GET | Filter scan results by specific application name |
Prerequisites
Before configuring resources for discovery scanning, you need to retrieve the application ID, code and name from the system.
Get application ID, code and name
Retrieve the list of systems and the applications set up in them. The application details returned here are required in the subsequent scanning steps.
Note
Replace the placeholder values with your actual configuration:
PORTAL_URL: Privacera portal URL.
USERNAME: Privacera portal login username.
PASSWORD: Privacera portal login password.
Endpoint: GET /api/data/dic/systems
Query parameters:
Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
Response:
- Use the `id` field from the response as `APP_ID`
- Use the `name` field as `APP_NAME`
- Use the `uniqueCode` field as `APP_CODE`

in the subsequent API calls.
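As a minimal sketch of the mapping above, the following Python snippet prepares the GET request and pulls the three values out of one application record. HTTP Basic authentication, the example portal URL, and the sample record are assumptions made for illustration; this guide only states the `id`, `name`, and `uniqueCode` field names. The request is built but not sent.

```python
import base64
import urllib.request


def build_systems_request(portal_url, username, password):
    """Prepare GET /api/data/dic/systems (assumes HTTP Basic auth)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        portal_url.rstrip("/") + "/api/data/dic/systems",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


def extract_app_details(application):
    """Map one application record to the placeholders used in this guide."""
    return {
        "APP_ID": application["id"],
        "APP_NAME": application["name"],
        "APP_CODE": application["uniqueCode"],
    }


# Hypothetical record; a real response may nest applications differently.
sample_application = {"id": 12, "name": "Redshift-Discovery", "uniqueCode": "redshift_discovery"}

request = build_systems_request("https://portal.example.com", "USERNAME", "PASSWORD")
details = extract_app_details(sample_application)
print(request.full_url)
print(details)
```

To send the request, pass it to `urllib.request.urlopen` and parse the body with `json.load`.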
Additional parameters reference
The following table contains all additional parameters used across different API endpoints in this guide:
Additional Parameters | Description |
---|---|
size | Number of results to return per page when the response is paginated. Default value is 15; change it as required. |
serviceId | GCP project ID that contains the resource to be scanned |
databaseName | Name of the database that contains the resource to be scanned |
tableName | Specific table name, or a wildcard to match all tables in the specified database |
globalId | Global identifier for the scan configuration. When querying, an optional global ID that filters results to a specific scan request |
groupPartFiles | Whether to group partition files (true/false). When scanning a single resource that contains multiple partition files, set this to true to group and view them together as a single entity, or false to see each partition file individually |
groupTables | Whether to group tables (true/false). For database resources with multiple tables, set this to true to group related tables together, or false to view each table separately |
searchType | How the resource should match your search term. Use partial_match to find results that contain your resource, or exact_match to find results that exactly match the scanned resource |
autoClassified | Set to 1 to include resources with tags automatically classified by the system, or 0 to exclude them |
manualReviewed | Set to 1 to include resources with tags that have been manually reviewed by a user, or 0 to exclude them |
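The optional parameters in the table can be combined into a single query string. The sketch below is one way to do that in Python; the helper name `build_query` and the chosen values are illustrative only, and parameters set to `None` are simply left out.

```python
from urllib.parse import urlencode


def build_query(**params):
    """Encode only the parameters that were actually provided."""
    provided = {key: value for key, value in params.items() if value is not None}
    return urlencode(provided)


query = build_query(
    size=50,
    searchType="exact_match",
    autoClassified=1,
    manualReviewed=0,
    globalId=None,  # omitted from the query string
)
print(query)  # size=50&searchType=exact_match&autoClassified=1&manualReviewed=0
```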
Discovery API endpoints and examples
Important
Replace the placeholder values with your actual configuration:
PORTAL_URL: Privacera portal URL.
USERNAME: Privacera portal login username.
PASSWORD: Privacera portal login password.
APP_ID: The application ID (see Get application ID, code and name under Prerequisites).
APP_NAME: The application name (see Get application ID, code and name under Prerequisites).
APP_CODE: The application code (see Get application ID, code and name under Prerequisites).
RESOURCE_PATH: The resource path to scan. (You can follow the steps outlined here to add a resource for discovery scanning.)
Tip
Example of URL encoding:
- For a resource path like `s3://my-bucket/data/file.csv`, the URL-encoded value would be `s3%3A%2F%2Fmy-bucket%2Fdata%2Ffile.csv`.
- Use this encoded value in API requests where the resource path is required in the URL.
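If you build requests from a script, the encoding can be done with Python's standard library. This short example reproduces the value from the tip; passing `safe=""` makes `quote` encode the `/` characters as well, which it would otherwise leave untouched.

```python
from urllib.parse import quote, unquote

resource_path = "s3://my-bucket/data/file.csv"

# safe="" forces "/" and ":" to be percent-encoded too.
encoded = quote(resource_path, safe="")
print(encoded)  # s3%3A%2F%2Fmy-bucket%2Fdata%2Ffile.csv

# Decoding returns the original path unchanged.
decoded = unquote(encoded)
print(decoded)
```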
1. Configure resource for discovery scanning
Configure resources for discovery scanning by creating a landing zone. You can add filesystem or database resources for scanning using the methods described here. The request parameters will vary based on the connector type.
Note
Additional parameters based on resource type:
- GCS & GBQ resources: `serviceId` parameter will be added in the request body
- Database resources: `databaseName` and `tableName` parameters will be added in the request body
Endpoint: POST /api/data/dic/application/config/bulk
Query parameters:
- `appId`: Application ID for the connector (see Get application ID, code and name under Prerequisites)
- `appCode`: Application code for the connector (see Get application ID, code and name under Prerequisites)
- `rescanEnabled`: Controls whether the system starts scanning immediately or only registers the resource for future scanning. Set to `false` when creating a landing zone configuration to register resources without immediate scanning. Set to `true` when starting an offline scan to initiate the discovery process on the configured resources.
- `configType`: Configuration type (L for landing zone, D for datazone)
- `type`: Operation type when adding the resource to the landing zone (I for include, E for exclude)
- `resource`: Resource path to be scanned in the form mentioned here
Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
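The parameter values for this step can be assembled as follows. This is a sketch under assumptions: the helper name, the example IDs, and the `gs://` path are hypothetical, and how the fields are placed in the request (query string or body) is not shown in this guide, so the code only builds the values.

```python
import json


def build_landing_zone_config(app_id, app_code, resource, extra=None):
    """Collect the step 1 parameters; rescanEnabled stays False to register only."""
    config = {
        "appId": app_id,
        "appCode": app_code,
        "rescanEnabled": False,  # register the resource without scanning it yet
        "configType": "L",       # L = landing zone
        "type": "I",             # I = include
        "resource": resource,
    }
    # Connector-specific fields such as serviceId or databaseName/tableName.
    config.update(extra or {})
    return config


s3_config = build_landing_zone_config(12, "APP_CODE", "s3://my-bucket/data/file.csv")
# Hypothetical GCS example: serviceId carries the GCP project ID.
gcs_config = build_landing_zone_config(
    34, "APP_CODE", "gs://my-bucket/data/", extra={"serviceId": "my-gcp-project"}
)
print(json.dumps(s3_config, indent=2))
print(json.dumps(gcs_config, indent=2))
```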
Response:
Tip
The `globalId` value returned in this response is required in the subsequent steps.
2. Start offline scan
Initiate an offline scan for the configured resources. This endpoint triggers the discovery scanning process for the resources that were previously configured.
Note
Additional parameters based on resource type:
- GCS & GBQ resources: `serviceId` parameter will be added in the request body
- Database resources: `databaseName` and `tableName` parameters will be added in the request body
Endpoint: POST /api/data/dic/application/config/bulk
Query parameters:
- `globalId`: Global identifier for the scan configuration
- `appId`: Application ID for the connector (see Get application ID, code and name under Prerequisites)
- `appCode`: Application code for the connector (see Get application ID, code and name under Prerequisites)
- `type`: Operation type (I for include, E for exclude)
- `resource`: Resource path to be scanned in the form mentioned here
- `rescanEnabled`: Controls whether the system starts scanning immediately or only registers the resource for future scanning. Set to `false` when creating a landing zone configuration to register resources without immediate scanning. Set to `true` when starting an offline scan to initiate the discovery process on the configured resources.
- `configType`: Configuration type (L for landing zone, D for datazone)
- `rescanType`: Type of rescan operation (RESCAN)
- `active`: Whether the configuration is active (true/false)
Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
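Compared with step 1, the start request adds `globalId`, `rescanType`, and `active`, and flips `rescanEnabled` to `true`. The sketch below assembles those values; the helper name and sample values are hypothetical, and the way the fields are sent is an assumption left to the caller.

```python
def build_start_scan_config(global_id, app_id, app_code, resource, extra=None):
    """Collect the step 2 parameters that trigger an offline scan."""
    config = {
        "globalId": global_id,    # taken from the step 1 response
        "appId": app_id,
        "appCode": app_code,
        "type": "I",              # I = include
        "resource": resource,
        "rescanEnabled": True,    # True starts the scan
        "configType": "L",        # L = landing zone
        "rescanType": "RESCAN",
        "active": True,
    }
    # Connector-specific fields such as serviceId or databaseName/tableName.
    config.update(extra or {})
    return config


start_config = build_start_scan_config(
    "GLOBAL_ID", 12, "APP_CODE", "s3://my-bucket/data/file.csv"
)
print(start_config)
```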
Response:
3. Monitor scan requests
Monitor scan request execution to track offline scan progress either for all scan requests or for a single scan request using the scan ID search filter.
Endpoint: GET /api/discovery/scan_request
- For querying all scan requests:
- For querying a single scan request:
Query parameters:
- `sort`: Sort criteria (e.g., createTime,DESC)
- `type`: Scan type (OFFLINE)
Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
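A monitoring URL can be put together like this. The sketch assumes that a single scan request is selected with the `globalId` parameter from the additional parameters table; the function name and example portal URL are illustrative.

```python
from urllib.parse import urlencode


def scan_request_url(portal_url, global_id=None, size=None):
    """Build the GET /api/discovery/scan_request URL for offline scans."""
    params = {"sort": "createTime,DESC", "type": "OFFLINE"}
    if global_id is not None:
        params["globalId"] = global_id  # assumed filter for one scan request
    if size is not None:
        params["size"] = size
    return f"{portal_url.rstrip('/')}/api/discovery/scan_request?{urlencode(params)}"


all_scans = scan_request_url("https://portal.example.com")
one_scan = scan_request_url("https://portal.example.com", global_id="GLOBAL_ID")
print(all_scans)
print(one_scan)
```

Note that `urlencode` percent-encodes the comma in the sort value as `%2C`.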
Response:
4. Retrieve scan results
Retrieve detailed resource information from completed scans.
Endpoint: GET /api/discovery/resourceinfo
You can retrieve scan results in two different ways:
1. Navigate by scan ID
Use this approach when you want to see all resources discovered in a specific scan.
2. Navigate by resource
Use this approach when you want to see scan results for a specific resource path.
Query parameters:
- `resource`: The path of the resource you want to filter results by. Make sure to URL-encode this value (for example, use `s3%3A%2F%2Fmy-bucket%2Fdata.csv` for `s3://my-bucket/data.csv`). This helps you see scan results only for a specific file, folder, or database table.
- `scanGlobalId`: The unique ID of the scan you want results for. Use the `scanGlobalId` value returned from the scan status or scan list API to filter results to a specific scan.
Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
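Both navigation styles differ only in which query parameter is supplied. The following sketch builds either URL and percent-encodes the resource path; the function name and example portal URL are hypothetical.

```python
from urllib.parse import quote, urlencode


def resourceinfo_url(portal_url, resource=None, scan_global_id=None, **extra):
    """Build a GET /api/discovery/resourceinfo URL filtered by resource or scan."""
    params = {}
    if resource is not None:
        params["resource"] = resource
    if scan_global_id is not None:
        params["scanGlobalId"] = scan_global_id
    params.update(extra)  # e.g. searchType, size
    # quote_via=quote with safe="" encodes "/" and ":" in the resource path.
    query = urlencode(params, quote_via=quote, safe="")
    return f"{portal_url.rstrip('/')}/api/discovery/resourceinfo?{query}"


by_scan = resourceinfo_url("https://portal.example.com", scan_global_id="SCAN_GLOBAL_ID")
by_resource = resourceinfo_url(
    "https://portal.example.com",
    resource="s3://my-bucket/data.csv",
    searchType="exact_match",
)
print(by_scan)
print(by_resource)
```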
Response:
5. Retrieve classification by application name
Retrieve classification information for resources filtered by application name.
Endpoint: GET /api/discovery/resourceinfo
Query parameters:
- `appNames`: Application name to filter results by. URL-encode the name if it contains spaces or special characters: for example, `Redshift-Discovery` can be used as is, while `Redshift Discovery` becomes `Redshift%20Discovery`. You can find your application name by following the steps in the prerequisites section.
Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
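The encoding rule for `appNames` can be checked with a few lines of Python. The function name and portal URL below are illustrative; only the endpoint path and the `appNames` parameter come from this guide.

```python
from urllib.parse import quote


def app_classification_url(portal_url, app_name):
    """Build the resourceinfo URL filtered by application name."""
    encoded_name = quote(app_name, safe="")  # spaces become %20
    return f"{portal_url.rstrip('/')}/api/discovery/resourceinfo?appNames={encoded_name}"


plain = app_classification_url("https://portal.example.com", "Redshift-Discovery")
spaced = app_classification_url("https://portal.example.com", "Redshift Discovery")
print(plain)
print(spaced)
```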
Response: