Discovery Scan API User Guide¶

Getting started¶

This guide shows you how to use Privacera Discovery APIs for offline scanning. You'll learn to configure resources, check scan status, and retrieve classification results using simple REST API calls.

The Discovery service scans your resources from different connectors and classifies sensitive information like PII, financial data, and other protected content.

Offline Scan API overview¶

To perform an offline scan using the Privacera Discovery API, the process typically involves the following steps:

Configure resource for discovery scanning - Register the resources to scan by creating a landing zone configuration
Start offline scan - Initiate offline scan for the configured resources
Monitor scan requests - Query scan requests to check scan status and progress
Retrieve scan results - Retrieve resource classification information from the completed scans
Retrieve classification by application name - Retrieve classification information filtered by application name

API endpoints¶

Step	Endpoint	Method	Description
Prerequisites	`/api/data/dic/systems`	GET	Get application ID, code, and name from the system
1	`/api/data/dic/application/config/bulk`	POST	Add resources to be scanned by creating landing zone configuration
2	`/api/data/dic/application/config/bulk`	POST	Start the actual scanning process for configured resources
3	`/api/discovery/scan_request`	GET	Check the status and progress of running or completed scans
4	`/api/discovery/resourceinfo`	GET	Get detailed results from completed scans including classified tags
5	`/api/discovery/resourceinfo`	GET	Filter scan results by specific application name

Prerequisites¶

Before configuring resources for discovery scanning, you need to retrieve the application ID, code and name from the system.

Get application ID, code and name¶

Retrieve the list of systems and the applications set up under each one. The application details in the response are used in the subsequent scanning steps.

Note

Replace the placeholder values with your actual configuration:

PORTAL_URL: Privacera portal URL.

USERNAME: Privacera portal login username.

PASSWORD: Privacera portal login password.

Endpoint: GET /api/data/dic/systems

Bash
curl -X GET "https://<PORTAL_URL>/api/data/dic/systems?size=15" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

Query parameters:

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
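If you call the API from a script instead of curl, the same request can be sketched in Python. This is a minimal illustration that assumes the portal accepts HTTP Basic authentication (which is what curl's -u option sends by default). The function name build_systems_request and the host portal.example.com are hypothetical placeholders, not part of the product.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_systems_request(portal_url, username, password, size=15):
    """Build (but do not send) the GET /api/data/dic/systems request."""
    # curl -u "USERNAME:PASSWORD" sends this Basic credential by default.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = f"https://{portal_url}/api/data/dic/systems?{urlencode({'size': size})}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical host and credentials, for illustration only.
request = build_systems_request("portal.example.com", "USERNAME", "PASSWORD")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request; the sketch stops before that so it can be run without a portal.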

Response:

JSON
{
  "content": [
    {
      "id": 2,
      "name": "aws",
      "applications": [
        {
          "id": 3,
          "updatedByUserName": "padmin",
          "name": "AWS_S3",
          "description": "Created by System",
          "systemId": 2,
          "properties": "",
          "enableDiscovery": true,
          "reDiscovery": false,
          "applicationProperties": [],
          "type": "aws_s3",
          "uniqueCode": "aws_s3",
          "topicName": "privacera_scan_worker_aws_s3_XXXX",
          "active": true
        }
      ],
      "active": true
    }
  ]
}

Use the id field of the application entry (not the system-level id) as APP_ID.
Use the application's name field as APP_NAME.
Use the application's uniqueCode field as APP_CODE in the subsequent API calls.
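The field mapping above can be sketched as a small Python helper that walks a parsed systems response and collects the three values for every application. The function name extract_applications is hypothetical, and the sample data is a trimmed copy of the response shown in this section.

```python
def extract_applications(systems_response):
    """Return APP_ID, APP_NAME and APP_CODE for each application in the response."""
    applications = []
    for system in systems_response.get("content", []):
        for app in system.get("applications", []):
            applications.append(
                {
                    "APP_ID": app["id"],            # application id, not the system id
                    "APP_NAME": app["name"],
                    "APP_CODE": app["uniqueCode"],
                }
            )
    return applications


# Trimmed copy of the sample response above.
sample_systems = {
    "content": [
        {
            "id": 2,
            "name": "aws",
            "applications": [
                {"id": 3, "name": "AWS_S3", "systemId": 2, "uniqueCode": "aws_s3"}
            ],
        }
    ]
}

print(extract_applications(sample_systems))
```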

Additional parameters reference¶

The following table contains all additional parameters used across different API endpoints in this guide:

Additional Parameters	Description
`size`	Number of results to return per page when the response is paginated. The default is 15; adjust as needed.
`serviceId`	GCP project ID that contains the resource to be scanned
`databaseName`	Name of the database that contains the resource to be scanned
`tableName`	Specific table name, or a wildcard to include all tables in the specified database
`globalId`	Global identifier of the scan configuration. Optional in queries, where it filters results to a specific scan request
`groupPartFiles`	Whether to group partition files (true/false). When scanning a single resource that contains multiple partition files, set this to `true` to group and view them together as a single entity, or `false` to see each partition file individually
`groupTables`	Whether to group tables (true/false). For database resources with multiple tables, set this to `true` to group related tables together, or `false` to view each table separately
`searchType`	How the resource value is matched against scanned resources. Use `partial_match` to return results whose path contains the value, or `exact_match` to return only results whose path matches the value exactly
`autoClassified`	Set to `1` to include resources with tags automatically classified by the system, or `0` to exclude them
`manualReviewed`	Set to `1` to include resources with tags that have been manually reviewed by a user, or `0` to exclude them

Discovery API endpoints and examples¶

Important

Replace the placeholder values with your actual configuration:

PORTAL_URL: Privacera portal URL

USERNAME: Privacera portal login username.

PASSWORD: Privacera portal login password

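APP_ID: The application ID (see Get application ID, code and name in the Prerequisites section)

APP_NAME: The application name (see Get application ID, code and name in the Prerequisites section)

APP_CODE: The application code (see Get application ID, code and name in the Prerequisites section)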

RESOURCE_PATH: The resource path to scan. (You can follow the steps outlined here to add a resource for discovery scanning.)

Tip

Example of URL encoding:

For a resource path like s3://my-bucket/data/file.csv, the URL-encoded value would be s3%3A%2F%2Fmy-bucket%2Fdata%2Ffile.csv.
Use this encoded value in API requests where the resource path is required in the URL.
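If you build request URLs in a script, the encoding in this tip can be reproduced with Python's standard library. This is a minimal sketch; the helper name encode_resource_path is hypothetical. Passing safe="" makes quote encode the slashes as well, which it would otherwise leave untouched.

```python
from urllib.parse import quote


def encode_resource_path(resource_path):
    """Percent-encode a resource path for use as a URL query value."""
    # safe="" forces "/" and ":" to be encoded too.
    return quote(resource_path, safe="")


print(encode_resource_path("s3://my-bucket/data/file.csv"))
# prints s3%3A%2F%2Fmy-bucket%2Fdata%2Ffile.csv
```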

1. Configure resource for discovery scanning¶

Configure resources for discovery scanning by creating a landing zone. You can add filesystem or database resources for scanning using the methods described here. The request parameters will vary based on the connector type.

Note

Additional parameters based on resource type:

GCS & GBQ resources: add the serviceId parameter to the request body
Database resources: add the databaseName and tableName parameters to the request body

Endpoint: POST /api/data/dic/application/config/bulk

Bash
curl -X POST "https://<PORTAL_URL>/api/data/dic/application/config/bulk" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -u "USERNAME:PASSWORD" \
  -d '[
    {
      "appId": "APP_ID",
      "rescanEnabled": false,
      "configType": "L",
      "type": "I",
      "resource": "RESOURCE_PATH"
    }
  ]'

Request body parameters:

appId: Application ID for the connector (see the Prerequisites section)
appCode: Application code for the connector (see the Prerequisites section)
rescanEnabled: Controls whether scanning starts. Set to false when creating a landing zone configuration to register resources without scanning them immediately. Set to true when starting an offline scan to initiate discovery on the configured resources.
configType: Configuration type (L for landing zone, D for datazone)
type: Operation type (I for include, E for exclude) used when adding the resource to the landing zone
resource: Resource path to be scanned in the form mentioned here

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
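The request body for this step can also be assembled programmatically. The sketch below is only an illustration: the function name build_landing_zone_config is hypothetical, the fixed values (rescanEnabled false, configType L, type I) are copied from the curl example above, and the optional fields follow the note on GCS, GBQ and database resources. The appId value is passed through unchanged, so supply it in whatever form your portal expects.

```python
import json


def build_landing_zone_config(app_id, resource, service_id=None,
                              database_name=None, table_name=None):
    """Build the JSON array sent to POST /api/data/dic/application/config/bulk."""
    entry = {
        "appId": app_id,
        "rescanEnabled": False,   # register only, do not scan yet
        "configType": "L",        # landing zone
        "type": "I",              # include
        "resource": resource,
    }
    # Optional fields, added only for the resource types that need them.
    if service_id is not None:
        entry["serviceId"] = service_id
    if database_name is not None:
        entry["databaseName"] = database_name
    if table_name is not None:
        entry["tableName"] = table_name
    return [entry]


payload = build_landing_zone_config("3", "s3://my-bucket/data/file.csv")
print(json.dumps(payload, indent=2))
```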

Response:

JSON
[
    {
        "status": 1,
        "globalId": "1752759544619_291_14",
        "appId": APP_ID,
        "appCode": "APP_CODE",
        "type": "I",
        "databaseName": null,
        "tableName": null,
        "resource": "RESOURCE_PATH",
        "rescanEnabled": true,
        "configType": "L",
        "rescanType": "RESCAN",
        "serviceId": "",
        "active": true
    }
]

Tip

Note the globalId value in this response. You need it to start the offline scan (step 2) and to filter scan requests and results (steps 3 and 4).

2. Start offline scan¶

Initiate an offline scan for the configured resources. This endpoint triggers the discovery scanning process for the resources that were previously configured.

Note

Additional parameters based on resource type:

GCS & GBQ resources: add the serviceId parameter to the request body
Database resources: add the databaseName and tableName parameters to the request body

Endpoint: POST /api/data/dic/application/config/bulk

Bash
curl -X POST "https://<PORTAL_URL>/api/data/dic/application/config/bulk" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -u "USERNAME:PASSWORD" \
  -d '[
    {
        "globalId": "1752759544619_291_14",
        "appId": APP_ID,
        "appCode": "APP_CODE",
        "type": "I",
        "databaseName": null,
        "tableName": null,
        "resource": "RESOURCE_PATH",
        "rescanEnabled": true,
        "configType": "L",
        "rescanType": "RESCAN",
        "serviceId": "",
        "active": true
    }
]'

Request body parameters:

globalId: Global identifier for the scan configuration, as returned in the response to step 1
appId: Application ID for the connector (see the Prerequisites section)
appCode: Application code for the connector (see the Prerequisites section)
type: Operation type (I for include, E for exclude)
resource: Resource path to be scanned in the form mentioned here
rescanEnabled: Controls whether scanning starts. Set to false when creating a landing zone configuration to register resources without scanning them immediately. Set to true when starting an offline scan to initiate discovery on the configured resources.
configType: Configuration type (L for landing zone, D for datazone)
rescanType: Type of rescan operation (RESCAN)
active: Whether the configuration is active (true/false)

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
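Because the body of this request mirrors the response from step 1, one way to build it is to copy the relevant fields from that response and force rescanEnabled to true. The sketch below assumes this field-for-field correspondence, which is inferred from the two examples in this guide rather than from a formal schema. The function name build_start_scan_body is hypothetical.

```python
# Fields that appear in the step 2 request example.
START_SCAN_FIELDS = (
    "globalId", "appId", "appCode", "type", "databaseName", "tableName",
    "resource", "rescanEnabled", "configType", "rescanType", "serviceId",
    "active",
)


def build_start_scan_body(config_response):
    """Turn the step 1 response (a list) into the step 2 request body."""
    body = []
    for entry in config_response:
        # Copy only the fields used in the request; "status" is dropped.
        item = {field: entry.get(field) for field in START_SCAN_FIELDS}
        item["rescanEnabled"] = True  # true starts the scan
        body.append(item)
    return body


# Trimmed copy of the step 1 sample response, with example values filled in.
step1_response = [
    {
        "status": 1,
        "globalId": "1752759544619_291_14",
        "appId": 3,
        "appCode": "aws_s3",
        "type": "I",
        "databaseName": None,
        "tableName": None,
        "resource": "s3://my-bucket/data/file.csv",
        "rescanEnabled": False,
        "configType": "L",
        "rescanType": "RESCAN",
        "serviceId": "",
        "active": True,
    }
]

print(build_start_scan_body(step1_response))
```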

Response:

JSON
{
        "status": 1,
        "globalId": "1752759544619_291_14",
        "appId": APP_ID,
        "appCode": "APP_CODE",
        "type": "I",
        "databaseName": null,
        "tableName": null,
        "resource": "RESOURCE_PATH",
        "rescanEnabled": true,
        "configType": "L",
        "rescanType": "RESCAN",
        "serviceId": "",
        "active": true
}

3. Monitor scan requests¶

Monitor scan request execution to track the progress of offline scans. You can query all scan requests, or a single scan request by adding the globalId filter.

Endpoint: GET /api/discovery/scan_request

For querying all scan requests:

Bash
curl -X GET "https://<PORTAL_URL>/api/discovery/scan_request?size=15&sort=createTime,DESC&type=OFFLINE" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

For querying a single scan request:

Bash
curl -X GET "https://<PORTAL_URL>/api/discovery/scan_request?size=15&sort=createTime,DESC&type=OFFLINE&globalId=1752759544619_291_14" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

Query parameters:

sort: Sort criteria (e.g., createTime,DESC)
type: Scan type (OFFLINE)

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.

Response:

JSON
{
    "content": [
        {
            "globalId": "1752759544619_291_14",
            "updatedByUserName": "privacera_service_discovery",
            "resources": "RESOURCE_PATH",
            "appId": APP_ID,
            "appCode": "APP_CODE",
            "scanStatus": "SUCCESS",
            "configId": 641,
            "startTime": 1752820114000,
            "endTime": 1752820179000,
            "percentageComplete": 100,
            "scheduleId": null,
            "summaryInfo":"",
            "type": "OFFLINE",
            "scanInfo": [
                {
                    "globalId": "1752820114405_440_2",
                    "scanId": "1752820101451_465_1",
                    "startTime": 1752820114000,
                    "endTime": 1752820178000,
                    "scanTime": 64905,
                    "type": "SCAN_LISTING",
                    "active": true
                },
                {
                    "globalId": "1752820179008_253_3",
                    "scanId": "1752820101451_465_1",
                    "startTime": 1752820137000,
                    "endTime": 1752820160000,
                    "scanTime": 22485,
                    "type": "SCAN_TIME",
                    "active": true
                }
            ],
            "scanInfoJobStatus": {
                "percentageComplete": 100,
                "status": "SUCCESS"
            },
            "rescanType": "RESCAN",
            "active": true
        }
    ]
}
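To check progress from a script, you can reduce this response to the few fields that matter. The sketch below is illustrative only: the function name summarize_scan_requests is hypothetical, and the duration calculation assumes that startTime and endTime are epoch timestamps in milliseconds, which the sample values suggest but this guide does not state.

```python
def summarize_scan_requests(scan_response):
    """Return status, progress and duration for each scan request."""
    summaries = []
    for request in scan_response.get("content", []):
        start = request.get("startTime")
        end = request.get("endTime")
        # Assumption: timestamps are epoch milliseconds.
        duration = (end - start) / 1000 if start is not None and end is not None else None
        summaries.append(
            {
                "globalId": request["globalId"],
                "scanStatus": request["scanStatus"],
                "percentageComplete": request["percentageComplete"],
                "durationSeconds": duration,
            }
        )
    return summaries


# Trimmed copy of the sample response above.
sample_scan_response = {
    "content": [
        {
            "globalId": "1752759544619_291_14",
            "scanStatus": "SUCCESS",
            "startTime": 1752820114000,
            "endTime": 1752820179000,
            "percentageComplete": 100,
        }
    ]
}

for row in summarize_scan_requests(sample_scan_response):
    print(row)
```

SUCCESS is the only scanStatus value shown in this guide, so a polling loop built on this helper should treat any other value as not yet finished or failed until you have confirmed the possible values for your deployment.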

4. Retrieve scan results¶

Retrieve detailed resource information from completed scans.

Endpoint: GET /api/discovery/resourceinfo

You can retrieve scan results in two different ways:

1. Navigate by scan ID

Use this approach when you want to see all resources discovered in a specific scan.

Bash
curl -X GET "https://<PORTAL_URL>/api/discovery/resourceinfo?size=15&groupPartFiles=false&groupTables=false&scanGlobalId=*1752759544619_291_14*&autoClassified=1&manualReviewed=1" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

2. Navigate by resource

Use this approach when you want to see scan results for a specific resource path.

Bash
curl -X GET "https://<PORTAL_URL>/api/discovery/resourceinfo?size=15&groupPartFiles=false&groupTables=false&resource=s3%3A%2F%2FRESOURCE_PATH&scanGlobalId=*1752759544619_291_14*&searchType=partial_match&autoClassified=1&manualReviewed=1" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

Query parameters:

resource: The path of the resource you want to filter results by. Make sure to URL-encode this value (for example, use s3%3A%2F%2Fmy-bucket%2Fdata.csv for s3://my-bucket/data.csv). This helps you see scan results only for a specific file, folder, or database table.
scanGlobalId: The unique ID of the scan you want results for. Use the globalId value returned when you configured the resource (step 1) or listed in the scan request response (step 3). The examples above wrap the ID in asterisks (*).

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
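The two query styles above differ only in whether resource and searchType are present, so the URL can be assembled by one helper. This is a sketch under the assumption that the parameter set in the curl examples is sufficient. The function name build_resourceinfo_url and the host portal.example.com are hypothetical. The asterisks around the scan ID are kept literal to match the examples.

```python
from urllib.parse import quote, urlencode


def build_resourceinfo_url(portal_url, scan_global_id, resource=None, size=15):
    """Build the GET /api/discovery/resourceinfo URL for a scan, optionally per resource."""
    params = {"size": size, "groupPartFiles": "false", "groupTables": "false"}
    if resource is not None:
        params["resource"] = resource            # encoded by urlencode below
    params["scanGlobalId"] = f"*{scan_global_id}*"
    if resource is not None:
        params["searchType"] = "partial_match"
    params["autoClassified"] = 1
    params["manualReviewed"] = 1
    # quote (not quote_plus) gives %20-style encoding; "*" is left literal.
    query = urlencode(params, quote_via=quote, safe="*")
    return f"https://{portal_url}/api/discovery/resourceinfo?{query}"


print(build_resourceinfo_url("portal.example.com", "1752759544619_291_14"))
print(build_resourceinfo_url("portal.example.com", "1752759544619_291_14",
                             resource="s3://my-bucket/data.csv"))
```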

Response:

JSON
{
  "content": [
    {
      "application": "APP_NAME",
      "appCode": "APP_CODE",
      "appType": "APP_NAME",
      "resourceId": "APP_CODE@RESOURCE_PATH-9999",
      "resource": "RESOURCE_PATH",
      "dataZone": "",
      "dataZoneIds": "",
      "tagsInfo": [
        {
          "tag": "EMAIL",
          "snippets": [
            {
              "resource": "RESOURCE_PATH/email",
              "metaName": "email",
              "score": 90.9090909090909,
              "position": 2,
              "fieldType": "string",
              "tagReason": "TAG_REASON_PATTERN_MATCH_CONTENT",
              "sampleValues": [],
              "tagName": "EMAIL",
              "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED",
              "tagType": "TAG_TYPE_CONTENT_TAG",
              "levelType": "FIELD"
            }
          ]
        },
        {
          "tag": "SSN",
          "snippets": [
            {
              "resource": "RESOURCE_PATH/ssn",
              "metaName": "ssn",
              "score": 90.9090909090909,
              "position": 3,
              "fieldType": "string",
              "tagReason": "TAG_REASON_ML_MODEL",
              "sampleValues": [],
              "tagName": "SSN",
              "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED",
              "tagType": "TAG_TYPE_CONTENT_TAG",
              "levelType": "FIELD"
            }
          ]
        }
      ],
      "resourceMetaInfo": {
        "dataType": "text/csv",
        "resourceType": "FILE",
        "path": "RESOURCE_PATH"
      },
      "superDataType": "STRUCTURED_DATA",
      "recordCount": 11,
      "scanGlobalId": "1752759544619_291_14"
    }
  ]
}
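The classification details sit three levels deep in this response (content, tagsInfo, snippets). A short helper can flatten them into one row per tagged field. The function name extract_classifications is hypothetical, and the sample data is a trimmed copy of the response above.

```python
def extract_classifications(resourceinfo_response):
    """Flatten tagsInfo into (field resource, tag, score, tag status) rows."""
    rows = []
    for item in resourceinfo_response.get("content", []):
        for tag_info in item.get("tagsInfo", []):
            for snippet in tag_info.get("snippets", []):
                rows.append(
                    (
                        snippet["resource"],
                        tag_info["tag"],
                        round(snippet["score"], 2),
                        snippet["tagStatus"],
                    )
                )
    return rows


# Trimmed copy of the sample response above.
sample_resourceinfo = {
    "content": [
        {
            "resource": "RESOURCE_PATH",
            "tagsInfo": [
                {
                    "tag": "EMAIL",
                    "snippets": [
                        {
                            "resource": "RESOURCE_PATH/email",
                            "score": 90.9090909090909,
                            "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED",
                        }
                    ],
                },
                {
                    "tag": "SSN",
                    "snippets": [
                        {
                            "resource": "RESOURCE_PATH/ssn",
                            "score": 90.9090909090909,
                            "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED",
                        }
                    ],
                },
            ],
        }
    ]
}

for row in extract_classifications(sample_resourceinfo):
    print(row)
```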

5. Retrieve classification by application name¶

Retrieve classification information for resources filtered by application name.

Endpoint: GET /api/discovery/resourceinfo

Bash
curl -X GET "https://<PORTAL_URL>/api/discovery/resourceinfo?size=15&groupPartFiles=false&groupTables=false&autoClassified=1&manualReviewed=1&appNames=%22APP_NAME%22" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

Query parameters:

appNames: Filters results by application name. The example above wraps the name in URL-encoded double quotes (%22). URL-encode the name if it contains spaces or special characters: Redshift-Discovery can be used as is, whereas Redshift Discovery becomes Redshift%20Discovery. You can find your application name by following the steps in the Prerequisites section.

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
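The appNames value in the curl example is the application name wrapped in double quotes and then URL-encoded. A minimal sketch of that transformation follows; the helper name encode_app_name is hypothetical, and wrapping the name in quotes simply mirrors the example rather than a documented requirement.

```python
from urllib.parse import quote


def encode_app_name(app_name):
    """Wrap the application name in double quotes and percent-encode it."""
    return quote(f'"{app_name}"', safe="")


print(encode_app_name("Redshift-Discovery"))  # prints %22Redshift-Discovery%22
print(encode_app_name("Redshift Discovery"))  # prints %22Redshift%20Discovery%22
```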

Response:

JSON
{
  "content": [
    {
      "application": "APP_NAME",
      "appCode": "APP_CODE",
      "appKind": "APP_KIND_FILESYSTEM",
      "appType": "APP_NAME",
      "resourceId": "APP_CODE@RESOURCE_PATH-9999",
      "resource": "RESOURCE_PATH",
      "dataZone": "",
      "dataZoneIds": "",
      "tagsInfo": [
        {
          "tag": "PERSON_NAME",
          "snippets": [
            {
              "resource": "RESOURCE_PATH/first_name",
              "metaName": "first_name",
              "score": 90.9090909090909,
              "position": 0,
              "fieldType": "string",
              "key": "PERSON_NAME_LOOKUP",
              "tagReason": "TAG_REASON_LOOKUP",
              "sampleValues": [],
              "tagLocations": [],
              "tagName": "PERSON_NAME",
              "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED",
              "tagType": "TAG_TYPE_CONTENT_TAG",
              "deleted": false,
              "actualResource": "RESOURCE_PATH/first_name",
              "levelType": "FIELD",
              "tagAttributes": []
            }
          ]
        }
      ],
      "resourceMetaInfo": {
        "dataType": "text/csv",
        "resourceType": "FILE",
        "fieldMetaInfos": [
          {
            "dataType": "string",
            "resourceType": "FIELD",
            "resource": "RESOURCE_PATH/first_name",
            "metaName": "first_name"
          }
        ]
      },
      "superDataType": "STRUCTURED_DATA",
      "scanGlobalId": "1752759544619_291_14"
    }
  ]
}
