Discovery Scan API User Guide

Getting started

This guide shows you how to use Privacera Discovery APIs for offline scanning. You'll learn to configure resources, check scan status, and retrieve classification results using simple REST API calls.

The Discovery service scans your resources from different connectors and classifies sensitive information like PII, financial data, and other protected content.

Offline Scan API overview

An offline scan using the Privacera Discovery API typically involves the following steps:

  1. Configure resource for discovery scanning - Register the resources to be scanned by creating a landing zone configuration
  2. Start offline scan - Initiate offline scan for the configured resources
  3. Monitor scan requests - Query scan requests to check scan status and progress
  4. Retrieve scan results - Retrieve resource classification information from the completed scans
  5. Retrieve classification by application name - Retrieve classification information filtered by application name

API endpoints

| Step | Endpoint | Method | Description |
| --- | --- | --- | --- |
| Prerequisites | /api/data/dic/systems | GET | Get application ID, code, and name from the system |
| 1 | /api/data/dic/application/config/bulk | POST | Add resources to be scanned by creating landing zone configuration |
| 2 | /api/data/dic/application/config/bulk | POST | Start the actual scanning process for configured resources |
| 3 | /api/discovery/scan_request | GET | Check the status and progress of running or completed scans |
| 4 | /api/discovery/resourceinfo | GET | Get detailed results from completed scans including classified tags |
| 5 | /api/discovery/resourceinfo | GET | Filter scan results by specific application name |

Prerequisites

Before configuring resources for discovery scanning, you need to retrieve the application ID, code and name from the system.

Get application ID, code and name

Retrieve the list of systems and the applications configured under each one. The application details returned here are used in the subsequent scanning steps.

Note

Replace the placeholder values with your actual configuration:

PORTAL_URL: Privacera portal URL.

USERNAME: Privacera portal login username.

PASSWORD: Privacera portal login password.

Endpoint: GET /api/data/dic/systems

Bash
curl -X GET "https://<PORTAL_URL>/api/data/dic/systems?size=15" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

Query parameters:

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
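If you prefer to script this call rather than use curl, the sketch below builds the same request with Python's standard library. It only constructs the request and does not send it. The function name build_systems_request and the host portal.example.com are hypothetical and used for illustration; the Authorization header is the HTTP Basic form that `curl -u` produces.

```python
import base64
import urllib.request


def build_systems_request(portal_url: str, username: str, password: str,
                          size: int = 15) -> urllib.request.Request:
    """Build (but do not send) the GET request for /api/data/dic/systems."""
    # HTTP Basic auth is the base64 encoding of "username:password".
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = f"https://{portal_url}/api/data/dic/systems?size={size}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json", "Authorization": f"Basic {token}"},
    )


# Hypothetical host and placeholder credentials, for illustration only.
request = build_systems_request("portal.example.com", "USERNAME", "PASSWORD")
print(request.full_url)
# https://portal.example.com/api/data/dic/systems?size=15
```

To actually send it, pass the request to urllib.request.urlopen and parse the body with json.load.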

Response:

JSON
{
  "content": [
    {
      "id": 2,
      "name": "aws",
      "applications": [
        {
          "id": 3,
          "updatedByUserName": "padmin",
          "name": "AWS_S3",
          "description": "Created by System",
          "systemId": 2,
          "properties": "",
          "enableDiscovery": true,
          "reDiscovery": false,
          "applicationProperties": [],
          "type": "aws_s3",
          "uniqueCode": "aws_s3",
          "topicName": "privacera_scan_worker_aws_s3_XXXX",
          "active": true
        }
      ],
      "active": true
    }
  ]
}

  • Use the id field of the application entry (inside applications) as APP_ID
  • Use the name field of the application entry as APP_NAME
  • Use the uniqueCode field of the application entry as APP_CODE

Use these values in the subsequent API calls.
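As a rough illustration of pulling these values out of the response, the sketch below walks a trimmed copy of the sample response and collects one record per application. The helper name list_applications is hypothetical, and the sketch assumes every application entry carries id, name, and uniqueCode as in the sample.

```python
import json

# Trimmed copy of the sample response from /api/data/dic/systems.
sample_response = json.loads("""
{
  "content": [
    {
      "id": 2,
      "name": "aws",
      "applications": [
        {"id": 3, "name": "AWS_S3", "uniqueCode": "aws_s3", "enableDiscovery": true}
      ]
    }
  ]
}
""")


def list_applications(systems_response: dict) -> list:
    """Flatten every system's applications into APP_ID / APP_NAME / APP_CODE records."""
    apps = []
    for system in systems_response.get("content", []):
        for app in system.get("applications", []):
            apps.append({
                "APP_ID": app["id"],            # application id, not the system id
                "APP_NAME": app["name"],
                "APP_CODE": app["uniqueCode"],
            })
    return apps


print(list_applications(sample_response))
# [{'APP_ID': 3, 'APP_NAME': 'AWS_S3', 'APP_CODE': 'aws_s3'}]
```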

Additional parameters reference

The following table contains all additional parameters used across different API endpoints in this guide:

| Additional Parameters | Description |
| --- | --- |
| size | Number of results per page when the response is paginated. The default value is 15; change it as required. |
| serviceId | GCP project ID that contains the resource to be scanned |
| databaseName | Name of the database that contains the resource to be scanned |
| tableName | A specific table, or a wildcard to cover all tables in the named database |
| globalId | Global identifier of the scan configuration. On query endpoints it is optional and filters results to a specific scan request |
| groupPartFiles | Whether to group partition files (true/false). When scanning a single resource that contains multiple partition files, set this to true to group and view them together as a single entity, or false to see each partition file individually |
| groupTables | Whether to group tables (true/false). For database resources with multiple tables, set this to true to group related tables together, or false to view each table separately |
| searchType | How the resource should match your search term. Use partial_match to return results whose resource contains the search term, or exact_match to return only results whose resource matches it exactly |
| autoClassified | Set to 1 to include resources with tags automatically classified by the system, or 0 to exclude them |
| manualReviewed | Set to 1 to include resources with tags that have been manually reviewed by a user, or 0 to exclude them |

Discovery API endpoints and examples

Important

Replace the placeholder values with your actual configuration:

PORTAL_URL: Privacera portal URL.

USERNAME: Privacera portal login username.

PASSWORD: Privacera portal login password.

APP_ID: The application ID (see Get application ID, code and name under Prerequisites).

APP_NAME: The application name (see Get application ID, code and name under Prerequisites).

APP_CODE: The application code (see Get application ID, code and name under Prerequisites).

RESOURCE_PATH: The resource path to scan. (You can follow the steps outlined here to add a resource for discovery scanning.)

Tip

Example of URL encoding:

  • For a resource path like s3://my-bucket/data/file.csv, the URL-encoded value would be s3%3A%2F%2Fmy-bucket%2Fdata%2Ffile.csv.

  • Use this encoded value in API requests where the resource path is required in the URL.
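A minimal sketch of producing that encoded value with Python's standard library. The helper name encode_resource_path is hypothetical; passing safe="" tells urllib.parse.quote to percent-encode "/" as well, which it would otherwise leave untouched.

```python
from urllib.parse import quote, unquote


def encode_resource_path(resource_path: str) -> str:
    """Percent-encode a resource path for use as a URL query value."""
    # safe="" so that "/" and ":" are encoded too, matching the example above.
    return quote(resource_path, safe="")


encoded = encode_resource_path("s3://my-bucket/data/file.csv")
print(encoded)
# s3%3A%2F%2Fmy-bucket%2Fdata%2Ffile.csv

# Decoding gives back the original path.
print(unquote(encoded))
# s3://my-bucket/data/file.csv
```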

1. Configure resource for discovery scanning

Configure resources for discovery scanning by creating a landing zone. You can add filesystem or database resources for scanning using the methods described here. The request parameters will vary based on the connector type.

Note

Additional parameters based on resource type:

  • GCS & GBQ resources: add the serviceId parameter to the request body
  • Database resources: add the databaseName and tableName parameters to the request body

Endpoint: POST /api/data/dic/application/config/bulk

Bash
curl -X POST "https://<PORTAL_URL>/api/data/dic/application/config/bulk" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -u "USERNAME:PASSWORD" \
  -d '[
    {
      "appId": "APP_ID",
      "rescanEnabled": false,
      "configType": "L",
      "type": "I",
      "resource": "RESOURCE_PATH"
    }
  ]'

Request body parameters:

  • appId: Application ID for the connector (see Prerequisites)
  • appCode: Application code for the connector (see Prerequisites)
  • rescanEnabled: Controls whether scanning starts. Set to false when creating a landing zone configuration to register resources without scanning them. Set to true when starting an offline scan to initiate discovery on the configured resources.
  • configType: Configuration type (L for landing zone, D for datazone)
  • type: Operation type (I for include, E for exclude) when adding a resource to the landing zone
  • resource: Resource path to be scanned (RESOURCE_PATH)

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
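The request body can also be assembled programmatically. The sketch below is one way to do it, assuming the field names shown in the curl example; the helper name build_landing_zone_entry and the sample values are hypothetical. Optional fields (serviceId, databaseName, tableName) are added only when supplied, following the note on resource types above.

```python
import json
from typing import Optional


def build_landing_zone_entry(app_id, resource: str, *,
                             service_id: Optional[str] = None,
                             database_name: Optional[str] = None,
                             table_name: Optional[str] = None) -> dict:
    """Build one entry of the JSON array sent to /api/data/dic/application/config/bulk."""
    entry = {
        "appId": app_id,
        "rescanEnabled": False,  # register only; scanning is started in step 2
        "configType": "L",       # landing zone
        "type": "I",             # include
        "resource": resource,
    }
    # Connector-specific fields are included only when provided.
    if service_id is not None:
        entry["serviceId"] = service_id
    if database_name is not None:
        entry["databaseName"] = database_name
    if table_name is not None:
        entry["tableName"] = table_name
    return entry


# Hypothetical application ID and resource path.
payload = [build_landing_zone_entry(3, "s3://my-bucket/data/file.csv")]
print(json.dumps(payload, indent=2))
```

The endpoint expects a JSON array, so the entry is wrapped in a list even when only one resource is configured.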

Response:

JSON
[
    {
        "status": 1,
        "globalId": "1752759544619_291_14",
        "appId": APP_ID,
        "appCode": "APP_CODE",
        "type": "I",
        "databaseName": null,
        "tableName": null,
        "resource": "RESOURCE_PATH",
        "rescanEnabled": true,
        "configType": "L",
        "rescanType": "RESCAN",
        "serviceId": "",
        "active": true
    }
]

Tip

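Save the globalId value from this response. It is passed in the request body when starting the offline scan and used to filter scan requests and results in the later steps.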

2. Start offline scan

Initiate an offline scan for the configured resources. This endpoint triggers the discovery scanning process for the resources that were previously configured.

Note

Additional parameters based on resource type:

  • GCS & GBQ resources: add the serviceId parameter to the request body
  • Database resources: add the databaseName and tableName parameters to the request body

Endpoint: POST /api/data/dic/application/config/bulk

Bash
curl -X POST "https://<PORTAL_URL>/api/data/dic/application/config/bulk" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -u "USERNAME:PASSWORD" \
  -d '[
    {
        "globalId": "1752759544619_291_14",
        "appId": APP_ID,
        "appCode": "APP_CODE",
        "type": "I",
        "databaseName": null,
        "tableName": null,
        "resource": "RESOURCE_PATH",
        "rescanEnabled": true,
        "configType": "L",
        "rescanType": "RESCAN",
        "serviceId": "",
        "active": true
    }
]'

Request body parameters:

  • globalId: Global identifier for the scan configuration, as returned when the resource was configured
  • appId: Application ID for the connector (see Prerequisites)
  • appCode: Application code for the connector (see Prerequisites)
  • type: Operation type (I for include, E for exclude)
  • resource: Resource path to be scanned (RESOURCE_PATH)
  • rescanEnabled: Controls whether scanning starts. Set to true here to initiate discovery on the configured resources. Set to false only when registering a resource in the landing zone without scanning it.
  • configType: Configuration type (L for landing zone, D for datazone)
  • rescanType: Type of rescan operation (RESCAN)
  • active: Whether the configuration is active (true/false)

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
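In the examples in this guide, the request body for starting a scan has the same fields as an entry of the configuration response, minus status and with rescanEnabled set to true. The sketch below derives one from the other on that assumption; it is an observation from the examples rather than a documented contract, and the helper name build_start_scan_entry and the sample values are hypothetical.

```python
import json


def build_start_scan_entry(config_entry: dict) -> dict:
    """Turn a configuration response entry into a start-scan request entry."""
    # Copy everything except "status", which appears only in responses.
    entry = {key: value for key, value in config_entry.items() if key != "status"}
    entry["rescanEnabled"] = True              # true initiates the scan
    entry.setdefault("rescanType", "RESCAN")   # keep existing value if present
    entry.setdefault("active", True)
    return entry


# Hypothetical configuration response entry (appId and appCode are sample values).
configured = {
    "status": 1,
    "globalId": "1752759544619_291_14",
    "appId": 3,
    "appCode": "aws_s3",
    "type": "I",
    "databaseName": None,
    "tableName": None,
    "resource": "s3://my-bucket/data/file.csv",
    "rescanEnabled": False,
    "configType": "L",
    "serviceId": "",
}

start_payload = [build_start_scan_entry(configured)]
print(json.dumps(start_payload, indent=2))
```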

Response:

JSON
{
        "status": 1,
        "globalId": "1752759544619_291_14",
        "appId": APP_ID,
        "appCode": "APP_CODE",
        "type": "I",
        "databaseName": null,
        "tableName": null,
        "resource": "RESOURCE_PATH",
        "rescanEnabled": true,
        "configType": "L",
        "rescanType": "RESCAN",
        "serviceId": "",
        "active": true
}

3. Monitor scan requests

Monitor scan request execution to track offline scan progress, either for all scan requests or for a single scan request by filtering on its globalId.

Endpoint: GET /api/discovery/scan_request

  1. For querying all scan requests:

    Bash
    curl -X GET "https://<PORTAL_URL>/api/discovery/scan_request?size=15&sort=createTime,DESC&type=OFFLINE" \
    -H "Accept: application/json" \
    -u "USERNAME:PASSWORD"
    
  2. For querying a single scan request:

    Bash
    curl -X GET "https://<PORTAL_URL>/api/discovery/scan_request?size=15&sort=createTime,DESC&type=OFFLINE&globalId=1752759544619_291_14" \
    -H "Accept: application/json" \
    -u "USERNAME:PASSWORD"
    

Query parameters:

  • sort: Sort criteria (e.g., createTime,DESC)
  • type: Scan type (OFFLINE)

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
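The two query URLs differ only in the optional globalId filter, so they can be built from one helper. This is a sketch using Python's standard library; the function name scan_request_url and the host portal.example.com are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def scan_request_url(portal_url: str, global_id: Optional[str] = None,
                     size: int = 15) -> str:
    """Build the /api/discovery/scan_request URL, optionally filtered by globalId."""
    params = {"size": size, "sort": "createTime,DESC", "type": "OFFLINE"}
    if global_id is not None:
        params["globalId"] = global_id
    # safe="," keeps the comma in the sort value unescaped, as in the curl examples.
    query = urlencode(params, safe=",")
    return f"https://{portal_url}/api/discovery/scan_request?{query}"


print(scan_request_url("portal.example.com"))
# https://portal.example.com/api/discovery/scan_request?size=15&sort=createTime,DESC&type=OFFLINE

print(scan_request_url("portal.example.com", "1752759544619_291_14"))
# https://portal.example.com/api/discovery/scan_request?size=15&sort=createTime,DESC&type=OFFLINE&globalId=1752759544619_291_14
```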

Response:

JSON
{
    "content": [
        {
            "globalId": "1752759544619_291_14",
            "updatedByUserName": "privacera_service_discovery",
            "resources": "RESOURCE_PATH",
            "appId": APP_ID,
            "appCode": "APP_CODE",
            "scanStatus": "SUCCESS",
            "configId": 641,
            "startTime": 1752820114000,
            "endTime": 1752820179000,
            "percentageComplete": 100,
            "scheduleId": null,
            "summaryInfo": "",
            "type": "OFFLINE",
            "scanInfo": [
                {
                    "globalId": "1752820114405_440_2",
                    "scanId": "1752820101451_465_1",
                    "startTime": 1752820114000,
                    "endTime": 1752820178000,
                    "scanTime": 64905,
                    "type": "SCAN_LISTING",
                    "active": true
                },
                {
                    "globalId": "1752820179008_253_3",
                    "scanId": "1752820101451_465_1",
                    "startTime": 1752820137000,
                    "endTime": 1752820160000,
                    "scanTime": 22485,
                    "type": "SCAN_TIME",
                    "active": true
                }
            ],
            "scanInfoJobStatus": {
                "percentageComplete": 100,
                "status": "SUCCESS"
            },
            "rescanType": "RESCAN",
            "active": true
        }
    ]
}
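When polling this endpoint from a script, it can help to reduce each scan request to the few fields that indicate progress. The sketch below does that for one entry of content. It assumes startTime and endTime are epoch timestamps in milliseconds, which the sample values suggest but the response does not state; the helper name summarize_scan_request is hypothetical.

```python
from typing import Optional


def summarize_scan_request(scan_request: dict) -> dict:
    """Extract status, progress, and elapsed time from one scan request entry."""
    start_ms: Optional[int] = scan_request.get("startTime")
    end_ms: Optional[int] = scan_request.get("endTime")
    # Assumption: both timestamps are epoch milliseconds.
    duration = None
    if start_ms is not None and end_ms is not None:
        duration = (end_ms - start_ms) / 1000
    return {
        "globalId": scan_request.get("globalId"),
        "scanStatus": scan_request.get("scanStatus"),
        "percentageComplete": scan_request.get("percentageComplete"),
        "durationSeconds": duration,
    }


# Fields copied from the sample response above.
sample_entry = {
    "globalId": "1752759544619_291_14",
    "scanStatus": "SUCCESS",
    "startTime": 1752820114000,
    "endTime": 1752820179000,
    "percentageComplete": 100,
}

print(summarize_scan_request(sample_entry))
# {'globalId': '1752759544619_291_14', 'scanStatus': 'SUCCESS', 'percentageComplete': 100, 'durationSeconds': 65.0}
```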

4. Retrieve scan results

Retrieve detailed resource information from completed scans.

Endpoint: GET /api/discovery/resourceinfo

You can retrieve scan results in two different ways:

1. Navigate by scan ID

Use this approach when you want to see all resources discovered in a specific scan.

Bash
curl -X GET "https://<PORTAL_URL>/api/discovery/resourceinfo?size=15&groupPartFiles=false&groupTables=false&scanGlobalId=*1752759544619_291_14*&autoClassified=1&manualReviewed=1" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

2. Navigate by resource

Use this approach when you want to see scan results for a specific resource path.

Bash
curl -X GET "https://<PORTAL_URL>/api/discovery/resourceinfo?size=15&groupPartFiles=false&groupTables=false&resource=s3%3A%2F%2FRESOURCE_PATH&scanGlobalId=*1752759544619_291_14*&searchType=partial_match&autoClassified=1&manualReviewed=1" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

Query parameters:

  • resource: The path of the resource you want to filter results by. Make sure to URL-encode this value (for example, use s3%3A%2F%2Fmy-bucket%2Fdata.csv for s3://my-bucket/data.csv). This helps you see scan results only for a specific file, folder, or database table.
  • scanGlobalId: The unique ID of the scan you want results for. Use the globalId value returned when the resource was configured, or the globalId listed for the scan by the scan request API, to limit results to that scan.

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.

Response:

JSON
{
  "content": [
    {
      "application": "APP_NAME",
      "appCode": "APP_CODE",
      "appType": "APP_NAME",
      "resourceId": "APP_CODE@RESOURCE_PATH-9999",
      "resource": "RESOURCE_PATH",
      "dataZone": "",
      "dataZoneIds": "",
      "tagsInfo": [
        {
          "tag": "EMAIL",
          "snippets": [
            {
              "resource": "RESOURCE_PATH/email",
              "metaName": "email",
              "score": 90.9090909090909,
              "position": 2,
              "fieldType": "string",
              "tagReason": "TAG_REASON_PATTERN_MATCH_CONTENT",
              "sampleValues": [],
              "tagName": "EMAIL",
              "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED",
              "tagType": "TAG_TYPE_CONTENT_TAG",
              "levelType": "FIELD"
            }
          ]
        },
        {
          "tag": "SSN",
          "snippets": [
            {
              "resource": "RESOURCE_PATH/ssn",
              "metaName": "ssn",
              "score": 90.9090909090909,
              "position": 3,
              "fieldType": "string",
              "tagReason": "TAG_REASON_ML_MODEL",
              "sampleValues": [],
              "tagName": "SSN",
              "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED",
              "tagType": "TAG_TYPE_CONTENT_TAG",
              "levelType": "FIELD"
            }
          ]
        }
      ],
      "resourceMetaInfo": {
        "dataType": "text/csv",
        "resourceType": "FILE",
        "path": "RESOURCE_PATH"
      },
      "superDataType": "STRUCTURED_DATA",
      "recordCount": 11,
      "scanGlobalId": "1752759544619_291_14"
    }
  ]
}
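The classification details sit three levels deep (content, tagsInfo, snippets), so a small flattening step makes them easier to report on. The sketch below works on a trimmed copy of the sample response; the helper name flatten_tags and the choice of output columns are illustrative.

```python
import json

# Trimmed copy of the sample response above.
sample_results = json.loads("""
{
  "content": [
    {
      "resource": "RESOURCE_PATH",
      "tagsInfo": [
        {"tag": "EMAIL", "snippets": [
          {"metaName": "email", "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED"}]},
        {"tag": "SSN", "snippets": [
          {"metaName": "ssn", "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED"}]}
      ]
    }
  ]
}
""")


def flatten_tags(resourceinfo_response: dict) -> list:
    """Return one (resource, tag, field, tagStatus) tuple per snippet."""
    rows = []
    for item in resourceinfo_response.get("content", []):
        for tag_info in item.get("tagsInfo", []):
            for snippet in tag_info.get("snippets", []):
                rows.append((
                    item.get("resource"),
                    tag_info.get("tag"),
                    snippet.get("metaName"),
                    snippet.get("tagStatus"),
                ))
    return rows


for row in flatten_tags(sample_results):
    print(row)
# ('RESOURCE_PATH', 'EMAIL', 'email', 'TAG_STATUS_AUTO_CLASSIFIED')
# ('RESOURCE_PATH', 'SSN', 'ssn', 'TAG_STATUS_AUTO_CLASSIFIED')
```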

5. Retrieve classification by application name

Retrieve classification information for resources filtered by application name.

Endpoint: GET /api/discovery/resourceinfo

Bash
curl -X GET "https://<PORTAL_URL>/api/discovery/resourceinfo?size=15&groupPartFiles=false&groupTables=false&autoClassified=1&manualReviewed=1&appNames=%22APP_NAME%22" \
  -H "Accept: application/json" \
  -u "USERNAME:PASSWORD"

Query parameters:

  • appNames: Filters results by application name. The example above wraps the name in URL-encoded double quotes (%22). URL-encode any spaces or special characters in the name: Redshift-Discovery can be used as is, while Redshift Discovery (with a space) becomes Redshift%20Discovery. You can find your application name by following the steps in the Prerequisites section.

Refer to Additional parameters reference for additional parameters that can be configured with this endpoint.
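A minimal sketch of building the appNames value, assuming the name should be wrapped in double quotes as the curl example shows with %22. The helper name encode_app_names_value is hypothetical.

```python
from urllib.parse import quote


def encode_app_names_value(app_name: str) -> str:
    """Wrap the application name in double quotes and percent-encode the result."""
    # safe="" encodes every reserved character, including the quotes and spaces.
    return quote(f'"{app_name}"', safe="")


print(encode_app_names_value("Redshift-Discovery"))
# %22Redshift-Discovery%22

print(encode_app_names_value("Redshift Discovery"))
# %22Redshift%20Discovery%22
```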

Response:

JSON
{
  "content": [
    {
      "application": "APP_NAME",
      "appCode": "APP_CODE",
      "appKind": "APP_KIND_FILESYSTEM",
      "appType": "APP_NAME",
      "resourceId": "APP_CODE@RESOURCE_PATH-9999",
      "resource": "RESOURCE_PATH",
      "dataZone": "",
      "dataZoneIds": "",
      "tagsInfo": [
        {
          "tag": "PERSON_NAME",
          "snippets": [
            {
              "resource": "RESOURCE_PATH/first_name",
              "metaName": "first_name",
              "score": 90.9090909090909,
              "position": 0,
              "fieldType": "string",
              "key": "PERSON_NAME_LOOKUP",
              "tagReason": "TAG_REASON_LOOKUP",
              "sampleValues": [],
              "tagLocations": [],
              "tagName": "PERSON_NAME",
              "tagStatus": "TAG_STATUS_AUTO_CLASSIFIED",
              "tagType": "TAG_TYPE_CONTENT_TAG",
              "deleted": false,
              "actualResource": "RESOURCE_PATH/first_name",
              "levelType": "FIELD",
              "tagAttributes": []
            }
          ]
        }
      ],
      "resourceMetaInfo": {
        "dataType": "text/csv",
        "resourceType": "FILE",
        "fieldMetaInfos": [
          {
            "dataType": "string",
            "resourceType": "FIELD",
            "resource": "RESOURCE_PATH/first_name",
            "metaName": "first_name"
          }
        ]
      },
      "superDataType": "STRUCTURED_DATA",
      "scanGlobalId": "1752759544619_291_14"
    }
  ]
}
