Discovery Scanning Overview

This section introduces Discovery Scanning: what it does within the Discovery framework, and how it works with configuration and rule-based classification to identify sensitive data in your environment.

How Scanning Works

Discovery scanning is the core mechanism that inspects structured and semi-structured data sources to identify and classify sensitive information. It operates by applying user-defined or built-in rules, dictionaries, and patterns across datasets in supported data sources.

The scanning process follows these key steps:

  1. Data Source Connection: Discovery connects to configured data sources (e.g., databases, data lakes).
  2. Metadata Extraction: Discovery reads table and column metadata to prepare for scanning.
  3. Sample Data Retrieval: Discovery fetches a sample of the data for analysis; the sample size is configurable.
  4. Rule Application: Discovery applies classification rules using dictionaries, patterns, and models.
  5. Tagging: Matched results are tagged with metadata labels for use in policies and the Data Inventory.
  6. Reporting: Results are indexed and made available in the portal and via Data Inventory reports.

flowchart LR
    A[Data Source Connected] --> B[Metadata & Sample Data Retrieved]
    B --> C[Classification Rules Applied]
    C --> D[Tags Assigned]
    D --> E[Results Indexed & Reported]
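
The six steps above can be sketched in a few lines of Python. This is a minimal illustration only, not Discovery's implementation or API: the in-memory `SOURCE` stands in for a connected data source, and `RULES`, `Finding`, `scan`, `sample_size`, and `threshold` are hypothetical names chosen for the example.

```python
import random
import re
from dataclasses import dataclass

# Hypothetical stand-in for a connected data source:
# table name -> column name -> list of values.
SOURCE = {
    "customers": {
        "email": ["ann@example.com", "bob@example.org", "n/a"],
        "notes": ["called twice", "prefers mail", "vip"],
    }
}

# Hypothetical classification rules: tag name -> compiled regex pattern.
RULES = {"EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]+")}


@dataclass
class Finding:
    table: str
    column: str
    tag: str
    match_ratio: float


def scan(source, rules, sample_size=100, threshold=0.5, seed=0):
    """Sample each column, apply every rule, and tag columns that match."""
    rng = random.Random(seed)
    findings = []
    # Steps 1-2: iterate over the source's tables and columns (the metadata).
    for table, columns in source.items():
        for column, values in columns.items():
            # Step 3: fetch a sample of configurable size.
            sample = rng.sample(values, min(sample_size, len(values)))
            if not sample:
                continue
            # Step 4: apply each rule to the sampled values.
            for tag, pattern in rules.items():
                hits = sum(1 for v in sample if pattern.fullmatch(str(v)))
                ratio = hits / len(sample)
                # Step 5: tag the column when enough sampled values match.
                if ratio >= threshold:
                    findings.append(Finding(table, column, tag, ratio))
    # Step 6: return the findings for indexing and reporting.
    return findings


findings = scan(SOURCE, RULES)
for f in findings:
    print(f"{f.table}.{f.column}: {f.tag} ({f.match_ratio:.0%} of sample)")
```

The threshold here is an assumption of the sketch: tagging a column only when a share of sampled values match is one common way to tolerate placeholder values such as `n/a`.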

Setting Up Scanning

To use Discovery’s scanning capabilities fully, you need to configure data sources, define rules, and customize classification logic.

See the Setup Guide to:

  • Connect and configure supported data sources
  • Define classification rules for scanning
  • Manage dictionary and pattern libraries
  • Enable scan scheduling and data retention policies

Integrating with Classification Techniques

Scanning is only useful when it is coupled with classification logic. Classification rules define what Discovery tags during a scan and how it tags it.

Refer to the Classification Techniques Guide to understand how to:

  • Use keyword and lookup dictionaries
  • Apply regex-based pattern matching
  • Leverage model-based classification
  • Combine multiple techniques for precise tagging

flowchart TD
    A[Rule Definition] --> B[Dictionaries / Patterns / Models]
    B --> C[Rule Evaluation During Scan]
    C --> D[Tag Assigned to Data]
    D --> E[Used in Policies & Data Inventory]
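
As a rough illustration of combining techniques, the sketch below uses a keyword dictionary on column names, a lookup dictionary on values, and a regex pattern. Everything in it is an assumption made for the example, not Discovery's rule syntax: the dictionaries, the tag names `US_SSN` and `COUNTRY_CODE`, the thresholds, and the `classify` function are all hypothetical. Model-based classification is left out to keep the example self-contained.

```python
import re

# Hypothetical keyword dictionary, matched against column names.
NAME_KEYWORDS = {"ssn", "social_security"}
# Hypothetical lookup dictionary, matched against column values.
COUNTRY_LOOKUP = {"us", "de", "fr", "jp"}
# Hypothetical regex pattern for values shaped like a US Social Security number.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def keyword_hit(column_name, keywords):
    """Return True if any keyword occurs in the lowercased column name."""
    name = column_name.lower()
    return any(k in name for k in keywords)


def value_ratio(values, predicate):
    """Return the fraction of values for which the predicate holds."""
    if not values:
        return 0.0
    return sum(1 for v in values if predicate(v)) / len(values)


def classify(column_name, values):
    """Combine keyword, pattern, and lookup evidence into a list of tags."""
    tags = []
    pattern_ratio = value_ratio(
        values, lambda v: SSN_PATTERN.fullmatch(v) is not None
    )
    # A matching column name lowers the share of values that must match.
    threshold = 0.5 if keyword_hit(column_name, NAME_KEYWORDS) else 0.9
    if pattern_ratio >= threshold:
        tags.append("US_SSN")
    lookup_ratio = value_ratio(values, lambda v: v.lower() in COUNTRY_LOOKUP)
    if lookup_ratio >= 0.9:
        tags.append("COUNTRY_CODE")
    return tags


values = ["123-45-6789", "987-65-4321", "unknown"]
print(classify("customer_ssn", values))
print(classify("ref_code", values))
print(classify("country", ["US", "de", "FR"]))
```

The same values are tagged under `customer_ssn` but not under `ref_code`, because the column-name keyword relaxes the pattern threshold. Letting one technique adjust the evidence required by another is one way to trade recall against false positives.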

Together, setup, scanning, and classification make Discovery a flexible data detection engine that you can tailor to your governance and compliance goals.

