
Rules in Privacera Discovery

Rules in Privacera Discovery define how tags are assigned to data elements by combining logic from dictionaries and patterns. They provide the intelligence to determine what constitutes sensitive data and ensure consistent tagging across scans.


What Are Rules?

A rule specifies the conditions under which a tag is applied. It links one or more classification techniques, such as dictionaries (matching on content or column names) and models (heuristic and logic-based), and defines how their results are interpreted.

Tags assigned via rules can then be used for classification, monitoring, and access control.


Rule Structure

Each rule can include the following:

  • Name: Unique identifier for the rule.
  • Description (optional): Explains the purpose of the rule.
  • Must Have: Required dictionary or pattern keys (e.g., m_EMAIL_KEYWORD, c_STATE_LOOKUP). The rule applies only if all these keys are present.
  • Must Not Have: Exclusion criteria. The rule is skipped if all these keys are present.
  • Score Type: Defines how the result is handled. Options include:
    • REVIEW SCORE: Tag requires manual review.
    • AUTO YES SCORE: Tag is auto-assigned without review.
    • ACTUAL SCORE: Tag is determined based on calculated score.
  • Key for Samples: Feature key from which to retrieve sample values.
  • Output tags: The tag to apply when the rule matches.
  • Enabled: Flag to activate or deactivate the rule.
  • Order: Execution priority (earlier rules take precedence).
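
To make the field list concrete, here is a minimal Python sketch of a rule as an in-memory object. It is not Privacera's schema or API: the `Rule` and `ScoreType` names, the attribute names, the default values, and the sample values `EMAIL_RULE` and `EMAIL` are assumptions chosen to mirror the fields above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ScoreType(Enum):
    REVIEW_SCORE = "REVIEW SCORE"      # tag requires manual review
    AUTO_YES_SCORE = "AUTO YES SCORE"  # tag is auto-assigned without review
    ACTUAL_SCORE = "ACTUAL SCORE"      # tag is determined by the calculated score


@dataclass
class Rule:
    # Hypothetical model of a rule; defaults are illustrative, not product defaults.
    name: str                      # unique identifier
    output_tag: str                # tag applied when the rule matches
    must_have: List[str]           # feature keys that must all be present
    score_type: ScoreType          # how the result is handled
    must_not_have: List[str] = field(default_factory=list)  # exclusion keys
    key_for_samples: Optional[str] = None  # feature key used to retrieve samples
    description: str = ""
    enabled: bool = True
    order: int = 0                 # lower values are evaluated earlier


email_rule = Rule(
    name="EMAIL_RULE",
    output_tag="EMAIL",
    must_have=["m_EMAIL_KEYWORD"],
    score_type=ScoreType.AUTO_YES_SCORE,
    key_for_samples="m_EMAIL_KEYWORD",
    order=1,
)
print(email_rule)
```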

Feature Key Naming Conventions

When a dictionary or model is created in Discovery, it is configured to operate either on metadata (e.g., column names) or on content (e.g., data values). Based on this selection, Discovery automatically generates a corresponding feature key:

  • m_ prefix: Indicates the feature is based on metadata and applies to column names.
  • c_ prefix: Indicates the feature is based on content and applies to actual data values.

For dictionaries, the key also includes a suffix that specifies the match type:

  • KEYWORD: Used when matching column names (metadata).
  • LOOKUP: Used when matching actual data values (content).

📌 Note: Models do not use suffixes in their generated feature keys. Only the m_ or c_ prefix is added based on their application mode.

Understanding these conventions is essential when building rules. To construct a rule, you must use the appropriate key based on the type of detection you intend to perform.

Example:

  • A dictionary named PERSON_NAME, when applied to metadata, results in a key: m_PERSON_NAME_KEYWORD.
  • A model named CC_MODEL, when applied to content, results in a key: c_CC_MODEL.

These keys are then referenced in the rule's Must Have or Must Not Have fields to achieve your classification objective. The keys specified in a field are combined with a logical AND to determine whether the rule applies. To achieve an OR condition, create multiple rules that output the same tag.
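
The AND and OR behavior can be sketched in a few lines of Python. This is an illustration only, not Discovery's evaluation engine: the function `rule_matches`, the dictionary layout, and the key `c_PERSON_NAME_LOOKUP` are assumptions. The Must Not Have check follows the field description given earlier (the rule is skipped when all listed keys are present).

```python
def rule_matches(must_have, must_not_have, present_keys):
    """Return True if a rule applies to the set of feature keys that fired."""
    # AND: every Must Have key must be present.
    if not all(key in present_keys for key in must_have):
        return False
    # Skip the rule when all Must Not Have keys are present.
    if must_not_have and all(key in present_keys for key in must_not_have):
        return False
    return True


# OR is expressed as two rules that output the same tag.
rules = [
    {"tag": "PERSON_NAME", "must_have": ["m_PERSON_NAME_KEYWORD"], "must_not_have": []},
    {"tag": "PERSON_NAME", "must_have": ["c_PERSON_NAME_LOOKUP"], "must_not_have": []},
]

# Only the content-based key fired for this column.
present = {"c_PERSON_NAME_LOOKUP"}
tags = {
    r["tag"]
    for r in rules
    if rule_matches(r["must_have"], r["must_not_have"], present)
}
print(tags)
```

Either rule is enough to produce the tag, which gives the effect of "column name matches OR content matches".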

Key Naming Examples

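| Type       | Mode     | Prefix | Suffix  | Example Key           |
|------------|----------|--------|---------|-----------------------|
| Dictionary | Metadata | m_     | KEYWORD | m_PERSON_NAME_KEYWORD |
| Dictionary | Content  | c_     | LOOKUP  | c_STATE_LOOKUP        |
| Model      | Metadata | m_     | (none)  | m_CREDIT_CARD_MODEL   |
| Model      | Content  | c_     | (none)  | c_CREDIT_CARD_MODEL   |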
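
The naming convention can be expressed as a small helper. Discovery generates these keys automatically, so the function below is only a sketch for checking which key to expect; `feature_key` and its parameters are hypothetical names, not part of the product.

```python
def feature_key(name: str, kind: str, mode: str) -> str:
    """Build the expected feature key for a dictionary or model.

    kind: "dictionary" or "model"
    mode: "metadata" (column names) or "content" (data values)
    """
    prefixes = {"metadata": "m_", "content": "c_"}
    suffixes = {"metadata": "_KEYWORD", "content": "_LOOKUP"}
    if mode not in prefixes:
        raise ValueError(f"unknown mode: {mode!r}")
    if kind not in ("dictionary", "model"):
        raise ValueError(f"unknown kind: {kind!r}")
    # Only dictionaries carry a suffix; models get the prefix alone.
    suffix = suffixes[mode] if kind == "dictionary" else ""
    return f"{prefixes[mode]}{name}{suffix}"


print(feature_key("PERSON_NAME", "dictionary", "metadata"))  # m_PERSON_NAME_KEYWORD
print(feature_key("STATE", "dictionary", "content"))         # c_STATE_LOOKUP
print(feature_key("CREDIT_CARD_MODEL", "model", "content"))  # c_CREDIT_CARD_MODEL
```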

Models in Rules

Models are heuristic and logic-based techniques used to identify structured data (e.g., credit cards, CVVs). To use a model in a rule:

  1. Define the model under Discovery -> Models.
  2. Create a rule with the model key in the Must Have field.

Example:

  • Model: CREDIT_CARD_MODEL
  • Rule: Match if model CREDIT_CARD_MODEL identifies a credit card number in a column or file
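
As an illustration of the kind of logic a heuristic model for credit card numbers might apply, the sketch below uses the Luhn checksum. This is an assumption for illustration only: the source does not state how CREDIT_CARD_MODEL is implemented, and `luhn_valid` is a hypothetical helper, not a Discovery function.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    # Assumed length bounds for payment card numbers (13 to 19 digits).
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A check like this is stricter than a plain digit-count pattern, which is why model-based keys are useful in high-confidence rules.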

Creating a Rule – Example Workflow

Step 1: Add a Dictionary

  • Go to: Discovery -> Dictionaries -> Add
  • Example: BANK_DICT for detecting terms like CVV
  • Type: LOOKUP for content, KEYWORD for headers

Step 2: Add a Pattern (if needed)

  • Go to: Discovery -> Patterns -> Add
  • Example: Pattern BANK_CVV with regex \b(\d{3})\b
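
Before saving a pattern, it can help to check what the regex matches. The sketch below tests the BANK_CVV expression with Python's `re` module; the sample strings are made up, and Discovery's regex engine may differ from Python's in edge cases.

```python
import re

# The BANK_CVV pattern from the example: a standalone three-digit number.
BANK_CVV = re.compile(r"\b(\d{3})\b")

samples = ["CVV 123", "zip 94107", "room 204", "card 4111111111111111"]
matches = {text: BANK_CVV.findall(text) for text in samples}

for text, found in matches.items():
    print(f"{text!r}: {found}")
```

The pattern matches "room 204" as readily as "CVV 123", so on its own it is a weak signal. Pairing it with a dictionary key in the rule's Must Have field narrows the match to columns that also look bank-related.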

Step 3: Add a Rule

  • Go to: Discovery -> Rules -> Add
  • Define name, tag, and add Must Have keys like m_BANK_DICT_KEYWORD or BANK_CVV
  • Set order to determine rule priority

Step 4: Run Scan

  • Upload data and run a scan.
  • Review the results under Data Inventory to see if the rule was applied correctly.

Rule Execution Priority

Rules are evaluated based on their defined order:

  • Rules at the top of the list are applied first.
  • When a rule matches all conditions, it applies its tag and stops further evaluation.
  • Use stricter, high-confidence rules earlier, followed by more general rules.
  • Use the Reorder Rules option to adjust execution order.
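
The first-match behavior described above can be sketched as follows. This is a simplified model, not Discovery's implementation: `first_matching_tag`, the dictionary layout, the key `c_EMAIL_PATTERN`, and the tag `EMAIL_CANDIDATE` are hypothetical, and Must Not Have handling is omitted for brevity.

```python
def first_matching_tag(rules, present_keys):
    """Evaluate rules in ascending order and return the first matching tag."""
    for rule in sorted(rules, key=lambda r: r["order"]):
        if not rule.get("enabled", True):
            continue  # disabled rules are never evaluated
        if all(key in present_keys for key in rule["must_have"]):
            return rule["tag"]  # stop at the first match
    return None


rules = [
    # General rule placed later: column name alone.
    {"order": 2, "tag": "EMAIL_CANDIDATE", "must_have": ["m_EMAIL_KEYWORD"]},
    # Strict, high-confidence rule placed first: column name and content both match.
    {"order": 1, "tag": "EMAIL", "must_have": ["m_EMAIL_KEYWORD", "c_EMAIL_PATTERN"]},
]

print(first_matching_tag(rules, {"m_EMAIL_KEYWORD", "c_EMAIL_PATTERN"}))  # EMAIL
print(first_matching_tag(rules, {"m_EMAIL_KEYWORD"}))                     # EMAIL_CANDIDATE
print(first_matching_tag(rules, {"c_STATE_LOOKUP"}))                      # None
```

If the two rules were swapped in order, the general rule would always win and the stricter tag would never be assigned, which is why high-confidence rules belong at the top.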

Use Cases for Rules

  • Tagging emails using both pattern match and keyword in header (EMAIL, EMAIL_ADDR)
  • Combining metadata-based (column name) and content-based lookups for multi-level detection
  • Custom rules for industry-specific tags (e.g., ACCOUNT_ID, DOB, NATIONAL_ID)

Conclusion

Rules are the decision-making layer in Privacera Discovery. They combine dictionaries, patterns, and logic to accurately tag data elements with relevant classifications. By managing rules effectively, organizations can ensure consistent, precise, and policy-ready data tagging.
