Rules in Privacera Discovery
Rules in Privacera Discovery define how tags are assigned to data elements by combining logic from dictionaries, patterns, and models. They provide the intelligence to determine what constitutes sensitive data and ensure consistent tagging across scans.
What Are Rules?
A rule specifies the conditions under which a tag is applied. It combines one or more classification techniques, such as dictionaries (matched against column names or content), patterns, and models (heuristic, logic-based detectors), and defines how their results are interpreted.
Tags assigned via rules can then be used for classification, monitoring, and access control.
Rule Structure
Each rule can include the following:
- Name: Unique identifier for the rule.
- Description (optional): Explains the purpose of the rule.
- Must Have: Required feature keys from dictionaries, patterns, or models (e.g., `m_EMAIL_KEYWORD`, `c_STATE_LOOKUP`). The rule applies only if all of these keys are present.
- Must Not Have: Exclusion criteria. The rule is skipped if all of these keys are present.
- Score Type: Defines how the result is handled. Options include:
  - REVIEW SCORE: The tag requires manual review.
  - AUTO YES SCORE: The tag is auto-assigned without review.
  - ACTUAL SCORE: The tag is determined based on the calculated score.
- Key for Samples: Feature key from which to retrieve sample values.
- Output Tags: The tag(s) to apply when the rule matches.
- Enabled: Flag to activate or deactivate the rule.
- Order: Execution priority (earlier rules take precedence).
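To make the fields above concrete, the following Python sketch models a rule as a plain data structure. The `Rule` class, the `ScoreType` enum, the sample rule, and the assumption that a lower `order` value runs earlier are hypothetical illustrations, not a Privacera API or export format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ScoreType(Enum):
    # Mirrors the Score Type options listed above (illustrative enum).
    REVIEW_SCORE = "REVIEW SCORE"      # tag requires manual review
    AUTO_YES_SCORE = "AUTO YES SCORE"  # tag is auto-assigned without review
    ACTUAL_SCORE = "ACTUAL SCORE"      # tag follows the calculated score


@dataclass
class Rule:
    # Hypothetical in-memory model of a Discovery rule; not a Privacera API.
    name: str
    must_have: set[str]                # all of these keys must be present
    output_tags: list[str]             # tag(s) applied on a match
    score_type: ScoreType
    must_not_have: set[str] = field(default_factory=set)
    key_for_samples: Optional[str] = None
    description: str = ""
    enabled: bool = True
    order: int = 0                     # assumption: lower value = evaluated earlier


email_rule = Rule(
    name="EMAIL_RULE",
    description="Tag columns whose header matches the email keyword dictionary",
    must_have={"m_EMAIL_KEYWORD"},
    output_tags=["EMAIL"],
    score_type=ScoreType.AUTO_YES_SCORE,
    key_for_samples="m_EMAIL_KEYWORD",
    order=1,
)
print(email_rule.score_type.value)  # AUTO YES SCORE
```

Using `field(default_factory=set)` gives each rule its own empty Must Not Have set instead of a shared mutable default.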
Feature Key Naming Conventions
When a dictionary or model is created in Discovery, it is configured to operate either on metadata (e.g., column names) or on content (e.g., data values). Based on this selection, Discovery automatically generates a corresponding feature key:
- `m_` prefix: Indicates the feature is based on metadata and applies to column names.
- `c_` prefix: Indicates the feature is based on content and applies to actual data values.
For dictionaries, the key also includes a suffix that specifies the match type:
- `KEYWORD`: Used when matching column names (metadata).
- `LOOKUP`: Used when matching actual data values (content).
📌 Note: Models do not use suffixes in their generated feature keys. Only the `m_` or `c_` prefix is added, based on their application mode.
Understanding these conventions is essential when building rules. To construct a rule, you must use the appropriate key based on the type of detection you intend to perform.
Example
- A dictionary named `PERSON_NAME`, when applied to metadata, results in the key `m_PERSON_NAME_KEYWORD`.
- A model named `CC_MODEL`, when applied to content, results in the key `c_CC_MODEL`.
These keys are then referenced in a rule's Must Have or Must Not Have field to achieve your classification objective. The keys listed in a field are combined with a logical AND to determine whether the rule applies. To express an OR condition, create multiple rules that output the same tag.
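The matching semantics described above (AND within a field, OR through several rules that share a tag) can be sketched as set operations. This is a minimal illustration that assumes a scan has already produced the set of feature keys that fired for a data element; `rule_applies`, `tag_email`, and the `c_EMAIL_LOOKUP` key are hypothetical names.

```python
def rule_applies(must_have: set[str], must_not_have: set[str], present: set[str]) -> bool:
    """Decide whether one rule fires, given the feature keys found for an element."""
    # Must Have: every listed key must be present (logical AND).
    if not must_have <= present:
        return False
    # Must Not Have: the rule is skipped when all listed keys are present.
    # An empty Must Not Have list never excludes anything.
    if must_not_have and must_not_have <= present:
        return False
    return True


# OR across rules: two hypothetical rules that output the same EMAIL tag.
EMAIL_RULES = [
    ({"m_EMAIL_KEYWORD"}, set()),  # header matches the email keyword dictionary
    ({"c_EMAIL_LOOKUP"}, set()),   # values match a hypothetical content lookup
]


def tag_email(present: set[str]) -> bool:
    """EMAIL applies if any rule carrying that tag fires."""
    return any(rule_applies(must, must_not, present) for must, must_not in EMAIL_RULES)


print(rule_applies({"m_EMAIL_KEYWORD", "c_STATE_LOOKUP"}, set(), {"m_EMAIL_KEYWORD"}))  # False
print(tag_email({"c_EMAIL_LOOKUP"}))  # True
```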
Key Naming Examples
Type | Mode | Prefix | Suffix | Example Key |
---|---|---|---|---|
Dictionary | Metadata | m_ | KEYWORD | m_PERSON_NAME_KEYWORD |
Dictionary | Content | c_ | LOOKUP | c_STATE_LOOKUP |
Model | Metadata | m_ | (none) | m_CREDIT_CARD_MODEL |
Model | Content | c_ | (none) | c_CREDIT_CARD_MODEL |
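The table's conventions reduce to a simple rule: the mode selects the prefix, and only dictionaries receive a suffix. The `feature_key` helper below is a hypothetical sketch for predicting the key name to enter in a rule; Discovery generates the real keys itself.

```python
def feature_key(kind: str, mode: str, name: str) -> str:
    """Predict a feature key from the naming conventions (hypothetical helper).

    kind: "dictionary" or "model"; mode: "metadata" or "content".
    """
    prefixes = {"metadata": "m_", "content": "c_"}
    suffixes = {"metadata": "_KEYWORD", "content": "_LOOKUP"}  # dictionaries only
    if mode not in prefixes:
        raise ValueError(f"unknown mode: {mode!r}")
    key = prefixes[mode] + name
    if kind == "dictionary":
        key += suffixes[mode]
    elif kind != "model":
        raise ValueError(f"unknown kind: {kind!r}")
    return key


print(feature_key("dictionary", "metadata", "PERSON_NAME"))  # m_PERSON_NAME_KEYWORD
print(feature_key("model", "content", "CC_MODEL"))           # c_CC_MODEL
```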
Models in Rules
Models are heuristic, logic-based techniques used to identify structured data (e.g., credit card numbers, CVVs). To use a model in a rule:
- Define the model under Discovery → Models.
- Create a rule with the model's feature key in the Must Have field.
Example
- Model: `CREDIT_CARD_MODEL`
- Rule: Match if the model `CREDIT_CARD_MODEL` identifies a credit card number in a column or file. For a model applied to content, the Must Have key is `c_CREDIT_CARD_MODEL`.
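This section does not describe how `CREDIT_CARD_MODEL` decides that a value is a card number. As an illustration of what a heuristic, logic-based content detector can look like, the sketch below applies the Luhn (mod 10) checksum used by payment card numbers, plus a length check, and fires when most sampled values pass. The functions, the 13 to 19 digit length range, and the 0.8 threshold are assumptions, not Privacera's implementation.

```python
def luhn_valid(value: str) -> bool:
    """Luhn (mod 10) check on a candidate card number; spaces/dashes allowed."""
    digits = value.replace(" ", "").replace("-", "")
    # Assumption: card numbers are 13 to 19 ASCII digits.
    if not (digits.isascii() and digits.isdigit() and 13 <= len(digits) <= 19):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def credit_card_model(values: list[str], threshold: float = 0.8) -> bool:
    """Hypothetical content model: fire when enough sampled values pass Luhn."""
    if not values:
        return False
    hits = sum(luhn_valid(v) for v in values)
    return hits / len(values) >= threshold


print(credit_card_model(["4111 1111 1111 1111", "4111-1111-1111-1111"]))  # True
print(credit_card_model(["hello", "4111111111111111"]))  # False
```

A per-column ratio, rather than a single hit, keeps one stray number from tagging a whole column; the threshold is a tuning choice made here only for illustration.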
Creating a Rule – Example Workflow
Step 1: Add a Dictionary
- Go to: Discovery -> Dictionaries -> Add
- Example: `BANK_DICT` for detecting terms like `CVV`
- Type: `LOOKUP` for content, `KEYWORD` for column headers
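The difference between the two dictionary types can be sketched as follows: a `KEYWORD` dictionary is checked against the column header, a `LOOKUP` dictionary against the data values, and each produces the feature key a rule would reference. The `dictionary_key` function and its case-insensitive exact matching are assumptions for illustration; Discovery's actual matching logic may differ.

```python
from typing import Iterable, Optional


def dictionary_key(name: str, terms: Iterable[str], mode: str,
                   column_name: str, values: Iterable[str]) -> Optional[str]:
    """Hypothetical sketch: return the feature key a dictionary emits, or None.

    Matching is case-insensitive and exact (an assumption for this sketch).
    """
    normalized = {t.strip().lower() for t in terms}
    if mode == "KEYWORD":  # metadata: compare against the column header
        if column_name.strip().lower() in normalized:
            return f"m_{name}_KEYWORD"
        return None
    if mode == "LOOKUP":   # content: compare against the data values
        if any(v.strip().lower() in normalized for v in values):
            return f"c_{name}_LOOKUP"
        return None
    raise ValueError(f"unknown dictionary type: {mode!r}")


bank_terms = {"CVV", "CVV2"}
print(dictionary_key("BANK_DICT", bank_terms, "KEYWORD", "cvv", []))  # m_BANK_DICT_KEYWORD
```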
Step 2: Add a Pattern (if needed)
- Go to: Discovery -> Patterns -> Add
- Example: Pattern `BANK_CVV` with the regex `\b(\d{3})\b`
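The `BANK_CVV` regex matches any standalone run of exactly three digits. The short check below uses Python's `re` module to show what that means in practice (Discovery's regex engine may treat edge cases differently): it also fires on unrelated three-digit numbers, so the pattern can be more reliable when a rule also requires a keyword key such as `m_BANK_DICT_KEYWORD`. The sample strings are made up for illustration.

```python
import re

# Pattern from the BANK_CVV example: a standalone run of exactly three digits.
BANK_CVV = re.compile(r"\b(\d{3})\b")

samples = ["123", "CVV: 987", "1234", "Call 555 now", "abc"]
matches = [s for s in samples if BANK_CVV.search(s)]
print(matches)  # ['123', 'CVV: 987', 'Call 555 now']
```

Note that `"1234"` does not match, because `\b` requires a word boundary on both sides of the three digits.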
Step 3: Add a Rule
- Go to: Discovery -> Rules -> Add
- Define the rule name and output tag, then add Must Have keys such as `m_BANK_DICT_KEYWORD` or `BANK_CVV`.
- Set the order to determine rule priority.
Step 4: Run Scan
- Upload data and run a scan.
- Review the results under Data Inventory to verify that the rule was applied correctly.
Rule Execution Priority
Rules are evaluated based on their defined order:
- Rules at the top of the list are applied first.
- When a rule matches all of its conditions, it applies its tag and stops further evaluation.
- Place stricter, high-confidence rules first, followed by more general rules.
- Use the Reorder Rules option to adjust the execution order.
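Order-based evaluation can be sketched as a loop over enabled rules sorted by their order, returning the first rule whose conditions hold. The `Rule` record, the `first_match` function, the rule names, and the assumption that a lower `order` value means "higher in the list" are hypothetical and made only for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical minimal rule record for this sketch.
    name: str
    tag: str
    order: int                                 # assumption: lower = evaluated first
    must_have: frozenset[str]
    must_not_have: frozenset[str] = frozenset()
    enabled: bool = True


def first_match(rules: Iterable[Rule], present: set[str]) -> Optional[Rule]:
    """Return the first enabled rule (by order) whose conditions hold."""
    for rule in sorted(rules, key=lambda r: r.order):
        if not rule.enabled:
            continue
        if not rule.must_have <= present:      # Must Have: all keys required
            continue
        if rule.must_not_have and rule.must_not_have <= present:
            continue                           # Must Not Have: excluded
        return rule                            # stop after the first match
    return None


rules = [
    Rule("GENERIC_CC", "CREDIT_CARD", order=2,
         must_have=frozenset({"c_CREDIT_CARD_MODEL"})),
    Rule("STRICT_CC", "CREDIT_CARD", order=1,
         must_have=frozenset({"m_CREDIT_CARD_KEYWORD", "c_CREDIT_CARD_MODEL"})),
]
print(first_match(rules, {"m_CREDIT_CARD_KEYWORD", "c_CREDIT_CARD_MODEL"}).name)  # STRICT_CC
print(first_match(rules, {"c_CREDIT_CARD_MODEL"}).name)  # GENERIC_CC
```

Because the stricter rule has the lower order, it wins whenever both signals are present, and the general rule only catches the remaining cases.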
Use Cases for Rules
- Tagging emails by combining a pattern match on content with a keyword match on the column header (e.g., `EMAIL`, `EMAIL_ADDR`).
- Combining metadata (column name) and content-based lookups for multi-level detection.
- Creating custom rules for industry-specific tags (e.g., `ACCOUNT_ID`, `DOB`, `NATIONAL_ID`).
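The email use case pairs a metadata signal with a content signal. The sketch below shows the idea in Python: a column counts as an email column only if its header matches one of the keyword terms and at least half of the sampled values look like email addresses. The function name, the simplified regex, and the 0.5 ratio are assumptions for illustration; in Discovery, the two signals would come from the dictionary and pattern keys referenced in a rule's Must Have field.

```python
import re

# Keyword terms from the use case above; the regex is a simplified,
# illustrative email shape, not a complete RFC 5322 validator.
EMAIL_HEADERS = {"email", "email_addr"}
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def is_email_column(column_name: str, values: list[str], min_ratio: float = 0.5) -> bool:
    """Hypothetical two-signal check: header keyword AND content pattern."""
    header_hit = column_name.strip().lower() in EMAIL_HEADERS  # metadata signal
    if not values:
        return False
    matched = sum(1 for v in values if EMAIL_RE.fullmatch(v.strip()))
    content_hit = matched / len(values) >= min_ratio           # content signal
    return header_hit and content_hit


print(is_email_column("EMAIL", ["alice@example.com", "bob@example.org"]))  # True
print(is_email_column("notes", ["alice@example.com"]))                     # False
```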
Conclusion
Rules are the decision-making layer in Privacera Discovery. They combine dictionaries, patterns, and logic to accurately tag data elements with relevant classifications. By managing rules effectively, organizations can ensure consistent, precise, and policy-ready data tagging.