Using Dictionaries

In Privacera Discovery, a Dictionary is a powerful classification mechanism used to detect and tag sensitive data by matching it against predefined terms or patterns. Dictionaries are particularly useful for identifying industry-specific terminology, business-sensitive information, or common sensitive terms across datasets.

Discovery supports three dictionary matching modes:

1. Fuzzy Match

  • Description: Matches source data entries that are approximately similar to the dictionary terms. This is useful when the data may contain variations, misspellings, or alternate forms of the term.
  • Use Case Example: Matching names of medications or organizations with slight spelling differences.

  • Sample Dictionary Entries:

    diabetes
    hypertension
    cardiovascular
    

  • Example Matches:
    • "diabtes"
    • "cardio vascular"
    • "hypertensn"
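The fuzzy behaviour above can be approximated with a similarity threshold. The sketch below is illustrative only: it uses Python's standard `difflib.SequenceMatcher` as a stand-in, because the algorithm and threshold that Discovery applies internally are not described here. The function name `fuzzy_match` and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher

# Sample dictionary entries from the section above.
ENTRIES = ["diabetes", "hypertension", "cardiovascular"]


def fuzzy_match(value, entries=ENTRIES, threshold=0.85):
    """Return the first entry whose similarity to value meets the threshold.

    The 0.85 threshold is an illustrative assumption, not a Discovery setting.
    """
    candidate = value.casefold()
    for entry in entries:
        ratio = SequenceMatcher(None, candidate, entry.casefold()).ratio()
        if ratio >= threshold:
            return entry
    return None


for sample in ["diabtes", "cardio vascular", "hypertensn", "diagram"]:
    print(sample, "->", fuzzy_match(sample))
```

Under these assumptions the three misspelled samples resolve to their dictionary entries, while an unrelated word such as "diagram" does not.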

2. Exact Match

  • Description: Requires the data to match the dictionary entries exactly as specified. This is ideal for scenarios where data values are consistent and strictly formatted.
  • Use Case Example: Matching a set of sanctioned entity names or internal project codes.

  • Sample Dictionary Entries:

    ProjectX123
    ConfidentialClient
    RestrictedGroupA
    

  • Example Matches:
    • "ProjectX123" ➝ Match
    • "ProjectX" ➝ No Match

Note

All patterns in dictionaries use case-insensitive matching. This means the system treats “Email,” “email,” and “EMAIL” as the same word. This ensures that variations in capitalization do not affect detection accuracy across different match modes.
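Combining the exact-match mode with the case-insensitivity described in this note gives the following minimal sketch. The helper name `exact_match` is hypothetical, and whether Discovery also trims whitespace is not stated, so no trimming is done here.

```python
# Sample exact-match dictionary entries from the section above.
ENTRIES = ["ProjectX123", "ConfidentialClient", "RestrictedGroupA"]

# Normalise once so every lookup ignores case, per the note above.
NORMALISED = {entry.casefold() for entry in ENTRIES}


def exact_match(value):
    """True only when the whole value equals a dictionary entry, ignoring case."""
    return value.casefold() in NORMALISED


print(exact_match("ProjectX123"))   # full entry
print(exact_match("projectx123"))   # same entry, different case
print(exact_match("ProjectX"))      # partial value
```

A partial value such as "ProjectX" is rejected because the comparison is against the whole entry, not a prefix.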

3. Pattern Match (Regex)

  • Description: Allows the use of regular expressions to define complex patterns. This is particularly useful for matching structured patterns like IDs, codes, or specific text formats. All regex matches are case-insensitive.

  • Use Case Example: Detecting customer IDs, invoice numbers, or formatted codes.

  • Sample Dictionary Entries (Regex):

    ^INV[0-9]{5}$
    ^CUS-[A-Z]{3}-[0-9]{4}$
    

  • Example Matches:
    • "INV12345" ➝ Match
    • "inv12345" ➝ Match
    • "CUS-ABC-2023" ➝ Match
    • "cus-abc-2023" ➝ Match
    • "CUS-abc-2023" ➝ Match (case insensitive)
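The sample patterns can be tried outside the product before adding them to a dictionary. The sketch below uses Python's `re` module as a stand-in for whichever regex engine Discovery runs; the two sample patterns use only syntax common to most engines. The helper name `pattern_match` is hypothetical.

```python
import re

# Sample regex dictionary entries from the section above.
PATTERNS = [r"^INV[0-9]{5}$", r"^CUS-[A-Z]{3}-[0-9]{4}$"]

# IGNORECASE mirrors the documented case-insensitive behaviour.
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def pattern_match(value):
    """True when any dictionary pattern matches the value."""
    return any(p.search(value) for p in COMPILED)


for sample in ["INV12345", "inv12345", "CUS-abc-2023", "INV1234"]:
    print(sample, "->", pattern_match(sample))
```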

Age Keyword

The product includes a fuzzy-match keyword dictionary for age column names. It pairs with the bundled AGE pattern — see Age Pattern for value-matching rules and supported formats. Tag: AGE.

Dictionary key | Match type | Category | Typical file | Drives tag
AGE_KEYWORD | Fuzzy | metaname | age_keyword.txt | AGE

Typical entries include age, patient_age, years old, and age_group. In the dictionary, set Tags (input tags) so matches associate with AGE.

Rules

  • Structured: Age Strict: c_AGE + m_AGE_KEYWORD → AGE (ACTUAL_SCORE). No bundled review-only structured rule.
  • Unstructured: rule_age: c_AGE + c_AGE_KEYWORD within 5 words → AGE. Confirm Rules Mapping for c_AGE.

Structured vs. unstructured feature prefixes

Structured rules reference column-name keywords with the m_ prefix (for example m_AGE_KEYWORD). Unstructured proximity rules reference the same dictionary with the c_ prefix (for example c_AGE_KEYWORD). Enable the dictionary once; both rule types consume it through their respective feature keys.

Quick checklist

  • Enable the AGE tag and AGE pattern (see Age Pattern).
  • Enable AGE_KEYWORD dictionary and link the AGE tag.
  • Enable Age Strict structured rule and rule_age unstructured rule; confirm Rules Mapping.

Gender Keyword

The product includes a fuzzy-match keyword dictionary for gender or sex column names. It pairs with the bundled GENDER pattern — see Gender Pattern for value-matching rules and supported formats. Tag: GENDER.

Dictionary key | Match type | Category | Typical file | Drives tag
GENDER_KEYWORD | Fuzzy | metaname | gender_keyword.txt | GENDER

Typical entries include gender, sex, gender_code, patient_gender, and gender_identity. In the dictionary, set Tags (input tags) so matches associate with GENDER.

Rules

  • Structured: Gender Strict: c_GENDER + m_GENDER_KEYWORD → GENDER (ACTUAL_SCORE); Gender: c_GENDER alone → review.
  • Unstructured: rule_gender: c_GENDER + c_GENDER_KEYWORD within 5 words → GENDER. Confirm Rules Mapping for c_GENDER.

Quick checklist

  • Enable the GENDER tag and GENDER pattern (see Gender Pattern).
  • Enable GENDER_KEYWORD dictionary and link the GENDER tag.
  • Enable Gender Strict and Gender structured rules and rule_gender unstructured rule; confirm Rules Mapping.

Property Name

The Property Name keyword dictionary pairs with the Property Name model (PROPERTY_NAME_ML_MODEL). See Property Name Model for model validation rules and supported name formats. Tag: PROPERTY_NAME.

Dictionary key | Match type | Category | Typical file | Drives tag
PROPERTY_NAME_KEYWORD | Fuzzy | metaname | property_name_keyword.txt | PROPERTY_NAME

Typical entries include property_name, estate_name, lot_name, and farm_name. In the dictionary, set Tags (input tags) so matches associate with PROPERTY_NAME.

Rules

  • Structured: Property Name Strict: c_PROPERTY_NAME_ML_MODEL + m_PROPERTY_NAME_KEYWORD → PROPERTY_NAME (ACTUAL_SCORE); Property Name: model alone → review.
  • Unstructured: rule_property_name: c_PROPERTY_NAME_ML_MODEL + c_PROPERTY_NAME_KEYWORD within 5 words → PROPERTY_NAME. Confirm Rules Mapping for c_PROPERTY_NAME_ML_MODEL.

Quick checklist

  • Enable the PROPERTY_NAME tag.
  • Enable PROPERTY_NAME_ML_MODEL (Models).
  • Enable PROPERTY_NAME_KEYWORD dictionary and link the PROPERTY_NAME tag.
  • Enable Property Name Strict and Property Name structured rules and rule_property_name unstructured rule; confirm Rules Mapping.

Canada Address, Person Name, City, and Province

The product includes Canadian dictionaries and tags for address context, person names, cities, and provinces. Enable these under Discovery → Dictionaries and Discovery → Tags when Canada-specific classification is required.

Tags and purpose

Tag | What it detects
CANADA_ADDRESS | Canadian address context via CANADA_ADDRESS_KEYWORD; stricter rules combine keywords with the Canada Postal Code model.
CANADA_PERSON_NAME | Names matched against CANADA_PERSON_NAME_LOOKUP.
CANADA_CITY | Cities matched against CANADA_CITY_LOOKUP (exact list).
CANADA_PROVINCE | Provinces and territories via CANADA_PROVINCE_LOOKUP (names, abbreviations, common short forms).

Dictionary keys and files

Dictionary key | Match type | Typical file | Drives tag
CANADA_ADDRESS_KEYWORD | Fuzzy | canada_address_keyword.txt | CANADA_ADDRESS
CANADA_PERSON_NAME_LOOKUP | Exact | canada_person_name_lookup.txt | CANADA_PERSON_NAME
CANADA_CITY_LOOKUP | Exact | canada_city_lookup.txt | CANADA_CITY
CANADA_PROVINCE_LOOKUP | Exact | canada_province_lookup.txt | CANADA_PROVINCE

In each dictionary, set Tags (input tags) so matches associate with the right CANADA_* tag.

Rules and postal dependency

  • Structured rules use column-level features such as c_CANADA_ADDRESS_KEYWORD, c_CANADA_PERSON_NAME_LOOKUP, c_CANADA_CITY_LOOKUP, c_CANADA_PROVINCE_LOOKUP, and for strict address patterns c_CANADA_POSTAL_CODE_ML_MODEL. Enable the Canada structured rules you need under Discovery → Rules.
  • Unstructured rules combine these features within a word proximity window (for example address keyword + postal model). Use Discovery → Rules Mapping so outputs map to CANADA_ADDRESS, CANADA_CITY, CANADA_PROVINCE, and CANADA_PERSON_NAME.
  • If a rule requires it, enable CANADA_POSTAL_CODE (tag, model CANADA_POSTAL_CODE_ML_MODEL, and CANADA_POSTAL_CODE_KEYWORD when your rules reference it). Without the postal model, strict Canada address rules may not apply.

What to disable for Canada-first classification

US and generic detectors overlap the same signals (names, cities, states, street-style text). Leaving them enabled can produce PERSON_NAME, US_CITY, US_STATE, or US_ADDRESS alongside or instead of CANADA_* tags.

For Canada-first scans on the same data:

  1. Tags — Disable PERSON_NAME, US_CITY, US_STATE, and US_ADDRESS when you want only CANADA_PERSON_NAME, CANADA_CITY, CANADA_PROVINCE, and CANADA_ADDRESS respectively.
  2. Dictionaries — Disable US counterparts that feed those tags (for example PERSON_NAME_LOOKUP, US_CITY_LOOKUP, US_STATE_LOOKUP, and dictionaries tied to US_ADDRESS / street-address patterns).
  3. Rules — Disable structured and unstructured rules that output the US tags above; keep Canada dictionary and model rules enabled.
  4. Models — Keep CANADA_POSTAL_CODE_ML_MODEL enabled if your Canada address rules depend on it.

Portal naming

Exact labels can vary by release. Search Tags, Dictionaries, and Rules for the names above.

Quick checklist

  • Enable CANADA_ADDRESS, CANADA_PERSON_NAME, CANADA_CITY, CANADA_PROVINCE (as needed).
  • Enable the four Canada dictionaries and link tags on each.
  • Enable Canada postal tag/model/keyword when address rules need the postal model.
  • Enable Canada rules and Rules Mapping for unstructured output.
  • Disable PERSON_NAME, US_CITY, US_STATE, US_ADDRESS (and related rules/dictionaries) when you want Canadian identifiers without US overlap.

Australia Business Number (ABN)

The product includes a fuzzy-match keyword dictionary for Australian Business Numbers (ABNs). Enable the dictionary under Discovery → Dictionaries when you need to classify Australia-specific business identifiers.

Tag and purpose

Tag | What it detects
AU_ABN | Detects 11-digit Australian Business Numbers issued by the Australian Taxation Office (ATO). Values are validated using the official ABN check algorithm.
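For reference, the publicly documented ABN check works as follows: subtract 1 from the first digit, multiply each of the 11 digits by its weight, and confirm that the sum is divisible by 89. The sketch below implements that published algorithm; it is not the code of AU_ABN_ML_MODEL, and the function name `is_valid_abn` is hypothetical.

```python
# Weights from the published ABN validation algorithm.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(value):
    """Check an ABN written with or without spaces."""
    compact = value.replace(" ", "")
    if len(compact) != 11 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(ch) for ch in compact]
    digits[0] -= 1  # the algorithm subtracts 1 from the leading digit
    total = sum(w * d for w, d in zip(ABN_WEIGHTS, digits))
    return total % 89 == 0


print(is_valid_abn("51 824 753 556"))
print(is_valid_abn("51 824 753 557"))
```

A helper like this is useful for preparing positive and negative test values before enabling the strict rule.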

Dictionary key and file

Dictionary key | Match type | Typical file | Drives tag
AU_ABN_KEYWORD | Fuzzy | au_abn_keyword.txt | AU_ABN

In the dictionary, set Tags (input tags) so matches associate with AU_ABN. Typical entries include terms such as ABN, Australian Business Number, and business number.

Rules and model dependency

  • Structured rules use column-level features c_AU_ABN_ML_MODEL and, for the strict rule, m_AU_ABN_KEYWORD. Enable the Australia ABN structured rules you need under Discovery → Rules (for example AU ABN Strict and AU ABN).
  • The strict rule requires both the model and keyword dictionary; the review rule uses the model alone when no keyword is present in the column context.

To enable ABN detection, configure the following assets:

  • Tag: AU_ABN
  • Model: AU_ABN_ML_MODEL
  • Dictionary: AU_ABN_KEYWORD (required for the strict rule)

Without the model, ABN rules are not evaluated.

Quick checklist

  • Enable AU_ABN tag.
  • Enable AU_ABN_KEYWORD dictionary and link the tag.
  • Enable AU_ABN_ML_MODEL model.
  • Enable ABN structured rules and Rules Mapping as needed.

Australia Company Number (ACN)

The product includes a fuzzy-match keyword dictionary for Australian Company Numbers (ACNs). Enable it under Discovery → Dictionaries when Australia-specific company-identifier classification is required.

Tag and purpose

Tag | What it detects
AU_ACN | Detects 9-digit Australian Company Numbers issued by ASIC, including values with leading zeros. Values are validated using the official ACN check-digit algorithm.
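The published ACN check digit is computed by weighting the first eight digits 8 down to 1, summing the products, and taking the complement of the sum modulo 10. The sketch below follows that published scheme and keeps the value as a string so leading zeros survive; it is not the code of AU_ACN_ML_MODEL, and `is_valid_acn` is a hypothetical name.

```python
def is_valid_acn(value):
    """Check an ACN written with or without spaces."""
    compact = value.replace(" ", "")
    if len(compact) != 9 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(ch) for ch in compact]
    # Weights 8, 7, ..., 1 apply to the first eight digits.
    total = sum(w * d for w, d in zip(range(8, 0, -1), digits[:8]))
    check = (10 - total % 10) % 10
    return check == digits[8]


print(is_valid_acn("004 085 616"))
print(is_valid_acn("004 085 617"))
```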

Dictionary key and file

Dictionary key | Match type | Typical file | Drives tag
AU_ACN_KEYWORD | Fuzzy | au_acn_keyword.txt | AU_ACN

In the dictionary, set Tags (input tags) so matches associate with AU_ACN. Typical entries include terms such as ACN, Australian Company Number, and company number.

Rules and model dependency

  • Structured rules use c_AU_ACN_ML_MODEL and, for the strict rule, m_AU_ACN_KEYWORD. Enable the Australia ACN structured rules you need under Discovery → Rules (for example AU ACN Strict and AU ACN).
  • The strict rule requires both the model and keyword dictionary; the review rule uses the model alone when no keyword is present.

To enable ACN detection, configure the following assets:

  • Tag: AU_ACN
  • Model: AU_ACN_ML_MODEL
  • Dictionary: AU_ACN_KEYWORD (required for the strict rule)

Without the model, ACN rules are not evaluated.

Quick checklist

  • Enable AU_ACN tag.
  • Enable AU_ACN_KEYWORD dictionary and link the tag.
  • Enable AU_ACN_ML_MODEL model.
  • Enable ACN structured rules and Rules Mapping as needed.

New Zealand IRD Number

The product includes a fuzzy-match keyword dictionary for New Zealand Inland Revenue Department (IRD) numbers. Enable the dictionary under Discovery → Dictionaries when you need to classify New Zealand tax identifiers.

Tag and purpose

Tag | What it detects
NZ_IRD | Detects 8- or 9-digit IRD numbers used for tax identification in New Zealand. Values are validated using the IRD check-digit algorithm.
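The published IRD check is a modulus-11 scheme with a second set of weights that is tried when the first pass yields 10. The sketch below follows that published scheme under two simplifications that are assumptions of this example: it omits the numeric range check that the published specification also applies, and it accepts hyphens or spaces as separators. It is not the code of NZ_IRD_ML_MODEL, and the function names are hypothetical.

```python
PRIMARY_WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)
SECONDARY_WEIGHTS = (7, 4, 3, 2, 5, 2, 7, 6)


def _check_digit(base_digits, weights):
    """Modulus-11 check digit; a result of 10 means this pass is inconclusive."""
    remainder = sum(w * d for w, d in zip(weights, base_digits)) % 11
    return 0 if remainder == 0 else 11 - remainder


def is_valid_ird(value):
    compact = value.replace("-", "").replace(" ", "")
    if len(compact) not in (8, 9) or not (compact.isascii() and compact.isdigit()):
        return False
    # The base is every digit except the last, left-padded to eight digits.
    base = [int(ch) for ch in compact[:-1].zfill(8)]
    expected = _check_digit(base, PRIMARY_WEIGHTS)
    if expected == 10:
        expected = _check_digit(base, SECONDARY_WEIGHTS)
        if expected == 10:
            return False
    return expected == int(compact[-1])


for sample in ["49091850", "35901981", "136410132", "136410133"]:
    print(sample, "->", is_valid_ird(sample))
```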

Dictionary key and file

Dictionary key | Match type | Typical file | Drives tag
NZ_IRD_KEYWORD | Fuzzy | nz_ird_keyword.txt | NZ_IRD

In the dictionary, set Tags (input tags) so matches associate with NZ_IRD. Typical entries include terms such as IRD, IRD number, Inland Revenue, and tax number.

Rules and model dependency

  • Structured rules use c_NZ_IRD_ML_MODEL and, for the strict rule, m_NZ_IRD_KEYWORD. Enable the New Zealand IRD structured rules under Discovery → Rules (for example New Zealand IRD Number Strict and New Zealand IRD Number).
  • Unstructured rules combine c_NZ_IRD_ML_MODEL with c_NZ_IRD_KEYWORD within a word-proximity window (for example rule_nz_ird). Use Discovery → Rules Mapping so outputs map to NZ_IRD.

To enable IRD detection, configure the following assets:

  • Tag: NZ_IRD
  • Model: NZ_IRD_ML_MODEL
  • Dictionary: NZ_IRD_KEYWORD (required for strict and unstructured rules)

Without the model, IRD rules are not evaluated.

Quick checklist

  • Enable NZ_IRD tag.
  • Enable NZ_IRD_KEYWORD dictionary and link the tag.
  • Enable NZ_IRD_ML_MODEL model.
  • Enable IRD structured and unstructured rules and Rules Mapping as needed.

Australia and New Zealand Bank Account Numbers

The product includes fuzzy-match keyword dictionaries for Australian and New Zealand bank account numbers. Enable them under Discovery → Dictionaries when ANZ bank-account classification is required.

Tags and purpose

Tag | What it detects
AU_BANK_ACCOUNT | Detects Australian bank account numbers consisting of a 6-digit BSB (Bank-State-Branch) code plus an account number.
NZ_BANK_ACCOUNT | Detects New Zealand bank account numbers in the standard format (bank–branch–account–suffix).

Dictionary keys and files

Dictionary key | Match type | Typical file | Drives tag
AU_BANK_ACCOUNT_KEYWORD | Fuzzy | au_bank_account_keyword.txt | AU_BANK_ACCOUNT
NZ_BANK_ACCOUNT_KEYWORD | Fuzzy | nz_bank_account_keyword.txt | NZ_BANK_ACCOUNT

In each dictionary, set Tags (input tags) so matches associate with the corresponding tag. Typical entries include terms such as BSB, bank account, account number, and bank account number.

Rules and model dependency

  • Structured rules use c_AU_BANK_ACCOUNT_ML_MODEL / c_NZ_BANK_ACCOUNT_ML_MODEL and, for strict rules, m_AU_BANK_ACCOUNT_KEYWORD / m_NZ_BANK_ACCOUNT_KEYWORD. Enable the bank-account structured rules under Discovery → Rules (for example Australian Bank Account Number Strict, Australian Bank Account Number, New Zealand Bank Account Number Strict, and New Zealand Bank Account Number).
  • Unstructured rules combine each model with its keyword dictionary within a word-proximity window (for example rule_au_bank_account and rule_nz_bank_account). Use Discovery → Rules Mapping so outputs map to AU_BANK_ACCOUNT and NZ_BANK_ACCOUNT.
  • Enable each tag, model (AU_BANK_ACCOUNT_ML_MODEL, NZ_BANK_ACCOUNT_ML_MODEL), and keyword dictionary when using the corresponding strict or unstructured rules.

Quick checklist

  • Enable AU_BANK_ACCOUNT and/or NZ_BANK_ACCOUNT tags (as needed).
  • Enable the matching keyword dictionaries and link tags on each.
  • Enable AU_BANK_ACCOUNT_ML_MODEL and/or NZ_BANK_ACCOUNT_ML_MODEL models.
  • Enable bank-account structured and unstructured rules and Rules Mapping as needed.

Australia Medicare Number

The product includes an Australian Medicare dictionary and tag that pair with the Australia Medicare Number model. Enable these under Discovery → Dictionaries and Discovery → Tags when Australian Medicare number classification is required.

Tags and purpose

Tag | What it detects
AU_MEDICARE | Australian Medicare card numbers detected by AU_MEDICARE_ML_MODEL; strict rules combine the model with AU_MEDICARE_KEYWORD.

Dictionary keys and files

Dictionary key | Match type | Typical file | Drives tag
AU_MEDICARE_KEYWORD | Fuzzy | au_medicare_keyword.txt | AU_MEDICARE

In the dictionary, set Tags (input tags) so matches associate with the AU_MEDICARE tag.

Rules

  • Structured rules use column-level features:
    • c_AU_MEDICARE_ML_MODEL + m_AU_MEDICARE_KEYWORD → Australia Medicare Number Strict (auto-tag).
    • c_AU_MEDICARE_ML_MODEL alone → Australia Medicare Number (review).
  • Unstructured rules combine the model feature with the keyword feature within a 5-word, order-strict proximity window:
    • rule_au_medicare: c_AU_MEDICARE_ML_MODEL + c_AU_MEDICARE_KEYWORD → AU_MEDICARE.
  • Use Discovery → Rules Mapping to confirm AU_MEDICARE is mapped to its model feature key for unstructured output.

Portal naming

Exact labels can vary by release. Search Tags, Dictionaries, and Rules for the names above.

Quick checklist

  • Enable the AU_MEDICARE tag.
  • Enable the AU_MEDICARE_ML_MODEL model.
  • Enable the AU_MEDICARE_KEYWORD dictionary and link the AU_MEDICARE tag on it.
  • Enable the Australia Medicare structured rules (Strict + non-strict variants) under Discovery → Rules.
  • Enable the Australia Medicare unstructured rule and confirm Rules Mapping under Discovery → Rules Mapping.

Australia and New Zealand Passport

The product includes ANZ passport dictionaries and tags that pair with the Australian Passport and New Zealand Passport models. Enable these under Discovery → Dictionaries and Discovery → Tags when ANZ passport classification is required.

Tags and purpose

Tag | What it detects
AU_PASSPORT | Australian passport numbers detected by AUSTRALIA_PASSPORT_ML_MODEL; strict rules combine the model with AU_PASSPORT_KEYWORD.
NZ_PASSPORT | New Zealand passport numbers detected by NEW_ZEALAND_PASSPORT_ML_MODEL; strict rules combine the model with NZ_PASSPORT_KEYWORD.

Dictionary keys and files

Dictionary key | Match type | Typical file | Drives tag
AU_PASSPORT_KEYWORD | Fuzzy | australia_passport_keyword.txt | AU_PASSPORT
NZ_PASSPORT_KEYWORD | Fuzzy | nz_passport_keyword.txt | NZ_PASSPORT

In each dictionary, set Tags (input tags) so matches associate with the right AU_PASSPORT / NZ_PASSPORT tag.

Rules

  • Structured rules use column-level features:
    • c_AUSTRALIA_PASSPORT_ML_MODEL + m_AU_PASSPORT_KEYWORD → Australian Passport Strict (auto-tag).
    • c_AUSTRALIA_PASSPORT_ML_MODEL alone → Australian Passport (review).
    • c_NEW_ZEALAND_PASSPORT_ML_MODEL + m_NZ_PASSPORT_KEYWORD → New Zealand Passport Strict (auto-tag).
    • c_NEW_ZEALAND_PASSPORT_ML_MODEL alone → New Zealand Passport (review).
  • Unstructured rules combine the model feature with the keyword feature within a 5-word, order-strict proximity window:
    • rule_au_passport: c_AUSTRALIA_PASSPORT_ML_MODEL + c_AU_PASSPORT_KEYWORD → AU_PASSPORT.
    • rule_nz_passport: c_NEW_ZEALAND_PASSPORT_ML_MODEL + c_NZ_PASSPORT_KEYWORD → NZ_PASSPORT.
  • Use Discovery → Rules Mapping to confirm AU_PASSPORT and NZ_PASSPORT are mapped to their model feature keys for unstructured output.

Portal naming

Exact labels can vary by release. Search Tags, Dictionaries, and Rules for the names above.

Quick checklist

  • Enable the AU_PASSPORT and/or NZ_PASSPORT tags (as needed).
  • Enable the AUSTRALIA_PASSPORT_ML_MODEL and/or NEW_ZEALAND_PASSPORT_ML_MODEL models.
  • Enable the AU_PASSPORT_KEYWORD / NZ_PASSPORT_KEYWORD dictionaries and link tags on each.
  • Enable the AU / NZ structured rules (Strict + non-strict variants) under Discovery → Rules.
  • Enable the AU / NZ unstructured rules and confirm Rules Mapping under Discovery → Rules Mapping.

Australia and New Zealand Driver Licence

The product includes ANZ driver licence dictionaries and tags that pair with the Australian Driver Licence and New Zealand Driver Licence models. Enable these under Discovery → Dictionaries and Discovery → Tags when ANZ driver licence classification is required.

Tags and purpose

Tag | What it detects
AU_DRIVER_LICENSE | Australian driver licence numbers detected by AUSTRALIA_DRIVER_LICENSE_ML_MODEL; strict rules combine the model with AU_DRIVER_LICENSE_KEYWORD.
NZ_DRIVER_LICENSE | New Zealand driver licence numbers detected by NEW_ZEALAND_DRIVER_LICENSE_ML_MODEL; strict rules combine the model with NZ_DRIVER_LICENSE_KEYWORD.

Dictionary keys and files

Dictionary key | Match type | Typical file | Drives tag
AU_DRIVER_LICENSE_KEYWORD | Fuzzy | australia_driver_license_keyword.txt | AU_DRIVER_LICENSE
NZ_DRIVER_LICENSE_KEYWORD | Fuzzy | nz_driver_license_keyword.txt | NZ_DRIVER_LICENSE

In each dictionary, set Tags (input tags) so matches associate with the right AU_DRIVER_LICENSE / NZ_DRIVER_LICENSE tag.

Rules

  • Structured rules use column-level features:
    • c_AUSTRALIA_DRIVER_LICENSE_ML_MODEL + m_AU_DRIVER_LICENSE_KEYWORD → Australian Driver Licence Strict (ACTUAL_SCORE).
    • c_AUSTRALIA_DRIVER_LICENSE_ML_MODEL alone → Australian Driver Licence (review).
    • c_NEW_ZEALAND_DRIVER_LICENSE_ML_MODEL + m_NZ_DRIVER_LICENSE_KEYWORD → New Zealand Driver Licence Strict (ACTUAL_SCORE).
    • c_NEW_ZEALAND_DRIVER_LICENSE_ML_MODEL alone → New Zealand Driver Licence (review).
  • Unstructured rules combine the model feature with the keyword feature within a 5-word, order-strict proximity window:
    • rule_au_driver_license: c_AUSTRALIA_DRIVER_LICENSE_ML_MODEL + c_AU_DRIVER_LICENSE_KEYWORD → AU_DRIVER_LICENSE.
    • rule_nz_driver_license: c_NEW_ZEALAND_DRIVER_LICENSE_ML_MODEL + c_NZ_DRIVER_LICENSE_KEYWORD → NZ_DRIVER_LICENSE.
  • Use Discovery → Rules Mapping to confirm AU_DRIVER_LICENSE and NZ_DRIVER_LICENSE are mapped to their model feature keys for unstructured output.

NZ format overlap

The NZ driver licence format (2 letters + 6 digits, e.g. AB123456) is identical to the NZ passport format. Column-name keyword context (m_NZ_DRIVER_LICENSE_KEYWORD) is the primary differentiator in structured scans. In unstructured scans both the NZ_DRIVER_LICENSE and NZ_PASSPORT detectors may fire on the same value — this is expected behaviour.

Portal naming

Exact labels can vary by release. Search Tags, Dictionaries, and Rules for the names above.

Quick checklist

  • Enable the AU_DRIVER_LICENSE and/or NZ_DRIVER_LICENSE tags (as needed).
  • Enable the AUSTRALIA_DRIVER_LICENSE_ML_MODEL and/or NEW_ZEALAND_DRIVER_LICENSE_ML_MODEL models.
  • Enable the AU_DRIVER_LICENSE_KEYWORD / NZ_DRIVER_LICENSE_KEYWORD dictionaries and link tags on each.
  • Enable the AU / NZ structured rules (Strict + non-strict variants) under Discovery → Rules.
  • Enable the AU / NZ unstructured rules and confirm Rules Mapping under Discovery → Rules Mapping.

Australia and New Zealand Phone Numbers

Tags and purpose

Tag | Purpose
AU_PHONE_NUMBER | Australian mobile and landline phone numbers detected by AUSTRALIA_PHONE_NUMBER_ML_MODEL.
NZ_PHONE_NUMBER | New Zealand mobile and landline phone numbers detected by NEW_ZEALAND_PHONE_NUMBER_ML_MODEL.

Dictionary key and file

Dictionary key | Type | File | Linked tag(s)
ANZ_PHONE_NUMBER_KEYWORD | Fuzzy | anz_phone_number_keyword.txt | AU_PHONE_NUMBER, NZ_PHONE_NUMBER

A single shared dictionary covers both countries because column-name signals (phone, mobile, contact_number, etc.) are country-agnostic. Country is resolved by the content detectors (AUSTRALIA_PHONE_NUMBER_ML_MODEL / NEW_ZEALAND_PHONE_NUMBER_ML_MODEL), not the dictionary.

Rules and model dependency

  • Structured strict rules: combine the detector + the shared keyword:
    • c_AUSTRALIA_PHONE_NUMBER_ML_MODEL + m_ANZ_PHONE_NUMBER_KEYWORD → Australia Phone Number Strict (auto-tag).
    • c_NEW_ZEALAND_PHONE_NUMBER_ML_MODEL + m_ANZ_PHONE_NUMBER_KEYWORD → New Zealand Phone Number Strict (auto-tag).
  • Cross-family guards: each strict rule excludes the other country's content (hasNotKeys includes the opposite c_*_PHONE_NUMBER_ML_MODEL) so columns where both detectors match are not silently auto-classified.
  • Ambiguous rules: when both AU and NZ content detectors match the same column, two REVIEW-grade rules (Australia Phone Number Ambiguous, New Zealand Phone Number Ambiguous) emit AU and NZ tags for operator review.
  • Unstructured rules: rule_au_phone_keyword_unstruct / rule_nz_phone_keyword_unstruct apply when the keyword appears in free text near a matching phone-shaped value.
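The interaction between the strict rules, their cross-family guards, and the ambiguous rules can be modelled as set checks over the features found on a column. The sketch below is a simplified illustration, not Discovery's rule engine: the `Rule` class, its field names, and the assumption that the ambiguous rules require only the two content features are all hypothetical.

```python
from dataclasses import dataclass

AU = "c_AUSTRALIA_PHONE_NUMBER_ML_MODEL"
NZ = "c_NEW_ZEALAND_PHONE_NUMBER_ML_MODEL"
KW = "m_ANZ_PHONE_NUMBER_KEYWORD"


@dataclass(frozen=True)
class Rule:
    name: str
    tag: str
    grade: str                             # "auto-tag" or "review"
    has_keys: frozenset                    # every key must be present
    has_not_keys: frozenset = frozenset()  # no key may be present


RULES = [
    Rule("Australia Phone Number Strict", "AU_PHONE_NUMBER", "auto-tag",
         frozenset({AU, KW}), frozenset({NZ})),
    Rule("New Zealand Phone Number Strict", "NZ_PHONE_NUMBER", "auto-tag",
         frozenset({NZ, KW}), frozenset({AU})),
    # Assumption: the ambiguous rules fire whenever both detectors match.
    Rule("Australia Phone Number Ambiguous", "AU_PHONE_NUMBER", "review",
         frozenset({AU, NZ})),
    Rule("New Zealand Phone Number Ambiguous", "NZ_PHONE_NUMBER", "review",
         frozenset({AU, NZ})),
]


def evaluate(column_features):
    """Return (tag, grade) for every rule that the column's features satisfy."""
    features = set(column_features)
    return [
        (rule.tag, rule.grade)
        for rule in RULES
        if rule.has_keys <= features and not (rule.has_not_keys & features)
    ]


print(evaluate({AU, KW}))       # one detector plus keyword
print(evaluate({AU, NZ, KW}))   # both detectors match
```

In this model a column matched by both detectors never reaches an auto-tag outcome, which is the purpose of the guards described above.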

Quick checklist

  • Enable the AU_PHONE_NUMBER and/or NZ_PHONE_NUMBER tags (as needed).
  • Enable the AUSTRALIA_PHONE_NUMBER_ML_MODEL and/or NEW_ZEALAND_PHONE_NUMBER_ML_MODEL models.
  • Enable the shared ANZ_PHONE_NUMBER_KEYWORD dictionary and link both tags to it.
  • Enable the AU / NZ structured rules (Strict + Ambiguous variants) under Discovery → Rules.
  • Enable the AU / NZ unstructured rules under Discovery → Rules.

Australia Tax File Number (TFN)

Tag and purpose

Tag | Purpose
AU_TFN | Australian Tax File Numbers detected by AUSTRALIA_TFN_ML_MODEL with mod-11 checksum.
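As an illustration of a mod-11 checksum, the sketch below applies the commonly published weights for 9-digit TFNs: each digit is multiplied by its weight and the sum must be divisible by 11. It does not handle 8-digit TFNs, it is not the code of AUSTRALIA_TFN_ML_MODEL, and `is_valid_tfn9` is a hypothetical name.

```python
# Commonly published weights for 9-digit TFNs.
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def is_valid_tfn9(value):
    """Check a 9-digit TFN written with or without spaces."""
    compact = value.replace(" ", "")
    if len(compact) != 9 or not (compact.isascii() and compact.isdigit()):
        return False
    total = sum(w * int(ch) for w, ch in zip(TFN_WEIGHTS, compact))
    return total % 11 == 0


print(is_valid_tfn9("123 456 782"))
print(is_valid_tfn9("123 456 789"))
```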

Dictionary key and file

Dictionary key | Type | File | Linked tag
AU_TFN_KEYWORD | Fuzzy | au_tfn_keyword.txt | AU_TFN

Includes column-name terms such as TFN, tax_file_number, tax_file_no, au_tfn.

Rules and model dependency

  • Structured strict rule: c_AUSTRALIA_TFN_ML_MODEL + m_AU_TFN_KEYWORD → Australia TFN Strict (auto-tag).
  • Structured review rule: content-only detection — c_AUSTRALIA_TFN_ML_MODEL without the keyword → Australia TFN (REVIEW).
  • Unstructured rule: rule_au_tfn — content and keyword proximity in free text.
  • PIN disambiguation: Security PIN Strict excludes columns where c_AUSTRALIA_TFN_ML_MODEL also matches, so 8-digit values that pass the TFN mod-11 check fall through to AU_TFN (review) instead of auto-tagging as SECURITY_PIN.

Quick checklist

  • Enable the AU_TFN tag.
  • Enable the AUSTRALIA_TFN_ML_MODEL model.
  • Enable the AU_TFN_KEYWORD dictionary and link the tag.
  • Enable the structured rules (Strict + review variants) under Discovery → Rules.
  • Enable the unstructured rule_au_tfn rule.

Vehicle Identification Number (VIN)

Tag and purpose

Tag | Purpose
VIN | ISO 3779 17-character Vehicle Identification Numbers detected by the VIN model with mod-11 check digit.
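The widely used VIN check-digit scheme (mandatory for North American vehicles) maps letters to numbers, applies position weights, and stores the sum modulo 11 in position 9, with X standing for 10. The sketch below implements that scheme as a reference; whether the VIN model applies exactly this calculation is an assumption, and the function names are hypothetical.

```python
VIN_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)

# Letter values; I, O and Q are not allowed in a VIN.
LETTER_VALUES = dict(zip("ABCDEFGH", range(1, 9)))
LETTER_VALUES.update(zip("JKLMN", range(1, 6)))
LETTER_VALUES.update({"P": 7, "R": 9})
LETTER_VALUES.update(zip("STUVWXYZ", range(2, 10)))


def vin_check_digit(vin):
    """Return the expected position-9 character, or None if the VIN is malformed."""
    vin = vin.upper()
    if len(vin) != 17:
        return None
    total = 0
    for ch, weight in zip(vin, VIN_WEIGHTS):
        if ch in "0123456789":
            value = int(ch)
        elif ch in LETTER_VALUES:
            value = LETTER_VALUES[ch]
        else:
            return None  # I, O, Q or a non-alphanumeric character
        total += value * weight
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(vin):
    expected = vin_check_digit(vin)
    return expected is not None and vin.upper()[8] == expected


print(is_valid_vin("1M8GDM9AXKP042788"))
print(is_valid_vin("1M8GDM9A1KP042788"))
```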

Dictionary key and file

Dictionary key | Type | File | Linked tag
VIN_KEYWORD | Fuzzy | vin_keyword.txt | VIN

Includes column-name terms such as VIN, vehicle_id, chassis_no, vehicle_identification_number.

Rules and model dependency

  • Structured strict rule: c_VIN_ML_MODEL + m_VIN_KEYWORD → Vehicle Identification Number Strict (auto-tag).
  • Structured review rule: c_VIN_ML_MODEL content-only → Vehicle Identification Number (REVIEW).
  • Unstructured rule: rule_vin_keyword_unstruct for free-text scanning.

Quick checklist

  • Enable the VIN tag.
  • Enable the VIN_ML_MODEL model.
  • Enable the VIN_KEYWORD dictionary and link the tag.
  • Enable the structured rules (Strict + review variants) under Discovery → Rules.
  • Enable the unstructured rule_vin_keyword_unstruct rule.

ANZ Vehicle Number Plate

Tag and purpose

Tag | Purpose
ANZ_VEHICLE_NUMBER_PLATE | Australian and New Zealand vehicle license plate / registration numbers detected by the ANZ plate model.

Dictionary key and file

Dictionary key | Type | File | Linked tag
ANZ_VEHICLE_PLATE_KEYWORD | Fuzzy | anz_vehicle_plate_keyword.txt | ANZ_VEHICLE_NUMBER_PLATE

Includes column-name terms such as plate, registration, rego, vehicle_plate, number_plate.

Rules and model dependency

  • Structured strict rule: c_ANZ_VEHICLE_NUMBER_PLATE_ML_MODEL + m_ANZ_VEHICLE_PLATE_KEYWORD → ANZ Vehicle Number Plate Strict (auto-tag).
  • Structured review rule: content-only, c_ANZ_VEHICLE_NUMBER_PLATE_ML_MODEL → ANZ Vehicle Number Plate (REVIEW).
  • Unstructured rule: rule_anz_vehicle_plate_keyword_unstruct.
  • PIN disambiguation: Security PIN Strict excludes columns where c_ANZ_VEHICLE_NUMBER_PLATE_ML_MODEL also matches, so values that satisfy both the plate format and the 3–8 digit PIN pattern are routed to the plate review path rather than auto-tagged as SECURITY_PIN.

Quick checklist

  • Enable the ANZ_VEHICLE_NUMBER_PLATE tag.
  • Enable the ANZ_VEHICLE_NUMBER_PLATE_ML_MODEL model.
  • Enable the ANZ_VEHICLE_PLATE_KEYWORD dictionary and link the tag.
  • Enable the structured rules (Strict + review variants) under Discovery → Rules.
  • Enable the unstructured rule_anz_vehicle_plate_keyword_unstruct rule.

Security PIN

SECURITY_PIN is a pattern-only detector — it ships as a regex stored in p_pattern_config, not as a heuristic ML model. It surfaces ATM / debit PINs, CVV / CVC / CSC card-verification values, and short account-access passcodes.

Info

Unlike the ANZ identifiers above, Security PIN has no Java detector class with configurable parameters. The only knob is the regex itself, configured via the portal under Discovery → Patterns rather than Discovery → Models.

Tag and purpose

Tag | Purpose
SECURITY_PIN | ATM / debit PINs, CVV / CVC / CSC card-verification values, and short account-access passcodes (3–8 digits).

Pattern definition

SECURITY_PIN_PATTERN matches a purely numeric value of 3 to 8 digits that is delimited on each side by whitespace, by punctuation other than . or -, or by the start or end of the text:

(?:[\p{Punct}&&[^\.-]]|\s|^)(\d{3,8})(?:[\p{Punct}&&[^\.-]]|\s|$)

Example matches: 1234, 5678, 123456, 12345678. The feature key emitted is c_SECURITY_PIN_PATTERN.

To narrow or widen the digit range, edit the regex in the portal — for traditional 4–6 digit ATM PINs, change \d{3,8} to \d{4,6}.
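To try the pattern or a narrowed digit range before changing it in the portal, it can be reproduced outside the product. The pattern above uses character-class intersection (`&&`), which Python's `re` module does not support, so the sketch below builds an equivalent delimiter class by hand. The helper names `build_pin_pattern` and `find_pin` are hypothetical, and the translation is an approximation for experimentation rather than the product's own matcher.

```python
import re
import string

# ASCII punctuation except "." and "-", standing in for [\p{Punct}&&[^\.-]].
DELIMS = "".join(ch for ch in string.punctuation if ch not in ".-")
EDGE = "[" + re.escape(DELIMS) + "]"


def build_pin_pattern(min_digits=3, max_digits=8):
    """Compile the PIN pattern for a given digit range."""
    digits = r"(\d{" + str(min_digits) + "," + str(max_digits) + "})"
    source = r"(?:" + EDGE + r"|\s|^)" + digits + r"(?:" + EDGE + r"|\s|$)"
    return re.compile(source, re.ASCII)


def find_pin(text, pattern=None):
    """Return the first PIN-shaped value in text, or None."""
    pattern = pattern or build_pin_pattern()
    match = pattern.search(text)
    return match.group(1) if match else None


for sample in ["1234", "pin: 5678,", "3.14159", "2023-01-15", "123456789"]:
    print(repr(sample), "->", find_pin(sample))

# Narrowed variant for 4 to 6 digit PINs.
print(find_pin("123", build_pin_pattern(4, 6)))
```

Because "." and "-" are not accepted as delimiters, digits inside decimals and hyphenated dates are not picked up in this sketch.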

Dictionary key and file

Dictionary key | Type | File | Linked tag
PIN_KEYWORD | Fuzzy | pin_keyword.txt | SECURITY_PIN

Includes column-name terms such as pin, mpin, tpin, atm_pin, card_pin, passcode, cvv, cvc, csc, card_security_code, transaction_pin, auth_pin.

Note

Postal-code-style terms (e.g. pincode) are intentionally not included — they commonly label postal / ZIP fields in AU and IN usage, and including them would cause the Security PIN Strict rule to misfire on postal-code columns.

Rules and pattern dependency

Detection requires the pattern paired with PIN keyword context — there is no content-only Review fallback because short-numeric content alone produces unacceptable false-positive rates on years, ZIP codes, ages, and quantities.

  • Structured scans: Security PIN Strict (ACTUAL_SCORE) requires c_SECURITY_PIN_PATTERN (content) + m_PIN_KEYWORD (column-name match).
  • Unstructured scans: rule_security_pin_keyword_unstruct requires c_SECURITY_PIN_PATTERN and c_PIN_KEYWORD to appear within five words of each other in free text (orderStrict: true, wordDistance: 5).
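The proximity requirement for unstructured scans can be pictured as a distance check between token positions. The sketch below is a simplified illustration with several assumptions: `orderStrict` is read as "the keyword must come before the value", distance is counted in whitespace-separated tokens, keywords are matched exactly rather than fuzzily, and only a subset of the PIN_KEYWORD terms is used. The function name `pin_near_keyword` is hypothetical.

```python
import re

# A subset of the PIN_KEYWORD terms listed above.
PIN_KEYWORDS = {"pin", "mpin", "tpin", "passcode", "cvv", "cvc", "csc"}
PIN_VALUE = re.compile(r"\d{3,8}")


def pin_near_keyword(text, word_distance=5, order_strict=True):
    """True when a PIN-shaped token lies within word_distance tokens of a keyword."""
    tokens = [t.strip(".,:;()") for t in text.casefold().split()]
    keyword_positions = [i for i, t in enumerate(tokens) if t in PIN_KEYWORDS]
    value_positions = [i for i, t in enumerate(tokens) if PIN_VALUE.fullmatch(t)]
    for k in keyword_positions:
        for v in value_positions:
            gap = v - k
            if order_strict and gap < 0:
                continue  # assumed meaning of orderStrict: keyword first
            if 0 < abs(gap) <= word_distance:
                return True
    return False


print(pin_near_keyword("Your card PIN is 4821"))
print(pin_near_keyword("4821 is the PIN"))
print(pin_near_keyword("4821 is the PIN", order_strict=False))
```

A year mentioned far from the keyword, as in "The PIN was reset after the customer called the branch in 2019", falls outside the five-word window and is ignored in this model.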

Mutual-exclusion guards

Security PIN Strict excludes columns where any of the following also match, so higher-specificity classifiers win on shared digit shapes:

Excluded feature | Reason
m_CC_KEYWORD | Credit-card-named columns belong to the CC tag.
m_ANZ_PHONE_NUMBER_KEYWORD | Phone-named columns belong to AU / NZ phone tags.
m_US_ZIP_KEYWORD | ZIP-named columns belong to the US_ZIP tag.
c_AUSTRALIA_TFN_ML_MODEL | 8-digit values that pass the TFN mod-11 check belong to AU_TFN.
c_ANZ_VEHICLE_NUMBER_PLATE_ML_MODEL | Values matching the plate format belong to ANZ_VEHICLE_NUMBER_PLATE.

Quick checklist

  • Enable the SECURITY_PIN tag (Tags).
  • Enable the SECURITY_PIN_PATTERN pattern under Discovery → Patterns.
  • Enable the PIN_KEYWORD dictionary and link the tag (Dictionaries).
  • Enable the Security PIN Strict rule and the unstructured rule rule_security_pin_keyword_unstruct under Discovery → Rules.
  • Confirm the mutual-exclusion guard list (hasNotKeys) is intact on the strict rule.

Best Practices

  • Use Fuzzy Match for unstructured or user-generated content where spelling or word variations may occur.
  • Use Exact Match for clean, controlled and consistent data like predefined lists or official terms.
  • Use Pattern Match for structured formats and identifiers such as IDs, email addresses, or phone numbers.

Conclusion

Dictionaries in Privacera Discovery provide flexibility and control in identifying sensitive information. By leveraging the appropriate match type—fuzzy, exact, or regex—you can improve detection accuracy and enhance your data governance policies.