Using Patterns
In Privacera Discovery, a Pattern is a regex-based detector configured under Discovery → Patterns. Patterns match data values in file content and table columns. Each pattern produces a content feature key with the c_ prefix (for example c_AGE).
Rules combine pattern features with keyword dictionaries, models, or other features to assign tags. For unstructured scans, confirm Rules Mapping links the pattern feature key to the output tag.
Age Pattern
The AGE pattern detects person age values in healthcare, HR, and customer records where age is stored as a numeric value with or without a unit suffix. Tag: AGE.
Tip
The tag, pattern, and dictionary are disabled by default; rules are enabled by default. To detect age values in your data, enable the following in the portal:
- Tag: Enable the AGE tag (Tags).
- Pattern: Enable the AGE pattern (Patterns). The bundled pattern name is AGE; the content feature key is c_AGE.
- Dictionary: Enable the AGE_KEYWORD dictionary if you use the strict rule (Dictionaries). See Age Keyword.
- Rules: Ensure the Age Strict structured rule and the unstructured rule rule_age are present and enabled (Rules).
The pattern identifies age values based on the following criteria:
- Format: Numeric values from 0 through 129 (inclusive), with an optional decimal portion, in one of these shapes:

  | Format | Example |
  |---|---|
  | With unit suffix | 45 years old, 32 yo, 67 years, 21 yrs, 8 yr old, 55 yo, 10 y.o. |
  | Bare numeric | 45, 12.5, 73 |

  Word boundaries prevent matching digit substrings inside longer numbers.
- Structured rules: Only a strict rule ships by default. Age Strict requires both c_AGE (pattern match) and m_AGE_KEYWORD (column-name keyword). There is no review fallback rule for structured data, because bare numeric values in unrelated columns (quantities, years, IDs) produce unacceptable false-positive rates without column-name context.
- Unstructured rules: The bundled rule rule_age requires c_AGE and c_AGE_KEYWORD within a 5-word proximity window. Use Discovery → Rules Mapping to confirm c_AGE maps to AGE.
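The 5-word proximity requirement of the unstructured rule can be pictured with a small Python sketch. This is only an illustration, not the product's implementation: the keyword set, the tokenizer, and the way the window is counted are all assumptions made for the example.

```python
import re

WINDOW = 5  # proximity window from the rule description, counted in tokens here
AGE_KEYWORDS = {"age", "aged"}  # hypothetical stand-in for AGE_KEYWORD entries
AGE_TOKEN = re.compile(r"(?:1[0-2]\d|\d{1,2})(?:\.\d+)?")  # simplified 0-129 check


def age_near_keyword(text: str) -> bool:
    """Return True when an age-like token sits within WINDOW tokens of a keyword."""
    # Split into word tokens, keeping decimals such as "12.5" together.
    tokens = re.findall(r"\w+(?:\.\w+)*", text.lower())
    keyword_positions = [i for i, t in enumerate(tokens) if t in AGE_KEYWORDS]
    value_positions = [i for i, t in enumerate(tokens) if AGE_TOKEN.fullmatch(t)]
    return any(
        abs(k - v) <= WINDOW
        for k in keyword_positions
        for v in value_positions
    )
```

Under these assumptions, `age_near_keyword("Patient age: 45 years old")` is true, while a sentence where the keyword and the number are far apart, or where no keyword occurs at all, is not flagged.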
| Pattern name | Apply for | Content feature key | Test value |
|---|---|---|---|
| AGE | file_content, table_column | c_AGE | 45 years old |
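To make the format criteria concrete, here is a rough Python approximation of a regex with the same shape: values 0 through 129, an optional decimal portion, an optional unit suffix, and boundary checks. The bundled AGE pattern is not reproduced in this document, so the expression and the helper names below are assumptions for illustration only.

```python
import re

# Hypothetical approximation of the AGE pattern; the bundled regex may differ.
AGE_RE = re.compile(
    r"""
    (?<![\w.])                      # not inside a longer number or word
    (?:1[0-2]\d|\d{1,2})            # integer part: 0 through 129
    (?:\.\d+)?                      # optional decimal portion
    (?:\s*(?:(?:years?|yrs?)(?:\s+old)?|yo|y\.o\.))?   # optional unit suffix
    (?!\w)                          # not followed by more word characters
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_age_value(value: str) -> bool:
    """Check a single column value against the sketch pattern."""
    return AGE_RE.fullmatch(value.strip()) is not None


def find_ages(text: str) -> list:
    """Return every age-like span found in free text."""
    return [m.group(0) for m in AGE_RE.finditer(text)]
```

With this sketch, the test value from the table (`45 years old`) matches, `130` and `1234` do not, and `find_ages("Patient is 45 years old, ID 12345.")` returns only the age span because the boundary checks reject digits embedded in the longer ID.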
Overlap with other detectors
Bare numeric age values can overlap with quantities, years, and other short numeric fields. Prefer the Age Strict structured rule (pattern + column keyword) rather than enabling pattern-only detection on wide tables. Review DOB / date-of-birth rules if your program classifies birth dates separately from age.
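The effect of requiring column-name context can be sketched in Python as follows. The keyword set, the column-name tokenizer, and the choice to treat the pattern feature as present when any sampled value matches are assumptions for this example; they are not taken from the product.

```python
import re

AGE_VALUE = re.compile(r"(?:1[0-2]\d|\d{1,2})(?:\.\d+)?")  # simplified bare-numeric 0-129
COLUMN_KEYWORDS = {"age"}  # hypothetical stand-in for AGE_KEYWORD entries


def has_c_age(values) -> bool:
    """Pattern feature: assumed present if any sampled value looks like an age."""
    return any(AGE_VALUE.fullmatch(str(v).strip()) for v in values)


def has_m_age_keyword(column_name: str) -> bool:
    """Column-name feature: a whole name token must be a keyword."""
    tokens = re.split(r"[^a-z0-9]+", column_name.lower())
    return any(t in COLUMN_KEYWORDS for t in tokens)


def age_strict(column_name: str, values) -> bool:
    """Strict rule: both the value pattern and the column keyword are required."""
    return has_c_age(values) and has_m_age_keyword(column_name)
```

A `quantity` column holding `45` and `12` satisfies the pattern check alone but fails the strict rule, which is the false-positive case the strict rule is meant to avoid; a `patient_age` column with the same values passes.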
Quick checklist
- Enable the AGE tag.
- Enable the AGE pattern under Discovery → Patterns.
- Enable the AGE_KEYWORD dictionary and link the AGE tag.
- Enable the Age Strict structured rule and the rule_age unstructured rule; confirm Rules Mapping for c_AGE.
Gender Pattern
The GENDER pattern detects gender or sex values commonly stored in demographic, healthcare, and HR datasets. Tag: GENDER.
Tip
The tag, pattern, and dictionary are disabled by default; rules are enabled by default. To detect gender values in your data, enable the following in the portal:
- Tag: Enable the GENDER tag (Tags).
- Pattern: Enable the GENDER pattern (Patterns). The bundled pattern name is GENDER; the content feature key is c_GENDER.
- Dictionary: Enable the GENDER_KEYWORD dictionary if you use the strict rule (Dictionaries). See Gender Keyword.
- Rules: Ensure the Gender Strict and Gender structured rules and the unstructured rule rule_gender are present and enabled (Rules).
The pattern identifies gender values based on the following criteria:
- Format: Single-letter codes and common full-word values, including:

  | Category | Examples |
  |---|---|
  | Single-letter codes | M, F, X, U, O (case-insensitive) |
  | Full words | male, female, non-binary, non binary, intersex, other, unspecified, unknown, transgender, genderqueer, agender, cisgender |
  | Declined to answer | prefer not to say, prefer not to disclose, not applicable |

  Word boundaries prevent matching gender tokens inside longer words.
- Structured rules:
  - Gender Strict: c_GENDER + m_GENDER_KEYWORD → high-confidence (ACTUAL_SCORE).
  - Gender: c_GENDER without m_GENDER_KEYWORD → review score.
- Unstructured rules: The bundled rule rule_gender requires c_GENDER and c_GENDER_KEYWORD within a 5-word proximity window. Use Discovery → Rules Mapping to confirm c_GENDER maps to GENDER.
| Pattern name | Apply for | Content feature key | Test value |
|---|---|---|---|
| GENDER | file_content, table_column | c_GENDER | female |
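The gender criteria and the two structured rules can be approximated in Python as shown below. The term list is copied from the examples above, but the regex itself, the column keywords, the `REVIEW_SCORE` label, and the "any sampled value matches" policy are assumptions for illustration; the bundled pattern and rule engine may behave differently.

```python
import re
from typing import Optional

# Terms taken from the examples in this section; the bundled list may be longer.
GENDER_TERMS = [
    "m", "f", "x", "u", "o",
    "male", "female", "non-binary", "non binary", "intersex", "other",
    "unspecified", "unknown", "transgender", "genderqueer", "agender",
    "cisgender", "prefer not to say", "prefer not to disclose", "not applicable",
]
# Longest terms first so "female" is tried before "f"; \b keeps tokens whole.
_alternatives = "|".join(
    re.escape(t) for t in sorted(GENDER_TERMS, key=len, reverse=True)
)
GENDER_RE = re.compile(r"\b(?:" + _alternatives + r")\b", re.IGNORECASE)
COLUMN_KEYWORDS = {"gender", "sex"}  # hypothetical stand-in for GENDER_KEYWORD


def is_gender_value(value: str) -> bool:
    """Check a single column value against the sketch pattern."""
    return GENDER_RE.fullmatch(value.strip()) is not None


def score_column(column_name: str, values) -> Optional[str]:
    """Mimic the two structured rules: strict first, then the review fallback."""
    c_gender = any(is_gender_value(str(v)) for v in values)
    name_tokens = re.split(r"[^a-z0-9]+", column_name.lower())
    m_keyword = any(t in COLUMN_KEYWORDS for t in name_tokens)
    if c_gender and m_keyword:
        return "ACTUAL_SCORE"   # Gender Strict
    if c_gender:
        return "REVIEW_SCORE"   # Gender (hypothetical label for the review score)
    return None
```

In this sketch a `gender` column containing `M` and `F` lands in the high-confidence tier, while the same values in a column named `code` only reach the review tier. Single-letter codes are short enough to appear in many unrelated columns, which is one reason the keyword context matters.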
Quick checklist
- Enable the GENDER tag.
- Enable the GENDER pattern under Discovery → Patterns.
- Enable the GENDER_KEYWORD dictionary and link the GENDER tag.
- Enable the Gender Strict and Gender structured rules and the rule_gender unstructured rule; confirm Rules Mapping for c_GENDER.
Conclusion
Patterns provide regex-based value detection for formats that do not need a full heuristic model. Combine patterns with keyword dictionaries in rules to reduce false positives — especially for fields like age where bare numbers are ambiguous without column-name context.