
Using Patterns

In Privacera Discovery, a Pattern is a regex-based detector configured under Discovery → Patterns. Patterns match data values in file content and table columns. Each pattern produces a content feature key with the c_ prefix (for example c_AGE).

Rules combine pattern features with keyword dictionaries, models, or other features to assign tags. For unstructured scans, confirm Rules Mapping links the pattern feature key to the output tag.


Age Pattern

The AGE pattern detects person age values in healthcare, HR, and customer records where age is stored as a numeric value with or without a unit suffix. Tag: AGE.

Tip

The tag, pattern, and dictionary are disabled by default; rules are enabled by default. To detect age values in your data, enable the following in the portal:

  • Tag: Enable the AGE tag (Tags).
  • Pattern: Enable the AGE pattern (Patterns). The bundled pattern name is AGE; the content feature key is c_AGE.
  • Dictionary: Enable the AGE_KEYWORD dictionary if you use the strict rule (Dictionaries). See Age Keyword.
  • Rules: Ensure the Age Strict structured rule and the unstructured rule rule_age are present and enabled (Rules).

The pattern identifies age values based on the following criteria:

  • Format: Numeric values from 0 through 129 (inclusive), with an optional decimal portion, in one of these shapes:

    Format           | Example
    With unit suffix | 45 years old, 32 yo, 67 years, 21 yrs, 8 yr old, 55 yo, 10 y.o.
    Bare numeric     | 45, 12.5, 73

    Word boundaries prevent matching digit substrings inside longer numbers.

  • Structured rules: Only a strict rule ships by default — Age Strict requires both c_AGE (pattern match) and m_AGE_KEYWORD (column-name keyword). There is no review fallback rule for structured data, because bare numeric values in unrelated columns (quantities, years, IDs) produce unacceptable false-positive rates without column-name context.

  • Unstructured rules: The bundled rule rule_age requires c_AGE and c_AGE_KEYWORD within a 5-word proximity window. Use Discovery → Rules Mapping to confirm c_AGE maps to AGE.
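The 5-word proximity requirement can be illustrated with a small sketch. It is not Privacera's implementation: the whitespace tokenization, the way distance is counted, the `within_window` helper, the simplified bare-number regex, and the keyword set are all assumptions made for illustration (the contents of the AGE_KEYWORD dictionary are not listed here).

```python
import re

# Hypothetical stand-ins: a simplified bare-age regex and a made-up keyword set.
BARE_AGE = re.compile(r"(?:1[0-2]\d|[1-9]?\d)(?:\.\d+)?")
AGE_KEYWORDS = {"age", "aged"}
PUNCT = ".,;:()"


def within_window(text, value_re, keywords, window=5):
    """Return True if a value token and a keyword token are at most `window` words apart."""
    # Assumption: words are whitespace-separated, with edge punctuation stripped.
    words = [w.strip(PUNCT) for w in text.lower().split()]
    value_idx = [i for i, w in enumerate(words) if value_re.fullmatch(w)]
    keyword_idx = [i for i, w in enumerate(words) if w in keywords]
    return any(abs(v - k) <= window for v in value_idx for k in keyword_idx)


print(within_window("Patient age: 45", BARE_AGE, AGE_KEYWORDS))
print(within_window("Shipped 45 units", BARE_AGE, AGE_KEYWORDS))
```

In this sketch, a number with no nearby keyword never qualifies, which mirrors why the bundled rule pairs c_AGE with c_AGE_KEYWORD instead of relying on the pattern alone.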

Pattern name | Apply for                  | Content feature key | Test value
AGE          | file_content, table_column | c_AGE               | 45 years old
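To experiment with the format criteria outside the portal, the following regex approximates them. The bundled AGE regex is not shown in this document, so `AGE_SKETCH` and `find_ages` are hypothetical names and the expression is only a sketch of the described behavior: values 0 through 129, an optional decimal portion, an optional unit suffix, and no matches inside longer numbers.

```python
import re

# Hypothetical approximation of the AGE pattern, not the bundled regex.
AGE_SKETCH = re.compile(
    r"""
    (?<![\w.])                              # do not start inside a longer number or word
    (?:1[0-2]\d|[1-9]?\d)                   # integer part, 0 through 129
    (?:\.\d+)?                              # optional decimal portion
    (?:
        \s*(?:(?:years?|yrs?|yo)\b|y\.o\.)  # unit suffix
        (?:\s+old\b)?                       # optional trailing "old"
      |
        (?!\w)                              # bare numeric: must not run into more digits or letters
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_ages(text):
    """Return every substring of `text` that the sketch treats as an age value."""
    return [m.group(0) for m in AGE_SKETCH.finditer(text)]


print(find_ages("45 years old"))
print(find_ages("Order 20230115"))
```

The test value from the table, 45 years old, matches as a whole, while an eight-digit identifier produces no match because the lookarounds reject digit runs longer than the allowed range.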

Overlap with other detectors

Bare numeric age values can overlap with quantities, years, and other short numeric fields. Prefer the Age Strict structured rule (pattern + column keyword) rather than enabling pattern-only detection on wide tables. Review DOB / date-of-birth rules if your program classifies birth dates separately from age.

Quick checklist

  • Enable the AGE tag.
  • Enable the AGE pattern under Discovery → Patterns.
  • Enable the AGE_KEYWORD dictionary and link it to the AGE tag.
  • Enable Age Strict structured rule and rule_age unstructured rule; confirm Rules Mapping for c_AGE.

Gender Pattern

The GENDER pattern detects gender or sex values commonly stored in demographic, healthcare, and HR datasets. Tag: GENDER.

Tip

The tag, pattern, and dictionary are disabled by default; rules are enabled by default. To detect gender values in your data, enable the following in the portal:

  • Tag: Enable the GENDER tag (Tags).
  • Pattern: Enable the GENDER pattern (Patterns). The bundled pattern name is GENDER; the content feature key is c_GENDER.
  • Dictionary: Enable the GENDER_KEYWORD dictionary if you use the strict rule (Dictionaries). See Gender Keyword.
  • Rules: Ensure the Gender Strict and Gender structured rules and the unstructured rule rule_gender are present and enabled (Rules).

The pattern identifies gender values based on the following criteria:

  • Format: Single-letter codes and common full-word values, including:

    Category            | Examples
    Single-letter codes | M, F, X, U, O (case-insensitive)
    Full words          | male, female, non-binary, non binary, intersex, other, unspecified, unknown, transgender, genderqueer, agender, cisgender
    Declined to answer  | prefer not to say, prefer not to disclose, not applicable

    Word boundaries prevent matching gender tokens inside longer words.

  • Structured rules:

    • Gender Strict: c_GENDER + m_GENDER_KEYWORD → high-confidence (ACTUAL_SCORE).
    • Gender: c_GENDER without m_GENDER_KEYWORD → review score.
  • Unstructured rules: The bundled rule rule_gender requires c_GENDER and c_GENDER_KEYWORD within a 5-word proximity window. Use Discovery → Rules Mapping to confirm c_GENDER maps to GENDER.
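The difference between the two structured rules can be written as a small decision function. This is a sketch of the logic described above, not product code: `structured_outcome` and its string labels are hypothetical, and the actual score values are not specified in this document. The `review_fallback` flag models the contrast with AGE, which ships with a strict rule only.

```python
from typing import Optional


def structured_outcome(pattern_hit: bool, keyword_hit: bool, review_fallback: bool) -> Optional[str]:
    """Return 'strict', 'review', or None for one column (labels are illustrative)."""
    if pattern_hit and keyword_hit:
        # Pattern match plus column-name keyword, as in Gender Strict.
        return "strict"
    if pattern_hit and review_fallback:
        # Pattern match alone, as in the Gender rule: flagged for review.
        return "review"
    # No pattern match, or no fallback rule for this tag.
    return None


# GENDER has a review fallback; AGE, as described earlier, does not.
print(structured_outcome(True, False, review_fallback=True))
print(structured_outcome(True, False, review_fallback=False))
```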

Pattern name | Apply for                  | Content feature key | Test value
GENDER       | file_content, table_column | c_GENDER            | female
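The value list above can be approximated with an alternation wrapped in word boundaries. The bundled GENDER regex is not shown in this document, so `GENDER_SKETCH`, `GENDER_VALUES`, and `find_gender_values` are hypothetical names and the expression is only a sketch built from the listed examples.

```python
import re

# Values taken from the examples listed for the GENDER pattern.
GENDER_VALUES = [
    "prefer not to disclose", "prefer not to say", "not applicable",
    "non-binary", "non binary", "transgender", "genderqueer", "cisgender",
    "unspecified", "intersex", "agender", "unknown", "female", "male", "other",
    "m", "f", "x", "u", "o",
]

# Hypothetical approximation: case-insensitive alternation between word boundaries.
GENDER_SKETCH = re.compile(
    r"\b(?:" + "|".join(re.escape(v) for v in GENDER_VALUES) + r")\b",
    re.IGNORECASE,
)


def find_gender_values(text):
    """Return every substring of `text` that the sketch treats as a gender value."""
    return [m.group(0) for m in GENDER_SKETCH.finditer(text)]


print(find_gender_values("female"))
print(find_gender_values("Sex: M"))
print(find_gender_values("females"))
```

Because single letters such as X or O also appear in unrelated text, a pattern-only match in this sketch is weak evidence on its own, which is consistent with the review score assigned when no keyword is present.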

Quick checklist

  • Enable the GENDER tag.
  • Enable the GENDER pattern under Discovery → Patterns.
  • Enable the GENDER_KEYWORD dictionary and link it to the GENDER tag.
  • Enable Gender Strict and Gender structured rules and rule_gender unstructured rule; confirm Rules Mapping for c_GENDER.

Conclusion

Patterns provide regex-based value detection for formats that do not need a full heuristic model. Combine patterns with keyword dictionaries in rules to reduce false positives — especially for fields like age where bare numbers are ambiguous without column-name context.