Expunge
Prerequisites
- Ensure all steps outlined in the prerequisites are completed.
- Before configuring the Expunge policy under a Data Zone, ensure you have followed the steps outlined here to create and manage a Data Zone.
Table of Contents¶
Section | Description |
---|---|
Introduction | Overview of Expunge and its significance |
Supported Connectors | List of supported connectors for Expunge |
Supported File Formats | File formats supported by Expunge |
Steps for Configuration | Step-by-step guide to configuring Expunge |
Expunge Policy Fields | Key fields used in the Expunge policy |
Validation of Expunge Policy | How Expunge policies are validated |
Introduction¶
The Expunge Policy removes entire records (rows) that contain sensitive information from the source data. This is done by matching the content of the source data against a lookup file. Any records identified for expunging are then moved to a quarantine location as specified in the policy.
Process Flow¶
The following steps must be performed to ensure the Expunge Policy operates as expected:
- Source Data Scanning:
  • The source data must be scanned first.
  • During scanning, Discovery classifies the data and assigns Tags to relevant fields.
- Record Matching with Lookup File:
  • Each record in the source data is evaluated based on its tagged fields.
  • Tagged fields in the source data are compared with the corresponding fields in the lookup file.
  • If a tagged field’s value in the current record matches a value in the lookup file, that record is considered for further action.
- Matching Conditions:
  • If multiple tagged fields exist in a record, all of them must match corresponding values in the lookup file for the record to be considered for expunging.
  • This follows an AND condition, meaning every specified tag must match for a record to qualify for expunging.
- Lookup File Format & Field Matching:
  • The lookup file must be in CSV format.
  • The column headers in the lookup file are matched with the Tag names to identify the relevant fields for comparison.
  • Extra fields in the lookup file are ignored unless their column headers match a Tag name.
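The matching rule described above can be illustrated with a small Python sketch. This is not the product's implementation: the function name `split_records` and all sample values are hypothetical, the source columns are assumed to be already named after their Tags, and the sketch assumes that all tagged values must match within a single lookup row (the text above does not spell this out).

```python
import csv
import io


def split_records(source_rows, lookup_rows, tags):
    """Split rows into (retained, quarantined) using an AND match on tags.

    Assumption: each source row is a dict keyed by tag name, i.e. the
    scan has already mapped source columns to Tags.
    """
    # Only lookup columns whose header equals a tag name take part;
    # any other lookup column is ignored.
    lookup_keys = {
        tuple(row[tag] for tag in tags)
        for row in lookup_rows
        if all(tag in row for tag in tags)
    }
    retained, quarantined = [], []
    for row in source_rows:
        key = tuple(row.get(tag) for tag in tags)
        # Every tag value must match (AND condition) to quarantine the row.
        (quarantined if key in lookup_keys else retained).append(row)
    return retained, quarantined


# Hypothetical sample data, for illustration only.
SOURCE_CSV = """ID,EMAIL,SSN
1,a@example.com,111-11-1111
2,b@example.com,222-22-2222
3,a@example.com,333-33-3333
"""
LOOKUP_CSV = """EMAIL,SSN,NOTE
a@example.com,111-11-1111,extra column is ignored
"""

source = list(csv.DictReader(io.StringIO(SOURCE_CSV)))
lookup = list(csv.DictReader(io.StringIO(LOOKUP_CSV)))
retained, quarantined = split_records(source, lookup, ["EMAIL", "SSN"])
print([r["ID"] for r in quarantined])  # ['1']
print([r["ID"] for r in retained])     # ['2', '3']
```

Record 3 shares an EMAIL value with the lookup file but not the SSN value, so under the AND condition it is retained.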
Key Considerations¶
- Scanning is mandatory before applying the Expunge Policy.
- Only tagged fields are used for matching; other fields in the lookup file are ignored.
- If more than one tag is specified in the “Search for Tags” field, all matching tagged fields in the record must match (AND condition) for the record to be considered for policy enforcement.
Note
The resource should be scanned before applying the Expunge policy. The Expunge policy is not applied during real-time or offline scans. It is applied only when you select the Re-evaluate option on the Data Zone resource.
Example
You can apply this policy to JDBC and FileSystem connectors. For example, when using AWS S3 as a FileSystem, if you have a resource like source_data_file.csv stored in an AWS S3 location and need to expunge sensitive information using the Expunge policy, follow these steps:
source_data_file.csv
Download source_data_file.csv or create it with the sample data below.
lookup.csv
Download lookup.csv or create it with the sample data below.
- Scan the source_data_file.csv file in the AWS S3 location using the outlined scanning steps, through either offline or real-time scanning.
- Create a lookup file, lookup.csv, in .csv format under the AWS S3 location, ensuring that EMAIL and SSN are column headers.
- Configure a Data Zone and an Expunge policy by following the provided steps.
- Add the source_data_file.csv resource to the Data Zone, specifying the AWS S3 location, and initiate re-evaluation.
- After approximately 60 seconds, verify that the records containing sensitive data in source_data_file.csv have been moved to the quarantine location and that the remaining records, which contain no sensitive data, are still in the original landing location.
Note
- The column headers in the lookup file must match those in the scanned resource and align with the tags defined in the Expunge policy.
- The resource must be scanned before applying the Expunge policy.
- The Expunge policy is not applied during real-time or offline scans; it is enforced only when the Re-evaluate option is selected for the Data Zone resource.
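Before uploading a lookup file, it can help to confirm locally that its column headers cover the tags named in the policy. The following Python sketch shows one way to do that. The helper `missing_lookup_headers` and the sample values are hypothetical, and the comparison is assumed to be an exact, case-sensitive match, which the documentation does not state.

```python
import csv
import io


def missing_lookup_headers(lookup_csv_text, tags):
    """Return the tags that have no matching column header in the lookup CSV."""
    reader = csv.reader(io.StringIO(lookup_csv_text))
    headers = next(reader, [])  # first row holds the column headers
    header_set = {h.strip() for h in headers}
    # Assumption: header names must equal tag names exactly.
    return [tag for tag in tags if tag not in header_set]


# Hypothetical lookup content, for illustration only.
lookup_text = "EMAIL,SSN\njane@example.com,123-45-6789\n"
print(missing_lookup_headers(lookup_text, ["EMAIL", "SSN"]))    # []
print(missing_lookup_headers(lookup_text, ["EMAIL", "PHONE"]))  # ['PHONE']
```

An empty result means every tag has a corresponding column in the lookup file.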
Supported Connectors¶
For a list of supported connectors, refer to Supported Connectors for Discovery Compliance Policies.
Supported File Formats¶
For a list of file formats supported by the Expunge policy, see Supported File Formats.
Steps for Configuration¶
To configure the Expunge policy, follow the steps outlined in Steps for Configuration.
Expunge Policy Fields¶
Field Name | Description |
---|---|
Name | Specifies the name of the Expunge policy. |
Type | Defines the policy type. For Expunge, select Expunge. |
Alert Level | Select the severity level (High, Medium, or Low) of the alert created after the policy is applied to the resource. |
Description (Optional) | Provides details about the Expunge policy and its purpose. |
Status | Enables or disables the Expunge policy (enabled by default). |
Application | Select the connector where the Expunge policy will be enforced. For the above example, source_data_file.csv is an AWS S3 file, so select AWS S3. |
Lookup Application | Select the filesystem connector where the lookup file is stored. For the above example, lookup.csv is an AWS S3 file, so select AWS S3. |
Lookup File Location | Provide the path to the lookup.csv file. The lookup file must be in .csv format, with column headers matching the tag names. |
Archive Location (Optional) | Specify the archive location to store the original copy of the resource before data is expunged. For file systems, an archive folder is created automatically. For JDBC connectors, an existing database or schema must be provided. |
Quarantine Location (Optional) | Specify the quarantine location to store expunged sensitive records. For file systems, a quarantine folder is created automatically. For JDBC connectors, an existing database or schema must be provided. |
Search for Tags | Specifies one or more tags used to identify sensitive data. In the above example, the tags EMAIL and SSN should be specified here. If multiple tags are specified, all of them must match (AND condition) the corresponding source data record. |
Auto Run (Optional) | This feature is now deprecated and will be removed from the UI in future updates. Users are advised to update their policies as applicable. |
Note
- For connectors like Snowflake and Databricks Unity Catalog, use the [Db].[Schema].[Table] and [Catalog].[Schema].[Table] structures, respectively. When specifying the archive location for these connectors, ensure the format is [Db].[Schema] or [Catalog].[Schema], as applicable.
- Verify that the [Db].[Schema] or [Catalog].[Schema] exists prior to executing the Expunge policy for JDBC connectors.
- The lookup file must be stored on a file storage system. The policy can be applied to other types of connectors, but the lookup file itself must always reside on a file storage system.
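As an illustration of the location formats in the note above, the following Python sketch derives a two-part archive location from a three-part table identifier. The function name `archive_location_for` and the identifier `SALES_DB.PUBLIC.CUSTOMERS` are hypothetical, and the sketch assumes that identifier parts contain no quoted dots.

```python
def archive_location_for(table_identifier):
    """Derive a [Db].[Schema] (or [Catalog].[Schema]) location from a table identifier."""
    parts = table_identifier.split(".")
    # Assumption: exactly three non-empty, dot-separated parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected three dot-separated parts, got: {table_identifier!r}"
        )
    return ".".join(parts[:2])


print(archive_location_for("SALES_DB.PUBLIC.CUSTOMERS"))  # SALES_DB.PUBLIC
```

The derived database or schema still has to exist on the target system; this sketch only checks the shape of the string.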
Validation of Expunge Policy¶
After running Re-evaluate on the Data Zone, verify the following to validate the Expunge policy:
- The policy is applied, and records matching the lookup file are moved to a file in the specified quarantine location.
- A copy of the original source file, prior to policy enforcement, is saved in the archive location.
- The source file is updated in place, retaining only non-sensitive records in the original location.
- The classification appears in the Data Inventory > Classifications section.
- Policy alerts appear on the Compliance > Alerts Dashboard and on the Data Inventory > Classifications > Resource Detail > ALERTS DETAILS tab.
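The first three checks imply a simple invariant: the rows left in the source plus the rows in quarantine should together equal the archived original. The Python sketch below expresses that check, assuming the three files have already been downloaded and parsed into lists of rows. The function name `check_expunge_result` and the sample rows are hypothetical, and this check is not a feature of the product.

```python
from collections import Counter


def check_expunge_result(archived_rows, retained_rows, quarantined_rows):
    """Return True if retained + quarantined rows equal the archived original."""
    # Counters compare rows as multisets, so row order does not matter
    # and duplicate rows are still counted.
    original = Counter(map(tuple, archived_rows))
    after = Counter(map(tuple, retained_rows)) + Counter(map(tuple, quarantined_rows))
    return original == after


# Hypothetical rows, for illustration only.
archived = [["1", "a@example.com"], ["2", "b@example.com"]]
retained = [["2", "b@example.com"]]
quarantined = [["1", "a@example.com"]]
print(check_expunge_result(archived, retained, quarantined))  # True
```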
Important
• When the Expunge policy is executed multiple times on the same dataset with identical configurations, the process relies on the lookup file to identify and remove matching records.
• During each execution, the policy overwrites previously quarantined and archived files in the respective locations.
• This ensures that only the most recent instance of expunged data is retained in quarantine and archive storage, replacing any earlier versions.