
Expunge Policy

The Expunge policy uses lookup data to remove sensitive information, such as usernames and email IDs, from resources. The removed information is moved to a quarantine location.

The fields in the lookup file are compared with the records in the resource files. If a match is found, that is, if the value in the lookup file matches the value in the resource file for the tag specified in Search for tags, the field value in the resource file is deleted. Ensure that the column header in the lookup file matches the name of the tag to be searched.
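The comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's implementation: it assumes each resource column is already labeled with the tag assigned during the scan (here the column name is the tag), and that the lookup values are held in a dictionary keyed by tag name. The function name `expunge_rows` is hypothetical. The sketch removes whole rows, following the row-level behavior shown in the example later on this page, and it first checks that every searched tag has a matching header in the lookup data.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]


def expunge_rows(
    resource_rows: List[Row],
    lookup: Dict[str, Set[str]],
    search_tags: List[str],
) -> Tuple[List[Row], List[Row]]:
    """Split resource rows into (kept, quarantined).

    Hypothetical sketch: assumes resource columns are named after the tags
    assigned during the scan, and `lookup` maps each lookup-file header
    (tag name) to the set of values listed under it.
    """
    # Every searched tag needs a matching header in the lookup file.
    missing = [tag for tag in search_tags if tag not in lookup]
    if missing:
        raise ValueError(f"lookup file has no header for tags: {missing}")

    kept: List[Row] = []
    quarantined: List[Row] = []
    for row in resource_rows:
        # A row matches if its value for any searched tag appears in the lookup.
        if any(row.get(tag) in lookup[tag] for tag in search_tags):
            quarantined.append(row)
        else:
            kept.append(row)
    return kept, quarantined


if __name__ == "__main__":
    rows = [
        {"NAME": "row1", "EMAIL": "sample@gmail.com"},
        {"NAME": "row2", "EMAIL": "other@example.com"},
    ]
    kept, quarantined = expunge_rows(rows, {"EMAIL": {"sample@gmail.com"}}, ["EMAIL"])
    print(len(kept), len(quarantined))  # prints: 1 1
```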

Note

The resource file should be scanned before applying the Expunge policy. The Expunge policy does not work on real-time or offline scans.

Supported Data Sources

The following cloud data sources are supported for the Expunge policy:

  • S3
  • Snowflake
  • Redshift
  • AuroraDB Postgres
  • AuroraDB MySQL
  • PostgreSQL
  • MSSQL Server Synapse
  • GCS

Supported File Formats

For the file formats to which the policy can be applied, see Matrix for Supported File Formats.

The following fields are included in the Expunge policy:

  • Name: This field indicates the name of the Expunge Policy.

  • Type: This field indicates the type of policy.

  • Alert Level: This field indicates the level of alert: High, Medium or Low.

  • Description: This field contains the description of the Expunge Policy.

  • Status: This field indicates whether the policy is enabled or disabled. It is enabled by default.

  • Application: This field specifies the data source that contains the scanned resources and to which the Expunge policy is applied.

  • Lookup Application: This field specifies the name of the data source that contains the lookup file. The lookup file must be a .csv file whose column headers are tag names.

  • Lookup File Location: This field specifies the location of the lookup file.

  • Quarantine Location: This field specifies the location where the data removed from the input file is stored.

    Some applications, such as Snowflake and Presto SQL, follow the [Db].[Schema].[Table] hierarchy. For these applications, specify the Quarantine Location in the [Db].[Schema] format.

  • Archive Location (Optional): This field specifies the location where a copy of the original file is stored before any tagged records are removed from it.

    Some applications, such as Snowflake and Presto SQL, follow the [Db].[Schema].[Table] hierarchy. For these applications, specify the Archive Location in the [Db].[Schema] format.

  • Search for tags: The tags specified in this field identify the data to be removed.

  • Auto Run: If this feature is enabled, the Expunge policy is applied automatically at a specified time interval.
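Several of the field rules above can be checked before a policy is saved. The sketch below is hypothetical: the `ExpungePolicy` class, its attribute names, and the `HIERARCHICAL_APPS` set are assumptions for illustration and do not reflect the product's API, and only a subset of the fields is modeled. It checks rules stated on this page: the alert level is High, Medium, or Low; the lookup file is a .csv file; at least one tag is searched; and, for applications that follow the [Db].[Schema].[Table] hierarchy, the Quarantine and Archive locations use the [Db].[Schema] format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: applications named on this page as using the
# [Db].[Schema].[Table] hierarchy; the real list may be longer.
HIERARCHICAL_APPS = {"Snowflake", "Presto SQL"}


def _is_db_schema(location: str) -> bool:
    # Exactly two non-empty, dot-separated parts, e.g. "SALES_DB.QUARANTINE".
    parts = location.split(".")
    return len(parts) == 2 and all(part.strip() for part in parts)


@dataclass
class ExpungePolicy:
    """Hypothetical model of a subset of the Expunge policy fields."""
    name: str
    application: str
    lookup_application: str
    lookup_file_location: str
    quarantine_location: str
    search_tags: List[str]
    alert_level: str
    description: str = ""
    archive_location: Optional[str] = None  # optional field
    enabled: bool = True  # Status is enabled by default
    auto_run: bool = False

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means no rule was violated."""
        errors: List[str] = []
        if self.alert_level not in {"High", "Medium", "Low"}:
            errors.append("alert level must be High, Medium, or Low")
        if not self.lookup_file_location.lower().endswith(".csv"):
            errors.append("lookup file must be a .csv file")
        if not self.search_tags:
            errors.append("at least one search tag is required")
        if self.application in HIERARCHICAL_APPS:
            for label, location in (
                ("quarantine", self.quarantine_location),
                ("archive", self.archive_location),
            ):
                if location is not None and not _is_db_schema(location):
                    errors.append(f"{label} location must use the [Db].[Schema] format")
        return errors


if __name__ == "__main__":
    policy = ExpungePolicy(
        name="remove-emails",
        application="Snowflake",
        lookup_application="S3",
        lookup_file_location="lookups/input.csv",
        quarantine_location="SALES_DB",  # missing the schema part
        search_tags=["EMAIL"],
        alert_level="High",
    )
    print(policy.validate())
    # prints: ['quarantine location must use the [Db].[Schema] format']
```

Returning a list of problems, rather than raising on the first one, lets a form report every invalid field at once.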

The following example shows how the Expunge policy works:

  • Lookup File Location: Add a .csv file to the Lookup File Location field. This file specifies, by tag, which sensitive data must be removed from resources. For example, the file input.csv has EMAIL as a column header, with the value sample@gmail.com under it.

  • When the file is scanned, if the value “sample@gmail.com” tagged with EMAIL is matched, the row that contains it is removed.
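A lookup file like input.csv could be loaded into a tag-to-values mapping as sketched below. This minimal example uses Python's standard `csv` module; `load_lookup` is a hypothetical helper, and the input.csv content from the example above (an EMAIL header with the value sample@gmail.com) is supplied inline as a string. Each column header is treated as a tag name, and the non-empty values under it become the set of values to match.

```python
import csv
import io
from typing import Dict, Set, TextIO


def load_lookup(stream: TextIO) -> Dict[str, Set[str]]:
    """Map each lookup header (tag name) to the set of values listed under it."""
    reader = csv.DictReader(stream)
    lookup: Dict[str, Set[str]] = {tag: set() for tag in reader.fieldnames or []}
    for record in reader:
        for tag, value in record.items():
            # Skip extra unnamed cells (key None) and blank or missing cells.
            if tag is None or not value:
                continue
            lookup[tag].add(value.strip())
    return lookup


if __name__ == "__main__":
    # Contents of input.csv from the example above.
    input_csv = io.StringIO("EMAIL\nsample@gmail.com\n")
    print(load_lookup(input_csv))  # prints: {'EMAIL': {'sample@gmail.com'}}
```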

Consider the following:

  1. The file test_file.csv is added to the Data Zone.

    The Search for tags field is set to the EMAIL tag.

  2. Next, the scheduler is triggered, and the system applies the Expunge policy to the resource (test_file.csv) attached to the Data Zone.

  3. After the Expunge policy is applied, the row for 'alex' is moved to the specified Quarantine Location.

  4. The resulting test_file.csv no longer contains the row for 'alex'. The Expunge policy removes the entire row from test_file.csv.
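The walkthrough above can be approximated for a local CSV file. This is a hypothetical sketch, not the product's behavior: it treats the resource, Quarantine Location, and Archive Location as local file paths, assumes the resource columns are named after their tags, and uses a hypothetical `apply_expunge` function. It copies the original file to the archive path (if one is given), appends the matching rows to a quarantine file, and rewrites the resource without those rows. The sample rows are made up, because the original test_file.csv contents are not reproduced on this page.

```python
import csv
import os
import shutil
import tempfile
from typing import Dict, List, Optional, Set


def apply_expunge(
    resource_path: str,
    lookup: Dict[str, Set[str]],
    search_tags: List[str],
    quarantine_path: str,
    archive_path: Optional[str] = None,
) -> int:
    """Remove matching rows from a CSV resource; return how many were removed."""
    # Optional archive: keep a copy of the original file before removing anything.
    if archive_path:
        shutil.copyfile(resource_path, archive_path)

    with open(resource_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    def matches(row: Dict[str, str]) -> bool:
        # Assumption: the column name is the tag assigned during the scan.
        return any(row.get(tag) in lookup.get(tag, set()) for tag in search_tags)

    removed = [row for row in rows if matches(row)]
    kept = [row for row in rows if not matches(row)]

    # Append removed rows to the quarantine file, writing a header if it is new.
    new_file = not os.path.exists(quarantine_path)
    with open(quarantine_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerows(removed)

    # Rewrite the resource with only the rows that did not match.
    with open(resource_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)

    return len(removed)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    resource = os.path.join(workdir, "test_file.csv")
    with open(resource, "w", newline="") as f:
        # Hypothetical sample data for illustration only.
        f.write("NAME,EMAIL\nalex,sample@gmail.com\njordan,jordan@example.com\n")
    count = apply_expunge(
        resource,
        {"EMAIL": {"sample@gmail.com"}},
        ["EMAIL"],
        quarantine_path=os.path.join(workdir, "quarantine.csv"),
        archive_path=os.path.join(workdir, "archive_test_file.csv"),
    )
    print(count)  # prints: 1
```

Archiving before any rows are removed mirrors the order described for the Archive Location field, so the original file can be restored if a lookup entry turns out to be wrong.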