Skip to content

Overview of Supported Data Sources

Privacera Discovery's key feature is scanning your data for sensitive information, providing tagged classifications for analysis and refinement.

Privacera Discovery crawls targeted data sources to identify and applies metadata labels called tags to potentially sensitive data, such as credit card numbers or email addresses.

Access Manager Tag Policies can then be created so that user access can be controlled and monitored.

Connecting Data sources to Discovery

In cse of PrivaceraCloud: You can configure the applications supported by Privacera Discovery in Settings > Applications.

In case of Self Managed and Data Plane: Below are the steps to create a System and add the relevant Data Sources in it.

  1. Log in to Privacera Self Managed or Data Plane portal.

  2. Go to Settings > Data Source Registration.

  3. Click ADD SYSTEM.

  4. Enter the System name and description (optional) in the Name and Description field respectively.

  5. Click SAVE.

Data sources Connectors supported by Privacera Discovery

Category Data Source Connectors
Databases Vertica, Databricks Unity Catalog, Databricks Spark SQL, Amazon Redshift, Apache Hive, Apache HBase, Apache Kudu, Apache Phoenix, Azure SQL Database, Google BigQuery, Microsoft SQL Server, Synapse, MySQL, Oracle, PostgreSQL, Snowflake, Trino, Teradata
Cloud Storage Amazon S3, Azure Data Lake Storage (ADLS), Google Cloud Storage (GCS)
File Systems HDFS, Local Filesystem, NFS, SFTP

Supported File Formats for Discovery Scans

Type Formats
Structured Data .avro, .avro (nested), .csv, .html, .json, .json (nested), .orc, .parquet, .parquet (nested), .sas, .tsv, .xls, .xlsx, .xml
Compressed/Archive Data .gzip (single or multiple files), .gz (single or multiple files), .lzo/.lzop, .jar (single or multiple files), .tar.gz (single or multiple files), .snappy.parquet, .snappy.orc, .snappy.avro, .zip (single or multiple files), .zlib.orc, .zlib.parquet, .zlib.avro
Unstructured Data .dat, .doc, .docx, .pdf, .txt
Media Data For metadata extraction only: .jpeg, .mp4, .mpeg
Database Data For Database binary type columns is skipped and only non-binary columns data is scanned:

Comments