Scannable File Formats
Discovery can scan the following file formats:
- Structured data with taggable content and metadata:
  - .csv
  - .tsv
  - .json
  - .parquet
  - .orc
  - .avro
  - .avro (nested)
  - .parquet (nested)
  - .json (nested)
  - .sas
  - .xml
  - .html
- Compressed/archive data with taggable content and metadata:
  - .snappy.parquet
  - .snappy.orc
  - .snappy.avro
  - .zlib.orc
  - .zlib.parquet
  - .zlib.avro
  - .gzip (single or multiple files)
  - .zip (single or multiple files)
  - .jar (single or multiple files)
  - .tar.gz (single or multiple files)
  - .gz (single or multiple files)
  - .lzo/.lzop
- Unstructured data with taggable content and metadata:
  - .txt
  - .dat
  - .xls
  - .xlsx
  - .doc
  - .docx
- Media data with taggable metadata. For the following file formats, Discovery supports only metadata extraction:
  - .jpeg
  - .mp4
  - .mpeg
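
As a rough pre-check before pointing a scan at a set of files, the list above can be encoded as a lookup table. The sketch below is a hypothetical helper, not part of Discovery: the category names and the `classify` function are illustrative assumptions, it matches on file-name suffix only (Discovery itself may inspect file contents), and it follows the list literally, so for example `.jpg` is not matched because only `.jpeg` is listed. Compound suffixes such as `.snappy.parquet` or `.tar.gz` are tried before their last component so that the longest listed extension wins.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping transcribed from the list above; category names are illustrative.
SCANNABLE_FORMATS = {
    "structured": {".csv", ".tsv", ".json", ".parquet", ".orc", ".avro",
                   ".sas", ".xml", ".html"},
    "compressed": {".snappy.parquet", ".snappy.orc", ".snappy.avro",
                   ".zlib.orc", ".zlib.parquet", ".zlib.avro",
                   ".gzip", ".zip", ".jar", ".tar.gz", ".gz", ".lzo", ".lzop"},
    "unstructured": {".txt", ".dat", ".xls", ".xlsx", ".doc", ".docx"},
    "media_metadata_only": {".jpeg", ".mp4", ".mpeg"},
}


def classify(filename: str) -> Optional[str]:
    """Return the category of filename by suffix, or None if it is not listed."""
    # PurePosixPath("a.snappy.parquet").suffixes == [".snappy", ".parquet"]
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    # Try the longest compound suffix first, then progressively shorter ones,
    # so ".snappy.parquet" is matched before ".parquet".
    for start in range(len(suffixes)):
        candidate = "".join(suffixes[start:])
        for category, extensions in SCANNABLE_FORMATS.items():
            if candidate in extensions:
                return category
    return None
```

For example, `classify("events.snappy.parquet")` returns `"compressed"`, `classify("report.CSV")` returns `"structured"` because matching is case-insensitive, and `classify("notes.md")` returns `None`. Note that "(nested)" in the list describes the data inside a file rather than its extension, so a suffix check cannot distinguish nested from flat files.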
Last update: April 19, 2022