fileFormat
Standard file format name or file extension
Valid value descriptions created by ChatGPT 4o
Valid Values | Description | Source |
---|---|---|
bai | A BAM index file used in bioinformatics. It accompanies a BAM file (Binary Alignment/Map) and allows for rapid access to specific portions of the BAM file without reading through the entire dataset. | file.org |
bam | The compressed binary version of the Sequence Alignment/Map (SAM) format. It is used for storing large amounts of nucleotide sequence alignment data efficiently. | Wikipedia |
bed | Browser Extensible Data is a text file format used to define data lines that are displayed in an annotation track within the genome browser. Each line describes a genomic region with fields including chromosome, start position, end position, and optionally, name, score, and strand. | Wikipedia |
bim | A .bim file is part of the PLINK binary format used in genomics. It contains extended variant information accompanying a .bed binary genotype table, including fields such as chromosome code, variant identifier, genetic distance, base-pair coordinate, and allele information. | plink.readthedocs.io |
csv | Comma-Separated Values file is a plain text file that contains data separated by commas. Each line in the file corresponds to a data record, and each record consists of fields separated by commas. CSV files are commonly used for data exchange between different applications. | general knowledge |
docx | A .docx file is a Microsoft Word Open XML Format Document file. It is a zipped, XML-based file format developed by Microsoft for representing word processing documents. DOCX files can contain text, images, tables, and other document elements. | general knowledge |
dose | context dependent | |
erate | context dependent | |
fam | PLINK binary format used for genomics data. It contains sample information accompanying a .bed binary genotype table, including fields such as family ID, individual ID, paternal ID, maternal ID, sex, and phenotype. | plink.readthedocs.io |
fastq | A text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. It is commonly used in bioinformatics for storing high-throughput sequencing data. | Wikipedia |
fcs | Flow Cytometry Standard file is a standard format for flow cytometry data. It contains information about the physical and chemical properties of particles or cells as they flow in a fluid stream through a beam of light. | general knowledge |
geojson | A format for encoding a variety of geographic data structures using JavaScript Object Notation (JSON). It supports point, line, polygon, and multi-part collections of these types, making it useful for web-based mapping and GIS applications. | general knowledge |
h5 | Hierarchical Data Format version 5 (HDF5) file. It is used to store large amounts of data, including complex data types, and is commonly used in scientific computing. | general knowledge |
h5ad | ||
info | A generic file extension that can refer to various types of information files, often containing metadata or documentation related to a specific application or dataset. | general knowledge |
mtx | Matrix Market file format used to represent sparse matrices. It is commonly used in scientific computing to exchange matrix data. | general knowledge |
parquet | A columnar storage file format optimized for use with big data processing frameworks. It is efficient for both storage and retrieval, supporting complex nested data structures. | general knowledge |
Portable Document Format file is a file format developed by Adobe that presents documents independently of software, hardware, or operating systems. It can contain text, images, and vector graphics. | general knowledge | |
rds | An R data file used to store a single R object. It is used in the R programming environment for saving and restoring R objects. | general knowledge |
rec | This file extension is used by various applications to denote a recording file, which may contain audio, video, or other recorded data. The specific format can vary depending on the application that created it. | general knowledge |
tar.gz | A compressed archive file created by combining files into a single archive using the tar command and then compressing it using Gzip compression. It is commonly used in Unix and Linux environments for file distribution and backup. | general knowledge |
tbi | Tabix index file used in bioinformatics. It indexes .vcf.gz files (compressed Variant Call Format) to allow rapid retrieval of data lines overlapping regions of interest. | general knowledge |
tgz | A compressed archive file equivalent to a .tar.gz file. It is created by combining files into a single archive using the tar command and then compressing it using Gzip compression. | general knowledge |
tsv | Tab-Separated Values file is similar to a CSV file, but it uses tabs instead of commas to separate values. It is commonly used for representing structured data. | general knowledge |
txt | Text file is a standard text document that contains unformatted text. It is recognized by any text editing or word processing program. | general knowledge |
vcf | Variant Call Format file is used in bioinformatics for storing gene sequence variations. It is a text file format containing meta-information lines, a header line, and data lines. | general knowledge |
xls | A Microsoft Excel file format used to store spreadsheets. It supports tabular data, formulas, charts, and more. It has been largely replaced by the .xlsx format. | general knowledge |
xlsx | A Microsoft Excel Open XML Format Spreadsheet file. It is a zipped, XML-based file format used to represent spreadsheets. | general knowledge |
zip | An archive file format that supports lossless data compression. It can contain one or more compressed files or directories. | general knowledge |