scRNASeq Processed Data Annotation Template
A data contributor template outlining the metadata to be collected as file annotations for scRNA-seq processed data files (i.e., anything that is not a FASTQ file).
| Attribute | Description | Required | Valid Values |
|---|---|---|---|
| fileFormat | Standard file format name or file extension | True | csv, tsv, txt, xlsx, xls, fam, bim, bed, bam, h5, mtx, bai, rds, tgz, zip, h5ad |
| dataLevel | Level of data processing applied to file. Levels refer to pre-defined standards of processing for the given assay. | True | 1, 2, 3, 4, 5 |
| Component | A high-level attribute for grouping attributes into templates. | True | |
| resourceType | High-level classification of the file content | True | experimental data, metadata |
| specimenModality | Label assigned to experimental data files indicating whether the data contained corresponds to a single or multiple biospecimens | True | multispecimen, single specimen |
| assay | The technology used to generate the data in this file. For multimodal datasets with concomitant profiling of a biospecimen, select all assays that apply. For example, the GEX files from a CITE-seq experiment should be labeled with both 'scRNASeq' and 'CITESeq'. | True | ASAPSeq, CE-MS, CITESeq, CosMX, CyTOF, GenePS SeqFISH, H&E, LC-MS/MS, NULISA, Olink Explore HT, Olink Flex, Olink Focus, Olink Reveal, Olink Target 48, Olink Target 96, RNASeq, SNP array, SomaScan, VDJSeq, Visium, WES, WGS, Xenium, feature barcode sequencing, flow cytometry, imaging mass cytometry, imaging mass spectrometry, kiloplex, multiplexed ELISA, scRNASeq, scVDJSeq, serial IHC, snATACSeq, snRNASeq |
| cellRangerOutput | The 10x Genomics Cell Ranger software outputs several different count results and formats, some with different processing applied. This label distinguishes between these types and is particularly helpful when multiple files are uploaded with the same name, e.g., barcodes.tsv.gz | True | Not Applicable, filtered MEX, filtered_feature_bc_matrix, filtered_peak_bc_matrix, raw MEX, raw_feature_bc_matrix, raw_peak_bc_matrix |
| RObjectClass | RDS files store R objects, one per file. This label details the class of the R object saved to the RDS file or other similar file types. | False | ROCR prediction.object, Seurat object, SummarizedExperiment, Symphony reference, data.frame, list, matrix, sparse matrix, vector |
| processedDataType | A label used for file annotations to provide a brief description of the processed data file. | False | barcode counts, differential expression results, epigenomic peaks, gene counts |
| metadataType | A label further classifying the content of a metadata resource. | True | single-cell metadata |
| individualID | Unique identifier assigned to each study participant. For multi-specimen data files provide all IDs in a comma-separated list. | True | |
| biospecimenID | A unique identifier assigned to specimens collected from study participants. For multi-specimen data files provide all IDs in a comma-separated list. | True | |
| targetPanelSize | The number of gene, transcript, protein, etc., targets profiled in the assay for assays that use a pre-defined set of probes, antibodies, etc., to measure biological components in samples. The input value is expected to be a whole integer that matches the number of targets described in the accompanying target panel metadata (i.e., targetPanelSynID). | False | |
| targetPanelSynID | In most cases, an accompanying metadata file should be provided that details the targets profiled in the experiment. This attribute links experimental data files to the target panel metadata via the Synapse ID of that file. | True | |
| targetPanel | A unique or established human-readable name assigned to the panel of targets profiled in the experiment. For example, the panel of antibodies and corresponding fluorophores used in a flow cytometry experiment or panel used in a Xenium spatial transcriptomics experiment. | True | |
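
The Required and Valid Values columns can be checked before upload with a small script. The following is a minimal sketch, not part of the template itself. It assumes annotations are held in a plain Python dict, with multi-valued attributes such as `assay` stored as lists. The names `validate_annotations`, `as_list`, `REQUIRED`, and `VALID_VALUES` are hypothetical, as are the `Component`, `targetPanelSynID`, and `targetPanel` values in the example. Only a subset of the controlled vocabularies is transcribed.

```python
# Subset of the template's controlled vocabularies, transcribed from the table.
VALID_VALUES = {
    "dataLevel": {"1", "2", "3", "4", "5"},
    "resourceType": {"experimental data", "metadata"},
    "specimenModality": {"multispecimen", "single specimen"},
    "metadataType": {"single-cell metadata"},
    "processedDataType": {
        "barcode counts", "differential expression results",
        "epigenomic peaks", "gene counts",
    },
}

# Attributes marked as required in the table.
REQUIRED = [
    "fileFormat", "dataLevel", "Component", "resourceType",
    "specimenModality", "assay", "cellRangerOutput", "metadataType",
    "individualID", "biospecimenID", "targetPanelSynID", "targetPanel",
]


def as_list(value):
    """Wrap a value in a list so single and multi-valued fields look alike."""
    if value is None:
        return []
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]


def validate_annotations(annotations):
    """Return a list of problems; an empty list means none were found."""
    problems = []

    # Every required attribute needs at least one non-blank value.
    for name in REQUIRED:
        values = [v for v in as_list(annotations.get(name)) if str(v).strip()]
        if not values:
            problems.append(f"missing required attribute: {name}")

    # Attributes with a controlled vocabulary may only use listed values.
    for name, allowed in VALID_VALUES.items():
        for value in as_list(annotations.get(name)):
            if str(value) not in allowed:
                problems.append(f"invalid value for {name}: {value!r}")

    # targetPanelSize is optional but, when given, must be a whole number.
    size = annotations.get("targetPanelSize")
    if size is not None and not str(size).isdigit():
        problems.append(f"targetPanelSize is not a whole number: {size!r}")

    return problems


# Illustrative annotations for a multi-specimen single-cell metadata file.
example = {
    "fileFormat": "csv",
    "dataLevel": "3",
    "Component": "placeholder component name",  # hypothetical value
    "resourceType": "metadata",
    "specimenModality": "multispecimen",
    "assay": ["scRNASeq", "CITESeq"],
    "cellRangerOutput": "Not Applicable",
    "metadataType": "single-cell metadata",
    "individualID": "P001, P002",       # comma-separated, per the template
    "biospecimenID": "S001, S002",
    "targetPanelSynID": "syn00000000",  # placeholder, not a real Synapse ID
    "targetPanel": "example panel",     # hypothetical value
}

print(validate_annotations(example))  # prints []
```

Returning a list of messages rather than raising on the first failure lets a contributor fix every problem in one pass. The remaining vocabularies (`fileFormat`, `assay`, `cellRangerOutput`, `RObjectClass`) can be added to `VALID_VALUES` in the same way.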