Dataset Annotation Template
A template outlining the dataset metadata to use as annotations for a Synapse dataset entity.
Attribute | Description | Required | Valid Values |
---|---|---|---|
Component | A high-level attribute for grouping attributes into templates. | TRUE | |
program | Name of the funding program that supported the generation of data and associated files | TRUE | AMP RA/SLE, AMP AIM, Community Contribution |
project | A sub-level attribute of `program` specifying a research initiative working to investigate particular hypotheses. | TRUE | RA, SLE, AIM for RA, STAMP, LOCKIT, V-CoRT, ELLIPSS |
datasetType | High-level classification of dataset entity distinguishing between datasets compiled for a specific publication or as a general data resource. | TRUE | experimental, publication |
biospecimenType | A label indicating the biological material collected for experimentation and data collection. Where applicable, provide all types in a comma-separated list. | TRUE | none, urine, stool, whole blood, serum, plasma, PBMCs, total leukocytes, kidney biopsy, synovial fluid, skin biopsy, salivary gland, saliva, skin swab, synovial tissue, suction blister cells, suction blister fluid, uvea |
biospecimenSubtype | Biospecimen status before the sample is processed into a scRNA-seq library. Several scRNA-seq technologies support a variety of sample processing methods, which can introduce sources of technical variation. | FALSE | nuclei suspension, cell suspension, fresh tissue, frozen tissue, FFPE tissue, flow-sorted cells, PFA-fixed tissue |
diagnosis | A high-level classifier indicating the disease status of an individual. | TRUE | control, SLE, RA, At-Risk RA, vitiligo, dermatomyositis, PsO, PsA, scleroderma, SjD, LN, CLE |
acknowledgmentStatement | A Dataset-specific attribute specifying the path to the wiki subpage within the ARK Portal - backend project that contains the acknowledgement statement that must be included in publications using data from the given dataset as a stipulation of the conditions of use. | TRUE | syn26710600/wiki/619685 |
datasetDescription | A Dataset-specific attribute specifying the synID of the folder that contains a wiki write-up of the dataset description. This wiki content will be surfaced on the ARK Portal frontend site. | TRUE | |
ARKRelease | A Dataset-specific attribute specifying the ARK Portal release in which this dataset was first made available to the public. | TRUE | 1.0, 2.0, 2024.06.R1, 2024.07.R1, 2024.08.R1, 2024.09.R1, 2024.10.R1, 2024.12.R1, 2025.01.R1, 2025.02.R1, 2025.03.R1, 2025.04.R1, 2025.05.R1, 2025.06.R1, 2025.07.R1, 2025.08.R1, 2025.09.R1, 2025.10.R1, 2025.11.R1, 2025.12.R1 |
dataType | High-level classification of the type of data contained in the file, loosely related to the experimental method or biological entity that is being profiled. Select all that apply using a comma-delimited list, though in most cases only a single label is expected. For multimodal datasets with concomitant profiling of a biospecimen, include 'multimodal'. | TRUE | transcriptomics, immune repertoire profiling, proteomics, multimodal, epigenomics, genomics, metabolomics, lipidomics, microbiome, histology, immunostaining, cytometry |
dataSubtype | General classification to differentiate between omics profiling modalities. If not applicable, select 'none'. Multiple selections can be provided in a comma-delimited list; however, this is largely expected only for Datasets and files that contain integrated experimental data spanning multiple types. | TRUE | bulk, pseudobulk, single-cell, single-nucleus, spatial, none |
assay | The technology used to generate the data in this file. For multimodal datasets with concomitant profiling of a biospecimen, select all assays that apply. For example, the GEX files from a CITE-seq experiment should be labeled with both 'scRNASeq' and 'CITESeq'. | TRUE | scRNASeq, CyTOF, Xenium, Olink Explore HT, CITESeq, snRNASeq, snATACSeq, RNASeq, multiplexed ELISA, SNP array, imaging mass cytometry, H&E, ASAPSeq, CosMX, serial IHC, imaging mass spectrometry, LC-MS/MS, CE-MS, VDJSeq, scVDJSeq, feature barcode sequencing, SomaScan, WES, WGS, flow cytometry, NULISA |
publicationSynID | The synID of the corresponding Synapse entity that stores metadata about the publication. This is used to differentiate publication-specific files, often consisting of level 4 processed data and expanded subject metadata, in a publication dataset that also includes raw or minimally processed files from experimental datasets. This provides an easy way to distinguish and select for the publication-specific data from which the research findings were derived. When this attribute is used to annotate a Dataset it serves as a way to directly link the Dataset entity with the publication metadata stored in Synapse. | FALSE | |
associatedDataset | The synID of a Dataset entity. This serves to link other Synapse entities to Dataset entities. When used to annotate a publication Dataset, this attribute should include the synID of each experimental Dataset from which the publication data was derived. Multiple synIDs can be specified using a comma-delimited list. | FALSE | |
associatedCodeURL | A URL to the repository where associated code is available. | FALSE | |
dbGapAccession | NIH policy requires large-scale human genomics studies to be registered in dbGaP. This is a Dataset-specific attribute indicating the unique identifier (i.e., accession) of the corresponding study that is registered in dbGaP. | FALSE | |
ImmPortAccession | Accession for the corresponding information in ImmPort. | FALSE | |
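
The rules in the table above (required attributes, controlled vocabularies, and comma-delimited multi-value fields) can be checked before annotations are applied. The following is a minimal sketch under stated assumptions, not part of the template or of any Synapse tooling: the names `REQUIRED`, `VALID_VALUES`, `MULTI_VALUED`, `split_values`, and `validate_annotations` are hypothetical, only a few of the controlled vocabularies are transcribed, and the `Component` and `datasetDescription` values in the example are placeholders.

```python
# Hypothetical pre-flight check for a Dataset annotation dictionary.
# The rules below are transcribed from the table; nothing here calls Synapse.

# Attributes marked TRUE in the "Required" column.
REQUIRED = {
    "Component", "program", "project", "datasetType", "biospecimenType",
    "diagnosis", "acknowledgmentStatement", "datasetDescription",
    "ARKRelease", "dataType", "dataSubtype", "assay",
}

# A subset of the "Valid Values" column, kept short for illustration.
VALID_VALUES = {
    "program": {"AMP RA/SLE", "AMP AIM", "Community Contribution"},
    "datasetType": {"experimental", "publication"},
    "dataSubtype": {"bulk", "pseudobulk", "single-cell",
                    "single-nucleus", "spatial", "none"},
}

# Attributes whose description allows a comma-delimited list.
MULTI_VALUED = {"dataSubtype"}


def split_values(raw):
    """Split a comma-delimited string into trimmed, non-empty labels."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def validate_annotations(annotations):
    """Return a list of problems; an empty list means no rule was violated."""
    errors = []
    # Required attributes must be present and non-blank.
    for name in sorted(REQUIRED):
        if not str(annotations.get(name, "")).strip():
            errors.append(f"missing required attribute: {name}")
    # Controlled attributes must only use labels from the vocabulary.
    for name, allowed in VALID_VALUES.items():
        raw = str(annotations.get(name, "")).strip()
        if not raw:
            continue  # absence is already reported above if required
        labels = split_values(raw) if name in MULTI_VALUED else [raw]
        for label in labels:
            if label not in allowed:
                errors.append(f"invalid value for {name}: {label!r}")
    return errors


example = {
    "Component": "DatasetTemplate",        # placeholder, not from the table
    "program": "AMP RA/SLE",
    "project": "RA",
    "datasetType": "experimental",
    "biospecimenType": "synovial tissue, PBMCs",
    "diagnosis": "RA",
    "acknowledgmentStatement": "syn26710600/wiki/619685",
    "datasetDescription": "syn00000000",   # placeholder synID
    "ARKRelease": "2025.01.R1",
    "dataType": "transcriptomics",
    "dataSubtype": "single-cell",
    "assay": "scRNASeq",
}

print(validate_annotations(example))
```

Extending `VALID_VALUES` and `MULTI_VALUED` to the remaining attributes follows the same pattern. Optional attributes such as `publicationSynID` are ignored by this sketch unless a vocabulary is added for them.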