Dataset Annotation Template

A template outlining the dataset metadata to use as annotations for a Synapse Dataset entity.

| Attribute | Description | Required | Valid Values |
| --- | --- | --- | --- |
| Component | A high-level attribute for grouping attributes into templates. | TRUE | |
| program | Name of the funding program that supported the generation of data and associated files. | TRUE | AMP RA/SLE, AMP AIM, Community Contribution |
| project | A sub-level attribute of `program` specifying a research initiative working to investigate particular hypotheses. | TRUE | RA, SLE, AIM for RA, STAMP, LOCKIT, V-CoRT, ELLIPSS |
| datasetType | High-level classification of the dataset entity, distinguishing between datasets compiled for a specific publication and those compiled as a general data resource. | TRUE | experimental, publication |
| biospecimenType | A label indicating the biological material collected for experimentation and data collection. Where applicable, provide all types in a comma-separated list. | TRUE | none, urine, stool, whole blood, serum, plasma, PBMCs, total leukocytes, kidney biopsy, synovial fluid, skin biopsy, salivary gland, saliva, skin swab, synovial tissue, suction blister cells, suction blister fluid, uvea |
| biospecimenSubtype | Biospecimen status before the sample is processed into a scRNA-seq library. Several scRNA-seq technologies support a variety of sample processing methods, which can introduce sources of technical variation. | FALSE | nuclei suspension, cell suspension, fresh tissue, frozen tissue, FFPE tissue, flow-sorted cells, PFA-fixed tissue |
| diagnosis | A high-level classifier indicating the disease status of an individual. | TRUE | control, SLE, RA, At-Risk RA, vitiligo, dermatomyositis, PsO, PsA, scleroderma, SjD, LN, CLE |
| acknowledgmentStatement | A Dataset-specific attribute specifying the path to the wiki subpage within the ARK Portal - backend project that contains the acknowledgement statement. As a condition of use, this statement must be included in publications that use data from the given dataset. | TRUE | syn26710600/wiki/619685 |
| datasetDescription | A Dataset-specific attribute specifying the synID of the folder that contains a wiki write-up of the dataset description. This wiki content will be surfaced on the ARK Portal frontend site. | TRUE | |
| ARKRelease | A Dataset-specific attribute specifying the ARK Portal release in which this dataset was first made available to the public. | TRUE | 1.0, 2.0, 2024.06.R1, 2024.07.R1, 2024.08.R1, 2024.09.R1, 2024.10.R1, 2024.12.R1, 2025.01.R1, 2025.02.R1, 2025.03.R1, 2025.04.R1, 2025.05.R1, 2025.06.R1, 2025.07.R1, 2025.08.R1, 2025.09.R1, 2025.10.R1, 2025.11.R1, 2025.12.R1 |
| dataType | High-level classification of the type of data contained in the file, loosely related to the experimental method or biological entity that is being profiled. Select all that apply using a comma-delimited list, though in most cases only a single label is expected. For multimodal datasets with concomitant profiling of a biospecimen, include 'multimodal'. | TRUE | transcriptomics, immune repertoire profiling, proteomics, multimodal, epigenomics, genomics, metabolomics, lipidomics, microbiome, histology, immunostaining, cytometry |
| dataSubtype | General classification to differentiate between omics profiling modalities. If not applicable, select 'none'. Multiple selections can be provided in a comma-delimited list; however, this is expected mainly for Datasets and files that contain integrated experimental data spanning multiple types. | TRUE | bulk, pseudobulk, single-cell, single-nucleus, spatial, none |
| assay | The technology used to generate the data in this file. For multimodal datasets with concomitant profiling of a biospecimen, select all assays that apply. For example, the GEX files from a CITE-seq experiment should be labeled with both 'scRNASeq' and 'CITESeq'. | TRUE | scRNASeq, CyTOF, Xenium, Olink Explore HT, CITESeq, snRNASeq, snATACSeq, RNASeq, multiplexed ELISA, SNP array, imaging mass cytometry, H&E, ASAPSeq, CosMX, serial IHC, imaging mass spectrometry, LC-MS/MS, CE-MS, VDJSeq, scVDJSeq, feature barcode sequencing, SomaScan, WES, WGS, flow cytometry, NULISA |
| publicationSynID | The synID of the corresponding Synapse entity that stores metadata about the publication. This is used to differentiate publication-specific files, often consisting of level 4 processed data and expanded subject metadata, in a publication dataset that also includes raw or minimally processed files from experimental datasets. This provides an easy way to distinguish and select the publication-specific data from which the research findings were derived. When this attribute is used to annotate a Dataset, it directly links the Dataset entity with the publication metadata stored in Synapse. | FALSE | |
| associatedDataset | The synID of a Dataset entity. This serves to link other Synapse entities to Dataset entities. When used to annotate a publication Dataset, this attribute should include the synID of each experimental Dataset from which the publication data was derived. Multiple synIDs can be specified using a comma-delimited list. | FALSE | |
| associatedCodeURL | A URL to the repository where associated code is available. | FALSE | |
| dbGapAccession | NIH policy requires large-scale human genomics studies to be registered in dbGaP. This is a Dataset-specific attribute indicating the unique identifier (i.e., accession) of the corresponding study that is registered in dbGaP. | FALSE | |
| ImmPortAccession | Accession for the corresponding information in ImmPort. | FALSE | |
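
The Required and Valid Values columns above can be checked mechanically before annotations are applied to a Dataset. The following is a minimal sketch, not the portal's actual validation tooling: the `TEMPLATE` dictionary copies only a handful of rows from the table, and the names `TEMPLATE`, `split_values`, and `validate` are hypothetical helpers introduced for illustration. It assumes annotations arrive as plain strings, with multiple selections given as a comma-delimited list as described above.

```python
# Hypothetical subset of the template: attribute -> (required, valid values).
# A valid-values entry of None means the table lists no controlled vocabulary.
TEMPLATE = {
    "program": (True, {"AMP RA/SLE", "AMP AIM", "Community Contribution"}),
    "datasetType": (True, {"experimental", "publication"}),
    "dataSubtype": (True, {"bulk", "pseudobulk", "single-cell",
                           "single-nucleus", "spatial", "none"}),
    "biospecimenSubtype": (False, {"nuclei suspension", "cell suspension",
                                   "fresh tissue", "frozen tissue",
                                   "FFPE tissue", "flow-sorted cells",
                                   "PFA-fixed tissue"}),
    "associatedCodeURL": (False, None),
}


def split_values(raw):
    """Split a comma-delimited annotation string into trimmed values."""
    return [value.strip() for value in raw.split(",") if value.strip()]


def validate(annotations, template=TEMPLATE):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for attribute, (required, valid) in template.items():
        values = split_values(annotations.get(attribute, ""))
        if not values:
            # Only required attributes are reported when absent or empty.
            if required:
                problems.append(f"{attribute}: required attribute is missing")
            continue
        if valid is None:
            # Free-text attribute: nothing further to check.
            continue
        for value in values:
            if value not in valid:
                problems.append(f"{attribute}: '{value}' is not a valid value")
    return problems


# Example annotations with one deliberate typo in an optional attribute.
example = {
    "program": "AMP RA/SLE",
    "datasetType": "experimental",
    "dataSubtype": "single-cell, spatial",
    "biospecimenSubtype": "frozen tisue",
}

for problem in validate(example):
    print(problem)
```

Because every value is checked individually after splitting on commas, a single misspelled entry in a multi-value annotation is reported without rejecting the correctly spelled ones.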