Dataset Annotation Template

A template outlining the dataset metadata used to annotate a Synapse Dataset entity.

| Attribute | Description | Required | Valid Values |
| --- | --- | --- | --- |
| associatedCodeURL | A URL to the repository where associated code is available. | False | |
| datasetType | High-level classification of a dataset entity, distinguishing between datasets compiled for a specific publication and those compiled as a general data resource. | True | experimental, publication |
| ImmPortAccession | Accession to corresponding information in ImmPort. | False | |
| project | A sub-level attribute of `program` specifying a research initiative working to investigate particular hypotheses. | True | AIM for RA, ELLIPSS, LOCKIT, RA, SLE, STAMP, V-CoRT |
| biospecimenSubtype | Biospecimen status before the sample is processed into a scRNA-seq library. Several scRNA-seq technologies support a variety of sample processing methods, which can introduce sources of technical variation. | False | FFPE tissue, PFA-fixed tissue, cell or tissue lysate, cell suspension, flow-sorted cells, fresh tissue, frozen tissue, nuclei suspension, supernatant |
| publicationSynID | The synID of the corresponding Synapse entity that stores metadata about the publication. This is used to differentiate publication-specific files, often consisting of level 4 processed data and expanded subject metadata, in a publication dataset that also includes raw or minimally processed files from experimental datasets. This provides an easy way to distinguish and select for the publication-specific data from which the research findings were derived. When this attribute is used to annotate a Dataset, it directly links the Dataset entity with the publication metadata stored in Synapse. | False | |
| dbGapAccession | NIH policy requires large-scale human genomics studies to be registered in dbGaP. This is a Dataset-specific attribute indicating the unique identifier (i.e., accession) of the corresponding study registered in dbGaP, formatted as a compact URI, e.g., dbgap:phs003417.v2.p1. | False | |
| species | The genus and species of sample or subject origin. | True | Homo sapiens |
| acknowledgmentStatement | A Dataset-specific attribute specifying the path to the wiki subpage within the ARK Portal - backend project that contains the acknowledgment statement. As a condition of use, this statement must be included in publications using data from the given dataset. | True | syn26710600/wiki/619685 |
| biospecimenType | A label indicating the biological material collected for experimentation and data collection. Where applicable, provide all types in a comma-separated list. | True | PBMCs, cell line, fibroblast-like synoviocyte, kidney biopsy, none, plasma, primary cell culture, saliva, salivary gland, serum, skin biopsy, skin swab, stool, suction blister cells, suction blister fluid, synovial fluid, synovial tissue, total leukocytes, urine, uvea, whole blood |
| assay | The technology used to generate the data in this file. For multimodal datasets with concomitant profiling of a biospecimen, select all assays that apply. For example, the GEX files from a CITE-seq experiment should be labeled with both 'scRNASeq' and 'CITESeq'. | True | ASAPSeq, CE-MS, CITESeq, CosMX, CyTOF, GenePS SeqFISH, H&E, LC-MS/MS, NULISA, Olink Explore HT, Olink Flex, Olink Focus, Olink Reveal, Olink Target 48, Olink Target 96, RNASeq, SNP array, SomaScan, VDJSeq, Visium, WES, WGS, Xenium, feature barcode sequencing, flow cytometry, imaging mass cytometry, imaging mass spectrometry, kiloplex, multiplexed ELISA, scRNASeq, scVDJSeq, serial IHC, snATACSeq, snRNASeq |
| program | Name of the funding program that supported the generation of the data and associated files. | True | AMP AIM, AMP RA/SLE, Community Contribution |
| datasetDescription | A Dataset-specific attribute specifying the synID of the folder that contains a wiki write-up of the dataset description. This wiki content will be surfaced on the ARK Portal frontend site. | True | |
| associatedDataset | The synID of a Dataset entity. This serves to link other Synapse entities to Dataset entities. When used to annotate a publication Dataset, this attribute should include the synID of each experimental Dataset from which the publication data was derived. Multiple synIDs can be specified using a comma-delimited list. | False | |
| Component | A high-level attribute for grouping attributes into templates. | True | |
| diagnosis | A high-level classifier indicating the disease status of an individual. | True | At-Risk RA, OA, RA, SLE, Sjogren's disease, control, cutaneous lupus erythematosus, dermatomyositis, lupus nephritis, psoriasis, psoriatic arthritis, scleroderma, unknown, vitiligo |
| datasetStatus | A categorical label indicating the status of an ARK Portal dataset. This is applied to improve downstream management of datasets as well as various ETL workflows. | True | deprecated, released, test, under peer review, unreleased |
| dataType | High-level classification of the type of data contained in the file, loosely related to the experimental method or biological entity that is being profiled. Select all that apply using a comma-delimited list, though in most cases only a single label is expected. For multimodal datasets with concomitant profiling of a biospecimen, include 'multimodal'. | True | cytometry, epigenomics, genomics, histology, immune repertoire profiling, immunostaining, lipidomics, metabolomics, microbiome, multimodal, proteomics, transcriptomics |
| dataSubtype | General classification to differentiate between omics profiling modalities. If not applicable, select 'none'. Multiple selections can be provided in a comma-delimited list; however, this is largely expected only for Datasets and files that contain integrated experimental data spanning multiple types. | True | bulk, none, pseudobulk, single-cell, single-nucleus, spatial |
| ARKRelease | A Dataset-specific attribute specifying the ARK Portal release in which this dataset was first made available to the public. | True | 1.0, 2.0, 2024.06.R1, 2024.07.R1, 2024.08.R1, 2024.09.R1, 2024.10.R1, 2024.12.R1, 2025.01.R1, 2025.02.R1, 2025.03.R1, 2025.04.R1, 2025.05.R1, 2025.06.R1, 2025.07.R1, 2025.08.R1, 2025.09.R1, 2025.10.R1, 2025.11.R1, 2025.12.R1 |
| programPhase | A label noting which AMP RA/SLE program phase generated the data. | True | I, II |
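
The Required and Valid Values columns above lend themselves to a simple pre-submission check. The following is a minimal sketch, not part of the template or of any Synapse tooling: the names `TEMPLATE`, `DBGAP_CURIE`, `split_values`, and `validate` are hypothetical, only a handful of attributes are transcribed, annotations are assumed to arrive as a plain dictionary of strings, and the dbGaP pattern is inferred solely from the single example `dbgap:phs003417.v2.p1`.

```python
import re

# Hypothetical subset of the template, transcribed from the table above.
# Each entry maps attribute -> (required, allowed values or None when the
# table lists no controlled vocabulary for that attribute).
TEMPLATE = {
    "datasetType": (True, {"experimental", "publication"}),
    "species": (True, {"Homo sapiens"}),
    "program": (True, {"AMP AIM", "AMP RA/SLE", "Community Contribution"}),
    "dataSubtype": (True, {"bulk", "none", "pseudobulk", "single-cell",
                           "single-nucleus", "spatial"}),
    "dbGapAccession": (False, None),
}

# Assumption: compact URI shape guessed from the one example in the table.
DBGAP_CURIE = re.compile(r"^dbgap:phs\d+\.v\d+\.p\d+$")


def split_values(raw):
    """Split a comma-delimited annotation string into trimmed, non-empty values."""
    return [value.strip() for value in raw.split(",") if value.strip()]


def validate(annotations):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name, (required, allowed) in TEMPLATE.items():
        values = split_values(annotations.get(name, ""))
        if not values:
            if required:
                problems.append(f"{name}: required attribute is missing")
            continue
        if allowed is not None:
            for value in values:
                if value not in allowed:
                    problems.append(f"{name}: '{value}' is not a valid value")
    accession = annotations.get("dbGapAccession", "").strip()
    if accession and not DBGAP_CURIE.match(accession):
        problems.append("dbGapAccession: expected a compact URI such as "
                        "dbgap:phs003417.v2.p1")
    return problems


if __name__ == "__main__":
    example = {
        "datasetType": "publication",
        "species": "Homo sapiens",
        "program": "AMP RA/SLE",
        "dataSubtype": "single-cell, spatial",
        "dbGapAccession": "dbgap:phs003417.v2.p1",
    }
    for problem in validate(example):
        print(problem)
```

The sketch treats every attribute as potentially multi-valued, because the table does not state which attributes are restricted to a single value; a stricter check would need that information from the data model itself.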