Dataset Annotation Template

A template outlining the dataset metadata to use as annotations for a Synapse Dataset entity.

Attribute	Description	Required	Valid Values
ARKRelease	A Dataset-specific attribute specifying the ARK Portal release in which this dataset was first made available to the public.	TRUE	1.0, 2.0, 2024.06.R1, 2024.07.R1, 2024.08.R1, 2024.09.R1, 2024.10.R1, 2024.12.R1, 2025.01.R1, 2025.02.R1, 2025.03.R1, 2025.04.R1, 2025.05.R1, 2025.06.R1, 2025.07.R1, 2025.08.R1, 2025.09.R1, 2025.10.R1, 2025.11.R1, 2025.12.R1
Component	A high-level attribute for grouping attributes into templates.	TRUE
ImmPortAccession	Accession to corresponding information in ImmPort.	FALSE
acknowledgmentStatement	A Dataset-specific attribute specifying the path to the wiki subpage within the ARK Portal - backend project that contains the acknowledgement statement that must be included in publications using data from the given dataset as a stipulation of the conditions of use.	TRUE	syn26710600/wiki/619685
assay	The technology used to generate the data in this file. For multimodal datasets with concomitant profiling of a biospecimen, select all assays that apply; e.g., the GEX files from a CITE-seq experiment should be labeled with both 'scRNASeq' and 'CITESeq'.	TRUE	ASAPSeq, CE-MS, CITESeq, CosMX, CyTOF, GenePS SeqFISH, H&E, LC-MS/MS, NULISA, Olink Explore HT, Olink Flex, Olink Focus, Olink Reveal, Olink Target 48, Olink Target 96, RNASeq, SNP array, SomaScan, VDJSeq, Visium, WES, WGS, Xenium, feature barcode sequencing, flow cytometry, imaging mass cytometry, imaging mass spectrometry, kiloplex, multiplexed ELISA, scRNASeq, scVDJSeq, serial IHC, snATACSeq, snRNASeq
associatedAccession	A File and Dataset annotation attribute indicating additional accessions (i.e., unique identifiers) associated with the data when the data has also been submitted to, or can be found in, other repositories such as GEO, SRA, or dbGaP.	FALSE
associatedCodeURL	A URL to the repository where associated code is available.	FALSE
associatedDataset	The synID of a Dataset entity. This serves to link other Synapse entities to Dataset entities. When used to annotate a publication Dataset, this attribute should include the synIDs of the experimental Datasets from which the publication data was derived. Multiple synIDs can be specified using a comma-delimited list.	FALSE
biospecimenSubtype	Biospecimen status before the sample is processed into a scRNA-seq library. Several scRNA-seq technologies support a variety of sample processing methods, which can introduce sources of technical variation.	FALSE	FFPE tissue, PFA-fixed tissue, cell or tissue lysate, cell suspension, flow-sorted cells, fresh tissue, frozen tissue/fluid, nuclei suspension, supernatant
biospecimenType	A label indicating the biological material collected for experimentation and data collection. Where applicable, provide all types in a comma-separated list.	TRUE	PBMCs, cell line, kidney biopsy, none, plasma, primary cell culture, saliva, salivary gland, serum, skin biopsy, skin swab, stool, suction blister cells, suction blister fluid, synovial fluid, synovial tissue, total leukocytes, urine, uvea, whole blood
dataSubtype	General classification to differentiate between omics profiling modalities. If N/A, please select 'none'. Multiple selections can be provided in a comma-delimited list; however, this is largely expected only in the context of Datasets and files that contain integrated experimental data spanning multiple types.	TRUE	bulk, none, pseudobulk, single-cell, single-nucleus, spatial
dataType	High-level classification of the type of data contained in the file, loosely related to the experimental method or biological entity that is being profiled. Select all that apply using a comma-delimited list, though in most cases only a single label is expected. For multimodal datasets with concomitant profiling of a biospecimen, include 'multimodal'.	TRUE	cytometry, epigenomics, genomics, histology, immune repertoire profiling, immunostaining, lipidomics, metabolomics, microbiome, multimodal, proteomics, transcriptomics
datasetDescription	A Dataset-specific attribute specifying the synID of the folder that contains a wiki write-up of the dataset description. This wiki content will be surfaced on the ARK Portal frontend site.	TRUE
datasetStatus	A categorical label indicating the status of an ARK Portal dataset. This is applied to improve downstream management of datasets as well as various ETL workflows.	TRUE	deprecated, released, test, under peer review, unreleased
datasetType	High-level classification of dataset entity distinguishing between datasets compiled for a specific publication or as a general data resource.	TRUE	experimental, publication
diagnosis	A high-level classifier indicating the disease status of an individual.	TRUE	At-Risk RA, Not Applicable, OA, RA, SLE, Sjogren's disease, control, cutaneous lupus erythematosus, dermatomyositis, discoid lupus erythematosus, lupus nephritis, psoriasis, psoriatic arthritis, scleroderma, unknown, vitiligo
program	Name of the funding program that supported the generation of data and associated files.	TRUE	AMP AIM, AMP RA/SLE, Community Contribution
programPhase	A label noting which AMP RA/SLE program phase generated the data.	TRUE	I, II
project	A sub-level attribute of `program` specifying a research initiative working to investigate particular hypotheses.	TRUE	AIM for RA, EDP1, EDP2, ELLIPSS, LOCKIT, METRO, RA, SLE, STAMP, UMass V-CoRT
publicationSynID	The synID of the corresponding Synapse entity that stores metadata about the publication. This is used to differentiate publication-specific files, often consisting of level 4 processed data and expanded subject metadata, in a publication dataset that also includes raw or minimally processed files from experimental datasets. This provides an easy way to distinguish and select for the publication-specific data from which the research findings were derived. When this attribute is used to annotate a Dataset it serves as a way to directly link the Dataset entity with the publication metadata stored in Synapse.	FALSE
species	The genus species of sample or subject origin.	TRUE	Homo sapiens
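
The Required and Valid Values columns above can be checked mechanically before annotations are applied to a Dataset. The following is a minimal sketch, not part of the template or of any Synapse tooling: the names `TEMPLATE`, `split_values`, and `validate_annotations` are hypothetical, only a handful of attributes are copied from the table, and the sketch assumes every value may be given as a comma-delimited list (the template states this explicitly only for some attributes).

```python
# Illustrative subset of the template table. Attribute names, Required flags,
# and valid values are copied from the table; everything else is hypothetical.
TEMPLATE = {
    "datasetType": {"required": True, "valid": {"experimental", "publication"}},
    "dataSubtype": {
        "required": True,
        "valid": {"bulk", "none", "pseudobulk", "single-cell",
                  "single-nucleus", "spatial"},
    },
    "programPhase": {"required": True, "valid": {"I", "II"}},
    "species": {"required": True, "valid": {"Homo sapiens"}},
    # No Valid Values listed in the table, so any non-empty value is accepted.
    "datasetDescription": {"required": True, "valid": None},
    "associatedCodeURL": {"required": False, "valid": None},
}


def split_values(raw):
    """Split a comma-delimited annotation string into trimmed, non-empty values."""
    return [value.strip() for value in raw.split(",") if value.strip()]


def validate_annotations(annotations, template=TEMPLATE):
    """Return a list of human-readable problems; an empty list means no issues.

    Assumption: annotations is a dict mapping attribute name to a string,
    where multiple values are joined with commas.
    """
    problems = []
    for attribute, rule in template.items():
        values = split_values(annotations.get(attribute, ""))
        if not values:
            if rule["required"]:
                problems.append(f"{attribute}: required attribute is missing")
            continue
        if rule["valid"] is not None:
            for value in values:
                if value not in rule["valid"]:
                    problems.append(f"{attribute}: '{value}' is not a valid value")
    return problems


if __name__ == "__main__":
    example = {
        "datasetType": "experimental",
        "dataSubtype": "single-cell, spatial",
        "programPhase": "II",
        "species": "Homo sapiens",
        "datasetDescription": "syn00000000",  # placeholder synID, not a real entity
    }
    for problem in validate_annotations(example):
        print(problem)
```

Matching is case-sensitive by design in this sketch, so a value such as 'Experimental' would be reported rather than silently normalized; whether that is the desired behavior depends on how the annotations are consumed downstream.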