Dataset Annotation Template
A template outlining dataset metadata to use as annotations for a Synapse dataset entity.
| Attribute | Description | Required | Valid Values |
|---|---|---|---|
| associatedCodeURL | A URL to the repository where associated code is available. | False | |
| datasetType | High-level classification of dataset entity distinguishing between datasets compiled for a specific publication or as a general data resource. | True | experimental, publication |
| ImmPortAccession | Accession to corresponding information in ImmPort. | False | |
| project | A sub-level attribute of `program` specifying a research initiative working to investigate particular hypotheses. | True | AIM for RA, ELLIPSS, LOCKIT, RA, SLE, STAMP, V-CoRT |
| biospecimenSubtype | Biospecimen status before the sample is processed into a scRNA-seq library. Several scRNA-seq technologies support a variety of sample processing methods, which can introduce sources of technical variation. | False | FFPE tissue, PFA-fixed tissue, cell or tissue lysate, cell suspension, flow-sorted cells, fresh tissue, frozen tissue, nuclei suspension, supernatant |
| publicationSynID | The synID of the corresponding Synapse entity that stores metadata about the publication. This is used to differentiate publication-specific files, often consisting of level 4 processed data and expanded subject metadata, in a publication dataset that also includes raw or minimally processed files from experimental datasets. This provides an easy way to distinguish and select for the publication-specific data from which the research findings were derived. When this attribute is used to annotate a Dataset it serves as a way to directly link the Dataset entity with the publication metadata stored in Synapse. | False | |
| dbGapAccession | NIH policy requires large-scale human genomics studies to be registered in dbGaP. This is a Dataset-specific attribute indicating the unique identifier (i.e., accession) of the corresponding study that is registered in dbGaP, formatted as a compact URI, e.g., dbgap:phs003417.v2.p1. | False | |
| species | The genus species of sample or subject origin. | True | Homo sapiens |
| acknowledgmentStatement | A Dataset-specific attribute specifying the path to the wiki subpage within the ARK Portal - backend project that contains the acknowledgement statement that must be included in publications using data from the given dataset as a stipulation of the conditions of use. | True | syn26710600/wiki/619685 |
| biospecimenType | A label indicating the biological material collected for experimentation and data collection. Where applicable, provide all types in a comma-separated list. | True | PBMCs, cell line, fibroblast-like synoviocyte, kidney biopsy, none, plasma, primary cell culture, saliva, salivary gland, serum, skin biopsy, skin swab, stool, suction blister cells, suction blister fluid, synovial fluid, synovial tissue, total leukocytes, urine, uvea, whole blood |
| assay | The technology used to generate the data in this file. For multimodal datasets with concomitant profiling of biospecimen, select all assays that apply. For example, the GEX files from a CITE-seq experiment should be labeled with both 'scRNASeq' and 'CITESeq'. | True | ASAPSeq, CE-MS, CITESeq, CosMX, CyTOF, GenePS SeqFISH, H&E, LC-MS/MS, NULISA, Olink Explore HT, Olink Flex, Olink Focus, Olink Reveal, Olink Target 48, Olink Target 96, RNASeq, SNP array, SomaScan, VDJSeq, Visium, WES, WGS, Xenium, feature barcode sequencing, flow cytometry, imaging mass cytometry, imaging mass spectrometry, kiloplex, multiplexed ELISA, scRNASeq, scVDJSeq, serial IHC, snATACSeq, snRNASeq |
| program | Name of the funding program that supported the generation of data and associated files. | True | AMP AIM, AMP RA/SLE, Community Contribution |
| datasetDescription | A Dataset-specific attribute specifying the synID of the folder that contains a wiki write-up of the dataset description. This wiki content will be surfaced on the ARK Portal frontend site. | True | |
| associatedDataset | The synID of a Dataset entity. This serves to link other Synapse entities to Dataset entities. When used to annotate a publication Dataset, this attribute should include the synID of each experimental Dataset from which the publication data was derived. Multiple synIDs can be specified using a comma-delimited list. | False | |
| Component | A high-level attribute for grouping attributes into templates. | True | |
| diagnosis | A high-level classifier indicating the disease status of an individual. | True | At-Risk RA, OA, RA, SLE, Sjogren's disease, control, cutaneous lupus erythematosus, dermatomyositis, lupus nephritis, psoriasis, psoriatic arthritis, scleroderma, unknown, vitiligo |
| datasetStatus | A categorical label indicating the status of an ARK Portal dataset. This is applied to improve downstream management of datasets as well as various ETL workflows. | True | deprecated, released, test, under peer review, unreleased |
| dataType | High-level classification of the type of data contained in the file, loosely related to the experimental method or biological entity that is being profiled. Select all that apply using a comma-delimited list, though in most cases only a single label is expected. For multimodal datasets with concomitant profiling of biospecimen, include 'multimodal'. | True | cytometry, epigenomics, genomics, histology, immune repertoire profiling, immunostaining, lipidomics, metabolomics, microbiome, multimodal, proteomics, transcriptomics |
| dataSubtype | General classification to differentiate between omics profiling modalities. If not applicable, select 'none'. Multiple selections can be provided in a comma-delimited list, although this is expected mainly for Datasets and files that contain integrated experimental data spanning multiple types. | True | bulk, none, pseudobulk, single-cell, single-nucleus, spatial |
| ARKRelease | A Dataset-specific attribute specifying the ARK Portal release in which this dataset was first made available to the public. | True | 1.0, 2.0, 2024.06.R1, 2024.07.R1, 2024.08.R1, 2024.09.R1, 2024.10.R1, 2024.12.R1, 2025.01.R1, 2025.02.R1, 2025.03.R1, 2025.04.R1, 2025.05.R1, 2025.06.R1, 2025.07.R1, 2025.08.R1, 2025.09.R1, 2025.10.R1, 2025.11.R1, 2025.12.R1 |
| programPhase | A label noting which AMP RA/SLE program phase generated the data. | True | I, II |
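
The rules in the table (required attributes, controlled vocabularies, comma-delimited multi-selection, and identifier formats) can be checked before annotations are applied. The following is a minimal sketch in Python, not the ARK Portal's actual validation tooling. It covers only a subset of attributes, and several things are assumptions made for illustration: the `TEMPLATE` dictionary, the `split_values` and `validate` helpers, the two regular expressions (inferred from the `dbgap:phs003417.v2.p1` example and the usual `syn` plus digits form of a synID), and the choice to treat every value as a comma-delimited string.

```python
import re

# Subset of the template above: attribute -> (required, valid values or None for free text).
# Hypothetical structure for illustration; extend it with the remaining rows as needed.
TEMPLATE = {
    "datasetType": (True, {"experimental", "publication"}),
    "program": (True, {"AMP AIM", "AMP RA/SLE", "Community Contribution"}),
    "species": (True, {"Homo sapiens"}),
    "dataSubtype": (True, {"bulk", "none", "pseudobulk", "single-cell",
                           "single-nucleus", "spatial"}),
    "programPhase": (True, {"I", "II"}),
    "dbGapAccession": (False, None),
    "associatedDataset": (False, None),
}

# Assumed formats, inferred from the examples in the table rather than from a specification.
DBGAP_CURIE = re.compile(r"dbgap:phs\d+\.v\d+\.p\d+")
SYN_ID = re.compile(r"syn\d+")


def split_values(raw):
    """Split a comma-delimited annotation string into trimmed, non-empty labels."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def validate(annotations):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, (required, valid) in TEMPLATE.items():
        values = split_values(annotations.get(name, ""))
        if not values:
            if required:
                problems.append(f"{name}: required attribute is missing")
            continue
        if valid is not None:
            # Check each selected label against the controlled vocabulary.
            problems += [f"{name}: '{v}' is not a valid value"
                         for v in values if v not in valid]
    for v in split_values(annotations.get("dbGapAccession", "")):
        if not DBGAP_CURIE.fullmatch(v):
            problems.append(f"dbGapAccession: '{v}' is not a compact URI")
    for v in split_values(annotations.get("associatedDataset", "")):
        if not SYN_ID.fullmatch(v):
            problems.append(f"associatedDataset: '{v}' is not a synID")
    return problems


example = {
    "datasetType": "publication",
    "program": "AMP RA/SLE",
    "species": "Homo sapiens",
    "dataSubtype": "single-cell, bulk",
    "programPhase": "II",
    "dbGapAccession": "dbgap:phs003417.v2.p1",
    "associatedDataset": "syn00000001, syn00000002",  # hypothetical synIDs
}
print(validate(example))  # prints []
```

Returning a list of problems rather than raising on the first failure lets a curator fix every issue in a single pass. Splitting on commas works here only because none of the valid values in this subset contain a comma.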