Version: 1.0.0

Public Datasets

To help using Gosling for the first time, this document provides some useful datasets that can be directly used in Gosling.

For the flexible data exploration, Gosling supports two different kinds of datasets:

Plain Datasets: For the convenience, Gosling allows to use several data formats directly in the system without requiring to preprocess data or set up a dedicated server (i.e., HiGlass server).

Pre-aggregated Datasets: To allow scalable data exploration, Gosling supports using HiGlass' preprocessed datasets which requires the dedicated HiGlass server.

In this document, we provide a list of public datasets in those two classes. For the pre-aggregated ones listed below, we provide them in our Gosling's server, so you do not need to set up your own HiGlass server to visualize them.

Tip. When you are using online editor, you can check out a "Data Preview" panel on the right-bottom part of the interface to see how the actual data looks like.

List of Plain Datasets

Type: CSV
URL: https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv
Description: hg38 cytoband
Soruce: UCSC

Type: BigWig
URL: https://s3.amazonaws.com/gosling-lang.org/data/4DNFIMPI5A9N.bw
Source: 4DN (https://data.4dnucleome.org/files-processed/4DNFIMPI5A9N/#file-overview)

Type: CSV
URL: https://s3.amazonaws.com/gosling-lang.org/data/COVID/NC_045512.2-Genes.csv
Description: Gene Annotations of SARS-CoV-2 in NC_045512.2
Source: https://genome.ucsc.edu/covid19.html

Type: CSV
URL: https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/circos-segdup-edited.txt
Description: Circos Segmental Duplication
Source: https://github.com/nicgirault/circosJS/blob/master/demo/data/segdup.csv

List of Pre-aggregated Datasets

Type: BEDDB (Originally, a bed file)
URL: https://server.gosling-lang.org/api/v1/tileset_info/?d=gene-annotation
Description: hg38 gene annotation
Source: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/

Type: Multivec (Originally, multiple bigwig files)
URL: https://server.gosling-lang.org/api/v1/tileset_info/?d=cistrome-multivec
Description: Multiple samples of ChIP-seq and DNase-seq analysis
Source: Cistrome Browser (http://cistrome.org/db/#/)

Type: Multivec (Originally, a FASTA file)
URL: https://server.gosling-lang.org/api/v1/tileset_info/?d=sequence-multivec
Description: hg38 sequence
Source: UCSC (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/)

Type: BEDDB (Originally, a VCF file)
URL: https://server.gosling-lang.org/api/v1/tileset_info/?d=clinvar-beddb
Description: hg38 ClinVar data
Source: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/

Type: Multivec (Originally, a VCF file)
URL: https://server.gosling-lang.org/api/v1/tileset_info/?d=clinvar-multivec
Description: Density of ClinVar data by seven pathogenicity categories
Source: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/

categories: [
    'Benign',
    'Benign/Likely_benign',
    'Likely_benign',
    'Uncertain_significance',
    'Likely_pathogenic',
    'Pathogenic/Likely_pathogenic',
    'Pathogenic',
    'risk_factor',
    'Conflicting_interpretations_of_pathogenicity'
]

List of Plain Datasets​

List of Pre-aggregated Datasets​

List of Plain Datasets

List of Pre-aggregated Datasets