Data
Users can specify the data of each visualization (i.e., track) through a track.data property.
{
"tracks":[{
"data": {...}, // specify the data used in this track
"mark": "rect",
"color": ...,
...
}]
}
Supported Data Formats
For flexible data exploration, Gosling supports two kinds of datasets:
- Plain Datasets (No HiGlass Server): These datasets can be used directly in Gosling without any data preprocessing. They include CSV, JSON, BigWig, BAM, BED, GFF3, and VCF.
- Pre-aggregated Datasets (HiGlass Server): These datasets are preprocessed for scalable data exploration and require a HiGlass server to access them in Gosling. They include Vector, Multivec, and BEDDB. To learn more about preprocessing your data and setting up the server, please visit the HiGlass website.
CSV (No HiGlass Server)
Any sufficiently small tabular data file, such as TSV, CSV, BED, BEDPE, or GFF, can be loaded using the "csv" data specification.
{
"tracks": [
{
"data": {
"url": "https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv",
"type": "csv",
"chromosomeField": "Chromosome",
"genomicFields": ["chromStart", "chromEnd"]
},
...,
}]
}
property | type | description |
---|---|---|
url | string | Required. Specify the URL address of the data file. |
type | string | Required. Must be "csv" |
separator | string | Specify the file separator. Default: ',' |
sampleLength | number | Specify the number of rows loaded from the URL. Default: |
longToWideId | string | Experimental Property. |
headerNames | string[] | Specify the names of data fields if a CSV file does not contain a header. |
genomicFieldsToConvert | object[] | Experimental Property. Each object follows the format |
genomicFields | string[] | Specify the names of genomic data fields. |
chromosomePrefix | string | Specify the chromosome prefix if chromosomes are denoted using a prefix besides "chr" or a number. |
chromosomeField | string | Specify the name of the chromosome data field. |
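As a rough illustration of the options above, the sketch below describes a tab-separated file without a header row as a TypeScript object that could be serialized into a Gosling JSON specification. The URL and column names are hypothetical placeholders, and the property names `separator` and `headerNames` are assumed from the descriptions in the table above.

```typescript
// Hypothetical TSV file with no header row; the URL and column names are placeholders.
const tsvData = {
  type: "csv",
  url: "https://example.com/data/regions.tsv",
  separator: "\t", // assumed property name for the file separator
  headerNames: ["Chromosome", "chromStart", "chromEnd", "Name"], // assumed property name
  chromosomeField: "Chromosome",
  genomicFields: ["chromStart", "chromEnd"],
};

// Wrap the data definition in a track, mirroring the structure used on this page.
const tsvSpec = { tracks: [{ data: tsvData, mark: "rect" }] };

console.log(JSON.stringify(tsvSpec, null, 2));
```

Since the file has no header, the names listed in `headerNames` are the ones that `chromosomeField` and `genomicFields` refer to.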
GFF3 (No HiGlass Server)
This format allows for files that follow the GFF3 specification.
Currently, the GFF3 file must have an accompanying index file. If you do not have an index file for your GFF3 file, you can create one using tabix. Otherwise, you can treat the GFF3 file as if it were a CSV file and use the CSV data specification, but this will not be as performant for large files.
The field names correspond to the names of the columns. For example, the field which corresponds to the "start" column is called "start". The standard GFF fields are as follows: seq_id, source, type, start, end, score, strand, phase, and attributes.
Here is an example GFF3 file line:
U00096.3 Genbank gene 352706 354592 . + . Name=prpE;gbkey=Gene;gene=prpE;gene_biotype=protein_coding;gene_synonym=ECK0332,yahU;locus_tag=b0335
This will be parsed as the following:
{
seq_id: "U00096.3"
source: "Genbank"
type: "gene"
start: 352706
end: 354592
phase: null
strand: "+"
score: null
attributes: Object { Name: (1) […], gbkey: (1) […], Name: (1) […], … }
child_features: Array []
derived_features: Array []
}
If we include the option attributesToFields: [{attribute: "Name", defaultValue: "unknown"}], then the Name attribute will be included as a field:
{
Name: "prpE"
seq_id: "U00096.3"
source: "Genbank"
type: "gene"
start: 352706
end: 354592
phase: null
strand: "+"
score: null
attributes: Object { ID: (1) […], Dbxref: (2) […], Name: (1) […], … }
child_features: Array []
derived_features: Array []
}
This allows Name to be used as a field in Gosling to label features.
{
"tracks":[{
"data": {
"url": "https://s3.amazonaws.com/gosling-lang.org/data/gff/E_coli_MG1655.gff3.gz",
"indexUrl": "https://s3.amazonaws.com/gosling-lang.org/data/gff/E_coli_MG1655.gff3.gz.tbi",
"type": "gff"
},
"mark": "rect",
"x": {"field": "start"}, // example using one of the standard fields
"xe": {"field": "end"},
... // other configurations of this track
}]
}
Generic Feature Format Version 3 (GFF3) format data. It parses files that follow the [GFF3 specification](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md).
property | type | description |
---|---|---|
url | string | Required. URL link to the GFF file |
type | string | Required. Must be "gff" |
indexUrl | string | Required. URL link to the tabix index file |
sampleLength | number | The maximum number of samples to be shown on the track. Samples are uniformly randomly selected so that this threshold is not exceeded. Default: |
attributesToFields | object[] | Each object follows the format |
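To make the effect of attributesToFields more concrete, here is a minimal sketch of the flattening step described above. It is not Gosling's implementation: the `GffFeature` type and the `liftAttributes` function are hypothetical, and taking the first value of a multi-valued attribute is an assumption.

```typescript
// Hypothetical types for illustration; not Gosling's internal types.
interface AttributeToField {
  attribute: string;    // attribute name in the GFF3 "attributes" column
  defaultValue: string; // used when a feature lacks the attribute
}

interface GffFeature {
  seq_id: string;
  start: number;
  end: number;
  attributes: Record<string, string[]>;
}

// Copy the selected attributes to top-level fields.
// Assumption: only the first value of each attribute is used.
function liftAttributes(
  feature: GffFeature,
  attributesToFields: AttributeToField[]
): Record<string, unknown> {
  const lifted: Record<string, unknown> = {};
  for (const { attribute, defaultValue } of attributesToFields) {
    const values = feature.attributes[attribute];
    lifted[attribute] = values && values.length > 0 ? values[0] : defaultValue;
  }
  return { ...feature, ...lifted };
}

const prpE: GffFeature = {
  seq_id: "U00096.3",
  start: 352706,
  end: 354592,
  attributes: { Name: ["prpE"], gbkey: ["Gene"] },
};

console.log(liftAttributes(prpE, [{ attribute: "Name", defaultValue: "unknown" }]));
```

A feature without a Name attribute would receive the default value "unknown" in this sketch, so a label encoding would still have something to display.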
VCF (No HiGlass Server)
This format allows files that follow the VCF specification. Currently, only VCF files that have a corresponding index file are supported.
VCF file demo showing point mutations
{
"tracks":[{
"data": {
"url": "https://somatic-browser-test.s3.amazonaws.com/browserExamples/7a921087-8e62-4a93-a757-fd8cdbe1eb8f.consensus.20161006.somatic.indel.sorted.vcf.gz",
"indexUrl": "https://somatic-browser-test.s3.amazonaws.com/browserExamples/7a921087-8e62-4a93-a757-fd8cdbe1eb8f.consensus.20161006.somatic.indel.sorted.vcf.gz.tbi",
"type": "vcf",
"sampleLength": 5000
},
... // other configurations of this track
}]
}
The Variant Call Format (VCF).
property | type | description |
---|---|---|
url | string | Required. URL link to the VCF file |
type | string | Required. Must be "vcf" |
indexUrl | string | Required. URL link to the tabix index file |
sampleLength | number | The maximum number of rows to be loaded from the URL. Default: |
JSON (No HiGlass Server)
This format allows users to include data directly in Gosling's JSON specification.
For better rendering performance, we recommend using JSON only for small data (~100 rows). For larger data, consider using CSV or other file formats.
{
"tracks":[{
"data": {
"type": "json",
"chromosomeField": "Chromosome",
"genomicFields": [
"chromStart",
"chromEnd"
],
"values": [
{
"Chromosome": "chr1",
"chromStart": 0,
"chromEnd": 2300000,
"Name": "p36.33",
"Stain": "gneg"
},
{
"Chromosome": "chr1",
"chromStart": 2300000,
"chromEnd": 5300000,
"Name": "p36.32",
"Stain": "gpos25"
}, ...
]
},
... // other configurations of this track
}]
}
property | type | description |
---|---|---|
values | Datum[] | Required. Values in the form of JSON. |
type | string | Required. Must be "json" |
sampleLength | number | Specify the number of rows loaded from the URL. Default: |
genomicFieldsToConvert | object[] | Experimental Property. Each object follows the format |
genomicFields | string[] | Specify the names of genomic data fields. |
chromosomeField | string | Specify the name of the chromosome data field. |
The "genomicFieldsToConvert" property enables users to convert chromosome fields into genomic fields, which facilitates the creation of links between different chromosomes.
BigWig (No HiGlass Server)
{
"tracks":[{
"data": {
"url": "https://s3.amazonaws.com/gosling-lang.org/data/4DNFIMPI5A9N.bw",
"type": "bigwig",
"column": "position",
"value": "peak"
},
... // other configurations of this track
}]
}
property | type | description |
---|---|---|
url | string | Required. Specify the URL address of the data file. |
type | string | Required. Must be "bigwig" |
value | string | Assign a field name of quantitative values. Default: |
start | string | Assign a field name of the start position of genomic intervals. Default: |
end | string | Assign a field name of the end position of genomic intervals. Default: |
column | string | Assign a field name of the middle position of genomic intervals. Default: |
binSize | number | Binning the genomic interval in tiles (unit size: 256). |
aggregation | string | One of |
BAM (No HiGlass Server)
Binary Alignment Map (BAM) is the comprehensive raw data of genome sequencing; it consists of the lossless, compressed binary representation of Sequence Alignment Map (SAM) files.
property | type | description |
---|---|---|
url | string | Required. URL link to the BAM data file |
type | string | Required. Must be "bam" |
indexUrl | string | Required. URL link to the index file of the BAM file |
maxInsertSize | number | Determines the threshold of insert sizes for determining the structural variants. Default: |
loadMates | boolean | Load mates that are located in the same chromosome. Default: |
junctionMinCoverage | number | Determine the threshold of coverage when extracting exon-to-exon junctions. Default: |
extractJunction | boolean | Determine whether to extract exon-to-exon junctions. Default: |
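Unlike the other formats, this section has no example, so the following is a minimal sketch of what a BAM data definition might look like, written as a TypeScript object that can be serialized to JSON. The URLs are hypothetical placeholders, and the property names `url`, `indexUrl`, and `type` are assumed to match those used by the other indexed formats on this page.

```typescript
// Hypothetical URLs; replace them with the locations of your own BAM file and its index.
const bamData = {
  type: "bam",
  url: "https://example.com/alignments/sample.bam",
  indexUrl: "https://example.com/alignments/sample.bam.bai",
};

// Only the data definition is shown; other track configurations are omitted.
const bamSpec = { tracks: [{ data: bamData }] };

console.log(JSON.stringify(bamSpec, null, 2));
```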
BED (No HiGlass Server)
This format allows for BED files that follow the BED specification to be used.
There are 12 standard fields (chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, and blockStarts).
The first three fields (chrom, chromStart, chromEnd) are required. Custom fields, if specified, cannot rename the first three fields.
Currently, the BED file must have an accompanying index file. If you do not have an index file for your BED file, you can create one using tabix. Otherwise, you can treat the BED file as if it were a CSV file and use the CSV data specification, but this will not be as performant for large files.
{
"tracks":[{
"data": {
"url": "https://s3.amazonaws.com/gosling-lang.org/data/bed/chr1_CDS_BED12.bed.gz",
"indexUrl": "https://s3.amazonaws.com/gosling-lang.org/data/bed/chr1_CDS_BED12.bed.gz.tbi",
"type": "bed"
},
"mark": "rect",
"x": {"field": "chromStart", "type": "genomic"}, // example using one of the standard fields
"xe": {"field": "chromEnd", "type": "genomic"},
... // other configurations of this track
}]
}
BED file format
property | type | description |
---|---|---|
url | string | Required. Specify the URL address of the data file. |
type | string | Required. Must be "bed" |
indexUrl | string | Required. Specify the URL address of the data file index. |
sampleLength | number | Specify the number of rows loaded from the URL. Default: |
customFields | string[] | An array of strings, where each string is the name of a non-standard field in the BED file. If there are |
Vector (Require HiGlass Server)
One-dimensional quantitative values along genomic position (e.g., bigwig) can be converted into HiGlass' "vector" format data. Find out more about this format at HiGlass Docs.
{
"tracks":[{
"data": {
"url": "https://resgen.io/api/v1/tileset_info/?d=VLFaiSVjTjW6mkbjRjWREA",
"type": "vector",
"column": "position",
"value": "peak"
},
... // other configurations of this track
}]
}
property | type | description |
---|---|---|
url | string | Required. Specify the URL address of the data file. |
type | string | Required. Must be "vector" |
value | string | Assign a field name of quantitative values. Default: |
start | string | Assign a field name of the start position of genomic intervals. Default: |
end | string | Assign a field name of the end position of genomic intervals. Default: |
column | string | Assign a field name of the middle position of genomic intervals. Default: |
binSize | number | Binning the genomic interval in tiles (unit size: 256). |
aggregation | string | One of |
Multivec (Require HiGlass Server)
Two-dimensional quantitative values, one axis for genomic coordinate and the other for different samples, can be converted into HiGlass' "multivec" data. For example, multiple BigWig files can be converted into a single multivec file. You can also convert sequence data (FASTA) into this format where rows will be different nucleotide bases (e.g., A, T, G, C) and quantitative values represent the frequency. Find out more about this format at HiGlass Docs.
{
"tracks":[{
"data": {
"url": "https://resgen.io/api/v1/tileset_info/?d=UvVPeLHuRDiYA3qwFlm7xQ",
"type": "multivec",
"row": "sample",
"column": "position",
"value": "peak",
"categories": ["sample 1", "sample 2", "sample 3", "sample 4"]
},
...// other configurations of this track
}]
}
property | type | description |
---|---|---|
url | string | Required. Specify the URL address of the data file. |
type | string | Required. Must be "multivec" |
value | string | Assign a field name of quantitative values. Default: |
start | string | Assign a field name of the start position of genomic intervals. Default: |
row | string | Assign a field name of samples. Default: |
end | string | Assign a field name of the end position of genomic intervals. Default: |
column | string | Assign a field name of the middle position of genomic intervals. Default: |
categories | string[] | Assign names of individual samples. |
binSize | number | Binning the genomic interval in tiles (unit size: 256). |
aggregation | string | One of |
BEDDB (Require HiGlass Server)
Regular BED, or similar, files can be pre-aggregated for scalable data exploration. Find out more about this format at HiGlass Docs.
{
"tracks":[{
"data": {
"url": "https://higlass.io/api/v1/tileset_info/?d=OHJakQICQD6gTD7skx4EWA",
"type": "beddb",
"genomicFields": [
{"index": 1, "name": "start"},
{"index": 2, "name": "end"}
],
"valueFields": [
{"index": 5, "name": "strand", "type": "nominal"},
{"index": 3, "name": "name", "type": "nominal"}
],
"exonIntervalFields": [
{"index": 12, "name": "start"},
{"index": 13, "name": "end"}
]
},
... // other configurations of this track
}]
}
property | type | description |
---|---|---|
url | string | Required. Specify the URL address of the data file. |
type | string | Required. Must be "beddb" |
genomicFields | object[] | Required. Each object follows the format |
valueFields | object[] | Each object follows the format |
exonIntervalFields | [object, object] | Experimental Property. |
Data Transform
Gosling supports a diverse set of data transforms, including Filter Transform, Str Concat Transform, Str Replace Transform, Log Transform, Displace Transform, Exon Split Transform, Genomic Length Transform, Sv Type Transform, Coverage Transform, and Json Parse Transform.
{
"tracks":[{
"data": ...,
// a list of data transforms can be applied to the data
"dataTransform": [
{ "type": "filter", "field": "type", "oneOf": ["gene"] },
{ "type": "filter", "field": "strand", "oneOf": ["+"], "not": true }
],
"mark": "rect",
...,
}]
}
Filter Transform
Users can apply three types of filters: oneOf, inRange, and include.
Each filter transform has the following properties:
Properties of One Of Filter
property | type | description |
---|---|---|
type | string | Required. Must be "filter" |
oneOf | array | Required. Check whether the value is an element in the provided list. |
field | string | Required. A filter is applied based on the values of the specified data field |
not | boolean | when |
Properties of In Range Filter
property | type | description |
---|---|---|
type | string | Required. Must be "filter" |
inRange | number[] | Required. Check whether the value is in a number range. |
field | string | Required. A filter is applied based on the values of the specified data field |
not | boolean | when |
Properties of Include Filter
property | type | description |
---|---|---|
type | string | Required. Must be "filter" |
include | string | Required. Check whether the value includes a substring. |
field | string | Required. A filter is applied based on the values of the specified data field |
not | boolean | when |
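The three filter types can be sketched as a small predicate function. This is an illustrative approximation rather than Gosling's implementation: the `FilterRow` and `FilterSpec` types and both function names are hypothetical, and treating the inRange bounds as inclusive is an assumption.

```typescript
// Hypothetical row and filter types for illustration; not Gosling's internal types.
type FilterRow = Record<string, string | number>;

type FilterSpec =
  | { type: "filter"; field: string; oneOf: (string | number)[]; not?: boolean }
  | { type: "filter"; field: string; inRange: [number, number]; not?: boolean }
  | { type: "filter"; field: string; include: string; not?: boolean };

// Decide whether a single row passes a single filter.
function passesFilter(row: FilterRow, filter: FilterSpec): boolean {
  const value = row[filter.field];
  let hit: boolean;
  if ("oneOf" in filter) {
    hit = filter.oneOf.indexOf(value) !== -1;
  } else if ("inRange" in filter) {
    // Assumption: both ends of the range are inclusive.
    const n = Number(value);
    hit = n >= filter.inRange[0] && n <= filter.inRange[1];
  } else {
    hit = String(value).indexOf(filter.include) !== -1;
  }
  // "not": true inverts the result of the filter.
  return filter.not ? !hit : hit;
}

// Apply filters in order; a row is kept only if it passes every filter.
function applyFilters(rows: FilterRow[], filters: FilterSpec[]): FilterRow[] {
  return rows.filter((row) => filters.every((f) => passesFilter(row, f)));
}

// Same filters as the dataTransform example above: genes that are not on the "+" strand.
const keptRows = applyFilters(
  [
    { type: "gene", strand: "+" },
    { type: "gene", strand: "-" },
    { type: "exon", strand: "-" },
  ],
  [
    { type: "filter", field: "type", oneOf: ["gene"] },
    { type: "filter", field: "strand", oneOf: ["+"], not: true },
  ]
);
console.log(keptRows);
```

Listing several filters in dataTransform narrows the data step by step, which is why the sketch keeps only rows that pass all of them.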
Str Concat Transform
property | type | description |
---|---|---|
type | string | Required. Must be "concat" |
separator | string | Required. |
newField | string | Required. |
fields | string[] | Required. |
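Because the table above gives no descriptions, the following sketch shows one plausible reading of a string concatenation transform: the values of several fields are joined with a separator and stored in a new field. The property names `fields`, `newField`, and `separator`, the type value "concat", and the function itself are assumptions for illustration, not a confirmed description of Gosling's behavior.

```typescript
// Hypothetical row type for illustration.
type ConcatRow = Record<string, string | number>;

// Assumed shape of the transform; property names are not confirmed by the table above.
interface StrConcatSketch {
  type: "concat";
  fields: string[];  // fields whose values are joined, in order
  newField: string;  // field that receives the joined string
  separator: string; // string placed between the values
}

// Return new rows with the concatenated value added; input rows are left unchanged.
function strConcat(rows: ConcatRow[], t: StrConcatSketch): ConcatRow[] {
  return rows.map((row) => {
    const joined = t.fields.map((f) => String(row[f])).join(t.separator);
    return { ...row, [t.newField]: joined };
  });
}

const labeledRows = strConcat(
  [{ Chromosome: "chr1", Name: "p36.33" }],
  { type: "concat", fields: ["Chromosome", "Name"], newField: "label", separator: ":" }
);
console.log(labeledRows);
```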
Str Replace Transform
property | type | description |
---|---|---|
type | string | Required. Must be "replace" |
replace | object[] | Required. Each object follows the format |
newField | string | Required. |
field | string | Required. |
Log Transform
property | type | description |
---|---|---|
type | string | Required. Must be "log" |
field | string | Required. |
newField | string | If specified, store transformed values in a new field. |
base | number \| string | If not specified, 10 is used. |
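A log transform with an optional base and an optional output field can be sketched as follows. This is not Gosling's implementation: the property names `field`, `newField`, and `base` are assumed from the descriptions above, overwriting the source field when no new field is given is an assumption, and so is the use of the string "e" to request the natural logarithm.

```typescript
// Hypothetical row type for illustration.
type LogRow = Record<string, string | number>;

// Assumed shape of the transform; "e" as a string base is an assumption.
interface LogSketch {
  type: "log";
  field: string;
  newField?: string;
  base?: number | "e";
}

function logTransform(rows: LogRow[], t: LogSketch): LogRow[] {
  const base = t.base === undefined ? 10 : t.base; // 10 is used if not specified
  const target = t.newField === undefined ? t.field : t.newField;
  return rows.map((row) => {
    const x = Number(row[t.field]);
    // Change of base: log_b(x) = ln(x) / ln(b)
    const y = base === "e" ? Math.log(x) : Math.log(x) / Math.log(base);
    return { ...row, [target]: y };
  });
}

const loggedRows = logTransform([{ peak: 8 }], {
  type: "log",
  field: "peak",
  base: 2,
  newField: "peakLog2",
});
console.log(loggedRows);
```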
Displace Transform
property | type | description |
---|---|---|
type | string | Required. Must be "displace" |
newField | string | Required. |
method | string | Required. One of |
boundingBox | BoundingBox | Required. |
maxRows | number | Specify maximum rows to be generated (default has no limit). |
Exon Split Transform
property | type | description |
---|---|---|
type | string | Required. Must be "exonSplit" |
separator | string | Required. |
flag | object | Required. Each object follows the format |
fields | object[] | Required. Each object follows the format |
Coverage Transform
Aggregate rows and calculate coverage.
property | type | description |
---|---|---|
type | string | Required. Must be "coverage" |
startField | string | Required. |
endField | string | Required. |
newField | string | |
groupField | string | The name of a nominal field to group rows by prior to piling up. |
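One common way to calculate coverage from a set of intervals is to sweep over their sorted start and end positions while tracking how many intervals are open. The sketch below illustrates that general idea only; it is not Gosling's implementation, and the `Interval` and `CoverageSegment` types, the half-open interval convention, and the output field names are all hypothetical.

```typescript
// Hypothetical types for illustration. Intervals are treated as half-open: [start, end).
interface Interval { start: number; end: number; }
interface CoverageSegment { start: number; end: number; coverage: number; }

function coverageSegments(intervals: Interval[]): CoverageSegment[] {
  // Each interval contributes +1 at its start and -1 at its end.
  const events: [number, number][] = [];
  for (const iv of intervals) {
    events.push([iv.start, 1]);
    events.push([iv.end, -1]);
  }
  // Sort by position; at equal positions, process closings before openings.
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  const segments: CoverageSegment[] = [];
  let depth = 0; // number of intervals currently open
  let prev = 0;  // position of the previous event
  for (const [pos, delta] of events) {
    if (depth > 0 && pos > prev) {
      segments.push({ start: prev, end: pos, coverage: depth });
    }
    depth += delta;
    prev = pos;
  }
  return segments;
}

console.log(coverageSegments([{ start: 0, end: 10 }, { start: 5, end: 15 }]));
```

In this sketch, two overlapping intervals produce three segments, with the overlapping region carrying a coverage of 2.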
JSON Parse Transform
Parse JSON Object Array and append vertically
property | type | description |
---|---|---|
type | string | Required. Must be "subjson" |
genomicLengthField | string | Required. Length of genomic interval. |
genomicField | string | Required. Relative genomic position to parse. |
field | string | Required. The field that contains the JSON object array. |
baseGenomicField | string | Required. Base genomic position when parsing relative position. |
Apart from these data transforms, users can also aggregate data values (min, max, bin, mean, and count). Read more about data aggregation.
Types
Type: Datum
property | type | description |
---|---|---|
stringKey | number \| string | Values in the form of JSON. |
Type: BoundingBox
property | type | description |
---|---|---|
startField | string | Required. The name of a quantitative field that represents the start position. |
endField | string | Required. The name of a quantitative field that represents the end position. |
padding | number | The padding around visual elements, in either px or bp. |
isPaddingBP | boolean | Whether to consider |
groupField | string | The name of a nominal field to group rows by prior to piling up. |