Version: 0.17.1

Data

Users can specify the data of each visualization (i.e., track) through a track.data property.

{
"tracks":[{
"data": {...}, // specify the data used in this track
"mark": "rect",
"color": ...,
...
}]
}

Supported Data Formats

For flexible data exploration, Gosling supports two different kinds of datasets:

  1. Plain Datasets (No HiGlass Server): These datasets can be used directly in Gosling without any data preprocessing. They include CSV, JSON, BigWig, BAM, BED, GFF3, and VCF.

  2. Pre-aggregated Datasets (HiGlass Server): These datasets are preprocessed for scalable data exploration and require a HiGlass server to access them in Gosling. They include Vector, Multivec, and BEDDB. To learn more about preprocessing your data and setting up the server, please visit the HiGlass website.

CSV (No HiGlass Server)

Any sufficiently small tabular data file, such as TSV, CSV, BED, BEDPE, or GFF, can be loaded using the "csv" data specification.

{
"tracks": [
{
"data": {
"url": "https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv",
"type": "csv",
"chromosomeField": "Chromosome",
"genomicFields": ["chromStart", "chromEnd"]
},
...,
}]
}

property type description

url

string

Required. Specify the URL address of the data file.

type

string

Required. Must be "csv".

separator

string

Specify the file separator. Default: ','

sampleLength

number

Specify the number of rows loaded from the URL. Default: 1000

longToWideId

string

Experimental property.

headerNames

string[]

Specify the names of data fields if a CSV file does not contain a header.

genomicFieldsToConvert

object[]

Experimental property. Each object follows the format {"chromosomeField": "string", "genomicFields": "string[]"}.

genomicFields

string[]

Specify the names of the genomic data fields.

chromosomePrefix

string

Specify the chromosome prefix if chromosomes are denoted using a prefix other than "chr" or a number.

chromosomeField

string

Specify the name of the chromosome data field.
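The separator, headerNames, and sampleLength options can be illustrated with a small TypeScript sketch of a CSV reader. This is not Gosling's loader. It does not handle quoted fields, and it keeps the first sampleLength rows, whereas Gosling may sample rows differently. The names CsvOptions and parseCsv are hypothetical.

```typescript
// Hypothetical subset of the CSV data properties described above.
interface CsvOptions {
  separator?: string; // default ","
  headerNames?: string[]; // field names when the file has no header row
  sampleLength?: number; // default 1000
}

type CsvRow = Record<string, string>;

// Simplified reader: no quoting or escaping; keeps the first `sampleLength` rows.
function parseCsv(text: string, opts: CsvOptions = {}): CsvRow[] {
  const sep = opts.separator ?? ",";
  const limit = opts.sampleLength ?? 1000;
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  if (lines.length === 0) return [];
  // Without headerNames, the first line is treated as the header.
  const header = opts.headerNames ?? lines[0].split(sep);
  const body = opts.headerNames ? lines : lines.slice(1);
  return body.slice(0, limit).map((l) => {
    const cells = l.split(sep);
    const row: CsvRow = {};
    header.forEach((field, i) => {
      row[field] = cells[i] ?? "";
    });
    return row;
  });
}

// Headerless, tab-separated cytoband rows (values taken from the JSON example on this page).
const tsv = "chr1\t0\t2300000\tp36.33\nchr1\t2300000\t5300000\tp36.32";
const rows = parseCsv(tsv, {
  separator: "\t",
  headerNames: ["Chromosome", "chromStart", "chromEnd", "Name"],
});
console.log(rows[1].Name); // p36.32
```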

GFF3 (No HiGlass Server)

This format allows for files that follow the GFF3 specification.


Currently, the GFF3 file must have an accompanying index file. If you do not have an index file for your GFF3 file, you can create one using tabix. Otherwise, you can treat the GFF3 file as if it were a CSV file and use the CSV data specification, but this will not be as performant for large files.

The field names correspond to the names of the columns. For example, the field which corresponds to the "start" column is called "start". The standard GFF fields are as follows: seq_id, source, type, start, end, score, strand, phase, and attributes.

Here is an example GFF3 file line:

U00096.3	Genbank	gene	352706	354592	.	+	.	Name=prpE;gbkey=Gene;gene=prpE;gene_biotype=protein_coding;gene_synonym=ECK0332,yahU;locus_tag=b0335

This will be parsed as the following:

{
  seq_id: "U00096.3",
  source: "Genbank",
  type: "gene",
  start: 352706,
  end: 354592,
  phase: null,
  strand: "+",
  score: null,
  attributes: { Name: ["prpE"], gbkey: ["Gene"], gene: ["prpE"], … },
  child_features: [],
  derived_features: []
}

If we include the option attributesToFields: [{attribute: "Name", defaultValue: "unknown"}], then the Name attribute will be included as a field:

{
  Name: "prpE",
  seq_id: "U00096.3",
  source: "Genbank",
  type: "gene",
  start: 352706,
  end: 354592,
  phase: null,
  strand: "+",
  score: null,
  attributes: { Name: ["prpE"], gbkey: ["Gene"], gene: ["prpE"], … },
  child_features: [],
  derived_features: []
}

This allows Name to be used as a field in Gosling to label features.
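The column-to-field mapping described above can be sketched in TypeScript. The helper parseGffLine is hypothetical and is not the parser Gosling uses. It assumes one well-formed line with nine tab-separated columns, maps "." to null, and splits attribute values on commas. It ignores percent-encoding and the child_features / derived_features relationships.

```typescript
interface GffFeature {
  seq_id: string;
  source: string;
  type: string;
  start: number;
  end: number;
  score: number | null;
  strand: string | null;
  phase: number | null;
  attributes: Record<string, string[]>;
}

// GFF3 uses "." for an empty column.
function orNull(s: string): string | null {
  return s === "." ? null : s;
}

function parseGffLine(line: string): GffFeature {
  const cols = line.split("\t");
  if (cols.length !== 9) throw new Error(`expected 9 columns, got ${cols.length}`);
  const [seqId, source, type, start, end, score, strand, phase, attrs] = cols;
  // Column 9: tag=value pairs separated by ";", multiple values separated by ",".
  const attributes: Record<string, string[]> = {};
  for (const pair of attrs.split(";")) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skips "." and malformed pairs
    attributes[pair.slice(0, eq)] = pair.slice(eq + 1).split(",");
  }
  const scoreText = orNull(score);
  const phaseText = orNull(phase);
  return {
    seq_id: seqId,
    source,
    type,
    start: Number(start),
    end: Number(end),
    score: scoreText === null ? null : Number(scoreText),
    strand: orNull(strand),
    phase: phaseText === null ? null : Number(phaseText),
    attributes,
  };
}

// The example GFF3 line from above, rebuilt with explicit tabs.
const line = [
  "U00096.3", "Genbank", "gene", "352706", "354592", ".", "+", ".",
  "Name=prpE;gbkey=Gene;gene=prpE;gene_biotype=protein_coding;gene_synonym=ECK0332,yahU;locus_tag=b0335",
].join("\t");
const feature = parseGffLine(line);
console.log(feature.start, feature.strand); // 352706 +
```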

{
"tracks":[{
"data": {
"url": "https://s3.amazonaws.com/gosling-lang.org/data/gff/E_coli_MG1655.gff3.gz",
"indexUrl": "https://s3.amazonaws.com/gosling-lang.org/data/gff/E_coli_MG1655.gff3.gz.tbi",
"type": "gff"
},
"mark": "rect",
"x": {"field": "start"}, // example using one of the standard fields
"xe": {"field": "end"},
... // other configurations of this track
}]
}

Generic Feature Format Version 3 (GFF3) data. Gosling parses files that follow the GFF3 specification (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md).

property type description

url

string

Required. URL link to the GFF file.

type

string

Required. Must be "gff".

indexUrl

string

Required. URL link to the tabix index file.

sampleLength

number

The maximum number of samples to be shown on the track. Samples are uniformly randomly selected so that this threshold is not exceeded. Default: 1000

attributesToFields

object[]

Each object follows the format {"attribute": "string", "defaultValue": "string"}. Specifies which attributes to include as fields. GFF files have an "attributes" column that contains a list of attributes, each of which is a tag-value pair (tag=value). This option makes specific attributes accessible as fields. For example, if you have an attribute called "gene_name" and you want to label features on your track using its values, you can use this option so that "field": "gene_name" can be used in the schema. If a single value corresponds to the tag, Gosling parses that value as a string. If multiple values correspond to the tag, Gosling parses them as a comma-separated list string. If a feature does not have a particular attribute, the attribute value is set to the defaultValue.
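The three rules in the attributesToFields description (one value, several values, missing attribute) can be expressed as a short TypeScript sketch. The function name is hypothetical. Attribute values are assumed to arrive already split into arrays, as in the parsed output shown earlier, and joining with a bare "," is an assumption about the exact separator.

```typescript
interface AttributeToField {
  attribute: string;
  defaultValue: string;
}

// Parsed GFF attributes: tag -> list of values (absent tags are undefined).
type GffAttributes = Record<string, string[] | undefined>;

// One value -> that string; several values -> comma-separated string;
// missing attribute -> defaultValue.
function attributesToFields(
  attributes: GffAttributes,
  options: AttributeToField[],
): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const { attribute, defaultValue } of options) {
    const values = attributes[attribute];
    fields[attribute] = values && values.length > 0 ? values.join(",") : defaultValue;
  }
  return fields;
}

const extra = attributesToFields(
  { Name: ["prpE"], gene_synonym: ["ECK0332", "yahU"] },
  [
    { attribute: "Name", defaultValue: "unknown" },
    { attribute: "gene_synonym", defaultValue: "" },
    { attribute: "Note", defaultValue: "unknown" }, // not present on this feature
  ],
);
console.log(extra.Name, extra.Note); // prpE unknown
```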

VCF (No HiGlass Server)

This format allows files that follow the VCF specification. Currently, only VCF files that have a corresponding index file are supported.


{
"tracks":[{
"data": {
"url": "https://somatic-browser-test.s3.amazonaws.com/browserExamples/7a921087-8e62-4a93-a757-fd8cdbe1eb8f.consensus.20161006.somatic.indel.sorted.vcf.gz",
"indexUrl": "https://somatic-browser-test.s3.amazonaws.com/browserExamples/7a921087-8e62-4a93-a757-fd8cdbe1eb8f.consensus.20161006.somatic.indel.sorted.vcf.gz.tbi",
"type": "vcf",
"sampleLength": 5000
},
... // other configurations of this track
}]
}

The Variant Call Format (VCF).

property type description

url

string

Required. URL link to the VCF file.

type

string

Required. Must be "vcf".

indexUrl

string

Required. URL link to the tabix index file.

sampleLength

number

The maximum number of rows to be loaded from the URL. Default: 1000

JSON (No HiGlass Server)

This format allows users to include data directly in Gosling's JSON specification.

Caution: For better rendering performance, we recommend using JSON only for small data (~100 rows). For larger data, consider using CSV or other file formats.

{
"tracks":[{
"data": {
"type": "json",
"chromosomeField": "Chromosome",
"genomicFields": [
"chromStart",
"chromEnd"
],
"values": [
{
"Chromosome": "chr1",
"chromStart": 0,
"chromEnd": 2300000,
"Name": "p36.33",
"Stain": "gneg"
},
{
"Chromosome": "chr1",
"chromStart": 2300000,
"chromEnd": 5300000,
"Name": "p36.32",
"Stain": "gpos25"
}, ...
]
},
... // other configurations of this track
}]
}
property type description

values

Datum[]

Required. Values in the form of JSON.

type

string

Required. Must be "json". Define data type.

sampleLength

number

Specify the number of rows to be loaded. Default: 1000

genomicFieldsToConvert

object[]

Experimental property. Each object follows the format {"chromosomeField": "string", "genomicFields": "string[]"}.

genomicFields

string[]

Specify the names of the genomic data fields.

chromosomeField

string

Specify the name of the chromosome data field.

The property "genomicFieldsToConvert" enables users to convert chromosome fields into genomic fields, which facilitates the creation of links between various chromosomes.
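Links between chromosomes are easiest to draw when every position sits on one genome-wide axis. The sketch below shows one common way to build that axis: it adds the cumulative length of all preceding chromosomes to a chromosome-relative position. We assume this is roughly what the conversion amounts to; it is not Gosling's code. The chromosome order and the tiny sizes are hypothetical toy values, not a real assembly.

```typescript
// Hypothetical toy assembly: chromosome order and sizes (not real values).
const chromSizes: [string, number][] = [
  ["chr1", 1000],
  ["chr2", 800],
  ["chr3", 500],
];

// Offset of each chromosome on one genome-wide axis.
function cumulativeOffsets(sizes: [string, number][]): Record<string, number> {
  const offsets: Record<string, number> = {};
  let total = 0;
  for (const [chrom, size] of sizes) {
    offsets[chrom] = total;
    total += size;
  }
  return offsets;
}

type GenomicRow = Record<string, string | number>;

// Replace chromosome-relative positions in `genomicFields` with genome-wide ones.
function toAbsolute(
  rows: GenomicRow[],
  chromosomeField: string,
  genomicFields: string[],
  offsets: Record<string, number>,
): GenomicRow[] {
  return rows.map((row) => {
    const chrom = String(row[chromosomeField]);
    const offset = offsets[chrom];
    if (offset === undefined) throw new Error(`unknown chromosome: ${chrom}`);
    const out: GenomicRow = { ...row };
    for (const field of genomicFields) out[field] = offset + Number(row[field]);
    return out;
  });
}

const absRows = toAbsolute(
  [
    { chr: "chr1", start: 100, end: 200 },
    { chr: "chr2", start: 50, end: 60 },
  ],
  "chr",
  ["start", "end"],
  cumulativeOffsets(chromSizes),
);
console.log(absRows[1].start); // 1050
```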

BigWig (No HiGlass Server)

{
"tracks":[{
"data": {
"url": "https://s3.amazonaws.com/gosling-lang.org/data/4DNFIMPI5A9N.bw",
"type": "bigwig",
"column": "position",
"value": "peak"
},
... // other configurations of this track
}]
}
property type description

url

string

Required. Specify the URL address of the data file.

type

string

Required. Must be "bigwig".

value

string

Assign the field name for quantitative values. Default: "value"

start

string

Assign the field name for the start position of genomic intervals. Default: "start"

end

string

Assign the field name for the end position of genomic intervals. Default: "end"

column

string

Assign the field name for the middle position of genomic intervals. Default: "position"

binSize

number

Bin the genomic intervals in tiles (unit size: 256).

aggregation

string

One of "mean" or "sum". Determine the aggregation function to apply within bins. Default: "mean"
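The binSize and aggregation options combine neighboring values into coarser bins. The same options also appear for Vector and Multivec data. The sketch below shows the idea on a plain array: it groups every binSize consecutive values and reduces each group with "mean" or "sum". How Gosling maps binSize onto its 256-unit tiles is not modeled here, and aggregateBins is a hypothetical name.

```typescript
type Aggregation = "mean" | "sum";

// Reduce each run of `binSize` consecutive values to one value.
// The last bin is shorter when values.length is not a multiple of binSize.
function aggregateBins(values: number[], binSize: number, aggregation: Aggregation = "mean"): number[] {
  if (binSize < 1 || Math.floor(binSize) !== binSize) {
    throw new Error("binSize must be a positive integer");
  }
  const out: number[] = [];
  for (let i = 0; i < values.length; i += binSize) {
    const bin = values.slice(i, i + binSize);
    const sum = bin.reduce((acc, v) => acc + v, 0);
    out.push(aggregation === "sum" ? sum : sum / bin.length);
  }
  return out;
}

console.log(aggregateBins([1, 3, 5, 7, 9], 2).join(", ")); // 2, 6, 9
console.log(aggregateBins([1, 3, 5, 7, 9], 2, "sum").join(", ")); // 4, 12, 9
```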

BAM (No HiGlass Server)

Binary Alignment Map (BAM) files hold comprehensive raw genome sequencing data; a BAM file is the lossless, compressed binary representation of a Sequence Alignment Map (SAM) file.

property type description

url

string

Required. URL link to the BAM data file.

type

string

Required. Must be "bam".

indexUrl

string

Required. URL link to the index file of the BAM file.

maxInsertSize

number

The insert-size threshold used to determine structural variants. Default: 5000

loadMates

boolean

Load mates that are located on the same chromosome. Default: false

junctionMinCoverage

number

The minimum coverage threshold used when extracting exon-to-exon junctions. Default: 1

extractJunction

boolean

Determine whether to extract exon-to-exon junctions. Default: false

BED (No HiGlass Server)

This format allows BED files that follow the BED specification to be used. There are 12 standard fields (chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, and blockStarts). The first three fields (chrom, chromStart, chromEnd) are required. Custom fields cannot be used to rename these first three fields.

Currently, the BED file must have an accompanying index file. If you do not have an index file for your BED file, you can create one using tabix. Otherwise, you can treat the BED file as if it were a CSV file and use the CSV data specification, but this will not be as performant for large files.


{
"tracks":[{
"data": {
"url": "https://s3.amazonaws.com/gosling-lang.org/data/bed/chr1_CDS_BED12.bed.gz",
"indexUrl": "https://s3.amazonaws.com/gosling-lang.org/data/bed/chr1_CDS_BED12.bed.gz.tbi",
"type": "bed"
},
"mark": "rect",
"x": {"field": "chromStart", "type": "genomic"}, // example using one of the standard fields
"xe": {"field": "chromEnd", "type": "genomic"},
... // other configurations of this track
}]
}

BED file format

property type description

url

string

Required. Specify the URL address of the data file.

type

string

Required. Must be "bed".

indexUrl

string

Required. Specify the URL address of the data file index.

sampleLength

number

Specify the number of rows loaded from the URL. Default: 1000

customFields

string[]

An array of strings, where each string is the name of a non-standard field in the BED file. If there are n custom fields, we assume that the last n columns of the BED file correspond to the custom fields.
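The rules for BED columns can be sketched as follows: standard names are assigned first, and the last n columns take the customFields names. The helper nameBedColumns is hypothetical. It keeps every value as a string and does not parse the comma-separated block lists.

```typescript
const BED_FIELDS = [
  "chrom", "chromStart", "chromEnd", "name", "score", "strand",
  "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
];

// Name the columns of one tab-separated BED line: leading columns get the
// standard BED names, the last customFields.length columns get the custom names.
function nameBedColumns(line: string, customFields: string[] = []): Record<string, string> {
  const cols = line.split("\t");
  const nStandard = cols.length - customFields.length;
  if (nStandard < 3) throw new Error("chrom, chromStart and chromEnd are required");
  if (nStandard > BED_FIELDS.length) throw new Error("more columns than standard + custom fields");
  const row: Record<string, string> = {};
  for (let i = 0; i < nStandard; i++) row[BED_FIELDS[i]] = cols[i];
  customFields.forEach((field, j) => {
    row[field] = cols[nStandard + j];
  });
  return row;
}

// A made-up BED6 line with one extra column named "signal".
const bedRow = nameBedColumns("chr1\t100\t200\tgeneA\t0\t+\t3.5", ["signal"]);
console.log(bedRow.strand, bedRow.signal); // + 3.5
```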

Vector (Requires HiGlass Server)

One-dimensional quantitative values along genomic positions (e.g., BigWig) can be converted into HiGlass' "vector" format. Find out more about this format at HiGlass Docs.

{
"tracks":[{
"data": {
"url": "https://resgen.io/api/v1/tileset_info/?d=VLFaiSVjTjW6mkbjRjWREA",
"type": "vector",
"column": "position",
"value": "peak"
},
... // other configurations of this track
}]
}
property type description

url

string

Required. Specify the URL address of the data file.

type

string

Required. Must be "vector".

value

string

Assign the field name for quantitative values. Default: "value"

start

string

Assign the field name for the start position of genomic intervals. Default: "start"

end

string

Assign the field name for the end position of genomic intervals. Default: "end"

column

string

Assign the field name for the middle position of genomic intervals. Default: "position"

binSize

number

Bin the genomic intervals in tiles (unit size: 256).

aggregation

string

One of "mean" or "sum". Determine the aggregation function to apply within bins. Default: "mean"

Multivec (Requires HiGlass Server)

Two-dimensional quantitative values, one axis for genomic coordinate and the other for different samples, can be converted into HiGlass' "multivec" data. For example, multiple BigWig files can be converted into a single multivec file. You can also convert sequence data (FASTA) into this format where rows will be different nucleotide bases (e.g., A, T, G, C) and quantitative values represent the frequency. Find out more about this format at HiGlass Docs.

{
"tracks":[{
"data": {
"url": "https://resgen.io/api/v1/tileset_info/?d=UvVPeLHuRDiYA3qwFlm7xQ",
"type": "multivec",
"row": "sample",
"column": "position",
"value": "peak",
"categories": ["sample 1", "sample 2", "sample 3", "sample 4"]
},
...// other configurations of this track
}]
}
property type description

url

string

Required. Specify the URL address of the data file.

type

string

Required. Must be "multivec".

value

string

Assign the field name for quantitative values. Default: "value"

start

string

Assign the field name for the start position of genomic intervals. Default: "start"

row

string

Assign the field name for samples. Default: "category"

end

string

Assign the field name for the end position of genomic intervals. Default: "end"

column

string

Assign the field name for the middle position of genomic intervals. Default: "position"

categories

string[]

Assign names to individual samples.

binSize

number

Bin the genomic intervals in tiles (unit size: 256).

aggregation

string

One of "mean" or "sum". Determine the aggregation function to apply within bins. Default: "mean"
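A multivec tile can be pictured as a matrix with one row per sample (category) and one column per genomic bin. The sketch below flattens such a matrix into one record per cell, using the field names chosen with row, column, value, start, and end. It is an illustration of the data model under stated assumptions, not Gosling's tile loader. It assumes a fixed bin width and uses the bin midpoint as the column field. The matrix values and bin width are hypothetical.

```typescript
interface MultivecFieldNames {
  row: string; // e.g. "sample"
  column: string; // e.g. "position" (bin midpoint)
  value: string; // e.g. "peak"
  start: string;
  end: string;
}

// Flatten a (categories x bins) matrix into one record per cell.
function flattenMultivec(
  matrix: number[][],
  categories: string[],
  binStart: number,
  binWidth: number,
  f: MultivecFieldNames,
): Record<string, string | number>[] {
  if (matrix.length !== categories.length) {
    throw new Error("need one category name per matrix row");
  }
  const records: Record<string, string | number>[] = [];
  matrix.forEach((values, r) => {
    values.forEach((v, c) => {
      const start = binStart + c * binWidth;
      records.push({
        [f.row]: categories[r],
        [f.start]: start,
        [f.end]: start + binWidth,
        [f.column]: start + binWidth / 2,
        [f.value]: v,
      });
    });
  });
  return records;
}

const recs = flattenMultivec(
  [[1, 2], [3, 4]], // hypothetical values: 2 samples x 2 bins
  ["sample 1", "sample 2"],
  0,
  100, // hypothetical bin width in bp
  { row: "sample", column: "position", value: "peak", start: "start", end: "end" },
);
console.log(recs.length); // 4
```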

BEDDB (Requires HiGlass Server)

Regular BED (or similar) files can be pre-aggregated for scalable data exploration. Find out more about this format at HiGlass Docs.

{
"tracks":[{
"data": {
"url": "https://higlass.io/api/v1/tileset_info/?d=OHJakQICQD6gTD7skx4EWA",
"type": "beddb",
"genomicFields": [
{"index": 1, "name": "start"},
{"index": 2, "name": "end"}
],
"valueFields": [
{"index": 5, "name": "strand", "type": "nominal"},
{"index": 3, "name": "name", "type": "nominal"}
],
"exonIntervalFields": [
{"index": 12, "name": "start"},
{"index": 13, "name": "end"}
]
},
... // other configurations of this track
}]
}
property type description

url

string

Required. Specify the URL address of the data file.

type

string

Required. Must be "beddb".

genomicFields

object[]

Required. Each object follows the format {"index": "number", "name": "string"}. Specify the column indexes and names of the genomic data fields.

valueFields

object[]

Each object follows the format {"index": "number", "name": "string", "type": "string"}, where type is one of "nominal" or "quantitative". Specify the column indexes, field names, and field types.

exonIntervalFields

[object, object]

Experimental property.
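In the BEDDB example above, start and end are at indexes 1 and 2, name at 3, and strand at 5. This is consistent with zero-based positions in a BED row (chrom at 0). Under that assumption, field extraction by index can be sketched like this. The helper extractFields is hypothetical, the sample row is made up, and exonIntervalFields are not handled.

```typescript
interface IndexedField {
  index: number; // zero-based column index (assumption)
  name: string;
  type?: "nominal" | "quantitative";
}

// Pick named fields from one BED-like row by column index.
// Genomic and quantitative fields become numbers; nominal fields stay strings.
function extractFields(
  columns: string[],
  genomicFields: IndexedField[],
  valueFields: IndexedField[],
): Record<string, string | number> {
  const record: Record<string, string | number> = {};
  for (const g of genomicFields) record[g.name] = Number(columns[g.index]);
  for (const v of valueFields) {
    const raw = columns[v.index];
    record[v.name] = v.type === "quantitative" ? Number(raw) : raw;
  }
  return record;
}

// Made-up row: chrom, start, end, name, score, strand.
const rec = extractFields(
  "chr1\t1000\t5000\tgeneA\t0\t-".split("\t"),
  [{ index: 1, name: "start" }, { index: 2, name: "end" }],
  [{ index: 5, name: "strand", type: "nominal" }, { index: 3, name: "name", type: "nominal" }],
);
console.log(rec.start, rec.strand); // 1000 -
```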

Data Transform

Gosling supports a diverse set of data transforms, including Filter, Str Concat, Str Replace, Log, Displace, Exon Split, Genomic Length, Sv Type, Coverage, and JSON Parse transforms.

{
"tracks":[{
"data": ...,
// a list of data transforms can be applied to the data
"dataTransform": [
{ "type": "filter", "field": "type", "oneOf": ["gene"] },
{ "type": "filter", "field": "strand", "oneOf": ["+"], "not": true }
],
"mark": "rect",
...,
}]
}

Filter Transform

Users can apply three types of filters: oneOf, inRange, and include. Each type of filter has the following properties:

Properties of One Of Filter

property type description

type

string

Required. Must be "filter".

oneOf

array

Required. Check whether the value is an element in the provided list.

field

string

Required. A filter is applied based on the values of the specified data field.

not

boolean

When {"not": true}, the filter is negated (a logical NOT is applied). Default: false

Properties of In Range Filter

property type description

type

string

Required. Must be "filter".

inRange

number[]

Required. Check whether the value is in a number range.

field

string

Required. A filter is applied based on the values of the specified data field.

not

boolean

When {"not": true}, the filter is negated (a logical NOT is applied). Default: false

Properties of Include Filter

property type description

type

string

Required. Must be "filter".

include

string

Required. Check whether the value includes a substring.

field

string

Required. A filter is applied based on the values of the specified data field.

not

boolean

When {"not": true}, the filter is negated (a logical NOT is applied). Default: false
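The three filter shapes and the not flag can be summarized in a TypeScript sketch. FilterTransform and applyFilter are hypothetical names. The sketch assumes that oneOf uses strict equality, that inRange is inclusive at both ends, and that include is a case-sensitive substring test.

```typescript
type Datum = Record<string, string | number>;

// Hypothetical union mirroring the three filter tables above.
type FilterTransform =
  | { type: "filter"; field: string; oneOf: (string | number)[]; not?: boolean }
  | { type: "filter"; field: string; inRange: number[]; not?: boolean }
  | { type: "filter"; field: string; include: string; not?: boolean };

// Does a single value pass the filter (before `not` is applied)?
function matches(value: string | number, t: FilterTransform): boolean {
  if ("oneOf" in t) return t.oneOf.indexOf(value) !== -1; // strict equality
  if ("inRange" in t) {
    const [lo, hi] = t.inRange;
    const v = Number(value);
    return v >= lo && v <= hi; // assumption: inclusive at both ends
  }
  return String(value).indexOf(t.include) !== -1; // case-sensitive substring
}

// Keep the rows that pass; `not: true` keeps the rows that fail instead.
function applyFilter(rows: Datum[], t: FilterTransform): Datum[] {
  return rows.filter((row) => matches(row[t.field], t) !== (t.not === true));
}

const rows: Datum[] = [
  { type: "gene", strand: "+", start: 10 },
  { type: "gene", strand: "-", start: 20 },
  { type: "exon", strand: "-", start: 30 },
];
// The two filters from the dataTransform example earlier on this page.
let kept = applyFilter(rows, { type: "filter", field: "type", oneOf: ["gene"] });
kept = applyFilter(kept, { type: "filter", field: "strand", oneOf: ["+"], not: true });
console.log(kept.length); // 1
```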

Str Concat Transform

property type description

type

string

Required. Must be "concat".

separator

string

Required.

newField

string

Required.

fields

string[]

Required.
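The Str Concat table lists its properties without descriptions. Reading them by name, a concat transform joins the values of fields with separator and writes the result to newField. The sketch below implements that reading, which is an assumption rather than Gosling's code; applyConcat is a hypothetical name.

```typescript
type Row = Record<string, string | number>;

interface ConcatTransform {
  type: "concat";
  fields: string[];
  separator: string;
  newField: string;
}

// Join the values of `fields` with `separator` and store the result in `newField`.
function applyConcat(rows: Row[], t: ConcatTransform): Row[] {
  return rows.map((row) => ({
    ...row,
    [t.newField]: t.fields.map((f) => String(row[f])).join(t.separator),
  }));
}

const labeled = applyConcat(
  [{ chrom: "chr1", start: 100, end: 200 }],
  { type: "concat", fields: ["chrom", "start"], separator: ":", newField: "label" },
);
console.log(labeled[0].label); // chr1:100
```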

Str Replace Transform

property type description

type

string

Required. Must be "replace".

replace

object[]

Required. Each object follows the format {"from": "string", "to": "string"}.

newField

string

Required.

field

string

Required.
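The replace property pairs from and to strings. The sketch below reads it as substring replacement applied in order, with the result written to newField. Whether Gosling matches substrings or whole values is an assumption here, and applyReplace is a hypothetical name.

```typescript
type Row = Record<string, string | number>;

interface ReplaceTransform {
  type: "replace";
  field: string;
  newField: string;
  replace: { from: string; to: string }[];
}

// Copy `field` into `newField`, replacing every occurrence of each `from`
// substring with `to`, applying the pairs in order.
function applyReplace(rows: Row[], t: ReplaceTransform): Row[] {
  return rows.map((row) => {
    let value = String(row[t.field]);
    for (const { from, to } of t.replace) {
      if (from !== "") value = value.split(from).join(to);
    }
    return { ...row, [t.newField]: value };
  });
}

const renamed = applyReplace(
  [{ chrom: "chr1" }, { chrom: "chrX" }],
  { type: "replace", field: "chrom", newField: "chromNum", replace: [{ from: "chr", to: "" }] },
);
console.log(renamed.map((r) => r.chromNum).join(",")); // 1,X
```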

Log Transform

property type description

type

string

Required. Must be "log".

field

string

Required.

newField

string

If specified, store transformed values in a new field.

base

number | string

If not specified, 10 is used.
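A log transform with an arbitrary base can be computed as ln(x) / ln(base). In the sketch below, field is overwritten unless newField is given, and base defaults to 10, as the table states. Treating the string form of base as "e" (natural logarithm) is an assumption, and applyLog is a hypothetical name. Non-positive inputs yield -Infinity or NaN.

```typescript
type Row = Record<string, string | number>;

interface LogTransform {
  type: "log";
  field: string;
  newField?: string;
  base?: number | "e"; // assumption: "e" selects the natural logarithm
}

// log_base(x) = ln(x) / ln(base); overwrite `field` unless `newField` is set.
function applyLog(rows: Row[], t: LogTransform): Row[] {
  const base = t.base === undefined ? 10 : t.base === "e" ? Math.E : t.base;
  const target = t.newField ?? t.field;
  return rows.map((row) => ({
    ...row,
    [target]: Math.log(Number(row[t.field])) / Math.log(base),
  }));
}

const logged = applyLog([{ value: 100 }, { value: 1 }], { type: "log", field: "value", newField: "log10" });
console.log(logged[1].log10); // 0
```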

Displace Transform

property type description

type

string

Required. Must be "displace".

newField

string

Required.

method

string

Required. One of "pile", "spread". A string that specifies the type of displacement.

boundingBox

BoundingBox

Required.

maxRows

number

Specify the maximum number of rows to be generated. Default: no limit.
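With method "pile", overlapping intervals are displaced onto separate rows. A common way to do this is greedy first-fit: sort intervals by start and place each one on the lowest row where it no longer overlaps, allowing for padding. The sketch below implements that technique over plain start/end pairs. It is not confirmed to be Gosling's algorithm. It treats padding as base pairs and omits maxRows, groupField, and the "spread" method; pileUp is a hypothetical name.

```typescript
interface PileInterval {
  start: number;
  end: number;
}

// Greedy first-fit pile-up: visit intervals by start position and put each on
// the lowest row whose last interval ends (plus padding) at or before its start.
// Returns the row index of every interval, in the input order.
function pileUp(intervals: PileInterval[], padding = 0): number[] {
  const order = intervals.map((_, i) => i).sort((a, b) => intervals[a].start - intervals[b].start);
  const rowEnds: number[] = []; // end of the last interval placed on each row
  const rowOf: number[] = intervals.map(() => 0);
  for (const i of order) {
    const { start, end } = intervals[i];
    let row = 0;
    while (row < rowEnds.length && rowEnds[row] + padding > start) row++;
    if (row === rowEnds.length) rowEnds.push(end);
    else rowEnds[row] = end;
    rowOf[i] = row;
  }
  return rowOf;
}

const genes: PileInterval[] = [
  { start: 0, end: 10 },
  { start: 5, end: 15 },
  { start: 12, end: 20 },
  { start: 16, end: 30 },
];
console.log(pileUp(genes).join(",")); // 0,1,0,1
```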

Exon Split Transform

property type description

type

string

Required. Must be "exonSplit".

separator

string

Required.

flag

flag

Required. An object that follows the format {"field": "string", "value": "number | string"}.

fields

object[]

Required. Each object follows the format {"chrField": "string", "field": "string", "newField": "string", "type": "string"}, where type is one of "genomic", "nominal", or "quantitative".

Coverage Transform

Aggregate rows and calculate coverage.

property type description

type

string

Required. Must be "coverage".

startField

string

Required.

endField

string

Required.

newField

string

groupField

string

The name of a nominal field by which rows are grouped prior to piling up.
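Coverage counts how many rows overlap each genomic position. The sketch below computes it with a sweep over interval start and end points. It returns piecewise-constant segments { start, end, coverage }, treating intervals as half-open [start, end). The output shape, the half-open convention, and the names are assumptions for illustration. groupField is omitted; one would run the same computation once per group.

```typescript
interface CoverageSegment {
  start: number;
  end: number;
  coverage: number;
}

// Sweep-line coverage over half-open intervals [start, end).
// Returns only the segments with coverage > 0.
function computeCoverage(
  rows: Record<string, number>[],
  startField: string,
  endField: string,
): CoverageSegment[] {
  // +1 where an interval opens, -1 where it closes.
  const delta: Record<number, number> = {};
  for (const r of rows) {
    delta[r[startField]] = (delta[r[startField]] ?? 0) + 1;
    delta[r[endField]] = (delta[r[endField]] ?? 0) - 1;
  }
  const points = Object.keys(delta).map(Number).sort((a, b) => a - b);
  const segments: CoverageSegment[] = [];
  let depth = 0;
  for (let k = 0; k < points.length - 1; k++) {
    depth += delta[points[k]];
    if (depth > 0) segments.push({ start: points[k], end: points[k + 1], coverage: depth });
  }
  return segments;
}

const segs = computeCoverage([{ start: 0, end: 10 }, { start: 5, end: 15 }], "start", "end");
console.log(segs.map((s) => `${s.start}-${s.end}:${s.coverage}`).join(" ")); // 0-5:1 5-10:2 10-15:1
```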

JSON Parse Transform

Parse a JSON object array and append its elements vertically (as new rows).

property type description

type

string

Required. Must be "subjson".

genomicLengthField

string

Required. Length of genomic interval.

genomicField

string

Required. Relative genomic position to parse.

field

string

Required. The field that contains the JSON object array.

baseGenomicField

string

Required. Base genomic position when parsing relative position.

Apart from these data transforms, users can also aggregate data values (min, max, bin, mean, and count). Read more about data aggregation.

Types

Type: Datum
property type description
stringKey

number|string

Values in the form of JSON.

Type: BoundingBox
property type description

startField

string

Required. The name of a quantitative field that represents the start position.

endField

string

Required. The name of a quantitative field that represents the end position.

padding

number

The padding around visual elements, in either px or bp.

isPaddingBP

boolean

Whether to consider padding as the bp length.

groupField

string

The name of a nominal field by which rows are grouped prior to piling up.