Version: 1.0.0

Data

Users can specify the data of each visualization (i.e., track) through a track.data property.

{
  "tracks":[{
    "data": {...}, // specify the data used in this track
    "mark": "rect",
    "color": ...,
    ...
  }]
}

Supported Data Formats

For flexible data exploration, Gosling supports two kinds of datasets:

Plain Datasets (No HiGlass Server): These datasets can be used directly in Gosling without any data preprocessing. They include CSV, JSON, GFF3, VCF, BigWig, BAM, and BED.
Pre-aggregated Datasets (HiGlass Server): These datasets are preprocessed for scalable data exploration and require a HiGlass server to access them in Gosling. They include Vector, Multivec, and BEDDB. To learn more about preprocessing your data and setting up the server, please visit the HiGlass website.

CSV (No HiGlass Server)

Any sufficiently small tabular data file, such as TSV, CSV, BED, BEDPE, or GFF, can be loaded using the "csv" data specification.

{
  "tracks": [
    {
      "data": {
        "url": "https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv",
        "type": "csv",
        "chromosomeField": "Chromosome",
        "genomicFields": ["chromStart", "chromEnd"]
      },
      ...,
  }]
}

property	type	description
`url`	string	Required. Specify the URL address of the data file.
`type`	string	Required. Must be `"csv"`.
`separator`	string	Specify the file separator. Default: `","`
`sampleLength`	number	Specify the number of rows loaded from the URL. Default: `1000`
`longToWideId`	string	Experimental Property.
`headerNames`	string[]	Specify the names of data fields if a CSV file does not contain a header.
`genomicFieldsToConvert`	object[]	Experimental Property. Each object follows the format `{"chromosomeField":"string","genomicFields":"string[]"}`.
`genomicFields`	string[]	Specify the name of genomic data fields.
`chromosomePrefix`	string	Specify the chromosome prefix if chromosomes are denoted using a prefix besides "chr" or a number.
`chromosomeField`	string	Specify the name of chromosome data fields.
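
To illustrate how `separator` and `headerNames` relate to the file contents, here is a minimal Python sketch that parses a headerless, tab-separated snippet the way the table describes. It is not Gosling's loader; the `parse_rows` helper and the sample rows are hypothetical and exist only for this illustration.

```python
import csv
import io


def parse_rows(text, separator=",", header_names=None):
    """Parse delimited text into a list of dicts.

    If header_names is given, the text is assumed to have no header row
    (mirroring `headerNames`); otherwise the first row is used as the header.
    """
    reader = csv.DictReader(
        io.StringIO(text), fieldnames=header_names, delimiter=separator
    )
    return list(reader)


# Hypothetical headerless TSV content (not taken from a real file).
tsv_text = "chr1\t0\t2300000\nchr1\t2300000\t5300000\n"

rows = parse_rows(
    tsv_text,
    separator="\t",
    header_names=["Chromosome", "chromStart", "chromEnd"],
)
print(rows[0])
```

In a Gosling specification, the equivalent settings would be `"separator": "\t"` together with a `headerNames` array listing the three column names.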

GFF3 (No HiGlass Server)

This format allows for files that follow the GFF3 specification.

GFF file demo

Currently, the GFF3 file must have an accompanying index file. If you do not have an index file for your GFF3 file, you can create one using tabix. Otherwise, you can treat the GFF3 file as if it were a CSV file and use the CSV data specification, but this will not be as performant for large files.

The field names correspond to the names of the columns. For example, the field which corresponds to the "start" column is called "start". The standard GFF fields are as follows: seq_id, source, type, start, end, score, strand, phase, and attributes.

Here is an example GFF3 file line:

U00096.3	Genbank	gene	352706	354592	.	+	.	Name=prpE;gbkey=Gene;gene=prpE;gene_biotype=protein_coding;gene_synonym=ECK0332,yahU;locus_tag=b0335

This will be parsed as the following:

{
  seq_id: "U00096.3"
  source: "Genbank"
  type: "gene"
  start: 352706
  end: 354592
  phase: null
  strand: "+"
  score: null
  attributes: Object { Name: (1) […], gbkey: (1) […], gene: (1) […], … }
  child_features: Array []
  derived_features: Array []
}

If we include the option attributesToFields: [{attribute: "Name", defaultValue: "unknown"}], then the Name attribute will be included as a field:

{
  Name: "prpE"
  seq_id: "U00096.3"
  source: "Genbank"
  type: "gene"
  start: 352706
  end: 354592
  phase: null
  strand: "+"
  score: null
  attributes: Object { Name: (1) […], gbkey: (1) […], gene: (1) […], … }
  child_features: Array []
  derived_features: Array []
}

This allows Name to be used as a field in Gosling to label features.

{
  "tracks":[{
    "data": {
      "url": "https://s3.amazonaws.com/gosling-lang.org/data/gff/E_coli_MG1655.gff3.gz",
      "indexUrl": "https://s3.amazonaws.com/gosling-lang.org/data/gff/E_coli_MG1655.gff3.gz.tbi",
      "type": "gff"
    },
    "mark": "rect", 
      "x": {"field": "start"}, // example using one of the standard fields
      "xe": {"field": "end"},
    ... // other configurations of this track
  }]
}

Generic Feature Format Version 3 (GFF3) format data. It parses files that follow the [GFF3 specification](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md).

property	type	description
`url`	string	Required. URL link to the GFF file
`type`	string	Required. Must be `"gff"`.
`indexUrl`	string	Required. URL link to the tabix index file
`sampleLength`	number	The maximum number of samples to be shown on the track. Samples are uniformly randomly selected so that this threshold is not exceeded. Default: `1000`
`attributesToFields`	object[]	Each object follows the format `{"attribute":"string","defaultValue":"string"}`. Specifies which attributes to include as fields. GFF files have an "attributes" column which contains a list of attributes, each of which is a tag-value pair (`tag=value`). This option allows specific attributes to be accessible as fields. For example, if you have an attribute called "gene_name" and you want to label features on your track using those values, you can use this option so that you can use `"field": "gene_name"` in the schema. If there is a single `value` corresponding to the `tag`, Gosling will parse that value as a string. If there are multiple `value`s corresponding to a `tag`, Gosling will parse them as a comma-separated list string. If a feature does not have a particular attribute, then the attribute value will be set to the `defaultValue`.
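
The `attributesToFields` rules above (one value becomes a string, several values become a comma-separated string, a missing attribute falls back to `defaultValue`) can be sketched in Python as follows. This is an illustration of the described behavior, not Gosling's parser: the `attributes_to_fields` helper is hypothetical, the `product` attribute is made up to show the fallback, and percent-encoding in GFF3 values is ignored.

```python
def attributes_to_fields(attribute_column, options):
    """Extract selected GFF3 attributes as flat string fields."""
    parsed = {}
    for pair in attribute_column.strip().split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # A tag may carry several comma-separated values.
        parsed[tag] = value.split(",")

    fields = {}
    for option in options:
        values = parsed.get(option["attribute"])
        if values is None:
            # The feature lacks this attribute: fall back to the default.
            fields[option["attribute"]] = option["defaultValue"]
        else:
            # One value stays a plain string; several are joined with commas.
            fields[option["attribute"]] = ",".join(values)
    return fields


# Attributes column of the example GFF3 line shown earlier.
attribute_column = (
    "Name=prpE;gbkey=Gene;gene=prpE;gene_biotype=protein_coding;"
    "gene_synonym=ECK0332,yahU;locus_tag=b0335"
)
options = [
    {"attribute": "Name", "defaultValue": "unknown"},
    {"attribute": "gene_synonym", "defaultValue": "none"},
    {"attribute": "product", "defaultValue": "unknown"},  # absent in this line
]
fields = attributes_to_fields(attribute_column, options)
print(fields)
```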

VCF (No HiGlass Server)

This format allows files that follow the VCF specification. Currently, we only support VCF files that have a corresponding index file.

VCF file demo showing indels

VCF file demo showing point mutations

{
  "tracks":[{
    "data": {
      "url": "https://somatic-browser-test.s3.amazonaws.com/browserExamples/7a921087-8e62-4a93-a757-fd8cdbe1eb8f.consensus.20161006.somatic.indel.sorted.vcf.gz",
      "indexUrl": "https://somatic-browser-test.s3.amazonaws.com/browserExamples/7a921087-8e62-4a93-a757-fd8cdbe1eb8f.consensus.20161006.somatic.indel.sorted.vcf.gz.tbi",
      "type": "vcf",
      "sampleLength": 5000
    },
    ... // other configurations of this track
  }]
}

The Variant Call Format (VCF).

property	type	description
`url`	string	Required. URL link to the VCF file
`type`	string	Required. Must be `"vcf"`.
`indexUrl`	string	Required. URL link to the tabix index file
`sampleLength`	number	The maximum number of rows to be loaded from the URL. Default: `1000`

JSON (No HiGlass Server)

This format allows users to include data directly in Gosling's JSON specification.

Caution: For better rendering performance, we recommend using JSON only for small data (~100 rows). For larger data, consider using CSV or other file formats.

{
  "tracks":[{
    "data": {
      "type": "json",
      "chromosomeField": "Chromosome",
      "genomicFields": [
          "chromStart",
          "chromEnd"
      ],
      "values": [
        {
          "Chromosome": "chr1",
          "chromStart": 0,
          "chromEnd": 2300000,
          "Name": "p36.33",
          "Stain": "gneg"
        },
        {
          "Chromosome": "chr1",
          "chromStart": 2300000,
          "chromEnd": 5300000,
          "Name": "p36.32",
          "Stain": "gpos25"
        }, ...
        ]
    },
    ... // other configurations of this track
  }]
}

property	type	description
`values`	Datum[]	Required. Values in the form of JSON.
`type`	string	Required. Must be `"json"`. Define data type.
`sampleLength`	number	Specify the number of rows loaded from the URL. Default: `1000`
`genomicFieldsToConvert`	object[]	Experimental Property. Each object follows the format `{"chromosomeField":"string","genomicFields":"string[]"}`.
`genomicFields`	string[]	Specify the name of genomic data fields.
`chromosomeField`	string	Specify the name of chromosome data fields.

The property "genomicFieldsToConvert" enables users to convert chromosome fields into genomic fields, which facilitates the creation of links between various chromosomes.

BigWig (No HiGlass Server)

{
  "tracks":[{
    "data": {
      "url": "https://s3.amazonaws.com/gosling-lang.org/data/4DNFIMPI5A9N.bw",
      "type": "bigwig",
      "column": "position",
      "value": "peak"
    },
    ... // other configurations of this track
  }]
}

property	type	description
`url`	string	Required. Specify the URL address of the data file.
`type`	string	Required. Must be `"bigwig"`.
`value`	string	Assign a field name of quantitative values. Default: `"value"`
`start`	string	Assign a field name of the start position of genomic intervals. Default: `"start"`
`end`	string	Assign a field name of the end position of genomic intervals. Default: `"end"`
`column`	string	Assign a field name of the middle position of genomic intervals. Default: `"position"`
`binSize`	number	Binning the genomic interval in tiles (unit size: 256).
`aggregation`	string	One of `"mean"`, `"sum"`. Determine aggregation function to apply within bins. Default: `"mean"`

BAM (No HiGlass Server)

Binary Alignment Map (BAM) files store sequencing reads aligned to a reference genome. BAM is the lossless, compressed binary representation of Sequence Alignment Map (SAM) files.

property	type	description
`url`	string	Required. URL link to the BAM data file
`type`	string	Required. Must be `"bam"`.
`indexUrl`	string	Required. URL link to the index file of the BAM file
`maxInsertSize`	number	Determines the threshold of insert sizes for determining the structural variants. Default: `5000`
`loadMates`	boolean	Load mates that are located in the same chromosome. Default: `false`
`junctionMinCoverage`	number	Determine the threshold of coverage when extracting exon-to-exon junctions. Default: `1`
`extractJunction`	boolean	Determine whether to extract exon-to-exon junctions. Default: `false`
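
As a rough sketch of how the properties in this table fit together, the following Python snippet builds a track's `data` object for a BAM file and prints it as JSON. The URLs are placeholders and the chosen option values are only examples; other track properties are left out.

```python
import json

# Placeholder URLs: replace with the location of your own BAM file and index.
BAM_URL = "https://example.com/data/sample.bam"
BAM_INDEX_URL = "https://example.com/data/sample.bam.bai"

spec = {
    "tracks": [
        {
            "data": {
                "type": "bam",
                "url": BAM_URL,
                "indexUrl": BAM_INDEX_URL,
                "loadMates": True,      # the documented default is false
                "maxInsertSize": 5000,  # same as the documented default
            },
            # Other track properties (mark, x, ...) are omitted in this sketch.
        }
    ]
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```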

BED (No HiGlass Server)

This format supports files that follow the BED specification. There are 12 standard fields (chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, and blockStarts). The first three fields (chrom, chromStart, chromEnd) are required. Custom fields cannot rename the first three fields.

Currently, the BED file must have an accompanying index file. If you do not have an index file for your BED file, you can create one using tabix. Otherwise, you can treat the BED file as if it were a CSV file and use the CSV data specification, but this will not be as performant for large files.

BED file demo

{
  "tracks":[{
    "data": {
      "url": "https://s3.amazonaws.com/gosling-lang.org/data/bed/chr1_CDS_BED12.bed.gz",
      "indexUrl": "https://s3.amazonaws.com/gosling-lang.org/data/bed/chr1_CDS_BED12.bed.gz.tbi",
      "type": "bed"
    },
    "mark": "rect",
    "x": {"field": "chromStart", "type": "genomic"}, // example using one of the standard fields
    "xe": {"field": "chromEnd", "type": "genomic"},
    ... // other configurations of this track
  }]
}

BED file format

property	type	description
`url`	string	Required. Specify the URL address of the data file.
`type`	string	Required. Must be `"bed"`.
`indexUrl`	string	Required. Specify the URL address of the data file index.
`sampleLength`	number	Specify the number of rows loaded from the URL. Default: `1000`
`customFields`	string[]	An array of strings, where each string is the name of a non-standard field in the BED file. If there are `n` custom fields, we assume that the last `n` columns of the BED file correspond to the custom fields.
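
The `customFields` rule (the last `n` columns are the custom ones) can be sketched in Python. The snippet assumes that the remaining leading columns take the standard BED names in order, which is one reading of the description rather than a documented guarantee; the `bed_column_names` helper and the custom names `signal` and `pValue` are hypothetical.

```python
STANDARD_BED_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes",
    "blockStarts",
]


def bed_column_names(n_columns, custom_fields=()):
    """Name the columns of a BED file with n_columns columns.

    The last len(custom_fields) columns get the custom names; the columns
    before them are assumed to follow the standard BED order.
    """
    custom_fields = list(custom_fields)
    n_standard = n_columns - len(custom_fields)
    if n_standard < 3:
        raise ValueError("the first three BED columns cannot be custom fields")
    if n_standard > len(STANDARD_BED_FIELDS):
        raise ValueError("more columns than standard plus custom fields")
    return STANDARD_BED_FIELDS[:n_standard] + custom_fields


# A hypothetical 8-column file: 6 standard columns followed by 2 custom ones.
names = bed_column_names(8, ["signal", "pValue"])
print(names)
```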

Vector (Require HiGlass Server)

One-dimensional quantitative values along genomic position (e.g., bigwig) can be converted into HiGlass' "vector" format data. Find out more about this format at HiGlass Docs.

{
  "tracks":[{
    "data": {
      "url": "https://resgen.io/api/v1/tileset_info/?d=VLFaiSVjTjW6mkbjRjWREA",
      "type": "vector",
      "column": "position",
      "value": "peak"
    },
    ... // other configurations of this track
  }]
}

property	type	description
`url`	string	Required. Specify the URL address of the data file.
`type`	string	Required. Must be `"vector"`.
`value`	string	Assign a field name of quantitative values. Default: `"value"`
`start`	string	Assign a field name of the start position of genomic intervals. Default: `"start"`
`end`	string	Assign a field name of the end position of genomic intervals. Default: `"end"`
`column`	string	Assign a field name of the middle position of genomic intervals. Default: `"position"`
`binSize`	number	Binning the genomic interval in tiles (unit size: 256).
`aggregation`	string	One of `"mean"`, `"sum"`. Determine aggregation function to apply within bins. Default: `"mean"`

Multivec (Require HiGlass Server)

Two-dimensional quantitative values, one axis for genomic coordinate and the other for different samples, can be converted into HiGlass' "multivec" data. For example, multiple BigWig files can be converted into a single multivec file. You can also convert sequence data (FASTA) into this format where rows will be different nucleotide bases (e.g., A, T, G, C) and quantitative values represent the frequency. Find out more about this format at HiGlass Docs.

{
  "tracks":[{
    "data": {
        "url": "https://resgen.io/api/v1/tileset_info/?d=UvVPeLHuRDiYA3qwFlm7xQ",
        "type": "multivec",
        "row": "sample",
        "column": "position",
        "value": "peak",
        "categories": ["sample 1", "sample 2", "sample 3", "sample 4"]
    },
    ...// other configurations of this track
  }]
}

property	type	description
`url`	string	Required. Specify the URL address of the data file.
`type`	string	Required. Must be `"multivec"`.
`value`	string	Assign a field name of quantitative values. Default: `"value"`
`start`	string	Assign a field name of the start position of genomic intervals. Default: `"start"`
`row`	string	Assign a field name of samples. Default: `"category"`
`end`	string	Assign a field name of the end position of genomic intervals. Default: `"end"`
`column`	string	Assign a field name of the middle position of genomic intervals. Default: `"position"`
`categories`	string[]	Assign names of individual samples.
`binSize`	number	Binning the genomic interval in tiles (unit size: 256).
`aggregation`	string	One of `"mean"`, `"sum"`. Determine aggregation function to apply within bins. Default: `"mean"`

BEDDB (Require HiGlass Server)

Regular BED, or similar, files can be pre-aggregated for scalable data exploration. Find out more about this format at HiGlass Docs.

{
  "tracks":[{
    "data": {
      "url": "https://higlass.io/api/v1/tileset_info/?d=OHJakQICQD6gTD7skx4EWA",
      "type": "beddb",
      "genomicFields": [
          {"index": 1, "name": "start"},
          {"index": 2, "name": "end"}
      ],
      "valueFields": [
          {"index": 5, "name": "strand", "type": "nominal"},
          {"index": 3, "name": "name", "type": "nominal"}
      ],
      "exonIntervalFields": [
          {"index": 12, "name": "start"},
          {"index": 13, "name": "end"}
      ]
    },
    ... // other configurations of this track
  }]
}

property	type	description
`url`	string	Required. Specify the URL address of the data file.
`type`	string	Required. Must be `"beddb"`.
`genomicFields`	object[]	Required. Each object follows the format `{"index":"number","name":"string"}`. Specify the names of genomic data fields.
`valueFields`	object[]	Each object follows the format `{"index":"number","name":"string","type":"string"}`, where `type` is one of `"nominal"`, `"quantitative"`. Specify the column indexes, field names, and field types.
`exonIntervalFields`	[object, object]	Experimental Property.
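
The properties marked "Required" in the tables above can be collected into a small pre-flight check. The Python sketch below only restates those tables; the `missing_required` helper is hypothetical, and Gosling performs its own validation, which may be stricter.

```python
# Required properties per data type, copied from the tables on this page.
REQUIRED_PROPERTIES = {
    "csv": ["url", "type"],
    "gff": ["url", "type", "indexUrl"],
    "vcf": ["url", "type", "indexUrl"],
    "json": ["values", "type"],
    "bigwig": ["url", "type"],
    "bam": ["url", "type", "indexUrl"],
    "bed": ["url", "type", "indexUrl"],
    "vector": ["url", "type"],
    "multivec": ["url", "type"],
    "beddb": ["url", "type", "genomicFields"],
}


def missing_required(data):
    """Return the required property names that are absent from a data object."""
    data_type = data.get("type")
    if data_type not in REQUIRED_PROPERTIES:
        raise ValueError(f"unknown data type: {data_type!r}")
    return [name for name in REQUIRED_PROPERTIES[data_type] if name not in data]


# A GFF data object without its tabix index (placeholder URL).
missing = missing_required({"type": "gff", "url": "https://example.com/genes.gff3.gz"})
print(missing)
```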

Data Transform

Gosling supports a diverse set of data transforms, including

Filter Transform, Str Concat Transform, Str Replace Transform, Log Transform, Displace Transform, Exon Split Transform, Genomic Length Transform, Sv Type Transform, Coverage Transform, and JSON Parse Transform.

{
  "tracks":[{
    "data": ...,
    // a list of data transforms can be applied to the data
    "dataTransform": [
          { "type": "filter", "field": "type", "oneOf": ["gene"] },
          { "type": "filter", "field": "strand", "oneOf": ["+"], "not": true }
    ],
    "mark": "rect",
    ...,
  }]
}

Filter Transform

Users can apply three types of filters: oneOf, inRange, include. Each filter transform has the following properties:

Properties of One Of Filter

property	type	description
`type`	string	Required. Must be `"filter"`.
`oneOf`	array	Required. Check whether the value is an element in the provided list.
`field`	string	Required. A filter is applied based on the values of the specified data field
`not`	boolean	When `{"not": true}`, apply a NOT logical operation to the filter. Default: `false`

Properties of In Range Filter

property	type	description
`type`	string	Required. Must be `"filter"`.
`inRange`	number[]	Required. Check whether the value is in a number range.
`field`	string	Required. A filter is applied based on the values of the specified data field
`not`	boolean	When `{"not": true}`, apply a NOT logical operation to the filter. Default: `false`

Properties of Include Filter

property	type	description
`type`	string	Required. Must be `"filter"`.
`include`	string	Required. Check whether the value includes a substring.
`field`	string	Required. A filter is applied based on the values of the specified data field
`not`	boolean	When `{"not": true}`, apply a NOT logical operation to the filter. Default: `false`
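
To make the three filter types and the `not` flag concrete, here is a minimal Python sketch of their described behavior applied to a few made-up rows. It is not Gosling's implementation; in particular, treating both `inRange` bounds as inclusive is an assumption, and the `apply_filter` helper is hypothetical.

```python
def apply_filter(rows, transform):
    """Keep the rows that pass a single filter transform."""
    field = transform["field"]
    negate = transform.get("not", False)

    def matches(row):
        value = row[field]
        if "oneOf" in transform:
            return value in transform["oneOf"]
        if "inRange" in transform:
            low, high = transform["inRange"]
            return low <= value <= high  # assumption: both bounds inclusive
        if "include" in transform:
            return transform["include"] in str(value)
        raise ValueError("a filter needs oneOf, inRange, or include")

    # With "not": true, keep exactly the rows that do NOT match.
    return [row for row in rows if matches(row) != negate]


# Made-up rows for illustration.
rows = [
    {"type": "gene", "strand": "+", "start": 100},
    {"type": "gene", "strand": "-", "start": 500},
    {"type": "exon", "strand": "-", "start": 900},
]

# The two filters from the dataTransform example, applied in order.
transforms = [
    {"type": "filter", "field": "type", "oneOf": ["gene"]},
    {"type": "filter", "field": "strand", "oneOf": ["+"], "not": True},
]
filtered = rows
for transform in transforms:
    filtered = apply_filter(filtered, transform)
print(filtered)
```

Only the gene on the "-" strand remains: the first filter keeps genes, and the second drops rows whose strand is "+".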

Str Concat Transform

property	type	description
`type`	string	Required. Must be `"concat"`.
`separator`	string	Required.
`newField`	string	Required.
`fields`	string[]	Required.
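
The property names suggest that this transform joins the values of `fields` with `separator` and stores the result in `newField`. The Python sketch below shows that reading on one made-up row; it is an illustration under that assumption, and the `concat_fields` helper is hypothetical.

```python
def concat_fields(rows, transform):
    """Join the listed fields of each row into a new string field."""
    out = []
    for row in rows:
        joined = transform["separator"].join(
            str(row[name]) for name in transform["fields"]
        )
        out.append({**row, transform["newField"]: joined})
    return out


transform = {
    "type": "concat",
    "fields": ["Chromosome", "Name"],
    "separator": ":",
    "newField": "label",  # hypothetical field name
}
rows = [{"Chromosome": "chr1", "Name": "p36.33"}]
labeled = concat_fields(rows, transform)
print(labeled[0]["label"])
```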

Str Replace Transform

property	type	description
`type`	string	Required. Must be `"replace"`.
`replace`	object[]	Required. Each object follows the format `{"from":"string","to":"string"}` ( )
`newField`	string	Required.
`field`	string	Required.
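
One plausible reading of these properties is that each `from`/`to` pair is applied in order to the value of `field`, and the result is stored in `newField`. The Python sketch below follows that reading. Whether Gosling replaces every occurrence or only the first is not stated here, so the sketch's choice (every occurrence) is an assumption, and the `replace_string` helper is hypothetical.

```python
def replace_string(rows, transform):
    """Apply each from/to rule in order and store the result in newField."""
    out = []
    for row in rows:
        text = str(row[transform["field"]])
        for rule in transform["replace"]:
            # Assumption: every occurrence of rule["from"] is replaced.
            text = text.replace(rule["from"], rule["to"])
        out.append({**row, transform["newField"]: text})
    return out


transform = {
    "type": "replace",
    "field": "Chromosome",
    "replace": [{"from": "chr", "to": ""}],
    "newField": "chromosomeNumber",  # hypothetical field name
}
rows = [{"Chromosome": "chr1"}, {"Chromosome": "chrX"}]
replaced = replace_string(rows, transform)
print([row["chromosomeNumber"] for row in replaced])
```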

Log Transform

property	type	description
`type`	string	Required. Must be `"log"`.
`field`	string	Required.
`newField`	string	If specified, store transformed values in a new field.
`base`	number | string	If not specified, 10 is used.
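
As a small illustration of the table above, this Python sketch applies a logarithm with the default base of 10 and writes the result to `newField` when one is given. Overwriting `field` when `newField` is absent is an assumption drawn from the wording of the table. String-valued bases are not handled because their meaning is not described here, and the `log_transform` helper is hypothetical.

```python
import math


def log_transform(rows, transform):
    """Take the logarithm of a numeric field (numeric bases only)."""
    field = transform["field"]
    base = transform.get("base", 10)
    # Assumption: without newField, the original field is overwritten.
    target = transform.get("newField", field)
    if not isinstance(base, (int, float)):
        raise ValueError("this sketch only handles numeric bases")
    return [{**row, target: math.log(row[field], base)} for row in rows]


rows = [{"peak": 1000}]
logged = log_transform(rows, {"type": "log", "field": "peak", "newField": "logPeak"})
print(logged[0]["logPeak"])
```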

Displace Transform

property	type	description
`type`	string	Required. Must be `"displace"`.
`newField`	string	Required.
`method`	string	Required. One of `"pile"`, `"spread"`. A string that specifies the type of displacement.
`boundingBox`	boundingBox	Required.
`maxRows`	number	Specify maximum rows to be generated (default has no limit).

Exon Split Transform

property	type	description
`type`	string	Required. Must be `"exonSplit"`.
`separator`	string	Required.
`flag`	flag	Required. Follows the format `{"field":"string","value":"number|string"}`.
`fields`	object[]	Required. Each object follows the format `{"chrField":"string","field":"string","newField":"string","type":"string"}`, where `type` is one of `"genomic"`, `"nominal"`, `"quantitative"`.

Coverage Transform

Aggregate rows and calculate coverage

property	type	description
`type`	string	Required. Must be `"coverage"`.
`startField`	string	Required.
`endField`	string	Required.
`newField`	string
`groupField`	string	The name of a nominal field to group rows by prior to piling up.
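
Conceptually, coverage is the number of intervals that overlap each genomic position. The Python sketch below computes it with a sweep over interval boundaries and returns one row per stretch of constant depth. It is only a conceptual illustration: the shape of the output rows, the fallback field name `coverage`, and the exclusive treatment of interval ends are assumptions, `groupField` is ignored, and Gosling may bin the result differently.

```python
def coverage(rows, transform):
    """Count overlapping intervals along the genome using a sweep line."""
    start_field = transform["startField"]
    end_field = transform["endField"]
    new_field = transform.get("newField", "coverage")  # hypothetical default

    # +1 where an interval starts, -1 where it ends (end treated as exclusive).
    deltas = {}
    for row in rows:
        deltas[row[start_field]] = deltas.get(row[start_field], 0) + 1
        deltas[row[end_field]] = deltas.get(row[end_field], 0) - 1

    out = []
    depth = 0
    positions = sorted(deltas)
    for left, right in zip(positions, positions[1:]):
        depth += deltas[left]
        if depth > 0:
            out.append({start_field: left, end_field: right, new_field: depth})
    return out


# Made-up read intervals.
reads = [
    {"start": 0, "end": 10},
    {"start": 5, "end": 15},
    {"start": 20, "end": 30},
]
segments = coverage(
    reads,
    {"type": "coverage", "startField": "start", "endField": "end", "newField": "depth"},
)
print(segments)
```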

JSON Parse Transform

Parse JSON Object Array and append vertically

property	type	description
`type`	string	Required. Must be `"subjson"`.
`genomicLengthField`	string	Required. Length of genomic interval.
`genomicField`	string	Required. Relative genomic position to parse.
`field`	string	Required. The field that contains the JSON object array.
`baseGenomicField`	string	Required. Base genomic position when parsing relative position.

Apart from these data transforms, users can also aggregate data values (min, max, bin, mean, and count). Read more about data aggregation.

Types

Type: Datum

property	type	description
stringKey	number | string	Values in the form of JSON.

Type: BoundingBox

property	type	description
`startField`	string	Required. The name of a quantitative field that represents the start position.
`endField`	string	Required. The name of a quantitative field that represents the end position.
`padding`	number	The padding around visual elements, in either px or bp.
`isPaddingBP`	boolean	Whether to consider `padding` as the bp length.
`groupField`	string	The name of a nominal field to group rows by prior to piling up.
