
UNTANGLING GENOME ASSEMBLIES

ORGANIZATION

BC Cancer

DATE

2009

THE PROJECT

Sequencing a genome can give rich information about how an organism functions, but obtaining a genome sequence made up of nucleotides (abbreviated A, C, G, and T) is not as simple as reading from one end to the other. Instead, sequencing involves first breaking up the genome into small fragments, sequencing those pieces, and then putting them all back together in a process known as genome assembly.

Researchers at the Michael Smith Genome Sciences Centre needed a way to efficiently analyze and debug their genome assembly algorithm, Assembly By Short Sequences (ABySS). Off-the-shelf tools did not support their use cases and struggled to handle large genomes. I designed and developed ABySS-Explorer, which uses a novel graph visualization to address their needs. The work won an IEEE VIS Best Paper award in 2009.

THE DESIGN

Starting Point

This project began with a hand-annotated printout of an ABySS assembly graph. Shaun Jackman, one of the ABySS co-authors and developers, was using this image to interpret the algorithm's output and asked if I could help.

A printout of an ABySS assembly graph, hand-annotated with a yellow highlighter pen.

Transform the Data

Genomes are full of repeated sequences, so it's common for assembled sequences to overlap. The ABySS algorithm results in a graph with sequences as vertices and points of overlap as edges.

Reasoning about data encoded this way is tricky, because it does not match how most scientists picture DNA sequences: as long strings or lines.

To tackle this, I flipped the representation to instead visualize assembled sequences as edges and points of overlap as vertices. In the example below, it's now easier to see that sequence 1 shares an overlap with sequences 2, 3, and 4 (left versus right graph).

A diagram showing overlaps between four genome sequences represented by the ABySS algorithm, as simple DNA strands, and in the ABySS-Explorer view.
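
The flip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ABySS-Explorer implementation: the function name `flip_to_edge_view`, the input format (a dictionary from each sequence to the sequences it overlaps into), and the "start"/"end" endpoint labels are all hypothetical, and the sketch ignores strand orientation. Each sequence gets two endpoints, and every overlap merges the end of one sequence with the start of the next into a shared vertex.

```python
def flip_to_edge_view(overlaps):
    """Turn a sequences-as-vertices graph into a sequences-as-edges graph.

    overlaps: dict mapping a sequence id to the ids of the sequences it
    overlaps into (a hypothetical input format, chosen for illustration).
    Returns a dict mapping each sequence id to a (start_vertex, end_vertex)
    pair of integer vertex ids.
    """
    parent = {}  # union-find forest over (sequence, side) endpoints

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect every sequence that appears as a source or a target.
    sequences = set(overlaps)
    for targets in overlaps.values():
        sequences.update(targets)

    # An overlap means the end of one sequence and the start of the
    # next sit at the same point, so merge those two endpoints.
    for src, targets in overlaps.items():
        for dst in targets:
            union((src, "end"), (dst, "start"))

    # Give each merged endpoint group a small integer vertex id.
    vertex_ids = {}
    edges = {}
    for seq in sorted(sequences):  # assumes ids are mutually comparable
        ends = []
        for side in ("start", "end"):
            root = find((seq, side))
            ends.append(vertex_ids.setdefault(root, len(vertex_ids)))
        edges[seq] = tuple(ends)
    return edges


# The four-sequence example from the diagram: 1 overlaps 2, 3, and 4.
edges = flip_to_edge_view({1: [2, 3, 4]})
```

In the result, the end vertex of sequence 1 is the same vertex at which sequences 2, 3, and 4 start, so the shared overlap becomes a single visible branching point.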

Sequence Length

The next key challenge was how to depict the length of each sequence. In an email exchange, Martin Wattenberg suggested: "would it make sense to use curved or even squiggly edges to represent longer segments".

Inspired by this idea, I encoded each sequence as a wave with one oscillation for each fixed number of nucleotides. Long sequences condense into nearly solid shapes, which I rendered as asymmetric leaf-like shapes to capture their directionality.

A visualization of just three assembled sequences in a graph each represented as a squiggly line where the longer the line, the longer the sequence.
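
The wave encoding can be illustrated with a small Python sketch. It is a rough approximation under assumptions of my own, not the ABySS-Explorer rendering code: the function name `wave_points`, the default of 100 nucleotides per oscillation, and the amplitude and sampling parameters are all hypothetical, and the asymmetric leaf-like envelope is not modelled. The sketch samples a sine wave along a straight edge so that the number of oscillations is proportional to sequence length.

```python
import math


def wave_points(length_nt, nt_per_oscillation=100, edge_span=1.0,
                amplitude=0.1, samples_per_oscillation=16):
    """Sample (x, y) points of a sine wave drawn along a straight edge.

    The wave has one full oscillation per nt_per_oscillation nucleotides.
    All parameter names and default values are illustrative assumptions.
    """
    oscillations = length_nt / nt_per_oscillation
    # Use enough samples to keep each oscillation smooth, and at least two
    # so that very short sequences still yield a drawable segment.
    n_samples = max(2, math.ceil(oscillations * samples_per_oscillation) + 1)
    points = []
    for i in range(n_samples):
        t = i / (n_samples - 1)  # position along the edge, from 0 to 1
        x = t * edge_span
        y = amplitude * math.sin(2 * math.pi * oscillations * t)
        points.append((x, y))
    return points


short_edge = wave_points(100)    # one oscillation
long_edge = wave_points(1000)    # ten oscillations over the same span
```

Because the on-screen span is fixed in this sketch, a longer sequence packs more oscillations into the same distance, which is why long sequences start to look like nearly solid shapes.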

Colour

Colour is a powerful channel and I saved it for last. In this example, I used colour to annotate parts of an assembled genome from a lymphoma. Orange and blue colours map to distant regions in a healthy reference genome, highlighting where genomic rearrangements have likely occurred in this cancer.

A larger graph visualized with ABySS-Explorer where orange and blue colours indicate different regions in a reference genome.

MY CONTRIBUTIONS

DEFINE

the problem

I worked closely with computational biologists to understand their evolving analytical needs and defined the scope of the original design.

WRANGLE

the data

I transformed the raw output of the ABySS algorithm into a form that served the visual tasks and supported performant interaction. 

DESIGN

the prototypes

I iteratively designed ABySS-Explorer through both sketches and prototypes in code that were routinely critiqued and tested by end users. 

DEPLOY

the solution

I built the initial ABySS-Explorer application in Java and later supervised junior developers to extend its functionality. 

COMMUNICATE

the methods

I wrote the IEEE VIS paper describing the design, methods and applications, which won a best paper award.

WANT TO LEARN MORE?

You can read all about the design and application in the IEEE VIS paper.
