Wednesday, 4 November 2015

Go from gene symbols to a karyogram showing the positions of those genes

Karyograms can be made quite easily in R using ggbio (see docs; the code below is based on section 8.4 thereof).
I want to make a karyogram that indicates the positions of a bunch of genes on the human genome.

Required packages and data:
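library('ggbio')                              # karyogram plotting
library('GenomicRanges')                      # GRanges / GRangesList
library('TxDb.Hsapiens.UCSC.hg19.knownGene')  # UCSC hg19 transcript models
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
# whole-chromosome ranges and cytobands for hg19, shipped with biovizBase
data(hg19Ideogram, package = "biovizBase")
data(hg19IdeogramCyto, package = "biovizBase")
# keep only chr1-22, X and Y; pruning.mode = "coarse" drops any ranges on
# other contigs (without it, recent GenomeInfoDb versions stop with an
# error if such ranges exist)
hg19 <- keepSeqlevels(
    hg19Ideogram,
    paste0("chr", c(1:22, "X", "Y")),
    pruning.mode = "coarse"
    )
hg19.cyto <- keepSeqlevels(
    hg19IdeogramCyto,
    paste0("chr", c(1:22, "X", "Y")),
    pruning.mode = "coarse"
    )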

A) Obtain genomic coordinates for the genes
Working in R, we want to obtain the genomic coordinates of the exon boundaries for a bunch of human gene symbols. Some of the symbols are deprecated, some have been updated, and some may not have been adopted by Entrez Gene, etc., but hopefully none have been corrupted by Excel.

There are multiple transcript definitions for each gene. So, for each gene, we are going to condense the exons into non-overlapping regions (which won't necessarily represent any specific exon).
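For reference, here is a minimal base-R sketch of what "condensing" means: sort the intervals and merge any that overlap or touch. The helper `merge_intervals` is hypothetical (not from any package) and assumes all intervals sit on one chromosome and strand; in practice GenomicRanges' `reduce()` does this for us, and by default it also merges ranges that merely abut, which the `+ 1` below mimics.

```r
# Hypothetical helper (not part of GenomicRanges): merge overlapping or
# adjacent [start, end] intervals; assumes at least one interval, all on
# the same chromosome and strand
merge_intervals <- function(start, end) {
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  block_start <- start[1]
  block_end <- end[1]
  out_start <- c()
  out_end <- c()
  for (i in seq_along(start)[-1]) {
    if (start[i] <= block_end + 1) {
      # overlaps (or abuts) the current block: extend it
      block_end <- max(block_end, end[i])
    } else {
      # gap: close the current block and open a new one
      out_start <- c(out_start, block_start)
      out_end <- c(out_end, block_end)
      block_start <- start[i]
      block_end <- end[i]
    }
  }
  data.frame(start = c(out_start, block_start),
             end = c(out_end, block_end))
}

# Three transcripts' worth of exons for one made-up gene
merge_intervals(start = c(100, 150, 100, 400, 420),
                end   = c(200, 250, 180, 500, 450))
# two blocks: 100-250 and 400-500
```

The merged blocks cover every base that is exonic in at least one transcript, which is why they needn't match any single annotated exon.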

To do this, we need to:
i) convert gene symbols to Entrez Gene IDs:
# placeholder: assume the symbol-to-ID conversion has already been done
gene_ids <- c('1000', '1234', '54407')
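Step i) is skipped above. One possible way to do it, sketched under a couple of assumptions: the org.Hs.eg.db annotation package is installed (it isn't in the package list above), and symbols are looked up as official symbols first, falling back to the `ALIAS` keytype, which covers many previous or deprecated symbols. The helper name `symbols_to_entrez` and the example symbols are made up for illustration.

```r
library('AnnotationDbi')
library('org.Hs.eg.db')

# Hypothetical helper: gene symbols in, Entrez Gene IDs out (NA if unmatched).
# Official symbols are tried first; anything left over is tried as an alias.
symbols_to_entrez <- function(symbols) {
  symbols <- unique(symbols)
  ids <- setNames(rep(NA_character_, length(symbols)), symbols)

  # only query keys that exist, so mapIds() never sees an all-invalid key set
  official <- symbols %in% keys(org.Hs.eg.db, keytype = 'SYMBOL')
  if (any(official)) {
    ids[official] <- mapIds(org.Hs.eg.db, keys = symbols[official],
                            keytype = 'SYMBOL', column = 'ENTREZID',
                            multiVals = 'first')
  }

  alias <- is.na(ids) & symbols %in% keys(org.Hs.eg.db, keytype = 'ALIAS')
  if (any(alias)) {
    # an alias can map to several genes; 'first' just keeps the first hit
    ids[alias] <- mapIds(org.Hs.eg.db, keys = symbols[alias],
                         keytype = 'ALIAS', column = 'ENTREZID',
                         multiVals = 'first')
  }
  ids
}

ids <- symbols_to_entrez(c('CDH2', 'CCR5', 'SLC38A2'))
gene_ids <- unname(ids[!is.na(ids)])
```

Because aliases can be ambiguous, it's probably worth eyeballing any symbol that only matched via `ALIAS` before trusting its position on the karyogram.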

ii) pull out the exon boundaries for all transcripts associated with those Entrez Gene IDs:
# AnnotationDbi:: in case another package (e.g. dplyr) masks select()
exons <- AnnotationDbi::select(
  txdb,
  keys = gene_ids,
  keytype = 'GENEID',
  columns = c('EXONCHROM', 'EXONSTART', 'EXONEND', 'TXID', 'TXSTRAND')
  )

iii) convert the exon boundaries into a GRanges object:
# TODO: use gene symbols instead of GENEID as the names for gr
gr <- GRanges(
  seqnames = Rle(exons$EXONCHROM),
  ranges = with(exons,
    IRanges(start = EXONSTART, end = EXONEND, names = GENEID)),
  strand = Rle(exons$TXSTRAND),
  seqinfo = seqinfo(hg19)
  ) 

iv) collapse the exon boundaries for each gene into non-overlapping regions and construct a GRangesList indexed by gene (this isn't actually needed for the karyogram):
grl <- reduce(split(gr, names(gr)))
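To see what `reduce(split(...))` does without querying the TxDb, here's a toy version; the gene names and coordinates are invented. `split()` gives one list element per gene name, and `reduce()` then merges overlapping exons within each element, so geneA ends up as two regions and geneB as one.

```r
library('GenomicRanges')

# Invented exons for two made-up genes (several transcripts' worth for geneA)
toy <- GRanges(
  seqnames = rep('chr1', 5),
  ranges = IRanges(
    start = c(100, 150, 400, 1000, 1050),
    end   = c(200, 250, 500, 1100, 1200),
    names = c('geneA', 'geneA', 'geneA', 'geneB', 'geneB')
  ),
  strand = rep('+', 5)
)

# One list element per gene; overlapping exons merged within each gene
toy_grl <- reduce(split(toy, names(toy)))
toy_grl
# geneA: chr1:100-250 and chr1:400-500; geneB: chr1:1000-1200
```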

v) convert all the above into a function
# TODO...
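A possible shape for step v), wrapping steps ii) to iv). This is a sketch rather than a finished function: the name `genes_to_exon_ranges` is hypothetical, it expects Entrez Gene IDs (so step i) still has to happen first), and it returns both the per-transcript exons (what the karyogram below plots) and the collapsed per-gene regions. Passing `seqinfo = seqinfo(hg19)` reproduces the setup above; GRanges() will stop with an error if any exon lies on a contig that isn't in that seqinfo.

```r
library('GenomicRanges')
library('AnnotationDbi')
library('TxDb.Hsapiens.UCSC.hg19.knownGene')

# Hypothetical helper: Entrez Gene IDs in; per-transcript exons (step iii)
# and per-gene collapsed regions (step iv) out
genes_to_exon_ranges <- function(gene_ids,
                                 txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
                                 seqinfo = NULL) {
  # step ii: one row per exon per transcript
  exons <- AnnotationDbi::select(
    txdb,
    keys = gene_ids,
    keytype = 'GENEID',
    columns = c('EXONCHROM', 'EXONSTART', 'EXONEND', 'TXID', 'TXSTRAND')
  )
  # step iii: GRanges whose names are the gene IDs
  gr <- GRanges(
    seqnames = exons$EXONCHROM,
    ranges = IRanges(start = exons$EXONSTART, end = exons$EXONEND,
                     names = exons$GENEID),
    strand = exons$TXSTRAND,
    seqinfo = seqinfo
  )
  # step iv: merge overlapping exons within each gene
  list(exons = gr, by_gene = reduce(split(gr, names(gr))))
}

res <- genes_to_exon_ranges(c('1000', '1234', '54407'))
res$by_gene
```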

B) Plot a basic karyogram with the genes overlaid:
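# quicker alternative, without cytobands: autoplot(unlist(grl), layout = 'karyogram')
# draw the chromosomes with their cytobands
p <- ggplot(hg19.cyto) + layout_karyogram(cytoband = TRUE)
# overlay the exons as blue rectangles in a band above each chromosome
p <- p + layout_karyogram(
    gr, geom = 'rect', ylim = c(11, 21), colour = 'blue'
    ) + xlab('')
p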




Thursday, 6 August 2015

d3.js book

I noticed that Scott Murray's d3.js book 'Interactive Data Visualization for the Web' is available for free online. I've yet to find a detailed set of online tutorials for Bokeh or d3 (although dashingd3js has a useful intro), so this looks promising.

Saturday, 14 February 2015

biographr intro

This is my first blog post, and indeed my first blog. I'm a cancer genomics guy at Liverpool Uni. I thought I'd collect some notes about genomics, statistics and data visualisation here, so that I can easily find them in the future. So there'll probably be some R/Python/JAGS stuff: I often read papers and try to work out how the authors put their graphs etc. together, but by the time I come to generate the same types of figures myself, I've already forgotten how.
.. although I'll probably never post anything beyond this ..
Russ