Karyograms can be made quite easily in R using ggbio (see the docs; the code below is based on section 8.4 thereof).
I want to make a karyogram that indicates the positions of a bunch of genes on the human genome.
Required packages:
library('ggbio')
library('GenomicRanges')
library('TxDb.Hsapiens.UCSC.hg19.knownGene')
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
data(hg19Ideogram, package = "biovizBase")
data(hg19IdeogramCyto, package = "biovizBase")
# Restrict both ideograms to the autosomes plus X and Y
hg19 <- keepSeqlevels(
  hg19Ideogram,
  paste0("chr", c(1:22, "X", "Y"))
)
hg19.cyto <- keepSeqlevels(
  hg19IdeogramCyto,
  paste0("chr", c(1:22, "X", "Y"))
)
A) Obtain genomic coordinates for the genes
Working in R, we want to obtain the genomic coordinates of the exon boundaries for a bunch of human gene symbols. Some of the symbols are deprecated, some have been updated, and some may not have been adopted by Entrez Gene, but hopefully none have been corrupted by Excel.
There are multiple transcript definitions for each gene. So, for each gene, we are going to condense the exons into non-overlapping regions (which won't necessarily represent any specific exon).
To do the above we need to:
i) convert the gene symbols to Entrez Gene IDs:
# temp - assume this has been done
gene_ids <- c('1000', '1234', '54407')
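One possible way to fill in the placeholder above is sketched here. It assumes the org.Hs.eg.db annotation package is installed (it is not used elsewhere in this post), and the symbols in gene_symbols are purely illustrative. Official symbols are tried first, and anything left unmapped is retried as an alias, which may rescue some deprecated symbols. Aliases can be ambiguous, so taking the first hit is a simplification.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)  # assumption: not part of the original setup

# Illustrative symbols only; replace with the real list
gene_symbols <- c('CDH2', 'CCR5', 'SLC38A2')

# First pass: treat the input as current official symbols
entrez <- mapIds(
  org.Hs.eg.db,
  keys = gene_symbols,
  column = 'ENTREZID',
  keytype = 'SYMBOL',
  multiVals = 'first'
)

# Second pass: retry unmapped symbols as aliases (e.g. deprecated names).
# Only keys that exist as aliases are passed on, since mapIds() raises an
# error when none of the supplied keys are valid.
unmapped <- names(entrez)[is.na(entrez)]
unmapped <- intersect(unmapped, keys(org.Hs.eg.db, keytype = 'ALIAS'))
if (length(unmapped) > 0) {
  entrez[unmapped] <- mapIds(
    org.Hs.eg.db,
    keys = unmapped,
    column = 'ENTREZID',
    keytype = 'ALIAS',
    multiVals = 'first'
  )
}

# Drop anything that still could not be mapped
gene_ids_from_symbols <- unname(entrez[!is.na(entrez)])
```

Any symbol that is still missing after both passes is best checked by hand rather than dropped silently.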
ii) pull out the exon boundaries for all transcripts associated with the Entrez Gene IDs:
exons <- select(
  txdb,
  keys = gene_ids,
  keytype = 'GENEID',
  columns = c('EXONCHROM', 'EXONSTART', 'EXONEND', 'TXID', 'TXSTRAND')
)
iii) convert the exon boundaries into a GRanges object:
# TODO: use gene symbols instead of GENEID as the names for gr
gr <- GRanges(
  seqnames = Rle(exons$EXONCHROM),
  ranges = with(
    exons,
    IRanges(start = EXONSTART, end = EXONEND, names = GENEID)
  ),
  strand = Rle(exons$TXSTRAND),
  seqinfo = seqinfo(hg19)
)
iv) collapse the exon boundaries for individual genes and construct a GRangesList indexed by gene (this isn't actually needed for the karyogram):
grl <- reduce(split(gr, names(gr)))
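To see what the split-then-reduce step does, here is a toy sketch with made-up coordinates (geneA and geneB are hypothetical names, not real genes). Two overlapping exons for geneA are merged into a single range, while the lone exon of geneB is left as it is.

```r
library(GenomicRanges)

# Made-up exons: two overlapping ones for geneA, one for geneB
toy <- GRanges(
  seqnames = 'chr1',
  ranges = IRanges(
    start = c(1, 5, 20),
    end = c(10, 15, 30),
    names = c('geneA', 'geneA', 'geneB')
  )
)

# Split by gene, then merge overlapping ranges within each gene
toy_grl <- reduce(split(toy, names(toy)))

# geneA collapses to one range spanning 1-15; geneB stays at 20-30
toy_grl
```

The merged geneA range does not correspond to either input exon, which is the point made earlier about the collapsed regions not necessarily representing any specific exon.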
v) convert all the above into a function
# TODO...
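A rough sketch of how steps ii) to iv) might be wrapped up is given below. The function name collapse_gene_exons and its arguments are hypothetical, and the filtering of exons that lie on sequences outside the reference Seqinfo is an added assumption (without it, GRanges() would complain about unknown sequence levels if a gene has exons on, say, a haplotype contig).

```r
library(GenomicRanges)
library(AnnotationDbi)

# Hypothetical helper: returns both the raw exon ranges and the per-gene
# collapsed ranges for a vector of Entrez Gene IDs.
collapse_gene_exons <- function(gene_ids, txdb, seqinfo_ref) {
  # Exon boundaries for every transcript of each gene
  exons <- AnnotationDbi::select(
    txdb,
    keys = gene_ids,
    keytype = 'GENEID',
    columns = c('EXONCHROM', 'EXONSTART', 'EXONEND', 'TXID', 'TXSTRAND')
  )

  # Assumption: exons on sequences absent from the reference (and rows with
  # no exon data at all) are dropped rather than reported
  exons <- exons[exons$EXONCHROM %in% seqnames(seqinfo_ref), ]

  gr <- GRanges(
    seqnames = exons$EXONCHROM,
    ranges = IRanges(
      start = exons$EXONSTART,
      end = exons$EXONEND,
      names = exons$GENEID
    ),
    strand = exons$TXSTRAND,
    seqinfo = seqinfo_ref
  )

  # Per-gene, non-overlapping regions
  list(gr = gr, grl = reduce(split(gr, names(gr))))
}
```

With the objects defined earlier, this could be called as collapse_gene_exons(gene_ids, txdb, seqinfo(hg19)), and the gr element of the result passed to layout_karyogram().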
B) Plot a basic karyogram with the genes overlaid:
# autoplot(unlist(grl), layout = 'karyogram')
p <- ggplot(hg19.cyto) + layout_karyogram(cytoband = TRUE)
p <- p + layout_karyogram(
  gr, geom = 'rect', ylim = c(11, 21), colour = 'blue'
) + xlab('')
p
