I often have to compare ordered lists of genes, for example, for each of two different experiments, I might have estimates of the probability that a given gene is upregulated in that dataset: the values running from 0 to 1 for genes that are likely to be down- or up-regulated.
Let's set up a visual tool for comparing gene lists that are ordered in this way. Suppose we were comparing just a single pair of experiments on 10000 genes (eg, comparing ordered lists of human genes from one experiment with their mouse paralogs in another). We could set up two horizontal panels, each containing 10000 rectangles with the i'th rectangle in experiment A corresponding to the i'th ordered gene in that experiment. If the genes have the same order in the two experiments you've ballsed something up quite royally and will be mocked forevermore.
In gene-set-enrichment analysis, an analogous horizontal panel of ordered genes is given (though only a single panel is presented). This panel sits underneath a line graph that indicates if functionally-related genes cluster together in regions of 'ordered-gene-space' (so that you can identify if a set of genes is up- or down-regulated en masse by looking at either end of the line graph).
I want to set up a figure that looks as follows:
- | | + -- Indication of which genes are upregd
(or downregd) in a given expt
||||||---------|||||| -- Ordered genes from experiment 1
| |\ X //| -- Links between corresponding genes
||||||---------|||||| -- Ordered genes from experiment 2
As always, I want to do this in ggplot2.
A) First we set up some sample data.
B) Then we work out how to plot the heatmap-type horizontal panels that correspond to the ordered genes
C) Then we work out how to connect the rectangles for a given gene from two (or more) horizontal panels using line segments (and extend to multiple genes)
D) Then we plot the linking line-segments over the top of the horizontal panels of genes
E) Then we realise it doesn't show what we wanted and think up something else instead.
--------
## A) Set up some sample data
# Simplest case: All values are distinct with no missing values
# and the same genes are present in each of the meta-analyses
genes <- c(letters, toupper(letters))
# Set up three different datasets:
geneL <- Map(
function(dname){
# Choose some value between 0.9 and 1 to use to decide if a gene is
# significant in THIS dataset
sig.level <- runif(1, 0.9, 1)
# define the signed parity volumes for a set of 52 genes
dframe <- data.frame(gene.id = genes,
spv = runif(n = 52, min = -1, max = 1))
# parity volume
# - our measure of inter-dataset similarity for a gene in a meta-analysis
dframe$pv = abs(dframe$spv)
# rank of the signed parity volumes, low SPV => low rank
dframe$srank = rank(dframe$spv)
# is the gene significant?
dframe$sig <- (dframe$pv > sig.level) * sign(dframe$spv)
# name of the current collection of datasets (to be used in ggplot)
dframe$collectn = factor(dname)
dframe
},
c('metaA', 'metaB', 'metaC')
)
geneDF <- do.call('rbind', geneL)
head(geneDF)
gene.id spv pv srank sig collectn
metaA.1 a -0.2557522 0.2557522 13 0 metaA
metaA.2 b 0.1457067 0.1457067 27 0 metaA
metaA.3 c 0.8164156 0.8164156 49 0 metaA
metaA.4 d -0.5966361 0.5966361 8 0 metaA
metaA.5 e 0.7967794 0.7967794 48 0 metaA
metaA.6 f 0.8893505 0.8893505 51 0 metaA
summary(geneDF)
gene.id spv pv srank
a : 3 Min. :-0.97384 Min. :0.0008819 Min. : 1.00
A : 3 1st Qu.:-0.37879 1st Qu.:0.2023146 1st Qu.:13.75
b : 3 Median :-0.00186 Median :0.4398626 Median :26.50
B : 3 Mean : 0.02130 Mean :0.4484313 Mean :26.50
c : 3 3rd Qu.: 0.45022 3rd Qu.:0.6607802 3rd Qu.:39.25
C : 3 Max. : 0.98537 Max. :0.9853681 Max. :52.00
(Other):138
sig collectn
Min. :-1.00000 metaA:52
1st Qu.: 0.00000 metaB:52
Median : 0.00000 metaC:52
Mean : 0.00641
3rd Qu.: 0.00000
Max. : 1.00000
--------
B) Plot out the horizontal panels using ggplot:
# Attempt to write a ggplot geom_tile based method to simply plot the ranked
# spv values
# height = 0.5 was chosen so there is space between the datasets
#
fig1 <- function(DF = geneDF){
ggplot(
data = DF, aes(x = srank, y = collectn)
) +
geom_tile(aes(fill = sig, height = 0.5)) +
scale_fill_gradient2(low = 'blue', high = 'red', mid = 'white')
}
fig1()
So each horiziontal panel contains a list of 52 genes, and between 0 and 3 of the genes have been (randomly) assigned to be significantly up- or down-regulated in the three analyses.
We want to see whether the significant genes in one analysis correspond to the significant genes in another. To do this we use geom_segment
-------
C) Then we work out how to connect the rectangles for a given gene from two (or more) horizontal panels using line segments (and extend to multiple genes), ... and ...
D) Then we plot the linking line-segments over the top of the horizontal panels of genes
# Function to obtain the midpoints of the gene-specific panels
gene_twoclass_midpoints <- function(
DF,
ref.class,
alt.class,
genes,
class.colname,
gene.colname,
rank.colname,
offset = 0
){
# returns a data frame with entries xmid.ref, ymid.ref, xmid.alt, ymid.alt
# to be used as 'data' in geom_segment (x = x.ref, xend = x.alt, etc...)
# Is the ref.class numerically lower than the alt.class (if so, ref is
# plotted below alt in the panels)
class.lvls <- levels(DF[, class.colname])
lapply(c(ref.class, alt.class),
function(classname){ stopifnot(classname %in% class.lvls) })
if(which(class.lvls == ref.class) < which(class.lvls == alt.class))
offset.multiplier <- 1
else {
offset.multiplier <- -1
}
# Vertical coordinate of the midpoints is given by the numeric value of the
# class
# Horizontal coordinate of the midpoint is given by the rank of the gene
# within the dataset
class.coords <- Map(
function(classname, multiplier){
class.rows <- which(DF[, class.colname] == classname &
DF[, gene.colname] %in% genes)
class.df <- DF[class.rows, ]
ymid <- which(class.lvls == classname)
data.frame(
gene = class.df[, gene.colname],
x = class.df[, rank.colname],
y = ymid + offset * multiplier
)
},
c(ref.class, alt.class),
c(offset.multiplier, -1 * offset.multiplier)
)
# x/y coords for the two datasets are joined based on gene id
segment.coords <- merge(
x = class.coords[[1]],
y = class.coords[[2]],
by = 'gene',
suffixes = c('.ref', '.alt')
)
}
####
fig2 <- function(DF = geneDF, ref = 'metaA', offset = 0){
# non-reference analyses:
alt.classes <- setdiff(levels(DF$collectn), ref)
# For each dataset make a plots of the genes, ranked by signed parity volume,
# with significantly diffexed genes identified in blue (downregd) or
# red(upregd):
p <- ggplot(
data = DF, aes(x = srank, y = collectn)) +
geom_tile(aes(fill = sig, height = 0.5)) +
scale_fill_gradient2(low = 'blue', high = 'red', mid = 'white')
# use geom_segment to plot lines between (x,y) and (xend,yend)
ref.sample.sig.genes <- subset(DF,
collectn == ref & sig != 0
)$gene.id
# obtain the coordinates of the mid-points for plotting the line segments
coords <- do.call('rbind', Map(
function(alt.class){
gene_twoclass_midpoints(
DF,
ref.class = ref,
alt.class = alt.class,
genes = ref.sample.sig.genes,
class.colname = 'collectn',
gene.colname = 'gene.id',
rank.colname = 'srank',
offset = offset
)
},
alt.classes
))
q <- geom_segment(data = coords,
aes(x=x.ref, y=y.ref, xend = x.alt, yend = y.alt))
p + q
}
fig2()
This looks perfectly fine, but I'd rather the links joined top-to-bottom rather than midpoint-to-midpoint. That is, if a class A is plotted below a class B, then the top point of the panel for class A should link to the bottom point of the panel for class B. To do that, I just change the offset to 0.25 (since the panel heights are 0.5).
fig2(offset = 0.25)
Potentially a blog about bioinformatics, data visualisation, statistics by russH / @dubiousGeek
Thursday, 23 June 2016
Tuesday, 10 May 2016
Overlaying figures on a ggplot scatterplot
This follows on from a post I made on biomathr about graph diameters and eigenvalues (here), but is really a post about getting ggplot to overlay different types of figures on top of each other.
Let's suppose you want to make a 2d scatterplot of some data (which I do). And over the top of that image, you want to put a smaller plot, or a jpeg or some other kind of annotation (which I also want to do).
There's probably a simpler solution than the one I present, which involves using arrangeGrob to approximate the required grid point (I played around with annotation_custom, but I could only add a single graph using that).
First we make a bunch of graphs on 8 vertices that are all connected and are pairwise non-isomorphic (4156 of them, if that's important to you). We are interested in how the diameter, \(d\), of the graphs (the max intervertex distance in the graph) varies with the Laplacian eigenvalues of the graphs. Specifically, how the second smallest of these eigenvals varies with diameter. Since we're only interested in connected graphs, distances and diameters are all well defined and \( \lambda_2 > 0\).
Now, we know that $$1 \leq d \leq n-1 $$ for a graph of order \(n\), and that \(d=1\) for the complete graph only and \(d=n-1\) for the path of length n-1 only. Furthermore, from Mohar, we know that $$d \geq 4 / (n \lambda_2)$$ and that $$ d \leq 2 \left \lceil{\frac{\delta_{max} + \lambda_2}{4 \lambda_2} ln(n-1)} \right \rceil, $$ though, for the 4154 graphs that are neither complete nor paths, these constraints may be no better than \(2 \leq d \leq n-2\). Here, \(\delta_{max}\) is the highest vertex degree in the graph.
For each graph, we can determine the residuals between Mohar's inequalities and the observed diameter (where * is the Mohar-predicted upper-bound):
$$ m_{upper} = \frac{1}{*} (* - d)$$
$$ m_{lower} = \frac{1}{d} (d - \frac{4}{n \lambda_2}) $$
In the following, we evaluate these functions on each graph (other than the path and complete graph) and plot the upper against the lower residual as a scatter plot using ggplot and igraph (and some functions I've described here or in biomathr)
##########
library(plyr) source('scripts/diss_funcs.R')
source('scripts/diss_script.R')
igraph_ggplot <- function(
X,
coord.func = layout.fruchterman.reingold,
offset = 0.5,
add.vertices = TRUE,
add.labels = TRUE
){
# Converts a graph into x/y coords,
# then plots the graph as straight-line joined coords
d.frame <- igraph_to_ggplot_df(X, coord.func)
# Note that if I don't change the offset myself,
# arrangeGrob cuts off distal bits of my graph
x.range <- range(d.frame$x)
y.range <- range(d.frame$y)
g <- ggplot(data = d.frame,
aes(x = x, y = y, group = element.name, label = element.label)
) + geom_line(
) + theme_nothing(
) + xlim(x.range[1] - offset, x.range[2] + offset
) + ylim(y.range[1] - offset, y.range[2] + offset)
if(add.vertices){
g <- g + geom_point(colour = 'darkgoldenrod1', size = 10
) + geom_point(colour = 'white', size = 7)
}
if(add.labels){
g <- g + geom_text(hjust = 0.5,
vjust = 0.5,
aes(colour = element.type))
}
g
}
transparent_theme <- theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
panel.background = element_rect(fill = "transparent",colour = NA),
plot.background = element_rect(fill = "transparent",colour = NA)
)
lambda_seq <- function(g){
ev <- eigen(graph.laplacian(g))$values
ev
}
lambda_2 <- function(g){
rev(lambda_seq(g))[2]
}
g8 <- Filter(function(g){
is.connected(g) && ! diameter(g) %in% c(1, 7)
}, make_all_graphs(8))
# distributions of the diameter and eigenvalue for g8:
g8.details <- do.call('rbind', lapply(g8, function(g){
data.frame(diam = diameter(g), L2 = lambda_2(g))
}))
summary(as.factor(g8.details$diam))
# 2 3 4 5 6
# 1512 2167 419 50 6
# upper and lower bounds on the diameter, as given by Mohar
m_upper <- function(n, lam2, degmax){
2 * ceiling((degmax + lam2) * log(n-1) / (4 * lam2))
}
m_lower <- function(n, lam2){
4 / (n * lam2)
}
# What do these look like for n=8:
# We use a less strict inequality, where degmax is replaced by n-1
inequal.figure <- function(n){
x <- seq(0, n, length.out = 1000)
inequal.df <- data.frame(
x = x,
mohar.low = m_lower(n, x),
mohar.upp = m_upper(n, x, n-1),
trivial.low = 2,
trivial.upp = n-2
)
inequal.df$low <- apply(inequal.df[, c('mohar.low', 'trivial.low')], 1, max)
inequal.df$upp <- apply(inequal.df[, c('mohar.upp', 'trivial.upp')], 1, min)
gp <- ggplot(data = inequal.df) +
geom_line(aes(x = x, y = low), col = 'blue' ) +
geom_line(aes(x = x, y = mohar.low), col = 'blue', linetype = 'dotted') +
geom_line(aes(x = x, y = trivial.low), col = 'blue', linetype = 'dashed') +
geom_line(aes(x = x, y = upp), col = 'red' ) +
geom_line(aes(x = x, y = mohar.upp), col = 'red', linetype = 'dotted') +
geom_line(aes(x = x, y = trivial.upp), col = 'red', linetype = 'dashed') +
xlim(0, n) +
ylim(0, n) +
xlab(expression(lambda[2])) +
ylab("Diameter (bound)")
gp
}
ifig <- inequal.figure(8) + geom_point(
data = g8.details, aes(x = L2, y = diam)
)
# Diameters are only possible if they lie above the solid blue line and below the solid red line (unless complete / path graph)
# There's quite a wide range of \(\lambda_2\) values for any given diameter. What do the graphs look like that have diameter d and a range of different values of \(lambda_2\)
which.median <- function(x){
mid <- ceiling(length(x) / 2) #
order(x)[mid]
}
g8.details$idx <- 1:nrow(g8.details)
ddply(g8.details,
.(diam),
summarize,
L2.min = min(L2),
L2.med = median(L2),
L2.max = max(L2),
idx.min = idx[which.min(L2)],
idx.med = idx[which.median(L2)],
idx.max = idx[which.max(L2)]
)
# diam L2.min L2.med L2.max idx.min idx.med idx.max
# 1 2 1.0000000 2.0000000 6.0000000 3755 4041 4151
# 2 3 0.3542487 0.9467493 2.3542487 162 3039 2828
# 3 4 0.2384428 0.5291319 1.0000000 140 225 1372
# 4 5 0.2022567 0.2672468 0.5107114 63 104 429
# 5 6 0.1667170 0.1930416 0.2354934 7 3 31
graph.imgs <- lapply(
c(3755, 4041, 4151, 140, 225, 1372, 7, 3, 31),
function(idx){
g <- g8[[idx]]
p <- igraph_ggplot(g, add.vertices = FALSE, add.labels = FALSE)
p
})
ag <- arrangeGrob(grobs = graph.imgs, nrow = 3)
grid.newpage()
grid.draw(ag)
##############################################################
Which is great, but suppose we want to plot these graphs over
the top of a scatter plot?...
###############################################################
lam2_lower <- function(n){
# requires n>=5 for calculation
A <- 2 * log(n-1) / (n-4)
2 * (n-1) * A / (1 - 2 * A)
}
x <- seq(5, 5000, 1)
plot(x, lam2_lower(x))
# m_upper_resid <- function(g){
require(igraph)
d <- diameter(g)
deg <- max(degree(g))
L2 <- lambda_2(g)
n <- length(V(g))
fit <- m_upper(n = n, lam2 = L2, degmax = deg)
resid <- (1 / fit) * (fit - d)
stopifnot(resid >= 0)
resid
}
m_lower_resid <- function(g){
require(igraph)
d <- diameter(g)
L2 <- lambda_2(g)
n <- length(V(g))
fit <- m_lower(n = n, lam2 = L2)
resid <- (1 / d) * (d - fit)
stopifnot(resid >= 0)
resid
}
resid.df <- data.frame(
lower = sapply(g8, m_lower_resid),
upper = sapply(g8, m_upper_resid)
)
p <- ggplot(resid.df, aes(x = lower, y = upper)) + geom_point(
) + xlim(0, 1) + ylim(0, 1)
p # plot of residuals for upper and lower bound
# Would prefer to be able to indicate what some example graphs look like though.
# extremes of the residuals:
rows <- unique(unlist(lapply(resid.df, function(col){
c(which.min(col), which.max(col))
})))
resid.df[rows, ]
# lower upper
# 140 0.4757653 0.7777778
# 3466 0.9583333 0.0000000
# 551 0.5493984 0.8000000
graph1 <- arrangeGrob(
igraph_ggplot(g8[[rows[1]]], add.vertices = FALSE, add.labels = FALSE
) + transparent_theme)
graph2 <- arrangeGrob(
igraph_ggplot(g8[[rows[2]]], add.vertices = FALSE, add.labels = FALSE
) + transparent_theme)
empty <- rectGrob(gp = gpar(col = NULL))
# split the graphing region into a 4 * 5 grid
# and put the igraphs in grids that are pretty close to their scatterpoint
overlay.matrix <- matrix(
rep(1, 4 * 5),
nrow = 4, ncol = 5, byrow = TRUE
)
overlay.matrix[1, 3] <- 2
overlay.matrix[4, 5] <- 3
overlay <- arrangeGrob(
grobs = list(empty, graph1, graph2),
layout_matrix = overlay.matrix
)
g <- ggplotGrob(p)
g
# TableGrob (6 x 5) "layout": 8 grobs
# z cells name grob
# 1 0 (1-6,1-5) background rect[plot.background.rect.808]
# 2 3 (3-3,3-3) axis-l absoluteGrob[GRID.absoluteGrob.800]
# 3 1 (4-4,3-3) spacer zeroGrob[NULL]
# 4 2 (3-3,4-4) panel gTree[GRID.gTree.788]
# 5 4 (4-4,4-4) axis-b absoluteGrob[GRID.absoluteGrob.794]
# 6 5 (5-5,4-4) xlab text[axis.title.x.text.802]
# 7 6 (3-3,2-2) ylab text[axis.title.y.text.804]
# 8 7 (2-2,4-4) title text[plot.title.text.806]
# note that 'panel' lies in grid points
# 3-3 (t-b) and 4-4 (l-r)
# this is what we want to 'overlay' overlay onto:
g1 <- gtable_add_grob(
g,
grobs = overlay,
t = 3, l = 4, b = 3, r = 4
)
grid.newpage()
grid.draw(g1)
Let's suppose you want to make a 2d scatterplot of some data (which I do). And over the top of that image, you want to put a smaller plot, or a jpeg or some other kind of annotation (which I also want to do).
There's probably a simpler solution than the one I present, which involves using arrangeGrob to approximate the required grid point (I played around with annotation_custom, but I could only add a single graph using that).
First we make a bunch of graphs on 8 vertices that are all connected and are pairwise non-isomorphic (4156 of them, if that's important to you). We are interested in how the diameter, \(d\), of the graphs (the max intervertex distance in the graph) varies with the Laplacian eigenvalues of the graphs. Specifically, how the second smallest of these eigenvals varies with diameter. Since we're only interested in connected graphs, distances and diameters are all well defined and \( \lambda_2 > 0\).
Now, we know that $$1 \leq d \leq n-1 $$ for a graph of order \(n\), and that \(d=1\) for the complete graph only and \(d=n-1\) for the path of length n-1 only. Furthermore, from Mohar, we know that $$d \geq 4 / (n \lambda_2)$$ and that $$ d \leq 2 \left \lceil{\frac{\delta_{max} + \lambda_2}{4 \lambda_2} ln(n-1)} \right \rceil, $$ though, for the 4154 graphs that are neither complete nor paths, these constraints may be no better than \(2 \leq d \leq n-2\). Here, \(\delta_{max}\) is the highest vertex degree in the graph.
For each graph, we can determine the residuals between Mohar's inequalities and the observed diameter (where * is the Mohar-predicted upper-bound):
$$ m_{upper} = \frac{1}{*} (* - d)$$
$$ m_{lower} = \frac{1}{d} (d - \frac{4}{n \lambda_2}) $$
In the following, we evaluate these functions on each graph (other than the path and complete graph) and plot the upper against the lower residual as a scatter plot using ggplot and igraph (and some functions I've described here or in biomathr)
##########
library(plyr) source('scripts/diss_funcs.R')
source('scripts/diss_script.R')
igraph_ggplot <- function(
X,
coord.func = layout.fruchterman.reingold,
offset = 0.5,
add.vertices = TRUE,
add.labels = TRUE
){
# Converts a graph into x/y coords,
# then plots the graph as straight-line joined coords
d.frame <- igraph_to_ggplot_df(X, coord.func)
# Note that if I don't change the offset myself,
# arrangeGrob cuts off distal bits of my graph
x.range <- range(d.frame$x)
y.range <- range(d.frame$y)
g <- ggplot(data = d.frame,
aes(x = x, y = y, group = element.name, label = element.label)
) + geom_line(
) + theme_nothing(
) + xlim(x.range[1] - offset, x.range[2] + offset
) + ylim(y.range[1] - offset, y.range[2] + offset)
if(add.vertices){
g <- g + geom_point(colour = 'darkgoldenrod1', size = 10
) + geom_point(colour = 'white', size = 7)
}
if(add.labels){
g <- g + geom_text(hjust = 0.5,
vjust = 0.5,
aes(colour = element.type))
}
g
}
transparent_theme <- theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
axis.line = element_blank(),
panel.background = element_rect(fill = "transparent",colour = NA),
plot.background = element_rect(fill = "transparent",colour = NA)
)
lambda_seq <- function(g){
ev <- eigen(graph.laplacian(g))$values
ev
}
lambda_2 <- function(g){
rev(lambda_seq(g))[2]
}
g8 <- Filter(function(g){
is.connected(g) && ! diameter(g) %in% c(1, 7)
}, make_all_graphs(8))
# distributions of the diameter and eigenvalue for g8:
g8.details <- do.call('rbind', lapply(g8, function(g){
data.frame(diam = diameter(g), L2 = lambda_2(g))
}))
summary(as.factor(g8.details$diam))
# 2 3 4 5 6
# 1512 2167 419 50 6
# upper and lower bounds on the diameter, as given by Mohar
m_upper <- function(n, lam2, degmax){
2 * ceiling((degmax + lam2) * log(n-1) / (4 * lam2))
}
m_lower <- function(n, lam2){
4 / (n * lam2)
}
# What do these look like for n=8:
# We use a less strict inequality, where degmax is replaced by n-1
inequal.figure <- function(n){
x <- seq(0, n, length.out = 1000)
inequal.df <- data.frame(
x = x,
mohar.low = m_lower(n, x),
mohar.upp = m_upper(n, x, n-1),
trivial.low = 2,
trivial.upp = n-2
)
inequal.df$low <- apply(inequal.df[, c('mohar.low', 'trivial.low')], 1, max)
inequal.df$upp <- apply(inequal.df[, c('mohar.upp', 'trivial.upp')], 1, min)
gp <- ggplot(data = inequal.df) +
geom_line(aes(x = x, y = low), col = 'blue' ) +
geom_line(aes(x = x, y = mohar.low), col = 'blue', linetype = 'dotted') +
geom_line(aes(x = x, y = trivial.low), col = 'blue', linetype = 'dashed') +
geom_line(aes(x = x, y = upp), col = 'red' ) +
geom_line(aes(x = x, y = mohar.upp), col = 'red', linetype = 'dotted') +
geom_line(aes(x = x, y = trivial.upp), col = 'red', linetype = 'dashed') +
xlim(0, n) +
ylim(0, n) +
xlab(expression(lambda[2])) +
ylab("Diameter (bound)")
gp
}
ifig <- inequal.figure(8) + geom_point(
data = g8.details, aes(x = L2, y = diam)
)
# Diameters are only possible if they lie above the solid blue line and below the solid red line (unless complete / path graph)
which.median <- function(x){
mid <- ceiling(length(x) / 2) #
order(x)[mid]
}
g8.details$idx <- 1:nrow(g8.details)
ddply(g8.details,
.(diam),
summarize,
L2.min = min(L2),
L2.med = median(L2),
L2.max = max(L2),
idx.min = idx[which.min(L2)],
idx.med = idx[which.median(L2)],
idx.max = idx[which.max(L2)]
)
# diam L2.min L2.med L2.max idx.min idx.med idx.max
# 1 2 1.0000000 2.0000000 6.0000000 3755 4041 4151
# 2 3 0.3542487 0.9467493 2.3542487 162 3039 2828
# 3 4 0.2384428 0.5291319 1.0000000 140 225 1372
# 4 5 0.2022567 0.2672468 0.5107114 63 104 429
# 5 6 0.1667170 0.1930416 0.2354934 7 3 31
graph.imgs <- lapply(
c(3755, 4041, 4151, 140, 225, 1372, 7, 3, 31),
function(idx){
g <- g8[[idx]]
p <- igraph_ggplot(g, add.vertices = FALSE, add.labels = FALSE)
p
})
ag <- arrangeGrob(grobs = graph.imgs, nrow = 3)
grid.newpage()
grid.draw(ag)
##############################################################
Which is great, but suppose we want to plot these graphs over
the top of a scatter plot?...
###############################################################
lam2_lower <- function(n){
# requires n>=5 for calculation
A <- 2 * log(n-1) / (n-4)
2 * (n-1) * A / (1 - 2 * A)
}
x <- seq(5, 5000, 1)
plot(x, lam2_lower(x))
# m_upper_resid <- function(g){
require(igraph)
d <- diameter(g)
deg <- max(degree(g))
L2 <- lambda_2(g)
n <- length(V(g))
fit <- m_upper(n = n, lam2 = L2, degmax = deg)
resid <- (1 / fit) * (fit - d)
stopifnot(resid >= 0)
resid
}
m_lower_resid <- function(g){
require(igraph)
d <- diameter(g)
L2 <- lambda_2(g)
n <- length(V(g))
fit <- m_lower(n = n, lam2 = L2)
resid <- (1 / d) * (d - fit)
stopifnot(resid >= 0)
resid
}
resid.df <- data.frame(
lower = sapply(g8, m_lower_resid),
upper = sapply(g8, m_upper_resid)
)
p <- ggplot(resid.df, aes(x = lower, y = upper)) + geom_point(
) + xlim(0, 1) + ylim(0, 1)
p # plot of residuals for upper and lower bound
# Would prefer to be able to indicate what some example graphs look like though.
# extremes of the residuals:
rows <- unique(unlist(lapply(resid.df, function(col){
c(which.min(col), which.max(col))
})))
resid.df[rows, ]
# lower upper
# 140 0.4757653 0.7777778
# 3466 0.9583333 0.0000000
# 551 0.5493984 0.8000000
graph1 <- arrangeGrob(
igraph_ggplot(g8[[rows[1]]], add.vertices = FALSE, add.labels = FALSE
) + transparent_theme)
graph2 <- arrangeGrob(
igraph_ggplot(g8[[rows[2]]], add.vertices = FALSE, add.labels = FALSE
) + transparent_theme)
empty <- rectGrob(gp = gpar(col = NULL))
# split the graphing region into a 4 * 5 grid
# and put the igraphs in grids that are pretty close to their scatterpoint
overlay.matrix <- matrix(
rep(1, 4 * 5),
nrow = 4, ncol = 5, byrow = TRUE
)
overlay.matrix[1, 3] <- 2
overlay.matrix[4, 5] <- 3
overlay <- arrangeGrob(
grobs = list(empty, graph1, graph2),
layout_matrix = overlay.matrix
)
g <- ggplotGrob(p)
g
# TableGrob (6 x 5) "layout": 8 grobs
# z cells name grob
# 1 0 (1-6,1-5) background rect[plot.background.rect.808]
# 2 3 (3-3,3-3) axis-l absoluteGrob[GRID.absoluteGrob.800]
# 3 1 (4-4,3-3) spacer zeroGrob[NULL]
# 4 2 (3-3,4-4) panel gTree[GRID.gTree.788]
# 5 4 (4-4,4-4) axis-b absoluteGrob[GRID.absoluteGrob.794]
# 6 5 (5-5,4-4) xlab text[axis.title.x.text.802]
# 7 6 (3-3,2-2) ylab text[axis.title.y.text.804]
# 8 7 (2-2,4-4) title text[plot.title.text.806]
# note that 'panel' lies in grid points
# 3-3 (t-b) and 4-4 (l-r)
# this is what we want to 'overlay' overlay onto:
g1 <- gtable_add_grob(
g,
grobs = overlay,
t = 3, l = 4, b = 3, r = 4
)
grid.newpage()
grid.draw(g1)
Thursday, 21 April 2016
Multipanel ggplot with subfigure labels (and drawing igraph structures with ggplot2)
I want to make some multipanelled plots in R where, for a given figure:
base_size = 12, base_family = "Helvetica"
){
# blank plotting background for ggplot2
# - for removing all axes etc when plotting igraphs in ggplot2
# - copied from ggplot2 docs
theme_bw(base_size = base_size, base_family = base_family
) %+replace% theme(
rect = element_blank(),
line = element_blank(),
text = element_blank(),
legend.position = 'none'
)
}
labelGrob <- function(label){
- each of the subfigures is a ggplot (or other grid::grob object)
- each of the subfigures is given a label in the top-left hand corner
I've been using ggplot to draw graphs (the graph theory kind). It isn't quite set up to do this and indeed, igraph has perfectly good graph-drawing functions. They just don't return grobs (neither does sna::gplot), so they are a bit limited to my mind. Hence, I've set up a couple of simple functions to make ggplot figures out of igraph structures using the igraph::layout functions
Here's an example figure from my current dissertation. We first set up a theme to remove the weird default background for ggplot figures, import some libraries and define a function that returns a label in a rectangle.
library(igraph)
library(grid)
library(gridExtra)
library(ggplot2)
theme_nothing <- function(base_size = 12, base_family = "Helvetica"
){
# blank plotting background for ggplot2
# - for removing all axes etc when plotting igraphs in ggplot2
# - copied from ggplot2 docs
theme_bw(base_size = base_size, base_family = base_family
) %+replace% theme(
rect = element_blank(),
line = element_blank(),
text = element_blank(),
legend.position = 'none'
)
}
labelGrob <- function(label){
# function to generate subfigure labelsrequire(grid) require(gridExtra) textGrob(label, y = unit(0.9, 'npc')) }Then we define a function to convert an igraph object into a dataframe of x/y coords so that it can be plotted in ggplot.igraph_to_ggplot_df <- function( X, coord.func = layout.fruchterman.reingold ){ # Makes a dataframe of vertex / edge positions from an igraph # object stopifnot(is(X, "igraph"))# Give the vertices some names, if they don't already have them # then use the coord.func to define X/Y positions for each vertex v.names <- names(V(X)) if(is.null(v.names)){v.names <- paste0('v', 1:length(V(X)))} v.coords <- scale(coord.func(X)) dimnames(v.coords) <- list(v.names, c('x', 'y')) # Obtain the edges for the graph, give them names (if they don't # already have them) and then associate the start and end point # of each edge with the X/Y coords of the relevant vertex e.list <- get.edgelist(X) stopifnot(nrow(e.list) > 0) e.names <- names(E(X)) if(is.null(e.names)){ # assume the graph isn't null e.names <- paste0('e', 1:nrow(e.list)) } dimnames(e.list) <- list(e.names, c('head', 'tail')) stopifnot(length(intersect(v.names, e.names)) == 0) # convert the edge / vertex coords into a single dataframe for use in ggplot edge_df <- data.frame( element.name = rep(e.names, 2), element.label = '', element.type = 'edge', x = v.coords[as.vector(e.list[, c('head', 'tail')]), 'x'], y = v.coords[as.vector(e.list[, c('head', 'tail')]), 'y'] ) vertex_df <- data.frame( element.name = rownames(v.coords), element.label = rownames(v.coords), element.type = 'vertex', x = v.coords[, 'x'], y = v.coords[, 'y'] ) graph_df <- rbind(edge_df, vertex_df) graph_df }Then we define a function to turn igraph objects into ggplot graphical objects, so that these can be arranged and plotted out using grid.igraph_ggplot <- function( X, coord.func = layout.fruchterman.reingold, offset = 0.5 ){ # Converts a graph into x/y coords, # then plots the graph as straight-line joined coords d.frame <- igraph_to_ggplot_df(X, coord.func) # Note that if I don't change the offset myself, # arrangeGrob cuts off distal bits of my graph x.range <- range(d.frame$x) y.range <- range(d.frame$y) g <- ggplot(data = d.frame, aes(x = x, y = y, group = element.name, label = element.label) ) + geom_line( ) + geom_point(colour = 'darkgoldenrod1', size = 10 ) + geom_point(colour = 'white', size = 7 ) + geom_text(hjust = 0.5, vjust = 0.5, aes(colour = element.type) ) + theme_nothing( ) + xlim(x.range[1] - offset, x.range[2] + offset ) + ylim(y.range[1] - offset, y.range[2] + offset) g }This is just an example of a figure generated with the above code. We define the incidence matrix of a graph, and plot out the incidence, adjacency and Laplacian matrices of that graph (as tables) and a picture of the graph itself. The labels are added using arrangeGrob and labelGrob: I define an arrangement of 8 panels (2rows x 4 columns). The labels are placed into the first and third columns (which are 1/4 the width of the second and fourth columns.) fig1_2_1 <- function(){# Define a graph using it's incidence matrix, then plot # Signed incidence matrix D <- matrix(c(1, 1, 1, 0, -1,0, 0, 1, 0,-1, 0,-1, 0, 0,-1, 0, 0, 0, 0, 0 ), byrow = TRUE, nrow = 5, dimnames = list(paste0('v', 1:5), paste0('e', 1:4))) B <- abs(D) # Unsigned incidence matrix Q <- D %*% t(D) # Laplacian deg <- diag(Q) # Degree vector A <- diag(deg) - Q # Adjacency matrix X <- graph_from_adjacency_matrix( A, mode = 'undirected' ) g1 <- tableGrob(A) g2 <- tableGrob(D) g3 <- igraph_ggplot(X, offset = 0.25) g4 <- tableGrob(Q) ag <- arrangeGrob(labelGrob('a'), g1, labelGrob('b'), g2, labelGrob('c'), g3, labelGrob('d'), g4, nrow = 2, widths = c(1,4,1,4)) grid.draw(ag) NULL }------Voila!---------# TODO - show how to do the same thing with grid.draw(a); grid.draw(b) where 'a' contains figures and 'b' contains panel labels# TODO - eugh! work out why blogger is making my code/paragraphs so ugly?
Subscribe to:
Comments (Atom)







