Tuesday, 10 May 2016

Overlaying figures on a ggplot scatterplot

This follows on from a post I made on biomathr about graph diameters and eigenvalues (here), but is really a post about getting ggplot to overlay different types of figures on top of each other.

Let's suppose you want to make a 2d scatterplot of some data (which I do). And over the top of that image, you want to put a smaller plot, or a jpeg or some other kind of annotation (which I also want to do).

There's probably a simpler solution than the one I present, which involves using arrangeGrob to approximate the required grid point (I played around with annotation_custom, but I could only add a single graph using that).

First we make a bunch of graphs on 8 vertices that are all connected and are pairwise non-isomorphic (4156 of them, if that's important to you). We are interested in how the diameter, \(d\), of the graphs (the max intervertex distance in the graph) varies with the Laplacian eigenvalues of the graphs. Specifically, how the second smallest of these eigenvals varies with diameter. Since we're only interested in connected graphs, distances and diameters are all well defined and \( \lambda_2 > 0\).

Now, we know that $$1 \leq d \leq n-1 $$ for a graph of order \(n\), and that \(d=1\) for the complete graph only and \(d=n-1\) for the path of length n-1 only. Furthermore, from Mohar, we know that $$d \geq 4 / (n \lambda_2)$$ and that $$ d \leq 2 \left \lceil{\frac{\delta_{max} + \lambda_2}{4 \lambda_2} ln(n-1)} \right \rceil, $$ though, for the 4154 graphs that are neither complete nor paths, these constraints may be no better than \(2 \leq d \leq n-2\). Here, \(\delta_{max}\) is the highest vertex degree in the graph.

For each graph, we can determine the residuals between Mohar's inequalities and the observed diameter (where * is the Mohar-predicted upper-bound):
$$ m_{upper} = \frac{1}{*} (* - d)$$

$$ m_{lower} = \frac{1}{d} (d - \frac{4}{n \lambda_2}) $$

In the following, we evaluate these functions on each graph (other than the path and complete graph) and plot the upper against the lower residual as a scatter plot using ggplot and igraph (and some functions I've described here or in biomathr)

##########
library(plyr) source('scripts/diss_funcs.R')
source('scripts/diss_script.R')

igraph_ggplot <- function(
    X,
    coord.func = layout.fruchterman.reingold,
    offset = 0.5,
    add.vertices = TRUE,
    add.labels = TRUE
    ){
  # Converts a graph into x/y coords, 
  #   then plots the graph as straight-line joined coords
  d.frame <- igraph_to_ggplot_df(X, coord.func)
  # Note that if I don't change the offset myself,
  #   arrangeGrob cuts off distal bits of my graph
  x.range <- range(d.frame$x)
  y.range <- range(d.frame$y)
  g <- ggplot(data = d.frame, 
         aes(x = x, y = y, group = element.name, label = element.label)
    ) + geom_line(
    ) + theme_nothing(
    ) + xlim(x.range[1] - offset, x.range[2] + offset
    ) + ylim(y.range[1] - offset, y.range[2] + offset)
  if(add.vertices){
    g <- g + geom_point(colour = 'darkgoldenrod1', size = 10 
      ) + geom_point(colour = 'white', size = 7)
    }
  if(add.labels){
    g <- g + geom_text(hjust = 0.5,
                  vjust = 0.5,
                  aes(colour = element.type))
    }
  g
  }


transparent_theme <- theme(
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  axis.text.x = element_blank(), 
  axis.text.y = element_blank(),
  axis.ticks = element_blank(),
  panel.grid = element_blank(),
  axis.line = element_blank(),
  panel.background = element_rect(fill = "transparent",colour = NA),
  plot.background = element_rect(fill = "transparent",colour = NA)
  )

lambda_seq <- function(g){
  ev <- eigen(graph.laplacian(g))$values
  ev
  }

lambda_2 <- function(g){
  rev(lambda_seq(g))[2]
  }

g8 <- Filter(function(g){
  is.connected(g) && ! diameter(g) %in% c(1, 7)
  }, make_all_graphs(8))

# distributions of the diameter and eigenvalue for g8:
g8.details <- do.call('rbind', lapply(g8, function(g){
  data.frame(diam = diameter(g), L2 = lambda_2(g))
  }))
summary(as.factor(g8.details$diam))
#    2    3    4    5    6 
# 1512 2167  419   50    6 

# upper and lower bounds on the diameter, as given by Mohar
m_upper <- function(n, lam2, degmax){
  2 * ceiling((degmax + lam2) * log(n-1) / (4 * lam2))
  }

m_lower <- function(n, lam2){
  4 / (n * lam2)
  }


# What do these look like for n=8:
# We use a less strict inequality, where degmax is replaced by n-1
inequal.figure <- function(n){
  x <- seq(0, n, length.out = 1000)
  inequal.df <- data.frame(
    x     = x,
    mohar.low = m_lower(n, x),
    mohar.upp = m_upper(n, x, n-1),
    trivial.low = 2,
    trivial.upp = n-2
    )
  inequal.df$low <- apply(inequal.df[, c('mohar.low', 'trivial.low')], 1, max)
  inequal.df$upp <- apply(inequal.df[, c('mohar.upp', 'trivial.upp')], 1, min)
  gp <- ggplot(data = inequal.df) + 
    geom_line(aes(x = x, y = low), col = 'blue'                    ) +
    geom_line(aes(x = x, y = mohar.low),   col = 'blue', linetype = 'dotted') +
    geom_line(aes(x = x, y = trivial.low), col = 'blue', linetype = 'dashed') + 
    geom_line(aes(x = x, y = upp),         col = 'red'              ) +
    geom_line(aes(x = x, y = mohar.upp),   col = 'red', linetype = 'dotted') +
    geom_line(aes(x = x, y = trivial.upp), col = 'red', linetype = 'dashed') + 
    xlim(0, n) + 
    ylim(0, n) + 
    xlab(expression(lambda[2])) +
    ylab("Diameter (bound)")
  gp
  }


ifig <- inequal.figure(8) + geom_point(
  data = g8.details, aes(x = L2, y = diam)
  )


# Diameters are only possible if they lie above the solid blue line and below the solid red line (unless complete / path graph)





# There's quite a wide range of \(\lambda_2\)  values for any given diameter. What do the graphs look like that have diameter d and a range of different values of \(lambda_2\)

which.median <- function(x){
  mid <- ceiling(length(x) / 2) # 
  order(x)[mid]
  }
g8.details$idx <- 1:nrow(g8.details)

ddply(g8.details,
  .(diam),
  summarize,
  L2.min = min(L2),
  L2.med = median(L2),
  L2.max = max(L2),
  idx.min = idx[which.min(L2)],
  idx.med = idx[which.median(L2)],
  idx.max = idx[which.max(L2)]
  )

  
#   diam    L2.min    L2.med    L2.max idx.min idx.med idx.max
# 1    2 1.0000000 2.0000000 6.0000000    3755    4041    4151
# 2    3 0.3542487 0.9467493 2.3542487     162    3039    2828
# 3    4 0.2384428 0.5291319 1.0000000     140     225    1372
# 4    5 0.2022567 0.2672468 0.5107114      63     104     429

# 5    6 0.1667170 0.1930416 0.2354934       7       3      31

graph.imgs <- lapply(
  c(3755, 4041, 4151, 140, 225, 1372, 7, 3, 31),
  function(idx){
    g <- g8[[idx]]
    p <- igraph_ggplot(g, add.vertices = FALSE, add.labels = FALSE)
    p
    })
ag <- arrangeGrob(grobs = graph.imgs, nrow = 3)
grid.newpage()
grid.draw(ag)


##############################################################
Which is great, but suppose we want to plot these graphs over 
the top of a scatter plot?...

###############################################################

lam2_lower <- function(n){
  # requires n>=5 for calculation
  A <- 2 * log(n-1) / (n-4)
  2 * (n-1) * A / (1 - 2 * A)
  }
x <- seq(5, 5000, 1)
plot(x, lam2_lower(x))

m_upper_resid <- function(g){
  require(igraph)
  d  <- diameter(g)
  deg <- max(degree(g))
  L2 <- lambda_2(g)
  n  <- length(V(g))
  fit <- m_upper(n = n, lam2 = L2, degmax = deg)
  resid <- (1 / fit) * (fit - d)
  stopifnot(resid >= 0)
  resid
  }

m_lower_resid <- function(g){
  require(igraph)
  d <- diameter(g)
  L2 <- lambda_2(g)
  n <- length(V(g))
  fit <- m_lower(n = n, lam2 = L2)
  resid <- (1 / d) * (d - fit)
  stopifnot(resid >= 0)
  resid
  }

resid.df <- data.frame(
  lower = sapply(g8, m_lower_resid),
  upper = sapply(g8, m_upper_resid)
  )

p <- ggplot(resid.df, aes(x = lower, y = upper)) + geom_point(
  ) + xlim(0, 1) + ylim(0, 1) 
p # plot of residuals for upper and lower bound
# Would prefer to be able to indicate what some example graphs look like though.

# extremes of the residuals:
rows <- unique(unlist(lapply(resid.df, function(col){
  c(which.min(col), which.max(col))
  })))
resid.df[rows, ]
#          lower     upper
# 140  0.4757653 0.7777778
# 3466 0.9583333 0.0000000
# 551  0.5493984 0.8000000

graph1 <- arrangeGrob(
  igraph_ggplot(g8[[rows[1]]], add.vertices = FALSE, add.labels = FALSE
  ) + transparent_theme)
graph2 <- arrangeGrob(
  igraph_ggplot(g8[[rows[2]]], add.vertices = FALSE, add.labels = FALSE
  ) + transparent_theme)
empty <- rectGrob(gp = gpar(col = NULL))

# split the graphing region into a 4 * 5 grid
# and put the igraphs in grids that are pretty close to their scatterpoint
overlay.matrix <- matrix(
  rep(1, 4 * 5),
  nrow = 4, ncol = 5, byrow = TRUE
  )
overlay.matrix[1, 3] <- 2
overlay.matrix[4, 5] <- 3
overlay <- arrangeGrob(
  grobs = list(empty, graph1, graph2),
  layout_matrix = overlay.matrix
  )
g <- ggplotGrob(p)
g
# TableGrob (6 x 5) "layout": 8 grobs
#   z     cells       name                                grob
# 1 0 (1-6,1-5) background      rect[plot.background.rect.808]
# 2 3 (3-3,3-3)     axis-l absoluteGrob[GRID.absoluteGrob.800]
# 3 1 (4-4,3-3)     spacer                      zeroGrob[NULL]
# 4 2 (3-3,4-4)      panel               gTree[GRID.gTree.788]
# 5 4 (4-4,4-4)     axis-b absoluteGrob[GRID.absoluteGrob.794]
# 6 5 (5-5,4-4)       xlab         text[axis.title.x.text.802]
# 7 6 (3-3,2-2)       ylab         text[axis.title.y.text.804]
# 8 7 (2-2,4-4)      title           text[plot.title.text.806]
# note that 'panel' lies in grid points 
#   3-3 (t-b) and 4-4 (l-r)
# this is what we want to 'overlay' overlay onto:
g1 <- gtable_add_grob(
  g,
  grobs = overlay,
  t = 3, l = 4, b = 3, r = 4
  )
grid.newpage()
grid.draw(g1)

No comments:

Post a Comment