Exercise 1 - Customising simple plots
Plotting a baby’s growth rate.
read.delim("weight_chart.txt") -> weight.chart
weight.chart
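read.delim() is read.table() with header=TRUE and sep="\t" as its defaults, so it expects a tab-delimited file whose first row holds the column names. As a minimal sketch, the block below writes a small table of made-up values (not the real contents of weight_chart.txt) to a temporary file and reads it back the same way.

```r
# Made-up example values, NOT the real contents of weight_chart.txt
example.data <- data.frame(Age = c(0, 1, 2), Weight = c(3.5, 4.4, 5.2))

# Write a tab-delimited file with a header row to a temporary location
example.file <- tempfile(fileext = ".txt")
write.table(example.data, example.file, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim() defaults to header = TRUE and sep = "\t"
read.delim(example.file) -> example.chart
str(example.chart)
```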
We’re going to use a normal scatterplot to plot Weight against Age. We’ll add a bunch of customisation by modifying the parameters we pass.
plot(
weight.chart$Age,
weight.chart$Weight,
type="b",
pch=15,
cex=1.5,
lwd=2,
ylim=c(2,10),
xlab="Age (months)",
ylab="Weight (kg)",
main="Weight gain during early infant development"
)
Now we can read in the feature counts data to plot a barplot of the counts for different types of features.
read.delim("feature_counts.txt") -> feature.counts
feature.counts
We’re going to draw a customised barplot from this data.
# Widen the left margin (bottom, left, top, right, in lines of text) so the feature names fit
par(mar=c(5,12,4,2))
barplot(
feature.counts$Count,
horiz = TRUE,
xlab="Number of feature instances",
names.arg=feature.counts$Feature,
main="Number of different feature types found",
las=1
)
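Margin changes made with par() stay in force for every later plot on the same device. par() returns the previous values of whatever it changes, so one common pattern is to keep them and restore them afterwards. This is a sketch using made-up counts and hypothetical feature names rather than the real feature_counts.txt data.

```r
# par() returns the old values of any settings it changes,
# so we can keep them and put them back afterwards
old.par <- par(mar = c(5, 12, 4, 2))  # wide left margin for long labels

barplot(
  c(10, 25, 5),                                          # made-up counts
  names.arg = c("feature A", "feature B", "feature C"),  # hypothetical names
  horiz = TRUE,
  las = 1
)

# Restore the original margins so later plots are unaffected
par(old.par)
```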
Finally we want to make a bimodal random dataset (two overlapping normal distributions) which we can plot out as a histogram.
hist.data <- c(rnorm(10000),rnorm(10000)+4)
hist(hist.data,breaks=60,main="Bimodal data")
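Note that breaks=60 is only a suggestion: hist() rounds the break points to "pretty" values, so the number of bins actually drawn can differ. Passing plot=FALSE returns the binning without drawing anything, which makes it easy to check. The set.seed() call is an addition here, used only so the random numbers are reproducible.

```r
set.seed(42)  # fix the random numbers so the result is reproducible

# Two normal distributions, one centred on 0 and one shifted to 4
hist.data <- c(rnorm(10000), rnorm(10000) + 4)

# plot = FALSE returns the binning instead of drawing it
hist.bins <- hist(hist.data, breaks = 60, plot = FALSE)

length(hist.bins$counts)  # number of bins actually used
sum(hist.bins$counts)     # every value lands in exactly one bin
```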
Exercise 2 - Using colour
We want to plot out the male/female count data as a barplot and change the colour in different ways.
read.delim("male_female_counts.txt") -> male.female
male.female
Now we can plot this. The barplot just needs the count data, and we’ll also need to add the sample names as labels. We need to make the labels small so they all fit (we could instead have drawn the bars horizontally, as in the feature counts plot).
barplot(
male.female$Count,
names.arg = male.female$Sample,
cex.names = 0.5
)
To give each bar a different colour (which isn’t generally a great idea!) we can use the rainbow function to generate a set of colours. We simply need to tell the function how many colours to generate, which in this case is the number of rows in the data frame.
rainbow(nrow(male.female))
[1] "#FF0000FF" "#FF9900FF" "#CCFF00FF" "#33FF00FF" "#00FF66FF" "#00FFFFFF" "#0066FFFF"
[8] "#3300FFFF" "#CC00FFFF" "#FF0099FF"
We can then pass this vector as the col parameter when drawing the barplot.
barplot(
male.female$Count,
names.arg = male.female$Sample,
cex.names = 0.5,
col=rainbow(nrow(male.female))
)
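Because the number of colours comes from nrow(), the colour vector always has exactly one entry per bar, however many rows the data has. A small sketch, with 10 standing in for nrow(male.female) (the rainbow output above has 10 colours):

```r
# rainbow(n) returns n colours spread evenly around the colour wheel,
# so asking for nrow() colours gives exactly one colour per bar
n.bars <- 10                      # stands in for nrow(male.female)
bar.colours <- rainbow(n.bars)

length(bar.colours)               # one colour per bar
bar.colours[1]                    # the sequence starts at pure red
```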
Now we want to make the males and females different colours. We can’t just pass the Sample column as colour since all of the sample names are different (even though they contain Male and Female). In this specific case, because males and females alternate we can just pass a fixed 2 colour vector.
barplot(
male.female$Count,
names.arg = male.female$Sample,
cex.names = 0.5,
col=c("blue2","red2")
)
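The two-colour vector works because R recycles a short col vector along the bars. rep_len() reproduces that recycling explicitly, which shows why the trick depends on the samples strictly alternating Male, Female, Male, and so on.

```r
# A short col vector is recycled along the bars, so two colours
# alternate: bar 1 gets "blue2", bar 2 "red2", bar 3 "blue2", and so on
bar.colours <- rep_len(c("blue2", "red2"), 10)
bar.colours
```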
If we’d wanted a more generic solution then we’d need to use some of the text manipulation techniques shown in the advanced R course. Specifically, we can remove the D1 etc. from the start of each string to leave us with a Male/Female split, which we can turn into a factor and use as a colour directly.
substring(male.female$Sample,4)
[1] "Male" "Female" "Male" "Female" "Male" "Female" "Male" "Female" "Male" "Female"
barplot(
male.female$Count,
names.arg = male.female$Sample,
cex.names = 0.5,
col=as.factor(substring(male.female$Sample,4))
)
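When a factor is passed as col, its integer codes pick colours from the current palette, and as.factor() sorts the levels alphabetically. The sketch below uses hypothetical sample names in a "D1 Male" style (the real names are not shown here) to make that mapping visible, and shows how factor() with an explicit levels argument fixes the order instead.

```r
# Hypothetical sample names; substring(x, 4) drops the first three
# characters ("D1 ") and keeps the rest
samples <- c("D1 Male", "D1 Female", "D2 Male", "D2 Female")
sex <- substring(samples, 4)

# as.factor() sorts levels alphabetically: Female = 1, Male = 2,
# so Female takes palette colour 1 and Male palette colour 2
sex.factor <- as.factor(sex)
levels(sex.factor)
as.integer(sex.factor)

# Giving the levels explicitly controls which group gets which colour
sex.factor2 <- factor(sex, levels = c("Male", "Female"))
as.integer(sex.factor2)
```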
We next want to use this type of categorical colour definition to highlight specific points in a scatterplot. We can start by reading in the data.
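# stringsAsFactors=TRUE keeps State as a factor (the default became FALSE in R 4.0.0)
read.delim("up_down_expression.txt", stringsAsFactors=TRUE) -> up.down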
up.down
You can see that the data to plot are the Condition1 and Condition2 columns, and the categories are in the State column.
We can start by doing a simple uncoloured plot.
plot(
up.down$Condition1,
up.down$Condition2,
pch=19
)
Now we can colour by the State column.
plot(
up.down$Condition1,
up.down$Condition2,
pch=19,
col=up.down$State
)
We can see that 3 different groups have been coloured, but we need to understand how the colours were chosen. We can figure this out by comparing the levels of the State column with the colour vector in the current system palette.
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
We can see that there is a direct mapping between the two vectors where down=black (position 1), unchanging=red and up=green. If we want to use our own colours then we need to set the palette to our colour choice in the order down,unchanging,up. In this case I’ve also added a legend so we can see which colour is which.
palette(c("blue2","grey","red2"))
plot(
up.down$Condition1,
up.down$Condition2,
pch=19,
col=up.down$State
)
legend("topleft",levels(up.down$State),fill=palette()[1:3])
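Changing the global palette affects every later plot in the session (palette("default") puts it back). An alternative, sketched here with hypothetical State values, is a named colour vector: indexing it by the category names gives one colour per point without relying on level order or the global palette. With the real data this would be col=state.colours[as.character(up.down$State)], and the legend could use names(state.colours) with fill=state.colours.

```r
# Hypothetical State values standing in for up.down$State
state <- c("up", "down", "unchanging", "up")

# A named vector maps each category straight to a colour, so the result
# does not depend on level order or on the global palette
state.colours <- c(down = "blue2", unchanging = "grey", up = "red2")
point.colours <- unname(state.colours[state])
point.colours
```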
Finally we are going to look at the use of quantitative colour. We’re going to use this to plot 3 variables on a single plot.
We can load the data first.
read.delim("expression_methylation.txt") -> expr.meth
expr.meth
We want to plot the promoter methylation against the gene methylation and colour by the expression. We can start by doing the uncoloured plot.
plot(
expr.meth$promoter.meth,
expr.meth$gene.meth,
pch=19
)
For the colouring we’re going to use a new function which has been provided to us. We first need to define it by running the code below in our script.
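map.colours <- function (values,palette) {
# Scale every value to a proportion from 0 (minimum) to 1 (maximum)
range <- range(values)
proportion <- ((values-range[1])/(range[2]-range[1]))
# Turn each proportion into a position in the colour vector (1 to length(palette))
index <- round ((length(palette)-1)*proportion)+1
return (palette[index])
}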
This function will generate a vector of colours to match a quantitative vector we provide (the expression values in this case). We can then pass this as the colour vector to the plot and thereby colour the whole thing.
We need two pieces of data to run this - a vector of values (the expression column from our data frame), and a vector of colours in some kind of order from which to select.
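To see what map.colours does, here is a worked example on a few made-up values with a five-colour vector. The function is repeated so the block runs on its own. The minimum maps to the first colour, the maximum to the last, and everything else is placed proportionally and rounded to the nearest colour.

```r
map.colours <- function (values, palette) {
  range <- range(values)
  proportion <- ((values - range[1]) / (range[2] - range[1]))
  index <- round((length(palette) - 1) * proportion) + 1
  return (palette[index])
}

# Made-up values: 10 is the minimum and 20 the maximum
values <- c(10, 15, 20, 12.5)
five.colours <- c("white", "yellow", "orange", "red", "darkred")

# Proportions 0, 0.5, 1, 0.25 give indices 1, 3, 5, 2
map.colours(values, five.colours)
```

As written, the function returns NA colours if the values contain any NA (range() then returns NA) or if all values are identical (the division becomes 0/0).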
To make our reference colour vector we’re going to use colorRampPalette. This is a built-in function for generating colour series.
colorRampPalette needs a vector of reference colours and it will interpolate between these. In our plot we’re just going to run between grey and red.
The initial call to colorRampPalette returns us a function which can generate sets of colours.
colorRampPalette(c("grey","red2"))
function (n)
{
x <- ramp(seq.int(0, 1, length.out = n))
if (ncol(x) == 4L)
rgb(x[, 1L], x[, 2L], x[, 3L], x[, 4L], maxColorValue = 255)
else rgb(x[, 1L], x[, 2L], x[, 3L], maxColorValue = 255)
}
<bytecode: 0x00000000088e4a08>
<environment: 0x0000000008da55b0>
To generate the colour vector we need to then call this function, passing the number of colours to generate. We’ll make 100 colours this time.
colorRampPalette(c("grey","red2"))(100)
[1] "#BEBEBE" "#BEBCBC" "#BEBABA" "#BFB8B8" "#BFB6B6" "#C0B4B4" "#C0B2B2" "#C1B0B0" "#C1AEAE"
[10] "#C2ACAC" "#C2AAAA" "#C3A8A8" "#C3A6A6" "#C4A5A5" "#C4A3A3" "#C5A1A1" "#C59F9F" "#C69D9D"
[19] "#C69B9B" "#C79999" "#C79797" "#C89595" "#C89393" "#C99191" "#C98F8F" "#CA8E8E" "#CA8C8C"
[28] "#CB8A8A" "#CB8888" "#CC8686" "#CC8484" "#CD8282" "#CD8080" "#CE7E7E" "#CE7C7C" "#CE7A7A"
[37] "#CF7878" "#CF7676" "#D07575" "#D07373" "#D17171" "#D16F6F" "#D26D6D" "#D26B6B" "#D36969"
[46] "#D36767" "#D46565" "#D46363" "#D56161" "#D55F5F" "#D65E5E" "#D65C5C" "#D75A5A" "#D75858"
[55] "#D85656" "#D85454" "#D95252" "#D95050" "#DA4E4E" "#DA4C4C" "#DB4A4A" "#DB4848" "#DC4747"
[64] "#DC4545" "#DD4343" "#DD4141" "#DE3F3F" "#DE3D3D" "#DE3B3B" "#DF3939" "#DF3737" "#E03535"
[73] "#E03333" "#E13131" "#E12F2F" "#E22E2E" "#E22C2C" "#E32A2A" "#E32828" "#E42626" "#E42424"
[82] "#E52222" "#E52020" "#E61E1E" "#E61C1C" "#E71A1A" "#E71818" "#E81717" "#E81515" "#E91313"
[91] "#E91111" "#EA0F0F" "#EA0D0D" "#EB0B0B" "#EB0909" "#EC0707" "#EC0505" "#ED0303" "#ED0101"
[100] "#EE0000"
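Since colorRampPalette returns a function, one option is to store that function once and call it with however many colours are needed. The ends of the ramp are always the reference colours themselves, which matches the first and last entries of the 100-colour output above (grey is #BEBEBE and red2 is #EE0000).

```r
# colorRampPalette() returns a function; store it and call it with n
grey.to.red <- colorRampPalette(c("grey", "red2"))

five.steps <- grey.to.red(5)
five.steps[1]   # "#BEBEBE", which is grey
five.steps[5]   # "#EE0000", which is red2
```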
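We’ve now got everything we need to call the map.colours function, passing it the expression column and the 100-colour ramp. This will generate a (large!) colour vector with one colour for every row of the data frame.
map.colours(expr.meth$expression, colorRampPalette(c("grey","red2"))(100)) -> custom.colours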
head(custom.colours)
[1] "#D85656" "#DD4141" "#DE3D3D" "#D85454" "#DA4C4C" "#DF3737"
Finally, we can redraw our plot using these custom colours to colour each point.
plot(
expr.meth$promoter.meth,
expr.meth$gene.meth,
pch=19,
col=custom.colours
)
Exercise 3 - Using overlays
We want to draw a line graph containing 3 lines for 3 different datasets. We’re going to do this using a base line plot for one of the datasets and then two overlays to add the other two.
read.delim("chromosome_position_data.txt") -> chr.pos
chr.pos
The 3 datasets we’re going to plot are WT, Mut1 and Mut2, and we’re going to plot them on the y-axis against the position on the x-axis.
We need to prepare a few things to make this work.
Firstly we need to find the largest value in any of the WT, Mut1 or Mut2 columns so that we know we’ve got enough space on the y-axis to fit everything in.
max(c(chr.pos$WT,chr.pos$Mut1,chr.pos$Mut2)) -> max.value
max.value
[1] 68.15
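max() and range() also accept a data frame of numeric columns, so selecting the three data columns by name is another way to get the same limits. This sketch uses made-up values standing in for chr.pos; if the real columns could contain NA, adding na.rm=TRUE would be needed.

```r
# Made-up values standing in for the WT, Mut1 and Mut2 columns
example.pos <- data.frame(
  Position = c(100, 200, 300),
  WT   = c(10.5, 20.0, 15.2),
  Mut1 = c(12.0, 35.5, 18.1),
  Mut2 = c(8.4, 22.3, 40.7)
)

# Largest value in any of the three data columns
max(example.pos[, c("WT", "Mut1", "Mut2")])

# range() returns the minimum and maximum together
range(example.pos[, c("WT", "Mut1", "Mut2")])
```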
We’re also going to use the RColorBrewer package to provide the colours. We’ll take the first 3 colours from its Set1 palette.
library(RColorBrewer)
brewer.pal(3,"Set1") -> line.graph.colours
line.graph.colours
[1] "#E41A1C" "#377EB8" "#4DAF4A"
Now we can start to build the plot. We’ll start with the base plot for the WT data, then add lines() layers for the other two. Finally we’ll add a legend so we know what’s what.
plot(
chr.pos$Position,
chr.pos$WT,
type="l",
lwd=2,
col=line.graph.colours[1],
ylim=c(0,max.value),
las=1,
xlab="Chromosomal Position",
ylab="Value",
main="Values along a chromosome"
)
lines(chr.pos$Position,chr.pos$Mut1,lwd=2,col=line.graph.colours[2])
lines(chr.pos$Position,chr.pos$Mut2,lwd=2,col=line.graph.colours[3])
legend("topleft",c("WT","Mut1","Mut2"),fill=line.graph.colours)
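A swapped-in alternative to building the plot up layer by layer is matplot(), which draws one line per column of a matrix in a single call and sets the y-axis range from all columns, so no separate maximum is needed. Giving legend() lty and lwd instead of fill draws short line segments in the key rather than filled boxes. The data here are made-up values standing in for chr.pos.

```r
# Made-up positions and values standing in for chr.pos
position <- c(100, 200, 300, 400)
values <- cbind(
  WT   = c(10, 20, 15, 12),
  Mut1 = c(12, 30, 18, 16),
  Mut2 = c(8, 22, 35, 20)
)
line.colours <- c("#E41A1C", "#377EB8", "#4DAF4A")  # the Set1 colours above

# matplot() draws one line per column of the matrix
matplot(
  position, values,
  type = "l", lty = 1, lwd = 2, col = line.colours,
  xlab = "Chromosomal Position", ylab = "Value"
)

# lty/lwd make the legend show line segments instead of filled boxes
legend("topleft", colnames(values), col = line.colours, lty = 1, lwd = 2)
```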
For the brain/bodyweight data we want to draw a scatterplot of the two weights, then add error bars (using the arrows function) and a text label for each species.
read.delim("brain_bodyweight.txt") -> brain.body
brain.body
We can do a simple plot of the two weights to check that looks OK.
plot(
brain.body$Brainweight,
brain.body$Bodyweight,
pch=19
)
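For the error bars we’re going to use the arrows function. The start and end coordinates of each bar will be the weight plus and minus its SEM; setting angle=90 turns the arrowheads into flat caps and code=3 draws a cap at both ends. We’ll also use the brain/bodyweights to position the text labels, with pos=1 placing each label just below its point.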
plot(
brain.body$Brainweight,
brain.body$Bodyweight,
pch=19,
xlab="Brainweight",
ylab="Bodyweight"
)
# Brain SEM
arrows(
x0 = brain.body$Brainweight - brain.body$Brainweight.SEM,
y0 = brain.body$Bodyweight,
x1 = brain.body$Brainweight + brain.body$Brainweight.SEM,
y1 = brain.body$Bodyweight,
angle=90,
code = 3,
length=0.05
)
# Body SEM
arrows(
x0 = brain.body$Brainweight,
y0 = brain.body$Bodyweight - brain.body$Bodyweight.SEM,
x1 = brain.body$Brainweight,
y1 = brain.body$Bodyweight + brain.body$Bodyweight.SEM,
angle=90,
code = 3,
length=0.05
)
# Names
text(
brain.body$Brainweight,
brain.body$Bodyweight,
brain.body$Species,
pos = 1,
cex=0.5
)
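plot() chooses its axis ranges from the points alone, so an error bar on one of the outermost points can run past the edge of the plotting region and be clipped. One way round this, sketched with made-up values standing in for the brain.body columns, is to compute xlim and ylim that include the ends of every bar.

```r
# Made-up values standing in for the brain.body columns
brain <- c(1.0, 2.0, 3.0)
brain.sem <- c(0.5, 0.2, 0.4)
body <- c(10, 25, 18)
body.sem <- c(2, 3, 1)

# Axis limits that cover the ends of every error bar
x.limits <- range(c(brain - brain.sem, brain + brain.sem))
y.limits <- range(c(body - body.sem, body + body.sem))

plot(brain, body, pch = 19, xlim = x.limits, ylim = y.limits)

# Horizontal and vertical error bars, as in the exercise
arrows(brain - brain.sem, body, brain + brain.sem, body, angle = 90, code = 3, length = 0.05)
arrows(brain, body - body.sem, brain, body + body.sem, angle = 90, code = 3, length = 0.05)
```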
