Babraham Bioinformatics - Training Courses

Training Courses

As part of its work with the Babraham Institute, the Bioinformatics group runs a regular series of training courses on many aspects of bioinformatics.

These courses run regularly on the Babraham site, but we can also present them at other sites or deliver them remotely. You can see the list of currently available Babraham dates, and you can contact us to discuss options for running courses on your site.

You can also sign up to our mailing list to get the latest training news delivered direct to your inbox every couple of months.

Where possible we also aim to make the material from our courses publicly available so that anyone who wants to can download it for their own use.

Below is a list of the courses we currently run. Where they are available there is a link to the training manual and course exercises.

Introduction to Shiny
Using R Notebooks
Writing R packages
Using git and GitHub with RStudio

Python Courses
Perl Courses
- Learning to Program with Perl
Unix Courses
- An Introduction to Unix
Machine learning
- An Introduction to Machine Learning

Statistics

Modular Statistics courses using R

Statistical Analysis using R One Day
Statistical Analysis using R Bootcamp
Modules

Power analysis: Sample size estimation
Descriptive statistics and Data exploration
Analysis of quantitative data - Introduction
Analysis of quantitative data - Student's t-test
Analysis of quantitative data - One-way and Two-way ANOVA
Analysis of quantitative data - Linear Regression
Introduction to Linear modelling
Analysis of quantitative data - Non parametric statistics
Analysis of qualitative data

Modular statistics courses using GraphPad Prism

Statistical Analysis using GraphPad Prism One day
Statistical Analysis using GraphPad Prism Bootcamp
Modules

Power analysis: Sample size estimation
Descriptive statistics and Data exploration
Analysis of quantitative data - Introduction
Analysis of quantitative data - Student's t-test
Analysis of quantitative data - One-way and Two-way ANOVA
Analysis of quantitative data - Linear Regression
Introduction to Linear modelling
Analysis of quantitative data - Non parametric statistics
Analysis of qualitative data

Application focussed courses

Data Resources

An Introduction to Biological Big Data

Next Generation Sequencing

Quality control in Sequencing Experiments
Analysing Mapped Sequence Data with SeqMonk
RNA-Seq Analysis
10X Single Cell RNA-Seq Analysis
ChIP-Seq Analysis
Analysing bisulfite methylation sequence data

Proteomics

An Introduction to Proteomics

Modelling

An Introduction to Mathematical Modelling

Interpretation and Presentation

Extracting biological information from gene lists
Scientific Figure Design
Research Integrity: How To Be A Good Scientist
An Introduction to Using OneNote as a Laboratory Notebook

Comprehensive longer Bootcamp courses

Introduction to NGS Analysis for Biologists bootcamp
Introduction to Linux bootcamp
Introduction to R for Biologists bootcamp
Statistics bootcamp using R
Statistics bootcamp using GraphPad Prism

Analysing Mapped Sequence Data with SeqMonk (Half day)

SeqMonk is a program which can analyse large data sets of mapped genomic positions. It is most commonly used to work with data coming from high-throughput sequencing pipelines.

The program allows you to view your reads against an annotated genome and to quantitate and filter your data to let you identify regions of interest. It is a friendly way to explore and analyse very large datasets.

This course provides an introduction to the main features of SeqMonk and will run through the analysis of a couple of different datasets to show what sort of analysis options it provides.

Course content

What is SeqMonk
Installing and configuring the program
Creating a project and importing data
Using the chromosome viewer
Quantitating and Filtering Data
Creating Reports
Exporting text and graphics

Course Material:

Course Data (zip) [2.6GB]

Statistical Analysis using R (One day)

Statistics are an important part of most modern studies and being able to effectively use a statistics package can help you to understand your results. This course provides an introduction to statistics illustrated through the use of the R language.

Course Content:

Introduction to Power Analysis
Qualitative and Quantitative Data Exploration
Graphical representations
Chi-square, Fisher's exact test, T-Test, ANOVA and correlation
Choosing an appropriate analysis
Interpreting analysis output

Course Material:

Statistical Analysis using GraphPad Prism (One day)

GraphPad Prism is a powerful and friendly package which allows you to plot and analyse your data. This course not only acts as an introduction to Prism but also goes through the basic statistical knowledge which should allow you to make the most of your data.

Course Content:

Introduction to GraphPad Prism
Getting to know your data
Graphical representations
Choosing an appropriate analysis
Interpreting analysis output

Course Material:

Statistics bootcamp using R (3 days)

A more in-depth look at statistical analyses using R.

Prerequisite: Introduction to R with Tidyverse (1 day)

Course Content Modules:

Power analysis: Sample size estimation
Descriptive statistics and Data exploration
Analysis of quantitative data - Introduction
Analysis of quantitative data - Student's t-test
Analysis of quantitative data - One-way and Two-way ANOVA
Analysis of quantitative data - Linear Regression
Introduction to Linear modelling
Analysis of quantitative data - Non parametric statistics
Analysis of qualitative data

Course Material modules:

Course Data (zip)

Statistics bootcamp using GraphPad Prism (2.5 days)

A more in-depth look at statistical analyses using GraphPad Prism.

Course Content modules:

Power analysis: Sample size estimation
Descriptive statistics and Data exploration
Analysis of quantitative data - Introduction
Analysis of quantitative data - Student's t-test
Analysis of quantitative data - One-way and Two-way ANOVA
Analysis of quantitative data - Linear and Non Linear relationship
Introduction to Linear modelling
Analysis of quantitative data - Non parametric statistics
Analysis of qualitative data
Survival Analysis

Course Material:

Course Data (zip)

Learning to Program with Perl (6 x 1.5 hour sessions)

For a long time, Perl has been a popular language among those starting out with programming. Although it is a powerful language, many of its features make it especially suited to first time programmers as it reduces the complexity found in many other languages. Perl is also one of the world's most popular languages which means there are a huge number of resources available to anyone setting out to learn it.

This course aims to introduce the basic features of the Perl language. At the end you should have everything you need to write moderately complicated programs, and enough pointers to other resources to get you started on bigger projects. The course tries to provide a grounding in the basic theory you'll need to write programs in any language, as well as an appreciation for the right way to do things in Perl.

Course Content:

Getting Started with Perl
Conditions, Arrays, Hashes and Loops
File Handling
Regular Expressions
Subroutines, References and Complex Data Structures
Perl Modules
Interacting with External Programs
Cross Platform Issues and Compiling

Course Material:

Introduction to Python (2 day, or 4 half-day bootcamp)

Python has established itself as one of the most commonly used programming languages. It is a very powerful language, which makes it relatively easy to write programs ranging from simple automation scripts to more fully featured applications. In bioinformatics, Python has become widely used both as a language for writing scripts and applications and, via packages like pandas, numpy and seaborn, as an environment for data analysis, competing with more focussed languages such as R. In this course we focus on the use of Python to develop simple scripts and larger applications. These can be used for simple data processing and aggregation, for automating repeated tasks or to write larger user-facing command line programs. We start from the ground up, and make no assumption of any previous programming experience.

Course Content:

Setting up your Python environment
Variables and Data Types
Functions and Methods
Python data structures
Iterators, Loops and Conditional Statements
Text Processing
Reading and Writing Files
Writing Functions and Larger Scripts
Using external resources

Course Material:

Advanced Python (2 day bootcamp)

In recent years, the programming language Python has become ever more popular in the bioinformatics and computational biology communities and indeed, learning this language marks many people's first introduction to writing code. The success of Python is due to a number of factors. Perhaps most importantly for a beginner, Python is relatively easy to use, being what we term a "high-level" programming language. Don't let this terminology confuse you, however: "high-level" simply means that many of the computational tasks are managed for you, enabling you to write shorter and simpler code to get your jobs done.

This course builds on the basic features of Python 3 introduced in the Introduction to Python course. At the end of this course you should be able to write moderately complicated programs, and be aware of additional resources and wider capabilities of the language to undertake more substantial projects. The course tries to provide a grounding in the basic theory you'll need to write programs in any language as well as an appreciation of the right way to do things in Python.

Course Content:

More code structuring with Iterators
Write more elegant code with Python Comprehensions
Python Generators create data
Python scoping and exception handling
Using modules
Analysing text with Regular Expressions
Introduction to Object Oriented Programming
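
As a flavour of the topics listed above, here is a minimal sketch (not taken from the course material) that combines a list comprehension, a generator and exception handling. The function names and the example reads are hypothetical and chosen purely for illustration.

```python
from typing import Iterator, List


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        # Raise an exception rather than dividing by zero
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def long_sequences(sequences: List[str], min_length: int) -> Iterator[str]:
    """Generator: lazily yield only the sequences of at least min_length."""
    for seq in sequences:
        if len(seq) >= min_length:
            yield seq


# Hypothetical example data
reads = ["ACGT", "GGCC", "AT", "ACGTACGT"]

# List comprehension: build a new list in a single expression
gc_values = [gc_fraction(read) for read in reads]

# Consume the generator into a list
long_reads = list(long_sequences(reads, 4))

# Exception handling: catch the error raised for an empty sequence
try:
    gc_fraction("")
    empty_ok = True
except ValueError:
    empty_ok = False
```

The generator produces values only as they are requested, which can matter when the input is too large to hold in memory at once.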

Course Material:

Understanding Object Oriented Python (One day)

A strength of Python, and a feature that makes the language attractive to so many, is that it supports what is known as object-oriented programming (OOP).

This is a short course that introduces the basic concepts of OOP. It then goes into more detail explaining how to build and manipulate objects. While this course does not provide an exhaustive discussion of OOP in Python, by the end of the course attendees should be able to build sophisticated objects to aid analysis and research.

Course Content:

Introducing Object Oriented Programming
Creating objects and classes
Structuring objects
Using Inheritance to write succinct code
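
To illustrate the ideas listed above, the following is a small sketch (not taken from the course material) of a class and a subclass that inherits from it. The `Feature` and `Gene` classes and their attributes are hypothetical names invented for this example.

```python
class Feature:
    """A hypothetical genomic feature with a name and inclusive coordinates."""

    def __init__(self, name: str, start: int, end: int) -> None:
        if end < start:
            raise ValueError("end must not be before start")
        self.name = name
        self.start = start
        self.end = end

    def length(self) -> int:
        # Coordinates are treated as inclusive, hence the + 1
        return self.end - self.start + 1

    def describe(self) -> str:
        return f"{self.name} ({self.length()} bp)"


class Gene(Feature):
    """A Feature with a strand; length() is inherited unchanged."""

    def __init__(self, name: str, start: int, end: int, strand: str) -> None:
        super().__init__(name, start, end)
        self.strand = strand

    def describe(self) -> str:
        # Extend the parent's description instead of rewriting it
        return f"{super().describe()} on the {self.strand} strand"


gene = Gene("example_gene", 100, 199, "+")
description = gene.describe()
```

Because `Gene` inherits from `Feature`, it only has to define what is different, which is the sense in which inheritance helps to keep code succinct.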

Course Material:

Introduction to R with Tidyverse (One day)

R is a popular language and environment that allows powerful and fast manipulation of data, offering many statistical and graphical options. This course aims to introduce R as a tool for statistics and graphics, with the main aim being to become comfortable with the R environment. As well as introducing core R language concepts, this course also provides the basics of using the Tidyverse for data manipulation, and ggplot for plotting. It will focus on entering and manipulating data in R and producing simple graphs. A few functions for basic statistics will be briefly introduced, but statistical functions will not be covered in detail.

Course Content:

What is R
Getting familiar with the R console
Entering Data
Manipulating data
Importing data files
Creating Graphs (scatterplots, line graphs, histograms and density plots)

Course Material:

Post-Course Material:

Introduction to Core R (Half a day)

R is a popular language and environment that allows powerful and fast manipulation of data, offering many statistical and graphical options. This course aims to introduce R as a tool for statistics and graphics, with the main aim being to become comfortable with the R environment. It will focus on entering and manipulating data in R and producing simple graphs. A few functions for basic statistics will be briefly introduced, but statistical functions will not be covered in detail.

Course Content:

What is R
Getting familiar with the R console
Entering Data
Manipulating data
Importing data files
Creating Graphs (boxplots, barplots, scatterplots, line graphs)

Course Material:

Advanced Core R (Half a day)

This course follows on from the introductory course. It goes into more detail on practical guides to filtering and combining complex data sets. It also looks at other core R concepts such as looping with apply statements and using packages. Finally, it looks at how to document your R analyses and generate complete analysis reports.

Course Content:

Filtering and selection review
Text manipulation
Merging large datasets
Looping
Using and writing functions
R packages
Documenting your analysis

Course Material:

Plotting complex figures with Core R (Half a day)

This course is a comprehensive guide to the use of the built-in R plotting functionality to construct everything from customised simple plots to complex multi-layered figures. It follows on from the material in our introductory R course and participants are expected to have a basic understanding of R - enough to load and do basic manipulation of datasets.

Course Content:

The R painters model
Core graph types and options
Plot area customisation
Using colour in plots
Adding plot overlays
Useful extension packages
Writing plots to files

Course Material:

Advanced R with Tidyverse (One day)

The 'Tidyverse' is a set of add-in R packages for data loading, modelling, manipulation and plotting. It is an attempt to make data analysis and plotting cleaner, simpler and more consistent by addressing some poor design decisions in the original language.

This course follows on from our Introduction to R with Tidyverse course and focusses on the manipulation and restructuring of data using the Tidyverse packages. The course shows how to do complex transformations on large data structures and how to deal efficiently with data which is both large and sometimes not well behaved.

Course Content:

Reading in data and dealing with problems
Advanced filtering and selections
Restructuring data into 'tidy' format
Mutating, grouping and summarising data
Merging datasets together
Using custom functions

Course Material:

Using R Notebooks (Half day)

This course is designed for people who are already familiar with R and are ready for a more integrated way to perform and report their analyses. It will show the use of R Notebooks for interactive analysis and then demonstrate how to apply this to the production of complete reports.

Course Content:

The structure of R Notebooks
Using Markdown to format text
Controlling and customising R code blocks
Customising the appearance of your document
Automated notebook compilation

Course Material:

R Notebook Data (zip) [1.8MB]

Plotting figures with ggplot (One day)

This course is normally taught as part of the R with Tidyverse bootcamp. Ggplot is the most popular plotting extension to R and replicates many of the graph types found in the core plotting libraries. This course provides an introduction to the ggplot2 libraries and gives a practical guide for how to use these to create different types of graphs.

Course Content:

How ggplot2 works
Plotting different graph types
Changing annotation, scaling and colours
Adding statistical summaries and other overlays
Faceting and highlighting
Saving plots

Course Material:

Writing R Packages (One day)

R packages are the best way to create robust re-usable code, either for internal use or for sharing with the wider community. In this course we will look at how to write functions which are robust for use by others. We will then go through the process of authoring function based R packages with the help of the recommended development tools.

Course Content:

Developing robust functions
Setting up git based package sources
Adapting function code for a package
Writing help files and vignettes
Writing a test suite
Installing the finished package

Course Material:

Course data (zip)

Introduction to Shiny (One day)

Shiny is an R package that enables interactive web applications to be built using R. These applications are a great way of allowing users to explore a dataset and make use of the graphical and statistical functionality of R without having to write any code.

Course Content:

This course is a combination of talks and practical exercises. It covers the concepts required to create a functioning Shiny application including:

Layouts
Inputs
Outputs
Reactivity

To write Shiny applications you should be comfortable with using R. It is recommended that students should have completed Introductory and Advanced R courses (core or tidyverse) before attending this course.

Course Material:

Using git and GitHub with RStudio (2 hours)

RStudio has embedded tools to facilitate the use of git with RProjects. This short course explores this functionality.

Course Content:

This course is a combination of talks and practical exercises and covers the following:

Version control theory
Using git with RProjects
Using GitHub as a remote repository

Course Material:

An Introduction to Unix (Half a day)

An increasing amount of bioinformatics work is done in a command line unix environment. Most large scale processing applications are written for unix and most large scale compute environments are also based on it.

This course provides an introduction to the concepts of unix and a practical introduction to working in this environment. Internally we link this course to a more specific course illustrating the use of our internal cluster environment, and this part of the course could be adapted for other sites with different compute infrastructure.

Course Content:

Unix commands
Files and Directories
Viewing, Creating, Copying, Moving and Deleting Files
Pipes and Loops

Course Material:

Course Slides (pdf)
Course Slides (pptx)
Course Exercises (pdf)
Course Exercises (docx)
Unix cheat sheet (pdf) (External content from Fosswire)
Course Data (tar.gz) [5MB]

An Introduction to Machine Learning (One day)

This course provides a theoretical and practical introduction to the use of machine learning on biological datasets. For the final section of the course we will introduce the tidymodels framework for machine learning in R, so it will be helpful to have attended our introductory and advanced R courses, or to have had equivalent experience, although this is not a prerequisite to attend the course.

Course Content:

What is machine learning?
Different types of machine learning model
Evaluating models
Preparing input data
Running simple models in tidymodels
Automation with recipes and workflows

Course Material:

Analysing bisulfite methylation sequencing data (One day)

This course builds on the core skills introduced in the Introduction to R, Introduction to Unix and Introduction to SeqMonk courses to provide a more in depth look at the analysis of bisulfite sequencing data. The course is a mix of theoretical lectures and hands-on practicals which go through the whole analysis pipeline, starting from raw sequence data and covering QC, visualisation, quantitation and differential methylation analysis.

Course Content:

The theoretical basis for BS-Seq
Processing raw sequencing data with Bismark
Visualisation and exploration of methylation calls with SeqMonk
The theory of differential methylation calling
Differential methylation analysis practical

Course Material:

Extracting biological information from gene lists (One day)

Many experimental designs end up producing lists of hits, usually based around genes or transcripts. Sometimes these lists are small enough that they can be examined individually, but often it is useful to do a more structured functional analysis to try to automatically determine any interesting biological themes which turn up in the lists.

This course looks at the various software packages, databases and statistical methods which may be of use in performing such an analysis. As well as being a practical guide to performing these types of analysis the course will also look at the types of artefacts and bias which can lead to false conclusions about functionality and will look at the appropriate ways to both run the analysis and present the results for publication.

Course Content:

Functional databases
Statistical tests for functional enrichment
Common artefacts in functional analysis
Presenting functional analysis in publications
Motif detection tools

Course Material:

ChIP-Seq Analysis (One day)

This course provides a complete introduction to the theory and practice of the analysis of ChIP-Seq data. It is designed for biologists who may have limited practical bioinformatics skills, but who would like to use ChIP-Seq as part of their work. By the end of the course students should be able to process and analyse their own data.

Students on this course would benefit from having attended the SeqMonk or Unix introduction courses, but these are not required in order to attend.

Course Content:

The theory of ChIP-Seq analysis
Processing ChIP-Seq data
Exploring and Visualising ChIP-Seq data
Analysing for peak calling and differential enrichment

Course Material:

RNA-Seq Analysis (One day)

This course provides an introduction to the QC, processing and analysis of RNA-Seq data. It focuses on a workflow where RNA-Seq is performed on a large eukaryotic genome for which there is a reference genome available. The course starts with a comprehensive lecture covering the theory of RNA-Seq data generation and analysis and is then followed by hands-on practical sessions which run through the entire RNA-Seq analysis pipeline from raw fastq files to a list of differentially expressed candidate genes.

Course Content:

The theory of RNA-Seq analysis
Raw data QC
Mapping RNA-Seq data with hisat2
Viewing RNA-Seq data with SeqMonk
Differential expression analysis with DESeq
Reviewing and visualising differential expression hits
Analysing more complex multi-condition studies

Course Material:

10X Single Cell RNA-Seq Analysis (One day)

This course gives a practical introduction to the processing, QC and analysis of a simple single cell RNA-Seq experiment performed on the 10X platform. It explains the technology used to create the data and goes through some common analysis tools. The course also goes through the theory and practice of the dimension reduction techniques which are very often used to present this kind of data.

Course Content:

How 10X scRNA libraries are made
Processing raw data with CellRanger and assessing quality
Dimension reduction theory - PCA and tSNE
Reviewing processed data with the Loupe Browser
R package systems for scRNA analysis
Using Seurat to analyse 10X data

Course Material:

Course Data (zip) [65MB]

An Introduction to Biological Big Data (3 days)

This course provides both a biological and technical introduction to Biological Big Data. It is divided into three day-long sessions where participants learn about the available big data resources, what they mean, and how to use them. There are extensive practicals to give time for people to familiarise themselves with the sites they are shown.

Course Content:

Day 1: Central Dogma Data Resources. A refresher on the main biological concepts surrounding the central dogma, and an introduction to the data resources which allow you to access the current state of knowledge about your genes of interest.
Day 2: Experimental Techniques, Datatypes and Resources. An introduction to the technologies and equipment which allow us to experimentally measure relevant data at scale. We cover both the generation of new data and the repositories of existing public data which can be re-used.
Day 3: Practical Computation for Bioinformatics. Finally we look at the practicalities of processing and analysing the data coming from high throughput experiments. We look at both hardware and software platforms and introduce the main techniques, languages and frameworks which are commonly used for large scale data analysis.

Course Material:

Quality Control in Sequencing Experiments (Half a day)

This course looks at the different ways in which sequencing based studies can fail and the options for visualisation and QC which allow you to identify and diagnose these failures at an early stage. It is designed to be of use to anyone who is using sequencing as part of their research, not just those who are running sequencing facilities.

Course Content:

Why QC is important
How sequencing experiments fail
Implementing sequencing QC
Existing QC software

Course Material:

Course Data (Direct link)
Course Data (zip) [59MB]

An Introduction to Mathematical Modelling (Half a day)

This course was developed in collaboration with the Le Novère lab at The Babraham Institute. The course is not currently running and is not supported, but we are leaving course materials here for reference.

It provides an introduction to the concepts of modelling biological systems. It is intended for biologists who have no experience in modelling but would like to know how it might apply to their area of research. The course provides a complete background to the history of modelling and the different approaches through which a biological system can be approximated by mathematical methods. The course also provides a practical introduction to the COPASI modelling environment.

Course Content:

An introduction to modelling
An overview of chemical kinetics
Mathematical modelling with COPASI

Course Material:

An Introduction to Proteomics (One day)

This course provides an introduction to the methods, data and analysis of quantitative proteomics data. It goes through the background of how the data is acquired and quantitated and the process of searching the spectra against reference databases to identify them at the spectrum, peptide and protein level. We look at quality control of search results to identify problems.

Data analysis is run using the MSstats package, both via the friendly Shiny interface, and then in more detail using R. Whilst there are no strict pre-requisites for this course, a familiarity with R and ggplot would be very helpful.

Course Content:

The theory of proteomics mass spectrometry
Acquiring RAW data files
How database searches work
Analysing data in MSstats

Course Material:

Scientific Figure Design (Whole day)

This course provides a practical guide to producing figures for use in reports and publications. It is a wide ranging course which looks at how to design figures to clearly and fairly represent your data, the practical aspects of graph creation, the allowable manipulation of bitmap images and compositing and editing of final figures.

The course will use a number of different open source software packages and is illustrated with a number of example figures adapted from common analysis tools.

Course Content:

Data Visualisation Theory Lecture
Data Representation Practical
Ethics of Data Representation Lecture
Design Theory Lecture
Inkscape Tutorial
Inkscape Practical

Course Material:

Course Data (zip)

Research Integrity: How To Be A Good Scientist (Half day)

This course provides a practical guide to doing the right thing when it comes to Research Integrity.

The course will provide opportunities for discussion, hands-on illustrations and explorations.

Course Content:

Research Integrity: Meanings and definitions
Research Integrity: The importance of formulating questions
Research Integrity: In Practice
Research Integrity: Data Storage and Management
Research Integrity: Responsibilities
Research Integrity: In the Lab
Research Integrity: The bottom line

Course Material:

An Introduction to Using OneNote as a Laboratory Notebook (half day)

This course provides a practical guide to using Microsoft OneNote as a Laboratory Notebook, with special consideration to practices, policies, expectations and responsibilities at The Babraham Institute.

The course will use OneNote online as a cross-platform application.

Course Content:

Expectations and responsibilities
What is OneNote
Storing notebooks on the ELN
Getting started
Functions and tools in OneNote
Other useful tools
Sharing a OneNote notebook
Using OneNote as a Laboratory Notebook

Course Material:

Introduction to R for Biologists Bootcamp (3.5 days)

This Bootcamp for Biologists requires no previous experience. Over 3 1/2 days you will gain the practical experience to do your own analysis in R.

Course Content:

Introduction to R using Tidyverse
Advanced R using Tidyverse
Introduction to plotting and drawing graphs with ggplot2
Introduction to basic statistical concepts and how to execute them in R
Final Practical

Course Material:

Introduction to R with Tidyverse
Advanced R with Tidyverse
An introduction to ggplot
Statistical Analysis using R
Bootcamp Final Exercises (pdf)
Bootcamp Final Exercises (doc)

Introduction to NGS Analysis for Biologists Bootcamp (3.5 days)

This Bootcamp for Biologists requires no previous experience. Over 3 1/2 days you will gain an introduction to sequencing analysis from the ground up, learning to understand, explore and analyse your data and interpret the results.

Course Content:

Basic Sequencing QC
RNA Seq Analysis
ChIP Seq Analysis
Extracting Biological Information from Gene Lists

Course Material:

Quality control in Sequencing Experiments
RNA-Seq Analysis
ChIP-Seq Analysis
Extracting biological information from gene lists

Introduction to Linux Bootcamp (2.5 days)

This Bootcamp for Biologists requires no previous experience and will provide an understanding of the Linux environment. This 2 1/2 day course shows how to set up a working Linux environment; how to install, configure and manage software and packages within it; and how to run software and create simple automation so that it executes in a more structured and scalable way.

Course Content:

Install a Linux operating system on your machine, either directly or through a virtual machine
Run and customise installed applications using the BASH shell
Perform simple automation, linking programs together and iterating the processing of large numbers of files
Install and configure new software packages
Understand how to use Linux in a variety of environments from personal computers to cloud infrastructure