Bismark
Function | A tool to map bisulfite converted sequence reads and determine cytosine methylation states |
---|---|
Language | Perl |
Requirements | A functional version of Bowtie2 or HISAT2 is required. For BAM output Samtools is also required |
Code Maturity | Stable |
Code Released | Yes, under GNU GPL v3 or later |
Mission Statement | The less people know about how sausages and our code are made, the better they sleep at night (untraceable author) |
Initial Contact | Felix Krueger |
Bismark is a program to map bisulfite-treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away. Its main features are:
- Bisulfite mapping and methylation calling in one single step
- Supports single-end and paired-end read alignments
- Supports ungapped and gapped alignments
- Alignment seed length, number of mismatches etc. are adjustable
- Output discriminates between cytosine methylation in CpG, CHG and CHH context
Bismark is now also available from GitHub. You are invited to leave comments, feature requests or bug reports over there!
This link will take you to the Bismark publication.
This link will take you to our review about primary data analysis in BS-Seq.
The Bismark documentation can be accessed here: Bismark User Guide.
Here are some sample Bismark HTML reports: paired-end BS-Seq processing report, or a single-end PBAT processing report
Or some Bismark HTML summary reports: Bismark Summary Report WGBS, Bismark Summary Report RRBS (no deduplication), or a Bismark Summary for a single-cell experiment which summarises a larger number of samples (Bismark Summary single cells data (.txt))
Here is an overview of the alignment modes that are currently supported by Bismark: Bismark alignment modes (pdf).
Changelog
- 19-11-2019: 0.22.3 released (click here for the Release Notes hosted on Github)
- 16-10-2019: 0.22.2 released (click here for the Release Notes hosted on Github)
- 21-04-2019: 0.22.1 released (click here for the Release Notes hosted on Github)
- 16-04-2019: 0.22.0 released (click here for the Release Notes hosted on Github)
- 14-03-2019: 0.21.0 released (click here for the Release Notes hosted on Github)
- 01-02-2019: 0.20.1 released (click here for the Release Notes hosted on Github)
- 16-08-2018: 0.20.0 released (click here for the Release Notes hosted on Github)
- 27-04-2018: 0.19.1 released (click here for the Release Notes hosted on Github)
- 13-10-2017: 0.19.0 released (click here for the Release Notes hosted on Github)
- 23-05-2017: 0.18.1 released (click here for the Release Notes hosted on Github)
- 15-05-2017: 0.18.0 released (click here for the Release Notes hosted on Github)
- 18-01-2017: 0.17.0 released (click here for the Release Notes hosted on Github)
- 25-07-2016: 0.16.3 released (click here for the Release Notes hosted on Github)
-
- Bismark: Essential fixes (2 in total) to address a bug for Bowtie 2 alignments where reads that should be considered ambiguous were incorrectly assigned to the first alignment thread. These errors had crept in during releases 0.16.0 and 0.16.2. More info available on Github
- Bismark: Added support for large Bowtie (1) index files ending in .ebwtl which had been added in Bowtie v1.1.0
- Changed the Shebang in all scripts of the Bismark suite to #!/usr/bin/env perl instead of #!/usr/bin/perl
- deduplicate_bismark: Does now bail with a useful error message when the input files are empty
- bismark_genome_preparation: Added new option '--genomic_composition' so that the genomic composition can be calculated and written right at the genome preparation stage rather than by using bam2nuc
- bam2nuc: Now also calculates a fold coverage for the various (di-)nucleotides. The changes in the nucleotide_stats text file are also picked up and plotted by bismark2report
- bam2nuc: Added a new option '--genomic_composition_only' to just process the genomic sequence without requiring any data files
- bismark2summary: Added option -o/--basename FILENAME to specify a certain filename. If not specified the name will remain bismark_summary_report.txt/html
- bismark2summary: Added documentation and the options '--help' and '--version' to be consistent with the rest of Bismark
- bismark2summary: Added option '--title STRING' to give the HTML report a different title
- 25-04-2016: 0.16.1 released (click here for the Release Notes hosted on Github)
-
- Bismark: Removed a rogue warn/sleep statement for PE/Bowtie2 mode that had crept in during the last release...
- 20-04-2016: 0.16.0 released (click here for the Release Notes hosted on Github)
-
- Bismark: File endings .fastq | .fq | .fastq.gz | .fq.gz are now removed from the output file (unless they were specified with --basename) in a bid to reduce the length of the already long file names
- Bismark: Enabled the new option --dovetail (which will be turned on by default for --pbat libraries) which will now allow dovetailing reads to be reported
- Bismark: Changed the behaviour of corner cases where several non-directional alignments could have existed for the very same position but to different strands so that now the best alignment trumps the weaker one. As an example: If you relaxed the alignment criteria of a given alignment to allow ~60 mismatches for PE alignment we did find an alignment to the OT strand with a combined AS of -324, but there also was an alignment to the CTOB strand with an AS of 0 (perfect alignment). The CTOB now trumps the OT alignment, and the methylation information is now reported for the bottom strand
- New module: bismark2summary accepts Bismark BAM files as input. It will then try to identify Bismark reports, and optionally deduplication reports or methylation extractor (splitting) reports automatically based on the BAM file basename. It produces a tab delimited overview table (.txt) as well as a graphical HTML report. Examples can be found at Bismark Summary Report and Bismark Summary Report (.txt)
- The new Bismark module bam2nuc calculates the average mono- and di-nucleotide coverage of libraries and compares this to the genomic average composition. bam2nuc can be called straight from within Bismark (option --nucleotide_coverage) or run stand-alone. bam2nuc creates a ...nucleotide_stats.txt file that is also automatically detected by bismark2report and incorporated into the HTML report
- bismark_sitrep.tpl: Removed an extra function call in bismark_sitrep.tpl so that the M-bias 2 plot is drawn once the M-bias 1 plot has finished drawing (with certain browsers and data, parallel processing may have resulted in a white placeholder only)
- Methylation extractor: Altering the file path handling of coverage2cytosine and bismark2bedGraph also required some changes in the methylation extractor
- bismark2bedGraph: Input file path handling has been completely reworked. The output file which can be specified as -o output.bedGraph now has to be a single file name and mustn't contain any path information. A particular output folder may be specified with -dir /any/path/
- bismark2bedGraph: Addressing the file path handling issue also fixed a similar issue with the option --remove_spaces when -o had been specified
- coverage2cytosine: Changed zcat for gunzip -c when reading a gzipped coverage file. This should avoid some Mac platforms crashing because zcat invariably requires a file to end in .Z (which it doesn't...)
- coverage2cytosine: Changed the way in which the coverage input file is handed over from the methylation_extractor to coverage2cytosine (previously the path information might have been part of the file name, but instead it will now be only part of the -dir output_directory option)
- 14-01-2016: 0.15.0 released (click here for the Release Notes hosted on Github)
-
- Added option --se/--single_end [list]. This sets single-end mapping mode explicitly, giving a list of file names as [list]. The filenames may be provided as a comma (,) or colon (:) separated list
- Added option --genome_folder /path/to/genome as alternative to supplying the genome as the first argument
- Added an option --rg_tag to print an @RG header line as well as an RG:Z: tag to each read. The ID and SAMPLE fields default to 'SAMPLE', but can be specified manually with --rg_id or --rg_sample
- Added new option --ambig_bam for Bowtie2-mode only, which writes out a single alignment for sequences with multiple alignments to a special file ending in .ambiguous.bam. The alignments are in Bowtie2 format and do not contain any Bismark-specific entries such as the methylation call etc. These ambiguous BAM files are intended to be used as coverage estimators for variant callers. Works for single-end and paired-end alignments in single or multi-core mode
- Added the new options --cram and --cram_ref to Bismark for both paired- and single-end alignments in single or multi-core mode. This option requires Samtools version 1.2 or higher. A genome FastA reference may be supplied as a single file with the option --cram_ref; if this is not specified the file is derived from the reference FastA file(s) used for the Bismark run, and written to the file Bismark_genome_CRAM_reference.mfa into the output directory.
- deduplicate_bismark: Added better handling of cases when the input file was empty (died for percentage calculation instead of calling it N/A)
- deduplicate_bismark: Added a note mentioning that Read1 and Read2 of paired-end files are expected to follow each other in two consecutive lines and possibly require name-sorting prior to deduplication. Also added a check that reads the first 100000 lines to see if the file appears to have been sorted, and bails out if this is true
- methylation extractor: Added support for CRAM files (this option requires Samtools version 1.2 or higher)
- bismark2bedGraph: Changed the way gzip compressed input files are handled when using the UNIX sort command (i.e. with --scaffolds/--gazillion or without --ample_memory)
- coverage2cytosine: Added option --gzip to compress output files. This currently only works for the default CpG_report and CX_report output files (and thus not with the option --gc or --split_files). The option --gzip is now also passed on from the bismark_methylation_extractor
- coverage2cytosine: Added a check to bail if no information was found in the coverage file, e.g. if a wrong file path for a .cov.gz file had been specified
- bismark_genome_preparation: Added process handling to the child processes
- 20-08-2015: 0.14.5 released - minor fix
-
- deduplicate_bismark: Changed all literal calls of 'samtools' to '$samtools_path'
- 19-08-2015: 0.14.4 released
-
- Bismark: Changed the FLAG values of paired-end alignments to the CTOT or CTOB strands so that reads can be properly displayed in SeqMonk when imported as BAM files. This change affects only paired-end alignments in --pbat or --non_directional mode. In detail we simply swapped the Read 1 and Read 2 FLAG values round so reads now resemble exactly concordant read pairs to the OT or OB strands. Note that results produced by the methylation extractor or further downstream of that are not affected by this change
- Bismark: Input files specified with filepath information for FastA files are now handled properly in --multicore runs (this was fixed only for FastQ files in the previous patch)
- Bismark: Unmapped and ambiguous files (options --unmapped and --ambiguous) are now written out as gzip compressed files by default
- Bismark: Changed the default mode of operation to --bowtie2. Bowtie (1) alignments may still be chosen using the option --bowtie1
- Bismark Genome Preparation: Changed the execution of the genome indexing of the parent process to system() rather than an exec() call since this seemed to lead to interesting faults when run in a pipeline setting
- Bismark Genome Preparation: Changed the default indexing mode to --bowtie2. Bowtie (1) indexing is still available via the option --bowtie1
- bismark2bedGraph: The coverage (.cov) and bedGraph (.bedGraph) files are now written out as gzip compressed files by default
- coverage2cytosine: Added new option '--gc/--gc_context' to reprocess the genome to find methylation in GpC context. This might be useful for specialist applications where GpC methylases had been employed. The output format is exactly the same as for the normal cytosine report, and only positions covered by at least one read are reported (output file ends in .GpC_report.txt). In addition this will write out a Bismark coverage file (ending in GpC.cov)
- deduplicate_bismark: Removed redundant closing statements to get rid of warning messages
- deduplicate_bismark: The option --representative is no longer displayed in the help text. The option was once useful to determine the PCR bias that had been introduced by over digestion with bisulfite and is nearly always not what should be used for deduplication (it will be left in and is still functional for the time being though)
- 06-05-2015: 0.14.3 released
-
- Bismark: Changed the renaming settings for paired-end files so that 'sam' within the filename no longer gets renamed to 'bam' (e.g. smallsample.sam > smallbample.sam)
- Bismark: Input files specified with filepath information are now handled properly in --multicore runs
- Bismark: The --multicore option currently requires the output files to be in BAM format, so specifying --sam at the same time has been disallowed
- Methylation Extractor: fixed another bug for the same issue as in 0.14.1 that had crept into the 0.14.2 release (to do with --ignore_3prime)
- coverage2cytosine: Changed the option --merge_CpG so that CGs starting at position 1 are not considered (since the 3-base sequence context of the bottom strand C at position 2 can not be determined)
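The SAM-to-BAM renaming fix described for 0.14.3 comes down to anchoring the substitution at the file extension. The following is a minimal Python sketch of that difference, not Bismark's actual code (Bismark is written in Perl); both function names are hypothetical and chosen only for illustration.

```python
import re

def rename_unanchored(filename: str) -> str:
    # Unanchored pattern: replaces the first 'sam' found anywhere in the name,
    # which reproduces the unwanted renaming described above.
    return re.sub(r"sam", "bam", filename, count=1)

def rename_extension_only(filename: str) -> str:
    # Anchored pattern: only a trailing '.sam' extension becomes '.bam'.
    return re.sub(r"\.sam$", ".bam", filename)

if __name__ == "__main__":
    print(rename_unanchored("smallsample.sam"))      # smallbample.sam
    print(rename_extension_only("smallsample.sam"))  # smallsample.bam
```

Anchoring with `\.sam$` leaves any 'sam' inside the basename (as in "sample") untouched.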
- 27-03-2015: 0.14.2 released
-
- Methylation Extractor: Added a bug fix for the same issue as in 0.14.1 that was overlooked in the earlier release
- 27-03-2015: 0.14.1 released
-
- Bismark: Fixed the cleaning up stage in a --multicore run when --gzip had been specified as well
- Bismark: Fixed the handling of files in a --multicore run when the input files had been specified including file path information
- Bismark: Please note that the option -B/--basename in conjunction with --multicore is currently not supported (as in: disabled), but we are aiming to address this soon
- Methylation Extractor: Fixed a bug with the position adjustment of paired-end reads when the reads should have been trimmed from their 3' ends (option --ignore_3prime)
- deduplicate_bismark: Now also removing newline characters from the read conversion tag in case other programs interfered with the tag ordering and put this tag into the very last column
- 06-03-2015: 0.14.0 released - Bismark Parallelization
-
- Bismark: Finally added parallelization to the Bismark alignment step using the option '--multicore int' which sets the number of parallel instances of Bismark to be run concurrently. At least in this first distribution this is achieved by forking the Bismark alignment step very early on so that each individual Spawn of Bismark (SoB?) processes only every n-th sequence (n being set by --multicore). Once all processes have completed, the individual BAM files, mapping reports, unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated running Bismark conventionally with only a single instance. If system resources are plentiful this is a viable option to speed up the alignment process (we observed a near linear speed increase for up to --multicore 8 tested so far). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie/Bowtie2, Samtools, gzip etc...) and ~10-16GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --multicore specified will effectively lead to a linear increase in compute and memory requirements, so --multicore 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB of RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned...
- Bismark: Changed the default output to BAM. SAM output may be requested using the option --sam
- Bismark: No longer generates a piechart (.png) with the alignment stats. bismark2report generates a much nicer report anyway
- Methylation Extractor: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required. In some instances files containing e.g. -1-2 in their filename might previously have been identified as paired-end incorrectly
- deduplicate_bismark: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required
- deduplicate_bismark: Added option --version so that Clusterflow can report a version number
- bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well
- coverage2cytosine: Fixed a typo in the shebang which prevented coverage2cytosine from running
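The --multicore scheme described for 0.14.0, where each instance processes only every n-th sequence and the results are merged afterwards, can be pictured with a short Python sketch. This is an illustration of the partitioning idea only; the function name and the 0-based worker numbering are assumptions, and Bismark's own Perl implementation forks processes rather than using a generator.

```python
from typing import Iterable, Iterator

def reads_for_worker(reads: Iterable[str], worker: int, n_workers: int) -> Iterator[str]:
    """Yield the reads that one of n_workers parallel instances would handle.

    worker is a 0-based instance number (an assumption of this sketch);
    read i is handled by instance i modulo n_workers.
    """
    for index, read in enumerate(reads):
        if index % n_workers == worker:
            yield read

if __name__ == "__main__":
    reads = [f"read{i}" for i in range(10)]
    for worker in range(4):
        print(worker, list(reads_for_worker(reads, worker, 4)))
```

Every read lands in exactly one partition, so merging the per-instance outputs covers the same reads as a single run would.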
- 27-12-2014: 0.13.1 released
-
- Bismark Genome Preparation: Added a check for unique chromosome names to the Bismark indexer to avoid disappointments later
- Methylation Extractor: Fixed a bug for the M-bias reports when the option --multicore was used, in which case only the numbers of one core were used to construct the report. Now every different thread writes out an individual M-bias table, and once the methylation extraction has completed all these individual files are merged into a single, cumulative table as it should be
- Methylation Extractor: Added a new option --mbias_off, which processes the files as normal but does not write out any M-bias files. This option is meant for users who run the methylation extractor two times, the first time to figure out whether there is a bias that needs to be removed, and the second time using the --ignore options, but without overwriting the already existent M-bias reports
- bismark2bedGraph: Deferred removal of the input file path information a little so that specifying file paths doesn't prevent bismark2bedGraph from finding the input files anymore
- bismark2bedGraph: If the specified output directory doesn't exist it will be created
- bismark2bedGraph: Changed the way scaffolds are sorted (with --gazillion/--scaffold specified) to -k3,3V (this was done following a suggestion by Volker Brendel, Indiana University: "The -k3,3V sort option is critical when the sequence names are numbered scaffolds (without left-buffering of zeros)")
- coverage2cytosine: Added a new option --merge_CpG that will post-process the genome-wide report to write out an additional coverage file which has the top and bottom strand methylation evidence pooled into a single CpG dinucleotide entity. This may be the desirable input format for some downstream processing tools such as the R-package bsseq (by K.D. Hansen). For an example please see the RELEASE_NOTES file. This option is currently experimental, and only works if CpG context only and a single genome-wide report were specified (i.e. it doesn't work with the options --CX or --split_by_chromosome)
- coverage2cytosine: Changed the processing of not-covered chromosomes so that they are sorted and not processed randomly. This should make runs more reproducible
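The pooling performed by --merge_CpG in 0.13.1 can be sketched as follows. This is a simplified Python illustration, not the coverage2cytosine code: it assumes the per-cytosine counts are already available in a dictionary keyed by chromosome and 1-based position, and that the top-strand C positions of all CpGs are known. The function name and the layout of the returned tuples are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Counts = Tuple[int, int]  # (methylated count, unmethylated count)

def merge_cpg(coverage: Dict[Tuple[str, int], Counts],
              cpg_starts: Iterable[Tuple[str, int]]) -> List[tuple]:
    """Pool top-strand (pos) and bottom-strand (pos + 1) evidence per CpG."""
    merged = []
    for chrom, pos in cpg_starts:
        top_m, top_u = coverage.get((chrom, pos), (0, 0))
        bot_m, bot_u = coverage.get((chrom, pos + 1), (0, 0))
        meth, unmeth = top_m + bot_m, top_u + bot_u
        if meth + unmeth == 0:
            continue  # neither strand of this CpG was covered
        percentage = 100.0 * meth / (meth + unmeth)
        merged.append((chrom, pos, pos + 1, percentage, meth, unmeth))
    return merged

if __name__ == "__main__":
    cov = {("chr1", 10): (3, 1), ("chr1", 11): (1, 3), ("chr1", 20): (2, 0)}
    for row in merge_cpg(cov, [("chr1", 10), ("chr1", 20), ("chr1", 30)]):
        print(row)
```

Each CpG is reported once, spanning both the C and the G, with the counts from both strands summed.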
- 01-10-2014: 0.13.0 released
-
- Bismark: Fixed renaming issue for SAM to BAM files (which would have replaced any occurrence of sam in the file name, e.g. sample1_... instead of the file extension .sam)
- Methylation Extractor: Added new option '--multicore INT' to set the number of cores to be used for the methylation extraction process. If system resources are plentiful this is a viable option to speed up the extraction process (we observed a near linear speed increase for up to 10 cores specified). Please note that a typical process of extracting a BAM file and writing out '.gz' output streams will in fact use ~3 cores per value of --multicore INT specified (1 for the methylation extractor itself, 1 for a Samtools stream, 1 for a GZIP stream), so --multicore 10 is likely to use around 30 cores of system resources. This option has no bearing on the speed of the bismark2bedGraph or genome-wide cytosine report processes
- Methylation Extractor: Added two new options '--ignore_3prime INT' (for single-end alignments and Read 1 of paired-end alignments) and '--ignore_3prime_r2 INT' (for Read 2 of paired-end alignments) to remove positions that display a methylation call bias on the 3' end of reads
- Methylation Extractor: The option --no_overlap is now the default for paired-end data. One may explicitly choose to include overlapping data with the option '--include_overlap'
- Methylation Extractor: The splitting report will now be written out by default (previously optional --report)
- Methylation Extractor: In paired-end mode, read-pairs which had been skipped because either read was shorter than a specified (very high) value of '--ignore' or '--ignore_r2' will now have the information of the other read extracted if it meets the length criteria (if applicable). Thanks to Andrew Dei Rossi for contributing a patch
- bismark2bedGraph: Fixed the location of the sorting directory which could have failed if an output directory had been specified
- 21-07-2014: hotfix 0.12.5 released
-
- Bismark: Added one additional check to the way ambiguous alignments are handled in Bowtie 2 mode. In more detail this adds a check whether the current ambiguous alignment is worse than the best alignment so far, in which case the sequence does not get flagged as ambiguous
- 21-07-2014: Version 0.12.4 released
-
- Bismark: Improved the way ambiguous alignments are handled in Bowtie 2 mode. Previously, sequences were classified as ambiguously aligning as soon as a sequence produced several equally good alignments within the same alignment thread. Under certain circumstances however there may exist equally good alignments within the same alignment thread, but the sequence might have a better (unique) alignment in another thread. Such a unique alignment will now trump the ambiguous alignment as it should
- Bismark: Got rid of 2 warning messages of MD-tag information for reads containing deletions (Bowtie 2 mode only) which accidentally made it through to the release
- Bismark: Added '-x' to the invocation of Bowtie 2 for FastA sequences so that it works again (It used to work previously only because Bowtie 2 did not check it properly and automatically used bowtie2-align-s, but now it does check...)
- Methylation Extractor: Line endings are now chomped at an earlier stage so that interfering with the optional fields in the Bismark BAM file doesn't break the methylation extractor (e.g. reordering of optional tags by Picard)
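The handling of ambiguous Bowtie 2 alignments described for 0.12.4 and 0.12.5 can be summarised as a small decision rule. The Python sketch below is an interpretation of those two entries, not Bismark's code: it assumes each alignment thread reports its best alignment score (AS, higher is better, 0 is perfect) plus an optional second-best score, and it treats a tie between different threads as ambiguous. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# thread name -> (best alignment score, second-best score or None)
ThreadScores = Dict[str, Tuple[int, Optional[int]]]

def classify_read(threads: ThreadScores) -> Tuple[str, Optional[str]]:
    """Return ('unique', thread), ('ambiguous', None) or ('unaligned', None)."""
    if not threads:
        return ("unaligned", None)
    best = max(score for score, _ in threads.values())
    winners = [name for name, (score, _) in threads.items() if score == best]
    if len(winners) > 1:
        return ("ambiguous", None)  # equally good alignments in several threads
    _, second = threads[winners[0]]
    if second is not None and second == best:
        return ("ambiguous", None)  # tie inside the winning thread itself
    # A tie inside a weaker thread is ignored: the better unique alignment wins.
    return ("unique", winners[0])

if __name__ == "__main__":
    print(classify_read({"OT": (-6, -6), "OB": (0, None)}))  # ('unique', 'OB')
```

In the example, the OT thread is ambiguous within itself, but the perfect alignment in the OB thread is better and therefore trumps it.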
- 23-06-2014: Version 0.12.3 released
-
- Bismark: Replaced the XX-tag field (base-by-base mismatches to the reference, excluding indels) by an MD:Z: field that now properly reflects mismatches as well as indels
- Bismark: Fixed the Hamming distance value (NM:i: field) for reads containing insertions (Bowtie 2 mode only) which was previously offset by the number of insertions in the read
- methylation extractor/bismark2bedGraph: Changed the '--zero_based' option of the methylation extractor and bismark2bedGraph to write out an additional coverage file (ending in .zero.cov) which uses the UCSC zero-based, half-open standard
- bismark2bedGraph: Changed the requirement of CpG context files to now start with CpG... (from CpG_...)
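For the '--zero_based' option mentioned in 0.12.3, the coordinate change amounts to shifting the start by one. The Python sketch below assumes a tab-delimited coverage line whose first three fields are chromosome, start and end with 1-based inclusive coordinates; the function name is hypothetical and the code only illustrates the conversion to the zero-based, half-open convention.

```python
def to_zero_based_half_open(line: str) -> str:
    """Convert one 1-based, inclusive coverage line to 0-based, half-open."""
    fields = line.rstrip("\n").split("\t")
    start = int(fields[1])
    # In a half-open interval the end is exclusive, so only the start shifts.
    fields[1] = str(start - 1)
    return "\t".join(fields)

if __name__ == "__main__":
    print(to_zero_based_half_open("chr1\t100\t100\t75\t3\t1"))
```

A single cytosine at 1-based position 100 therefore becomes the interval 99 to 100.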
- 04-05-2014: Version 0.12.2 released
-
- Bismark: Added support for the new 64-bit index files for very large genomes in Bowtie 2 mode. The large genome indexes (ending in .bt2l instead of .bt2 for small genomes) are generated automatically by bismark_genome_preparation and work just as well in the Bismark alignment step
- Bismark: Fixed a bug that would omit the name of the second last chromosome from the SAM header if the genome had been supplied as Multi-FastA file. Everything else, including the alignments, would have been unaffected by this glitch
- Bismark: When the option '--basename' is specified, SE ambiguous file names now feature an underscore in their file name. Also, the pie chart file names are now derived from the basename
- Methylation Extractor: Introduced a length check when the options --ignore or --ignore_r2 were set to ensure that only reads that are long enough are being processed
- 29-04-2014: Version 0.12.1 released
-
- Bismark: Added calculation of MAPQ values for SAM/BAM output generated with Bowtie 2 for both single-end and paired-end mode. The calculation is implemented like in Bowtie 2 itself. Mapping quality values are still unavailable for alignments performed with Bowtie and retain a value of 255 throughout
- Bismark: Fixed an uninitialised value warning for PE alignments with Bowtie 2 that occurred whenever Read 2 aligned to the very start of a chromosome (this only affected the warning itself and had no impact on any results)
- coverage2cytosine: all chromosomes or scaffolds are now processed irrespective of whether they were covered in the sequencing experiment or not. Previously, CpG/cytosine reports for genomes with lots of small scaffolds that were not covered by any reads might have had a variable number of lines between experiments
- 08-04-2014: Version 0.11.1 released
-
- Bismark: The option --pbat now also works for use with Bowtie 2, in both single-end and paired-end mode. The only limitation to this is that it only works with FastQ files and uncompressed temporary files
- Bismark: Changed the order in which the @SQ lines are written out to the SAM/BAM header from random to the same order they are being read in from the genomes folder (or the order of the files in which they occur within a multi-FastA file)
- Bismark: Included a new option '-B/--basename basename' for output files instead of deriving these names from the input file. --basename takes precedence over the option --prefix.
- Bismark: Unmapped or ambiguous files now end in .fa or .fq for FastA or FastQ files, respectively (instead of .txt files)
- Methylation extractor: Will no longer attempt to delete unused files if --mbias_only was specified
- Methylation extractor: Added a test to see if a file that does not end in .bam is in fact a BAM file, and if this succeeds open the file using Samtools view
- 03-12-2013: Bug fix for deduplicate_bismark
-
- deduplicate_bismark: fixed a bug for '--representative' mode where the final report was accidentally written to the SAM file instead of the report file. Please note that using '--representative' is nearly always what you DON'T WANT to do, since this selects for the most highly amplified PCR product/artefact and not a random read
- 27-11-2013: Version 0.10.1 released
-
- Bismark methylation extractor: The methylation extractor does now detect automatically whether Bismark alignment file(s) were run in single-end or paired-end mode. The automatic detection can be overridden by manually specifying -s or -p and this option is only available for SAM/BAM files
- bismark2bedGraph: When run in stand-alone mode, the coverage file will replace 'bedGraph' as the file ending with 'bismark.cov'. If the output filename is anything other than 'bedGraph', '.bismark.cov' will be appended to the filename
- bismark2bedGraph: When run in stand-alone mode, '--counts' will be enabled by default for the coverage output
- bismark2bedGraph: Added a new option '--scaffolds/--gazillion' for users working with unfinished genomes sporting tens or even hundreds of thousands of scaffolds/contigs/chromosomes. Such a large number of reference sequences frequently resulted in errors with pre-sorting reads to individual chromosome files because of the operating system's limitation of the number of filehandles that can be written to at any one time (typically this limit is anything between 128 and 1024 filehandles; to find out this limit on Linux, type: ulimit -a). To bypass the limitation of open filehandles, the option '--scaffolds' does not pre-sort methylation calls into individual chromosome files. Instead, all input files are temporarily merged into a single file (unless there is only a single file), and this file will then be sorted by both chromosome AND position using the UNIX sort command. Please be aware that this option might take a looooong time to complete, depending on the size of the input files, and the memory you allocate to this process (see '--buffer_size')
- bismark2bedGraph: Added a new option '--ample_memory'. Using this option will not sort chromosomal positions using the UNIX sort command, but will instead use two arrays to sort methylated and unmethylated calls, respectively. This may result in a faster sorting process for very large files, but this comes at the cost of a larger memory footprint (as an estimate, two arrays of the length of the largest human chromosome 1 (~250 million bp) consume around 16GB of RAM). Note however that due to the overhead of creating and looping through huge arrays this option might in fact be *slower* for small-ish files (up to a few million alignments). Note also that this option is not currently compatible with options '--scaffolds/--gazillion'. This option still needs some efficiency testing as to when it actually makes sense to use it, but it produces identical results to the default sort option. Thanks to Yi-Shiou Chen for contributing this twist
- deduplicate_bismark: The deduplication script does now detect automatically whether a Bismark alignment file was run in single-end or paired-end mode (this happens separately for every file analysed). The automatic detection can be overridden by manually specifying -s or -p and this option is only available for SAM/BAM files
- bismark2report: Specifying a single file for each of the optional reports will now work as intended, instead of being skipped
- coverage2cytosine: Added some counting and statements to indicate when the run finished successfully (it proved to be difficult to follow the report process for a genome with nearly half a million scaffolds...)
- 10-11-2013: Version 0.10.0 released
-
- Bismark: The option '--prefix' does now also work for the C->T and G->A transcribed temporary files to allow multiple instances of Bismark to be run on the same file in the same folder (e.g. using Bowtie and Bowtie 2 or some stricter and laxer parameters concurrently)
- bismark2report: Changed the behavior of this module to automatically find all Bismark mapping reports in the current working directory, and to try and work out whether the optional reports are present as well (i.e. deduplication, splitting and M-bias reports). This uses the file basename and will fail if the files have been renamed at any stage
- bismark2report: Added commas as separator for large numbers to improve readability
- Bismark methylation extractor: will now delete unused methylation context files (e.g. CTOT and CTOB files for a directional library)
- bismark2bedGraph: Dropped the option -k3,3 from the sort command to result in a dramatic speed increase while sorting. This option had been used previously to enable sorting by chromosome in addition to position, but should no longer be needed because the files are being read in sorted by chromosome already
- bismark2bedGraph: This module now produces these two output files:
(1) A bedGraph file, which now contains a header line: 'track type=bedGraph'. The genomic start coords are 0-based, the end coords are 1-based.
(2) A coverage file ending in .cov. This file replaces the former 'bedGraph --counts' file and is required to proceed with the subsequent step to generate a genome-wide cytosine report (the module doing this has been renamed to coverage2cytosine to reflect this file name change)
- coverage2cytosine: Changed the name of this module from 'bedGraph2cytosine' to 'coverage2cytosine' to reflect the change that this module now requires the methylation coverage file produced by the bismark2bedGraph module (this coverage file replaces the former "bedGraph --counts" output)
- coverage2cytosine: Previously, the cytosine report would always report every C position in any context, even though the default should have reported CpG positions only. This has now been fixed
- Bismark genome preparation: Made a couple of changes to make the genome preparation fully non-interactive. This means that the path to the genome folder and to Bowtie (1/2) have to be specified up front (for Bowtie (1/2) it is otherwise assumed that it is in the PATH). Furthermore, already existing bisulfite indices in the target folder will be overwritten and the user is no longer prompted to confirm this. We got rid of the prompt because creating a second index (Bowtie 1 as well as 2) in the same folder in non-interactive mode got stuck in loops asking whether it is alright to proceed or not, generating terabyte-sized log files without ever starting to do anything useful...
- deduplicate_bismark: Renamed the rather long deduplication script to this slightly shorter one. Also added some filehandle closing statements that might have caused buffering issues under certain circumstances
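The bedGraph coordinate convention described for bismark2bedGraph above (0-based start, 1-based end, preceded by a 'track type=bedGraph' header line) can be illustrated with a short Python sketch. The function names and the tab-separated four-column layout are assumptions made for illustration; this is not the module's actual code.

```python
# Illustrative sketch only: function names are hypothetical, not part of Bismark.

HEADER = "track type=bedGraph"  # header line mentioned in the changelog entry


def bedgraph_interval(pos_1based):
    """Return (start, end) for one cytosine: 0-based start, 1-based end."""
    if pos_1based < 1:
        raise ValueError("positions are expected to be 1-based")
    return pos_1based - 1, pos_1based


def bedgraph_line(chrom, pos_1based, percent_meth):
    """Format one bedGraph line (assumed layout: chrom, start, end, % methylation)."""
    start, end = bedgraph_interval(pos_1based)
    return f"{chrom}\t{start}\t{end}\t{percent_meth:g}"


print(HEADER)
print(bedgraph_line("chr1", 1000, 75.0))
```

A cytosine at 1-based position 1000 therefore occupies the interval 999 to 1000 in the bedGraph file.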
- 08-16-2013: Version 0.9.0 released
-
- Bismark: Implemented the new methylation call symbols 'U' and 'u' for methylated or unmethylated cytosines in unknown sequence context, respectively. If the sequence context bases contain an N, e.g. CN or CHN, the context cannot be determined accurately (previously, these cases were assumed to be in CHH context). These situations may arise whenever the reference sequence contains Ns, or when insertions in the read occur close to a cytosine position (bases inserted into the read have no direct equivalent in the reference sequence and were assumed to be Ns for the methylation call). In practical terms, the 'U/u' methylation calls will only occur for Bowtie 2 alignments because Bowtie 1 does not support gapped alignments or read alignments if the reference contains any N's. The Bismark report will now also include the 'U/u' statistics, such as count and % methylation, however only if run in Bowtie 2 mode
- bismark2report: this new module generates a graphical interactive HTML report of the Bismark alignment, deduplication, splitting and M-bias statistics for convenient visualisation of what is going on. Since several different modules of Bismark may be included into this report that may or may not have been run, bismark2report requires the user to specify the relevant reports as input files. Many thanks to Phil Ewels for the conceptual design and his help with this report. Here are examples for a standard paired-end BS-Seq report, or for a single-end PBAT report
- Bismark: Fixed a bug affecting the generation of the alignment overview pie chart which occurred for PBAT libraries only
- Methylation Extractor: Added handling of the newly introduced methylation call U/u for cytosines in Unknown sequence context (CN or CHN). These methylation calls are simply ignored in the extraction process to not cause too much confusion for downstream analysis
- bismark2bedGraph: Added a check to see whether input files start with CpG_* or not. If they don't, please include the option '--CX' when running bismark2bedGraph as a stand-alone tool
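As a rough illustration of the 'U/u' logic introduced in this release, the following Python sketch classifies a cytosine by the reference bases that follow it. The function names are hypothetical. The symbols Z/z, X/x and H/h for CpG, CHG and CHH are taken from Bismark's documented call scheme rather than from this changelog entry, and the exact rules Bismark applies internally may differ.

```python
# Illustrative sketch only: not Bismark's actual implementation.

# Upper case = methylated, lower case = unmethylated.
SYMBOLS = {"CpG": "Z", "CHG": "X", "CHH": "H", "unknown": "U"}


def cytosine_context(downstream):
    """Classify a cytosine from the (up to two) reference bases following it."""
    bases = downstream.upper()[:2]
    if bases[:1] == "G":
        return "CpG"          # CG: the third base does not matter
    if len(bases) < 2 or "N" in bases:
        return "unknown"      # CN or CHN: context cannot be determined
    if bases[1] == "G":
        return "CHG"
    return "CHH"


def call_symbol(downstream, methylated):
    """Return the methylation call character for one cytosine."""
    symbol = SYMBOLS[cytosine_context(downstream)]
    return symbol if methylated else symbol.lower()


for bases in ("GA", "AG", "AT", "NG", "AN"):
    print(bases, cytosine_context(bases), call_symbol(bases, True))
```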
- 07-26-2013: Version 0.8.3 released
-
- Bismark: Changed the FLAG values of paired-end SAM/BAM output files to comply with other downstream applications such as Picard. In addition, reads will no longer have /1 or /2 appended to the read IDs. For the time being, the old FLAG values and read ID tags can still be obtained using the option '--old_flag'. For more information on the change of FLAG tags please see the RELEASE NOTES or type 'bismark --help'
- Methylation Extractor: Changed the additional check for the module GD::Graph::colour to an 'eval {require ...}' statement instead of using 'use'. This should now properly skip drawing the M-bias plot if the module is not installed on the system
- Methylation Extractor: Implemented two quick tests for paired-end SAM/BAM files to see if the file had been sorted by chromosomal position prior to using the methylation extractor. Sorting would cause problems with the strand identity and overlaps, since read 1 and read 2 are expected to follow each other directly in the Bismark alignment file. The first test attempts to find an @SO (for sorted) tag in the SAM header. If this cannot be found, the first 100000 sequences are checked for whether or not their ID is the same. If the file appears to have been sorted, the methylation extractor will bail and ask for an unsorted file instead
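The two checks described above could look roughly like the Python sketch below. It assumes that the sort tag is the 'SO:coordinate' field of the SAM '@HD' header line (as defined by the SAM specification) and that both mates of a pair carry the same read ID. The function names are hypothetical, and the methylation extractor's real logic is not shown in this changelog.

```python
# Illustrative sketch only: not the methylation extractor's actual code.

def header_says_sorted(header_lines):
    """Assumed check: the @HD header line declares coordinate sorting."""
    return any(
        line.startswith("@HD") and "SO:coordinate" in line.split("\t")
        for line in header_lines
    )


def ids_look_sorted(read_ids, max_reads=100000):
    """In an unsorted paired-end file, records 1+2, 3+4, ... share an ID."""
    ids = read_ids[:max_reads]
    for i in range(0, len(ids) - 1, 2):
        if ids[i] != ids[i + 1]:
            return True   # mates are not adjacent: file was probably sorted
    return False


print(header_says_sorted(["@HD\tVN:1.0\tSO:coordinate"]))
print(ids_look_sorted(["r1", "r1", "r2", "r2"]))
print(ids_look_sorted(["r1", "r2", "r1", "r2"]))
```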
- 07-24-2013: Version 0.8.2 released
-
- Bismark: Changed the TLEN values in paired-end SAM format generated by Bowtie 2 whenever one read was completely contained within the other; in such cases both TLEN values will be set to the length of the longer fragment
- Bismark: Changed the output filename for Bowtie 2 files for single-end reads from '...bt2_bismark.sam' to '...bismark_bt2.sam' so that single-end and paired-end file names are more consistent
- Methylation Extractor: Added a new option '--mbias_only'. If this option is specified, the M-bias plot(s) and their data are written out. The standard methylation report ('--report') is optional. Since this option will not extract any methylation data, neither bedGraph nor cytosine report conversion is allowed
- Methylation Extractor: If a specific output directory and '--cytosine_report' are specified at the same time, the bedGraph2cytosine module will now use the bedGraph file located in the output directory as intended
- Methylation Extractor: Added an additional check for the module GD::Graph::colour; if it can't be found on the system drawing of the M-bias plot will be skipped
- 07-12-2013: Version 0.8.1 released
-
- Methylation Extractor: Changed the function of the option '--ignore <int>' to ignore the first <int> bp from the 5' end of single-end reads or Read 1 of paired-end files. In addition, added a new option '--ignore_r2 <int>' to ignore the first <int> bp from the 5' end of Read 2 of paired-end files. Since the first couple of bases in Read 2 of BS-Seq experiments show a severe bias towards non-methylation as a result of the end-repair of sonicated fragments with unmethylated cytosines (see M-bias plot), it is recommended that the first couple of bp of Read 2 are removed before starting downstream analysis. Please see the section on M-bias plots in the Bismark User Guide for more details
- Methylation Extractor: Changed colours, legends and background colour of the M-bias plot
- Bismark: Changed the way in which the alignment overview file is being named to actually work
- 07-12-13: Version 0.8.0 released
-
- Bismark: Added new option '--prefix' to add a prefix to the output filenames. For example, '--prefix test' with 'file.fq' would result in the output file 'test.file.fq_bismark.sam' etc.
- Bismark: Fixed a warning message that occurred when chromosomal sequences could not be extracted in paired-end Bowtie2 mode
- Bismark: will now generate a pie chart with the alignment statistics once a run has finished; this gives a quick overview of how many sequences aligned uniquely and how many did not align, either because they produced no alignment at all, mapped to multiple locations, or because it was impossible to extract the chromosomal sequence
- Methylation Extractor: upon completion, the methylation extractor will now produce an M-bias (methylation bias) plot, which shows the methylation proportion across each possible position in the reads (described in: Hansen et al., Genome Biology, 2012, 13:R83). The data for the M-bias plot will be written into a text file (to generate graphs by alternative means) and drawn into a .png file. The plot also contains the absolute number of methylation calls per position (methylated + unmethylated)
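To make the M-bias idea concrete, here is a minimal Python sketch that tabulates the methylation proportion and the absolute number of calls per read position. It assumes per-read methylation call strings in which 'Z' and 'z' mark methylated and unmethylated CpG calls (Bismark's documented symbols); the function name and input format are hypothetical and do not reflect the extractor's internals.

```python
from collections import defaultdict

# Illustrative sketch only: not the methylation extractor's actual code.

def mbias_table(call_strings, meth="Z", unmeth="z"):
    """Return [(position, methylated, unmethylated, % methylation), ...]."""
    counts = defaultdict(lambda: [0, 0])
    for calls in call_strings:
        for pos, symbol in enumerate(calls, start=1):  # 1-based read position
            if symbol == meth:
                counts[pos][0] += 1
            elif symbol == unmeth:
                counts[pos][1] += 1
    table = []
    for pos in sorted(counts):
        m, u = counts[pos]
        table.append((pos, m, u, 100.0 * m / (m + u)))
    return table


for row in mbias_table(["Z..z", "z..z", "Z..."]):
    print(*row, sep="\t")
```

Positions without any call in the chosen context are simply absent from the table, which avoids dividing by zero.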
- 05-10-13: Version 0.7.12 released
-
- Bismark: Removed a rogue sleep(1) command that would slow down single-end Bowtie 2 alignments for a single lane of HiSeq (200M sequences) from ~1 day to 6 years and 4 months (roughly)
- bismark2bedGraph: now keeps track of the temp files it just created in a session instead of using all files in the output folder ending in ".methXtractor.temp". This lets you kick off the bedGraph conversion step from already sorted, individual methXtractor.temp files if desired
- 04-22-13: Version 0.7.11 released
-
- Bismark: Fixed non-functional single-end alignments with Bowtie2 which were accidentally broken by introducing the option '--pbat' in v0.7.10 (an evil 'if' instead of 'elsif'...)
- For paired-end alignments with Bowtie 1, the option '--non_bs_mm' would accidentally confuse the number of mismatches of read 1 and read 2 whenever the first read aligned in reverse orientation, i.e. for OB and CTOT alignments. This has now been corrected
- Previously, the option '--non_bs_mm' would potentially output non-integer values for Bowtie 2 alignments if the read (or reference) contained 'N' characters. Alignment scores from 'N's are now adjusted so that they count as mismatches similar to what Bowtie 1 does. This works fine for reads with up to and including 5 N's (which is quite a lot...)
- Methylation extractor: To avoid duplication and keep code modular, the bedGraph conversion step invoked by the option '--bedGraph' has now been farmed out to the module 'bismark2bedGraph'. This script is independent of the methylation extractor and also works as a stand-alone tool from the methylation extractor output (uncompressed or gzip compressed files). To work well from within the methylation extractor this script (which is now included in the Bismark package) needs to reside in the same folder as the 'bismark_methylation_extractor' itself
- bismark2bedGraph: Temporary chromosome files now have an input file name included in their file name to enable parallel processing of several files in the same directory at the same time
- To avoid duplication and keep code modular, the bedGraph to genome-wide cytosine methylation report step invoked by the option '--cytosine_report' has now been split out to the module 'bedGraph2cytosine'. This script is independent of the methylation extractor and also works as a stand-alone tool from the Bismark bedGraph '--counts' output (uncompressed or gzip compressed files). To work well from within the methylation extractor this script (which is now included in the Bismark package) needs to reside in the same folder as the 'bismark_methylation_extractor' itself
- Deduplication script: Fixed some warnings that were thrown if '--bam' was not specified
- 04-18-13: Version 0.7.10 released
-
- Bismark: Added new option '--gzip' that causes temporary bisulfite conversion files to be written out in a GZIP compressed form to save disk space. This option is available for most alignment modes with the exception of paired-end FastA files
- Added new option '--bam' that causes the output file to be written out in BAM format instead of the default SAM format. Bismark will attempt to use the path to Samtools that was specified with '--samtools_path', or, if it hasn't been specified explicitly, attempt to find Samtools in the PATH. If no installation of Samtools can be found the SAM output will be compressed with GZIP instead (yielding a .sam.gz output file)
- Added new option '--samtools_path' to point Bismark to your Samtools installation, e.g. /home/user/samtools/. Does not need to be specified explicitly if Samtools is in the PATH
- Added new option '--pbat' which is to be used for PBAT-Seq libraries (Post-Bisulfite Adapter Tagging; Kobayashi et al., PLoS Genetics, 2012). This is essentially the exact opposite of alignments in 'directional' mode, as it will only launch two alignment threads to the CTOT and CTOB strands instead of the normal OT and OB ones. The option '--pbat' works currently only for single-end and paired-end FastQ files for use with Bowtie1 and uncompressed temporary files only (there are no plans to extend this to other alignment modes at present)
- Methylation extractor: The methylation extractor does now also read BAM files, however this requires a working copy of Samtools. The new option '--samtools_path' may point the methylation extractor to your Samtools installation, e.g. /home/user/samtools/. This does not need to be specified explicitly if Samtools is in the PATH
- Added new option '--gzip' to write out the primary methylation extractor files (CpG_OT_..., CpG_OB_... etc) in a GZIP compressed form to save disk space. This option does not work on bedGraph and genome-wide cytosine reports as they are 'tiny' anyway
- The methylation extractor does now treat InDel free reads differently than before which leads to a ~60% increase in extraction speed for ungapped alignments in SAM format!
- When sorting methylation calls for the bedGraph step, the methylation extractor does now use the output directory to store temporary sort files instead of the default /tmp/ directory
- Deduplication script: The deduplication script does now also read BAM files, however this requires a working copy of Samtools. The new option '--samtools_path' may point the script to your Samtools installation, e.g. /home/user/samtools/. This does not need to be specified explicitly if Samtools is in the PATH
- The deduplication script also received the new option '--bam' to write out deduplicated files directly in BAM format. If no installation of Samtools can be found the SAM output will be compressed with GZIP instead (yielding a .sam.gz output file)
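The relationship between library type and the strands that are aligned against (as described for '--pbat' above, and for the directional and non-directional modes elsewhere in this changelog) can be summarised in a small lookup table. This Python sketch is purely illustrative; the mode keys and the function name are hypothetical labels rather than Bismark identifiers.

```python
# Illustrative sketch only: a lookup of which bisulfite strands each library
# type is aligned against, according to the changelog descriptions.

STRANDS_BY_MODE = {
    "directional": ("OT", "OB"),                      # default
    "pbat": ("CTOT", "CTOB"),                         # the exact opposite
    "non_directional": ("OT", "OB", "CTOT", "CTOB"),  # all four strands
}


def strands_to_align(mode="directional"):
    """Return the strands that alignment instances would be launched for."""
    try:
        return STRANDS_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown library type: {mode!r}") from None


for mode in STRANDS_BY_MODE:
    print(mode, len(strands_to_align(mode)), strands_to_align(mode))
```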
- 03-01-13: Version 0.7.8 released
-
- Bismark: Added new option '--non_bs_mm' which prints an extra column at the end of SAM files showing the number of non-bisulfite mismatches of a read. This option is not available in '--vanilla' format. Format for single-end reads: "XA:Z:mismatches". Format for paired-end reads: read 1: "XA:Z:mismatches", read 2: "XB:Z:mismatches"
- Bismark: The mapping report file names were changed to _bismark_(SE/PE)_report.txt (Bowtie 1) or bt2_bismark_(SE/PE)_report.txt (Bowtie 2) to keep it more uniform
- Methylation extractor: The input file(s) may now be specified with a file path which abrogates the need to be in the same directory as the input file(s) when calling the methylation extractor
- Methylation extractor: Added new function '--buffer_size' to increase the physical memory used for sorting the output by chromosomal positions (only needed for bedGraph output)
- Methylation extractor: Reference sequence files containing pipe ('|') characters were found to crash the methylation extractor as the chromosome name was used for filenames. These characters are now replaced with underscores when the reads are sorted during the bedGraph step
- Updated the Bismark User Guide with sections for the bedGraph and genome-wide methylation report outputs, and Appendix IV is now showing alignment stats for the test data
- 02-10-12: Version 0.7.7 released
-
- When reading in the genome file, Bismark does now automatically remove \r line ending characters as well. This sometimes caused problems when genome files had been edited on Windows machines.
- Added support for the Bowtie 2 options '--rdg int1,int2' and '--rfg int1,int2' to adjust the gap open and extension penalties for both read and reference sequence. This might be useful in very specialised circumstances (e.g. when handling PacBio data...)
- The methylation extractor received a fairly extensive overhaul:
- Renamed methylation_extractor to bismark_methylation_extractor
- Added new function '-o/--output' to specify an output directory. This became necessary for integration into Galaxy
- Added new function '--no_header' to suppress the Bismark version header in the output files if plain alignment data is more desirable
- Added option '--bedGraph' to produce a bedGraph output file once the methylation extraction has finished; this reports the genomic location of a cytosine and its methylation state (in %). By default, only cytosines in CpG context will be sorted/reported
- Implemented option '--cutoff threshold' to set the minimum number of times a methylation state has to be seen for that nucleotide before its methylation percentage is reported
- Implemented option '--counts' which adds two additional columns to the bedGraph output file to enable further calculations:
Column 5: count of methylated calls per position
Column 6: count of unmethylated calls per position
- Implemented option '--CX_context' so that the sorted bedGraph output file contains information on every single cytosine that was covered in the experiment irrespective of its sequence context
- Added option '--cytosine_report' which produces a genome-wide methylation report for all cytosines. By default, the output uses 1-based chromosome coordinates and reports CpG context only. The output considers all Cs on both forward and reverse strands and reports their position, strand, trinucleotide context and methylation state
- Option '--CX_context' applies to the cytosine report as well. The output file will contain information on every single cytosine in the genome irrespective of its context. This applies to both forward and reverse strands
- Implemented option '--zero_based' to use zero-based coordinates, as used in e.g. BED files, instead of 1-based coordinates
- Implemented option '--genome_folder PATH' to be used to extract sequences from. Accepted formats are FastA files ending with '.fa' or '.fasta'
- Added an option '--split_by_chromosome' which writes the cytosine report output to individual chromosome files instead of to one single very large file
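The interplay of '--cutoff' and '--counts' described above can be sketched in Python as follows. The sketch interprets the cutoff as the minimum total number of calls (methylated plus unmethylated) at a position, which is an assumption, and it uses a 0-based start with a 1-based end for the interval. The function name and the input format are hypothetical.

```python
from collections import Counter

# Illustrative sketch only: not the methylation extractor's actual code.

def summarise_calls(calls, cutoff=1, counts=False):
    """calls: iterable of (chrom, pos_1based, is_methylated) tuples.

    Returns rows of (chrom, start, end, % methylation), optionally followed by
    column 5 (methylated count) and column 6 (unmethylated count).
    """
    meth, unmeth = Counter(), Counter()
    for chrom, pos, is_meth in calls:
        (meth if is_meth else unmeth)[(chrom, pos)] += 1

    rows = []
    for chrom, pos in sorted(set(meth) | set(unmeth)):
        m, u = meth[(chrom, pos)], unmeth[(chrom, pos)]
        if m + u < cutoff:          # assumed meaning of the cutoff
            continue
        row = (chrom, pos - 1, pos, 100.0 * m / (m + u))
        rows.append(row + (m, u) if counts else row)
    return rows


example = [("chr1", 10, True)] * 3 + [("chr1", 10, False), ("chr1", 20, True)]
for row in summarise_calls(example, cutoff=2, counts=True):
    print(*row, sep="\t")
```

With a cutoff of 2, the position covered by a single call (chr1:20) is dropped, while chr1:10 is reported as 75% methylated with counts 3 and 1.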
- 23-08-12: Update to genome_methylation2bedGraph script
-
- Added an option '--split_by_chromosome' to enable sorting of very large files. The methylation extractor output is first written into temporary files chromosome by chromosome. These temporary files can then be sorted by position and are deleted afterwards
- Added an option '--counts' which adds 2 more columns to the output file to enable further calculations (technically no longer in bedGraph format then...)
Column 5: count of methylated calls per position
Column 6: count of unmethylated calls per position
- 31-07-12: Version 0.7.6 released
-
- Reworked the way in which SAM files (both single and paired-end) are handled in the methylation extractor so that reads containing InDels, which may be generated by Bismark using Bowtie 2, are now handled as intended. Bismark users employing Bowtie 2 for alignments are strongly encouraged to upgrade to this version
- Changed the way in which the methylation extractor identifies the read and genome conversion flags in SAM output. This might become relevant if the Bismark SAM mapping output was compressed/decompressed with CRAM or Goby at some point, since these tools may change the order of optional tags in a SAM entry. Thanks to Z. Zeno for pointing this out and contributing a patch
- 16-07-12: Version 0.7.5 released
-
- Trailing read ID segment numbers (e.g. /1,/2 or /3) are now removed internally for Bowtie 2 alignments in paired-end mode as this might have caused no reads to align at all if the segment number was not 1 or 2. As of Bowtie 2 version 2.0.0-beta7 this behavior has been disabled for unpaired reads
- The Bowtie 2 option -M is now deprecated (as of Bowtie 2 version 2.0.0-beta7). What used to be called -M mode is still the default mode, but adjusting the -M setting is deprecated. The options -D and -R should be used to adjust the effort expended to find valid alignments
- Changed the default seed mismatch parameter (controlled by -n) to 1 (down from 2). This increases alignment speed noticeably and typically produces very similar results for good quality read data
- Fixed a bug where the chromosomal sequence could not be extracted for very short genomic sequences for alignments with Bowtie 2
- The methylation extractor and the Bismark alignment output deduplication script do now read both raw and gzipped (.gz) Bismark mapping files
- Manual updated accordingly
- 26-04-12: Version 0.7.4 released
-
- Introduced a new option '--temp_dir <dir>' specifying a directory into which the C-to-T or G-to-A transcribed temporary files can be written instead of using the same folder that contains the input files. This might become useful for implementation into Galaxy.
- The input files to be aligned may now contain path information, e.g. /home/user/file.fq or ../temp/file.fq, and one no longer has to call Bismark from within the directory containing the input files.
- 05-04-12: Version 0.7.3 released
-
- Corrected a bug for the TLEN field in paired-end SAM output. This value was occasionally calculated incorrectly if both reads were overlapping almost entirely with a difference of only a single bp between the end of one read and the start of the second read. This did not affect the output of the methylation extractor but merely the display of the read alignment itself
- Removed a potential source of crashes with gzipped input files and the option -u/--qupto
- methylation_extractor: Corrected a potential flaw for the 'remove overlap' option for paired-end alignments in --vanilla mode when the first read aligned in a reverse orientation
- methylation_extractor: file endings of all files generated by the methylation extractor will be only a single '.txt' if the file was called .txt before
- 14-03-12: Version 0.7.2 released
-
- methylation_extractor: changed the file endings of all files generated by the methylation extractor to '.txt'; this is to avoid confusing these files with SAM formatted Bismark output files
- deduplicate_Bismark_alignment_output.pl: Fixed a bug for paired-end deduplication mode in SAM format, which only printed the second read alignment of a pair to the deduplicated file
- trim_galore: Updated so that non-RRBS FastQ files are adapter and quality trimmed in a single pass
- trim_galore: added an option --fastqc_args "..." to pass extra arguments to FastQC for easier integration into pipelines
- trim_galore: Added some more documentation and trim_galore can now be downloaded separately here
- validate_paired_end_files: Updated so that one can optionally write out unpaired single-end reads should a read-pair fail to be considered a valid paired-end read pair
- 29-02-12: Version 0.7.1 released
-
- Adjusted Bismark so that white spaces or tab characters in the read IDs get replaced with underscores on the fly. This was necessary because some ID checks would fail as Bowtie2 truncates read IDs if it encounters spaces in the read ID (causing errors with the latest RTA version), whereas Bowtie 1 only truncates read IDs if 'tab' characters were found. More information about this can be found in the RELEASE_NOTES.
- An RRBS QC pack is now available for download which contains a brief guide to RRBS, the Cutadapt-wrapping script trim_galore as well as a validate_paired_end_files script to remove read pairs for which at least one of the reads has been trimmed to a too short read length due to quality and/or adapter trimming.
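The read ID clean-up described for this release (spaces and tab characters replaced with underscores) amounts to a one-line substitution. The Python sketch below shows the idea; the function name is hypothetical, and it assumes each whitespace character is replaced individually, which the changelog does not spell out.

```python
import re

# Illustrative sketch only: not Bismark's actual code.

def sanitise_read_id(read_id):
    """Replace every space or tab in a read ID with an underscore."""
    return re.sub(r"[ \t]", "_", read_id)


print(sanitise_read_id("@HWI-ST:1:1101 1:N:0:ACGT"))
```

After this substitution an aligner that truncates IDs at the first space or tab has nothing to cut off, so the IDs in the alignment output still match the IDs in the input file.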
- 24-02-12: Version 0.7.0 released
-
- Changed Bismark's behavior for "--directional" mode (default) to run only 2 parallel instances of Bowtie 1/2 to the original top (OT) and bottom (OB) strands, instead of 4 instances to all possible bisulfite strands. This change might result in somewhat faster alignment speed and mapping efficiency. It is still possible to run the 4-alignment strand mode for any combination of input file(s) and choice of aligner by specifying --non_directional.
- Changed the --score_min default function for Bowtie 2 alignments to a more stringent setting of "L,0,-0.2" instead of using the Bowtie2 default function (which was "L,0,-0.6")
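In Bowtie 2, a function specified as "L,a,b" is linear in the read length, i.e. the minimum acceptable alignment score is a + b × read length. The following Python sketch (with a hypothetical function name) compares the new Bismark default with the Bowtie 2 default quoted above.

```python
# Illustrative sketch only: evaluates a linear "L,<intercept>,<slope>" function.

def min_alignment_score(read_length, intercept=0.0, slope=-0.2):
    """Minimum alignment score for a read under a linear score function."""
    return intercept + slope * read_length


for length in (50, 100, 150):
    strict = min_alignment_score(length)             # "L,0,-0.2"
    lax = min_alignment_score(length, slope=-0.6)    # "L,0,-0.6"
    print(f"{length} bp: L,0,-0.2 -> {strict:.1f}   L,0,-0.6 -> {lax:.1f}")
```

For a 100 bp read the threshold moves from about -60 to about -20, so an alignment may accumulate only a third of the penalties it could previously.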
- 06-02-12: Version 0.6.4 released
-
- Adjusted the options -u and -s so that only the non-skipped part of the input file will be transcribed and analysed. This allows splitting up very large files into smaller chunks to allow parallel processing, e.g -s 10000000 -u 20000000 would analyse sequences 10000001 to 20000000. The alignment report will be based on this reduced number of reads analysed
- In paired-end mode, the options --unmapped and --ambiguous do now output unaligned or multiply aligned reads, respectively, to their correct output files as intended
- Sequences in FastA format do now receive Phred score qualities of 40 throughout (ASCII 'I') to prevent the SAM to BAM conversion in SAMtools from failing
- If a genomic sequence could not be extracted it will now also be counted and reported for use with Bowtie 1
- Suppressed debugging warning messages that were printed in error for Bowtie2 alignments (single-end mode only)
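The combined effect of -s (skip) and -u (up to) described in this release can be mimicked with itertools.islice. In this Python sketch the function name is hypothetical and reads are represented by plain integers for brevity.

```python
from itertools import islice

# Illustrative sketch only: not Bismark's actual code.

def select_reads(reads, skip=0, upto=None):
    """Yield reads skip+1 .. upto (1-based), i.e. only the non-skipped chunk."""
    return islice(reads, skip, upto)


reads = range(1, 11)                          # stand-in for reads 1..10
print(list(select_reads(reads, skip=3, upto=6)))
```

Scaled up, skip=10000000 and upto=20000000 would select sequences 10000001 to 20000000, matching the example in the changelog entry.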
- 04-01-12: Version 0.6.3 released
-
- The methylation extractor does now also work with Bismark SAM output files
- Fixed a bug caused when a read was called 0 (zero)
- Changed the XX:Z mismatch field in the SAM output to display mismatching nucleotides of the reference sequence (instead of the read sequence ones)
- 15-12-11: Version 0.6.beta2 released
-
- Added a parallelization option for Bowtie 2 alignments ('-p'). Since it makes use of the option '--reorder' this option requires a Bowtie 2 version of 2.0.0-beta5 or higher. This option is still experimental and is only recommended for use on very powerful hardware setups (i.e. lots of cores and memory).
- 08-12-11: Version 0.6.beta1 released
-
- Bismark does now also support gapped alignments with Bowtie 2 (when specifying the option '--bowtie2')
- The bismark_genome_preparation does now also generate Bowtie 2 bisulfite indexes
- The Bismark default output has been changed to SAM format (for both Bowtie 1 and Bowtie 2)
- The 'old' output is still available via the option '--vanilla'
- Slightly increased the alignment efficiencies for Bowtie 1 alignments
- Changed the default mapping behavior to the former option '--directional' ('--non-directional' re-enables four-strand output)
- Changed the default maximum insert size parameter (-X/--maxins) for paired-end alignments to 500bp (up from 250bp)
- The methylation extractor works currently only on the 'vanilla' Bismark output
- The bismark2SAM script will now reverse qualities and methylation calls when reads were reverse-complemented
- 17-10-11: Version 0.5.4 released
-
- Bismark will now accept input files in either normal, uncompressed or gzipped format
- Added the option -o/--output_dir <dir> to Bismark which lets you specify the folder for all Bismark output files instead of writing into the same folder as the input file(s). If the output directory does not exist already it will be created first
- The path to the genome folder can now be absolute or relative (e.g. ../genomes/mouse/)
- Changed the way unmapped or ambiguous reads are reported so that one output file (and/or ambiguous read file) is generated per input file. Their name will be derived from the input file name. For paired-end samples, the unmapped or ambiguous filenames can be discriminated by _1 and _2 in their file names
- Added the number of sequences analysed in total to the paired-end report file (was only printed on screen previously)
- Fixed a bug for the FastQ output for ambiguous reads where quality scores were not followed by a new line
- 20-09-11: Update to bismark2SAM script
-
- The bismark2SAM script does now also report the methylation calls in a custom field (XM) for easier downstream processing. In addition, the second read of a paired-end alignment has a 2 at the end of the ID field to reflect the paired-end nature. Thanks to T. McBryan for implementing these new features.
- 13-09-11: Version 0.5.3 released
-
- Increased the 'chunkmbs' default value to 512 MB (up from 256 MB)
- Corrected a mix-up of the strand names of the complementary strands in the alignment report for single-end alignments (see release notes)
- Fixed a bug in the genome_methylation_bismark2bedGraph script that was introduced during the 1-based (Bismark) to 0-based (bedGraph) coordinate adaptation in June 2011. Thanks to M.A. Bentley for his contributions to the new version.
- Improved the bismark2SAM script to more accurately describe the origin of a bisulfite strand in the bitwise FLAG field. Thanks to E. Vidal for his contributions to the new version.
- 16-08-11: Version 0.5.2 released
-
- Increased the 'chunkmbs' default value to 256 MB (up from 64 MB)
- Bismark will now accept input files in both comma and space separated format
- Fixed a bug in the methylation extractor which resulted in offset positions for reverse reads when the option '--ignore' was used (single-end only)
- Included a check (and warning) whether the read IDs in the input files contain tab characters, as this will cause Bowtie to truncate the reads and result in no alignments
- 16-06-11: Version 0.5.1 released
-
- The genome folder for the bismark_genome_preparation can now be specified either as absolute or relative path
- Fixed a bug where a newline character was missing after the quality values in the unmapped reads FastQ output
- Fixed a bug which prevented paired-end alignments in FastA format
- Input files for the methylation extractor can now also have a relative path
- 21-04-11: Version 0.5.0 released
-
- Bismark alignments should now also support FastQ files produced by Casava v1.8 which will be available soon
- The Bismark output will now have an additional column (2 extra columns for paired-end data) with the basecall qualities (in Phred33/ Sanger format; left blank for FastA data)
- A bug was fixed for the reporting of paired-end alignments whereby alignments to the CTOT strand were assigned to CTOB strand and vice versa
- 10-02-11: Version 0.4.1 released
-
- Bisulfite genomes are now written into a multi-FastA file by default. This allows indexing of new genomes with tens of thousands of contigs or scaffolds
- The internal reporting of paired-end alignments was changed, so that sequences which produce two identical alignments are preferentially assigned to the original strands as intended
- 04-02-11: Version 0.4.0 released
-
- The option --directional is now also available for paired-end libraries. This will ignore alignments to strands which should theoretically not be sequenced
- Fixed a strand confusion in the alignments summary report for paired-end alignments (this only affected the report but not any alignments as such)
- 26-01-11: Version 0.3.0 released
-
- The Bismark User Guide replaces the previous documentation (INSTALL.txt and README.txt). It is easy to follow and contains many more details about BS-Seq and Bismark
- A BS-Seq test dataset is now available for download. It contains 10K sequences (human, shotgun) in FastQ format, taken from the SRR020138 data set (Lister et al, 2009).
- Both bismark and bismark_genome_preparation will now recognise reference genome sequences with either .fa or .fasta file extensions
- 18-01-11: Version 0.2.6 released
-
- Fixed a bug which might have been caused by specifying very lax alignment parameters (allowing 10+ non-BS mismatches)
- 22-12-10: Version 0.2.5 released
-
- Added the option '--un <filename>' to write out unaligned reads to <filename>
- Added the option '--ambiguous <filename>' to write out ambiguously aligned reads to <filename>
- 18-11-10: Version 0.2.4 released
-
- Added the option '-I/--minins <int>' to modify the minimum valid insert size for paired-end alignments
- Added the option '-X/--maxins <int>' to modify the maximum valid insert size for paired-end alignments
- Changed the remove_tree command in the genome preparation script to rm_tree to be compatible with older versions of Perl (thanks to S. Cooper for spotting this)
- 04-11-10: Version 0.2.3 released
-
- Added the option '--directional' to Bismark to only report alignments to the original strands if the library was generated in a strand-specific manner
- The alignment option '--best' will now be selected by default to ensure the best possible alignments
- All Bismark output files will now end in .txt so they can be viewed or imported more easily
- Changed the reporting format slightly to increase readability
- 13-09-10: Version 0.2.2 released
- Fixed a bug whereby the methylation positions of certain reverse mapped reads were offset by a few bp (in the methylation extractor output)
- 08-09-10: Version 0.2.1 released
- The Bismark aligner will now handle Multi-Fasta-Files (MFA) as intended.
- 07-09-10: Version 0.2.0 released
- Non-CpG context methylation will now be subdivided into CHG and CHH context
- Added the option '--chunkmbs <int>' to counteract Bowtie best-first memory chunk exhaustion warnings in --best and paired-end alignment mode
- Added the option '--quiet' so that Bowtie warnings can be suppressed if desired
- FastA files no longer need the file extension '.fa' in order to work
- Bismark will no longer tolerate non-unique chromosome names when reading the genome into memory
- Fixed an issue with paired-end report files
- The methylation extractor will by default produce individual output files for CpG, CHG and CHH context
- The methylation extractor can optionally merge CHG and CHH context into 'non-CpG' context if desired
- The methylation extractor will ensure that its version matches the Bismark version used to generate the Bismark mapping results file
- 09-08-10: Version 0.1.5 released
- Fixed a bug whereby specifying '-n 0' as an alignment parameter would not work correctly
- 06-08-10: Version 0.1.4 released
- Bismark will no longer stop during the methylation call process when it encounters ambiguity bases in the reference genome
- Fixed a strand-specificity mix-up in the single-end methylation extractor output
- 03-08-10: Version 0.1.3 released
- The genome indexer will now properly (and recursively) remove any pre-existing bisulfite genome directory before creating a new one
- The genome indexer will now convert DNA ambiguity codes (anything other than C, A, T or G) into Ns
- The genome indexer now also accepts FastA files with multiple sequence entries
- Fixed a strand-specificity mix-up in the single-end methylation extractor output
- The option to ignore bases in the methylation extractor now correctly alters the position of the remaining methylation calls
- Added an option to the methylation extractor to score overlapping methylation calls for paired-end alignments only once
- 17-06-10: Version 0.1.2 released
- Both single-end and paired-end alignments have a new and final output format (see README.txt for more details)
- Bismark and the Methylation Extractor will include their version info in the first line of the output file
- Fixed a bug with the chromosome name resolution for paired-end alignments
- Reads aligning to the very edges of chromosomes previously produced several error messages when trying to extract one additional bp to determine whether Cs are in CpG context. These reads will now be excluded.
- The --help option of both Bismark and the Methylation Extractor will now give information about their output file formats
- 15-06-10: Version 0.1.1 released
- Bismark now also handles genome FastA files in formats other than the Ensembl format
- Fixed a runtime bug with first alignments
- 14-06-10: Version 0.1 released
- Initial release
- All basic functions working