SNPsplit - allele-specific alignment sorting
Function | A tool to determine allele-specific alignments from high-throughput sequencing experiments that have been aligned to N-masked genomes |
---|---|
Language | Perl |
Requirements | A functional version of Samtools is required. |
Code Maturity | Stable. SNPsplit has been successfully used for different applications and genomes. |
Code Released | Yes, under GNU GPL v3 or later. |
Initial Contact | Felix Krueger |
SNPsplit is an allele-specific alignment sorter that reads alignment files in SAM/BAM format and determines the allelic origin of reads covering known SNP positions. For this to work, the library must have been aligned to a genome in which all SNP positions were masked by the ambiguity base 'N', using an aligner that can handle a reference genome containing ambiguous nucleobases, such as Bowtie 2 or TopHat. In addition, a list of all known SNP positions between the two genomes must be provided using the option --snp_file.
The determination of overlaps correctly handles the CIGAR operations M (match), D (deletion in the read), I (insertion in the read) and N (skipped regions, used for splice mapping by TopHat). Other CIGAR operations are currently not supported.
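As a rough illustration of CIGAR-aware overlap detection, the Python sketch below walks a CIGAR string and returns the read base aligned to a given SNP position. It is not SNPsplit's own code (SNPsplit is written in Perl). The function name `base_at_reference_position`, its arguments and the use of 1-based coordinates are assumptions made for this example only.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at_reference_position(read_start, cigar, sequence, snp_pos):
    """Return the read base aligned to snp_pos, or None if no base covers it.

    read_start and snp_pos are 1-based reference coordinates (an assumption
    of this sketch). Only M, I, D and N are handled; any other operation
    raises ValueError, mirroring the "not supported" note in the text.
    """
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")

    ref_pos = read_start   # reference coordinate of the next aligned base
    read_idx = 0           # 0-based index into the read sequence

    for length_text, op in CIGAR_ELEMENT.findall(cigar):
        length = int(length_text)
        if op == "M":
            # Consumes both reference and read: the SNP may fall in here.
            if ref_pos <= snp_pos < ref_pos + length:
                return sequence[read_idx + (snp_pos - ref_pos)]
            ref_pos += length
            read_idx += length
        elif op == "I":
            # Insertion in the read: consumes read bases only.
            read_idx += length
        elif op in ("D", "N"):
            # Deletion or skipped region: consumes reference only, so a SNP
            # inside it is not covered by any read base.
            if ref_pos <= snp_pos < ref_pos + length:
                return None
            ref_pos += length
        else:
            raise ValueError(f"unsupported CIGAR operation: {op}")
    return None
```

For a read starting at position 100 with CIGAR `5M2I3M`, the two inserted bases shift the read index but not the reference coordinate, so position 105 maps to the eighth base of the read.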
- Supports single-end and paired-end BAM/SAM alignment files
- In paired-end mode, paired and singleton alignments may be merged or treated separately
- Supports BAM files generated by Bowtie 2
- Supports BAM files generated by TopHat
- Supports Bisulfite-Seq BAM files generated by Bismark
- Supports Hi-C BAM files generated by HiCUP
- Supports BAM files generated by STAR
- Supports BAM files generated by HISAT2
- Individual output files for genome 1-specific, genome 2-specific and unassigned alignments
- Optional output file for conflicting alignments
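The four output categories listed above (genome 1-specific, genome 2-specific, unassigned, conflicting) can be pictured with the following Python sketch. The decision rule shown here is an assumption for illustration, not a description of SNPsplit's exact logic, and the function name `classify_read` and its input format are hypothetical.

```python
def classify_read(observations):
    """Assign a read to an output category from its SNP observations.

    observations is a list of (read_base, genome1_base, genome2_base)
    tuples, one per known SNP position covered by the read. The rule
    below is an assumed simplification:
      - only genome 1 alleles seen      -> "genome1"
      - only genome 2 alleles seen      -> "genome2"
      - alleles of both genomes seen    -> "conflicting"
      - no SNP covered, or no base matches either allele -> "unassigned"
    """
    genome1_hits = sum(1 for base, g1, g2 in observations if base == g1)
    genome2_hits = sum(1 for base, g1, g2 in observations if base == g2)

    if genome1_hits and genome2_hits:
        return "conflicting"
    if genome1_hits:
        return "genome1"
    if genome2_hits:
        return "genome2"
    return "unassigned"


# Hypothetical usage: one read covering two SNPs, both matching genome 2.
category = classify_read([("G", "A", "G"), ("T", "C", "T")])
```

In this sketch a read that covers no SNP at all ends up in the same category as a read whose bases match neither allele. A real implementation may want to distinguish these cases in its report.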
SNPsplit is also available on GitHub, where you can find the latest development version: SNPsplit (GitHub)
For more information on the SNPsplit workflow, see the documentation: SNPsplit User Guide (pdf)
Example paired-end SNPsplit report: SNPsplit PE report (pdf)
Example SNPsplit report for a BS-Seq experiment: SNPsplit BS-Seq report (pdf)
Example SNPsplit report for a Hi-C experiment: SNPsplit Hi-C report (pdf)
Changelog
- 29-03-2017: Version 0.3.2 released (the Release Notes are hosted on GitHub)
- SNPsplit_genome_preparation: Relaxed SNP filtering criteria to now support multiple homozygous variants for the same position in the genome. This step should increase the number of usable SNPs slightly (but noticeably)
- SNPsplit_genome_preparation: Changed the SNP filtering for --dual_hybrid mode to only include positions where both strains had a high confidence call (irrespective of the nature of the call). This step should greatly reduce the number of false positive allele calls
- SNPsplit_genome_preparation: Added a check that produces a [FATAL ERROR] if the stored chromosome names are not the same as the ones in the VCF file. This is a rather common mistake when people use the Ensembl VCF file but get the genome from UCSC, and should change if and when Ensembl adopts the same standard used by NCBI/UCSC
- SNPsplit_genome_preparation: Added a new version of the genome preparation script that can deal with the latest version of the VCF file for the old NCBIM37 genome build ("mgp.v2.snps.annot.reformat.vcf.gz"). The script is called "SNPsplit_genome_preparation_v2VCF" and may be found in the folder "outdated_VCF_versions" on Github. Please note that this does not include the changes to the current version of the genome preparation (see above)
- SNPsplit: Changed the samtools command throughout SNPsplit to now correctly use the path supplied by the user with --samtools_path
- SNPsplit: Option --genome_build [NAME] should now work as intended (used to be --build only)
- Updated the documentation to reflect the latest changes in SNP filtering
- 18-07-2016: Version 0.3.1 released
- Manual: Added a fairly detailed section about how SNPs are filtered and processed during the SNPsplit genome preparation so it can be adapted more easily for different VCF files
- 18-05-2016: Version 0.3.0 released
- SNPsplit: Changed sorting command for BAM files to also work with Samtools versions 1.3+
- SNPsplit: The sorting report for single-end files is now also written to the report files
- SNPsplit: Added the number of SNPs used for allele discrimination to the report file to make it easier to spot errors
- SNPsplit: Now removing CR and LF line endings when reading in the SNP file. For SNP annotation files copied from a Windows machine, no allele-specific reads were found for genome 2 at all, which was due to the invisible \r character being read as part of the SNP call
- SNPsplit_genome_preparation: Added new functionality to construct single- or dual-hybrid genomes starting from VCF files obtainable from the Mouse Genomes Project. In brief, SNPsplit_genome_preparation is designed to read in variant call files from the Mouse Genomes Project (e.g. this latest file: ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v5.merged.snps_all.dbSNP142.vcf.gz) and generate new genome versions where the strain SNPs are either incorporated into the new genome (full sequence) or masked by the ambiguity nucleobase 'N' (N-masking)
- SNPsplit_genome_preparation: The genome preparation may be run for a single hybrid strain, e.g. Black6/Strain of Interest, or a dual hybrid strain, e.g. 129S1/Cast
- 19-08-2015: Version 0.2.0 released
- Added support for allele-splitting for Bisulfite-Seq files with the new option '--bisulfite'. This assumes Bisulfite-Seq data processed with Bismark as input. In paired-end mode ('--paired'), Read 1 and Read 2 of a pair are expected to follow each other in consecutive lines. SNPsplit will run a quick check at the start of a run to see if the file provided appears to be a Bismark file, and set the flags '--bisulfite' and/or '--paired' automatically. In addition it will perform a quick check to see if a paired-end file appears to have been positionally sorted, and if not will set the flag '--no_sort'
- Reads having the unmapped FLAG set in the BAM/SAM file (0x4 bit) are now skipped and excluded from the tagging and sorting process
- Improved file renaming settings when the input file was in SAM format (the input files are no longer deleted). Also changed renaming settings to only change .bam at the end of the file name
- The name of the SNP annotation file is now displayed on screen and written to the report files
- 04-11-2014: Version 0.1.3 released
- Initial release
- All basic functions working
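The Version 0.3.0 entry above mentions two ways of building a new genome version from strain SNPs: incorporating them (full sequence) or masking them with 'N' (N-masking). The Python sketch below shows the difference on a toy sequence. It is an illustration only and not the SNPsplit_genome_preparation code; the function name `apply_snps`, the `mode` values and the SNP dictionary layout are assumptions of this example.

```python
def apply_snps(sequence, snps, mode="n_mask"):
    """Return a copy of sequence with SNP positions masked or replaced.

    sequence: reference sequence of one chromosome as a string.
    snps:     dict mapping a 1-based position to (ref_base, alt_base);
              this layout is hypothetical, chosen for the example.
    mode:     "n_mask" writes 'N' at every SNP position,
              "full_sequence" writes the strain (alt) base instead.
    """
    if mode not in ("n_mask", "full_sequence"):
        raise ValueError(f"unknown mode: {mode}")

    bases = list(sequence)
    for position, (ref_base, alt_base) in snps.items():
        index = position - 1
        # Sanity check: the annotated reference base must match the genome,
        # otherwise the SNP list and the genome are probably out of sync.
        if bases[index].upper() != ref_base.upper():
            raise ValueError(f"reference mismatch at position {position}")
        bases[index] = "N" if mode == "n_mask" else alt_base
    return "".join(bases)


# Hypothetical toy data: two SNPs on an 8 bp sequence.
toy_snps = {2: ("C", "T"), 5: ("A", "G")}
masked = apply_snps("ACGTACGT", toy_snps, mode="n_mask")
incorporated = apply_snps("ACGTACGT", toy_snps, mode="full_sequence")
```

The reference-base check is a design choice of this sketch. It is similar in spirit to the chromosome name check described for Version 0.3.2, in that a mismatch between the annotation and the genome is reported early rather than silently producing a wrong genome.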