Software
This page serves as an index for the applications written and distributed by the Yandell, Marth, and Quinlan labs. Each item may include links to documentation, code, and publications.
VARPRISM (VARiant PRIoritization SuM)
A software package that identifies genes with a statistical excess of damaging de novo mutations among individuals with a genetic disease. VARPRISM incorporates functional variant prediction information (the VAAST CASM score) to improve the statistical power of risk gene mapping and controls for local mutation rate heterogeneity. The beta version of VARPRISM is currently available for download.
Annotates a VCF with any number of sorted and tabixed input BED, BAM, and VCF files in parallel. It does this by finding overlaps as it streams over the data and applying user-defined operations on the overlapping annotations.
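The streaming-overlap idea described above can be illustrated with a minimal sketch. This is not the tool's actual implementation or interface: the function name `annotate_stream`, the tuple layouts, and the single-chromosome, half-open coordinate convention are all assumptions made for illustration, and the sketch runs in a single thread rather than in parallel.

```python
def annotate_stream(variants, annotations, op):
    """Hypothetical sketch: annotate sorted variants with sorted annotations.

    variants:    iterable of (start, end, name), sorted by start
    annotations: iterable of (start, end, value), sorted by start
    op:          user-defined reducer applied to the overlapping values
    All intervals are half-open [start, end) on a single chromosome.
    """
    ann_iter = iter(annotations)
    pending = next(ann_iter, None)  # next annotation not yet loaded
    active = []                     # annotations that may still overlap

    for v_start, v_end, name in variants:
        # Load every annotation that starts before this variant ends.
        while pending is not None and pending[0] < v_end:
            active.append(pending)
            pending = next(ann_iter, None)

        # Drop annotations that end at or before this variant's start;
        # later variants start no earlier, so these can never overlap again.
        active = [a for a in active if a[1] > v_start]

        # Collect the values of annotations that truly overlap the variant.
        values = [a[2] for a in active if a[0] < v_end]
        yield name, (op(values) if values else None)


variants = [(100, 101, "v1"), (250, 251, "v2"), (400, 401, "v3")]
annotations = [(50, 150, 0.1), (90, 120, 0.3), (240, 260, 0.7)]
result = list(annotate_stream(variants, annotations, max))
print(result)
```

Because both inputs are sorted, each annotation is read once and discarded as soon as it can no longer overlap, so memory stays proportional to the number of currently overlapping intervals. Swapping `max` for another reducer (for example `min` or `len`) changes the operation applied to the overlapping annotations.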
Taxonomer
Taxonomer is an ultrafast web tool for comprehensive metagenomics data analysis and interactive results visualization. Taxonomer is unique in providing integrated nucleotide and protein-based classification and simultaneous host mRNA transcript profiling.
A structural variant (SV) caller that integrates several sources of mapping information to identify SVs. WHAM classifies SVs using a flexible and extendable machine-learning algorithm (random forest).
A command line tool and a C API for storing and querying large-scale genotype data sets like those produced by 1000 Genomes, the UK100K, and forthcoming datasets involving millions of genomes.
An open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement.
Compares familial relationships and sexes as reported in a PED file with those inferred from a VCF.
A pipeline designed to make the annotation of novel plant genomes tractable for small groups with limited bioinformatics experience and resources, and faster and more transparent for large groups with more experience and resources.
iobio uses immediate visual feedback to make understanding complex genomic datasets more intuitive, and analysis more interactive.
A flexible toolkit for exploring datasets generated by nanopore sequencing devices from MinION for the purposes of quality control and downstream analysis.
Tangram
A C/C++ command line toolbox for structural variation (SV) detection.
A tool and pipeline management system that can be used to effectively deploy the majority of tools developed in the MarthLab as well as other third-party tools.
RUFUS
A new approach to variant detection that does not rely on mapping or whole genome assembly methods.
These utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks.
SubcloneSeeker
A computational framework for reconstructing tumor subclone structures.
A probabilistic framework that we have developed to integrate multiple structural variation signals such as discordant paired-end alignments and split-read alignments.
A disease-gene identification tool designed for high-throughput sequence data in pedigrees.
Integrates phenotype, gene function, and disease information with personal genomic data for improved power to identify disease-causing alleles.
MOSAIK
A stable, sensitive, and open-source program for mapping second- and third-generation sequencing reads to a reference genome.
A powerful framework for exploring genetic variation in the context of the wealth of existing genome annotations that are available for the human genome.
The application of population genomics to non-model organisms is greatly facilitated by the low cost of next generation sequencing (NGS).
Python-based image analysis software designed for the automated analysis of images of the animal S.
Probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences.
BamTools
A Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
A CGL-based program that integrates RepeatMasker with BLASTX to provide a comprehensive means of identifying repetitive elements.
Provides an informatics infrastructure for a laboratory, department, or research institute engaged in the large-scale analysis of genomes and their annotations.
Freebayes
A Bayesian genetic variant detector designed to find small polymorphisms.
SCISSORS
A split-read aligner that maps orphaned read mates (i.e. where one end-mate is aligned with high mapping quality, but the other mate is unmapped), as well as re-maps severely clipped reads (reads mapped with many unaligned or “clipped-off” bases).