Tools

From GersteinInfo

(Difference between revisions)
(Replaced content with 'Please refer to the Resources page.')
 
The Gerstein lab has made it a priority to develop its cutting-edge algorithms into tools in the form of downloadable programs, web servers, and databases. These tools are at the heart of our work in transforming the big data of genomes into knowledge with real medical consequences. Below we highlight some of these tools. For an overview of lab tool publications, click [http://papers.gersteinlab.org/subject/tools here].

=Tools portals=
Below is a list of the tool portals the lab has developed.

===The Morph Server===
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"| [[File:Morph-icon.jpg]] <br> [http://morph2.molmovdb.org/ morph2 server] ||style="width:7%; text-align:center;"|1995-2014||The Morph Server generates a plausible pathway between two conformations of a protein or nucleic acid structure. A large number of statistics and several high-quality movies are output.
|}

===Pseudogene.org===
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"| [[File:pseudogene.png]]<br>Pseudogene.org||style="width:7%; text-align:center;"|1995-2014||Pseudogene.org is a collection of resources related to our efforts to survey eukaryotic genomes for pseudogene sequences, "pseudo-fold" usage, amino-acid composition, and single-nucleotide polymorphisms (SNPs) to help elucidate the relationships between pseudogene families across several organisms.
|}

=Pseudogene Tools=
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://pseudogene.org '''Pseudogene.org''']||style="width:7%; text-align:center;"|2001-2011||A collection of resources related to our efforts to survey eukaryotic genomes for pseudogene sequences, "pseudo-fold" usage, amino-acid composition, and single-nucleotide polymorphisms (SNPs) to help elucidate the relationships between pseudogene families across several organisms.
|}

=Genome Technology Tools=

===Introduction===

To extract knowledge from high-throughput genomic experiments such as RNA-seq and ChIP-chip, the Gerstein lab has developed the following tools. To identify splice sites and gene models from RNA-seq data, we made RSEQtools (ref). RSEQtools also allows researchers to remove sequence information while retaining read-signal information, thus protecting the identity of the subjects. To better understand alternative splicing and exon-skipping events, we made IQSeq (ref). To identify fusion transcripts from paired-end RNA sequencing, we created FusionSeq (ref). To aggregate the distribution of signals in RNA-seq or ChIP-chip signal profiles and to correlate multiple related signal tracks, we made ACT (ref). To distinguish candidate cancer drivers from passenger mutations and inherited polymorphisms, we created FunSeq (ref). Large structural variants, including copy-number variants and unbalanced inversions, are widespread in human genomes; to detect these we made BreakSeq (ref). Collectively, these tools provide knowledge that can inform personalized medicine.

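The aggregation and correlation steps described above can be illustrated with a minimal Python sketch. It assumes a signal track stored as a plain list of per-position values and a list of anchor positions (for example, transcription start sites); the names `aggregate_signal` and `pearson` are hypothetical and are not part of ACT's interface.

```python
from statistics import mean

# Hypothetical helpers illustrating aggregation and correlation; not ACT's API.

def aggregate_signal(track, anchors, flank):
    """Average the signal at each offset in [-flank, +flank] around the anchors.

    track: list of per-position signal values; anchors: list of positions.
    Offsets that fall outside the track are skipped for that anchor."""
    profile = []
    for offset in range(-flank, flank + 1):
        values = [track[a + offset] for a in anchors if 0 <= a + offset < len(track)]
        profile.append(mean(values) if values else 0.0)
    return profile

def pearson(x, y):
    """Pearson correlation between two equal-length signal tracks."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

For example, `aggregate_signal([0, 1, 5, 1, 0, 0, 2, 6, 2, 0], [2, 7], 1)` averages the two peaks centred on positions 2 and 7 into a single profile `[1.5, 5.5, 1.5]`.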
===RNA-seq===
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://act.gersteinlab.org/ '''ACT''']||style="width:7%; text-align:center;"|2011||The Aggregation and Correlation Toolbox (ACT) supports aggregation and correlation analyses of genome tracks.
|-style="height: 100px;"
|style="text-align: center;"|[http://archive.gersteinlab.org/proj/rnaseq/fusionseq/ '''FusionSeq''']||style="text-align:center;"|2010||A computational framework for detecting chimeric transcripts from paired-end RNA-seq experiments. It includes filters to remove spurious candidate fusions caused by artifacts, such as misalignment or random pairing of transcript fragments, and provides a ranked list of fusion-transcript candidates that can be further evaluated experimentally. FusionSeq also contains a module to identify exact sequences at breakpoint junctions.
|-style="height: 100px;"
|style="text-align: center;"|[http://archive.gersteinlab.org/proj/rnaseq/IQSeq/ '''IQSeq''']||style="text-align:center;"|2010||A tool for isoform quantification with RNA-seq data. Given isoform annotations and alignments of RNA-seq reads, it uses an expectation-maximization (EM) algorithm to infer the most probable expression level of each isoform of a gene.
|-style="height: 100px;"
|style="text-align: center;"|[http://archive.gersteinlab.org/proj/rnaseq/rseqtools/ '''RSEQtools''']||style="text-align:center;"|2010||A suite of tools that use the Mapped Read Format (MRF) for the analysis of RNA-seq experiments. MRF was developed to address privacy concerns arising from the potential for mRNA sequence reads to identify and genetically characterize specific individuals; it is a compact data summary format that enables anonymization of confidential sequence information while maintaining the ability to conduct subsequent functional genomics studies. RSEQtools provides modules that convert to and from MRF data and perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads, and segmenting that signal into actively transcribed regions.
|}

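The EM step mentioned for IQSeq can be illustrated with a generic EM sketch for isoform abundance. This is not IQSeq's model: it assumes each read is summarized only by the set of isoforms it is compatible with, and it ignores isoform length, read position, and sequencing biases. The name `em_isoform_abundance` is hypothetical.

```python
def em_isoform_abundance(compat, n_isoforms, n_iter=200):
    """Estimate the fraction of reads coming from each isoform by EM.

    compat: list of sets; each set holds the indices of isoforms a read is
    compatible with. Hypothetical helper; isoform length is ignored."""
    theta = [1.0 / n_isoforms] * n_isoforms  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read among its compatible isoforms in
        # proportion to the current abundance estimates.
        for iso_set in compat:
            total = sum(theta[i] for i in iso_set)
            if total == 0:
                continue
            for i in iso_set:
                counts[i] += theta[i] / total
        # M-step: new abundances are the normalized expected read counts.
        n = sum(counts)
        if n == 0:
            break
        theta = [c / n for c in counts]
    return theta
```

With six reads unique to isoform 0, two unique to isoform 1, and four shared, the estimate converges to 0.75 and 0.25: the shared reads end up split in the same proportion as the unique ones.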
===ChIP===
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://www.gersteinlab.org/proj/BoCaTFBS/ '''BoCaTFBS''']||style="width:7%; text-align:center;"|2006||A boosted cascade learner that refines the binding sites suggested by ChIP-chip experiments. The tool uses a data-mining approach that combines noisy data from ChIP-chip experiments with known binding-site patterns. BoCaTFBS uses boosted cascades of classifiers, whose components are alternating decision trees, for optimum efficiency; it exploits interpositional correlations; and it explicitly integrates massive negative information from ChIP-chip experiments.
|-style="height: 100px;"
|style="text-align: center;"|[http://www.gersteinlab.org/proj/PeakSeq/ '''PeakSeq''']||style="text-align:center;"|2009||A tool for calling peaks corresponding to transcription factor binding sites from ChIP-seq data scored against a matched control such as input DNA. PeakSeq uses a two-pass strategy: putative binding sites are first identified in a way that compensates for genomic variation in the "mappability" of sequences, and a second pass then filters out sites that are not significantly enriched relative to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the sequencing depth required for a desired level of coverage, and to demonstrate that more than two replicates provide only a marginal gain in information.
|}

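PeakSeq's second pass, which keeps only candidate sites significantly enriched over the normalized control, can be sketched as a per-region significance test. This minimal Python sketch rests on two simplifying assumptions: the control is scaled by the ratio of total tag counts, and enrichment is scored with a one-sided binomial test (p = 0.5) on the ChIP count versus the scaled control count. All helper names are hypothetical.

```python
from math import comb

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def peak_pvalue(chip_count, control_count, total_chip, total_control):
    """One-sided p-value that a region is enriched in ChIP over scaled control.

    Assumption: the control is normalized by the ratio of total tag counts."""
    scale = total_chip / total_control
    control_scaled = round(control_count * scale)
    return binom_sf(chip_count, chip_count + control_scaled)

def filter_peaks(candidates, total_chip, total_control, alpha=1e-3):
    """Keep (name, p-value) for candidates (name, chip, control) with p < alpha."""
    kept = []
    for name, chip, control in candidates:
        pval = peak_pvalue(chip, control, total_chip, total_control)
        if pval < alpha:
            kept.append((name, pval))
    return kept
```

A region with 10 ChIP tags and no control tags gets p = 0.5^10 (about 0.001), while a region with 5 tags in each sample gets a p-value above 0.5 and is filtered out.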
===Allele-specific effects===
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://alleleseq.gersteinlab.org/home.html '''AlleleSeq''']||style="width:7%; text-align:center;"|2011||AlleleSeq is a computational pipeline for studying allele-specific expression (ASE) and allele-specific binding (ASB). The pipeline first constructs a diploid personal genome sequence, then maps RNA-seq and ChIP-seq functional genomic data onto this personal genome. Locations where the number of mapped reads differs between maternally and paternally derived sequences can then be identified; these provide evidence for allele-specific events.
|}

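The final AlleleSeq step, flagging sites where maternal and paternal read counts differ, can be sketched with an exact two-sided binomial test against a 50:50 null. This is an illustration under that assumption only; the pipeline's actual statistics and multiple-testing control are not described here, and the function names are hypothetical.

```python
from math import comb

def binom_two_sided(k, n):
    """Two-sided exact binomial p-value for k of n reads from one allele, p = 0.5."""
    # Under p = 0.5 the distribution is symmetric, so double the smaller tail.
    tail = min(k, n - k)
    p = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * p)

def allele_specific_sites(site_counts, alpha=0.05):
    """Return {site: p-value} for sites with imbalanced allele counts.

    site_counts: dict mapping a heterozygous site to
    (maternal_reads, paternal_reads). No multiple-testing correction is applied."""
    hits = {}
    for site, (maternal, paternal) in site_counts.items():
        pval = binom_two_sided(maternal, maternal + paternal)
        if pval < alpha:
            hits[site] = pval
    return hits
```

A site with 20 maternal and 0 paternal reads is flagged, while a 10:10 site gets p = 1.0.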
===Microarrays & Proteomics===
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://bioinfo.mbb.yale.edu/ExpressYourself '''ExpressYourself''']||style="width:7%; text-align:center;"|2003||An interactive platform for background correction, normalization, scoring, and quality assessment of raw microarray data.
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://proteomics.gersteinlab.org '''PARE''']||style="width:7%; text-align:center;"|2007||(Protein Abundance and mRNA Expression). A tool for comparing protein abundance and mRNA expression data. In addition to globally comparing the quantities of protein and mRNA, PARE allows users to select subsets of proteins for focused study (based on functional categories and complexes). It also highlights correlation outliers, which may be worth further examination.
|-style="height: 100px;"
|style="text-align: center;"|[http://purelight.biology.yale.edu:8080/servlets-examples/procat.html '''ProCAT --BROKEN URL--''']||style="text-align:center;"|2006||A data analysis approach for protein microarrays. ProCAT corrects for background bias and spatial artifacts, identifies significant signals, filters nonspecific spots, and normalizes the resulting signal to protein abundance. ProCAT provides a powerful and flexible new approach for analyzing many types of protein microarrays.
|-style="height: 100px;"
|style="text-align: center;"|[http://tilescope.gersteinlab.org/ '''Tilescope --BROKEN URL--''']||style="text-align:center;"|2007||An online analysis pipeline for high-density tiling microarray data. Tilescope normalizes signals between channels and across arrays, combines replicate experiments, scores each array element, and identifies genomic features. The program has a modular, three-tiered architecture that facilitates parallelism, and a user-friendly graphical interface that presents results in an organized web page, downloadable for further analysis.
|-style="height: 100px;"
|style="text-align: center;"|[http://tiling.gersteinlab.org '''Tiling''']||style="text-align:center;"|NA||A platform with all our tiling array analysis tools.
|}

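The protein-versus-mRNA comparison and the correlation outliers that PARE highlights can be illustrated with a minimal Python sketch. It assumes a least-squares fit of log10 protein abundance against log10 mRNA level and flags genes whose residual exceeds `z` standard deviations; PARE's own statistics are not described here, and `correlation_outliers` is a hypothetical name.

```python
from math import log10
from statistics import mean, pstdev

def correlation_outliers(pairs, z=2.0):
    """Correlate protein abundance with mRNA expression on a log scale.

    pairs: dict gene -> (mrna_level, protein_level), both positive.
    Fits log10(protein) = a + b * log10(mrna) by least squares and returns
    (pearson_r, list of genes whose residual exceeds z standard deviations)."""
    genes = list(pairs)
    x = [log10(pairs[g][0]) for g in genes]
    y = [log10(pairs[g][1]) for g in genes]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sd = pstdev(residuals)
    outliers = [g for g, e in zip(genes, residuals) if sd > 0 and abs(e) > z * sd]
    return r, outliers
```

Working on the log scale keeps highly expressed genes from dominating the fit; a residual cut-off is only one of several reasonable ways to define an outlier.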
===Clustering===
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://bioinfo.mbb.yale.edu/expression/cluster '''Local Clustering''']||style="width:7%; text-align:center;"|2001||A new algorithm for local clustering that finds time-shifted and/or inverted relationships in gene expression data, available as C source code.
|}

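The time-shifted and inverted relationships that Local Clustering targets can be illustrated with a minimal Python sketch. It assumes an ungapped, Smith-Waterman-style local alignment of z-scored expression profiles, with inversion handled by negating one profile; this is an illustration under those assumptions, not the distributed C code, and all function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore(profile):
    """Standardize a profile to mean 0 and population standard deviation 1."""
    m, s = mean(profile), pstdev(profile)
    return [(v - m) / s for v in profile]  # constant profiles (s == 0) are not supported

def best_local_score(x, y, max_shift):
    """Best ungapped local-alignment score between profiles x and y.

    x[i] is paired with y[i + shift] for each shift in [-max_shift, max_shift];
    along each pairing the running sum of x[i] * y[i + shift] is reset at 0,
    so only the best-matching stretch of time points counts.
    Returns (score, shift)."""
    best, best_shift = 0.0, 0
    for shift in range(-max_shift, max_shift + 1):
        run = 0.0
        for i, xi in enumerate(x):
            j = i + shift
            if 0 <= j < len(y):
                run = max(0.0, run + xi * y[j])
                if run > best:
                    best, best_shift = run, shift
    return best, best_shift

def relationship(x, y, max_shift=2):
    """Label a gene pair 'direct' or 'inverted' by the higher local score."""
    x, y = zscore(x), zscore(y)
    direct, s_direct = best_local_score(x, y, max_shift)
    inverted, s_inv = best_local_score(x, [-v for v in y], max_shift)
    if direct >= inverted:
        return "direct", s_direct, direct
    return "inverted", s_inv, inverted
```

Because the running sum is reset at zero, two genes can score well even if they co-vary over only part of the time course, which is what distinguishes local clustering from a global correlation.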
=Network Tools=
===Introduction===
The Gerstein lab has been a pioneer in applying network analysis to generate knowledge from large-scale experiments. To this end, we have developed the TopNet-like Yale Network Analyzer (tYNA) for managing, comparing, and mining multiple networks, both directed and undirected (ref). This tool focuses not on individual genes and proteins but on the relationships between them: for example, identifying defective cliques, finding small network motifs (such as feed-forward loops), calculating global statistics (such as the clustering coefficient and eccentricity), and identifying hubs and bottlenecks. To apply semantic web technologies such as the Resource Description Framework (RDF), RDF Site Summary (RSS), and relational-database-to-RDF mapping (D2RQ) to query life sciences data and metadata more efficiently, we built YeastHub (ref). The network tools developed in the Gerstein lab provide new insights into existing data and make information easy to find.
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://networks.gersteinlab.org/genome/interactions/networks/ '''TopNet''']||style="width:7%; text-align:center;"|2004||An automated web tool designed to calculate topological parameters and compare different sub-networks for any given network.
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://tyna.gersteinlab.org/tyna/ '''tYNA''']||style="width:7%; text-align:center;"|2006||(TopNet-like Yale Network Analyzer). A web system for managing, comparing, and mining multiple networks, both directed and undirected. tYNA efficiently implements methods that have proven useful in network analysis, including identifying defective cliques, finding small network motifs (such as feed-forward loops), calculating global statistics (such as the clustering coefficient and eccentricity), and identifying hubs and bottlenecks.
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://yeasthub.gersteinlab.org '''YeastHub --BROKEN URL--''']||style="width:7%; text-align:center;"|???||A semantic web-based application that demonstrates how a life sciences data warehouse can be built using a native Resource Description Framework (RDF) data store. This data warehouse integrates different types of yeast genome data provided by different resources in different formats, including tabular and RDF formats.
|}

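Two of the network statistics mentioned above, the local clustering coefficient and feed-forward loop motifs, have standard definitions that can be sketched in a few lines of Python. This is not tYNA's code; graphs are assumed to be plain dictionaries of neighbour sets (undirected) or lists of (source, target) edges (directed), and the function names are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Equals the fraction of neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def feed_forward_loops(edges):
    """List feed-forward loops (x, y, z) with edges x->y, y->z and x->z."""
    out = {}
    for a, b in edges:
        out.setdefault(a, set()).add(b)
    loops = []
    for x, targets in out.items():
        for y in targets:
            if y == x:
                continue  # ignore self-loops
            for z in out.get(y, ()):
                if z not in (x, y) and z in targets:
                    loops.append((x, y, z))
    return sorted(loops)
```

A triangle gives every node a clustering coefficient of 1.0, whereas the centre of a star (a typical hub) has a coefficient of 0.0.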
=Evolution Tools=
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://coevolution.gersteinlab.org/coevolution/ '''Coevolution analysis of protein residues''']||style="width:7%; text-align:center;"|2008||An integrated online system that enables comparative analyses of residue coevolution with a comprehensive set of commonly used scoring functions, including Statistical Coupling Analysis (SCA), Explicit Likelihood of Subset Variation (ELSC), mutual information and correlation-based methods.
|}

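Of the coevolution scores listed above, mutual information is the simplest to state: it measures how much knowing the residue in one alignment column tells you about the residue in another. The minimal Python sketch below computes it from raw residue frequencies; it assumes columns given as equal-length strings (one character per sequence), applies no gap handling, pseudocounts, or sequence weighting, and `mutual_information` is a hypothetical name rather than the server's implementation.

```python
from collections import Counter
from math import log

def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns.

    col_i, col_j: equal-length strings, one residue per aligned sequence."""
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi
```

Two perfectly covarying columns such as `"AACC"` and `"GGTT"` give log 2, while independent columns such as `"AACC"` and `"GTGT"` give 0. Columns of a full alignment can be obtained with `list(zip(*alignment))`.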
=Genome Structural Variation Analysis=
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://sv.gersteinlab.org '''SVs''']||style="width:7%; text-align:center;"|2007-2011||A dedicated web page for analysis of genome structural variations (AGE, CNVnator, PEMer, BreakSeq, vcf2diploid).
|}

=Structural Biology Tools=
===Introduction===
To extract knowledge about the three-dimensional dynamics of proteins, and ultimately their function, we built the Database of Macromolecular Movements (MolMovDB). Initially published in 1998, its main functionality was to interpolate the movements of macromolecules between two known crystal structures (ref). In 2005 a number of additions were made (ref), including a more accurate method for interpolating multi-chain macromolecules and an updated interface. In 2008 a normal-mode hinge prediction module was added so that users could detect hinges in uploaded structures (ref). MolMovDB and its subsequent additions have provided knowledge about the functioning of proteins and the structure of potential new drugs.
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://molmovdb.org/molmovdb/morph '''Morph Server''']||style="width:7%; text-align:center;"|2000||Generates a plausible pathway between two conformations of a protein or nucleic acid structure. A large number of statistics and several high-quality movies are output.
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://spine.nesg.org '''SPINE''']||style="width:7%; text-align:center;"|2001||A laboratory-information management system (LIMS) for the [http://www.nesg.org NorthEast Structural Genomics Consortium]. The online version is restricted to consortium users, but most of the code is freely available for download.
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://geometry.molmovdb.org '''Macromolecular Packing Tools''']||style="width:7%; text-align:center;"|1994-2009||A number of programs for calculating properties of protein and nucleic acid structures, collected into a single distribution. The distribution also includes a library of utility functions for handling structures and a convenient interactive command-line interpreter.
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://helix.gersteinlab.org/ '''HIT''']||style="width:7%; text-align:center;"|2006||(Helix Interaction Tool). A comprehensive web-based package of tools for analyzing helix-helix interactions in proteins.
|}

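The core idea behind morphing, generating intermediate frames between two known conformations, can be illustrated with straight-line interpolation of atomic coordinates. This Python sketch is a naive baseline under stated assumptions, not the Morph Server's method: it assumes both structures list the same atoms in the same order and are already superimposed, and it does no energy minimization, so intermediate frames can contain steric clashes. `interpolate_frames` is a hypothetical name.

```python
def interpolate_frames(start, end, n_frames):
    """Linearly interpolate atomic coordinates between two conformations.

    start, end: lists of (x, y, z) tuples for the same atoms in the same order,
    assumed already superimposed. Returns n_frames coordinate sets, including
    both end points."""
    if len(start) != len(end):
        raise ValueError("conformations must contain the same atoms")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for f in range(n_frames):
        t = f / (n_frames - 1)  # fraction of the way from start to end
        frames.append([tuple(a + t * (b - a) for a, b in zip(p, q))
                       for p, q in zip(start, end)])
    return frames
```

Each returned frame could be written out as a model in a multi-model structure file to produce a simple movie; more realistic pathways need the kind of refinement the introduction describes.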
=Other=
:{|class="wikitable sortable" border="1" cellspacing="0" cellpadding="10"
|- bgcolor="lightsteelblue"
!Tool Name!!Release Date!!class="unsortable"|Description
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://pubnet.gersteinlab.org/ '''PubNet''']||style="width:7%; text-align:center;"|2005||A web-based tool that extracts several types of relationships returned by PubMed queries and maps them into networks, allowing for graphical visualization, textual navigation, and topological analysis.
|-style="height: 100px;"
|style="width:15%; text-align:center;"|[http://hub.gersteinlab.org/ir-supp/ '''HUB''']||style="width:7%; text-align:center;"|???||A tool that leverages the structure of the semantic web to enhance information retrieval for proteomics. It helps proteomics researchers quickly retrieve relevant information from the web and the biomedical literature.
|}

Latest revision as of 14:50, 5 May 2014

Please refer to the Resources page.
