Revision as of 16:14, 25 September 2019

The tools & resources listed below have been published and are actively maintained by the Gerstein lab. You may view a list of the associated literature here.

In addition to the tools below, the lab has also published a number of tools that are not currently being actively maintained.

You may also access tools that have not yet been published.

Source code for all software is available on our Github page: Github.gersteinlab.org (or, equivalently, github.com/gersteinlab)

Portals

MolMovDB

Name	Description
MolMovDB	Servers and a suite of accessory tools for the analysis of conformational changes in protein and nucleic acid structures.

Networks

Name	Description
Networks	The Gerstein lab has been a pioneer in applying network analysis to generate knowledge from large-scale experiments. To this end, we have developed a portal for our network research.

Pseudogene.org

Name	Description
Pseudogene.org	Pseudogene.org is a collection of resources related to our efforts to survey eukaryotic genomes for pseudogene sequences, "pseudo-fold" usage, amino-acid composition, and single-nucleotide polymorphisms (SNPs) to help elucidate the relationships between pseudogene families across several organisms.

Structural Variants (SV)

Name	Description
Structural Variants	Software that may be used to investigate Structural Variations (SVs) and Copy Number Variations (CNVs).

Data Sets

Name	Release Date	Description
BreakDB	2009	This database, which is part of the PEMer package, contains information about structural variants and associated breakpoints.
PsychENCODE resource	2018	This website is a comprehensive functional genomic resource for the human brain from PsychENCODE Phase I, including all derived data, integrative models and links to raw data.

Evolution

Name	Release Date	Description
Coevolution analysis of protein residues	2008	An integrated online system that enables comparative analyses of residue coevolution with a comprehensive set of commonly used scoring functions, including statistical coupling analysis (SCA), explicit likelihood of subset variation (ELSC), mutual information and correlation-based methods.
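
As an illustration of one of the scoring functions named above, the following is a minimal sketch of mutual information between two columns of a multiple sequence alignment. It is not the code used by the online system; the function name `column_mutual_information` and the representation of a column as a plain string are assumptions made for this example.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Each column is a sequence of residues, one per aligned sequence.
    Hypothetical helper for illustration only.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    count_a = Counter(col_a)            # residue frequencies in column A
    count_b = Counter(col_b)            # residue frequencies in column B
    count_ab = Counter(zip(col_a, col_b))  # joint residue-pair frequencies
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Perfectly coupled columns versus independent columns.
    print(column_mutual_information("AAGG", "CCTT"))
    print(column_mutual_information("AAGG", "CTCT"))
```

Columns whose residues always change together score high, while columns that vary independently score near zero.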

Genome Technology

Gene Regulation

Name	Release Date	Description
Loregic	2015	Loregic is a computational method integrating gene expression and regulatory network data, to characterize the logical cooperativity of regulatory factors. Loregic uses all 16 possible two-input-one-output logic gates (e.g. AND or XOR) to describe triplets of two factors regulating a common target, and finds the gate that best matches each triplet’s observed gene expression pattern across many conditions. Using human ENCODE ChIP-Seq and TCGA RNA-Seq data, we are able to demonstrate how Loregic characterizes complex circuits involving both proximally and distally regulating transcription factors (TFs) and also miRNAs.
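
The gate-matching idea described for Loregic can be sketched as follows. This is a simplified illustration, not Loregic's implementation: it assumes expression has already been binarized to 0/1 per condition, and it scores each gate by the plain fraction of conditions it reproduces, which is an assumption of this example. The names `all_gates` and `best_gate` are hypothetical.

```python
from itertools import product

# The four possible input combinations for two regulatory factors.
INPUTS = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)


def all_gates():
    """Return the 16 two-input-one-output truth tables.

    Each gate is a dict mapping (rf1, rf2) -> output.
    """
    return [dict(zip(INPUTS, outputs)) for outputs in product((0, 1), repeat=4)]


def best_gate(rf1, rf2, target):
    """Return (truth_table, agreement) for the best-matching gate.

    rf1, rf2, target: equal-length lists of 0/1 values, one per condition.
    Agreement is the fraction of conditions the gate reproduces
    (an assumed, simplified score).
    """
    best, best_score = None, -1.0
    for gate in all_gates():
        hits = sum(gate[(a, b)] == t for a, b, t in zip(rf1, rf2, target))
        score = hits / len(target)
        if score > best_score:
            best, best_score = gate, score
    return best, best_score


if __name__ == "__main__":
    rf1 = [0, 0, 1, 1, 1, 0]
    rf2 = [0, 1, 0, 1, 1, 1]
    target = [0, 0, 0, 1, 1, 0]  # target is on only when both factors are on
    gate, score = best_gate(rf1, rf2, target)
    print(gate, score)
```

In this toy triplet the target follows an AND pattern, so the AND truth table is the only gate that reproduces every condition.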

Allele-Specific Effects

Name	Release Date	Description
AlleleDB	2016	AlleleDB is an online resource for storing and visualizing allele-specific binding (ASB) and gene expression (ASE). Using variants from the 1000-Genomes Project and RNA-seq and ChIP-seq data from related projects, this resource serves as a repository for the catalog of ASB and ASE variants, associated genomic elements and personal genomes used in the study. AlleleDB also interfaces with the UCSC browser for visualization of results.
AlleleSeq	2011	AlleleSeq is a computational pipeline that is used to study allele-specific expression (ASE) and allele-specific binding (ASB). The pipeline first constructs a diploid personal genome sequence, then maps RNA-seq and ChIP-seq functional genomic data onto this personal genome. Consequently, locations in which there are differences in the number of mapped reads between maternally- and paternally-derived sequences can be identified, thereby providing evidence for allele-specific events.
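
One simple way to flag a difference in read counts between the maternal and paternal alleles, as described for AlleleSeq above, is an exact two-sided binomial test against an expected 50/50 split. The sketch below is illustrative only: the choice of a binomial test, the absence of any multiple-testing correction, and the function name `allelic_imbalance_p_value` are assumptions of this example rather than details taken from the pipeline.

```python
from math import comb


def allelic_imbalance_p_value(maternal_reads, paternal_reads):
    """Exact two-sided binomial p-value for a 50/50 allelic split.

    Sums the probability of every outcome that is no more likely than
    the observed one. Hypothetical helper for illustration only.
    """
    n = maternal_reads + paternal_reads
    if n == 0:
        return 1.0  # no reads, no evidence either way
    observed = comb(n, maternal_reads)
    # With p = 0.5 every outcome has probability comb(n, k) / 2**n,
    # so outcomes can be compared through their binomial coefficients.
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


if __name__ == "__main__":
    print(allelic_imbalance_p_value(5, 5))   # balanced site
    print(allelic_imbalance_p_value(10, 0))  # strongly skewed site
```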

ChIP-Seq

Name	Release Date	Description
MUSIC Github repo	2014	MUSIC is an algorithm for identification of enriched regions at multiple scales in the read depth signals from ChIP-Seq experiments.
PeakSeq Github repo	2009	A tool for calling peaks corresponding to transcription factor binding sites from ChIP-Seq data scored against a matched control such as input DNA. PeakSeq employs a two-pass strategy in which putative binding sites are first identified in order to compensate for genomic variation in the 'mappability' of sequences, before a second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances.

Functional Annotation

Name	Release Date	Description
FunSeq - & - FunSeq2 Github	2013 & 2014	These tools can be used to automatically score and annotate the disease-causing potential of SNVs, particularly those which are non-coding. FunSeq can detect recurrent annotation elements in non-coding regions when running with multiple personal genomes. FunSeq2 is an extension of FunSeq that provides a means of prioritizing somatic variants from cancer whole genome sequencing.
LARVA Github repo	2015	LARVA is a computational framework designed to facilitate the study of noncoding variants. It addresses issues that have made it difficult to derive an accurate model of the background mutation rates of noncoding elements in cancer genomes. LARVA integrates a comprehensive set of noncoding functional elements, modeling their mutation count with a beta-binomial distribution to handle overdispersion. Moreover, LARVA uses regional genomic features (such as replication timing) to better estimate local mutation rates and mutational enrichments.
VAT Github repo	2012	A computational framework to functionally annotate variants in personal genomes using a cloud-computing environment.
ALoFT Github repo	2017	A method to annotate and predict the disease-causing potential of loss-of-function variants.
MOAT Github repo	2017	MOAT (Mutations Overburdening Annotations Tool) is a computational system for identifying significant mutation burdens in genomic elements with an empirical, nonparametric method. Taking a set of variant calls and a set of annotations, MOAT calculates which annotations have observed variant counts that are substantially elevated with respect to a distribution of expected variant counts determined by permutation of the input data.
uORFs Github repo	2018	A catalog of predicted functional upstream open reading frames (uORFs) in humans.
GRAM Github repo	2019	A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell type-specific manner. GRAM combines a universal regulatory score defined by transcription factor binding with an easily obtainable modifier defined by transcription factor binding and expression to reflect the particular cell type. To use GRAM, you need to provide a non-coding variant file in BED format and the whole genome expression file (see github repo for details).
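
The permutation-based burden test described in the MOAT entry above can be sketched in a few lines. This is not MOAT's code: the uniform reshuffling of variant positions within a single region, the +1 correction in the p-value, and all function names are assumptions chosen to keep the example small.

```python
import random


def count_in_interval(positions, start, end):
    """Count positions falling in the half-open interval [start, end)."""
    return sum(start <= p < end for p in positions)


def empirical_p_value(observed, permuted_counts):
    """One-sided empirical p-value with a +1 correction."""
    at_least = sum(c >= observed for c in permuted_counts)
    return (1 + at_least) / (1 + len(permuted_counts))


def permutation_burden_test(variants, annotation, region_length,
                            n_perm=199, seed=0):
    """Compare the observed variant count in an annotation with counts
    obtained after placing the same number of variants uniformly at
    random in [0, region_length). Returns (observed_count, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    start, end = annotation
    observed = count_in_interval(variants, start, end)
    permuted = []
    for _ in range(n_perm):
        shuffled = [rng.randrange(region_length) for _ in variants]
        permuted.append(count_in_interval(shuffled, start, end))
    return observed, empirical_p_value(observed, permuted)


if __name__ == "__main__":
    variants = list(range(100, 110)) * 2  # 20 variants inside the annotation
    print(permutation_burden_test(variants, (100, 110), 10_000))
```

Because the expected counts come from reshuffled data rather than a fitted distribution, the test makes no parametric assumption about the background mutation rate.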

Microarrays & Proteomics

Name	Release Date	Description
MOTIPS	2010	MOTIPS employs an efficient search algorithm to scan a target proteome for potential domain targets and to increase the accuracy of each hit by integrating a variety of pre-computed features, such as conservation, surface propensity, and disorder.
PARE	2007	Protein Abundance and mRNA Expression (PARE) is a tool for comparing protein abundance and mRNA expression data. In addition to globally comparing the quantities of protein and mRNA, PARE allows users to select subsets of proteins for focused study (based on functional categories and complexes). Furthermore, it highlights correlation outliers, which may warrant further investigation.

RNA-Seq

Name	Release Date	Description
ACT	2011	The aggregation and correlation toolbox (ACT) is an efficient, multifaceted toolbox for analyzing continuous signal and discrete region tracks from high-throughput genomic experiments, such as RNA-seq or ChIP-chip signal profiles from the ENCODE and modENCODE projects, or lists of single nucleotide polymorphisms from the 1000 Genomes Project.
FusionSeq Github repo	2010	FusionSeq may be used to identify fusion transcripts from paired-end RNA-sequencing. FusionSeq includes filters to remove spurious candidate fusions with artifacts, such as misalignment or random pairing of transcript fragments, and it ranks candidates according to several statistics. It also includes a module to identify exact sequences at breakpoint junctions.
IQseq Github repo	2012	A tool for isoform quantification with RNA-seq data. Given isoform annotation and alignment of RNA-seq reads, it will use an EM algorithm to infer the most probable expression level for each isoform of a gene.
RSEQtools Github repo	2011	A suite of tools that use Mapped Read Format (MRF) for the analysis of RNA-Seq experiments. MRF is a compact data format that enables anonymization of confidential sequence information while maintaining the ability to conduct subsequent functional genomics studies. RSEQtools provides a suite of modules that convert to/from MRF data and perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads, and segmenting that signal into actively transcribed regions.
TeXP Github repo	2019	TeXP accounts for and removes the effects of pervasive transcription when quantifying LINE activity. Our method uses the broad distribution of LINEs to estimate the effects of pervasive transcription. Using TeXP, we processed thousands of transcriptome datasets to uniformly and unbiasedly measure LINE-1 activity across healthy somatic cells.

Structural Variation

Name	Release Date	Description
AGE (citation)	2011	AGE is used for defining breakpoints of genomic structural variants at single-nucleotide resolution, using optimal alignments with gap excision.
CNVnator (citation)	2011	CNVnator may be used to discover, genotype, and characterize typical and atypical CNVs from familial and population genome sequencing.

Networks

Name	Release Date	Description
DynaSIN	2011	The Dynamic Structure Interaction Network (DynaSIN) is a resource for studying protein-protein interaction networks in the context of conformational changes.
OrthoClust	2014	A computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific.
PubNet	2005	A web-based tool that extracts several types of relationships returned by PubMed queries and maps them on to networks, allowing for graphical visualization, textual navigation, and topological analysis.
TopNet	2004	TopNet is an automated web tool designed to compare the topologies of sub-networks, looking for global differences associated with different types of proteins. It calculates and compares topological characteristics for different sub-networks derived from any given protein network.
tYNA	2006	tYNA (TopNet-like Yale Network Analyzer) is a web system for managing, comparing and mining multiple networks, both directed and undirected. tYNA efficiently implements methods that have proven useful in network analysis, including identifying defective cliques, finding small network motifs (such as feed-forward loops), calculating global statistics (such as the clustering coefficient and eccentricity), and identifying hubs and bottlenecks.
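
Two of the network statistics mentioned above, the clustering coefficient and hubs, can be computed on a small undirected network as in the sketch below. It is a generic illustration and not code from tYNA or TopNet: the adjacency-dict representation, the function names, and the definition of a hub as simply a highest-degree node are assumptions of this example.

```python
def local_clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked.

    graph: dict mapping each node to the set of its neighbors
    (undirected, so every edge appears in both directions).
    Node labels must be mutually comparable, e.g. all strings.
    """
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to close
    links = sum(1 for u in neighbors for v in neighbors
                if u < v and v in graph[u])
    return 2 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


def hubs(graph, top=1):
    """Return the `top` nodes with the most neighbors (assumed hub definition)."""
    return sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:top]


if __name__ == "__main__":
    # A triangle a-b-c with one extra node d attached to a.
    network = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(average_clustering(network))
    print(hubs(network))
```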

Structure and Macromolecular Motions

Name	Release Date	Description
3V	2010	The 3V web server extracts and comprehensively analyzes the internal volumes of input RNA and protein structures. It identifies internal volumes by taking the difference between two rolling-probe solvent-excluded surfaces.
HIT	2006	The Helix Interaction Tool (HIT) is a comprehensive package for analyzing helix-helix packing in proteins. This enables the user to obtain quantitative measures of the helix interaction surface area and helix crossing angle, as well as several methods for visualizing the helical interaction.
Macromolecular Geometry and Packing Tools	1994-2009	A number of programs for calculating properties of protein and nucleic acid structures have been collected into a single distribution. Included are a library of functions for analyzing structures, a convenient interactive command-line interpreter, and software for the calculation of geometrical quantities associated with macromolecular structures and their motions.
Morph Server	2000	A web server for generating and viewing models of protein conformational change using interpolation with energy minimization. The user may opt to use either single- or multi-chain proteins as input.
STRESS Github repo	2016	STRucturally-identified ESSential residues (STRESS) is a web tool that enables users to submit PDB-formatted protein structures to predict both surface- and interior-allosteric residues. The software behind this tool employs 3D structures to build models of protein conformational change in order to perform allosteric site predictions.
Intensification Github repo	2016	Intensification is a database that contains results for 12 repeat protein domains, obtained by amplifying population-genetic signal through the construction of a motif-based multiple sequence alignment (motif-MSA). We make use of the modular structure of repeat motifs to amplify signals of selection from population genetics and traditional inter-species conservation.

more

more tools & resources

@@ Line 1: / Line 1: @@
-The tools & resources listed on this wiki page have been '''published and are actively maintained''' by the Gerstein lab. You may view a list of the associated literature [http://papers.gersteinlab.org/subject/coretools/index.html here].
+The tools & resources listed below have been '''published and are actively maintained''' by the Gerstein lab. You may view a list of the associated literature [http://papers.gersteinlab.org/subject/coretools/index.html '''here'''].
-In addition to the '''actively maintained tools''' listed below, the lab has also published a number of [http://info.gersteinlab.org/More_tools '''additional tools that are not currently maintained'''].
+In addition to the tools below, the lab has also published a number of [http://info.gersteinlab.org/More_tools '''tools that are not currently being actively maintained'''].
-Lab resources that have '''not been published''' are given [http://info.gersteinlab.org/Even_more_tools here].
+You may also access tools that have [http://info.gersteinlab.org/Even_more_tools '''not yet been published''']
-Source code for all software is available on our [http://github.gersteinlab.org/ Github page].
+Source code for all software is available on our Github page: [http://github.gersteinlab.org/ Github.gersteinlab.org] (or, equivalently, [http://github.com/gersteinlab github.com/gersteinlab])
 =Portals=
@@ Line 50: / Line 50: @@
 |-style="height: 100px;"
 |style="width:15%; text-align:center;"|[http://sv.gersteinlab.org/breakdb/ '''BreakDB''']||style="width:7%; text-align:center;"|2009||This database, which is part of the PEMer package, contains information about structural variants and associated breakpoints.
+|-style="height: 100px;"
+|style="width:15%; text-align:center;"|[http://resource.psychencode.org/ '''PsychENCODE resource''']||style="width:7%; text-align:center;"|2018||This website is a comprehensive functional genomic resource for the human brain from PsychENCODE Phase I, including all derived data, integrative models and links to raw data.
 |}
@@ Line 102: / Line 104: @@
 |-style="height: 100px;"
+|style="width:15%; text-align:center;"|[http://vat.gersteinlab.org/ '''VAT'''] <br> [http://github.gersteinlab.org/vat/ Github repo] ||style="width:7%; text-align:center;"|2012|| A computational framework to functionally annotate variants in personal genomes using a cloud-computing environment.
+|-style="height: 100px;"
+|style="width:15%; text-align:center;"|[http://aloft.gersteinlab.org/ '''ALoFT'''] <br> [https://github.com/gersteinlab/aloft Github repo] ||style="width:7%; text-align:center;"|2017|| A method to annotate and predict the disease-causing potential of loss-of-function variants.
+|-style="height: 100px;"
+|style="width:15%; text-align:center;"|[http://moat.gersteinlab.org/ '''MOAT'''] <br> [https://github.com/gersteinlab/MOAT Github repo] ||style="width:7%; text-align:center;"|2017|| MOAT (Mutations Overburdening Annotations Tool) is a computational system for identifying significant mutation burdens in genomic elements with an empirical, nonparametric method. Taking a set of variant calls and a set of annotations, MOAT calculates which annotations have observed variant counts that are substantially elevated with respect to a distribution of expected variant counts determined by permutation of the input data.
+|-style="height: 100px;"
+|style="width:15%; text-align:center;"|[http://github.gersteinlab.org/uORFs/ '''uORFs'''] <br> [https://github.com/gersteinlab/uORFs Github repo] ||style="width:7%; text-align:center;"|2018|| A catalog of predicted functional upstream open reading frames (uORFs) in humans.
+|-style="height: 100px;"
+|style="width:15%; text-align:center;"|[https://github.com/gersteinlab/GRAM '''GRAM'''] <br> [https://github.com/gersteinlab/GRAM Github repo] ||style="width:7%; text-align:center;"|2019|| A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell type-specific manner. GRAM combines a universal regulatory score defined by transcription factor binding with an easily obtainable modifier defined by transcription factor binding and expression to reflect the particular cell type. To use GRAM, you need to provide a non-coding variant file in BED format and the whole genome expression file (see github repo for details).
-|style="width:15%; text-align:center;"|[http://vat.gersteinlab.org/ '''VAT'''] <br> [http://github.gersteinlab.org/vat/ Github repo] ||style="width:7%; text-align:center;"|2012|| A computational framework to functionally annotate variants in personal genomes using a cloud-computing environment.
 |}
@@ Line 137: / Line 140: @@
 |-style="height: 100px;"
 |style="text-align: center;"|[http://archive.gersteinlab.org/proj/rnaseq/rseqtools/ '''RSEQtools''']<br>[http://github.gersteinlab.org/RSEQtools/ Github repo]||style="text-align:center;"|2011||A suite of tools that use Mapped Read Format (MRF) for the analysis of RNA-Seq experiments. MRF is a compact data format that enables anonymization of confidential sequence information while maintaining the ability to conduct subsequent functional genomics studies. RSEQtools provides a suite of modules that convert to/from MRF data and perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads, and segmenting that signal into actively transcribed regions.
+|-style="height: 100px;"
+|style="text-align: center;"|[https://github.com/gersteinlab/texp/blob/master/README.md '''TeXP''']<br>[https://github.com/gersteinlab/texp Github repo]||style="text-align:center;"|2019||TeXP accounts and removes the effects of pervasive transcription when quantifying LINE activity. Our method uses the broad distribution of LINEs to estimate the effects of pervasive transcription. Using TeXP, we processed thousands of transcriptome datasets to uniformly, and unbiasedly measure LINE-1 activity across healthy somatic cells.
 |}
@@ Line 180: / Line 185: @@
 |-style="height: 100px;"
 |style="width:15%; text-align:center;"|[http://stress.molmovdb.org/ '''STRESS'''] <br> [https://github.com/gersteinlab/STRESS Github repo] ||style="width:7%; text-align:center;"|2016||STRucturally-identified ESSential residues (STRESS) is a web tool that enables users to submit PDB-formatted protein structures to predict both surface- and interior-allosteric residues. The software behind this tool employs 3D structures to build models of protein conformational change in order to perform allosteric site predictions.
+|-style="height: 100px;"
+|style="width:15%; text-align:center;"|[http://intensification.gersteinlab.org// '''Intensification'''] <br> [https://github.com/gersteinlab/Intensification Github repo] ||style="width:7%; text-align:center;"|2016||Intensification is a database that contains the results for 12 repeat protein domains, from the amplification of population-genetic signal by constructing a motif-based multiple sequence alignment (motif-MSA). We make use of the modular structure of repeat motifs to amplify signals of selection from population genetics and traditional inter-species conservation.
 |-style="height: 100px;"
 |}

Resources: Difference between revisions
