FusionSeq Requirements

From GersteinInfo

Latest revision as of 11:53, 7 May 2011

Software Requirements

FusionSeq requires several additional packages to carry out the analysis and visualize the results. Because FusionSeq is modular, different programs need specific libraries. Some data sets are also required for the analysis (see Data Requirements). Here we describe the complete set of tools needed to run the analysis as we do in our lab. The modules should be installed in the order listed.

Note: the following instructions apply if you want to compile FusionSeq from the source code (all versions). Alternatively, you can download the binaries (version 0.7.0 and later).

Alignment tools

Make sure that the blat and bowtie executables are on your PATH, i.e. that they can be run from any directory on your file system. twoBitToFa, which is distributed with the blat package, must also be downloaded and placed on the PATH.
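A quick way to confirm this is to look each executable up with the POSIX command -v builtin. The sketch below is a minimal example, not part of FusionSeq: check_tools is a hypothetical helper name, and bowtie-build is included because it is used for indexing later on this page.

```shell
# Report which of the given executables cannot be found on PATH.
# Returns 0 if every tool is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found:   %s -> %s\n' "$tool" "$(command -v "$tool")"
        else
            printf 'MISSING: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Executables named on this page.
check_tools blat bowtie bowtie-build twoBitToFa ||
    echo "Add the directories of the missing tools to PATH (e.g. in ~/.bashrc)." >&2
```

Any tool reported as MISSING needs its directory appended to PATH before running FusionSeq.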

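Scientific and bioinformatics libraries

  • GNU Scientific Library (GSL): this library is required to compile BIOS. As a reference, we tested FusionSeq with gsl-1.14.

(versions 0.7.0 and later)
  • Starting with version 0.7.0, two new libraries are required:
      ◦ libbios, which replaces the old BIOS, can be downloaded from http://rnaseq.gersteinlab.org/fusionseq/tarballs/libbios-1.1.0.tar.gz
      ◦ libmrf can be downloaded from http://rnaseq.gersteinlab.org/fusionseq/tarballs/libmrf-1.0.0.tar.gz

Please note that these libraries are for the "early access" version of FusionSeq.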

(versions up to 0.6.1)
  • BIOS library: this library can be downloaded as part of RSEQtools (http://rseqtools.gersteinlab.org), a computational framework to analyze RNA-Seq data, or as a separate component from http://rnaseq.gersteinlab.org/fusionseq/tarballs/bios_0.9.0.tar.gz

Instructions to install GSL and BIOS (or libbios and libmrf, depending on the version) can be found in Installation and Configuration of FusionSeq. However, please make sure that you have read all the requirements (including Data Requirements) and downloaded all the libraries and packages needed.
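The version split above can be encoded in a small helper that prints the library tarball(s) to fetch for a given FusionSeq version. This is a sketch under stated assumptions: fusionseq_tarballs is a hypothetical name, the URLs are the ones listed on this page, version strings are assumed to look like MAJOR.MINOR[.PATCH], and GSL is not included because this page lists it separately.

```shell
# Print the library tarball(s) needed to compile a FusionSeq version from source:
#   up to 0.6.1      -> BIOS
#   0.7.0 and later  -> libbios + libmrf
# Assumes a version string such as 0.6.1 or 0.7.0.
fusionseq_tarballs() {
    tarball_base=http://rnaseq.gersteinlab.org/fusionseq/tarballs
    v_major=${1%%.*}          # text before the first dot
    v_rest=${1#*.}            # text after the first dot
    v_minor=${v_rest%%.*}     # text between the first and second dots
    if [ "$v_major" -gt 0 ] || [ "$v_minor" -ge 7 ]; then
        printf '%s\n' "$tarball_base/libbios-1.1.0.tar.gz" \
                      "$tarball_base/libmrf-1.0.0.tar.gz"
    else
        printf '%s\n' "$tarball_base/bios_0.9.0.tar.gz"
    fi
}

fusionseq_tarballs 0.6.1   # prints the bios_0.9.0.tar.gz URL
```

For example, wget $(fusionseq_tarballs 0.7.0) would fetch both libbios and libmrf, assuming wget is installed and the URLs are still served.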

Drawing tools

  • GD library: this library is used to draw schematic images of the paired-end (PE) reads connecting the two genes. It is required by gfr2images, an optional component of FusionSeq. As a reference, we tested FusionSeq with gd-2.0.35.

Data analysis

  • ROOT: a data-analysis and statistical framework developed at CERN. In FusionSeq, it is used to perform a Kolmogorov-Smirnov analysis when filtering the breakpoint junctions and to plot the insert-size distribution.
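For reference, the two-sample Kolmogorov-Smirnov statistic compares the empirical distribution functions of two samples and takes their largest vertical distance. This page does not say which two distributions FusionSeq compares, so the formula below only recalls the standard definition:

```latex
D_{n,m} = \sup_{x} \left| F_{1,n}(x) - F_{2,m}(x) \right|
```

where F_{1,n} and F_{2,m} are the empirical distribution functions of the two samples, of sizes n and m. Large values of D_{n,m} indicate that the two samples are unlikely to come from the same distribution.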

Data Requirements

The following data are required to use the full set of FusionSeq tools.

External

  • Homo sapiens reference genome (hg18): download both chromFa.zip and hg18.2bit from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/

The human genome must be indexed before bowtie can use it; see the bowtie documentation for details. The command is roughly:

$ bowtie-build -f hg18_nh.fa /path/to/bowtie/Index/hg18_nh/hg18_nh

where hg18_nh.fa is the concatenation of all human chromosomes from chromFa.zip, excluding the haplotype and "random" sequences.
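One way to produce hg18_nh.fa is sketched below. It assumes that chromFa.zip unpacks into one FASTA file per sequence named chr*.fa, and that every haplotype or "random" sequence has an underscore in its file name (for example chr1_random.fa or chr6_cox_hap1.fa). build_nh_fasta is a hypothetical helper, not part of FusionSeq; it searches subdirectories too, in case the archive is nested.

```shell
# Concatenate the primary chromosome FASTA files under SRC_DIR into OUT,
# skipping any file whose name contains an underscore (haplotypes, *_random).
# Files are appended in lexical order of their paths.
build_nh_fasta() {
    src_dir=$1
    out=$2
    : > "$out"                                   # start from an empty file
    find "$src_dir" -type f -name 'chr*.fa' ! -name '*_*' | sort |
    while IFS= read -r fa; do
        cat "$fa" >> "$out"
    done
}
```

Typical use, followed by a look at the FASTA headers to confirm that only primary chromosomes remain:

$ unzip chromFa.zip -d chromFa
$ build_nh_fasta chromFa hg18_nh.fa
$ grep '^>' hg18_nh.fa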

Provided

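The following hg18 data sets are bundled in a tarball that can be downloaded from http://rnaseq.gersteinlab.org/fusionseq/tarballs/FusionSeq_Annotation_Data_hg18_1.1.tar.gz (for hg19, see Human genome GRCh37/hg19 below):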

  • knownGeneAnnotationTranscriptCompositeModel.txt - the interval file with the coordinates of the composite models
  • knownGeneAnnotationTranscriptCompositeModel.fa - the sequences of all the composite transcripts
  • kgXref.txt - the mapping between the UCSC knownGene annotation set and other information (RefSeq, gene symbols and description etc.)
  • knownToTreefam.txt - the mapping between UCSC knownGene annotation and TreeFam
  • hg18_repeatMasker.interval - the interval file with the coordinates of the repetitive regions
  • ribosomal.2bit - the ribosomal sequences in 2bit format
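After unpacking the tarball, a simple check that all six hg18 files listed above are present can save a failed run later. The helper below is a hypothetical sketch using the hg18 file names from this list; because it looks in a single directory, it also confirms that the composite-model .txt and .fa files sit side by side, as required below.

```shell
# Check that DIR contains every hg18 annotation file listed above.
# Prints each missing file and returns 1 if any is absent.
check_annotation_dir() {
    dir=$1
    rc=0
    for f in knownGeneAnnotationTranscriptCompositeModel.txt \
             knownGeneAnnotationTranscriptCompositeModel.fa \
             kgXref.txt knownToTreefam.txt \
             hg18_repeatMasker.interval ribosomal.2bit; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, after tar -xzf FusionSeq_Annotation_Data_hg18_1.1.tar.gz, run check_annotation_dir on the directory the files were extracted into (we do not assume its name here).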

The composite model needs to be indexed by bowtie:

$ bowtie-build -f knownGeneAnnotationTranscriptCompositeModel.fa \
    /path/to/bowtie/Index/hg18_knownGeneAnnotationTranscriptCompositeModel/hg18_knownGeneAnnotationTranscriptCompositeModel
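bowtie-build writes the index as a set of files sharing the basename given as its last argument. Assuming bowtie 1 and its standard (small) index format, that set is six .ebwt files. The hypothetical helper below checks that all of them exist and are non-empty, which is a cheap way to spot an interrupted build; it applies equally to the genome index and to the composite-model index.

```shell
# Check that a bowtie 1 index with basename PREFIX is complete:
# PREFIX.{1,2,3,4}.ebwt and PREFIX.rev.{1,2}.ebwt must exist and be non-empty.
check_bowtie_index() {
    prefix=$1
    rc=0
    for ext in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

In practice we would create the output directory beforehand with mkdir -p rather than rely on bowtie-build to do it, then run, for example, check_bowtie_index /path/to/bowtie/Index/hg18_nh/hg18_nh once the build has finished.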

knownGeneAnnotationTranscriptCompositeModel.txt (the interval file) and knownGeneAnnotationTranscriptCompositeModel.fa (the sequences) should be located in the same directory.

Although we used the UCSC knownGene annotation set extensively, other gene annotation sets can also be used, provided that the same information is supplied to the corresponding programs in the same format.

Human genome GRCh37/hg19

The corresponding versions of these files for hg19 can be downloaded from http://rnaseq.gersteinlab.org/fusionseq/tarballs/FusionSeq_Annotation_Data_hg19_1.0.tar.gz
