FusionSeq FAQ
General Questions
Does FusionSeq work with my favorite alignment tool?
FusionSeq reads paired-end alignments in Mapped Read Format (MRF). We provide conversion tools for the most common alignment programs and formats, including SAM/BAM, to represent mapped reads in MRF. Please see RSEQtools for more information, specifically the section Format conversion utilities.
Does FusionSeq work with colorspace paired-end reads?
FusionSeq has been developed to be as independent as possible of the sequencing technology and the alignment tool. However, extensive testing was conducted on the Illumina Genome Analyzer II platform only.
Where can I obtain the annotation data for hg19?
Annotation data for hg19 can be found here.
Can I use FusionSeq with my favorite species?
In principle, you can run FusionSeq using any paired-end RNA-Seq data. However, you would need to provide the corresponding data that is currently used for human, i.e.:
- a genome sequence, in 2bit format
- a gene annotation set in interval format, including composite models of genes
- the sequences of the composite models in the gene annotation set
- a mapping between your gene annotation and TreeFam (optional, used by gfrLargeScaleHomologyFilter)
- a list of the repetitive regions, in interval format (optional, used by gfrRepeatMaskerFilter)
- a ribosomal sequence library in 2bit format (optional, used by gfrRibosomalFilter)
- the mapping between your gene annotation and other descriptive information, e.g. gene symbols, descriptions, etc. (optional, used by gfrAddInfo)
Where can I find some data sets to test FusionSeq?
Please find some test data sets here.
Is there a demo version of FusionSeq?
A demo version of the FusionSeq web interface is available here. You can access the results described in the paper by typing the sample ID (e.g. 106_T, 1700_D, etc.).
Where can I find more information?
The most up-to-date user documentation for FusionSeq is available here. If you are looking for the developer's documentation, you can find it here.
The BOWTIE_INDEXES directory is used for reference indexes as well as for temporary index files. However, I have a centralized repository of indexed genomes and cannot create the temporary files in that directory.
A workaround for this issue is to create a local directory where you have write permission. This solves the problem of generating temporary index files when running the junction-sequence identifier module. To also have the indexed genome and transcriptome in the same folder, you can link them symbolically, for example:
$ cd /path/to/local/folder/
$ ln -s /path/to/centralized/repository/hg18_nh/ .
$ ln -s /path/to/centralized/repository/hg18_knownGeneAnnotationTranscriptCompositeModel/ .
Now, BOWTIE_INDEXES in geneFusionConfig.h should point to /path/to/local/folder/ where the user can generate temporary files:
#define BOWTIE_INDEXES /path/to/local/folder
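Before rebuilding, it may help to confirm that the local folder is writable and that none of the symbolic links is dangling. The following is a minimal sketch, not part of FusionSeq; the function name check_index_dir is hypothetical.

```shell
# Hypothetical helper (not part of FusionSeq): succeed only if the given
# directory is writable and contains no dangling symbolic links.
check_index_dir() {
  idx_dir=$1
  if [ ! -d "$idx_dir" ] || [ ! -w "$idx_dir" ]; then
    echo "not a writable directory: $idx_dir" >&2
    return 1
  fi
  for idx_entry in "$idx_dir"/*; do
    # -L is true for a symlink; -e follows the link and fails if the target is missing
    if [ -L "$idx_entry" ] && [ ! -e "$idx_entry" ]; then
      echo "dangling link: $idx_entry" >&2
      return 1
    fi
  done
  return 0
}
```

For example, check_index_dir /path/to/local/folder returns a non-zero status and prints the offending entry if one of the links points to a location that does not exist.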
How can I cite FusionSeq?
Please cite this publication:
- Sboner A, Habegger L, Pflueger D, Terry S, Chen DZ, Rozowsky JS, Tewari AK, Kitabayashi N, Moss BJ, Chee MS, Demichelis F, Rubin MA, Gerstein MB. FusionSeq: a modular framework for finding gene fusions by analyzing Paired-End RNA-Sequencing data. Genome Biol 21 Oct. 2010; 11:R104 [1]
Compilation troubleshooting
Where can I find the BIOS library, required for FusionSeq?
As described in Requirements, the BIOS library can be downloaded as part of RSEQtools, a computational framework to analyze RNA-Seq data, or it can be downloaded as a separate component from here.
TROOT.h: No such file or directory
This error occurs because the compiler cannot find the TROOT.h file. This file is part of ROOT, a framework for mathematical and statistical analysis. If you have installed ROOT, please make sure that you have defined ROOTSYS as the path to the ROOT folder and added its bin directory to your PATH:
$ export ROOTSYS=/path/to/ROOT/
$ export PATH=$ROOTSYS/bin:$PATH
Please also see Installing and configuring ROOT for more details.
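A quick way to check whether the header is where the compiler would look for it is sketched below. It assumes that the ROOT headers live in $ROOTSYS/include, which is common but depends on how ROOT was installed; the function name has_troot_header is hypothetical.

```shell
# Hypothetical helper: succeed if TROOT.h exists in the given include
# directory. Defaults to $ROOTSYS/include, which is an assumption about
# the layout of the ROOT installation.
has_troot_header() {
  root_incdir=${1:-$ROOTSYS/include}
  [ -f "$root_incdir/TROOT.h" ]
}
```

If the headers are installed elsewhere, the include directory reported by ROOT itself can be passed instead, e.g. has_troot_header "$(root-config --incdir)".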
Running issues
FusionSeq does not find the annotation datasets. However, geneFusionConfig.h specifies their correct location and the files are present.
This error:
ls_createFromFile '$HOME/path/to/data/annotation_data.txt'
occurs because environment variables, such as $HOME, are not expanded. Please use full path names in geneFusionConfig.h to specify directory locations.
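To obtain the full path name to paste into geneFusionConfig.h, the shell can expand the variable for you. A minimal sketch; the function name abs_path is hypothetical.

```shell
# Hypothetical helper: print the absolute, symlink-free path of a directory.
# The subshell keeps the caller's working directory unchanged.
abs_path() {
  ( cd "$1" 2>/dev/null && pwd -P )
}
```

For example, abs_path "$HOME/path/to/data" prints the expanded path, or returns a non-zero status if the directory does not exist.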
I followed the instructions, but I still get many WARNINGs. Is this expected?
Yes, every program in FusionSeq provides some logging information. We recommend capturing the log data by redirecting STDERR (e.g. '2> fusionseq.log').
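Once the log has been captured, the number of warnings can be summarized rather than read line by line. The sketch below assumes that each warning line contains the literal string WARNING; the actual log format may differ, and the function name count_warnings is hypothetical.

```shell
# Hypothetical helper: print how many lines of a captured log contain the
# string "WARNING" (an assumption about the log format).
# grep -c exits with status 1 when the count is 0, hence the "|| true".
count_warnings() {
  grep -c 'WARNING' "$1" || true
}
```

For example, after running a program with '2> fusionseq.log', count_warnings fusionseq.log prints the number of matching lines.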
geneFusions: Segmentation Fault
There are a number of reasons why one gets this error. One possibility is the lack of sequences in the MRF file. Although MRF does not require sequences to be valid, they are required by geneFusions. Please ensure that sequences are present in the MRF file.
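One way to check for sequences is to inspect the MRF header line. The sketch below assumes the usual MRF layout: comment lines start with #, and the first remaining line is a tab-delimited header that names the columns, with the optional sequence column called Sequence. The function name mrf_has_sequences is hypothetical.

```shell
# Hypothetical helper: succeed if the MRF header line lists a "Sequence"
# column. Assumes comments start with '#' and the first non-comment line
# is the tab-delimited header.
mrf_has_sequences() {
  grep -v '^#' "$1" | head -n 1 | tr '\t' '\n' | grep -qx 'Sequence'
}
```

For example, mrf_has_sequences file.mrf || echo "no sequences in file.mrf" reports a file that geneFusions is likely to reject.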
gfrConfidenceValues: Cannot find the .meta file
The .meta file is required to run gfrConfidenceValues. This is a tab-delimited file including the number of mapped reads. A simple way to generate this file is to run:
$ MAPPED=$(grep -v "AlignmentBlock" file.mrf | grep -v "#" | wc -l); printf "Mapped_reads\t%d\n" $MAPPED > file.meta
The final file should look like:
Mapped_reads 123456789
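A slightly stricter variant is sketched below. It skips only lines that start with # or with AlignmentBlocks, so that reads whose IDs happen to contain a # character are still counted. This is an illustration under the same assumptions about the MRF layout, and the function name write_meta is hypothetical.

```shell
# Hypothetical helper: print the .meta line for an MRF file.
# Counts every line that is neither a comment (starts with '#') nor the
# header (starts with 'AlignmentBlocks').
write_meta() {
  awk '!/^#/ && !/^AlignmentBlocks/ { n++ }
       END { printf "Mapped_reads\t%d\n", n }' "$1"
}
```

For example, write_meta file.mrf > file.meta produces the tab-delimited file expected by gfrConfidenceValues.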
Paired-end reads and bowtie: I aligned each end separately. How do I convert the alignment file to MRF?
To convert bowtie alignments into MRF when the two ends were aligned separately, the two ends of each pair must be on consecutive lines. This can be partially achieved by concatenating and sorting the two alignment files, e.g. cat end_1.bowtie end_2.bowtie | sort > alignment.bowtie. However, in some cases only one end is mapped, which creates "singletons" in the alignment file, where only one end is reported. Since there have been many requests regarding this issue, we decided to share an "internal" utility: bowtiePairedFix. Here you can download the binary file:
The conversion command is:
cat end_1.bowtie end_2.bowtie | sort | bowtiePairedFix | bowtie2mrf paired -sequence > data.mrf 2> data.mrf.log
Please note that this program is also provided "as is".
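To get a sense of how many singletons the two alignment files contain, the read names can be compared directly. The sketch below assumes bowtie's default tab-delimited output, with the read name in the first column, and assumes that the two ends of a pair share a name that differs only by a trailing /1 or /2. It does not describe how bowtiePairedFix works internally; the function name count_singletons is hypothetical.

```shell
# Hypothetical helper: count read names that occur for only one end across
# the given bowtie alignment files.
# Assumes the read name is column 1 and mates are marked with /1 and /2.
count_singletons() {
  cut -f 1 "$@" | sed 's|/[12]$||' | sort | uniq -u | wc -l | tr -d ' '
}
```

For example, count_singletons end_1.bowtie end_2.bowtie prints the number of reads for which only one end was mapped, provided each end is reported at most once.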