FusionSeq FAQ

From GersteinInfo

Revision as of 13:26, 18 April 2011



General Questions

Does FusionSeq work with my favorite alignment tool?

FusionSeq reads paired-end alignments in Mapped Read Format (MRF). We provide tools that convert the output of the most common alignment programs and formats, including SAM/BAM, into MRF. Please see RSEQtools for more information, specifically the Format conversion utilities section.

Does FusionSeq work with colorspace paired-end reads?

FusionSeq has been developed to be as independent as possible of the sequencing technology and the alignment tool. However, extensive testing was conducted only on the Illumina Genome Analyzer II platform.

Where can I obtain the annotation data for hg19?

Annotation data for hg19 can be found here.

Can I use FusionSeq with my favorite species?

In principle, you can run FusionSeq using any paired-end RNA-Seq data. However, you would need to provide the same kinds of data that are currently used for human, i.e.:

  1. a genome sequence, in 2bit format
  2. a gene annotation set in interval format, including composite models of genes
  3. the sequences of the composite models in the gene annotation set
  4. a mapping between your gene annotation and TreeFam (optional, used by gfrLargeScaleHomologyFilter)
  5. a list of the repetitive regions, in interval format (optional, used by gfrRepeatMaskerFilter)
  6. a ribosomal sequence library in 2bit format (optional, used by gfrRibosomalFilter)
  7. the mapping between your gene annotation and other descriptive information, e.g. gene symbols, descriptions, etc. (optional, used by gfrAddInfo)

Where can I find some data sets to test FusionSeq?

Please find some test data sets here.

Is there a demo version of FusionSeq?

A demo version of the FusionSeq web interface is available here. You can access the results described in the paper by typing the sample ID (e.g. 106_T, 1700_D, etc.).

Where can I find more information?

The most up-to-date user documentation for FusionSeq is available here. If you are looking for the developer's documentation, you can find it here.

How can I cite FusionSeq?

Please cite this publication:

  • Sboner A, Habegger L, Pflueger D, Terry S, Chen DZ, Rozowsky JS, Tewari AK, Kitabayashi N, Moss BJ, Chee MS, Demichelis F, Rubin MA, Gerstein MB. FusionSeq: a modular framework for finding gene fusions by analyzing Paired-End RNA-Sequencing data. Genome Biol 21 Oct. 2010; 11:R104 [1]


Compilation troubleshooting

Where can I find the BIOS library, required for FusionSeq?

As described in Requirements, the BIOS library can be downloaded as part of RSEQtools, a computational framework to analyze RNA-Seq data, or it can be downloaded as a separate component from here.

TROOT.h: No such file or directory

This error occurs because the compiler cannot find the TROOT.h file. This file is part of ROOT, a framework for mathematical and statistical analysis. If you have installed ROOT, please make sure that you have defined ROOTSYS as the path to the ROOT folder and added its bin directory to your PATH:

$ export ROOTSYS=/path/to/ROOT/ 
$ export PATH=$ROOTSYS/bin:$PATH

Please also see Installing and configuring ROOT for more details.
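Before recompiling, it can help to confirm that the header is actually where ROOTSYS points. The following is a minimal sketch, assuming a ROOT layout with headers under $ROOTSYS/include (typical for source and tarball installs, but possibly different for distribution packages). The function name check_root_header is hypothetical and is not part of ROOT or FusionSeq.

```shell
# Hypothetical helper: report whether TROOT.h is present under a ROOT directory.
# Assumes headers live in <ROOT dir>/include, which may differ on your system.
check_root_header() {
    root_dir="${1:-$ROOTSYS}"
    header="${root_dir%/}/include/TROOT.h"   # strip one trailing slash, if any
    if [ -f "$header" ]; then
        echo "found: $header"
    else
        echo "missing: $header" >&2
        return 1
    fi
}
```

Calling check_root_header with no argument checks the directory in $ROOTSYS. If ROOT's root-config script is on your PATH, root-config --incdir prints the include directory that ROOT itself reports.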

Running issues

FusionSeq does not find the annotation datasets. However, geneFusionConfig.h specifies their correct location and the files are present.

This error:

ls_createFromFile    '$HOME/path/to/data/annotation_data.txt'

occurs because environment variables, such as $HOME, are not expanded. Please use full path names in geneFusionConfig.h to specify directory locations.
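One way to spot leftover variables is to search the configuration header for shell-style names. This is a rough sketch: the function name find_unexpanded_vars is hypothetical, and the pattern simply looks for a dollar sign followed by a letter, underscore, or opening brace, without assuming anything else about the layout of geneFusionConfig.h.

```shell
# Hypothetical helper: list the lines of a config header that still contain
# shell-style variables such as $HOME or ${HOME}, which will not be expanded.
# Prints "line_number:line" for each hit; returns non-zero if there are none.
find_unexpanded_vars() {
    grep -n '\$[A-Za-z_{]' "$1"
}
```

For example, find_unexpanded_vars geneFusionConfig.h prints each offending line with its line number, so that the variable can be replaced with the full path.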

I followed the instructions, but I still get many WARNINGs. Is this expected?

Yes, every program in FusionSeq writes some logging information. We recommend capturing the log data by redirecting STDERR (e.g. '2> fusionseq.log').
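The redirection can be illustrated as follows. Note that fake_step is a stand-in function invented for this example so that it runs anywhere; in practice it would be replaced by the actual FusionSeq program and its arguments. The file name step.out is likewise only an example.

```shell
# Stand-in for a FusionSeq program (hypothetical, for illustration only):
# results go to STDOUT, log messages go to STDERR.
fake_step() {
    echo "result line"
    echo "WARNING: example log message" >&2
}

# Send the results to one file and the log messages to another.
fake_step > step.out 2> fusionseq.log

# Count the WARNING lines that were captured in the log.
grep -c 'WARNING' fusionseq.log
```

Keeping the two streams apart means that the log messages never end up in the data passed to the next program.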

geneFusions: Segmentation Fault

There are a number of reasons why this error can occur. One possibility is that the MRF file does not contain the read sequences. Although MRF does not require sequences in order to be valid, geneFusions does require them. Please ensure that sequences are present in the MRF file.
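A quick check for the presence of sequences might look like the sketch below. It assumes that the MRF file starts with optional comment lines beginning with '#', followed by a tab-delimited header line naming the columns, and that the sequence column is called Sequence; please verify this against the MRF description in RSEQtools. The function name mrf_has_sequences is hypothetical.

```shell
# Hypothetical helper: succeed if the MRF header line lists a Sequence column.
# Assumes '#' comment lines and a tab-delimited header naming the columns.
mrf_has_sequences() {
    grep -v '^#' "$1" | head -n 1 | tr '\t' '\n' | grep -qx 'Sequence'
}
```

For example, mrf_has_sequences data.mrf || echo "no sequences in data.mrf" prints a message only when the column is absent.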

Paired-end reads and bowtie: I aligned each end separately. How do I convert the alignment file to MRF?

To convert bowtie alignments into MRF when the two ends were aligned separately, the two ends of each pair must appear on consecutive lines. This can be partially achieved by concatenating and sorting the two alignment files, e.g. cat end_1.bowtie end_2.bowtie | sort > alignment.bowtie. However, in some cases only one end is mapped, which leaves "singletons" in the alignment file, i.e. pairs for which only one end is reported. Since there have been many requests regarding this issue, we decided to share an "internal" utility, bowtiePairedFix. The binary can be downloaded here:

  • bowtiePairedFix (GNU/Linux x86_64): http://archive.gersteinlab.org/proj/rnaseq/fusionseq/tarballs/bowtiePairedFix.linux64
  • bowtiePairedFix (MacOs 10.6.7): http://archive.gersteinlab.org/proj/rnaseq/fusionseq/tarballs/bowtiePairedFix.MacOs.10.6.7

The conversion command is:

cat end_1.bowtie end_2.bowtie | sort | bowtiePairedFix | bowtie2mrf paired > data.mrf 2> data.mrf.log

Please note that this program is also provided "as is".
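To get a sense of how many singletons the two alignment files contain, one can count the read names that occur only once across both files. This is a rough sketch and does not describe how bowtiePairedFix works internally. It assumes bowtie's default tab-delimited output with the read name in the first column, optionally followed by a /1 or /2 mate suffix. The function name count_singletons is hypothetical.

```shell
# Hypothetical helper: count read names that appear in only one of the two
# bowtie alignment files (i.e. only one end of the pair was mapped).
# Assumes the read name is the first tab-delimited column, with an optional
# trailing /1 or /2 that is stripped before comparing names.
count_singletons() {
    cat "$1" "$2" | awk -F '\t' '
        { name = $1; sub(/\/[12]$/, "", name); seen[name]++ }
        END {
            count = 0
            for (name in seen) if (seen[name] == 1) count++
            print count
        }'
}
```

For example, count_singletons end_1.bowtie end_2.bowtie prints the number of reads for which only one end is reported.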
