
Data formats

FusionSeq uses a few data formats to perform its operations.

Mapped Read Format (MRF)

This format is defined in the context of RSEQtools; see the RSEQtools documentation for more details.

Gene Fusion Report (GFR)

This file format defines the relevant information for each fusion transcript candidate. The rationale is that, starting from an initial set of candidates, different filters can be applied to exclude false positives, i.e. artificial fusions. We also provide a parser that interprets this format, so that any change to the format can be propagated easily. For a given fusion candidate involving gene A and gene B, the basic GFR format requires the following fields:

the ID of the fusion candidate (id): typically it contains the sample name and a unique number separated by an underscore. The number is padded with zeros for consistency;
SPER, DASPER and RESPER: scoring of the fusion candidate;
Number of inter-transcript reads (numInter), i.e. the number of pairs having the ends mapped to the two genes;
P-value of the insert-size distribution analysis for the fusion transcript. Since we do not know the actual composition of the fusion transcript, we compute the p-value for both directions: AB (pValueAB, where gene A is upstream of gene B) and BA (pValueBA, where gene B is upstream of gene A);
Mean insert-size value of the minimal fusion transcript fragment. As before for the p-values, we compute both AB and BA versions (interMeanAB, interMeanBA);
Number of intra-transcript reads for gene A (numIntra1) and gene B (numIntra2), respectively, i.e. the number of pairs where both ends map to the same gene;
The type of the fusion (fusionType): cis, when both genes are on the same chromosome, or trans, otherwise;
Name(s) of the transcripts (nameTranscript): all the UCSC gene IDs of the isoforms of each gene in the annotation separated by the pipe symbol '|';
Chromosome of the genes (chromosomeTranscript);
Strand information (strandTranscript);
Start and end coordinates of the longest transcript for both genes (startTranscript, endTranscript);
Number of exons in the composite model for both genes (numExonTranscript);
Coordinates of the exons in the composite model (exonCoordinatesTranscript): each exon is separated by the pipe symbol '|' and start and end coordinates are comma-separated;
Exon-pair count: describes which elements are connected and the corresponding number of inter-transcript reads;
interReads: the pair-read type, as well as the exons and the coordinates of the reads joining the two genes. Pair-read type, exon number, start and end coordinates are reported as a comma-separated list, with the pipe symbol '|' separating the different pairs. The pair-read type encodes the categories into which a pair of reads can be classified with respect to the gene annotation set:
- 1 : exon-exon
- 2 : exon-intron
- 3 : intron-exon
- 4 : intron-intron
- 5 : intron-boundary
- 6 : exon-boundary
- 7 : boundary-exon
- 8 : boundary-intron
- 9 : boundary-boundary
Reads of the transcripts: the actual sequence of all the inter-reads;
Pair-count: a summary of the number of reads for each category and pair of joined exons (see interReads for the category definitions). The field reports the pair-read type, the number of reads, and the two exons joined by the pair, as a comma-separated list. The different pair types are separated by the pipe symbol '|'.
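
As an illustration of the field encodings described above, the following Python sketch decodes a few of them. It is not the parser distributed with FusionSeq: the function names, the padding width of the ID, and the assumption that each interReads item holds exactly four integer values are hypothetical choices made for this example.

```python
# Pair-read type codes, as listed for the interReads field.
PAIR_TYPES = {
    1: "exon-exon",
    2: "exon-intron",
    3: "intron-exon",
    4: "intron-intron",
    5: "intron-boundary",
    6: "exon-boundary",
    7: "boundary-exon",
    8: "boundary-intron",
    9: "boundary-boundary",
}


def make_candidate_id(sample, number, width=5):
    """Join sample name and zero-padded number with an underscore.

    The padding width is an assumption; the format only says the number is padded.
    """
    return f"{sample}_{number:0{width}d}"


def fusion_type(chrom_a, chrom_b):
    """Return 'cis' when both genes are on the same chromosome, else 'trans'."""
    return "cis" if chrom_a == chrom_b else "trans"


def parse_exon_coordinates(field):
    """Parse 'start,end|start,end|...' into a list of (start, end) tuples."""
    exons = []
    for exon in field.split("|"):
        start, end = exon.split(",")
        exons.append((int(start), int(end)))
    return exons


def parse_inter_reads(field):
    """Parse 'type,exon,start,end|...' into a list of dictionaries.

    Assumes four integer values per item, in the order given in the text.
    """
    reads = []
    for item in field.split("|"):
        code, exon, start, end = (int(value) for value in item.split(","))
        reads.append(
            {"pair_type": PAIR_TYPES[code], "exon": exon, "start": start, "end": end}
        )
    return reads


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(make_candidate_id("sampleX", 7))
    print(fusion_type("chr21", "chr21"))
    print(parse_exon_coordinates("100,200|300,450"))
    print(parse_inter_reads("1,2,150,200|6,3,400,450"))
```

Keeping the pair-read codes in a single lookup table means that the interReads and Pair-count fields can share the same category definitions.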

The GFR format can include additional optional information computed in the subsequent processing. For example, it is possible to add gene symbols (geneSymbolTranscript) and descriptions (descriptionTranscript) from the UCSC knownGene annotation set.
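
Because these fields are optional, code that consumes GFR data should not assume they are present. The sketch below assumes, purely for illustration, that the fields for one gene of a candidate have already been loaded into a Python dictionary keyed by field name; the helper names and the sample values are hypothetical.

```python
# Optional fields named in the text; later processing steps may add others.
OPTIONAL_FIELDS = ("geneSymbolTranscript", "descriptionTranscript")


def split_fields(record):
    """Separate the optional annotation fields from the basic ones."""
    optional = {k: v for k, v in record.items() if k in OPTIONAL_FIELDS}
    basic = {k: v for k, v in record.items() if k not in OPTIONAL_FIELDS}
    return basic, optional


def gene_label(record):
    """Prefer the gene symbol when available, else fall back to the transcript names."""
    return record.get("geneSymbolTranscript") or record["nameTranscript"]


if __name__ == "__main__":
    # Made-up records: one without and one with the optional annotation.
    plain = {"nameTranscript": "idA|idB", "chromosomeTranscript": "chr21"}
    annotated = dict(plain, geneSymbolTranscript="GENE1")
    print(gene_label(plain))
    print(gene_label(annotated))
    print(split_fields(annotated))
```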
