AlleleSeq
From GersteinInfo
Revision as of 16:18, 7 June 2013
General outline of pipeline
The basic goal of the pipeline is to take a large collection of reads generated from ChIP-seq or RNA-seq experiments on an individual and detect single nucleotide variants (SNVs) at which the reads are significantly skewed toward one of the two parental alleles. To do this, the pipeline runs a pre-processing step before the main analysis.
(1) Pre-processing - diploid genome construction using vcf2diploid
In the Rozowsky et al. (2011) paper, the
pre-processing step separates (phases) the child's diploid genome into its parental
haplotypes based on the sequences of the parents.
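The idea behind trio-based phasing can be illustrated at a single site: a child allele can only be maternal if the mother carries it, and only paternal if the father carries it. The sketch below is our own illustration, not vcf2diploid's code; the function name phase_trio and the representation of genotypes as 2-tuples of allele strings are assumptions.

```python
def phase_trio(child, mother, father):
    """Assign the child's two alleles to maternal/paternal haplotypes.

    Hypothetical helper. Each genotype is a 2-tuple of allele strings,
    e.g. ("A", "G"). Returns (maternal_allele, paternal_allele), or None
    if the site cannot be phased unambiguously (for example, all three
    individuals are heterozygous) or violates Mendelian inheritance.
    """
    a, b = child
    candidates = set()
    # Try both orderings of the child's alleles.
    for mat, pat in ((a, b), (b, a)):
        if mat in mother and pat in father:
            candidates.add((mat, pat))
    # Exactly one consistent ordering means the site is phased.
    if len(candidates) == 1:
        return candidates.pop()
    return None
```

For example, a child that is A/G with an A/A mother and an A/G father can only have received G from the father, so the call returns ("A", "G").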
(2) AlleleSeq pipeline - mapping and statistical testing using PIPELINE.mk package
a) Reads from ChIP-seq and RNA-seq experiments are mapped to both parental
haplotype genomes.
b) Then, for each SNV position covered by mapped reads, we compare the numbers of
reads supporting the maternal and paternal alleles and test whether they differ significantly.
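One way to test whether the read counts at an SNV are skewed is an exact binomial test against an expected 50:50 split between the two alleles. The sketch below assumes that test; it is an illustration, not PIPELINE.mk's implementation, and the function names binomial_two_sided_p and allele_bias_test are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under Binomial(n, p).
    """
    if n == 0:
        return 1.0
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def allele_bias_test(maternal_reads, paternal_reads):
    """Hypothetical helper: p-value for allelic imbalance at one SNV."""
    n = maternal_reads + paternal_reads
    return binomial_two_sided_p(maternal_reads, n)
```

Because many SNVs are tested at once, the resulting p-values would still need a multiple-testing correction before calling sites significant.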
vcf2diploid
Essentially, vcf2diploid constructs a personal diploid genome by integrating the variants from the parents and child into the reference genome.
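Building one haplotype amounts to copying the reference and substituting each variant allele carried on that haplotype. The sketch below shows that step for a single sequence; it is not vcf2diploid's code. It assumes the variants are VCF-style (1-based position, REF, ALT), already assigned to this haplotype, and non-overlapping; the function name build_haplotype is hypothetical.

```python
def build_haplotype(reference, variants):
    """Apply VCF-style variants to a reference sequence (hypothetical helper).

    reference: reference sequence as a string.
    variants: list of (pos, ref, alt) with 1-based pos as in VCF,
              non-overlapping and all carried on this haplotype.
    """
    pieces = []
    cursor = 0  # next 0-based reference index not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        pieces.append(reference[cursor:start])  # unchanged stretch
        pieces.append(alt)                      # variant allele
        cursor = start + len(ref)               # skip the replaced REF bases
    pieces.append(reference[cursor:])
    return "".join(pieces)
```

For instance, build_haplotype("ACGTACGT", [(4, "T", "TAA")]) inserts two bases after position 4 and returns "ACGTAAACGT".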
Installation
1. Download the tool.
2. Build it with make:
$ make
Usage
java -Xmx10000m -jar vcf2diploid.jar -id sample_id -chr file1.fa file2.fa ... [-vcf file1.vcf file2.vcf ...] > logfile.txt
OPTIONS:
id - (required) the ID of the individual whose genome is being constructed (e.g., NA12878). The tool uses this ID to identify the individual's genotypes in the VCF file.
chr - (required) FASTA file(s) of the reference sequence(s)
vcf - (required) VCF4.0 file(s) containing variants from the parents and the individual
Xmx - maximum memory allocation for Java. In this example, 10000 MB (about 10 GB) was allocated.
logfile.txt - stores the standard output produced by the run
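When running vcf2diploid from a script, the same command line can be assembled programmatically. The sketch below uses only the flags shown in the usage line above; the helper names and the example file names are hypothetical.

```python
import subprocess


def vcf2diploid_command(sample_id, fasta_files, vcf_files,
                        jar="vcf2diploid.jar", max_mem="10000m"):
    """Build the argument list for the usage line above (hypothetical helper)."""
    cmd = ["java", f"-Xmx{max_mem}", "-jar", jar,
           "-id", sample_id, "-chr", *fasta_files]
    if vcf_files:
        cmd += ["-vcf", *vcf_files]
    return cmd


def run_vcf2diploid(cmd, logfile="logfile.txt"):
    """Run the command, sending standard output to the log file."""
    with open(logfile, "w") as log:
        subprocess.run(cmd, stdout=log, check=True)
```

For example, vcf2diploid_command("NA12878", ["chr1.fa", "chr2.fa"], ["trio.vcf"]) yields the argument list for the command shown in the usage line; passing it to run_vcf2diploid reproduces the `> logfile.txt` redirection.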
Output
(a) Maternal and paternal FASTA files
These are the references used for the AlleleSeq pipeline.
(b) Map files
These are coordinate files that relate positions on the parental genomes to positions on
the reference genome. They are especially important when insertions and deletions
are included in the construction of the diploid genome, since indels shift positions in the personal genome out of sync with the reference.
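The actual map-file layout is defined by vcf2diploid (see its README). Rather than guess at that format, the sketch below shows the idea the map files capture: each indel before a position shifts it by len(ALT) - len(REF), and reference bases removed by a deletion have no counterpart on the haplotype. It assumes the same VCF-style, non-overlapping variants carried on one haplotype; the function name ref_to_hap is hypothetical.

```python
def ref_to_hap(pos, variants):
    """Map a 1-based reference position onto a haplotype (hypothetical helper).

    variants: list of (pos, ref, alt), 1-based, non-overlapping, on this
    haplotype. Returns None for reference bases deleted on the haplotype.
    """
    shift = 0
    for vpos, ref, alt in sorted(variants):
        vend = vpos + len(ref) - 1  # last reference base covered by REF
        if vend < pos:
            # Variant lies entirely before pos: accumulate its length change.
            shift += len(alt) - len(ref)
        elif vpos <= pos:
            # pos lies inside this variant's REF allele; map it position-wise
            # onto ALT if ALT is long enough, otherwise the base was deleted.
            offset = pos - vpos
            return vpos + shift + offset if offset < len(alt) else None
        else:
            break  # remaining variants start after pos
    return pos + shift
```

With a 2-base insertion after reference position 4, reference position 5 lands at haplotype position 7; with the deletion ACG to A at position 5, reference position 6 returns None.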
(c) Chain files
With the chain files, the UCSC LiftOver tool can convert annotation coordinates
from the reference genome to the personal haplotypes.
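To show what a chain file encodes, the sketch below lifts one position through a single chain in UCSC chain format: a header line (chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id) followed by "size dt dq" lines and a final "size" line, all 0-based. It assumes the reference is the chain's target side and the haplotype is the query side, and it handles only + strands and one chain; real files can contain many chains, so LiftOver itself should be used in practice. The function name lift_position is hypothetical.

```python
def lift_position(chain_text, t_pos):
    """Lift a 0-based target (reference) position to the query (haplotype).

    Hypothetical helper for a single UCSC-format chain on + strands.
    Returns (qName, q_pos), or None if t_pos falls in a gap or outside
    the chain.
    """
    lines = [ln.split() for ln in chain_text.strip().splitlines() if ln.strip()]
    header = lines[0]
    if header[0] != "chain":
        raise ValueError("expected a chain header line")
    t_strand, t_start = header[4], int(header[5])
    q_name, q_strand, q_start = header[7], header[9], int(header[10])
    if t_strand != "+" or q_strand != "+":
        raise NotImplementedError("this sketch only handles + strands")
    t, q = t_start, q_start
    for fields in lines[1:]:
        size = int(fields[0])  # ungapped aligned block
        if t <= t_pos < t + size:
            return q_name, q + (t_pos - t)
        t += size
        q += size
        if len(fields) == 3:  # gap after this block: dt on target, dq on query
            t += int(fields[1])
            q += int(fields[2])
    return None
```

For example, a chain with blocks "10 0 2" and "10" describes a 2-base insertion on the haplotype after the first 10 bases, so reference position 12 lifts to haplotype position 14.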
Please refer to the README of vcf2diploid for a more detailed description.