Revision as of 15:45, 7 June 2013

General outline of pipeline

The basic goal of the pipeline is to take a large collection of reads generated from ChIP-seq or RNA-seq experiments associated with an individual and detect single nucleotide variants (SNVs) at which the number of reads supporting each allele is significantly skewed. The workflow has two stages: a pre-processing step, followed by the AlleleSeq pipeline proper.

- Pre-processing: diploid genome construction using vcf2diploid. Assuming that the individual is part of a trio (father-mother-child), the pre-processing step separates (phases) the child's diploid genome into its two parental haplotypes based on the parents' sequences. The genotypes of the trio are then used in the subsequent AlleleSeq pipeline.

- AlleleSeq pipeline: mapping and statistical testing using the PIPELINE.mk package.

(a) Reads from ChIP-seq and RNA-seq experiments are aligned to both haplotype genomes, and the best match for each read is kept. This eliminates the reference bias that would arise if the reads were mapped to the standard human reference genome.

(d) Then, for each SNV position with mapped reads, we compare the allele frequencies observed in the two parental haplotypes. Candidate SNVs showing allele-specific effects are identified using a statistical framework that assigns a statistical significance to each SNV.
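The trio-based phasing in the pre-processing step can be illustrated with a minimal Python sketch. This is not vcf2diploid's code. It assumes a single biallelic site, genotypes given as unordered allele pairs, and a hypothetical helper `phase_child` that assigns each of the child's alleles to the parent who could have transmitted it. Sites where both orientations fit (for example, child and both parents heterozygous) or where neither fits (a Mendelian inconsistency) are left unphased.

```python
from typing import Optional, Tuple

# Hypothetical representation: an unphased genotype as a pair of alleles, e.g. ("A", "G").
Genotype = Tuple[str, str]


def phase_child(child: Genotype, father: Genotype, mother: Genotype) -> Optional[Tuple[str, str]]:
    """Return (paternal_allele, maternal_allele) for the child, or None
    if the site cannot be phased unambiguously from the trio."""
    a, b = child
    candidates = set()
    # Try both orientations of the child's two alleles.
    for pat, mat in ((a, b), (b, a)):
        if pat in father and mat in mother:
            candidates.add((pat, mat))
    # Exactly one consistent orientation means the site is phased.
    # Zero means a Mendelian inconsistency; two means the phase is ambiguous.
    if len(candidates) == 1:
        return candidates.pop()
    return None
```

For example, a child `("G", "A")` with father `("A", "A")` and mother `("G", "T")` is phased as paternal `A`, maternal `G`, because the father can only have transmitted `A`.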
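Step (a), picking the best match for each read, might look like the sketch below. It assumes each read has already been aligned to both haplotype genomes and summarized by a mismatch count (`None` when the read did not align). The function name `pick_best_haplotype` and the label `"either"` for equally good hits are hypothetical choices and are not taken from PIPELINE.mk.

```python
from typing import Optional


def pick_best_haplotype(pat_mismatches: Optional[int], mat_mismatches: Optional[int]) -> Optional[str]:
    """Choose the haplotype genome a read aligns to best.

    Each argument is the mismatch count of the read's alignment to that
    haplotype, or None if the read did not align there. Returns "paternal",
    "maternal", "either" (equally good), or None (unmapped on both).
    """
    if pat_mismatches is None and mat_mismatches is None:
        return None
    # Paternal wins if it is the only alignment or has strictly fewer mismatches.
    if mat_mismatches is None or (pat_mismatches is not None and pat_mismatches < mat_mismatches):
        return "paternal"
    # Maternal wins symmetrically.
    if pat_mismatches is None or mat_mismatches < pat_mismatches:
        return "maternal"
    # Equal mismatch counts: the read does not distinguish the haplotypes.
    return "either"


# Usage: hypothetical (paternal, maternal) mismatch counts per read.
reads = {"r1": (0, 2), "r2": (1, 0), "r3": (0, 0), "r4": (None, 3)}
assignments = {name: pick_best_haplotype(p, m) for name, (p, m) in reads.items()}
# assignments == {"r1": "paternal", "r2": "maternal", "r3": "either", "r4": "maternal"}
```

In this sketch, reads labelled `"either"` carry no allele information and would simply not be counted toward either allele.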
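The wiki does not name the test used in step (d). One common choice for allelic imbalance is an exact two-sided binomial test against an expected 50:50 ratio of paternal to maternal reads, which is sketched below as an assumption. The function names and the `alpha` cutoff are hypothetical. The sketch also applies no multiple-testing correction, which a genome-wide scan would need.

```python
import math
from typing import Dict, Tuple


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k paternal reads out of n, against p = 0.5."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if n == 0:
        return 1.0
    # Under p = 0.5 every outcome i has probability C(n, i) / 2**n,
    # so integer binomial coefficients can be compared exactly.
    observed = math.comb(n, k)
    # Sum all outcomes that are no more likely than the observed one.
    extreme = sum(math.comb(n, i) for i in range(n + 1) if math.comb(n, i) <= observed)
    return extreme / 2 ** n


def call_allele_specific(counts: Dict[str, Tuple[int, int]], alpha: float = 0.05) -> Dict[str, float]:
    """Map SNV id -> p-value for SNVs whose (paternal, maternal) read counts are skewed."""
    hits = {}
    for snv, (pat, mat) in counts.items():
        p = binomial_two_sided_p(pat, pat + mat)
        if p < alpha:
            hits[snv] = p
    return hits
```

With 10 reads all from one haplotype, the p-value is 2/1024 (about 0.002). A 5:5 split gives 1.0.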

vcf2diploid

The AlleleSeq pipeline from the Rozowsky et al. paper requires a pre-processing step in which a diploid genome is constructed from the parental sequences using the Perl script vcf2diploid.
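The sketch below gives a rough picture of what diploid genome construction involves. It applies phased SNVs to a reference sequence to produce a paternal and a maternal haplotype. This is a simplified assumption and not the vcf2diploid implementation. Positions are 0-based, and only single-base substitutions are handled, because indels and structural variants would shift coordinates and need extra bookkeeping. The name `build_haplotypes` is hypothetical.

```python
from typing import Dict, Tuple


def build_haplotypes(reference: str, phased_snvs: Dict[int, Tuple[str, str]]) -> Tuple[str, str]:
    """Apply phased SNVs to a reference sequence.

    phased_snvs maps a 0-based position to (paternal_allele, maternal_allele).
    Returns the (paternal, maternal) haplotype sequences.
    """
    paternal = list(reference)
    maternal = list(reference)
    for pos, (pat, mat) in phased_snvs.items():
        if not 0 <= pos < len(reference):
            raise ValueError(f"position {pos} is outside the reference")
        # Only substitutions keep both haplotypes in reference coordinates.
        if len(pat) != 1 or len(mat) != 1:
            raise ValueError(f"position {pos}: only single-base substitutions are supported")
        paternal[pos] = pat
        maternal[pos] = mat
    return "".join(paternal), "".join(maternal)
```

For example, `build_haplotypes("ACGTACGT", {1: ("C", "T"), 5: ("G", "G")})` yields `"ACGTAGGT"` for the paternal haplotype and `"ATGTAGGT"` for the maternal one. Position 5 is homozygous and changes both copies, while position 1 differs only on the maternal copy.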

AlleleSeq

From GersteinInfo

Rozowsky et al., in preparation.

AlleleSeq pipeline

Results for GM12878

Assembled diploid genome for NA12878 using phased SNPs, indels and SVs: http://sv.gersteinlab.org/NA12878_diploid