Revision as of 16:11, 7 June 2013

General outline of pipeline

The basic goal of the pipeline is to take a large collection of reads generated from ChIP-seq or RNA-seq experiments on an individual and detect single nucleotide variants (SNVs) at which the number of reads is significantly skewed toward one allele. To do this, the pipeline runs a pre-processing step before the main analysis.

(1) Pre-processing - diploid genome construction using vcf2diploid
In the Rozowsky et al. (2011) paper, the pre-processing step separates (phases) the child's diploid genome into its parental haplotypes based on the sequences of the parents.

(2) AlleleSeq pipeline - mapping and statistical testing using PIPELINE.mk package
a) Reads from ChIP-seq and RNA-seq experiments are aligned and mapped to both haplotype genomes. b) Then for each SNV position with mapped reads, we compare the allele frequencies observed in the two parental haplotypes.
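As an illustration of step b), the comparison at a single SNV can be sketched as an exact two-sided binomial test on the read counts from the two haplotypes. This is a minimal sketch under an assumed 50:50 null; the function name and the choice of test are assumptions for illustration and are not taken from PIPELINE.mk.

```python
from math import comb


def binomial_two_sided_p(maternal_reads, paternal_reads, p_null=0.5):
    """Exact two-sided binomial p-value for allelic skew at one SNV.

    Assumption: under the null, each read comes from the maternal
    haplotype with probability p_null (0.5 means no allelic bias).
    """
    n = maternal_reads + paternal_reads
    if n == 0:
        return 1.0  # no reads, no evidence of skew
    # Probability of every possible maternal read count 0..n under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[maternal_reads]
    # Sum the outcomes that are at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    print(binomial_two_sided_p(10, 0))
    print(binomial_two_sided_p(5, 5))
```

A small p-value marks a candidate allele-specific SNV. Because many SNVs are tested, a real analysis would also need a multiple-testing correction, which this sketch omits.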

vcf2diploid

Essentially, it constructs a personal genome by integrating the variants from the parents and child into the reference genome.
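The idea can be sketched in a few lines for the simplest case. The example below is a toy illustration, not vcf2diploid's implementation: it handles only phased SNVs (no indels), and the function name and the tuple layout of the variants are hypothetical.

```python
def build_haplotypes(reference, phased_snvs):
    """Apply phased SNVs to a reference sequence and return two haplotypes.

    phased_snvs is a hypothetical in-memory form of phased variants:
    (position, ref_allele, paternal_allele, maternal_allele), where
    position is 1-based as in VCF.
    """
    paternal = list(reference)
    maternal = list(reference)
    for pos, ref, pat_allele, mat_allele in phased_snvs:
        i = pos - 1  # convert the 1-based position to a 0-based index
        if reference[i] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        paternal[i] = pat_allele
        maternal[i] = mat_allele
    return "".join(paternal), "".join(maternal)


if __name__ == "__main__":
    pat, mat = build_haplotypes("ACGTAC", [(2, "C", "T", "C"), (5, "A", "A", "G")])
    print(pat)  # ATGTAC
    print(mat)  # ACGTGC
```

Restricting the sketch to SNVs keeps coordinates identical between the reference and both haplotypes; insertions and deletions would shift positions and need a coordinate map.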

Installation

1. Download the tool.
2. Type

$ make

Usage

java -Xmx10000m -jar vcf2diploid.jar -id sample_id -chr file1.fa file2.fa ... [-vcf file1.vcf file2.vcf ...] > logfile.txt

OPTIONS:
id          - (required) the ID of the individual whose genome is being constructed (e.g., NA12878). The tool uses this ID to find the individual in the VCF file

chr         - (required) FASTA file(s) of reference sequence(s) 

vcf         - (required) VCF4.0 file(s) containing variants from parents and the individual 


Xmx         - maximum memory allocation for Java. In this example, 10 GB was allocated.
logfile.txt - stores the standard output produced by the run
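When the tool is run from a script, the command line above can be assembled programmatically. The helper below is a sketch that uses only the flags listed in the usage line; the function name and the example file names are hypothetical, and it only builds the argument list without running anything.

```python
def vcf2diploid_command(sample_id, fasta_files, vcf_files, max_heap_mb=10000):
    """Build the vcf2diploid argument list from the documented options."""
    if not fasta_files:
        raise ValueError("at least one reference FASTA file is required")
    cmd = [
        "java",
        f"-Xmx{max_heap_mb}m",  # maximum Java heap size, in megabytes
        "-jar",
        "vcf2diploid.jar",
        "-id",
        sample_id,
        "-chr",
        *fasta_files,
    ]
    # The usage line shows -vcf in brackets, so it is added only when given.
    if vcf_files:
        cmd += ["-vcf", *vcf_files]
    return cmd


if __name__ == "__main__":
    # File names here are placeholders, not files shipped with the tool.
    print(" ".join(vcf2diploid_command("NA12878", ["chr1.fa"], ["trio.vcf"])))
```

The returned list can be passed to Python's subprocess.run with stdout pointed at an open log file to reproduce the "> logfile.txt" redirection.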


AlleleSeq
