Funseq

From GersteinInfo

Revision as of 20:51, 6 May 2013 by Public

Installation

A. Required Tools

The following tools are REQUIRED for FunSeq:
1) Bedtools
2) Samtools
3) Tabix
4) VAT (Variant Annotation Tool). A good installation guide for VAT can be found here.
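A quick way to confirm that the command-line tools above are installed is to look them up on the PATH. The helper below is a hypothetical sketch (check_tools is not part of FunSeq) built on the POSIX `command -v` builtin. It assumes the executables are named bedtools, samtools and tabix, and it does not check VAT, whose executable names are not given on this page.

```shell
# Hypothetical helper (not part of FunSeq): report which of the given
# executables can be found on PATH; returns 1 if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    # command -v prints the resolved location and fails if the tool is absent
    if loc=$(command -v "$tool" 2>/dev/null); then
      echo "found:   $tool ($loc)"
    else
      echo "MISSING: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools bedtools samtools tabix` prints each tool's location and returns a non-zero status if any of them is missing.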

B. Perl Requirements

1) Make sure you have Perl 5 or later. The latest Perl release can be downloaded here.
2) Install the Parallel::ForkManager module, which FunSeq uses to run jobs in parallel. The module can be found here.

C. FunSeq tool installation

FunSeq is a Perl-based tool that runs on Linux/UNIX. At the command-line prompt, enter the following:

$ cd FUNSEQ/ 
$ perl Makefile.PL
$ make
$ make test
$ make install


D. Required Data Files

Download all of the following data files from http://funseq.gersteinlab.org/data/ and put them in a new folder, $path/funseq-0.1/data/:

1. 1kg.phase1.snp.bed.gz (BED format)
Contents : all 1000 Genomes (1KG) Phase I SNVs in BED format.
Columns : chromosome, SNV start position (0-based), SNV end position, MAF (minor allele frequency)
Purpose : to filter out common variants by comparing input SNVs against 1KG SNVs.

2. ENCODE.annotation.gz (BED format)
Contents : annotations compiled from ENCODE, Gencode v7 and other sources, including DNase I hypersensitive sites (DHSs), TF peaks, pseudogenes, ncRNAs and enhancers.
Columns : chromosome, annotation start position (0-based), annotation end position, annotation name
Purpose : to find SNVs in annotated regions.

3. ENCODE.tf.bound.union.bed (BED format)
Contents : transcription factor (TF) motifs within ENCODE TF peaks.
Columns : chromosome, start position (0-based), end position, motif name, , strand, TF name
Purpose : for motif-breaking analysis.

4. gencode7.cds.bed (BED format)
Contents : coding sequence (CDS) coordinates extracted from Gencode v7.
Columns : chromosome, start position, end position
Purpose : to extract SNVs in CDS regions.

5. gencode.v7.promoter.bed (BED format)
Contents : compiled promoter regions, defined as 2.5 kb upstream of the transcription start site (TSS).
Columns : chromosome, start, end, gene, whether the gene is a hub in the protein-protein interaction (PPI) network or the regulatory (REG) network
Purpose : to associate promoter SNVs with their genes.

6. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval
Purpose : input for the Variant Annotation Tool (VAT); Gencode v7.

7. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa
Purpose : input for the Variant Annotation Tool (VAT); Gencode v7.

8. DRM_transcript_pairs_modify
Contents : distal regulatory modules (DRMs) with associated gene information.
Purpose : to associate enhancer SNVs with their genes.

9. Pouya.motif
Contents : position weight matrices (PWMs) for TF motifs.
Purpose : for motif-breaking calculations.

10. PPI.hubs.txt
Purpose : hub genes in the protein-protein interaction (PPI) network.

11. REG.hubs.txt
Purpose : hub genes in the regulatory network.

12. GENE.strong_selection.txt
Purpose : genes under strong negative selection, identified by the fraction of rare SNVs among non-synonymous variants.

13. human_ancestor_GRCh37_e59/*
Contents : this directory contains human ancestral alleles for the hg19 (GRCh37) reference genome.
Purpose : for motif-breaking calculations in personal or germline genomes.
* Note : for somatic analysis, these files are not needed.
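The first two data files drive the initial filtering and annotation of input SNVs. The shell sketch below illustrates those two steps with plain awk; it is not FunSeq's implementation. It assumes tab-separated BED input with matching chromosome names, 0-based half-open intervals, and that -maf removes input SNVs matching a 1KG SNV whose MAF is strictly greater than the threshold. The function names are hypothetical.

```shell
# Illustrative sketch only (not FunSeq's code). Assumes tab-separated BED
# files, identical chromosome naming (e.g. "chr1") in all files, and
# 0-based, half-open coordinates.

# Hypothetical helper: drop input SNVs whose exact (chrom, start, end) is a
# 1KG SNV with MAF strictly above the threshold (assumed -maf semantics).
# $1 = 1kg.phase1.snp.bed.gz   $2 = MAF threshold   $3 = input SNV BED file
filter_common_snvs() {
  gzip -dc "$1" | awk -v maf="$2" -v snvs="$3" '
    # Lines from stdin (the decompressed 1KG file): remember common SNVs
    FILENAME != snvs { if ($4 + 0 > maf + 0) common[$1 ":" $2 ":" $3] = 1; next }
    # Lines from the input file: print only those not marked as common
    !(($1 ":" $2 ":" $3) in common)
  ' - "$3"
}

# Hypothetical helper: print each SNV with the name of every annotation it
# overlaps. Half-open intervals [s1,e1) and [s2,e2) overlap iff s1 < e2 && s2 < e1.
# Naive nested loop, so only suitable for small files.
# $1 = ENCODE.annotation.gz   $2 = input SNV BED file
annotate_snvs() {
  gzip -dc "$1" | awk -v snvs="$2" 'BEGIN { OFS = "\t" }
    FILENAME != snvs { n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; name[n] = $4; next }
    {
      for (i = 1; i <= n; i++)
        if (chr[i] == $1 && s[i] < $3 + 0 && $2 + 0 < e[i])
          print $1, $2, $3, name[i]
    }
  ' - "$2"
}
```

In practice bedtools, which FunSeq already requires, handles large files far more efficiently: `bedtools intersect -wa -wb` reports overlapping pairs, and `bedtools intersect -v` keeps only entries of `-a` with no overlap in `-b`.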
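The TF motif and PWM files (ENCODE.tf.bound.union.bed and Pouya.motif) are used to score how a variant disrupts a TF motif. This page does not give FunSeq's formula, so the sketch below only illustrates the general idea and is not FunSeq's method: it compares the reference and alternative base probabilities at one PWM position as log2((p_ref + c) / (p_alt + c)), with an arbitrary pseudocount c = 0.001. The function name and its inputs are hypothetical, and it does not parse the Pouya.motif file format.

```shell
# Illustrative sketch only: NOT FunSeq's motif-breaking formula.
# Hypothetical helper: log2 ratio of reference vs alternative base
# probabilities at one PWM position, with an arbitrary pseudocount.
# $1..$4 = probabilities of A, C, G, T at the mutated motif position
# $5 = reference base, $6 = alternative base (uppercase A/C/G/T)
motif_break_delta() {
  awk -v pa="$1" -v pc="$2" -v pg="$3" -v pt="$4" -v ref="$5" -v alt="$6" 'BEGIN {
    p["A"] = pa; p["C"] = pc; p["G"] = pg; p["T"] = pt
    if (!(ref in p) || !(alt in p)) { print "bases must be A, C, G or T" > "/dev/stderr"; exit 1 }
    pseudo = 0.001   # avoids log(0) for zero-probability bases
    # Positive: the reference base fits the motif better, so the variant weakens it
    printf "%.3f\n", log((p[ref] + pseudo) / (p[alt] + pseudo)) / log(2)
  }'
}
```

For example, `motif_break_delta 0.799 0.099 0.051 0.051 A C` prints 3.000, because (0.799 + 0.001) / (0.099 + 0.001) = 8.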

Running FunSeq

Usage : ./funseq -f file -maf maf -m <1/2> -inf <bed/vcf> -outf <bed/vcf>
       Options :
               -f              input SNV file
               -maf            minor allele frequency (MAF) threshold for filtering against 1KG Phase I SNVs (a value from 0 to 1)
               -m              1 - somatic genome; 2 - germline or personal genome
               -inf            input format - BED or VCF
               -outf           output format - BED or VCF
       Default : -maf 0 -m 1 -outf vcf
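The usage line above can be wrapped in a small script that checks option values before calling FunSeq. This is a hypothetical sketch (run_funseq and DRY_RUN are illustrative names, not FunSeq features). It passes only the documented options, assumes ./funseq is in the current directory as in the usage line, and accepts both bed/vcf and BED/VCF because the usage text shows both spellings.

```shell
# Hypothetical wrapper around the documented ./funseq options.
# Checks values against the ranges in the usage text, then runs ./funseq;
# with DRY_RUN=1 it only prints the command.
run_funseq() {
  input=$1 maf=$2 mode=$3 inf=$4 outf=$5
  # -maf must be a non-negative number no greater than 1
  if ! awk -v m="$maf" 'BEGIN { exit !(m ~ /^([0-9]+\.?[0-9]*|\.[0-9]+)$/ && m + 0 <= 1) }'; then
    echo "invalid -maf '$maf': expected a value from 0 to 1" >&2; return 1
  fi
  case $mode in 1|2) ;; *) echo "invalid -m '$mode': use 1 (somatic) or 2 (germline/personal)" >&2; return 1 ;; esac
  for fmt in "$inf" "$outf"; do
    case $fmt in bed|vcf|BED|VCF) ;; *) echo "invalid format '$fmt': use bed or vcf" >&2; return 1 ;; esac
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo ./funseq -f "$input" -maf "$maf" -m "$mode" -inf "$inf" -outf "$outf"
  else
    ./funseq -f "$input" -maf "$maf" -m "$mode" -inf "$inf" -outf "$outf"
  fi
}
```

For example, `DRY_RUN=1 run_funseq my_snvs.vcf 0.01 2 vcf vcf` prints the command for a germline analysis with an MAF threshold of 0.01. Because the wrapper always passes all five options, FunSeq's defaults (-maf 0 -m 1 -outf vcf) are never used.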