Funseq
From GersteinInfo
Revision as of 20:48, 6 May 2013
Installation
A. Required Tools
The following tools are REQUIRED for FunSeq:
1) Bedtools
2) Samtools
3) Tabix
4) VAT - A good installation guide for VAT can be found at http://ngsda.blogspot.com/2011/06/vat.html.
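Before installing FunSeq, it can help to confirm that the required tools are reachable from the shell. The following is a minimal sketch, not part of FunSeq: the helper name missing_tools is hypothetical, and the executable names bedtools, samtools and tabix are assumed to match your installation. VAT is left out because the names of its executables depend on how it was built.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command in the
# argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed executable names for the first three required tools.
missing=$(missing_tools bedtools samtools tabix)
if [ -n "$missing" ]; then
    echo "Missing required tools:" $missing >&2
else
    echo "bedtools, samtools and tabix were found on PATH."
fi
```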
B. PERL Requirement
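1) Make sure Perl 5 or later is installed. The latest Perl can be downloaded from http://www.perl.org/.
2) Install the Parallel::ForkManager package, which FunSeq uses to run in parallel. The Perl library can be found at http://search.cpan.org/~szabgab/Parallel-ForkManager-1.03/lib/Parallel/ForkManager.pm.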
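Both Perl requirements can be checked from the command line. This is a sketch under the assumption that the perl on your PATH is the one FunSeq will use; the two function names are hypothetical. It relies only on Perl's built-in $] version variable and the -M switch, which loads a module before the program runs.

```shell
#!/bin/sh
# Succeed if the perl found on PATH is version 5 or later.
perl_is_v5_or_later() {
    perl -e 'exit($] >= 5 ? 0 : 1)' >/dev/null 2>&1
}

# Succeed if the named Perl module can be loaded.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if ! perl_is_v5_or_later; then
    echo "Perl 5 or later was not found on PATH." >&2
elif ! have_perl_module Parallel::ForkManager; then
    echo "Parallel::ForkManager is not installed." >&2
else
    echo "Perl and Parallel::ForkManager look usable."
fi
```

If the module is missing, running cpan Parallel::ForkManager is one common way to install it, provided the cpan client is set up on your system.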
C. FunSeq tool installation
FunSeq is a PERL- and Linux/UNIX-based tool. At the command-line prompt, enter the following:
$ cd FUNSEQ/
$ perl Makefile.PL
$ make
$ make test
$ make install
D. Required Data Files
Please download all the following data files from ' http://funseq.gersteinlab.org/data/ ' and put them in a new folder ' $path/funseq-0.1/data/ ':
1. 1kg.phase1.snp.bed.gz (bed format)
Contents : all 1000 Genomes (1KG) Phase I SNVs in bed format.
Columns : chromosome, SNV start position (0-based), SNV end position, MAF (minor allele frequency)
Purpose : to filter out common variants against 1KG SNVs.
2. ENCODE.annotation.gz (bed format)
Contents : annotation files compiled from ENCODE, Gencode v7 and other sources, including DHS, TF peaks, pseudogenes, ncRNAs and enhancers.
Columns : chromosome, annotation start position (0-based), annotation end position, annotation name.
Purpose : to find SNVs in annotated regions.
3. ENCODE.tf.bound.union.bed (bed format)
Contents : transcription factor (TF) motifs in ENCODE TF peaks.
Columns : chromosome, start position (0-based), end position, motif name, , strand, TF name
Purpose : used for motif breaking analysis
4. gencode7.cds.bed (bed format)
Contents : extracted CDS information from Gencode7.
Columns : chromosome, start position, end position
Purpose : extract SNVs in CDS region
5. gencode.v7.promoter.bed (bed format)
Contents : compiled promoter regions, -2.5kb from transcription start site (TSS)
Columns : chromosome, start, end, gene, whether the gene is a hub in protein-protein interaction network (PPI) or regulatory network (REG).
Purpose : correlate promoter SNVs with gene
6. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval
Purpose : For variant annotation tool (VAT); Gencode v7.
7. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa
Purpose : For Variant Annotation Tool (VAT); Gencode v7.
8. DRM_transcript_pairs_modify
Contents : distal regulatory module with gene information.
Purpose : correlate enhancer SNVs with gene
9. Pouya.motif
Contents : PWMs
Purpose : used for motif breaking calculation
10. PPI.hubs.txt
Purpose : defined hub genes in protein-protein interaction network
11. REG.hubs.txt
Purpose : defined hub genes in regulatory network
12. GENE.strong_selection.txt
Purpose : genes under strong negative selection, identified using the fraction of rare SNVs among non-synonymous variants.
13. human_ancestor_GRCh37_e59/*
Contents : this directory contains human ancestral alleles for hg19 (GRCh37).
Purpose : for motif breaking calculation in personal or germ-line genome.
* Note : for somatic analysis, these files are not needed.
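Since FunSeq depends on every file in the list above, it may be worth verifying the data folder before the first run. The sketch below is not part of FunSeq: the function name missing_data_files and the FUNSEQ_DATA variable are hypothetical, and the file names are copied from the list. The ancestral-allele directory is checked only when the second argument is germline, in line with the note above.

```shell
#!/bin/sh
# Hypothetical helper: print every required data file that is absent
# from the directory given as $1. Pass "germline" as $2 to also
# require the ancestral-allele directory.
missing_data_files() {
    data_dir=$1
    for name in \
        1kg.phase1.snp.bed.gz \
        ENCODE.annotation.gz \
        ENCODE.tf.bound.union.bed \
        gencode7.cds.bed \
        gencode.v7.promoter.bed \
        gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval \
        gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa \
        DRM_transcript_pairs_modify \
        Pouya.motif \
        PPI.hubs.txt \
        REG.hubs.txt \
        GENE.strong_selection.txt
    do
        [ -e "$data_dir/$name" ] || printf '%s\n' "$name"
    done
    if [ "${2:-}" = "germline" ] && [ ! -d "$data_dir/human_ancestor_GRCh37_e59" ]; then
        printf '%s\n' "human_ancestor_GRCh37_e59/"
    fi
}

# FUNSEQ_DATA is an assumed variable; point it at your own
# $path/funseq-0.1/data/ folder.
data_dir=${FUNSEQ_DATA:-./data}
missing=$(missing_data_files "$data_dir" somatic)
if [ -n "$missing" ]; then
    echo "Missing from $data_dir:"
    echo "$missing"
else
    echo "All files required for somatic analysis are present in $data_dir."
fi
```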
Running FunSeq
Usage : ./funseq -f file -maf maf -m <1/2> -inf <bed/vcf> -outf <bed/vcf>
Options :
  -f     user input SNVs file
  -maf   Minor Allele Frequency (MAF) threshold to filter 1KG phaseI SNVs (value 0 ~ 1)
  -m     1 - somatic Genome; 2 - germline or personal Genome
  -inf   input format - BED or VCF
  -outf  output format - BED or VCF
Default : -maf 0 -m 1 -outf vcf
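As an illustration of the options above, the following sketch checks a set of option values against the documented ranges and then prints a matching command line instead of running it. The function valid_funseq_args and the input file name snvs.vcf are hypothetical. Accepting only lowercase bed and vcf is an assumption based on how the usage line writes them.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the values for -maf, -m, -inf and
# -outf (in that order) fit the documented ranges, 1 otherwise.
valid_funseq_args() {
    maf=$1; mode=$2; inf=$3; outf=$4
    # -maf must be a plain decimal number between 0 and 1.
    awk -v m="$maf" 'BEGIN { exit !(m ~ /^[0-9]*\.?[0-9]+$/ && m >= 0 && m <= 1) }' || return 1
    # -m is 1 (somatic) or 2 (germline or personal).
    case $mode in 1|2) ;; *) return 1 ;; esac
    # -inf and -outf are bed or vcf.
    case $inf in bed|vcf) ;; *) return 1 ;; esac
    case $outf in bed|vcf) ;; *) return 1 ;; esac
}

# Somatic analysis of a VCF file, keeping BED output; snvs.vcf is a
# placeholder for your own input file.
if valid_funseq_args 0.01 1 vcf bed; then
    echo "./funseq -f snvs.vcf -maf 0.01 -m 1 -inf vcf -outf bed"
fi
```

For a germline or personal genome, the same call would use -m 2, which also requires the ancestral-allele files listed under Required Data Files.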