Funseq
From GersteinInfo
Revision as of 20:44, 6 May 2013
Installation
A. Required Tools
The following tools are REQUIRED for FunSeq:
1) Bedtools
2) Samtools
3) Tabix
4) VAT - A good installation guide for VAT can be found here.
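Before installing FunSeq, it can help to confirm that the required tools are reachable from the command line. The sketch below assumes the executables are named `bedtools`, `samtools` and `tabix`; the helper name `check_tools` is made up for this example. VAT is left out because it ships several executables, so add whichever VAT program names apply to your installation.

```shell
# Report which of the named commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            echo "found:   $ct_tool"
        else
            echo "MISSING: $ct_tool"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}

# Example call; it only reports and does not abort if something is missing.
check_tools bedtools samtools tabix perl || echo "Install the missing tools before continuing."
```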
B. PERL Requirement
1) Please make sure you have Perl 5 or later. The latest Perl can be downloaded here.
2) Install the Parallel::ForkManager package (used to run jobs in parallel). The Perl library can be found here.
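A quick way to see whether Parallel::ForkManager is already installed is to ask Perl to load it. This is a minimal sketch; the helper name `perl_has_module` is an assumption made for this example and is not part of FunSeq.

```shell
# Succeeds if the given Perl module can be loaded by the perl on PATH.
perl_has_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if perl_has_module Parallel::ForkManager; then
    echo "Parallel::ForkManager is available"
else
    echo "Parallel::ForkManager is not installed (or perl is missing)"
fi
```

If the module is missing, one common route is the CPAN client (`cpan Parallel::ForkManager`), though your system's package manager may be preferable.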
C. FunSeq tool installation
FunSeq is a PERL- and Linux/UNIX-based tool. At the command-line prompt, enter the following:
$ cd FUNSEQ/
$ perl Makefile.PL
$ make
$ make test
$ make install
D. Required Data Files
Please download all the following data files from ' http://funseq.gersteinlab.org/data/ ' and put them in a new folder ' $path/funseq-0.1/data/ ':
1. 1kg.phase1.snp.bed.gz (bed format)
Contents : all 1000 Genomes (1KG) Phase I SNVs in bed format.
Columns : chromosome, SNV start position (0-based), SNV end position, MAF (minor allele frequency)
Purpose : to filter out common variants against 1KG SNVs.
2. ENCODE.annotation.gz (bed format)
Contents : annotations compiled from ENCODE, Gencode v7 and others, including DHS, TF peaks, pseudogenes, ncRNAs and enhancers.
Columns : chromosome, annotation start position (0-based), annotation end position, annotation name.
Purpose : to find SNVs in annotated regions.
3. ENCODE.tf.bound.union.bed (bed format)
Contents : transcription factor (TF) motifs in ENCODE TF peaks.
Columns : chromosome, start position (0-based), end position, motif name, , strand, TF name
Purpose : used for motif breaking analysis
4. gencode7.cds.bed (bed format)
Contents : extracted CDS information from Gencode7.
Columns : chromosome, start position, end position
Purpose : extract SNVs in CDS region
5. gencode.v7.promoter.bed (bed format)
Contents : compiled promoter regions, -2.5kb from transcription start site (TSS)
Columns : chromosome, start, end, gene, whether the gene is a hub in protein-protein interaction network (PPI) or regulatory network (REG).
Purpose : correlate promoter SNVs with gene
6. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval
Purpose : For Variant Annotation Tool (VAT); Gencode v7.
7. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa
Purpose : For Variant Annotation Tool (VAT); Gencode v7.
8. DRM_transcript_pairs_modify
Contents : distal regulatory module with gene information.
Purpose : correlate enhancer SNVs with gene
9. Pouya.motif
Contents : PWMs
Purpose : used for motif breaking calculation
10. PPI.hubs.txt
Purpose : defined hub genes in protein-protein interaction network
11. REG.hubs.txt
Purpose : defined hub genes in regulatory network
12. GENE.strong_selection.txt
Purpose : genes under strong negative selection, identified using the fraction of rare SNVs among non-synonymous variants.
13. human_ancestor_GRCh37_e59/*
Contents : this directory contains the human ancestral alleles for hg19 (GRCh37).
Purpose : for motif breaking calculation in personal or germ-line genome.
* Note : for somatic analysis, these files are not needed.
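After downloading, a short check can confirm that nothing in the list above was missed. The sketch below uses the file names exactly as listed; the helper name `check_data_dir` and the `FUNSEQ_DATA` variable are hypothetical and exist only for this example. The ancestral-allele directory is reported separately because it is not needed for somatic analysis.

```shell
# Check that the required FunSeq data files exist in the given directory.
# Returns 0 if all 12 files are present, 1 otherwise.
check_data_dir() {
    cd_dir=$1
    cd_missing=0
    for cd_name in \
        1kg.phase1.snp.bed.gz \
        ENCODE.annotation.gz \
        ENCODE.tf.bound.union.bed \
        gencode7.cds.bed \
        gencode.v7.promoter.bed \
        gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval \
        gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa \
        DRM_transcript_pairs_modify \
        Pouya.motif \
        PPI.hubs.txt \
        REG.hubs.txt \
        GENE.strong_selection.txt
    do
        if [ ! -e "$cd_dir/$cd_name" ]; then
            echo "missing: $cd_name"
            cd_missing=$((cd_missing + 1))
        fi
    done
    # Only needed for germline or personal genomes.
    if [ ! -d "$cd_dir/human_ancestor_GRCh37_e59" ]; then
        echo "note: human_ancestor_GRCh37_e59/ not found (not needed for somatic analysis)"
    fi
    [ "$cd_missing" -eq 0 ]
}

# FUNSEQ_DATA is a hypothetical variable; point it at your own data folder.
check_data_dir "${FUNSEQ_DATA:-./data}" || echo "Some required data files are missing."
```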
Running FunSeq
Usage : ./funseq -f file -maf maf -m <1/2> -inf <bed/vcf> -outf <bed/vcf>
Options :
-f    user input SNVs file
-maf  Minor Allele Frequency (MAF) threshold to filter 1KG phaseI SNVs (value 0 ~ 1)
-m    1 - somatic Genome; 2 - germline or personal Genome
-inf  input format - BED or VCF
-outf output format - BED or VCF
Default : -maf 0 -m 1 -outf vcf
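To illustrate how the options and defaults combine, the sketch below validates a set of arguments and prints the resulting command line without running FunSeq. The wrapper name `funseq_cmd`, its argument order and the file name `snvs.bed` are assumptions made for this example; only the flags and defaults come from the usage text above. Because no default is listed for -inf, the wrapper requires it.

```shell
# Validate FunSeq options and print the command line (dry run, nothing is executed).
# Arguments: input_file input_format [maf] [mode] [output_format]
funseq_cmd() {
    fs_in=$1
    fs_inf=$2
    fs_maf=${3:-0}      # documented default: -maf 0
    fs_mode=${4:-1}     # documented default: -m 1 (somatic)
    fs_outf=${5:-vcf}   # documented default: -outf vcf

    # MAF must be a number between 0 and 1.
    awk -v m="$fs_maf" 'BEGIN { exit !(m ~ /^[0-9]*\.?[0-9]+$/ && m + 0 <= 1) }' || {
        echo "error: -maf must be between 0 and 1" >&2
        return 1
    }
    # Mode is 1 (somatic) or 2 (germline or personal genome).
    case "$fs_mode" in
        1|2) ;;
        *) echo "error: -m must be 1 or 2" >&2; return 1 ;;
    esac
    # Input and output formats are bed or vcf.
    for fs_fmt in "$fs_inf" "$fs_outf"; do
        case "$fs_fmt" in
            bed|vcf) ;;
            *) echo "error: format must be bed or vcf" >&2; return 1 ;;
        esac
    done

    echo "./funseq -f $fs_in -maf $fs_maf -m $fs_mode -inf $fs_inf -outf $fs_outf"
}

# Germline run on a BED file, filtering 1KG SNVs at a MAF threshold of 0.01.
funseq_cmd snvs.bed bed 0.01 2
# prints: ./funseq -f snvs.bed -maf 0.01 -m 2 -inf bed -outf vcf
```

Printing the command first makes it easy to inspect before running it for real.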