Funseq
From GersteinInfo
Revision as of 20:46, 6 May 2013
Installation
A. Required Tools
The following tools are REQUIRED for FunSeq:
1) Bedtools
2) Samtools
3) Tabix
4) VAT - A good installation guide for VAT can be found here.
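Before installing FunSeq, it can help to confirm that the required tools are reachable from the command line. The following is a minimal sketch, not part of FunSeq itself. The executable names `bedtools`, `samtools` and `tabix` are assumptions based on the tool names listed above, and VAT is left out because this page does not name its executables.

```python
import shutil

# Assumed executable names; this page lists the tools but not their binaries.
# VAT is omitted because its executable names are not given here.
REQUIRED_TOOLS = ["bedtools", "samtools", "tabix"]


def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

An empty result only shows that the programs are on PATH; it does not check their versions.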
B. PERL Requirement
1) Please make sure you have Perl 5 or later. The latest version of Perl can be downloaded here.
2) Install the Parallel::ForkManager package (used to run jobs in parallel). The PERL library can be found here.
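One way to confirm that Parallel::ForkManager is installed is to ask Perl to load it and inspect the exit status. The sketch below wraps the common `perl -MModule -e 1` idiom. The helper names are hypothetical, and it assumes the interpreter is available as `perl` on PATH.

```python
import subprocess


def perl_module_command(module, perl="perl"):
    """Build a command that exits with status 0 only if Perl can load `module`."""
    return [perl, "-M" + module, "-e", "1"]


def perl_has_module(module, perl="perl"):
    """Return True if `perl` can load `module`; False if it cannot or perl is absent."""
    try:
        result = subprocess.run(
            perl_module_command(module, perl),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The interpreter itself could not be started.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Parallel::ForkManager available:",
          perl_has_module("Parallel::ForkManager"))
```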
C. FunSeq tool installation
FunSeq is a PERL- and Linux/UNIX-based tool. At the command-line prompt, enter the following:
$ cd FUNSEQ/
$ perl Makefile.PL
$ make
$ make test
$ make install
D. Required Data Files
Please download all the following data files from ' http://funseq.gersteinlab.org/data/ ' and put them in a new folder ' $path/funseq-0.1/data/ ':
1. 1kg.phase1.snp.bed.gz (bed format)
Contents : all 1KG phaseI SNVs in bed format.
Columns : chromosome, SNV start position (0-based), SNV end position, MAF (minor allele frequency)
Purpose : to filter out common variants against 1KG SNVs.
2. ENCODE.annotation.gz (bed format)
Contents : compiled annotation files from ENCODE, Gencode v7 and others, including DHS, TF peaks, pseudogenes, ncRNAs and enhancers
Columns : chromosome, annotation start position (0-based), annotation end position, annotation name.
Purpose : to find SNVs in annotated regions.
3. ENCODE.tf.bound.union.bed (bed format)
Contents : transcription factor (TF) motifs in ENCODE TF peaks.
Columns : chromosome, start position (0-based), end position, motif name, , strand, TF name
Purpose : used for motif breaking analysis
4. gencode7.cds.bed (bed format)
Contents : extracted CDS information from Gencode7.
Columns : chromosome, start position, end position
Purpose : extract SNVs in CDS region
5. gencode.v7.promoter.bed (bed format)
Contents : compiled promoter regions, -2.5kb from transcription start site (TSS)
Columns : chromosome, start, end, gene, whether the gene is a hub in protein-protein interaction network (PPI) or regulatory network (REG).
Purpose : correlate promoter SNVs with gene
6. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval
Purpose : For Variant Annotation Tool (VAT); Gencode v7.
7. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa
Purpose : For Variant Annotation Tool (VAT); Gencode v7.
8. DRM_transcript_pairs_modify
Contents : distal regulatory module with gene information.
Purpose : correlate enhancer SNVs with gene
9. Pouya.motif
Contents : PWMs
Purpose : used for motif breaking calculation
10. PPI.hubs.txt
Purpose : defined hub genes in protein-protein interaction network
11. REG.hubs.txt
Purpose : defined hub genes in regulatory network
12. GENE.strong_selection.txt
Purpose : genes under strong negative selection, identified using the fraction of rare SNVs among non-synonymous variants.
13. human_ancestor_GRCh37_e59/*
Contents : this directory contains the human ancestral alleles for hg19 (GRCh37).
Purpose : for motif breaking calculation in personal or germ-line genome.
* Note : for somatic analysis, these files are not needed.
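After downloading, a quick check that the data folder is complete can save a failed run. The sketch below is illustrative only. The file names are copied from the list above, the function name is hypothetical, and it assumes the note about somatic analysis refers to the human_ancestor_GRCh37_e59 directory (item 13).

```python
import os

# File names copied from the list above (items 1 to 12).
REQUIRED_FILES = [
    "1kg.phase1.snp.bed.gz",
    "ENCODE.annotation.gz",
    "ENCODE.tf.bound.union.bed",
    "gencode7.cds.bed",
    "gencode.v7.promoter.bed",
    "gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval",
    "gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa",
    "DRM_transcript_pairs_modify",
    "Pouya.motif",
    "PPI.hubs.txt",
    "REG.hubs.txt",
    "GENE.strong_selection.txt",
]

# Item 13; assumed to be needed only for germline or personal genomes.
ANCESTOR_DIR = "human_ancestor_GRCh37_e59"


def missing_data(data_dir, somatic=True):
    """Return the names of required data entries not present in `data_dir`."""
    missing = [
        name for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(data_dir, name))
    ]
    if not somatic and not os.path.isdir(os.path.join(data_dir, ANCESTOR_DIR)):
        missing.append(ANCESTOR_DIR + "/")
    return missing
```

For example, `missing_data("/some/path/funseq-0.1/data", somatic=False)` would list anything still to be downloaded for a germline run; the path shown is a placeholder for your own `$path`.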
Running FunSeq
Usage : ./funseq -f file -maf maf -m <1/2> -inf <bed/vcf> -outf <bed/vcf>
Options :
  -f     user input SNVs file
  -maf   Minor Allele Frequency (MAF) threshold to filter 1KG phaseI SNVs (value 0 ~ 1)
  -m     1 - somatic Genome; 2 - germline or personal Genome
  -inf   input format - BED or VCF
  -outf  output format - BED or VCF
Default : -maf 0 -m 1 -outf vcf
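When FunSeq is called from a script, it may be convenient to validate the options before launching it. The sketch below assembles the argument list from the usage line and defaults above. The function name `build_funseq_command` and the example file name `snvs.bed` are hypothetical, and the lowercase format values follow the `<bed/vcf>` spelling in the usage line.

```python
def build_funseq_command(snv_file, input_format, maf=0, mode=1,
                         output_format="vcf", executable="./funseq"):
    """Return the FunSeq command as an argument list, checking option values."""
    if not 0 <= maf <= 1:
        raise ValueError("-maf must be between 0 and 1")
    if mode not in (1, 2):
        raise ValueError("-m must be 1 (somatic) or 2 (germline or personal)")
    for label, fmt in (("-inf", input_format), ("-outf", output_format)):
        if fmt not in ("bed", "vcf"):
            raise ValueError(label + " must be 'bed' or 'vcf'")
    return [
        executable,
        "-f", snv_file,
        "-maf", str(maf),
        "-m", str(mode),
        "-inf", input_format,
        "-outf", output_format,
    ]


if __name__ == "__main__":
    # Print the command rather than running it; pass the list to
    # subprocess.run() once FunSeq is installed.
    print(" ".join(build_funseq_command("snvs.bed", "bed", maf=0.01, mode=2)))
```

Building a list instead of a single string avoids shell quoting problems if the input path contains spaces.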