Funseq

From GersteinInfo

Revision as of 20:44, 6 May 2013


Installation

A. Required Tools

The following tools are REQUIRED for FunSeq:
1) Bedtools
2) Samtools
3) Tabix
4) VAT - A good installation guide for VAT can be found here.
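Before installing FunSeq, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch, not part of FunSeq: the executable names `bedtools`, `samtools` and `tabix` are assumed to match the tool names, and VAT is left out because this page does not name its executables.

```python
import shutil

# Assumption: each tool is installed under an executable of the same name.
# VAT is not checked here because its executable names are not given on this page.
REQUIRED_EXECUTABLES = ("bedtools", "samtools", "tabix")


def find_missing_tools(executables=REQUIRED_EXECUTABLES, which=shutil.which):
    """Return the executables that cannot be located on PATH."""
    return [name for name in executables if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("bedtools, samtools and tabix were all found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.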

B. PERL Requirement

1) Make sure Perl 5 or later is installed. The latest Perl can be downloaded here.
2) Install the Parallel::ForkManager package, which is used to run tasks in parallel. The Perl library can be found here.

C. FunSeq tool installation

FunSeq is a PERL- and Linux/UNIX-based tool. At the command-line prompt, enter the following:
$ cd FUNSEQ/
$ perl Makefile.PL
$ make
$ make test
$ make install


D. Required Data Files

Please download all of the following data files from http://funseq.gersteinlab.org/data/ and put them in a new folder, $path/funseq-0.1/data/ :
1. 1kg.phase1.snp.bed.gz (bed format)
Contents : all 1000 Genomes (1KG) Phase I SNVs in bed format.
Columns : chromosome, SNV start position (0-based), SNV end position, MAF (minor allele frequency)
Purpose : to filter out common variants by comparison with 1KG SNVs.

2. ENCODE.annotation.gz (bed format)
Contents : annotation files compiled from ENCODE, Gencode v7 and other sources, including DHS, TF peaks, pseudogenes, ncRNAs and enhancers.
Columns : chromosome, annotation start position (0-based), annotation end position, annotation name.
Purpose : to find SNVs in annotated regions.

3. ENCODE.tf.bound.union.bed (bed format)
Contents : transcription factor (TF) motifs in ENCODE TF peaks.
Columns : chromosome, start position (0-based), end position, motif name, , strand, TF name
Purpose : used for motif breaking analysis

4. gencode7.cds.bed (bed format)
Contents : extracted CDS information from Gencode7.
Columns : chromosome, start position, end position
Purpose : to extract SNVs in CDS regions

5. gencode.v7.promoter.bed (bed format)
Contents : compiled promoter regions, -2.5kb from transcription start site (TSS)
Columns : chromosome, start, end, gene, whether the gene is a hub in protein-protein interaction network (PPI) or regulatory network (REG).
Purpose : to associate promoter SNVs with genes

6. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval
Purpose : For Variant Annotation Tool (VAT); Gencode v7.

7. gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa
Purpose : For Variant Annotation Tool (VAT); Gencode v7.

8. DRM_transcript_pairs_modify
Contents : distal regulatory modules (DRMs) with gene information.
Purpose : to associate enhancer SNVs with genes

9. Pouya.motif
Contents : position weight matrices (PWMs)
Purpose : used for motif breaking calculation

10. PPI.hubs.txt
Purpose : list of hub genes in the protein-protein interaction network

11. REG.hubs.txt
Purpose : list of hub genes in the regulatory network

12. GENE.strong_selection.txt
Purpose : genes under strong negative selection, identified using the fraction of rare SNVs among non-synonymous variants.

13. human_ancestor_GRCh37_e59/*
Contents : this directory contains human ancestral alleles for hg19 (GRCh37).
Purpose : for motif breaking calculation in personal or germline genomes.
* Note : for somatic analysis, these files are not needed.
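
A quick way to confirm that the data folder is complete is to check each name from the list above. The sketch below is illustrative only and is not shipped with FunSeq; the function name `missing_data_files` and its `somatic` flag are hypothetical, and the file names are copied from this page. Following the note above, the ancestral allele directory is only checked for non-somatic analysis.

```python
from pathlib import Path

# File names copied from the list of required data files on this page.
REQUIRED_FILES = (
    "1kg.phase1.snp.bed.gz",
    "ENCODE.annotation.gz",
    "ENCODE.tf.bound.union.bed",
    "gencode7.cds.bed",
    "gencode.v7.promoter.bed",
    "gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval",
    "gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa",
    "DRM_transcript_pairs_modify",
    "Pouya.motif",
    "PPI.hubs.txt",
    "REG.hubs.txt",
    "GENE.strong_selection.txt",
)
# Only needed for germline or personal genomes (see the note above).
ANCESTOR_DIR = "human_ancestor_GRCh37_e59"


def missing_data_files(data_dir, somatic=True):
    """Return the required names that are absent from data_dir."""
    data_dir = Path(data_dir)
    missing = [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]
    if not somatic and not (data_dir / ANCESTOR_DIR).is_dir():
        missing.append(ANCESTOR_DIR + "/")
    return missing
```

For example, `missing_data_files("/path/to/funseq-0.1/data", somatic=False)` returns an empty list when everything is in place; the path shown is a placeholder.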

Running FunSeq

Usage : ./funseq -f file -maf maf -m <1/2> -inf <bed/vcf> -outf <bed/vcf>

       Options :
               -f              user input SNVs file
               -maf            Minor Allele Frequency (MAF) threshold to filter 1KG phaseI SNVs (value between 0 and 1)
               -m              1 - somatic genome; 2 - germline or personal genome
               -inf            input format - BED or VCF
               -outf           output format - BED or VCF
       Default : -maf 0 -m 1 -outf vcf
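
When FunSeq is called from a script, the options above can be assembled and validated before the tool is launched. This is a minimal sketch under stated assumptions: the helper `build_funseq_command` is hypothetical and not part of FunSeq, the defaults mirror the "Default" line above, and `-inf` is treated as required because no default is listed for it.

```python
def build_funseq_command(snv_file, input_format, maf=0, mode=1,
                         output_format="vcf", executable="./funseq"):
    """Build the FunSeq argument list from the options in the usage text."""
    if not 0 <= maf <= 1:
        raise ValueError("-maf must be between 0 and 1")
    if mode not in (1, 2):
        raise ValueError("-m must be 1 (somatic) or 2 (germline or personal)")
    formats = {"-inf": input_format.lower(), "-outf": output_format.lower()}
    for flag, fmt in formats.items():
        if fmt not in ("bed", "vcf"):
            raise ValueError(flag + " must be bed or vcf")
    return [
        executable,
        "-f", str(snv_file),
        "-maf", str(maf),
        "-m", str(mode),
        "-inf", formats["-inf"],
        "-outf", formats["-outf"],
    ]


if __name__ == "__main__":
    # Print the command instead of running it; snvs.bed is a placeholder name.
    print(" ".join(build_funseq_command("snvs.bed", "bed", maf=0.01, mode=2)))
```

The returned list can be passed to `subprocess.run` from the FunSeq directory; returning a list rather than a single string avoids shell quoting problems with file names.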