Funseq
From GersteinInfo
Revision as of 20:57, 6 May 2013
Installation
A. Required Tools
The following tools are REQUIRED for FunSeq: 
1) Bedtools 
2) Samtools 
3) Tabix 
4) VAT - A good installation guide for VAT can be found here. 
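Before installing FunSeq, it can help to confirm that the required tools are reachable from the shell. The following is a minimal sketch: `missing_tools` is a hypothetical helper name, not part of FunSeq, and VAT is left out because the names of its executables depend on how it was installed.

```shell
# Print the name of every listed command that cannot be found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the three tools whose command names are well known.
missing_tools bedtools samtools tabix
```

Any name printed by the last line still needs to be installed or added to PATH.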
B. Perl Requirements
1) Please make sure you have Perl 5 or later. The latest Perl release can be downloaded here. 
2) Install the Parallel::ForkManager package, which is used to run tasks in parallel. The Perl library can be found here.
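One way to check whether Parallel::ForkManager is already available is to ask Perl to load it. This is a sketch only: `have_perl_module` is a hypothetical helper name, and the `cpan` client mentioned in the message is just one common way to install CPAN modules.

```shell
# Succeed (exit status 0) only when Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module Parallel::ForkManager; then
    echo "Parallel::ForkManager is installed"
else
    echo "Parallel::ForkManager is missing; one option is: cpan Parallel::ForkManager"
fi
```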
C. FunSeq tool installation
FunSeq is a Perl-based tool that runs on Linux/UNIX. At the command-line prompt, enter the following: 
$ cd FUNSEQ/
$ perl Makefile.PL
$ make
$ make test
$ make install
D. Required Data Files
Please download all of the following data files from 'http://funseq.gersteinlab.org/data/' and put them in a new folder, '$path/funseq-0.1/data/': 
	1.	1kg.phase1.snp.bed.gz   (bed format) 
			Contents : all 1000 Genomes (1KG) Phase I SNVs in bed format. 
			Columns : chromosome, SNV start position (0-based), SNV end position, MAF (minor allele frequency) 
			Purpose : to filter out common variants against 1KG SNVs. 
	2.	ENCODE.annotation.gz   (bed format) 
			Contents : annotation files compiled from ENCODE, Gencode v7 and other sources, including DHS, TF peaks, pseudogenes, ncRNAs and enhancers 
			Columns : chromosome , annotation start position (0-based), annotation end position, annotation name. 
			Purpose :  to find SNVs in annotated regions.  
	3.	ENCODE.tf.bound.union.bed  (bed format) 
			Contents : transcription factor (TF) motifs in ENCODE TF peaks.  
			Columns : chromosome, start position (0-based), end position, motif name, , strand, TF name 
			Purpose : used for motif breaking analysis 
	4.	gencode7.cds.bed  (bed format) 
			Contents : extracted CDS information from Gencode7. 
			Columns :  chromosome, start position, end position  
			Purpose : to extract SNVs in CDS regions 
	5.	gencode.v7.promoter.bed  (bed format) 
			Contents : compiled promoter regions, extending 2.5 kb upstream (-2.5 kb) of the transcription start site (TSS) 
			Columns : chromosome, start, end, gene, whether the gene is a hub in the protein-protein interaction (PPI) network or the regulatory (REG) network. 
			Purpose : to associate promoter SNVs with genes 
	6.	gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval 
			Purpose : for the Variant Annotation Tool (VAT); Gencode v7. 
	7.	gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa 
			Purpose : for the Variant Annotation Tool (VAT); Gencode v7. 
	8.	DRM_transcript_pairs_modify 
			Contents : distal regulatory modules (DRMs) with gene information. 
			Purpose : to associate enhancer SNVs with genes 
	9.	Pouya.motif 
			Contents : position weight matrices (PWMs) 
			Purpose : used for motif breaking calculation 
	10.	PPI.hubs.txt 
			Purpose : lists the hub genes of the protein-protein interaction (PPI) network 
	11.	REG.hubs.txt 
			Purpose : lists the hub genes of the regulatory (REG) network 
	12.	GENE.strong_selection.txt 
			Purpose : lists genes under strong negative selection, identified using the fraction of rare SNVs among non-synonymous variants. 
	13.	human_ancestor_GRCh37_e59/* 
			Contents : this directory contains human ancestral alleles for hg19 (GRCh37).  
			Purpose : for motif breaking calculation in personal or germ-line genome. 
			* Note :  for somatic analysis, these files are not needed. 
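Because FunSeq depends on every file in the list above, it may be worth verifying the data folder before the first run. The sketch below is not part of FunSeq: `missing_data_files` and `FUNSEQ_DATA_FILES` are hypothetical names, the entries are copied from the list above, and the last entry is a directory that is only needed for germline or personal genomes.

```shell
# Names of the required data files, one per line, as listed above.
FUNSEQ_DATA_FILES="1kg.phase1.snp.bed.gz
ENCODE.annotation.gz
ENCODE.tf.bound.union.bed
gencode7.cds.bed
gencode.v7.promoter.bed
gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval
gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa
DRM_transcript_pairs_modify
Pouya.motif
PPI.hubs.txt
REG.hubs.txt
GENE.strong_selection.txt
human_ancestor_GRCh37_e59"

# Print every expected entry that does not exist in the given data folder.
# Prints nothing when the folder is complete.
missing_data_files() {
    data_dir=$1
    for name in $FUNSEQ_DATA_FILES; do
        [ -e "$data_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Usage (replace $path with your installation prefix):
#   missing_data_files "$path/funseq-0.1/data"
```

For a somatic analysis, a report that lists only human_ancestor_GRCh37_e59 can be ignored.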
Running FunSeq
Usage : ./funseq -f file -maf maf -m <1/2> -inf <bed/vcf> -outf <bed/vcf>
       Options :
               	-f              user input SNVs file
               	-maf            Minor Allele Frequency (MAF) threshold to filter 1KG phaseI SNVs (value 0 ~ 1)
               	-m              1 - somatic Genome; 2 - germline or personal Genome
               	-inf            input format - BED or VCF
               	-outf           output format - BED or VCF
Default : -maf 0 -m 1 -outf vcf
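To illustrate how the options combine, the sketch below checks a set of values against the ranges given in the usage text and prints the matching command line. `funseq_command` is a hypothetical wrapper, not part of FunSeq, and `snvs.vcf` is a placeholder file name; the wrapper passes all five options explicitly even though some have defaults.

```shell
# Print the funseq command line for the given settings after basic checks.
# Arguments: input file, MAF threshold, mode (1 or 2), input format, output format.
funseq_command() {
    file=$1
    maf=$2
    mode=$3
    inf=$4
    outf=$5
    case $mode in
        1|2) ;;
        *) echo "-m must be 1 (somatic) or 2 (germline or personal)" >&2; return 1 ;;
    esac
    case $inf in
        bed|vcf) ;;
        *) echo "-inf must be bed or vcf" >&2; return 1 ;;
    esac
    case $outf in
        bed|vcf) ;;
        *) echo "-outf must be bed or vcf" >&2; return 1 ;;
    esac
    # awk performs the numeric range check, since POSIX sh has no floating-point test.
    awk -v m="$maf" 'BEGIN { exit !((m ~ /^[0-9]*\.?[0-9]+$/) && (m >= 0) && (m <= 1)) }' || {
        echo "-maf must be a number between 0 and 1" >&2
        return 1
    }
    printf './funseq -f %s -maf %s -m %s -inf %s -outf %s\n' "$file" "$maf" "$mode" "$inf" "$outf"
}

# Germline run on a VCF file, filtering at MAF 0.01 and writing BED output.
funseq_command snvs.vcf 0.01 2 vcf bed
```

The printed line can be copied to the prompt and run from the FunSeq directory.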
Sample output VCF
You can download a sample of the output VCF here: http://funseq.gersteinlab.org/PR2832.FunSEQ.vcf
