VAT

From GersteinInfo

Revision as of 18:23, 6 March 2011

VAT Main Page




Introduction

The Variant Annotation Tool (VAT) consists of a set of modules to annotate genetic variants, including SNPs and indels. The package also contains a program that aggregates SNP and indel variants at the gene level and then generates an image for each gene to visualize the functional impact of these variants. This information can be viewed and shared through a web interface. In addition to annotating coding variants, the tool integrates allele frequencies and genotype data from published high-quality variation databases, such as the 1000 Genomes Project, to provide population-specific information.



Data formats


Variant Call Format (VCF)

The Variant Call Format (VCF) is a tab-delimited text file format for representing many kinds of genetic variants, including SNPs and indels. The format was developed as part of the 1000 Genomes Project. A detailed summary of this file format can be found here.
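To make the SNP/indel distinction concrete, the sketch below reads the first five fixed columns of a VCF data line (CHROM, POS, ID, REF, ALT) and labels each REF/ALT pair. It is only an illustration: the names `VcfRecord`, `parse_vcf_line` and `classify` are hypothetical and are not part of VAT, header lines are not handled, and the length-based rule is a simplification that treats symbolic alleles such as `<DEL>` as "other".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VcfRecord:
    chrom: str
    pos: int          # VCF POS is 1-based
    id: str
    ref: str
    alts: List[str]   # ALT may list several comma-separated alleles


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the fixed leading columns of one VCF data line (not a '#' header line)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 tab-delimited columns")
    chrom, pos, vid, ref, alt = fields[:5]
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","))


def classify(ref: str, alt: str) -> str:
    """Simplified rule: one base to one base is a SNP, a length change is an indel."""
    if alt.startswith("<") or alt in ("*", "."):
        return "other"  # symbolic or missing alleles are not classified here
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"      # e.g. multi-base substitutions
```

For a line such as `20  14370  rs6054257  G  A  ...` the single ALT allele is classified as a SNP, while `GTC` with ALT `G,GTCT` yields two indels.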



Interval Format

The Interval format consists of eight tab-delimited columns and is used to represent genomic intervals such as genes. This format is closely associated with the intervalFind module, which is part of BIOS and efficiently finds intervals that overlap a query interval. The underlying algorithm is based on nested containment lists (NCLists): Alekseyenko, A.V., Lee, C.J. "Nested Containment List (NCList): A new algorithm for accelerating interval query of genome alignment and interval databases" Bioinformatics 2007;23:1386-1393 [1].
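The idea behind a nested containment list can be sketched in a few lines. This is a simplified illustration of the published technique, not the intervalFind implementation; the names `Node`, `Sublist`, `build_nclist` and `query` are hypothetical. Intervals are treated as zero-based and half-open, matching the Interval format. Sorting by start (ascending) and end (descending) lets each interval be nested under the nearest interval that contains it. Within any one sublist no interval contains another, so both starts and ends are increasing, and a binary search on the ends finds the first candidate overlap.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Sublist:
    nodes: List["Node"] = field(default_factory=list)
    ends: List[int] = field(default_factory=list)  # node ends, ascending within a sublist


@dataclass
class Node:
    start: int
    end: int
    children: Sublist = field(default_factory=Sublist)


def build_nclist(intervals: List[Tuple[int, int]]) -> Sublist:
    """Nest each half-open interval [start, end) under the innermost interval containing it."""
    top = Sublist()
    stack: List[Node] = []  # chain of currently open containing intervals
    for s, e in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        node = Node(s, e)
        # Pop intervals that end before this one, i.e. that do not contain it.
        while stack and stack[-1].end < e:
            stack.pop()
        parent = stack[-1].children if stack else top
        parent.nodes.append(node)
        parent.ends.append(e)
        stack.append(node)
    return top


def query(sub: Sublist, qs: int, qe: int,
          out: Optional[List[Tuple[int, int]]] = None) -> List[Tuple[int, int]]:
    """Return all stored intervals overlapping the half-open query [qs, qe)."""
    if out is None:
        out = []
    # First interval whose end lies past qs; everything earlier ends too soon.
    i = bisect.bisect_right(sub.ends, qs)
    # Starts are ascending too, so stop at the first interval starting at or after qe.
    while i < len(sub.nodes) and sub.nodes[i].start < qe:
        node = sub.nodes[i]
        out.append((node.start, node.end))
        query(node.children, qs, qe, out)  # only overlapping parents can have overlapping children
        i += 1
    return out
```

The payoff of the nesting is that non-overlapping branches are skipped entirely: a contained interval can only overlap the query if its container does.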

1.   Name of the interval
2.   Chromosome
3.   Strand
4.   Interval start (with respect to the "+" strand)
5.   Interval end (with respect to the "+" strand)
6.   Number of sub-intervals
7.   Sub-interval starts (with respect to the "+" strand, comma-delimited)
8.   Sub-interval ends (with respect to the "+" strand, comma-delimited)

Example file:

uc001aaw.1      chr1    +       357521  358460  1       357521  358460
uc001aax.1      chr1    +       410068  411702  3       410068,410854,411258    410159,411121,411702
uc001aay.1      chr1    -       552622  554252  3       552622,553203,554161    553066,553466,554252
uc001aaz.1      chr1    +       556324  557910  1       556324  557910
uc001aba.1      chr1    +       558011  558705  1       558011  558705  

In this example, the intervals represent transcripts, while the sub-intervals denote exons.
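A minimal reader for one line of the Interval format, following the eight columns listed above, might look like the sketch below. The names `Interval` and `parse_interval_line` are hypothetical. Tolerating an empty item from a trailing comma in the sub-interval lists is an assumption, not something this page specifies. The parser checks that column 6 agrees with the number of sub-interval starts and ends.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interval:
    name: str
    chrom: str
    strand: str
    start: int                            # zero-based, relative to the "+" strand
    end: int                              # excluded from the interval
    sub_intervals: List[Tuple[int, int]]  # e.g. exons of a transcript


def parse_interval_line(line: str) -> Interval:
    """Parse one tab-delimited Interval-format line into its eight fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-delimited columns, got {len(fields)}")
    name, chrom, strand, start, end, count, starts, ends = fields
    # Skip empty items so a trailing comma does not break parsing (assumption).
    sub_starts = [int(x) for x in starts.split(",") if x]
    sub_ends = [int(x) for x in ends.split(",") if x]
    if not int(count) == len(sub_starts) == len(sub_ends):
        raise ValueError("sub-interval count does not match the start/end lists")
    return Interval(name, chrom, strand, int(start), int(end),
                    list(zip(sub_starts, sub_ends)))
```

Applied to the second example line (uc001aax.1), this yields three exons: (410068, 410159), (410854, 411121) and (411258, 411702).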

Note: the coordinates in the Interval format are zero-based and the end coordinate is not included.
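Because the end coordinate is excluded, an interval's length is simply end minus start, and a zero-based position p lies inside it when start <= p < end. The sketch below shows these rules. It also assumes a VCF position (which the VCF specification defines as 1-based) must be shifted down by one before it is compared with Interval coordinates. The helper names are hypothetical, and this page does not say how VAT itself performs the conversion.

```python
def vcf_pos_to_zero_based(pos: int) -> int:
    """VCF POS is 1-based; shift it to the zero-based Interval convention (assumption)."""
    return pos - 1


def contains(start: int, end: int, pos0: int) -> bool:
    """True if the zero-based position pos0 lies in the half-open interval [start, end)."""
    return start <= pos0 < end


def length(start: int, end: int) -> int:
    """Number of bases covered; no +1 is needed because the end is excluded."""
    return end - start
```

For transcript uc001aaw.1 (357521 to 358460), `length` gives 939 bases. VCF positions 357522 through 358460 fall inside it, while 357521 and 358461 do not.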



List of programs
