Summary of file

The pipeline software that maps SNPs onto the human genome was written by Lukas Habegger, and the analysis was done by Suganthi Balasubramanian, Yale University. The SNPs are mapped using the GENCODE ver3b annotation file.

Input files used and other notes:

Gene annotation file GENCODE ver3b downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/gencode.v3b.annotation.NCBI36.gtf.gz

SNPs are mapped to the protein-coding transcript annotations of the GENCODE file.

Amino acid translations are done with respect to the reference allele in the human genome (not the ancestral allele).
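
To illustrate what translating with respect to the reference allele means, here is a minimal sketch in Python. It is not the pipeline's code: the function name `translate_snp`, its arguments, and the use of the standard genetic code are assumptions made for this example. The reference codon is taken from the reference genome, the alternate allele is substituted at the SNP position, and both codons are translated.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_snp(ref_codon, pos_in_codon, alt_base):
    """Hypothetical helper: return (ref_aa, alt_aa) for a SNP in a codon.

    ref_codon    -- the three reference-genome bases on the coding strand
    pos_in_codon -- 0, 1 or 2: position of the SNP within the codon
    alt_base     -- the non-reference allele on the coding strand
    """
    ref_codon = ref_codon.upper()
    # Substitute the alternate allele into the reference codon.
    alt_codon = (
        ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    )
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


if __name__ == "__main__":
    # Reference codon GAA with a G>T change at the first position.
    print(translate_snp("GAA", 0, "T"))
```

Because the reference codon is the baseline, the reported amino acid change always reads as reference residue to alternate residue, regardless of which allele is ancestral.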

SNP call sets

1. Pilot1 Low coverage Chromosome 20, Broad call set downloaded from


High-stringency SNP calls were used.

2. Pilot1 Low coverage Chromosome 20, Univ. of Michigan call set downloaded from

server: fantasia.sph.umich.edu user: 1000genomes pwd: (ask for password if needed)

dir: pub/2009.08.chr20.2

The specific SNP files used as input for the pipeline are: CEU.ref_called_allels, CHB+JPT.ref_called_allels, YRI.ref_called_allels

SNPs where either allele does not match the reference genome have not been included.

3. Genome-wide Pilot1 Low coverage call set from Richard Durbin's group downloaded from


Only high-quality SNPs from Richard Durbin's set have been used (lines where the filter field, column 7, is 0).
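
The column-7 filter can be sketched as follows. This is an illustration only, under several assumptions not stated above: the call set is tab-delimited, columns are counted from 1, and header or comment lines begin with `#`. The function name `keep_high_quality` and its parameters are hypothetical.

```python
def keep_high_quality(lines, filter_column=7, pass_value="0"):
    """Return the lines whose filter column equals pass_value.

    Assumes tab-delimited text and 1-based column numbering.
    Blank lines and lines starting with '#' are skipped.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep the line only if the column exists and holds the passing value.
        if len(fields) >= filter_column and fields[filter_column - 1] == pass_value:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up rows for demonstration; not taken from the real call set.
    example = [
        "#chrom\tpos\tid\tref\talt\tqual\tfilter\n",
        "20\t100\t.\tA\tG\t50\t0\n",
        "20\t200\t.\tC\tT\t10\t1\n",
    ]
    for row in keep_high_quality(example):
        print(row, end="")
```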

