1000genomes nonsyn
CEU.0908.nsyn: http://homes.gersteinlab.org/people/suganthi/outbox/1000genomes/REL-0908/all_LOF/nsyn/CEU.0908.all.pc.nsyn
Latest revision as of 12:00, 9 June 2010
Summary of file
The pipeline software that maps SNPs onto the human genome was written by Lukas Habegger; the analysis was performed by Suganthi Balasubramanian (Yale University). SNPs are mapped using the GENCODE ver3b annotation file.
Input files used and other notes:
Gene annotation file GENCODE ver3b downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/gencode.v3b.annotation.NCBI36.gtf.gz
SNPs are mapped to the protein-coding transcript annotations of the GENCODE file.
Amino acid translations are done with respect to the reference allele in the human genome (not the ancestral allele).
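As an illustration of translating with respect to the reference allele, the sketch below compares the amino acid encoded by a reference codon with the one encoded after substituting the SNP allele. It is not the pipeline's actual code: the function name classify_snp, its arguments, and the choice to treat a stop codon as just another symbol are assumptions made for this example, and the codon is assumed to be given already in coding-strand orientation.

```python
from itertools import product

# Standard genetic code, built in TCAG order (stop codons shown as "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(ref_codon, pos_in_codon, alt_base):
    """Hypothetical helper: compare reference and alternate translations.

    ref_codon    -- codon taken from the reference genome (coding strand)
    pos_in_codon -- 0-based position of the SNP within the codon
    alt_base     -- non-reference allele at that position
    """
    alt_codon = ref_codon[:pos_in_codon] + alt_base + ref_codon[pos_in_codon + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    # Any change in the translated symbol is reported as nonsynonymous here.
    label = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, label


print(classify_snp("GAA", 0, "A"))  # GAA (Glu) vs AAA (Lys)
print(classify_snp("GAA", 2, "G"))  # GAA (Glu) vs GAG (Glu)
```

Using the reference allele as the baseline means the label describes the change relative to the reference genome, which can differ from the change relative to the ancestral state.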
SNP call sets
1. Pilot1 Low coverage Chromosome 20, Broad call set downloaded from
http://www.broadinstitute.org/~jmaguire/pilot1_chr20.august_31_2009/release.1/
High stringency SNP calls used.
2. Pilot1 Low coverage Chromosome 20, Univ. of Michigan call set downloaded from
server: fantasia.sph.umich.edu user: 1000genomes pwd: (ask for password if needed)
dir: pub/2009.08.chr20.2
The specific SNP files used as input for the pipeline are: CEU.ref_called_allels, CHB+JPT.ref_called_allels, YRI.ref_called_allels
SNPs where either allele does not match the reference genome have not been included.
3. Genome-wide Pilot1 Low coverage call set from Richard Durbin's group downloaded from
ftp://ftp.sanger.ac.uk/pub/1000genomes/REL-0908/LowCov/
Only high-quality SNPs from this set have been used (lines where the filter field, column 7, equals 0).
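The column 7 filter described for call set 3 could be applied along the following lines. This is a minimal sketch, not the code actually used: it assumes tab-separated lines, that "column 7" is counted from 1, and that lines starting with "#" are comments. The function name keep_high_quality and the sample rows are hypothetical.

```python
def keep_high_quality(lines, filter_col=7, delimiter="\t"):
    """Yield only lines whose filter field (1-based column) equals "0"."""
    for line in lines:
        # Skip blank lines and comment/header lines (assumed to start with "#").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) >= filter_col and fields[filter_col - 1] == "0":
            yield line


# Made-up example rows; the real file layout may differ.
rows = [
    "20\t100\trs1\tA\tG\t50\t0\n",
    "20\t200\trs2\tC\tT\t10\t1\n",
]
for kept in keep_high_quality(rows):
    print(kept, end="")
```

Comparing the field as a string avoids failing on non-numeric filter values; such lines are simply dropped.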