ACT Tool
From GersteinInfo
Here is some information on ACT, the Aggregation & Correlation Toolbox (http://act.gersteinlab.org).
Overview
ACT is a toolbox for harvesting useful results from a vast sea of genomic experimental data. In particular, it is a set of scripts (Aggregation, Correlation, and Saturation) designed to be downloaded and used to analyze signal or hit tracks. These scripts, along with their supporting material (documentation and example files), can be accessed by clicking on their respective icons on the act.gersteinlab.org home page. The sections below describe what each script does: the files it takes in, what it outputs, and important notes on its use.
There are also several supporting features on the website such as a gallery and example files: these are also discussed below.
Aggregation
The aggregation script takes values from multiple points on a single genomic signal track and creates an average signal profile around a set of anchor points, such as Transcription Start Sites (TSSs).
The main download is written in Python. Each run takes two input files: a signal or hit track (in the form of an sgr file or point file), and an annotations file in bed format. The output is a columnar file with explanatory headers, which can be plotted in programs like gnuplot, Excel, or MATLAB. The main download package has an R script in the samples folder which shows one way of plotting the output data with error bars.
Computing the "average signal profile" involves a number of computational choices: the bin size, whether to use the median or mean of signals within a bin as the bin's value, and whether to use the median or mean of signals across all bins as the final value in the signal profile. Since the annotations file requires regions as input, there is also a choice as to whether to aggregate around only a single point (the 5' end of the region, such as a TSS) or to include the entire region in the aggregation. Options covering all of these choices are available in the main aggregation download. For an idea of how bin scaling over regions works, see the aggregation PowerPoint in the gallery.
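The choices described above can be illustrated with a minimal sketch. This is not the code in ACT.py: the function name aggregate_profile, its parameters, the in-memory tuple representation of the tracks, and the strand handling are all assumptions made for illustration. It bins signal values within a fixed radius of each anchor, summarizes each bin (mean or median), and then summarizes each bin position across anchors.

```python
from statistics import mean, median

def aggregate_profile(signal, anchors, radius, nbins,
                      within_bin=mean, across_anchors=mean):
    """Illustrative average signal profile around anchor points.

    signal  -- list of (chrom, position, value) tuples (hypothetical
               in-memory form of a signal track)
    anchors -- list of (chrom, position, strand) tuples, e.g. TSSs
    Returns nbins values; None marks a bin no anchor had signal in.
    """
    bin_width = 2 * radius / nbins
    per_bin = [[] for _ in range(nbins)]  # per-anchor summaries for each bin
    for a_chrom, a_pos, strand in anchors:
        buckets = [[] for _ in range(nbins)]
        for chrom, pos, value in signal:
            if chrom != a_chrom:
                continue
            offset = pos - a_pos
            if strand == "-":
                offset = -offset  # assumption: orient minus-strand anchors 5' to 3'
            if -radius <= offset < radius:
                index = min(int((offset + radius) // bin_width), nbins - 1)
                buckets[index].append(value)
        for i, values in enumerate(buckets):
            if values:
                per_bin[i].append(within_bin(values))
    return [across_anchors(v) if v else None for v in per_bin]

# Toy data: two anchors on opposite strands, two bins of 20 bases each.
toy_signal = [("chr1", 90, 1.0), ("chr1", 110, 3.0),
              ("chr1", 195, 5.0), ("chr1", 205, 9.0)]
toy_anchors = [("chr1", 100, "+"), ("chr1", 200, "-")]
print(aggregate_profile(toy_signal, toy_anchors, radius=20, nbins=2))
```

Passing within_bin=median or across_anchors=median switches the corresponding summary statistic. The sketch scans the whole signal list for every anchor, which is fine for toy data but far too slow for genome-scale tracks.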
- Specific use instructions
After downloading and unzipping the aggregation package, Agg.tar, the program can be run as follows (data files can be found under "Example Data" in the Aggregation section):
python ACT.py --nbins=50 --mbins=0 --radius=50000 hg17_ensembl.bed baf155.sgr > baf155_ensembl.out
where hg17_ensembl.bed is the annotations file and baf155.sgr is the signal track, placed in the same folder as ACT.py. An alternative run which would include the 3' boundary of each gene region can be performed using the following:
python ACT.py --nbins=50 --mbins=50 --radius=50000 --regions hg17_ensembl.bed baf155.sgr > baf155_ensembl.out
An aggregation run on point tracks (such as SNP lists) to determine average density can be performed as follows:
python ACT.py --nbins=50 --mbins=0 --radius=50000 --signalparser=PointParser gencode.pc.coords.chr1 YRI.snps.parsed.chr1 > YRI_gencode.out
Additional tags corresponding to other aggregation options are listed in the readme.
- Contact
Robert Bjornson
Correlation
The correlation script takes multiple signal tracks of equal length and divides each one into bins, similarly to the aggregation script, except in this case the bins are not hinged around anchor points and they are generally wider (either hundreds or thousands of bases, depending on which script is chosen). Each bin is assigned a value based on the corresponding signal track values, and then the arrays of bins are correlated with each other in pairwise fashion. Ultimately, a matrix of correlation coefficients corresponding to the correlations between all signal tracks is obtained.
There are options in the correlation script allowing one to control bin (sliding window) size and the overlap of the bins (windows).
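As a rough illustration of the binning and pairwise correlation described above, here is a minimal sketch. It is not taken from either correlation tool: the function names, the dictionary-of-lists track representation, and the use of the window mean as the bin value are assumptions, and only the Pearson coefficient is shown.

```python
from math import sqrt

def bin_track(values, window, step):
    """Mean signal in sliding windows of `window` positions, advancing by
    `step`; consecutive windows overlap by window - step positions."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, step)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_matrix(tracks, window, step):
    """tracks: dict mapping a track name to a list of signal values
    (all lists the same length). Returns a nested dict of coefficients."""
    names = sorted(tracks)
    binned = {name: bin_track(tracks[name], window, step) for name in names}
    return {a: {b: pearson(binned[a], binned[b]) for b in names}
            for a in names}

# Toy tracks: "b" rises with "a", "c" falls as "a" rises.
toy_tracks = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 4, 6, 8, 10, 12, 14, 16],
    "c": [8, 7, 6, 5, 4, 3, 2, 1],
}
matrix = correlation_matrix(toy_tracks, window=4, step=2)
print(matrix["a"]["b"], matrix["a"]["c"])
```

Setting step equal to window gives non-overlapping bins; a smaller step gives overlapping sliding windows.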
There are two versions of the correlation tool. In Kevin Yip's version (Corr/Sat bundle), a final correlation matrix is created based on the Spearman, Pearson, or normal score correlation between each pair of binned data sets. In
- Contact
Correlation P was written by Lucas Lochovsky. The Saturation/Correlation bundle was written by Kevin Yip.
Saturation
The saturation script determines the saturation level of a given feature after multiple genomic experiments.
Each input file corresponds to one experimental condition (e.g. one new individual), and each line in a file specifies a genomic location that has the biological phenomenon under study (e.g. tagged SNPs). Our implementation makes use of special data structures to avoid redundant counting. It normally takes less than a minute to generate the plot for up to 30 input files, each with a few thousand lines. To handle more files and files with more lines, the tool also provides an option to compute the coverage of a random sample of the input file combinations.
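The idea of coverage over combinations of input files can be sketched as follows. This is an illustration only and does not use the special data structures of the actual implementation: the function name saturation_curve, the max_combos parameter, and the representation of each input file as a set of locations are assumptions. For each number of experiments k, it averages the count of distinct locations covered over all (or a random sample of) k-file combinations.

```python
from itertools import combinations
import random

def saturation_curve(experiments, max_combos=None, seed=0):
    """Mean number of distinct locations covered by k experiments,
    for k = 1 .. len(experiments).

    experiments -- list of sets, one per input file, each holding the
                   genomic locations listed in that file
    max_combos  -- if set, evaluate at most this many randomly chosen
                   combinations per k instead of all of them
    """
    n = len(experiments)
    rng = random.Random(seed)
    curve = []
    for k in range(1, n + 1):
        combos = list(combinations(range(n), k))
        if max_combos is not None and len(combos) > max_combos:
            combos = rng.sample(combos, max_combos)
        coverages = [len(set().union(*(experiments[i] for i in combo)))
                     for combo in combos]
        curve.append(sum(coverages) / len(coverages))
    return curve

# Toy data: three experiments that together cover three locations.
toy_experiments = [
    {("chr1", 100), ("chr1", 200)},
    {("chr1", 200), ("chr1", 300)},
    {("chr1", 100), ("chr1", 300)},
]
print(saturation_curve(toy_experiments))
```

A curve that flattens as k grows suggests that additional experiments are finding few new locations. Note that this sketch still enumerates every combination before sampling, so it only demonstrates the averaging, not an efficient way to handle many input files.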
- Contact
Kevin Yip
Web ACT
- Aggregation
Note: based on C++ version of source code
- Correlation
- Contact
Justin Jee
Other
- Citation
A paper describing this site and software is in preparation. Until it is published, please reference act.gersteinlab.org if you use the tool.


