ACT FAQ

From GersteinInfo

Revision as of 15:23, 29 October 2010 by Mbg

ACT FAQs

  • corr-sat-bundle: I noticed that the toy input files for Corr-sat actually have the extension .bedGraph4 rather than .wig (although the format of the files is the same). Does this matter to the program? Can Corr-sat accommodate multiple file types?

The program accepts a raw signal file in any of several file formats, converts it into a bin file in wig format, and then performs the correlation/saturation analysis. The .bedGraph4 files provided in the example may cause some confusion because they are themselves a kind of wig format. However, because the program also creates another wig file during the binning step, a different file extension is used for the initial input.
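The bin-then-correlate flow described above can be sketched in a few lines of Python. This is only an illustration, not the corr-sat-bundle code: the function names `bin_signal` and `pearson` are hypothetical, the input is assumed to be four whitespace-separated columns (chromosome, start, end, value), and the fixed-width averaging and Pearson coefficient are assumptions about one reasonable way to do each step.

```python
from collections import defaultdict
from math import sqrt


def bin_signal(bedgraph_lines, bin_size):
    """Average a 4-column signal track over fixed-width bins.

    Hypothetical helper: returns {(chrom, bin_index): mean signal per base}.
    """
    totals = defaultdict(float)
    for line in bedgraph_lines:
        fields = line.split()
        # Skip blank lines, header lines, and anything without 4 columns.
        if len(fields) != 4 or fields[0] in ("track", "browser"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        value = float(fields[3])
        pos = start
        while pos < end:
            bin_index = pos // bin_size
            bin_end = (bin_index + 1) * bin_size
            overlap = min(end, bin_end) - pos  # bases of this interval in the bin
            totals[(chrom, bin_index)] += value * overlap
            pos += overlap
    return {key: total / bin_size for key, total in totals.items()}


def pearson(track_a, track_b):
    """Pearson correlation over the union of bins (missing bins count as 0)."""
    keys = sorted(set(track_a) | set(track_b))
    xs = [track_a.get(k, 0.0) for k in keys]
    ys = [track_b.get(k, 0.0) for k in keys]
    n = len(keys)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up toy tracks; the second is exactly twice the first.
track_a = bin_signal(["chr1 0 100 1.0", "chr1 100 200 3.0", "chr1 200 300 2.0"], 100)
track_b = bin_signal(["chr1 0 100 2.0", "chr1 100 200 6.0", "chr1 200 300 4.0"], 100)
r = pearson(track_a, track_b)
print(f"Pearson r = {r:.3f}")
```

Because the sketch only reads the four columns, the file extension itself plays no role in it; that is consistent with the answer above, though the real program may inspect its input differently.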


  • How do I use corr-p in conjunction with the "integrated example"?

Because this example deals with signal tracks rather than SNP positions or a bed file of genomic locations, we advise users to run the correlation tool found in corr-sat-bundle for the correlation calculation. The corr-sat-bundle script can use multiple types of correlation coefficient and uses a smaller binning window, both of which are beneficial for the purposes of this example.


  • How small are the "small datasets" which can be used with Web ACT and GSA?

Web ACT and GSA were designed with pilot-phase ENCODE-sized data sets or smaller in mind. For instance, the example files provided with the Web ACT tool are on the order of hundreds of thousands of lines long. GSA is designed to handle signal files of between thousands and tens of thousands of lines. The size of the annotation file generally does not affect the speed at which the program runs.


  • Why might a command like this produce an error?
$ python ../ACT.py --mbins=150 --nbins=10 --region --radius=500 --output=148_testregion.out 148_Exons.bed h3k36.wig

Without the --region flag, ACT aggregates around single points (in this case, the start sites of all the exons you supplied), plus or minus the specified radius. With the --region flag, it aggregates "radius" base pairs upstream of the start site, "radius" base pairs downstream of the stop site, and the "region" between the start and stop sites. To do this, it creates "mbins" bins between the start and stop site of each exon in the annotation file and scales the bins to the size of the exon. The likely problem here is that mbins is set to a relatively high number (150, which is probably larger than some of the exons), so in some cases the program tries to divide an exon into more bins than it has base pairs, hence the error. The solution is to set mbins to something smaller.
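One way to confirm this diagnosis before rerunning ACT is to look for annotations that are shorter than the chosen mbins value. The sketch below is not part of ACT; the helper name `short_regions` and the toy coordinates are made up for illustration, and it assumes a standard BED layout in which the first three columns are chromosome, start, and end.

```python
def short_regions(bed_lines, mbins):
    """Return (chrom, start, end, length) for regions with fewer base pairs than mbins.

    Hypothetical pre-flight check, not ACT's own validation.
    """
    flagged = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        length = end - start
        if length < mbins:
            flagged.append((chrom, start, end, length))
    return flagged


# Made-up exons: two are shorter than 150 bp, one is not.
example_bed = [
    "chr1\t1000\t1120\texonA",
    "chr1\t5000\t5900\texonB",
    "chr2\t200\t260\texonC",
]
too_short = short_regions(example_bed, mbins=150)
for chrom, start, end, length in too_short:
    print(f"{chrom}:{start}-{end} is {length} bp, too short for 150 bins")
```

To check a real file, the lines of the annotation file (148_Exons.bed in the command above) can be passed in place of `example_bed`. The length of the shortest region then gives a rough upper limit for mbins, assuming each bin needs at least one base pair.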
