Papers Page Code Old

From GersteinInfo


Revision as of 10:59, 12 September 2011


New Stuff

MBGLab--Papers-Master : https://docs.google.com/spreadsheet/ccc?key=0AiiHlTECOi8edGhzSXlZZzEyRThQeDB6R0pRc0FvcGc&hl=en_US#gid=0


Old Stuff

Lists of Papers:

XML tags

XML Dumps:

Main Papers / Before 1997 / Medline results

Zip Archives:

E-print directory (>1.2 GB) / Papers (>590 MB)

Please see our copyright statement

Automatic generation of publication documents from XML

[ return to index ] [ Update Pages ] [ Code Documentation ][ Zip Archives ] [ Web Stats ]

Introduction and guidelines: All papers are defined by a unique "labid" such as "pgenes-nar" or "genome-transposon-nature". Ideally, the labid should contain the abbreviated subject and journal name as shown. Each paper has a directory named with the labid in /web/e-print. This directory should contain a file named citation.xml containing reference information for the paper. To add a paper, you simply need to create the proper e-print directory and citation.xml file, then invoke the scripts to regenerate the PubMed listing and indices. These can all be accomplished using the web forms and scripts located [/papers_template here.]

Each citation.xml file consists of a series of tags contained within a <UniqueData> element. The value of each tag is specified within the body of the element. Each file should have at least a <labid> tag. Two examples are available: one for a [samples/citation.xml paper listed in PubMed], and one for a [samples/citation2.xml paper from a non-PubMed journal] containing more complete citation information.

Here is a list of the tags and their meanings:

<UniqueData> - enclosing element for all the tags
<PMID> - PubMed id
<labtitle> - title of the article
<labcite> - citation of the article (author, journal, year, etc)
<Authors> - authors in [First Initial] [Lastname], ... format
<Journal> - journal that the article appeared in (may also be specified by <MedlineTA>)
<Year> - year the article was published
<Volume> - volume of the journal that article appeared in
<Pages> - pages of the volume that the article appeared in (may also be specified by <MedlinePgn> or <bookdata>)
<ignore/> - whether the article should appear in the main index
<labid> - id by which to refer to the article
<website> - supplemental website
<website2> - second supplemental website
<sortval> - relative ordering of the article on the main index (high = early)
<preprint> - URL of the preprint file
<target> - currently reserved for NESG papers
<subject> - currently reserved for NESG papers
<grant> - specifies the grant(s) funding the paper (e.g. "cegs,keck")
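
As an illustration of the file layout described above, here is a minimal Python sketch that writes a citation.xml body with a <UniqueData> root and one child element per tag. The function name build_citation and the PMID value are placeholders invented for this example; they are not part of the lab's scripts.

```python
import xml.etree.ElementTree as ET


def build_citation(labid, fields):
    """Return the text of a citation.xml document.

    labid  -- the paper's unique id, e.g. "pgenes-nar"
    fields -- mapping of further tag names to their text values
    (build_citation is a hypothetical helper, not one of the lab scripts.)
    """
    root = ET.Element("UniqueData")          # enclosing element for all tags
    ET.SubElement(root, "labid").text = labid
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# "00000000" is a placeholder PMID, not a real PubMed id.
xml_text = build_citation("pgenes-nar", {"PMID": "00000000", "grant": "cegs,keck"})
print(xml_text)
```

The resulting text would be saved as citation.xml inside the paper's labid directory under /web/e-print.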

The tags can conceptually be divided into two groups: tags such as PMID and labtitle, which serve to identify the paper, and tags such as website and subject, which supply supplemental information about the paper. There are three ways to identify a paper (in order of decreasing precedence):

  1. PMID
  2. labtitle, labcite
  3. labtitle, Authors, Journal and optionally Year, Volume and Pages

For the proper display of the paper, at least one of these methods must be specified in citation.xml. You should always include the PMID if a paper is known to be listed in PubMed. Option 2 should be used for papers that are in press.
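
The precedence rule can be checked mechanically. The following Python sketch reports which of the three methods a given citation.xml would fall under; it only illustrates the rule as stated here and is not how papers.pl is known to implement it. The function name identification_method is hypothetical, and treating an empty tag as absent is an assumption.

```python
import xml.etree.ElementTree as ET


def identification_method(xml_text):
    """Return 1, 2 or 3 for the identification method used, or None.

    Hypothetical helper: it mirrors the documented precedence only.
    """
    root = ET.fromstring(xml_text)

    def has(tag):
        # Assumption: a tag with empty or whitespace-only text counts as missing.
        text = root.findtext(tag)
        return text is not None and text.strip() != ""

    if has("PMID"):
        return 1
    if has("labtitle") and has("labcite"):
        return 2
    # The journal may be given as <Journal> or as <MedlineTA>.
    if has("labtitle") and has("Authors") and (has("Journal") or has("MedlineTA")):
        return 3
    return None
```

A result of None means the file does not identify the paper by any of the three methods and would need to be completed before the paper can be displayed properly.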

The other group of tags supplies additional information about the paper identified by the first group. All of these tags are optional; however, use of <grant> and <preprint> is strongly encouraged. (Please consult Mark for guidelines on citing grants.)

CGI and Perl scripts: These are invoked from a password-protected directory on the server ([/papers_template click here to access]). You can view the source code [scripts here]. The script that most people will use is papers.cgi. It simply invokes two other scripts:

  • importEprint.sh creates the directory /web/papers/skel which is used for collecting citation info later on. It will back up the existing copy of /web/papers to /web/papers_template/backup/skel_DATE.tar.gz, which can later be recovered.
  • papers.pl is a script that takes a main XML file containing a list of articles in NCBI format, and collects the information contained in the copies of the citation.xml files found in /web/papers/skel, which specify supplemental information for the articles in the main file. It will add any papers not in the list downloaded from the NCBI to the overall list, which is saved in the file /web/papers/papers.xml. The following files are produced by this script:
/web/papers/index.html
/web/papers/paper-tags.htm
/web/papers/paper-ids.htm
/web/papers/papers-simple.html
/web/papers/papers.xml
/web/papers/[labid]/index.html
/web/papers/grant/index.html
/web/papers/grant/[grantid].html
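
To make the roles of <sortval> and <grant> in these outputs more concrete, the Python sketch below orders papers for a main index and groups them by grant id. It is a guess at the general idea, not a port of papers.pl. The function names, the use of plain dictionaries, the default sortval of 0 for papers that lack one, and the reuse of sortval ordering on the per-grant lists are all assumptions.

```python
from collections import defaultdict


def _sort_key(paper):
    # Assumption: sortval is numeric and defaults to 0 when absent.
    return float(paper.get("sortval", 0))


def main_index_order(papers):
    """Return labids ordered for the main index (high sortval = early)."""
    return [p["labid"] for p in sorted(papers, key=_sort_key, reverse=True)]


def group_by_grant(papers):
    """Map each grant id to the labids of the papers that cite it.

    A <grant> value such as "cegs,keck" is split on commas, so one
    paper may appear under several grants.
    """
    groups = defaultdict(list)
    for paper in papers:
        for grant_id in paper.get("grant", "").split(","):
            grant_id = grant_id.strip()
            if grant_id:
                groups[grant_id].append(paper)
    # Assumption: per-grant lists reuse the main-index ordering.
    return {g: main_index_order(members) for g, members in groups.items()}


papers = [
    {"labid": "pgenes-nar", "grant": "cegs,keck", "sortval": "1"},
    {"labid": "genome-transposon-nature", "grant": "keck", "sortval": "5"},
]
print(main_index_order(papers))
print(group_by_grant(papers))
```

Each key of the grouped result corresponds to one [grantid] page, and a paper with no <grant> tag appears in none of them.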

The other scripts of importance:

  • downloadXML.cgi obtains a complete listing of Mark's publications from PubMed. This is a non-trivial task (and currently relies on the NCBI not changing any aspect of their site). The downloaded file is /web/papers/NCBIData.xml. This script should be invoked whenever a new paper is published.
  • rollback.cgi recovers an earlier copy of /web/papers in case of disaster. Backups are stored in /web/papers_template/backup .