Papers Page Code

From GersteinInfo

Latest revision as of 17:02, 16 September 2019

Word Cloud

Link to Word Cloud

(Lab Member) Papers Page Rebuild and Further Info (Private)

The private wiki has instructions for updating the papers page, along with further information: Private Wiki (http://wiki.gersteinlab.org/labinfo/Papers_Page_Documentation)

Papers GitHub

Papers 2.0 is now version controlled under GitHub: https://github.com/gersteinlab/papers.gersteinlab.org

SpreadSheet Structure

"Papers Page" is built from two Google spreadsheets, "Papers Master" and "Papers Subjects". "Papers Master" contains descriptions of the papers, such as PubMed ID, authors, citation, etc. Each paper is affiliated with one or more grants. Grant information is stored in "Papers Subjects".

Here is a list of the tags and their meanings:

Papers Master

Papers Master:

<labid> - id by which to refer to the article
<PMID> - PubMed id
<title> - title of the article
<citation> - citation of the article (author, journal, year, etc)
<preprint> - URL of the preprint file
<subjects> - specifies the grant(s) funding the paper (e.g. "cegs,keck")
<website> - supplemental website
<Year> - year the article was published
<footnote> - additional information
<website2> - second supplemental website

The tags fall conceptually into two groups: tags such as PMID and title, which identify the paper, and tags such as website and subject, which supply supplemental information about it. There are two ways to identify a paper (in order of decreasing precedence):

I. PMID
II. title, citation

Always include the PMID if the paper is known to be listed in PubMed. Use option II for papers that are in press.

The second group of tags supplies additional information about the paper identified by the first group. All of these tags are optional; however, use of <subjects> and <preprint> is strongly encouraged.

labid	PMID	title	citation	preprint	subject	website	Year	footnote	website2
metamembrane	20430783			http://archive.gersteinlab.org/papers/e-print/metamembrane/preprint.pdf	interactions	http://metagenomics.gersteinlab.org/membrane	2010

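Spreadsheet of paper IDs and PubMed IDs with other annotation: MBGLab--Papers-Master
HTML: https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AiiHlTECOi8edGhzSXlZZzEyRThQeDB6R0pRc0FvcGc&output=html
CSV: https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AiiHlTECOi8edGhzSXlZZzEyRThQeDB6R0pRc0FvcGc&output=csv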
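The identification rule above (PMID first, then title plus citation) can be sketched in Python. This is only an illustration, assuming a spreadsheet row has already been read into a dict keyed by the column names; identify_paper is a hypothetical helper and is not taken from the lab's scripts.

```python
def identify_paper(row):
    """Return (method, key) for a Papers Master row, or raise ValueError.

    Precedence follows the rule described above: PMID first, then
    title + citation. `row` is assumed to be a dict keyed by the
    spreadsheet column names (an assumption of this sketch).
    """
    pmid = (row.get("PMID") or "").strip()
    if pmid:
        return ("pmid", pmid)
    title = (row.get("title") or "").strip()
    citation = (row.get("citation") or "").strip()
    if title and citation:
        return ("title+citation", (title, citation))
    raise ValueError("row %r has neither a PMID nor title + citation"
                     % row.get("labid"))


# The example row from the table above, written as a dict.
example = {
    "labid": "metamembrane",
    "PMID": "20430783",
    "title": "",
    "citation": "",
    "preprint": "http://archive.gersteinlab.org/papers/e-print/metamembrane/preprint.pdf",
    "subject": "interactions",
    "website": "http://metagenomics.gersteinlab.org/membrane",
    "Year": "2010",
}
print(identify_paper(example))
```

A row for an in-press paper would leave PMID empty and fill in title and citation instead, so it falls through to the second branch.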

Papers Subjects

Papers Subjects:

<category> - classification of grants
<labid> - id by which to refer to each grant
<title> - description of each grant
<website> - external website
<html> - additional information

We encourage you to re-sort the rows by <category> after adding new grants, because of coding issues. The <website> URL should also appear in the <html> column. For example, "don" has the <website> "http://www.donaghue.org" and also "URL: <A HREF=http://www.donaghue.org> http://www.donaghue.org</A>" in its <html> column.

category	labid	title	website	html
Research Grants	don	Dongahue Young Investigator	http://www.donaghue.org/	Young investigator award from Donaghue Foundation to M Gerstein (PI), "Comparative Genomics of Microbial Pathogens," (DF98-113, 1/1/99-12/31/03). URL: <A HREF=http://www.donaghue.org> http://www.donaghue.org</A>Articles funded by this grant:

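HTML table of all subject headings in the current papers page: https://docs.google.com/spreadsheet/pub?key=0AiiHlTECOi8edDBCUUhtMFZzRFVMaVlxZ1JrNGp1b0E&output=html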
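The two conventions for "Papers Subjects" (rows sorted by category, and the website repeated inside the html column) can be checked with a short script. The sketch below is hypothetical: check_subjects is not one of the lab's scripts, rows are assumed to be dicts keyed by the column names, and "sorted" is assumed to mean plain alphabetical order.

```python
def check_subjects(rows):
    """Return a list of warnings for Papers Subjects rows.

    `rows` is assumed to be a list of dicts keyed by the column names
    (category, labid, title, website, html), in spreadsheet order.
    """
    warnings = []
    categories = [r.get("category", "") for r in rows]
    # Assumption: "sorted" means plain alphabetical order of <category>.
    if categories != sorted(categories):
        warnings.append("rows are not sorted by category")
    for r in rows:
        # Ignore a trailing slash so ".org/" matches ".org" in the html text.
        site = (r.get("website") or "").strip().rstrip("/")
        if site and site not in (r.get("html") or ""):
            warnings.append("%s: website missing from html" % r.get("labid"))
    return warnings


# The "don" row from the table above (html shortened to the URL part).
don = {
    "category": "Research Grants",
    "labid": "don",
    "website": "http://www.donaghue.org/",
    "html": "URL: <A HREF=http://www.donaghue.org> http://www.donaghue.org</A>",
}
print(check_subjects([don]))
```

An empty list means both conventions hold for the rows supplied.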

MBGLab--Papers XML Import

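HTML table of all PubMed entries in tabular format: https://docs.google.com/spreadsheet/pub?key=0AiiHlTECOi8edHFxQWRwN0kxeTQ3ZzdCOXhQX2Z1ZGc&output=html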

Generate publication documents from SpreadSheet

Two-step flowchart (created by Mike Wilson): Import XML Flow (Papers1.png) and Rebuild Papers Page (Papers2.png)

Download the XML file from NCBI using the PubMed IDs to generate pubmed_spreadsheet. Pubmed_spreadsheet stores the <title>, <citation>, etc. of the papers corresponding to each "PMID" in "Papers Master". The scripts perform this step automatically.

GoogleSpreadsheet.py: grabs a Google spreadsheet with Python; see Grab_GoogleSpreadsheet_with_a_Python
Other Code: see PubmedSpreadsheet_Generation_Code

Pipeline:

First obtain pubmed_result.xml from the papers MEDLINE query
   parse_pmids.py
   curl `cat ncbiquery.txt` > NCBIData.xml
Reformat NCBIData.xml to a tab-delimited file to upload to Google
   python import.py
Replace the PubMed Import XML spreadsheet with export_out.tab
   reload_data.py
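The download and reformat steps can be sketched in Python as follows. This is not the code in parse_pmids.py or import.py, whose contents are not shown here. It assumes that the query is an NCBI E-utilities efetch URL for a list of PMIDs and that only four fields are exported; the function names, the chosen columns and the placeholder sample record are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def efetch_url(pmids):
    """Build an NCBI E-utilities efetch URL for a list of PMID strings."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    return base + "?db=pubmed&retmode=xml&id=" + ",".join(pmids)


def xml_to_rows(xml_text):
    """Extract (PMID, title, journal, year) tuples from PubMed XML text.

    Note: findtext returns only the text before any nested markup,
    so titles containing inline tags would be truncated.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for art in root.iter("PubmedArticle"):
        rows.append((
            art.findtext("MedlineCitation/PMID", default=""),
            art.findtext("MedlineCitation/Article/ArticleTitle", default=""),
            art.findtext("MedlineCitation/Article/Journal/Title", default=""),
            art.findtext(
                "MedlineCitation/Article/Journal/JournalIssue/PubDate/Year",
                default=""),
        ))
    return rows


def to_tab_delimited(rows):
    """Join rows into tab-delimited text with a header line."""
    header = ("PMID", "title", "journal", "year")
    return "\n".join("\t".join(r) for r in [header] + rows)


# Placeholder record: the title and journal are NOT real data.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>20430783</PMID>
      <Article>
        <Journal>
          <JournalIssue><PubDate><Year>2010</Year></PubDate></JournalIssue>
          <Title>PLACEHOLDER JOURNAL</Title>
        </Journal>
        <ArticleTitle>PLACEHOLDER TITLE</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

print(efetch_url(["20430783"]))
print(to_tab_delimited(xml_to_rows(SAMPLE_XML)))
```

Parsing with an XML library rather than string matching keeps the export robust to whitespace and attribute changes in the downloaded file.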

Build Papers Page

This step grabs all information from the three spreadsheets, "Papers Master", "Papers Subjects" and "Pubmed_Spreadsheet", to build the whole website. Each paper and each grant has its own description page.

Script used in this step:

update.py : see Build_Papers_Page_Code
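The build step combines the three spreadsheets. The sketch below shows one way such a join could work; it is not the logic of update.py, which is documented on the linked page. It assumes that rows are dicts keyed by column name, that values entered in "Papers Master" take priority over PubMed values, and that the grant column is named "subjects"; merge_records and group_by_subject are hypothetical names.

```python
from collections import defaultdict


def merge_records(master_rows, pubmed_rows):
    """Fill in title/citation from the PubMed sheet where the master row
    has a PMID but leaves those fields empty (assumed precedence)."""
    by_pmid = {r["PMID"]: r for r in pubmed_rows}
    merged = []
    for row in master_rows:
        rec = dict(row)
        pm = by_pmid.get(rec.get("PMID", ""), {})
        for field in ("title", "citation"):
            if not rec.get(field):
                rec[field] = pm.get(field, "")
        merged.append(rec)
    return merged


def group_by_subject(records):
    """Map each grant labid to the labids of the papers it funded."""
    groups = defaultdict(list)
    for rec in records:
        # <subjects> holds a comma-separated list such as "cegs,keck".
        for subject in (rec.get("subjects") or "").split(","):
            subject = subject.strip()
            if subject:
                groups[subject].append(rec["labid"])
    return dict(groups)


# Made-up rows for illustration only.
master = [
    {"labid": "paper-a", "PMID": "111", "title": "", "citation": "",
     "subjects": "cegs,keck"},
    {"labid": "paper-b", "PMID": "", "title": "In press title",
     "citation": "In press citation", "subjects": "keck"},
]
pubmed = [{"PMID": "111", "title": "Title from PubMed",
           "citation": "Citation from PubMed"}]

records = merge_records(master, pubmed)
print(records[0]["title"])
print(group_by_subject(records))
```

With the per-grant groups in hand, one description page per paper and one per grant can be rendered from the merged records.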

Papers Page Code (Old)

Papers_Page_Code_Old

Old Papers Server ... and also R code for NSF colab

Old version of the papers website: Old Papers (http://oldpapers.gersteinlab.org)

Other info about papers: Paper_search

R code for compiling a list of NSF co-authors. This is buggy and approximate, but a good place to start:

https://github.com/ejfertig/NSFBiosketch/blob/master/CollaboratorList.Rmd

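# Load packages. None of them defines getInitials(); see the stand-in helper below.
library('easyPubMed')
library('plyr')
library('xlsx')
# ASSUMPTION: stand-in helper (first letter of each given name); the linked Rmd may define it differently.
getInitials <- function(x) sapply(strsplit(as.character(x), '[ -]+'), function(p) paste(substr(p, 1, 1), collapse = ''))
author <- 'Gerstein M'
authorFilter <- 'Gerstein Maya'
currentDate <- Sys.Date()
# Cutoff date: keep only papers from the last 48 months.
queryDate <- seq(Sys.Date(), length = 2, by = "-48 months")[2]
# Fetch the author's PubMed records and convert each article to a data frame (one row per author).
pmquery <- sapply(articles_to_list(fetch_pubmed_data(get_pubmed_ids(paste0(author,'[Author]')))),article_to_df)
names(pmquery) <- NULL
# Drop records that did not yield a 'year' column, then filter by publication date.
pmquery <- pmquery[sapply(pmquery,function(x){any(colnames(x)=='year')})]
datePMID <- as.Date(sapply(pmquery,function(x){paste(x[1,'year'], x[1,'month'], x[1,'day'],sep="-")}))
pmquery <- pmquery[datePMID >= queryDate]
# Combine into one data frame, then de-duplicate and sort co-authors by last name and initials.
pmquery.dataframe <- ldply(pmquery,data.frame)
pmquery.dataframe$Initials <- getInitials(pmquery.dataframe$firstname)
pmquery.dataframe$ColabType <- 'Co-Author'
pmquery.dataframe <- pmquery.dataframe[!duplicated(paste(pmquery.dataframe$lastname,pmquery.dataframe$Initials)),]
pmquery.dataframe <- pmquery.dataframe[order(pmquery.dataframe$lastname,pmquery.dataframe$Initials),]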

