Grabbing Google Docs information with a Python

From GersteinInfo

(Difference between revisions)
Jump to: navigation, search
(Created page with '==Google API== In order to use the GoogleSpreadsheet class, a number of dependencies must be met. Assuming you already have Linux and Python installed, follow these [http://cod…')
(GoogleSpreadsheet Class)
 
(10 intermediate revisions not shown)
Line 2: Line 2:
In order to use the GoogleSpreadsheet class, a number of dependencies must be met.  Assuming you already have Linux and Python installed, follow these [http://code.google.com/apis/gdata/articles/python_client_lib.html#dependencies instructions] to enable the Google Data API.
In order to use the GoogleSpreadsheet class, a number of dependencies must be met.  Assuming you already have Linux and Python installed, follow these [http://code.google.com/apis/gdata/articles/python_client_lib.html#dependencies instructions] to enable the Google Data API.
-
 
-
 
-
==GoogleSpreadsheet Class==
 
-
 
-
Python code to import Google Spreadsheet objects.  Save below text as GoogleSpreadsheet.py, then in your python code import GoogleSpreadsheet.
 
==Example Usage==
==Example Usage==
Line 20: Line 15:
</pre></code>
</pre></code>
-
==Code==
+
==GoogleSpreadsheet Class==
 +
 
 +
Python code to import Google Spreadsheet objects.  Save below text as GoogleSpreadsheet.py, then in your python code import GoogleSpreadsheet.
<code><pre>
<code><pre>
Line 36: Line 33:
''' An iterable google spreadsheet object.  Each row is a dictionary with an entry for each field, keyed by the header.  GData libraries from Google must be installed.'''
''' An iterable google spreadsheet object.  Each row is a dictionary with an entry for each field, keyed by the header.  GData libraries from Google must be installed.'''
-
def __init__(self, spreadsheet_id, worksheet_id, user='abc@place.org', password='2312324', source=''):
+
def __init__(self, spreadsheet_id, worksheet_id, user='abc@place.org', password='xyxyxy', source=''):
gd_client = gdata.spreadsheet.service.SpreadsheetsService()
gd_client = gdata.spreadsheet.service.SpreadsheetsService()
gd_client.email = user
gd_client.email = user
Line 73: Line 70:
</pre></code>
</pre></code>
-
==Grab GoogleSpreadsheet with Python==
+
==Brief introduction to the code==
-
 
+
-
This is a practical guide to grab data from the GoogleSpreadsheet with a Python script. Since changes were made to the Gersteinlab Alumni page, that will be used as the example. This can be done similarly for the People's page.
+
-
You may need to know some elementary Python and HTML scripting.
+
You may need to know some elementary Python and HTML scripting. A sanitized code can be found
 +
[http://archive.gersteinlab.org/misc/scriptSkeleton.py here].
-
*Editing <br>
+
<br>There are a number of functions in the script in addition to the function:
-
There are a number of functions in the script in addition to the function:
+
divideSpreadsheet
divideSpreadsheet
  - looks in the 'category' column and splits the spreadsheet into graduate students, postdocs, staff, research scientists and undergrads. <br>
  - looks in the 'category' column and splits the spreadsheet into graduate students, postdocs, staff, research scientists and undergrads. <br>

Latest revision as of 16:08, 14 September 2011

Contents

Google API

In order to use the GoogleSpreadsheet class, a number of dependencies must be met. Assuming you already have Linux and Python installed, follow these instructions to enable the Google Data API.

Example Usage

from GoogleSpreadsheet import GoogleSpreadsheet

spreadsheet_id = "45667" # Unique Spreadsheet ID
worksheet_id = "rf6"

spreadsheet = GoogleSpreadsheet(spreadsheet_id, worksheet_id)
for row in spreadsheet:
  print row["lastname"]

GoogleSpreadsheet Class

Python code to import Google Spreadsheet objects. Save below text as GoogleSpreadsheet.py, then in your python code import GoogleSpreadsheet.

try:
  from xml.etree import ElementTree
except ImportError:
  from elementtree import ElementTree
import gdata.spreadsheet.service
import gdata.service
import atom.service
import gdata.spreadsheet
import atom

class GoogleSpreadsheet:
	''' An iterable google spreadsheet object.  Each row is a dictionary with an entry for each field, keyed by the header.  GData libraries from Google must be installed.'''
	
	def __init__(self, spreadsheet_id, worksheet_id, user='abc@place.org', password='xyxyxy', source=''):
		gd_client = gdata.spreadsheet.service.SpreadsheetsService()
		gd_client.email = user
		gd_client.password = password
		gd_client.source = source
		gd_client.ProgrammaticLogin()
		
		self.count = 0
		self.rows = self.formRows(gd_client.GetListFeed(spreadsheet_id, worksheet_id))
		
	def formRows(self, ListFeed):
		rows = []
		for entry in ListFeed.entry:
			d = {}
			for key in entry.custom.keys():
				d[key] = entry.custom[key].text
			rows.append(d)
		return rows
			
	def __iter__(self):
		return self
		
	def next(self):
		if self.count >= len(self.rows):
			self.count = 0
			raise StopIteration
		else:
			self.count += 1
			return self.rows[self.count - 1]
	
	def __getitem__(self, item):
		return self.rows[item]
		
	def __len__(self):
		return len(self.rows)

Brief introduction to the code

You may need to know some elementary Python and HTML scripting. A sanitized code can be found here.


There are a number of functions in the script in addition to the function: divideSpreadsheet

- looks in the 'category' column and splits the spreadsheet into graduate students, postdocs, staff, research scientists and undergrads. 

printSection

- if you add a column in the GoogleSpreadsheet, this is the segment in the script to add the column.
- the new column should be added before "current position". This is because the format is such that printing of "current position" spans the bottom of all the columns. So you need to increase "colspan=" by the number of columns you are adding. 

printHeader

- prints the header

printFooter

- prints the footer

The main function is located at the bottom of the script, and it basically prints the HTML in sequential order (header section footer...)

Personal tools