Grabbing Google Docs information with Python

From GersteinInfo

Revision as of 15:34, 14 September 2011 by Public (Talk | contribs)


Google API

Before you can use the GoogleSpreadsheet class, several dependencies must be in place. Assuming Linux and Python are already installed, install the Google Data (GData) Python client library, which provides the gdata and atom modules imported in the code below.


GoogleSpreadsheet Class

Python code for reading a Google Spreadsheet as an iterable object. Save the text under "Code" below as GoogleSpreadsheet.py, then import GoogleSpreadsheet from your own Python code.

Example Usage

from GoogleSpreadsheet import GoogleSpreadsheet

spreadsheet_id = "45667"  # Unique spreadsheet ID
worksheet_id = "rf6"      # Worksheet ID within that spreadsheet

spreadsheet = GoogleSpreadsheet(spreadsheet_id, worksheet_id)
for row in spreadsheet:
  # Each row is a dictionary keyed by column header
  print(row["lastname"])

Code

try:
  from xml.etree import ElementTree
except ImportError:
  from elementtree import ElementTree
import gdata.spreadsheet.service
import gdata.service
import atom.service
import gdata.spreadsheet
import atom

class GoogleSpreadsheet:
	''' An iterable google spreadsheet object.  Each row is a dictionary with an entry for each field, keyed by the header.  GData libraries from Google must be installed.'''
	
	def __init__(self, spreadsheet_id, worksheet_id, user='abc@place.org', password='2312324', source=''):
		gd_client = gdata.spreadsheet.service.SpreadsheetsService()
		gd_client.email = user
		gd_client.password = password
		gd_client.source = source
		gd_client.ProgrammaticLogin()
		
		self.count = 0
		self.rows = self.formRows(gd_client.GetListFeed(spreadsheet_id, worksheet_id))
		
	def formRows(self, ListFeed):
		rows = []
		for entry in ListFeed.entry:
			d = {}
			for key in entry.custom.keys():
				d[key] = entry.custom[key].text
			rows.append(d)
		return rows
			
	def __iter__(self):
		return self
		
	def next(self):
		if self.count >= len(self.rows):
			self.count = 0
			raise StopIteration
		else:
			self.count += 1
			return self.rows[self.count - 1]
	
	def __getitem__(self, item):
		return self.rows[item]
		
	def __len__(self):
		return len(self.rows)
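
To see what formRows produces without contacting Google, the following sketch feeds the same loop a few stand-in objects. It assumes only the attributes the class itself reads (entry, custom and text); FakeCell, FakeEntry, FakeFeed and form_rows are hypothetical names used for illustration and are not part of the GData library.

```python
class FakeCell(object):
    """Stand-in for a GData cell: exposes only .text."""
    def __init__(self, text):
        self.text = text


class FakeEntry(object):
    """Stand-in for a list-feed entry: .custom maps header -> cell."""
    def __init__(self, **fields):
        self.custom = dict((k, FakeCell(v)) for k, v in fields.items())


class FakeFeed(object):
    """Stand-in for a list feed: .entry is a list of entries."""
    def __init__(self, entries):
        self.entry = entries


def form_rows(list_feed):
    """Same logic as GoogleSpreadsheet.formRows, written as a free function."""
    rows = []
    for entry in list_feed.entry:
        d = {}
        for key in entry.custom.keys():
            d[key] = entry.custom[key].text
        rows.append(d)
    return rows


feed = FakeFeed([
    FakeEntry(lastname="Smith", category="postdoc"),
    FakeEntry(lastname="Jones", category="staff"),
])
rows = form_rows(feed)  # a list with one dictionary per spreadsheet row
```

Each element of rows is a plain dictionary, which is why the usage example can index a row with row["lastname"].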

Brief introduction to the code

You may need to know some elementary Python and HTML scripting.

  • Brief introduction to the different sections

In addition to the main function, the script contains the following functions:

divideSpreadsheet

- looks in the 'category' column and splits the spreadsheet into graduate students, postdocs, staff, research scientists and undergrads. 
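
The script itself is not reproduced on this page, so the following is only a minimal sketch of how such a split could work, assuming rows are dictionaries as returned by GoogleSpreadsheet. The function name divide_spreadsheet, the sample rows and the exact category labels are hypothetical.

```python
def divide_spreadsheet(rows, column="category"):
    """Group row dictionaries by the value found in the given column.

    Returns a dictionary mapping each normalized category label to the
    list of rows that carry it. Empty or missing cells end up under "".
    """
    groups = {}
    for row in rows:
        # Cell text can be None for empty cells, hence the `or ""`.
        label = (row.get(column) or "").strip().lower()
        groups.setdefault(label, []).append(row)
    return groups


# Hypothetical sample data standing in for spreadsheet rows.
sample_rows = [
    {"lastname": "Smith", "category": "Postdoc"},
    {"lastname": "Jones", "category": "staff"},
    {"lastname": "Lee", "category": "postdoc "},
    {"lastname": "Kim", "category": None},
]
groups = divide_spreadsheet(sample_rows)
```

Normalizing the label (trimming whitespace and lowercasing) is a design choice of this sketch; it keeps small inconsistencies in the spreadsheet from creating extra groups.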

printSection

- if you add a column to the Google spreadsheet, this is the function in the script where the new column must also be added.
- add the new column before "current position". The layout prints "current position" in a cell that spans the bottom of all the columns, so you also need to increase "colspan=" by the number of columns you are adding.
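
The colspan rule above can be illustrated with a small sketch. This is not the script's actual printSection; format_section, the COLUMNS list and the "currentposition" key are assumptions made for the example. Here colspan is computed from the column list, so adding a column widens the spanning cell automatically instead of requiring a manual edit.

```python
from xml.sax.saxutils import escape

# Hypothetical column keys; a new column would be appended to this list.
COLUMNS = ["firstname", "lastname", "email"]


def format_section(title, rows, columns=COLUMNS):
    """Return an HTML table for one group of people.

    Each person gets one row of ordinary cells, followed by a row whose
    single cell spans all columns and holds the current position.
    """
    lines = ["<h2>%s</h2>" % escape(title), "<table>"]
    for row in rows:
        cells = "".join(
            "<td>%s</td>" % escape(row.get(c) or "") for c in columns
        )
        lines.append("<tr>%s</tr>" % cells)
        # colspan must equal the number of columns printed above.
        lines.append(
            '<tr><td colspan="%d">%s</td></tr>'
            % (len(columns), escape(row.get("currentposition") or ""))
        )
    lines.append("</table>")
    return "\n".join(lines)


person = {"firstname": "Ann", "lastname": "Smith & Co",
          "email": "ann@place.org", "currentposition": "Professor"}
html_three = format_section("Postdocs", [person])
html_four = format_section("Postdocs", [person], COLUMNS + ["phone"])
```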

printHeader

- prints the header

printFooter

- prints the footer

The main function is located at the bottom of the script; it prints the HTML in sequential order (header, section, footer, ...).
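
As a rough sketch of that sequential structure, assuming the page is assembled as a string before being printed: format_header, format_footer and build_page are hypothetical helpers, and the markup they emit is a placeholder rather than the script's real header and footer.

```python
def format_header(title):
    """Placeholder page header."""
    return "<html><head><title>%s</title></head><body>" % title


def format_footer():
    """Placeholder page footer."""
    return "</body></html>"


def build_page(title, sections):
    """Assemble header, then each section in order, then footer.

    sections is a list of (heading, html_fragment) pairs.
    """
    parts = [format_header(title)]
    for heading, fragment in sections:
        parts.append("<h2>%s</h2>" % heading)
        parts.append(fragment)
    parts.append(format_footer())
    return "\n".join(parts)


page = build_page("Lab members", [
    ("Graduate students", "<p>first section</p>"),
    ("Postdocs", "<p>second section</p>"),
])
print(page)
```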
