Grabbing Google Docs information with Python
From GersteinInfo
Google API
The GoogleSpreadsheet class depends on Google's GData Python libraries (the gdata and atom packages). Assuming you already have Linux and Python installed, follow these instructions to enable the Google Data API.
Example Usage
from GoogleSpreadsheet import GoogleSpreadsheet

spreadsheet_id = "45667"  # Unique Spreadsheet ID
worksheet_id = "rf6"
spreadsheet = GoogleSpreadsheet(spreadsheet_id, worksheet_id)
for row in spreadsheet:
    print row["lastname"]
GoogleSpreadsheet Class
Python code for reading a Google Spreadsheet as an iterable object. Save the text below as GoogleSpreadsheet.py, then import GoogleSpreadsheet in your Python code.
try:
    from xml.etree import ElementTree
except ImportError:
    from elementtree import ElementTree
import gdata.spreadsheet.service
import gdata.service
import atom.service
import gdata.spreadsheet
import atom


class GoogleSpreadsheet:
    '''An iterable Google spreadsheet object.

    Each row is a dictionary with an entry for each field, keyed by the
    header. GData libraries from Google must be installed.
    '''

    def __init__(self, spreadsheet_id, worksheet_id, user='abc@place.org', password='xyxyxy', source=''):
        # Log in to the Spreadsheets service with the given account.
        gd_client = gdata.spreadsheet.service.SpreadsheetsService()
        gd_client.email = user
        gd_client.password = password
        gd_client.source = source
        gd_client.ProgrammaticLogin()
        self.count = 0  # current position of the iterator
        self.rows = self.formRows(gd_client.GetListFeed(spreadsheet_id, worksheet_id))

    def formRows(self, ListFeed):
        # Convert each list-feed entry into a {header: cell text} dictionary.
        rows = []
        for entry in ListFeed.entry:
            d = {}
            for key in entry.custom.keys():
                d[key] = entry.custom[key].text
            rows.append(d)
        return rows

    def __iter__(self):
        return self

    def next(self):
        if self.count >= len(self.rows):
            self.count = 0  # reset so the spreadsheet can be iterated again
            raise StopIteration
        else:
            self.count += 1
            return self.rows[self.count - 1]

    def __getitem__(self, item):
        return self.rows[item]

    def __len__(self):
        return len(self.rows)
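The row-building step can be tried without contacting Google. The following is only a sketch: Cell, Entry and Feed are hypothetical stand-ins that model just the attributes formRows reads (.entry, .custom, .text), the names in the sample feed are made up, and form_rows is a standalone copy of the method written so that it also runs under Python 3.

```python
from collections import namedtuple

# Hypothetical stand-ins for the gdata list-feed objects; only the
# attributes that formRows reads (.entry, .custom, .text) are modeled.
Cell = namedtuple("Cell", ["text"])
Entry = namedtuple("Entry", ["custom"])
Feed = namedtuple("Feed", ["entry"])


def form_rows(list_feed):
    """Standalone copy of GoogleSpreadsheet.formRows: one dict per row."""
    rows = []
    for entry in list_feed.entry:
        d = {}
        for key in entry.custom.keys():
            d[key] = entry.custom[key].text
        rows.append(d)
    return rows


# Made-up sample data, keyed by column header as in the real feed.
feed = Feed(entry=[
    Entry(custom={"firstname": Cell("Ada"), "lastname": Cell("Lovelace")}),
    Entry(custom={"firstname": Cell("Alan"), "lastname": Cell("Turing")}),
])

rows = form_rows(feed)
for row in rows:
    print(row["lastname"])
```

Each element of rows is a plain dictionary, which is why the usage example can index a row by its column header.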
Brief introduction to the code
Some elementary knowledge of Python and HTML is helpful. A sanitized version of the code can be found here.
In addition to the main function, the script contains the following functions:
divideSpreadsheet
- looks in the 'category' column and splits the spreadsheet into graduate students, postdocs, staff, research scientists and undergrads.
printSection
- if you add a column to the Google spreadsheet, this is the part of the script where the column must be added. Add the new column before "current position": the "current position" cell is printed so that it spans the bottom of all the columns, so you also need to increase "colspan=" by the number of columns you are adding.
printHeader
- prints the header
printFooter
- prints the footer
The main function is located at the bottom of the script; it prints the HTML in sequential order (header, section, footer, ...).
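The overall flow described above might look roughly like the sketch below. It is not the actual script: the function names (divide_spreadsheet, render_section and so on), the column keys, the category labels and the sample rows are all assumptions made for illustration, and the functions return strings instead of printing so that the result is easy to inspect. It is written for Python 3.

```python
from collections import OrderedDict
from html import escape

# Assumed column keys; the real spreadsheet headers may differ.
COLUMNS = ["firstname", "lastname"]
WIDE_COLUMN = "currentposition"  # printed in a cell spanning all columns


def divide_spreadsheet(rows):
    """Group row dicts by the value of their 'category' column."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row.get("category") or "uncategorized", []).append(row)
    return groups


def render_header():
    return "<html><body>"


def render_section(title, rows):
    """Render one category as a table.

    The wide cell uses colspan=len(COLUMNS), so adding an entry to
    COLUMNS widens it automatically instead of editing "colspan=" by hand.
    """
    lines = ["<h2>%s</h2>" % escape(title), "<table>"]
    for row in rows:
        cells = "".join("<td>%s</td>" % escape(row.get(c) or "") for c in COLUMNS)
        lines.append("<tr>%s</tr>" % cells)
        wide = escape(row.get(WIDE_COLUMN) or "")
        lines.append('<tr><td colspan="%d">%s</td></tr>' % (len(COLUMNS), wide))
    lines.append("</table>")
    return "\n".join(lines)


def render_footer():
    return "</body></html>"


def main(rows):
    """Assemble the page in order: header, one section per category, footer."""
    parts = [render_header()]
    for title, members in divide_spreadsheet(rows).items():
        parts.append(render_section(title, members))
    parts.append(render_footer())
    return "\n".join(parts)


# Made-up rows in the shape produced by the GoogleSpreadsheet class.
sample_rows = [
    {"firstname": "Ada", "lastname": "Lovelace",
     "category": "postdocs", "currentposition": "Analyst"},
    {"firstname": "Alan", "lastname": "Turing",
     "category": "staff", "currentposition": None},
]

page = main(sample_rows)
print(page)
```

Deriving the colspan from the column list is a design choice of this sketch; in the script described above the value is written out and has to be increased manually when columns are added.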