Grabbing Google Docs information with a Python
From GersteinInfo
(Difference between revisions)
(→GoogleSpreadsheet Class) |
|||
(2 intermediate revisions not shown) | |||
Line 33: | Line 33: | ||
''' An iterable google spreadsheet object. Each row is a dictionary with an entry for each field, keyed by the header. GData libraries from Google must be installed.''' | ''' An iterable google spreadsheet object. Each row is a dictionary with an entry for each field, keyed by the header. GData libraries from Google must be installed.''' | ||
- | def __init__(self, spreadsheet_id, worksheet_id, user='abc@place.org', password=' | + | def __init__(self, spreadsheet_id, worksheet_id, user='abc@place.org', password='xyxyxy', source=''): |
gd_client = gdata.spreadsheet.service.SpreadsheetsService() | gd_client = gdata.spreadsheet.service.SpreadsheetsService() | ||
gd_client.email = user | gd_client.email = user | ||
Line 72: | Line 72: | ||
==Brief introduction to the code== | ==Brief introduction to the code== | ||
- | You may need to know some elementary Python and HTML scripting. | + | You may need to know some elementary Python and HTML scripting. A sanitized code can be found |
+ | [http://archive.gersteinlab.org/misc/scriptSkeleton.py here]. | ||
<br>There are a number of functions in the script in addition to the function: | <br>There are a number of functions in the script in addition to the function: |
Latest revision as of 16:08, 14 September 2011
Contents |
Google API
In order to use the GoogleSpreadsheet class, a number of dependencies must be met. Assuming you already have Linux and Python installed, follow these instructions to enable the Google Data API.
Example Usage
from GoogleSpreadsheet import GoogleSpreadsheet
spreadsheet_id = "45667" # Unique Spreadsheet ID
worksheet_id = "rf6"
spreadsheet = GoogleSpreadsheet(spreadsheet_id, worksheet_id)
for row in spreadsheet:
print row["lastname"]
GoogleSpreadsheet Class
Python code to import Google Spreadsheet objects. Save below text as GoogleSpreadsheet.py, then in your python code import GoogleSpreadsheet.
try:
from xml.etree import ElementTree
except ImportError:
from elementtree import ElementTree
import gdata.spreadsheet.service
import gdata.service
import atom.service
import gdata.spreadsheet
import atom
class GoogleSpreadsheet:
''' An iterable google spreadsheet object. Each row is a dictionary with an entry for each field, keyed by the header. GData libraries from Google must be installed.'''
def __init__(self, spreadsheet_id, worksheet_id, user='abc@place.org', password='xyxyxy', source=''):
gd_client = gdata.spreadsheet.service.SpreadsheetsService()
gd_client.email = user
gd_client.password = password
gd_client.source = source
gd_client.ProgrammaticLogin()
self.count = 0
self.rows = self.formRows(gd_client.GetListFeed(spreadsheet_id, worksheet_id))
def formRows(self, ListFeed):
rows = []
for entry in ListFeed.entry:
d = {}
for key in entry.custom.keys():
d[key] = entry.custom[key].text
rows.append(d)
return rows
def __iter__(self):
return self
def next(self):
if self.count >= len(self.rows):
self.count = 0
raise StopIteration
else:
self.count += 1
return self.rows[self.count - 1]
def __getitem__(self, item):
return self.rows[item]
def __len__(self):
return len(self.rows)
Brief introduction to the code
You may need to know some elementary Python and HTML scripting. A sanitized code can be found here.
There are a number of functions in the script in addition to the function:
divideSpreadsheet
- looks in the 'category' column and splits the spreadsheet into graduate students, postdocs, staff, research scientists and undergrads.
printSection
- if you add a column in the GoogleSpreadsheet, this is the segment in the script to add the column. - the new column should be added before "current position". This is because the format is such that printing of "current position" spans the bottom of all the columns. So you need to increase "colspan=" by the number of columns you are adding.
printHeader
- prints the header
printFooter
- prints the footer
The main function is located at the bottom of the script, and it basically prints the HTML in sequential order (header section footer...)