I've been playing with the recently-released HTTP API for accessing the Best Buy product catalog. While it's a little strange to use at first, it's actually pretty useful. One of the things I am interested in is online retail, specifically how to make Internet shopping easier. Let's imagine I am looking for information on a particular digital camera - the Nikon Coolpix S210.

A Python skeleton
First, let's get our little Python test harness together. You are also going to need your own Best Buy Remix API key. Here is a skeletal Python HTTP client:

import httplib

API_KEY=''

QUERY="/v1/products(name=Coolpix*&modelNumber=S210)"
OPTS="?sort=name.desc&show=all&format=json"
c = httplib.HTTPConnection('api.remix.bestbuy.com')
c.request('GET', "%s%s&apiKey=%s" %(QUERY, OPTS, API_KEY))
r = c.getresponse()
data = r.read()
print data
Save the above to a file like bb.py.

Our first Best Buy query
Now let's try to write a sample query for the Nikon Coolpix S210. Although the Best Buy Remix API docs are a bit sparse, we can guess that items must have an attribute called 'name'. In fact they do! So let's try searching for the camera by name.

# Same code as above, but we change the value of QUERY:
QUERY="/v1/products(name=Nikon Coolpix S210)"
Looks pretty reasonable. Unfortunately, Best Buy is going to give us back a 400 error:

$ python bb.py
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>400 - Bad Request</title>
 </head>
 <body>
  <h1>400 - Bad Request</h1>
 </body>
</html>

Gimme everything!
It turns out that Best Buy don't name their products in the most intuitive way. Let's try a wildcard on just 'Coolpix' instead:

# Same code as above, but we change the value of QUERY:
QUERY="/v1/products(name=Coolpix*)"

This time, we get tons of data back, in JSON format. Best Buy Remix defaults to XML, but I prefer JSON, so I added the format=json parameter to the query. OK, so now we have an overwhelming amount of data on Coolpix cameras - but we really just want information for the S210.
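Since we asked for JSON, the response parses directly with the standard json module. Here is a minimal sketch (shown in modern Python 3); the response snippet below is fabricated for illustration, though the 'products' list and 'regularPrice' field match what the API actually returns:

```python
import json

# Fabricated, heavily trimmed stand-in for a real Remix response --
# the actual payload carries many more fields per product.
raw = '''{"total": 2, "products": [
  {"name": "Nikon - Coolpix S210 Digital Camera - Silver", "regularPrice": 89.99},
  {"name": "Nikon - Coolpix S210 Digital Camera - Pink", "regularPrice": 89.99}
]}'''

result = json.loads(raw)
# Pull out just the product names from the (potentially huge) result set
names = [p["name"] for p in result["products"]]
print(names)
```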

Best Buy's quirky product schema
Well, there is a solution. Best Buy don't store the model number in the 'name' attribute - instead they store it in a separate 'modelNumber' attribute. If we query for name=Coolpix* AND modelNumber=S210, we should finally get the expected result:

# Same code as above, but we change the value of QUERY:
QUERY="/v1/products(name=Coolpix*&modelNumber=S210)"

Et voilà! Now Best Buy gives us back all the information it has about the Nikon Coolpix S210. This is pretty detailed stuff, including compatible memory formats and digital zoom, along with price and availability. Very cool! Just for kicks, let's show the whole script to send a query to Best Buy, parse the JSON response, and finally print the price:

import httplib
import json

API_KEY=''
QUERY="/v1/products(name=Coolpix*&modelNumber=S210)"
OPTS="?sort=name.desc&show=all&format=json"
c = httplib.HTTPConnection('api.remix.bestbuy.com')
c.request('GET', "%s%s&apiKey=%s" %(QUERY, OPTS, API_KEY))
r = c.getresponse()
data = r.read()

camera_info = json.loads(data)

print "price: %s"%(camera_info['products'][0]['regularPrice'])
And we run it:
$ python bb.py
price: 89.99
Whee!

Niall O'Higgins is an author and software developer. He wrote the O'Reilly book MongoDB and Python. He also develops Strider Open Source Continuous Deployment and offers full-stack consulting services at FrozenRidge.co.


On ORMs
It so happens that I end up dealing with the Python ORM SQLObject pretty often. I don't really like ORMs very much: in my experience, they make the 80% of database tasks that are already easy in plain SQL a little easier, while making the other 20% that are already hard practically impossible. They do save some boilerplate, and let you express your schema and queries in Python (or whatever programming language you are using) instead of SQL - but this tends to break down at a certain level of complexity and just gets in the way. You end up doing a huge amount of wrangling with the ORM to do something which is very simple in plain old SQL. Fundamentally, SQL was designed as a declarative query language for the relational model, not to represent object hierarchies the way programming languages do - hence ORMs are always going to be a nasty hack, in my opinion.

SQLObject vs. SQLAlchemy
I actually much prefer SQLAlchemy to SQLObject. SQLAlchemy has a more explicit divide between its various components - you aren't forced to use the ORM layer if you don't want to. It can be used simply as a handy database abstraction layer, with programmatic SQL and connection pooling and so on. And if you truly want to go the full ORM mapping route, it provides that too. For truly tricky things, SQLAlchemy will be happy to hand you a DB-API 2.0 cursor so that you can execute whatever custom SQL you wish.
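That component split is easy to see in a short sketch (modern Python 3 and a current SQLAlchemy; the in-memory SQLite URL is just to keep the demo self-contained): programmatic SQL with no ORM mapping at all, plus the raw DB-API escape hatch.

```python
from sqlalchemy import create_engine, text

# Plain engine, no ORM declarations anywhere
engine = create_engine("sqlite:///:memory:")

# Programmatic SQL through the pooled connection layer
with engine.connect() as conn:
    answer = conn.execute(text("SELECT 40 + 2")).scalar()

# And the escape hatch: a raw DB-API 2.0 connection and cursor
raw = engine.raw_connection()
cursor = raw.cursor()
cursor.execute("SELECT 'custom sql'")
row = cursor.fetchone()
cursor.close()
raw.close()
```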

Monolithic vs. modular
This is my main problem with SQLObject - it's very difficult to figure out how to get at the underlying database connection. I don't know how it's possible to use the connection pooling and programmatic SQL builder without using the ORM, though perhaps it is doable. The documentation for SQLObject is far inferior to SQLAlchemy's, I'm sorry to say. Just try to figure out how to use transactions reliably with SQLObject! Even when I managed to put together some code which, according to the documentation, should work, SQLObject decided to interleave the actions in separate transactions. With SQLAlchemy I never had this problem.

Getting at the cursor
While seemingly undocumented, it is in fact possible to get the underlying driver's connection object, and from there grab a DB-API cursor. The pattern is:

# Set up the SQLObject connection as usual
from sqlobject import connectionForURI
connection = connectionForURI('sqlite:/:memory:')
# Grab the underlying database connection
dbobj = connection.getConnection()
# Get a cursor from the low-level driver
cursor = dbobj.cursor()
# Run whatever custom SQL you need, then close the cursor
cursor.close()
Et voilà.


I wrote in a previous article about how SQLite is great for small-to-medium web projects and also prototyping. It's not very hard to port a SQLite implementation to a more robust and scalable RDBMS such as PostgreSQL. Anyway, if you have used SQLite in any capacity, you have no doubt noticed that it does not have very strict type enforcement. You can put pretty much whatever you want into a column. This isn't a big deal when you are working with a dynamically-typed language such as Python - it's pretty trivial to convert an integer to a string or vice-versa.

One exception to this is with datetime.date and datetime.datetime objects. Date objects map nicely to the SQLite 'DATE' type, and datetime objects map nicely to the SQLite 'TIMESTAMP' type. It's very common that you will want, in your Python code, to deal with real datetime or date objects, for the purposes of arithmetic or formatting. It can be a real pain in the ass to manually convert your datetime/date objects to and from SQLite-compatible string representations - both for results coming out of the database and for values going into the database. Luckily, you don't have to! Python's sqlite3 module has native converters for both these types. You simply need to ensure that your SQLite schema has the correct types specified for the columns, and sqlite3 can do the conversion for you. For example, here is a basic table definition:

CREATE TABLE rateable_scale(
    rateable_scale_id INTEGER PRIMARY KEY,
    rateable_scale_creator TEXT,
    rateable_scale_created_date TIMESTAMP,
    rateable_id INTEGER,
    scale_id INTEGER
);
In this example, I expect the column 'rateable_scale_created_date' to map to a datetime.datetime object in Python - which maps to a 'TIMESTAMP' type for SQLite. Once you have your column types specified correctly, you simply set up your SQLite connection in Python with a couple of extra options:
def connect_file(self, filename):
    ''' Connect to the provided DB file '''
    self.conn = sqlite3.connect(filename, detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
    # this row factory makes the results objects accessible both by index
    # and by column name
    self.conn.row_factory = sqlite3.Row
    self.connected = True
The important part of the above snippet is the connect line:
self.conn = sqlite3.connect(filename, detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
Notice the detect_types parameter. This is what instructs the sqlite3 module to magically convert date and datetime objects for you!
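The round trip is easy to see in a small standalone sketch (written in modern Python 3; the table and column names here are invented for the demo): declare the column as TIMESTAMP, pass detect_types, and datetime objects go in and come back out as datetime objects.

```python
import datetime
import sqlite3

# detect_types tells sqlite3 to apply converters based on declared column types
conn = sqlite3.connect(":memory:",
                       detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES)
cur = conn.cursor()
cur.execute("CREATE TABLE event (created TIMESTAMP)")

now = datetime.datetime(2008, 11, 19, 4, 20, 25)
cur.execute("INSERT INTO event VALUES (?)", (now,))

# No manual string parsing: the TIMESTAMP column comes back as a datetime
row = cur.execute("SELECT created FROM event").fetchone()
conn.close()
```

(Recent Python versions emit a DeprecationWarning for the default datetime adapters, but the mechanism still works as shown.)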


I run OpenBSD on all my machines. I think it's a great operating system with an excellent range of features, and all the components fit together nicely. One of my favourite things about OpenBSD is the highly aggressive release schedule. While a stable release is cut every 6 months, Theo is producing complete, full builds of the system for most architectures from CVS HEAD on a nearly daily basis. The entire ports tree is baked into binary packages very frequently too - although since this is much more time-consuming, it is more like a full package build appears on mirrors every week or two. Such releases are called 'snapshots'.

In any case, I don't run OpenBSD 'release' or 'stable' builds on any of my machines - I run snapshots everywhere. So I am frequently downloading new snapshots. While it's not exactly difficult to mirror a directory via FTP by hand, I wrote a small Python program to do it for me. The program has a few nice options. It defaults to using ftp.openbsd.org as the mirror, but this can be trivially overridden with the -m flag. I typically use -m rt.fm, which I have found to be an excellent mirror for the USA. The program also automatically detects the architecture of the machine you are running on - but you can override this via the -a flag. It also doesn't download the very large ISO images which are built along with the snapshots. Finally, once it has completed downloading everything, it will (if there is an MD5 file present) verify the MD5 checksum of each downloaded file.

This isn't a complicated program, but I find it useful, and I thought I'd share. Here it is in its 80-odd line entirety.

#!/usr/bin/env python
# $Id: autosnap.py,v 1.5 2008/11/19 04:20:25 niallo Exp $

import fnmatch
import ftplib
import getopt
import hashlib
import os
import sys

MIRROR="ftp.openbsd.org"
PATH="/pub/OpenBSD/snapshots/"
DROP_DIR="."

ARCH=os.uname()[4]
# list of files not to download - globs supported
FILE_EXCEPT = ['*.iso*']

def usage():
    print >> sys.stderr, "autosnap.py [-a arch] [-d drop dir] [-m mirror] [-p path]"
    sys.exit(2)

def main():
    ftp = ftplib.FTP(MIRROR)
    ftp.login()
    ftp.cwd("%s/%s" %(PATH, ARCH))
    files = ftp.nlst()
    remove = []
    for p in FILE_EXCEPT:
        remove.extend(fnmatch.filter(files, p))
    for r in remove:
        files.remove(r)
    for f in files:
        print "fetching file %s" %(f)
        ftp.retrbinary("RETR %s" %(f), open("%s/%s" %(DROP_DIR, f), 'wb').write, 4096)
    ftp.quit()
    if 'MD5' in files:
        print "Verifying MD5sums"
        f = open("%s/MD5" %(DROP_DIR), "r")
        md5sums = {}
        for line in f:
            filename = line[line.index('(')+1:line.index(')')]
            if filename in files:
                hash = line.split('=')[1].strip()
                md5sums[filename] = str(hash)
        f.close()
        files.remove('MD5')
        good = 0
        for filename in md5sums.keys():
            f = open("%s/%s" %(DROP_DIR, filename), "rb")
            d = f.read()
            f.close()
            m = hashlib.md5()
            m.update(d)
            digest = m.hexdigest()
            if digest == md5sums[filename]:
                print "%s OK" %(filename)
                good += 1
            else:
                print "%s FAIL" %(filename)
        print "%d/%d files verified OK" %(good, len(md5sums))
        if good == len(md5sums):
            sys.exit(0)
        else:
            sys.exit(1)
if __name__ == "__main__":
    try:
        opts, args = getopt.getopt(sys.argv[1:], "a:d:m:p:")
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    for o, a in opts:
        if o == "-a":
            ARCH = a
        if o == "-d":
            DROP_DIR = a
        if o == "-m":
            MIRROR = a
        if o == "-p":
            PATH = a
    main()
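The MD5 verification above relies on the BSD-style digest line format, `MD5 (filename) = hexdigest`. Here is the same slicing logic pulled out into a quick standalone check (modern Python 3; the file name and contents are invented for the demo):

```python
import hashlib

# Fabricated MD5 file entry in the format the script parses
data = b"hello"
line = "MD5 (base44.tgz) = %s" % hashlib.md5(data).hexdigest()

# Same slicing the script uses: filename between parens, hash after '='
filename = line[line.index('(') + 1:line.index(')')]
expected = line.split('=')[1].strip()

# Recompute the digest of the "downloaded" bytes and compare
actual = hashlib.md5(data).hexdigest()
print(filename, actual == expected)
```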


Facebook apps in Python and Pylons part 2

November 22, 2008 at 08:30 PM | categories: Technical, Python

This article is a followup to my previous post, Facebook apps in Python and Pylons part 1. I'm going to talk a little more about what is interesting about Facebook apps and how they work in practice. At the end, I provide a little code sample and a convenience decorator to save you some hassle.

Why write a Facebook app?
Even if you are pretty familiar with using Facebook, you would be easily forgiven if you didn't fully understand what the capabilities of a Facebook application are, and how the flow works. Facebook applications essentially offer you:

  • The ability to put your own content on a user's profile.
  • The ability to update a user's news feed.
While there are a bunch more things you can do with your Facebook application - as described on the official "Anatomy of a Facebook application" page on developers.facebook.com - those two things are likely the most interesting to you.

How do I add content to the user's profile and news feed?
This is the next question! Obviously, the basic answer is "by writing a Facebook app, stupid!". Of course, you're looking for a little bit more than just that. The first step is for the user to add your application. I'm about to drop a whole load of Facebook API jargon here. The user adds your application by visiting your canvas URL. You generate the canvas page from your callback URL, and include some FBML to give it an add to profile button. Once the person adds your application, Facebook will redirect them to your post add URL. From the post add hook, you can use the Facebook API to call setFBML to add content to their profile page, and publishUserAction to add stuff to their feed. It's pretty trivial - but there is an additional caveat. Before you can do anything useful in a Facebook app, you must have a valid Facebook session. Basically, you want most of your entry points to only be loaded if a user has a logged-in session, and if they don't, you want them to be redirected to the login page. This ends up being a fair bit of boiler-plate code. I have written a method decorator to normalise the boiler-plate into a single place, so that your Pylons controller methods will be handed a valid PyFacebook API object (from the PyFacebook library - see part 1). Here is an extremely basic code skeleton for a Facebook app in Pylons using my decorator and PyFacebook:
def require_login(f):
    ''' This decorator first checks to see
        if the user is authenticated.
        If not, it redirects them
        in the appropriate fashion to the
        log in page.  If they are authenticated,
        it sets up the PyFacebook Facebook
        object and passes it down to our wrapped method. '''
    def redirect(fb, url):
        # Inside the canvas, redirects must use FBML rather than HTTP
        if fb.in_canvas:
            log.info("doing fbml redirect")
            return '<fb:redirect url="%s" />' %(url, )
        else:
            log.info("sending a 302")
            response.status_int = 302
            response.headers['location'] = url
            return 'Moved temporarily'
    def wrapper(self):
        # All per-request work happens here, not at decoration time
        api_key = config['pyfacebook.apikey']
        secret_key = config['pyfacebook.secret']
        auth_token = request.params.get('auth_token', None)
        fb = Facebook(api_key, secret_key, app_name='myapp',
                 callback_path='/myapp/callback',
                 auth_token=auth_token)
        if not fb.check_session(request) or not auth_token:
            log.info("got an unauthenticated session request")
            return redirect(fb, fb.get_login_url())
        return f(self, fb=fb)
    return wrapper

class FacebookController(BaseController):

    def index(self):
        return 'Hello World'

    @require_login
    def post_add(self, fb=None):
        fb.auth.getSession()
        log.info("got a valid session from user %s", fb.uid)
        fb.profile.setFBML('')

    @require_login
    def callback(self, fb=None):
        c.uid = fb.uid
        return render('/canvas.fbml')
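The control flow is easier to see with the Pylons and PyFacebook specifics stripped away. In this sketch every name is invented for illustration - requests are plain dicts, and the "API object" is a stand-in - but the shape mirrors the decorator above: authenticate first, redirect on failure, otherwise hand the wrapped method a ready-to-use API object.

```python
def require_login(f):
    def wrapper(request):
        # Unauthenticated callers get bounced to the login page
        if not request.get("session_ok"):
            return ("redirect", "/login")
        # Authenticated callers get a ready-made API object
        api = {"uid": request["uid"]}  # stand-in for the Facebook object
        return f(request, api=api)
    return wrapper

@require_login
def callback(request, api=None):
    return ("render", api["uid"])

anon = callback({"session_ok": False})
user = callback({"session_ok": True, "uid": 42})
print(anon, user)
```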

Hopefully that is enough to get you started. I'll be writing more about this subject so stay tuned. If you have any specific questions, feel free to post a comment!

