Recently I needed to implement my own autocomplete for a project on snagmachine.com. We had a large database of products and wanted to ease data entry by hinting to the user via autocomplete where possible. In the future we can probably just use Freebase Suggest, but for now we needed our own solution.

The Pieces
Autocomplete is not too hard to understand. It consists of two pieces:

  • Client-side JavaScript
  • Backend web service

Autocomplete JavaScript Widget
The client-side JavaScript code watches user input in a given text-entry field and sends queries to the backend web service. If the backend has some suggestions for the user, the JavaScript then displays those hints and lets the user pick one.

Although not hard to understand, the display code and query-timing logic mean the client-side component is not entirely trivial to implement. For this reason, we decided to use an existing jQuery autocomplete widget instead of writing our own.

Choosing one was a little bit confusing, because there are around four or five distinct jQuery autocomplete widgets floating around. It took some investigation to find one which fit our needs. We ended up picking Ajax Autocomplete for jQuery by Tomas Kirda.

We chose this one because it seemed to be the only widget which easily lets you pass arbitrary metadata from the backend to the client JavaScript, in addition to the completion text itself. So, if you need to pass along IDs or other values with your suggestions, I recommend this widget over some of the simpler ones.

Setting up the autocomplete widget in your JavaScript is quite straightforward:
$("#sometextfield").autocomplete({
            serviceUrl:'/api/product_autocomplete',
            onSelect: function(val, data) {
              /* Handle data here */
            },
});

Backend Web Service
Writing an autocomplete web service is pretty simple. Your entry point accepts a string of text (the query) and returns a set of results to be displayed to the user by the autocomplete widget. Depending on your usage, you may also wish to include some metadata along with the results, for example the database ID of each completion.

The service is especially trivial if you are using a database which supports an analog of the SQL LIKE/ILIKE operator, which does basic wildcard text matching. I believe all databases supported by SQLAlchemy have this feature.
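
SQLAlchemy exposes this operator directly on mapped columns. A minimal sketch, assuming a hypothetical mapped Product class with a name column (the % wildcard matches any run of characters, and ILIKE ignores case):

    # Find products whose name contains "red" followed by "bike",
    # case-insensitively. Product is a hypothetical mapped class;
    # Session is the usual SQLAlchemy session.
    Session.query(Product).filter(Product.name.ilike('%red%bike%'))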

Although I happened to be using SQLAlchemy and a fairly traditional RDBMS (PostgreSQL) for snagmachine.com, something similar should be quite possible with Tokyo Tyrant and the like.

We are using Pylons and SQLAlchemy for snagmachine.com, but again, it shouldn't be much more complicated with some other web framework:

    @rest.restrict('GET')
    @jsonify
    def tag_autocomplete(self):
        if 'query' not in request.params:
            abort(400)
        fragment = request.params['query']
        # Build the pattern "%red%bik%" from "red bik": wildcards at
        # both ends, and each run of whitespace replaced by a wildcard.
        keywords = fragment.split()
        searchstring = "%".join(keywords)
        searchstring = '%%%s%%' % searchstring
        # Case-insensitive wildcard match, capped at ten suggestions.
        ac_q = Session.query(Tag)
        res = ac_q.filter(Tag.name.ilike(searchstring)).limit(10).all()
        # The widget expects "query", "suggestions" and "data" keys.
        return dict(query=fragment,
                suggestions=[r.name for r in res],
                data=["%s" % r.name for r in res])
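
Wired up to the JavaScript configuration shown earlier, the exchange looks roughly like this (the values are illustrative):

    GET /api/product_autocomplete?query=red+bik

    {"query": "red bik",
     "suggestions": ["Red Bike", "Red Bike Bell"],
     "data": ["Red Bike", "Red Bike Bell"]}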


In the above code, we are using a very simple SQLAlchemy model class, Tag, which consists essentially of a text "name" field:
import sqlalchemy as sa
from sqlalchemy import orm, types

# "meta" is the standard Pylons model.meta module.
tag_table = sa.Table("tag", meta.metadata,
    sa.Column("id", types.Integer, sa.schema.Sequence('tag_seq_id',
        optional=True), primary_key=True),
    sa.Column("name", types.Unicode(50), nullable=False, unique=True),
    sa.Column("extra", types.String),
)

class Tag(object):
    pass

orm.mapper(Tag, tag_table)

We also use the Pylons rest and jsonify decorators for convenience.


Note that in the above code, we:
  • Use the ilike operator
  • Use wildcards at the beginning and end of the string
  • Replace whitespace with wildcards
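
For example, the fragment "red bik" becomes the pattern "%red%bik%":

    >>> fragment = "red bik"
    >>> "%%%s%%" % "%".join(fragment.split())
    '%red%bik%'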

We've found this mode to give the best user experience; however, there are performance implications. PostgreSQL, at least, can only utilise text indexes for LIKE, and even then only when the wildcards appear as suffixes, i.e. the pattern is anchored at the start [This email from Tom Lane has the details].

While using the index does yield about an order of magnitude difference in query response time, we are talking about 0.1 ms vs 1.0 ms with our dataset. For our use case, this is perfectly acceptable!

So, that's pretty much all there is to it. Hope this article helps!

Niall O'Higgins is an author and software developer. He wrote the O'Reilly book MongoDB and Python. He also develops Strider Open Source Continuous Deployment and offers full-stack consulting services at FrozenRidge.co.


Pylons tip #5 - Streaming static files

November 04, 2009 at 07:15 PM | categories: Technical, Python

Pylons makes it super easy to return data to a client. You just return a string from your controller method!

class HelloController(BaseController):

    def index(self):
        return 'Hello World!'

Very nice. However, what if you want to serve up a potentially quite large file to the client? Sure, you could read the whole file into memory and return the entire buffer, but that is not very efficient: with a multi-megabyte file, you waste a lot of memory. What you actually want is to read a chunk of the file at a time and send that, so instead of reading the entire file into memory and returning it in one go, you send lots of little chunks. Simple conceptually.

How do you do this in Pylons? Thankfully, with a Python generator. Instead of returning a buffer, you return a generator:
class ImageController(BaseController):

    def index(self, id):
        ''' Stream local image contents to client '''
        try:
            # Open in binary mode -- image data is not text.
            imgf = open("%s/%s" % (config['image_data_dir'],
                os.path.basename(id)), 'rb')
        except IOError:
            abort(404)
        def stream_img():
            # Yield the file to the client a kilobyte at a time,
            # instead of buffering the whole thing in memory.
            chunk = imgf.read(1024)
            while chunk:
                yield chunk
                chunk = imgf.read(1024)
            imgf.close()
        return stream_img()

This works quite nicely. Hope that helps!


Simple Python Twitter Search API Crawler Class

September 27, 2009 at 03:55 PM | categories: Python

I've been getting into Twitter (I'm @niallohiggins btw) a bit recently. One of the things I wanted to do was write a little program to periodically search for a specific tag and then process the results. The Twitter Search API is very easy to use, even if there are some annoying issues. Here is a very simple class I wrote to issue searches and return the results. It also keeps track of the high-water mark (max_id) of the previous search, so you hopefully won't get the same results twice - although you should still code defensively in case there is a bug in Twitter. Feel free to use this code yourself. Note that you'll have to implement your own 'submit' method; a sketch follows the class.

import httplib
import json
import logging
import socket
import time
import urllib

SEARCH_HOST="search.twitter.com"
SEARCH_PATH="/search.json"


class TagCrawler(object):
    ''' Crawl twitter search API for matches to specified tag.  Use since_id to
    hopefully not submit the same message twice.  However, bug reports indicate
    since_id is not always reliable, and so we probably want to de-dup ourselves
    at some level '''

    def __init__(self, max_id, tag, interval):
        self.max_id = max_id
        self.tag = tag
        self.interval = interval
        
    def search(self):
        c = httplib.HTTPConnection(SEARCH_HOST)
        params = {'q' : self.tag}
        if self.max_id is not None:
            params['since_id'] = self.max_id
        path = "%s?%s" %(SEARCH_PATH, urllib.urlencode(params))
        try:
            c.request('GET', path)
            r = c.getresponse()
            data = r.read()
            c.close()
            try:
                result = json.loads(data)
            except ValueError:
                return None
            if 'results' not in result:
                return None
            self.max_id = result['max_id']
            return result['results']
        except (httplib.HTTPException, socket.error, socket.timeout), e:
            logging.error("search() error: %s" %(e))
            return None

    def loop(self):
        while True:
            logging.info("Starting search")
            data = self.search()
            if data:
                logging.info("%d new result(s)" %(len(data)))
                self.submit(data)
            else:
                logging.info("No new results")
            logging.info("Search complete sleeping for %d seconds"
                    %(self.interval))
            time.sleep(float(self.interval))

    def submit(self, data):
        pass
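
The simplest way to use the class is to subclass it and fill in submit(). A minimal sketch, where the handler is purely hypothetical and just prints the text of each new result:

    class PrintingTagCrawler(TagCrawler):
        def submit(self, data):
            # Hypothetical submit(): print each new result's text
            for tweet in data:
                print tweet.get('text', '')

    if __name__ == '__main__':
        logging.basicConfig(level=logging.INFO)
        # Poll for "#python" every 60 seconds, with no initial max_id
        PrintingTagCrawler(None, '#python', 60).loop()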


Last month I started Py Web SF, the San Francisco Python & Web Technology meet-up. The idea is 1-2 conversation-style presentations of about 30 minutes with a group of 10-20 people. My hope is to have a more intimate group than the very good Bay Piggies (which I highly recommend). With a small group, it is possible to have more interaction, discussion and collaboration. In a typical lecture/audience format, people unfortunately tend to switch into "passive listener" mode.


June meet-up
Anyway, the first meet-up went extremely well - we had 15 people show up, which was a perfect number for the space. Shannon -jj Behrens gave an excellent talk on building RESTful Web services and Marius Eriksen - in fact a colleague from the OpenBSD project - gave an awesome talk on GeoDjango. Slides for both talks are online, of course.

July meet-up
In July, we have Metaweb Technologies presenting a comparison of Django and Pylons. Then Alec Flett, another Metaweb'er, will speak about the issues involved in scaling Python web applications.

Check it out
If you are interested in checking out the event, it's July 28th, 6pm @ SF Main Public Library's Stong Room. Full details can be found at pywebsf.org. Or if you are interested in giving a talk, just let me know.


Turbo Gears 2.0 Released

May 27, 2009 at 06:30 PM | categories: Technical, Python


I read today that Turbo Gears 2.0 has been released - at long last! I used Turbo Gears 1 briefly in 2007 for a small project, then switched to Pylons.

Pylons is pretty neat because it's really a framework for building a framework. You can pick and choose WSGI middleware and slot it all together with whatever templating engine or database abstraction layer you like. Pylons just gives you most of the glue you'd need: unit and functional test harnesses, request routing, caching, and various handy decorators. It also has excellent documentation. The Pylons approach has some disadvantages, however, since it's not quite as well integrated as, say, Django, which has more of a one-size-fits-all, monolithic approach.

It's interesting that Turbo Gears 2.0 is built on top of Pylons. It aims to provide a more consistent out-of-the-box solution: they standardise on one templating language (Genshi), one database abstraction layer (SQLAlchemy), and so on.

It will be interesting to see where Turbo Gears goes from here. It could be quite compelling if the community catches on, but it seems to me that it's a little late to the game. I'm not certain what jumping to Turbo Gears 2.0 from Pylons would buy me at this point, and I'd imagine many people feel the same way.

