Recently I needed to implement my own autocomplete for a project on snagmachine.com. We had a large database of products and wanted to ease data entry by hinting to the user via autocomplete when possible. In future, we can probably just use Freebase Suggest but right now we needed our own solution.

The Pieces
Autocomplete is not too hard to understand. It is comprised of two pieces:

  • Client-side javascript
  • Backend web service

Autocomplete JavaScript Widget
The client-side JavaScript code is the thing that watches user input to a given text-entry field, and sends queries to the backend web service. If the backend has some suggestions for the user, the JavaScript then displays those hints and lets the user pick one.

Although not hard to understand, the display code and query timing bits of the client-side component make it not utterly trivial to implement. For this reason, we decided to use an existing jQuery autocomplete widget, instead of writing our own.

Choosing one was a little bit confusing, because there are around four or five distinct jQuery autocomplete widgets floating around. It took some investigation to find one which fit our needs. We ended up picking Ajax Autocomplete for jQuery by Tomas Kirda.

We chose this one because it seemed to be the only widget which easily enabled you to pass arbitrary metadata to the client JavaScript from the backend, in addition to mere text completion. So, if you need to pass along IDs or other values with your suggestions, I recommend this widget instead of some of the simpler ones.

Setting up the autocomplete widget in your JavaScript is quite straightforward:
$("#sometextfield").autocomplete({
            serviceUrl:'/api/product_autocomplete',
            onSelect: function(val, data) {
              /* Handle data here */
            },
});

Backend Web Service
Writing an autocomplete web service is pretty simple. Your entrypoint is going to accept a string of text (query) and return a set of results to be displayed to the user by the autocomplete widget. Depending on your usage, you may also wish to include some metadata along with your results - for example, the database ID of each completion, or something like that.

The service is especially trivial if you are using a database which supports an analog to the SQL LIKE/ILIKE operator which does basic wild-card text matching. I believe that all databases supported by SQLAlchemy will have this feature.

Although I happened to be using SQLAlchemy and a fairly traditional RDBMS (PostgreSQL) for snagmachine.com, something similar should be quite possible with Tokyo Tyrant and the like.

We are using Pylons and SQLAlchemy for snagmachine.com, but again, it shouldn't be much more complicated with some other web framework:

    @rest.restrict('GET')
    @jsonify
    def tag_autocomplete(self):
        if 'query' not in request.params:
            abort(400)
        fragment = request.params['query']
        keywords = fragment.split()
        searchstring = "%%".join(keywords)
        searchstring = '%%%s%%' %(searchstring)
        try:
            ac_q = Session.query(Tag)
            res = ac_q.filter(Tag.name.ilike(searchstring)).limit(10)
            return dict(query=fragment,
                    suggestions=[r.name for r in res],
                    data=["%s" %(r.name) for r in res])
        except NoResultFound:
            return dict(query=fragment, suggestions=[], data=[])


In the above code, we are using a very simple SQLAlchemy model class "Tag" which basically consists of a text `name' field:
tag_table = sa.Table("tag", meta.metadata,
    sa.Column("id", types.Integer, sa.schema.Sequence('taq_seq_id',
        optional=True), primary_key=True),
    sa.Column("name", types.Unicode(50), nullable=False, unique=True),
    sa.Column("extra", types.String),
)
class Tag(object):
    pass
orm.mapper(Tag, tag_table)

We also use the Pylons rest decorator and the Pylons jsonify decorator for convenience.


Note that in the above code, we:
  • Use the ilike operator
  • Use wildcards at the beginning and end of the string
  • Replace whitespace with wildcards

We've found this mode to give us the best user experience, however there are performance implications. PostgreSQL at least can only utilise text indexes for LIKE, and furthermore only if the wildcards are suffixes [This email from Tom Lane has the details].

While using the index does yield about an order of magnitude difference in query response time, we are talking about 0.1 ms vs 1.0 ms with our dataset. For our use case, this is perfectly acceptable!

So, thats pretty much everything there is to it. Hope this article helps!

Niall O'Higgins is an author and software developer. He wrote the O'Reilly book MongoDB and Python. He also develops Strider Open Source Continuous Deployment and offers full-stack consulting services at FrozenRidge.co.

blog comments powered by Disqus