Read a file line by line in C - secure fgets idiom

October 03, 2009 at 02:57 PM | categories: Technical, C, UNIX

A pretty common thing to do in any program is read a file line by line. In interpreted or managed languages this is trivial: the standard libraries make it super easy for you. Just look at how simple it is to do this in Python or Perl or even Shell. In C it's a little more complicated, because you have to think about how much memory you need up front, and the standard library is kind of crufty. You always have to worry about overflowing buffers or dereferencing a NULL pointer.

However, there is a nice libc function available on BSD-derived platforms (including Mac OS X) - fgetln(3). This function makes it nice and easy to read arbitrary-length lines from a file in a safe way. Unfortunately, it's not available in GNU libc - that is to say, if you use this function, your program won't compile on Linux. It's not a trivial libc function to port - unlike, say, strlcpy - since it relies on private details of the FILE structure. These private details don't happen to be the same in glibc, so it doesn't work out of the box.

While GNU libc doesn't provide fgetln(3), it does provide its own similar function, getline(3). Of course, if you use this function - which is a bit uglier than fgetln(3) in my opinion - your program won't work on BSD libc systems. So basically, neither of these functions is usable if you want your program to be reasonably portable. Pretty much everything I write in C I want to work on at least Linux, the BSDs, and Mac OS X.

You could write your own line-reading function on top of ANSI C. Or you might be able to get away with using the existing ANSI function, fgets(3). You need to be careful with fgets, however. You can easily introduce bugs if you don't cover all the error cases. The other big problem with fgets is that you need to know the maximum length of the lines you are going to read in advance, otherwise you'll end up with truncation. In most applications, you can get away with a kilobyte or two on the stack for each line and be OK. In some places, it could be a deal killer though. Anyway, here is a good idiom for using fgets:

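char buf[MAXLINELEN];
while (fgets(buf, MAXLINELEN, ifp) != NULL) {
    /* Strip the trailing newline, if there is one. */
    buf[strcspn(buf, "\n")] = '\0';
    /* Skip empty lines. */
    if (buf[0] == '\0')
        continue;
    /* ... process the line in buf here ... */
}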
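The explanation of why you use strcspn() can be found in the OpenBSD manual page. In short, the more obvious buf[strlen(buf) - 1] = '\0' indexes before the start of the buffer when fgets hands back an empty string, and it chops off a real character when the last line of the file has no trailing newline.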
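The idiom on its own does not tell you when a line was too long for the buffer. Here is a minimal sketch of one way to notice that, assuming a limit of 1024 bytes for MAXLINELEN; the function name count_lines and its truncated out-parameter are made up for this illustration. It counts non-empty lines and discards the tail of any line that does not fit:

```c
#include <stdio.h>
#include <string.h>

#define MAXLINELEN 1024 /* assumed limit, pick whatever suits your input */

/*
 * Count the non-empty lines in ifp using the fgets/strcspn idiom.
 * A line too long for the buffer is counted once, its tail is
 * discarded, and *truncated is incremented.
 * Returns the line count, or -1 on a read error.
 */
long
count_lines(FILE *ifp, long *truncated)
{
    char buf[MAXLINELEN];
    long n = 0;
    int c;

    *truncated = 0;
    while (fgets(buf, sizeof(buf), ifp) != NULL) {
        size_t len = strcspn(buf, "\n");

        if (buf[len] == '\n') {
            /* Complete line: strip the newline. */
            buf[len] = '\0';
        } else if (len == sizeof(buf) - 1) {
            /*
             * Buffer is full and holds no newline. Skip to the end
             * of the line; it was truncated only if more characters
             * follow before the newline or EOF.
             */
            int extra = 0;

            while ((c = getc(ifp)) != EOF && c != '\n')
                extra = 1;
            if (extra)
                (*truncated)++;
        }
        /* Otherwise this is a final line with no trailing newline. */

        if (buf[0] == '\0')
            continue;
        n++;
    }
    return (ferror(ifp) ? -1 : n);
}
```

Calling count_lines(ifp, &truncated) on an open stream returns the number of non-empty lines. Whether to skip, reject or keep the start of an over-long line depends on the application; this sketch just counts it and moves on.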

Niall O'Higgins is an author and software developer. He wrote the O'Reilly book MongoDB and Python. He also develops Strider Open Source Continuous Deployment and offers full-stack consulting services at FrozenRidge.co.


tmux, a BSD alternative to GNU Screen

June 04, 2009 at 08:25 PM | categories: Technical, C, UNIX

I started using tmux today. It's a terminal multiplexer / task switcher for UNIX-likes, very much in the same vein as GNU Screen. However, it's a from-scratch implementation, designed to be clean, sane and easy to configure. The more liberal 3-clause BSD license is a plus also, since it means that OpenBSD has been able to integrate it into the source tree, so that it's available out of the box.

Comparison with GNU Screen
I've been a heavy screen user for many years - almost all my work is done on remote screen sessions. However, screen configuration has always been essentially black magic to me. For this reason, tmux and its nice manual page are a breath of fresh air. `tmux list-commands' is very straightforward and easy to grok. Furthermore, I like that everything in tmux is scriptable from the command line - you can run commands like `tmux resize-pane-up -t comms' to resize the pane on a session called 'comms'.

The other thing I really like about tmux is its default status bar. Some people might hate this, but I find it very useful to have a clock and a list of windows along with the process executing in them. This took quite some work to set up to my liking in GNU screen, but the default in tmux is great.

My config
One thing I don't much like is the default of C-b as the 'prefix' command. I suppose this makes some sense, since the author doesn't want to clobber GNU screen key bindings. Perhaps he will consider changing it to C-a, like in GNU screen, in the future. In any case, this isn't hard to change. Also, I am constantly using C-a C-a to switch back to the previous window - the default for this action in tmux is C-b l. Much less friendly in my opinion - of course, it's also easy to change!

So here are the contents of my $HOME/.tmux.conf:

set -g prefix C-a
bind-key C-a last-window

Getting tmux
I'm sure that packages exist for most operating systems. You can grab the source from http://tmux.sourceforge.net/. On OpenBSD, you can simply run `pkg_add -i tmux' to get the binary on your system.

UPDATE
Since OpenBSD 4.6, tmux is part of the base system. This means that if you are running OpenBSD 4.6 or later, you don't need to install any packages in order to get tmux.


My C BitTorrent implementation, Unworkable, used to be hosted on an anonymous CVS repository I had running on my server at home. This was fine, until I reinstalled the machine from scratch and didn't feel like setting up the whole anonymous CVS access again. It's a pretty painful process, unfortunately, although there is this guide to setting up anonymous CVS.

No public VCS, bad for an Open Source project
So, for a while, there was no public source control for Unworkable, which sucked. It's difficult and cumbersome for other developers to write diffs and track changes without one. While I generally like to maintain full control of the source code hosting, I've had good experiences with Google Code before. They don't show ads, and their interface is clean and to the point, unlike, say, SourceForge, which is covered in ads and various nonsense. Anyway, I had a CVS repository and I first wanted to convert it to Subversion, including all the history, then I wanted to import that into Google Code.

Converting from CVS to Subversion
The first thing to do was to convert my existing CVS repository to Subversion. There is a nice tool specifically for this, cvs2svn. It is in fact very easy to use, at least in my basic case - I only work with HEAD or, in SVN terminology, trunk. I simply ran:

cvs2svn --trunk-only --svnrepos ~/unworkable-svn ~/unworkable-cvs
Et voila, I have a shiny new Subversion repository in ~/unworkable-svn, with full history.

Importing Subversion repository to Google Code
Google Code lets you import an existing Subversion repository pretty easily, as long as you have an empty project. When your Google Code project is created, it will be set to revision 1. In Subversion-land, revision 0 is sort of magic, and so you will need to overwrite it to properly import your existing repository. Google gives you a place to do this, but it's slightly confusing because they don't put it under 'Administer'. In order to reset your repository you must:
  • Log into your Google Code project with administrator privileges.
  • Browse to the 'Source' page (either 'Checkout' or 'Browse' but not 'Changes').
  • Scroll to the bottom of the page, and click the 'reset this repository' link, which is sort of hidden.
  • Choose the option "Did you just start this project and do you want to 'svnsync' content from an existing repository into this project?"
  • Click the big "Reset Repository" button which has a big red warning label beside it.
Now you are ready to import your repository. You will use the 'svnsync' tool included in Subversion 1.4 and up to do this. There are two commands: one takes a path to both the Google Code repository and your repository, and the other takes just a path to the Google Code repository. The first one will run quite quickly; the second one can take a while as it imports each individual revision:
# This command takes both the path to your Google Code repository
# and the path to the repository you want to import
svnsync init --username YOURUSERNAME \
    https://YOURPROJECT.googlecode.com/svn file:///path/to/localrepos
# This command takes just the path to the Google Code repository.
# It will take a while to complete.
svnsync sync --username YOURUSERNAME https://YOURPROJECT.googlecode.com/svn
Once you've done that, your code is imported. Enjoy!


mkpath() - `mkdir -p' alike in C for UNIX

January 08, 2009 at 07:11 PM | categories: Technical, C, UNIX

Most people are probably familiar with the UNIX utility, mkdir(1). The mkdir utility makes directories (surprise surprise). There is a matching mkdir(2) system call available in the POSIX standard C library. The usage is pretty straightforward - however, the command-line executable, mkdir(1), supports a useful option -p to "create intermediate directories as required". It's very convenient to run `mkdir -p' on a long path before copying things or whatever, since you don't have to worry about the directory structure not existing. However, the mkdir(2) library function doesn't support an analogous mode. If you want to recursively create all the intermediate directories in a path in your program, you must implement this yourself. I've used this same function in at least three distinct projects now and so I decided to post the code:

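/* Function with behaviour like `mkdir -p'  */
int
mkpath(const char *s, mode_t mode){
        char *q, *r = NULL, *path = NULL, *up = NULL;
        int rv;

        rv = -1;
        /* "." and "/" always exist, so the recursion bottoms out here. */
        if (strcmp(s, ".") == 0 || strcmp(s, "/") == 0)
                return (0);

        if ((path = strdup(s)) == NULL)
                exit(1);

        /* dirname(3) may modify its argument, so hand it a scratch copy. */
        if ((q = strdup(s)) == NULL)
                exit(1);

        if ((r = dirname(q)) == NULL)
                goto out;

        /* dirname(3) may return static storage; copy it before recursing. */
        if ((up = strdup(r)) == NULL)
                exit(1);

        /* Create all the parent directories first. */
        if ((mkpath(up, mode) == -1) && (errno != EEXIST))
                goto out;

        /* Then create the last component; EEXIST is not treated as an error. */
        if ((mkdir(path, mode) == -1) && (errno != EEXIST))
                rv = -1;
        else
                rv = 0;

out:
        /* free(NULL) is a no-op, so no NULL checks are needed. */
        free(up);
        free(q);
        free(path);
        return (rv);
}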
UPDATE 2010-05-19 These are the includes you need:
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
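
For comparison, here is a sketch of an iterative take on the same idea. It is not the function above but a different technique: it walks the path from left to right and calls mkdir(2) on each prefix, so it needs neither recursion nor dirname(3). The name mkpath_iter is made up for this illustration. It also returns -1 on allocation failure instead of exiting, and copies the path with malloc and memcpy rather than strdup.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Iterative `mkdir -p' alike (hypothetical name: mkpath_iter).
 * Creates every prefix of s in turn, treating EEXIST as success.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
mkpath_iter(const char *s, mode_t mode)
{
    char *path, *p;
    size_t len;
    int rv = 0, saved_errno;

    if (*s == '\0') {
        errno = ENOENT;
        return (-1);
    }
    if (strcmp(s, ".") == 0 || strcmp(s, "/") == 0)
        return (0);

    /* Work on a private copy so each prefix can be cut in place. */
    len = strlen(s);
    if ((path = malloc(len + 1)) == NULL)
        return (-1);
    memcpy(path, s, len + 1);

    /* Start at path + 1 so a leading '/' is never cut off. */
    for (p = path + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';      /* temporarily end the string at this prefix */
        if (mkdir(path, mode) == -1 && errno != EEXIST)
            rv = -1;
        *p = '/';       /* restore the separator */
        if (rv == -1)
            break;
    }

    /* Finally create the full path itself. */
    if (rv == 0 && mkdir(path, mode) == -1 && errno != EEXIST)
        rv = -1;

    /* Preserve errno across free() for the caller. */
    saved_errno = errno;
    free(path);
    errno = saved_errno;
    return (rv);
}
```

A call such as mkpath_iter("/tmp/some/deep/tree", 0755) creates whichever of those directories are missing. As with the recursive version, EEXIST is accepted without checking that the existing entry really is a directory, so a caller who cares should stat(2) the result.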
