Author Archive

cron considered harmful

June 22nd, 2010 08:19
On numerous occasions I have lamented both the design and the typical usage of cron to friends and other geeks. My main gripes:
  • jobs are not protected against running concurrently (design issue)
  • intervals are fixed (design issue)
  • people tend to schedule the same job at the same time on whole clusters of machines (typical usage issue)
The first issue can be fixed with tools like lockfile and setlock — a lot of red tape for something that should be a default feature.
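For illustration, the kind of guard that lockfile or setlock provides can be sketched in a few lines of Python 3. This is a minimal sketch, not how either tool is implemented; the function name `run_exclusively` and the lock path are made up for the example, and it assumes a Unix system where `fcntl.flock` is available.

```python
import fcntl
import os


def run_exclusively(lock_path, job):
    """Run job() only if no other run holds the lock. Returns True if it ran.

    lock_path is a hypothetical path chosen by the caller.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking: fail immediately instead of queueing up behind
            # a previous run that is still busy.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # previous run still active, skip this one
        job()
        return True
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

A cron job wrapped this way simply skips a run when the previous one has not finished, instead of piling up.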

The second and third issues are closely related: both cause undesirable load spikes because many things happen at once, either because intervals line up or because similar jobs run on a bunch of machines at the same time, perhaps hammering the network.
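One common way to take the edge off such spikes is to give every machine its own stable offset within a time window, so the same job does not start everywhere at the same second. The sketch below (Python 3) is only an illustration of that idea and not necessarily the solution from the posts mentioned further down; `splay_seconds` and the host names are invented for the example.

```python
import hashlib


def splay_seconds(host, window):
    """Map a host name to a stable offset in [0, window) seconds."""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and fold it
    # into the window; the same host always gets the same offset.
    return int.from_bytes(digest[:8], "big") % window
```

A wrapper script could sleep for `splay_seconds(hostname, 3600)` seconds before starting the real job, spreading a cluster's runs over an hour.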

A specific pet peeve is Mailman Reminder Day. Firstly, I just don’t see the point; if my address is on a mailing list and that list is practically dead, I just don’t care. Secondly, it means every first of the month I have tens of reminders that I just delete. Some of these lists are busy — for those, the reminder is a minor nuisance. But many lists are extremely quiet (think software release announcements) and for some of these lists, the reminders are over 50% of the total mail volume. It’s so wasteful. Also, I can’t help but think that all those reminders being sent out at the same time (well, divided over 24 hours because of time zone differences) cannot be good for the mail ecosystem as a whole.

For issues two and three, Colm MacCárthaigh wrote a few great posts detailing why cron is bad, and showcasing one potential solution to the issues at hand. I suggest reading these posts in full; they are very insightful.

This post was inspired by my pet peeve about Mailman and about jobs unintentionally running in parallel. It was triggered by Job Snijders pointing me to another interesting post; Colm’s posts above were referred to in the comments.

silly Python unicode mistake

June 12th, 2010 08:58
For a simple blog-to-twitter posting gateway (source code) I’m relying on the excellent feedparser and twitter modules, and I am trusting them to handle unicode strings without trouble. With most well-written Python modules (and these two are no exception!) methods will return unicode strings as they see fit, and other methods will accept these unicode strings and handle all the nitty gritty encoding details for me.

A simplified version of my workflow would look like this:

import feedparser

# api (a twitter Api object), config and seen are set up elsewhere
def post(entry):
  title = entry.title
  print "posting [%s]" % title
  api.PostUpdate(title)

feed = feedparser.parse(config["feed"])
for e in reversed(feed.entries):
  if e.id not in seen:
    post(e)
This code bombed out with an exception on the first post that had a non-ASCII title. Can you spot why?

It’s the print statement. All the APIs I’m using have zero trouble with unicode, but print wants to encode for your terminal and it’ll usually assume that that is ASCII. My ‘debugging’ output actually broke the program. My workaround is to say title.encode("ascii","replace").

Brend on #python pointed out to me that the issue is not, exactly, print. The issue is interpolating title into a non-unicode string. Depending on environment, using print on the unicode object might in fact work. For those environments, saying print u"posting [%s]" % title could help. In my case however, I ran into the issue from cron with no locale set at all, so dumbing the string down to ascii is still the right thing to do.
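For readers on Python 3, where the print statement above no longer parses, the same "dumb it down to ASCII" workaround can be sketched as follows. The helper name `ascii_safe` and the sample title are made up for the example.

```python
def ascii_safe(text):
    """Return text with every non-ASCII character replaced by '?'."""
    return text.encode("ascii", "replace").decode("ascii")


title = "Caf\u00e9 culture"
print("posting [%s]" % ascii_safe(title))  # prints: posting [Caf? culture]
```

The output is lossy, which is fine for debugging output but not for the text that actually gets posted.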

Mac desktop Twitter client roundup

June 11th, 2010 14:10
I’ve been a Tweetie 1/Mac user for a long time now, and it has suited my needs well. However, there are minor annoyances. After Twitter acquired Tweetie developer Atebits, posts from Tweetie/Mac started showing up as posted from ‘Twitter for iPhone’. Also, Tweetie has semi-excellent keyboard navigation support but it’s too buggy, making me grab my mouse/touchpad in annoyance very frequently. Given the sheer number of Twitter clients available, I asked my friends on Twitter and Facebook to suggest some options. Here are my biased opinions on the suggestions. I would like to thank all who contributed!

Kiwi

Kiwi looks and feels like it really wants to be Tweetie - but it’s nowhere close. Scrolling is jerky and the free version is much more annoying than the occasional ad in Tweetie.

Echofon

Echofon was recommended to me as ‘very minimal yet complete’ and it is. For single account usage, Echofon is basically on par with Tweetie for usability, including keyboard navigation. However, multiple account support seems hackish - events in one account are very hard to notice when focused on another account. This is one thing Tweetie gets absolutely right.

If I only had one account to manage, I could seriously consider switching to Echofon.

Nambu

Nambu is a bit fuller-featured in the UI department than the clients above, including Tweetie. Keyboard navigation is better than Tweetie’s (although it lacks some shortcuts) and about as buggy. Like Echofon, single account usage is great, and Nambu has some features I really dig, like thumbnails on twitpic-posts. Multiple account support in Nambu is excellent, with a choice between combining all timelines and keeping them separate in a sidebar. Enabling the sidebar pops out an interface that resembles a cross between Tweetie and Mail.app, very clean.

If Echofon or Tweetie seem limited or minimal, Nambu is a great choice.

Note that adding an account to Nambu means you suddenly follow two extra accounts (@nambu and the developer). The checkbox to disable this behaviour doesn’t seem to work. Spammy if you ask me.

TweetDeck

On first startup, TweetDeck is very daunting. It more than fills my 13″ MacBook screen, which is too much. I know that TweetDeck is -very- configurable but I did not feel up to the task. Additionally, TweetDeck notifications are not handled via Growl.

Yoono

Yoono, like TweetDeck, immediately took over my screen with absolutely nothing, asking me to configure my ‘columns’ from scratch. Too much hassle.

Conclusion

After trying all these clients I went back to Tweetie. It may not be perfect but I’m too used to it. I’ll probably switch to Nambu sometime soon. Both Echofon and Nambu I will recommend to people when asked for Twitter client advice.

Update July 12th: I’ve switched to Nambu and I’m not switching back. Keyboard navigation is excellent (although a bit buggy at times). I do miss Tweetie’s Command-U to look at a specific user, but otherwise Nambu seems superior in many ways. I’ve updated the Nambu review above to account for bugfixes they made since.

a snappy definition

May 31st, 2010 15:06
snappy comeback, n.:
a witty retort designed to make the audience go ‘oh, SNAP’
Also – sometimes you don’t think of a snappy comeback until hours later. Is there a word for that?

Unix trivia: echo cat | sed statement

May 24th, 2010 09:48
I occasionally run into weird, funny or surprising aspects of Unix and Unix-like Operating Systems. I'll try to post about those regularly because they're fun.

This one popped up on IRC a few weeks ago; a Google search suggests it's at least 6 months old. I have been unable to track the original source down.

The question is: if you input echo cat | sed statement into your Unix shell, what comes out? Can you predict it? If not, can you try it and then explain it?

Ubuntu 9.10 to 10.04: worst upgrade experience ever.

May 18th, 2010 23:06
I'm used to smooth upgrades of both single packages and full distributions, with Debian and Ubuntu. Not so for this upgrade from 9.10 to 10.04. I'll sum up the failures I encountered and my suggested fixes:
  • after upgrading+rebooting, your native IPv6 is suddenly replaced by a freenet tunnel. Cause: you installed gw6c on 9.10 which came with no configuration; 10.04 updated the configuration to automatically use freenet. Fix: remove gw6c package.
  • you're using the excellent dovecot-postfix package, and after the upgrade, no mail comes in. Cause: cmusieve is gone. Fix: change dovecot config to use sieve.
  • lighttpd will not start, complaining about port 80. Cause: lighttpd ipv6 is basically broken in 10.04. Known issue, not fixed. Fix: add server.use-ipv6="enable" to lighttpd.conf
  • dovecot is not listening on v6 anymore. Cause: the upgrade messed up your listen directives. Fix: add listen = *, [::] to dovecot.conf
Extra free tip: get rid of some old kernels, they tend to eat a lot of disk space with their modules.

Choosing a database for your project

April 29th, 2008 16:36

When you are designing a project with complex storage requirements and some demands on reliability and performance, a few options come to mind.

Even though at OpenPanel we have strong feelings about NIH, we didn’t think writing our own database store was the way to go. Many smart people have already written many different database backends, which means that in both quantity and quality the database area is well-covered. So, writing our own was right out.

The second option that comes to mind is to use the most basic kind of database available: a key/value store like Berkeley DB or GDBM. Combining a few key/value stores together yields a lot of flexibility, but there’s a lot of glue you need to write, then. In programming language terms, the key/value API is not powerful.

The only way out then seems to be SQL. The common choice seems to be MySQL, and with good reason - it’s robust, fast, flexible (supporting most of the SQL standard), comes pre-packaged for any distribution, and just about any sysadmin or PHP developer you run into knows their way around it more or less. A close second would be Postgres, less popular with the common PHP developer crowd, more popular with seasoned developers in other languages, and understandably so.

However, we did not go for the path well-traveled. After evaluating our requirements, we realized we barely needed the power of a client/server model database with high concurrency support. Also, we figured choosing a slightly less popular implementation would deter people from messing with the database by hand.

We chose SQLite. It’s extremely lightweight, reliable, robust and surprisingly fast. Choosing SQLite means our database is just a file (directory, to be honest) in /var/opencore, where users and admins can’t run into it by accident while mucking about in phpMyAdmin - but when they really need to mess with the database, they can, with their familiar SQL idioms.

SQLite has most of the features a developer would expect from a database: transactions, subqueries, decent indexing support, triggers, and room for extension with user defined functions. The only thing sorely missing is foreign key support, but that’s easily implemented as a bunch of triggers.
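To make the "bunch of triggers" idea concrete, here is a minimal sketch in Python 3 with the standard sqlite3 module. It is not OpenPanel's schema: the `parent` and `child` tables and the trigger names are invented for the example, and a complete emulation would also need UPDATE triggers. (SQLite releases from 3.6.19 onwards can enforce foreign keys natively, so this only illustrates the trigger approach described above.)

```python
import sqlite3

# Hypothetical two-table schema; all names are illustrative only.
SCHEMA = """
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL);

-- Reject child rows that point at a parent that does not exist.
CREATE TRIGGER child_parent_insert
BEFORE INSERT ON child
WHEN NOT EXISTS (SELECT 1 FROM parent WHERE id = NEW.parent_id)
BEGIN
    SELECT RAISE(ABORT, 'child.parent_id has no matching parent');
END;

-- Reject deleting a parent that still has children.
CREATE TRIGGER parent_delete
BEFORE DELETE ON parent
WHEN EXISTS (SELECT 1 FROM child WHERE parent_id = OLD.id)
BEGIN
    SELECT RAISE(ABORT, 'parent is still referenced by child');
END;
"""


def open_db():
    """Create an in-memory database with the trigger-guarded schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db
```

With these triggers in place, inserting a child for a missing parent, or deleting a parent that still has children, fails with a database error instead of silently leaving an orphan.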

Even though we now had this powerful database engine, we put in a lot of effort mapping our idea of an object revision model onto it (more about that in a later post) - but SQLite made it a lot less painful.

SQLite’s only real limitation seems to be its lack of concurrency, which is made worse by a locking model that seems to invite polling for access instead of forming some kind of queue. For this reason, OpenPanel keeps just one handle to the database file and manages exclusive and shared access to it via the normal pthread mechanisms.
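The single-handle idea can be sketched in Python 3 as follows. This is not OpenPanel's code, and it is deliberately simplified: one plain mutex serializes every access, where the text above describes a distinction between shared and exclusive access. The class name `SharedDatabase` is made up for the example.

```python
import sqlite3
import threading


class SharedDatabase:
    """One SQLite connection shared by all threads, guarded by a mutex."""

    def __init__(self, path):
        # check_same_thread=False allows use from other threads; the lock
        # below is what actually keeps that use serialized.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()

    def execute(self, sql, params=()):
        """Run one statement and return all rows, committing on success."""
        with self._lock:
            with self._db:  # commits, or rolls back if the statement fails
                return self._db.execute(sql, params).fetchall()
```

Threads queue up on the mutex inside the process, so SQLite itself never sees competing handles and nobody has to poll for the file lock.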

Incidentally, the current (unpublished) version of our automated web application installer tries to shoehorn its data into a key/value store and it’s hurting - even from Python! We’ll probably rewrite those bits to use SQLite as well (but, of course, separate from the OpenPanel database).

Summarizing, if you are looking for a data store for your software project, consider SQLite. It’s as lightweight as most key/value store libraries but throws in a hell of a lot more featurewise.

scrolling in large resultsets with SQL

October 8th, 2007 08:48

This article on the SQLite wiki details how to efficiently maintain a small window on a big dataset in environments that cannot maintain actual cursors, or where doing so is prohibitively expensive. Rule 2 in the article applies to other databases too, and the method as a whole applies to any database with that limitation.

The article seems obvious; but I had not actually thought of that approach myself, so far.
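As I read it, the approach boils down to remembering the last value shown and asking for the rows that sort after it, rather than holding a cursor open or skipping rows with OFFSET. The Python 3 sketch below rests on that assumption; the `tracks` table, its `title` column and the function name are hypothetical, and the column is assumed to be unique and indexed.

```python
import sqlite3


def next_page(db, after_title, page_size):
    """Return up to page_size titles that sort strictly after after_title."""
    rows = db.execute(
        "SELECT title FROM tracks WHERE title > ? ORDER BY title LIMIT ?",
        (after_title, page_size),
    )
    return [title for (title,) in rows]
```

The first page is fetched with an empty string as `after_title`; every following page passes the last title of the previous one, so each query can start from an index lookup no matter how far the window has scrolled.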

finding dangling symbolic links

August 15th, 2007 20:41

find -L . -type l

Replace . with the directory you want to investigate, of course. Works with both GNU and (Free)BSD find. Rationale: -L makes find act on the (transitive closure of the) target of a symlink - unless the target doesn’t exist, in which case find acts on the symlink itself. So, if the item to act on turns out to be a symlink, it has to be dangling.
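The same test can be written out in Python 3, which makes the reasoning explicit: a path is dangling when it is a symlink but following it leads nowhere. This is a rough equivalent rather than a replacement for the one-liner; unlike find -L it does not descend into directories reached through symlinks. The function name is made up for the example.

```python
import os


def dangling_symlinks(root):
    """Yield paths under root that are symlinks whose target does not exist."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it.
            if os.path.islink(path) and not os.path.exists(path):
                yield path
```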

HTTP on Unix sockets with Python

August 15th, 2007 15:55

Initially I had a more elaborate version based on the exact connect() code in the parent class, but this simpler version works just fine. Incidentally, the code the Xen xm command-line tool uses works identically ;)

import httplib
import socket

class UHTTPConnection(httplib.HTTPConnection):
    """Subclass of Python library HTTPConnection that
       uses a unix-domain socket.
    """
 
    def __init__(self, path):
        httplib.HTTPConnection.__init__(self, 'localhost')
        self.path = path
 
    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.path)
        self.sock = sock

This small hack plus the fine PHP serialization classes from Scott Hurring are the basis of OpenPanel.coreclient, which allows for very easy provisioning and querying of OpenPanel/opencore data. We use it mostly for unit testing.
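On Python 3, httplib has become http.client, and the same trick still works. The port below is a sketch, not the OpenPanel.coreclient code; the class name `UnixHTTPConnection` and the `socket_path` attribute are made up for the example.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix-domain socket instead of TCP."""

    def __init__(self, socket_path):
        # The host name only ends up in the Host header; no TCP
        # connection is ever made.
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock
```

Usage is the same as for a regular connection: create it with the path of the socket, call request("GET", "/") and read the result of getresponse().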