On requiring English in a free software project

This week's issue of LWN has a quote by Linus Torvalds on translating kernel messages into languages other than English. He's against it:

Really. No translation. No design for translation. It's a nasty nasty rat-hole, and it's a pain for everybody.

There's another reason I fundamentally don't want translations for any kernel interfaces. If I get an error report, I want to be able to just "git grep" it. Translations make that basically impossible.

So the fact is, I want simple English interfaces. And people who have issues with that should just not use them. End of story. Use the existing error numbers if you want internationalization, and live with the fact that you only get the very limited error number.

I can understand Linus's point of view. The LWN readers are having a discussion about it, and one of the comments there provoked this blog post:

It somewhat bothers me that English, being the lingua franca of free software development, excludes pretty huge parts of the world from participation. I thought that for a significant part of the world, writing an English commit message has to be more difficult than writing code.

I can understand that point of view as well.

Here's my point of view:

It is entirely true that if a project requires English for communication within the project, it discriminates against those who don't know English well.
Not having a common language within a project, between those who contribute to the project, now and later, would pretty much destroy any hope of productive collaboration.

If I have a free software project, and you ask me to merge something where commit messages are in Hindi, error messages in French, and code comments in Swahili, I'm not going to understand any of them. I won't merge what I don't understand.

If I write my commit messages in Swedish, my code comments in Finnish, and my error messages by entering randomly chosen words from /usr/share/dict/words into a search engine and taking the page title of the fourteenth hit, then you're not going to understand anything either. You're unlikely to make any changes to my project.

When Bo finds the project in 2038, and needs it to prevent the apocalypse from 32-bit timestamps ending, and can't understand the README, humanity is doomed.

Thus, on balance, I'm OK with requiring the use of a single language for intra-project communication.
Users should not be presented with text in a language foreign to them. However, this raises a support issue, where a user may copy-paste an error message in their native language, and ask for help, but the developers don't understand the language, and don't even know what the error is. If they knew the error was "permission denied", they could tell the user to run the chmod command to fix the permissions. This is a dilemma.

I've solved the dilemma by having a unique error code for each error message. If the user tells me "R12345X: Xscpft ulkacsx ggg: /etc/obnam.conf!" I can look up R12345X and see that the error is that /etc/obnam.conf is not in the expected file format.

This could be improved by making the "parameters" for the error message easy to parse. Perhaps something like this:

R12345X: Xscpft ulkacsx ggg! filename=/etc/obnam.conf
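To illustrate why a parseable layout helps, here is a minimal sketch of how a developer might pull such a line apart without understanding the translated text. The function name `parse_error_line` and the exact rules are assumptions for illustration: the code is taken to be R, five hexadecimal digits, and X, and parameters are taken to be trailing `key=value` words whose values contain no spaces.

```python
import re

# Assumed layout: "R" + five hex digits + "X", a colon, then the message.
LINE_RE = re.compile(r"^(R[0-9A-F]{5}X):\s*(.*)$")
# Assumed parameter syntax: identifier, "=", value without whitespace.
PARAM_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_error_line(line):
    """Split an error line into (code, translated_text, params).

    Returns None if the line does not start with an error code.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    code, rest = match.groups()

    words = rest.split()
    params = {}
    # Peel key=value words off the end; whatever remains is the
    # translated message, which we do not need to understand.
    while words:
        param = PARAM_RE.match(words[-1])
        if param is None:
            break
        params[param.group(1)] = param.group(2)
        words.pop()

    return code, " ".join(words), params


if __name__ == "__main__":
    print(parse_error_line(
        "R12345X: Xscpft ulkacsx ggg! filename=/etc/obnam.conf"))
```

A real format would need a quoting rule for values that contain spaces, which this sketch leaves out.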

Maintaining such error codes by hand would be quite tedious, of course. I invented a module for doing that. Each error message is represented by a class, and the class creates its own error code by taking its Python module and class name, and computing an MD5 of that. The first five hexadecimal digits are the code, and they get surrounded by R and X to make it easier to grep.
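The scheme described above can be sketched roughly as follows. This is not the actual module, only an illustration under some assumptions: the names `StructuredError`, `ConfigFormatError`, `error_code`, and `lookup` are hypothetical, the module and class name are joined with a dot before hashing, and the hex digits are upper-cased.

```python
import hashlib


class StructuredError(Exception):
    """Hypothetical base class: each subclass is one kind of error."""

    # Message template, filled in from the keyword arguments.
    msg = "Unknown error"

    def __init__(self, **kwargs):
        super().__init__()
        self.kwargs = kwargs

    @classmethod
    def error_code(cls):
        # Hash "module.ClassName"; the joining dot is an assumption.
        name = f"{cls.__module__}.{cls.__qualname__}"
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        # First five hex digits, wrapped in R and X for easy grepping.
        return "R" + digest[:5].upper() + "X"

    def __str__(self):
        return f"{self.error_code()}: {self.msg.format(**self.kwargs)}"


class ConfigFormatError(StructuredError):
    msg = "Configuration file is not in the expected format: {filename}"


def lookup(code):
    """Find the error class for a code (direct subclasses only)."""
    for cls in StructuredError.__subclasses__():
        if cls.error_code() == code:
            return cls
    return None


if __name__ == "__main__":
    err = ConfigFormatError(filename="/etc/obnam.conf")
    print(err)
    print(lookup(err.error_code()))
```

Because the code depends only on the module and class name, it stays the same when the message text is reworded or translated, but it changes if the class is renamed or moved. With only five hexadecimal digits, two classes could in principle get the same code, so a real implementation might want to check for collisions.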

(I don't know if something similar might be used for the Linux kernel.)

Humans and inter-human communication are difficult. In many cases, there is no solution that's good for everyone. But let's not give up.