Loud ramblings of a Software Artisan

Sunday 29 January 2006

Postfix

On a snowy evening, I just finished reading Postfix: The Definitive Guide, by Kyle D. Dent.

I have two regrets about that book:

  1. I didn't read it earlier.
  2. I don't own it.

This book answers most questions you have about Postfix, how to use it and how to configure it.

I first installed Postfix in 1999 to start replacing the old Sendmail on the FreeBSD boxes we were using as SMTP gateways to feed incoming mail to the e-mail system that had been chosen. It is way easier to use than Sendmail, as you don't have to learn the Turing-complete syntax of sendmail.cf, and it fits the bill for 99% of the use cases (I still haven't seen the missing 1%).

Optimizing AbiWord, part 1

AbiWord is known for its speed and its manageable size. But AbiWord also has some real bottlenecks in the PieceTable, the data structure used to store the text and everything else. The PieceTable is a seriously obscure piece of software, and it suffers from some design flaws, either internal or external, the latter because of the framework we use (which has been developed over the years). These flaws have been investigated a little, but not really addressed.

So I'll start with a simple case: loading an RTF file with a lot of tables. We know that this kind of document takes some time to load and display, and the document will actually come from one of our bug reports. I will use the command line text converter to measure just the loading, without involving the layout engine. That will come later.

The tools I will use for the analysis are callgrind and KCachegrind. callgrind is a Valgrind tool that performs profiling a bit like gprof, but at run time, without requiring a rebuild with the -pg C compiler flag. KCachegrind is its companion application, part of KDE, which provides a graphical representation of callgrind's output. It helps visualise the profiling results. I will rebuild AbiWord without debugging, because all the debugging messages impact performance too much and would probably pollute the results, not counting the risk of asserts.

The first run is done this way:

$ callgrind abiword foo.rtf --to=foo.txt

I then open the callgrind.out.<pid> file in KCachegrind. I sort the list of called functions on the "self" column, because what matters is the time spent in the function itself. We want to identify bottlenecks, and that is where they are. Sorting by "Incl." would just put main() at the top of the list. Here is what the list gives:

  1. UT_GenericVector<>::addItem()
  2. pf_Fragments::cleanFrags()
  3. somewhere in the libc

What shows up is that cleanFrags() accounts for 6.65% of the execution time with only 1,430 calls, while addItem() has been called over 48,000 times and accounts for just 7.38% of the execution time. Let's measure that execution time. I'll do 3 rounds and only retain the system and user times:

./src/wp/main/unix/AbiWord-2.6  --to=foo.txt  14.10s user 0.87s system 77% cpu 19.236 total
./src/wp/main/unix/AbiWord-2.6  --to=foo.txt  14.11s user 0.82s system 99% cpu 15.011 total
./src/wp/main/unix/AbiWord-2.6  --to=foo.txt  13.47s user 0.85s system 96% cpu 14.845 total

Note that 3 rounds are not statistically significant, but I'll settle for approximations that will be good enough.

Now let's look at the callee graph in KCachegrind:

We see that most of the time is spent in UT_Vector::addItem(), and mostly in UT_Vector::grow(). This last method reallocates the internal vector buffer to allow storing more items. That is a good clue. Maybe we should just size the vector better or something. So let's have a look at the source. We see that we call UT_Vector::clear() and then re-add items into it. That is wrong. UT_Vector::clear() empties the vector. Most of the time we do that to add another bunch of items. Looking at the source of UT_Vector reveals that UT_Vector::clear() frees the buffer, and we don't want to do that.

Here is the patch. Basically we just set the size of the vector to 0 and zero out the buffer. The latter is needed because we assume that empty slots are set to 0. This is not an adequate design, but we have to deal with it.
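To illustrate the idea only, here is a minimal sketch of a pointer vector whose clear() keeps its allocation. SketchVector, clearAndFree() and growCalls() are hypothetical names made up for this example; this is not AbiWord's UT_Vector code, and the initial capacity of 32 and the doubling policy are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a UT_Vector-like container; not AbiWord's code.
class SketchVector {
public:
    SketchVector() = default;
    SketchVector(const SketchVector&) = delete;
    SketchVector& operator=(const SketchVector&) = delete;
    ~SketchVector() { std::free(m_entries); }

    // Append an item, growing the buffer only when it is full.
    bool addItem(void* item) {
        assert(m_count <= m_space);
        if (m_count == m_space && !grow()) return false;
        m_entries[m_count++] = item;
        return true;
    }

    // Behaviour described as the problem: emptying also frees the buffer,
    // so the next batch of addItem() calls has to grow it all over again.
    void clearAndFree() {
        std::free(m_entries);
        m_entries = nullptr;
        m_count = m_space = 0;
    }

    // Behaviour described for the patch: size back to 0, slots zeroed,
    // allocation kept for the next batch of items.
    void clear() {
        if (m_entries) std::memset(m_entries, 0, m_space * sizeof(void*));
        m_count = 0;
    }

    std::size_t size() const { return m_count; }
    std::size_t capacity() const { return m_space; }
    std::size_t growCalls() const { return m_growCalls; }

private:
    // Double the buffer (assumed policy), zeroing the new tail so that
    // empty slots read as 0 (all-zero bytes, a null pointer on common platforms).
    bool grow() {
        std::size_t newSpace = m_space ? m_space * 2 : 32;
        void** p = static_cast<void**>(
            std::realloc(m_entries, newSpace * sizeof(void*)));
        if (!p) return false;
        std::memset(p + m_space, 0, (newSpace - m_space) * sizeof(void*));
        m_entries = p;
        m_space = newSpace;
        ++m_growCalls;
        return true;
    }

    void** m_entries = nullptr;
    std::size_t m_count = 0;
    std::size_t m_space = 0;
    std::size_t m_growCalls = 0;
};
```

With this sketch, filling the vector, calling clear() and filling it again does not add any grow() call, whereas going through clearAndFree() pays for the reallocations a second time.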

And voila. First optimization. Here are the results:

We can see that UT_Vector::grow() no longer gets called. That is a lot of time saved. Unfortunately the callee list is not accurate, as I am apparently having some issues when running callgrind. With or without the patch, I get an error that seems to interrupt callgrind, so I won't post it here.

The time measurements give this:

./src/wp/main/unix/AbiWord-2.6  --to=foo.txt  12.58s user 0.57s system 59% cpu 22.274 total
./src/wp/main/unix/AbiWord-2.6  --to=foo.txt  12.28s user 0.45s system 98% cpu 12.884 total
./src/wp/main/unix/AbiWord-2.6  --to=foo.txt  12.16s user 0.43s system 99% cpu 12.669 total

We saved a couple of seconds with that patch. Still, pf_Fragments::cleanFrags() remains too expensive, but I don't see what else could be done locally. I have to analyse what it does, why we need it, and what we can do to avoid it.
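For what it is worth, the "couple of seconds" can be checked by averaging user plus system time over the two sets of runs. This is only a sketch: the Run struct and the names meanCpuSeconds, kBefore and kAfter are made up for the example, and the figures are copied from the timings above.

```cpp
#include <cassert>
#include <vector>

// One timed run: user and system CPU seconds, as reported by the shell.
struct Run {
    double user;
    double sys;
};

// Mean of (user + system) CPU seconds over a set of runs.
double meanCpuSeconds(const std::vector<Run>& runs) {
    assert(!runs.empty());
    double total = 0.0;
    for (const Run& r : runs) total += r.user + r.sys;
    return total / static_cast<double>(runs.size());
}

// Figures copied from the two sets of timings in this post.
const std::vector<Run> kBefore = {{14.10, 0.87}, {14.11, 0.82}, {13.47, 0.85}};
const std::vector<Run> kAfter  = {{12.58, 0.57}, {12.28, 0.45}, {12.16, 0.43}};
```

meanCpuSeconds(kBefore) comes out at about 14.74 s and meanCpuSeconds(kAfter) at about 12.82 s, so roughly 1.9 s, or 13% of the CPU time, is saved on this document. With only three rounds each, that remains an approximation.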

Wednesday 25 January 2006

My latest device running Linux

Since my current DSL modem was crashing at least once a day, I decided to go and replace it after only 14 months. I didn't have much choice available on the market, but I purchased a D-Link DSL-300T (not available in the USA). Surprisingly (or not), it runs Linux as firmware, and when you open the box, the first thing you see is a printed GPL license.

I haven't downloaded the source code yet. I also wonder about someone (not me) porting OpenWRT to it (it does not do wireless, but it does ADSL). Some of the ADSL modems are listed as work in progress (actually the 300T is listed somewhere).

Missed opportunity?

Did we miss the opportunity to get the Nvidia driver open source because of GPL violation? That was in 2000...

Tuesday 24 January 2006

All your rights are belong to us

Looks like it is DRM (Digital Restrictions Management, as the GPLv3 calls it) season, with a good deal of bad news.

The RIAA and MPAA want to control consumer electronics, and the broadcast flag is back, the two being closely related. But now there is also an attempt to control analog media. That shows that 2006 will need a lot of fighting to preserve users' rights.

Friday 20 January 2006

Distros news bulletin

Some may have noticed the announcement, but Mandriva and HP are going to spread Linux in Latin America. That is a first step in the right direction.

An uncommon target for Linux: writers. It is the target of Ghostwriter. Surprisingly, the word processor they provide is LyX. Back in the day, I used it, and I liked it.

Quote of the day

We can't separate the software from the hardware, we would be breaking the Microsoft - IBM agreement. -- IBM/Lenovo Canada Customer Relations representative, when I requested a refund for the Windows operating system that came with my computer, software whose licence I don't agree to.

That is not surprising, nor hot news, but it gives ammunition for the fight. I wish I had a recorder.

Another one bites the dust...

Konica-Minolta drops photography activity

Now the question: what will happen to all these undocumented devices (cameras) and file formats (MRW raw files)? Most if not all manufacturers argue that they need not document these because they will support them (even if they don't support more than 2 proprietary OSes).

Polaroid, Agfa, Ilford, Konica-Minolta, etc. All of these have either closed, stopped photo activities, or downsized so much in that area that they have just become a name in the history of photography. This is living proof that even the biggest and oldest company can disappear suddenly one day. Who knows what Sony will do with the assets they purchased, and what they will phase out? And who knows what will be lost in the limbo of business, be it this documentation, probably worth thousands of hours of reverse engineering, or these photographic memories that might stay as a bunch of unused megabytes, maybe forever.

It just explains why relying on the manufacturer to support undocumented protocols and file formats is not a good idea for the customer.

Thursday 19 January 2006

The price of MP3

We all know how this happened. There was that thing called MP3 in the mid-90s. It allowed compressed music with a decent quality, suitable for playback on a PC. That is how it all started. People started sharing music, making jukeboxes out of PCs, P2P sharing, Napster, etc.

But one day, circa 1998, they started asking for royalties on the patent, when the format had really become popular through a network effect. This makes any MP3 decoder or encoder impossible to license under the GPL or LGPL, and makes such a decoder or encoder incompatible with GPL playback software. That is why Fluendo gave that Christmas present to the Linux community: a fully licensed MP3 decoder for GStreamer, available free of charge in binary form to anyone. At the same time, they provide the complete source code for use, under an MIT-like license. That is really open source software, just not compatible with the GPL because of patents. While I actually don't know how much Fluendo paid for it, I'd classify it as a US$50,000 gift to the open source community.

Thank you Fluendo.

In the meantime, do yourself a favor and use Ogg Vorbis to encode your music until the patents expire... or even after (Vorbis encoding produces better quality output).

Monday 16 January 2006

Winter tech reading

Technical books I recently read. The choice comes from the availability of said books for me to borrow (from the office):

  • Linux Unwired, co-authored by Edd Dumbill. It is all about getting your Linux system connected wirelessly, be it through Wi-Fi, Bluetooth, IrDA or even a cellular phone. A good practical book on how to handle these technologies, even though they have evolved since. But reading this book is sometimes still a bit depressing, as is the state of wireless on Linux.

And from the sysadmin corner:

  • SpamAssassin is a practical book for whoever wants to deploy SpamAssassin on their email system. It explains how to install it with various MTAs, including Postfix. We should still tackle the problem of how to have a Bayesian filter trained seamlessly by the user. Thunderbird marks messages as junk (on the IMAP server), but this information is not forwarded to a server-side filter. It would be so much more efficient to have the IMAP server notice the junk flag change and run the message through the spam filter for the Bayesian training. It would make so much sense... (and at the same time have Evolution do the same as Thunderbird for that aspect of junk mail handling).
  • Kerberos: The Definitive Guide is an all-in-one manual for deploying the Kerberos authentication system in your organisation. Kerberos is a secure distributed authentication system developed at MIT, designed to allow single sign-on. It works on UNIX, Mac OS X and Windows 2000 (it is part of Active Directory).

Sunday 15 January 2006

Parsing RAW files

I have been working on libopenraw lately. What I'm trying to get is enough parsing to extract thumbnails out of RAW files. That would be the first step.

I took exifprobe and generated the structure of the file. I started with the Canon CR2 files created by the Canon 20D.

The Canon CR2 file is basically a TIFF/EP file, where IFD0 (Image File Directory) contains the reduced resolution JPEG preview, IFD1 contains the 160x120 thumbnail (this IFD contains only 2 tags, the offset and the size of the JPEG data), IFD2 contains a reduced RGB version of the image, and IFD3 contains the RAW data (CFA). When trying to read IFD1, I get an error about a missing ImageLength, because this IFD only contains the offset and the length of the JPEG thumbnail data.
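To make that layout concrete, here is a minimal sketch of walking the chain of IFDs in a TIFF-structured buffer without validating which tags are present. It is not libopenraw, libtiff or libexif code: every name in it (Bytes, IfdEntry, readIfds, findTag) is made up for the example, and the cap of 16 IFDs is an arbitrary guard. It relies only on the standard TIFF layout: a 2-byte byte order mark ("II" or "MM"), the magic number 42, a 4-byte offset to the first IFD, then for each IFD a 2-byte entry count, 12-byte entries and a 4-byte offset to the next IFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct IfdEntry {
    std::uint16_t tag;
    std::uint16_t type;
    std::uint32_t count;
    std::uint32_t value;  // raw last 4 bytes: the value itself or an offset to it
};
using Ifd = std::vector<IfdEntry>;

// True when [pos, pos + len) lies inside the buffer.
bool inBounds(const Bytes& d, std::size_t pos, std::size_t len) {
    return pos <= d.size() && len <= d.size() - pos;
}

// Read an unsigned integer of 1 to 4 bytes in the given byte order.
std::uint32_t readUint(const Bytes& d, std::size_t pos, std::size_t len, bool little) {
    assert(len <= 4 && inBounds(d, pos, len));
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < len; ++i) {
        std::size_t idx = little ? pos + len - 1 - i : pos + i;
        v = (v << 8) | d[idx];
    }
    return v;
}

// Walk the IFD chain. Returns no IFDs if the header is not TIFF, and stops
// quietly at the first IFD that does not fit in the buffer.
std::vector<Ifd> readIfds(const Bytes& d) {
    std::vector<Ifd> ifds;
    if (!inBounds(d, 0, 8)) return ifds;
    bool little;
    if (d[0] == 'I' && d[1] == 'I') little = true;
    else if (d[0] == 'M' && d[1] == 'M') little = false;
    else return ifds;
    if (readUint(d, 2, 2, little) != 42) return ifds;

    std::size_t pos = readUint(d, 4, 4, little);
    const std::size_t kMaxIfds = 16;  // arbitrary guard against cyclic chains
    while (pos != 0 && ifds.size() < kMaxIfds) {
        if (!inBounds(d, pos, 2)) break;
        std::size_t n = readUint(d, pos, 2, little);
        if (!inBounds(d, pos + 2, n * 12 + 4)) break;
        Ifd ifd;
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t e = pos + 2 + i * 12;
            IfdEntry entry;
            entry.tag = static_cast<std::uint16_t>(readUint(d, e, 2, little));
            entry.type = static_cast<std::uint16_t>(readUint(d, e + 2, 2, little));
            entry.count = readUint(d, e + 4, 4, little);
            entry.value = readUint(d, e + 8, 4, little);
            ifd.push_back(entry);
        }
        ifds.push_back(ifd);
        pos = readUint(d, pos + 2 + n * 12, 4, little);
    }
    return ifds;
}

// Look up a tag in one IFD; nullptr when it is absent.
const IfdEntry* findTag(const Ifd& ifd, std::uint16_t tag) {
    for (const IfdEntry& e : ifd) {
        if (e.tag == tag) return &e;
    }
    return nullptr;
}
```

Assuming the two tags in IFD1 are the standard JPEGInterchangeFormat (0x0201) and JPEGInterchangeFormatLength (0x0202), which I have not verified here, the thumbnail would be located with findTag(ifds[1], 0x0201) and findTag(ifds[1], 0x0202). Because nothing in the walk requires ImageLength or any other tag to be present, an IFD with only these two entries is not a problem.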

My original plan was to use libtiff, but it appears that TIFFReadDirectory() can't cope with these files because it tries to do some checks to validate the TIFF file. It even crashes on NEF files coming from a Nikon D100.

The second solution is to use libexif. Libexif is an EXIF parsing library, and EXIF is actually structured à la TIFF, byte for byte. I have a patch (to submit) for libexif that reads TIFF in addition to JPEG.

A third solution, the one I dislike the most, is to write yet another TIFF parser, or fork one of these to embed it in libopenraw and just use that. I'm afraid that it might be the only viable solution in the long term, as some files like Olympus' completely corrupt the TIFF standard, and patching a third-party TIFF reader is probably out of the question, be it libexif or libtiff.

Saturday 7 January 2006

Digital media devices access

If you want to access your digital camera, you use a program that uses libgphoto2. Now, with version 2.1.99 (upcoming 2.2), we even handle the cameras that are mounted as a disk. It took us time to implement that because we didn't see the interest until we realized that people wanted to use their camera with a digital photo application, be it f-spot, gtkam, digikam or gthumb, and that the developers of said applications shouldn't have to figure out the details. So now, we provide a single API for all the cameras that we know how to communicate with.

But what if you want to access your digital audio player (iPod, iRiver, Nomad Jukebox, etc)?

Some devices work as USB Mass Storage, like the small flash-based players, and hence do not require any special driver.

For the iPods, there is libgpod, which developers use to access the iPod's private and proprietary database; there is ipod-sharp and its companion libipoddevice for those who write in C#; ipodslave for KDE developers (it is a kio-slave module); and gnupod, the Perl version.

For the Nomad Jukebox, there is libnjb, that implements the DPE protocol.

For the new iRiver and other MTP devices, there is experimental support in... libgphoto2. Yes, you read correctly, a digital camera driver can be used. The MTP protocol is a variation of PTP (Picture Transfer Protocol, used for still image cameras), mostly pushed by Microsoft, who provide a standard driver in the latest Windows XP updates. It is some sort of corruption of PTP, and I think that the idea behind it is to be able to control what files can be put on the device, like implementing DRM. A bit like what Apple does with the iPod: preventing iTunes users from downloading the music from the iPod, requiring that the music be on both the iPod and the computer, and having implemented a hackish crippling of iTunes to prevent loading some third-party plugins that allowed users to download the music off the iPod.

And I am probably missing a lot.

How confusing is that? Confusing for the developer, who has to write at least four different versions of the code to access most of the devices, and for the user, who may not be able to use the application XyZ with the ACME player.

The solution is to create a library for digital audio player access, the same way we have libgphoto2. One API for all the devices. That would involve a few changes in libgphoto2 to extract the PTP protocol and give libgphoto2_port its own life, because the PTP implementation uses it. That would give the following architecture (admire the ASCII art):

libgphoto2 ---> ptp2camlib -> libptp2 <---- libaudioplayer
 \                   |            /           /    \
  \                  v           /           /      \
   \->      libgphoto2_port   <-/ <---------/        --> other drivers
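As a very rough sketch of what "one API for all the devices" could look like on the libaudioplayer side of that diagram, here is one abstract driver interface with one backend per protocol, behind a single entry point. Nothing here exists: PlayerDriver, AudioPlayerLibrary, FakeDriver and the device id strings are all hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: one implementation per protocol (MTP, NJB, iPod...).
class PlayerDriver {
public:
    virtual ~PlayerDriver() = default;
    virtual std::string protocol() const = 0;
    // Does this driver know how to talk to the given device?
    virtual bool probe(const std::string& deviceId) const = 0;
    virtual std::vector<std::string> listTracks(const std::string& deviceId) = 0;
};

// Hypothetical front end: the only thing an application would talk to.
class AudioPlayerLibrary {
public:
    void registerDriver(std::unique_ptr<PlayerDriver> driver) {
        assert(driver);
        m_drivers.push_back(std::move(driver));
    }

    // First registered driver that recognises the device, or nullptr.
    PlayerDriver* driverFor(const std::string& deviceId) const {
        for (const auto& d : m_drivers) {
            if (d->probe(deviceId)) return d.get();
        }
        return nullptr;
    }

private:
    std::vector<std::unique_ptr<PlayerDriver>> m_drivers;
};

// Stand-in backend that only exercises the sketch; a real one would wrap
// an actual protocol implementation.
class FakeDriver : public PlayerDriver {
public:
    FakeDriver(std::string protocol, std::string prefix)
        : m_protocol(std::move(protocol)), m_prefix(std::move(prefix)) {}

    std::string protocol() const override { return m_protocol; }

    // Recognise devices whose id starts with this driver's prefix.
    bool probe(const std::string& deviceId) const override {
        return deviceId.compare(0, m_prefix.size(), m_prefix) == 0;
    }

    std::vector<std::string> listTracks(const std::string& deviceId) override {
        return {deviceId + "/track1"};
    }

private:
    std::string m_protocol;
    std::string m_prefix;
};
```

An application would register the available backends once, then call driverFor() with whatever identifies the plugged-in device and use the returned driver without caring which protocol sits behind it.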

Friday 6 January 2006

Retro computing

Found on Boing Boing: Raymond Frohlich posted scans of the 1979 IBM brochure for the IBM 4331 computer: page 1 and page 2. What I like is the "futuristic" aesthetic they gave to the pictures of the office space. A long way in 26 years.

That reminds me that I have a good deal of old things (not that old) from some manufacturers at my mom's, kept for their collection value. Maybe I should run the scanner there.

Market Dumping by bundling

I'm not impressed at all, but at least this company listens to its customers, even if it doesn't satisfy them.

I got a nice Executive Customer Relations representative from Lenovo Canada on the phone. She called me following my request for a refund of the operating system that came with my ThinkPad Z60t, an operating system I don't want to use and was required to acquire with the hardware.

I'm not surprised by the answer: no.

She claims that in 5 years nobody has asked not to have the operating system, which I doubt, unless she never got the information. She also claims that it is required to use the hardware. I can tell that it is not, as I booted directly from my Linux installation CD without ever running that software.

What can we do? Apparently nobody in Canada has taken the care to investigate and enforce the law. And we should tell the proper authority (in Canada) about that problem. If every customer who buys hardware bundled with software he does not use, like Windows with a laptop computer, complains, then it may open some eyes, at least I hope. If you know of any precedent in Canada, I'm willing to hear about it. The only attempt I know about in Canada is Yannick Delbecque's, still without any answer.

The law is here (direct link to the "offences" section). I'm not a lawyer, and therefore do not provide legal advice, but my understanding of that law is that bundling software in order to help prevent competition is to some extent considered an offence under the Competition Act. That company has also been found guilty, or at least somewhat responsible, of unfair licensing agreements with OEMs: a settlement for $23M because some OS vendor couldn't bundle its product due to an already unfair agreement with OEMs, and refunds obtained in small claims courts in the US.

sigh

Just remember that the case is specific to each manufacturer and each country. I'm making a case for Canada and its federal law. There could still be a provincial ruling of some sort. Anyway, this is in my eyes a clear case of monopoly abuse.

Monday 2 January 2006

Ubuntu Linux on a Thinkpad Z60t

I posted my installation notes of Ubuntu Linux on my Thinkpad Z60t.

Feel free to comment for questions and/or feedback.

Sunday 1 January 2006

gphoto2 2.1.99 "Happy New Year" released

I just released gphoto2 2.1.99. This release is required to build with libgphoto2 2.1.99, which was released previously. One of the reasons is that gphoto2 relied on some libgphoto2_port API that has changed.

gphoto2 is the command line front-end to libgphoto2.