I have been working on libopenraw lately. What I'm trying to get is enough parsing to extract thumbnails out of RAW files. That would be the first step.

I took exifprobe and generated the structure of the file. I started with the Canon CR2 files created by the Canon 20D.

The Canon CR2 file is basically a TIFF/EP file, where IFD0 (Image File Directory) contains the reduced-resolution JPEG preview, IFD1 contains the 160x120 thumbnail, IFD2 contains a reduced RGB version of the image, and IFD3 contains the RAW data (CFA array). When trying to read IFD1, I get an error about a missing ImageLength, because this IFD contains only 2 tags: the offset and the length of the JPEG thumbnail data.
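To make that layout concrete, here is a minimal sketch of walking the IFD chain by hand and pulling the JPEG bytes out of an IFD that only carries an offset and a length. It is not what libopenraw does, just an illustration. The function names `walk_ifds` and `jpeg_from_ifd` are made up for this example, and I'm assuming the two tags in question are the standard TIFF ones, JPEGInterchangeFormat (0x0201) and JPEGInterchangeFormatLength (0x0202). It only decodes single SHORT and LONG values and does not follow sub-IFDs.

```python
import struct

# Assumption: the thumbnail IFD uses the standard TIFF JPEG tags.
JPEG_OFFSET_TAG = 0x0201  # JPEGInterchangeFormat
JPEG_LENGTH_TAG = 0x0202  # JPEGInterchangeFormatLength


def walk_ifds(data):
    """Return one {tag: value} dict per IFD in the top-level chain."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("no TIFF byte-order mark")
    magic, offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    ifds = []
    seen = set()  # guard against a looping IFD chain
    while offset != 0 and offset not in seen:
        seen.add(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        tags = {}
        for i in range(count):
            entry = offset + 2 + 12 * i  # each entry is 12 bytes
            tag, typ, n = struct.unpack_from(endian + "HHI", data, entry)
            if n == 1 and typ == 3:  # one SHORT stored inline
                (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            elif n == 1 and typ == 4:  # one LONG stored inline
                (value,) = struct.unpack_from(endian + "I", data, entry + 8)
            else:
                value = None  # anything else is not decoded in this sketch
            tags[tag] = value
        ifds.append(tags)
        # The offset of the next IFD follows the last entry.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return ifds


def jpeg_from_ifd(data, tags):
    """Return the embedded JPEG bytes, or None if the IFD has no JPEG tags."""
    start = tags.get(JPEG_OFFSET_TAG)
    length = tags.get(JPEG_LENGTH_TAG)
    if start is None or length is None:
        return None
    return data[start:start + length]
```

Note that nothing here insists on ImageWidth or ImageLength being present, which is why an IFD with just those two tags goes through fine.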

My original plan was to use libtiff, but it appears that TIFFReadDirectory() can't cope with these because it tries to do some checks to validate the TIFF file. It even crashes on NEF files coming from a Nikon D100.

The second solution is to use libexif. Libexif is an EXIF parsing library, and EXIF is actually structured à la TIFF, byte-by-byte. I have a patch (to submit) for libexif that reads TIFF in addition to JPEG.
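To show what "structured à la TIFF" means in practice, here is a rough sketch that locates the TIFF header inside a JPEG file: the EXIF data lives in an APP1 segment whose payload starts with `Exif\0\0`, and a regular TIFF header follows right after. This is not libexif code; `find_exif_tiff` is a name invented for this example, and the marker handling is simplified (it assumes every segment before the scan data carries a length field).

```python
import struct


def find_exif_tiff(jpeg):
    """Return the offset of the TIFF header in a JPEG's Exif APP1 segment, or None."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        marker, length = struct.unpack_from(">HH", jpeg, pos)
        if marker >> 8 != 0xFF:
            break  # not on a marker any more, give up
        if marker == 0xFFDA:
            break  # start of scan: no metadata segments after this
        payload = pos + 4
        if marker == 0xFFE1 and jpeg[payload:payload + 6] == b"Exif\x00\x00":
            return payload + 6  # the TIFF header starts right after the identifier
        pos += 2 + length  # the length field counts itself but not the marker
    return None
```

From the returned offset on, the bytes can be read as a TIFF structure, starting with the `II` or `MM` byte-order mark.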

A third solution, the one I dislike the most, is to write yet another TIFF parser, or fork one of these to embed it in libopenraw and just use that. I'm afraid that it might be the only viable solution in the long term, as some files like Olympus' completely corrupt the TIFF standard, and patching a third-party TIFF reader is probably out of the question, be it libexif or libtiff.