Loud ramblings of a Software Artisan

Wednesday 21 December 2005

PIM Data synchronisation, part 3, the encoding

Something I forgot to talk about are the encoding for these iCalendar and vCard data. Why? Just because it is not clearly specified, or not in a way adapted to that use. I don't know what where the reason behind that, but something I'm sure is that it is a little bit shortsighted, as the RFC only address the use of that format as MIME content not providing any provision for a more versatile usage.

vCard: RFC 2426, section 5 state the difference between the vCard 2.1 format and the version 3.0 that the RFC specifies. It is written:

. The VCARD CHARSET type parameter has been eliminated. Character set can only be specified on the CHARSET parameter on the Content-Type MIME header field.

Previously we could specify CHARSET parameter for each field; now it is no longer allowed and requires that the outer MIME header specifies the encoding. This is quite restricting as we not necessarily have this in a MIME enveloppe. This leads to some weird behaviour with other software. For example, MacOS X Address Book application allows exporting addresses to vCard. If everything is in ASCII, then the vCard .vcf file is stored as an ASCII file. If it contains non ASCII data, like accents in French, the file is written in 16-bits Unicode, with a leading bom marker which is not without confusing other reader.

iCalendar: Section 4.1.4 for RFC 2445 state about encoding:

There is not a property parameter to declare the character set used in a property value. The default character set for an iCalendar object is UTF-8 as defined in RFC 2279.

Again, same as vCard, the charset specification is to be set by the MIME container. But this time, a default is specified: UTF-8. This is one point where vCard and iCalendar differ and where at least iCalendar has a sane default.

Now when the data comes from the device, sometime the developer has to figure out what charset it is. Even worse. Sometime it completely depend on the language the device is set in, and there might not be any way to know that, and assume that the user have the same language settings on both side. It is a problem as these encoding problem cause data loss of a various kind. Sometime it is just not the right char (but in that case that only cause differences), sometime it cause the string to be truncated because of a charset conversion failure.

This is something to be considered, and for the sync engine, I'd myself just go by using UTF-8 all the way.

Saturday 10 December 2005

PIM Data synchronisation, part 2, the data formats

You may want to read part 1 if you haven't already. I have explained why we might want to have synchronisation. Now I'll explain how we would do that.

First we need to understand how PIM data are or can be formatted. Actually there are various formats, more or less proprietary. And there are a couple of IETF format, vCard, vCalendar and iCalendar. iCalendar is the version 2.0 of vCalendar, and has been ratified as RFC 2445 by the IETF. It is the standard for storing calendaring information. vCard has been ratified as RFC 2426 and specifies a way to store and transmit contact information.

Both iCalendar and vCard have a common organisation, that is logical given the nature of the data. We have records that contain fields, with attributes. For example, a contact would be as follow:

BEGIN:vCard
VERSION:3.0
FN:Homer Simpson
ORG:Springfield Nuclear Plant
ADR;TYPE=HOME:;;742 Evergreen Terrace
 ;Springfield;IL;;U.S.A.
TEL;TYPE=VOICE,MSG,WORK:+1-555-555-1234
EMAIL;TYPE=INTERNET:chunkylover53@aol.com
URL:http://members.aol.com/~chunkylover53/
END:vCard

The vCard is a record, and each field is one of this lines that starts with a keyword followed by a ":", at least when it comes to the vCard format. iCalendar is quite similar, but allow embedding another record. No big deal. This is the logical organisation, and if you look closer, you'll realize that this is the design choice made for the Palm .pdb format where the data is just a bunch of records.

To be continued...