Something I forgot to talk about are the encoding for these iCalendar and vCard data. Why? Just because it is not clearly specified, or not in a way adapted to that use. I don't know what where the reason behind that, but something I'm sure is that it is a little bit shortsighted, as the RFC only address the use of that format as MIME content not providing any provision for a more versatile usage.

vCard: RFC 2426, section 5 state the difference between the vCard 2.1 format and the version 3.0 that the RFC specifies. It is written:

. The VCARD CHARSET type parameter has been eliminated. Character set can only be specified on the CHARSET parameter on the Content-Type MIME header field.

Previously we could specify CHARSET parameter for each field; now it is no longer allowed and requires that the outer MIME header specifies the encoding. This is quite restricting as we not necessarily have this in a MIME enveloppe. This leads to some weird behaviour with other software. For example, MacOS X Address Book application allows exporting addresses to vCard. If everything is in ASCII, then the vCard .vcf file is stored as an ASCII file. If it contains non ASCII data, like accents in French, the file is written in 16-bits Unicode, with a leading bom marker which is not without confusing other reader.

iCalendar: Section 4.1.4 for RFC 2445 state about encoding:

There is not a property parameter to declare the character set used in a property value. The default character set for an iCalendar object is UTF-8 as defined in RFC 2279.

Again, same as vCard, the charset specification is to be set by the MIME container. But this time, a default is specified: UTF-8. This is one point where vCard and iCalendar differ and where at least iCalendar has a sane default.

Now when the data comes from the device, sometime the developer has to figure out what charset it is. Even worse. Sometime it completely depend on the language the device is set in, and there might not be any way to know that, and assume that the user have the same language settings on both side. It is a problem as these encoding problem cause data loss of a various kind. Sometime it is just not the right char (but in that case that only cause differences), sometime it cause the string to be truncated because of a charset conversion failure.

This is something to be considered, and for the sync engine, I'd myself just go by using UTF-8 all the way.