When you work on a 18 year old code base like AbiWord, you encounter stuff from another age. This is the way it is in the lifecycle of software where the requirement and the tooling evolve.

Nonetheless, when AbiWord started in 1998, it was meant as a cross-platform code base written in C++ that had to compile on both Windows and Linux. C++ compiler where not as standard compliant as today so a lot of things where excluded: no template, so not standard C++ library (it was called STL at the time). Over the years, things have evolved, Mac support was added, gcc 4 got released (with much better C++ support), and in 2003 we started using template for the containers (not necessarily in that oder, BTW). Still no standard library. This came later. I just flipped the switch to make C++11 mandatory, more on that later.

As I was looking for some bugs I found it that with all that hodge podge of coding standard there wasn't any, and this caused some serious ownership problems where we'd be using freed memory. The worse is this lead to file corruption where we write garbage memory into files as are supposed to be valid XML. This is bad.

The core of the problem is the way we pass attributes / properties around. They are passed as a NULL terminated array of pointer to strings. Even index are keys, odd are string values. While keys are always considered static, values are not always. Sometime they are taken out of a std::string or a one of the custom string containers from the code base (more on that one later), sometime they are just strdup() and free() later (uh oh, memory leaks).

Maybe this is the good time to do a cleanup and modernize the code base and make sure we have safer code rather that trying to figure out one by one all the corner cases. And shall I add that there is virtually no tests on AbiWord? So it is gonna be epic.

As I'm writing this I have 8 patches with a couple very big, amounting to the following stats (from git):

134 files changed, 5673 insertions(+), 7730 deletions(-)

These numbers just show how broad the changes are, and it seems to work. The bugs I was seeing with valgrind are gone, no more access to freed memory. That's a good start.

Some of the 2000+ lines deleted are redundant code that could have been refactored (there are still a few places I marked for that), but a lot have to do with what I'm fixing. Also some changes are purely whitespace / indentation where it was relevant usually around an actual change.

Now, instead of passing around const char ** pointers, we pass around a const PP_PropertyVector & which is, currently, a typedef to std::vector<std::string>. To make things nice the main storage for these properties is now also a std::map<std::string, std::string> (possibly I will switch it to an unordered map) so that assignments are transparent to the std::string implementation. Before that it was a one of the custom containers.

Patterns like this:

 const char *props[] = { NULL, NULL, NULL, NULL };
 int i = 0;
 std::string value = object->getValue();
 props[i++] = "key";
 const char *s = strdup(value.c_str());
 props[i++] = s;
 thing->setProperties(props);
 free(s);

Turns to

 PP_PropertyValue props = {
   "key", object->getValue()
 };
 thing->setProperties(props);

Shorter, readable, less error prone. This uses C++11 initializer list. This explain some of the line removal.

Use C++ 11!

Something I can't recommend enough if you have a C++ code base is to switch to C++ 11. Amongst the new features, let me list the few that I find important:

  • auto for automatic type deduction. Make life easier in typing and also in code changes. I mostly always use it whe declaring an iterator from a container.
  • unique_ptr<> and shared_ptr<>. Smart pointer inherited from boost. But without the need for boost. unique_ptr<> replaces the dreaded auto_ptr<> that is now deprecated.
  • unordered_map<> and unordered_set<>: hash based map and set in the standard library.
  • lambda functions. Not need to explain, it was one of the big missing feature of C++ in the age of JavaScript popularity
  • move semantics: transfer the ownership of an object. Not easy to use in C++ but clearly beneficial for when you always ended up copying. This is a key part of the unique_ptr<> implementation to be usable in a container where auto_ptr<> didn't. The move semantic is the default behaviour of Rust while C++ copies.
  • initializer list allow construction of object by passing a list of initial values. I use this one a lot in this patch set for property vectors.

Don't implement your own containers.

Don't implement vector, map, set, associative container, string, lists. Use the standard C++ library instead. It is portable, it works and it likely does a better job than your own. I have another set of patches to properly remove these UT_Vector, UT_String, etc. from the AbiWord codebase. Some have been removed progressively, but it is still ongoing.

Also write tests.

This is something that is missing on AbiWord that I have tried to tackle a few time.

One more thing.

I could have mechanised these code changes to some extent, but then I wouldn't have had to review all that code in which I found issues that I addressed. Eyeball mark II is still good for that.

The patch (in progress)