Loud ramblings of a Software Artisan

Friday 15 April 2016

Modernizing AbiWord code

When you work on a 18 year old code base like AbiWord, you encounter stuff from another age. This is the way it is in the lifecycle of software where the requirement and the tooling evolve.

Nonetheless, when AbiWord started in 1998, it was meant as a cross-platform code base written in C++ that had to compile on both Windows and Linux. C++ compiler where not as standard compliant as today so a lot of things where excluded: no template, so not standard C++ library (it was called STL at the time). Over the years, things have evolved, Mac support was added, gcc 4 got released (with much better C++ support), and in 2003 we started using template for the containers (not necessarily in that oder, BTW). Still no standard library. This came later. I just flipped the switch to make C++11 mandatory, more on that later.

As I was looking for some bugs I found it that with all that hodge podge of coding standard there wasn't any, and this caused some serious ownership problems where we'd be using freed memory. The worse is this lead to file corruption where we write garbage memory into files as are supposed to be valid XML. This is bad.

The core of the problem is the way we pass attributes / properties around. They are passed as a NULL terminated array of pointer to strings. Even index are keys, odd are string values. While keys are always considered static, values are not always. Sometime they are taken out of a std::string or a one of the custom string containers from the code base (more on that one later), sometime they are just strdup() and free() later (uh oh, memory leaks).

Maybe this is the good time to do a cleanup and modernize the code base and make sure we have safer code rather that trying to figure out one by one all the corner cases. And shall I add that there is virtually no tests on AbiWord? So it is gonna be epic.

As I'm writing this I have 8 patches with a couple very big, amounting to the following stats (from git):

134 files changed, 5673 insertions(+), 7730 deletions(-)

These numbers just show how broad the changes are, and it seems to work. The bugs I was seeing with valgrind are gone, no more access to freed memory. That's a good start.

Some of the 2000+ lines deleted are redundant code that could have been refactored (there are still a few places I marked for that), but a lot have to do with what I'm fixing. Also some changes are purely whitespace / indentation where it was relevant usually around an actual change.

Now, instead of passing around const char ** pointers, we pass around a const PP_PropertyVector & which is, currently, a typedef to std::vector<std::string>. To make things nice the main storage for these properties is now also a std::map<std::string, std::string> (possibly I will switch it to an unordered map) so that assignments are transparent to the std::string implementation. Before that it was a one of the custom containers.

Patterns like this:

 const char *props[] = { NULL, NULL, NULL, NULL };
 int i = 0;
 std::string value = object->getValue();
 props[i++] = "key";
 const char *s = strdup(value.c_str());
 props[i++] = s;
 thing->setProperties(props);
 free(s);

Turns to

 PP_PropertyValue props = {
   "key", object->getValue()
 };
 thing->setProperties(props);

Shorter, readable, less error prone. This uses C++11 initializer list. This explain some of the line removal.

Use C++ 11!

Something I can't recommend enough if you have a C++ code base is to switch to C++ 11. Amongst the new features, let me list the few that I find important:

  • auto for automatic type deduction. Make life easier in typing and also in code changes. I mostly always use it whe declaring an iterator from a container.
  • unique_ptr<> and shared_ptr<>. Smart pointer inherited from boost. But without the need for boost. unique_ptr<> replaces the dreaded auto_ptr<> that is now deprecated.
  • unordered_map<> and unordered_set<>: hash based map and set in the standard library.
  • lambda functions. Not need to explain, it was one of the big missing feature of C++ in the age of JavaScript popularity
  • move semantics: transfer the ownership of an object. Not easy to use in C++ but clearly beneficial for when you always ended up copying. This is a key part of the unique_ptr<> implementation to be usable in a container where auto_ptr<> didn't. The move semantic is the default behaviour of Rust while C++ copies.
  • initializer list allow construction of object by passing a list of initial values. I use this one a lot in this patch set for property vectors.

Don't implement your own containers.

Don't implement vector, map, set, associative container, string, lists. Use the standard C++ library instead. It is portable, it works and it likely does a better job than your own. I have another set of patches to properly remove these UT_Vector, UT_String, etc. from the AbiWord codebase. Some have been removed progressively, but it is still ongoing.

Also write tests.

This is something that is missing on AbiWord that I have tried to tackle a few time.

One more thing.

I could have mechanised these code changes to some extent, but then I wouldn't have had to review all that code in which I found issues that I addressed. Eyeball mark II is still good for that.

The patch (in progress)

Thursday 16 July 2009

Eleven

Today AbiWord is eleven year old according to this commit.

Over these 11 years, it has been ported to 5 different platforms/toolkits, 2 of these having been dropped (QNX and BeOS), 1 on life support (MacOS X). It has had 3 build systems. Plain Makefiles, autotools and then a rewrite of the autotools build system for better, not mentioning the "IDE" support. It has had 3 different servers, one at SourceGear, one at the University of Omaha, Nebraska and the current at the University of Twente in the Netherlands, running respectively Debian, FreeBSD and Fedora. Also 2 version control systems: CVS then Subversion.

According to ohloh, AbiWord have had 79 committers, including the 5 top with over 1400 commits, 405K Lines of C++ Code (and more with the other languages) 1929 source files in the tree. According to ohloh, again, AbiWord code base is worth close to 10M$ (I'm not convinced about that last statement from ohloh metrics, but why not).

And we forgot to celebrate the 10 years last year. :-(

Friday 3 July 2009

A real paper cut

AbiWord had a long lasting usability issue: pressing the insert key caused to toggle the overwrite mode on and off. When doing so we provided two different feedback to the user:

  • a display in the status that switch from "INS" to "OVR"
  • the caret (insert point) switch to red.

This lead to different kind of complaints:

  • "When I type, the text to the right is replaced"
  • "Why is the insert point red? What did I do?"

See bug 3641

This reveal two problems. The first one is that the user didn't realise something happened. I hit a random key (ie he didn't realise which one) and something happened. The second the user noticed the caret changed colour, but still didn't know why.

I had a few ideas in mind.

  • Change the feedback, and there are a few options for that: change the caret shape (colour is never enough), change the status bar message, any other kind of notification
  • Do something for the key binding: popup a dialog, use clippy, play a music just make it disabled by default.

How I did implement it:

  • For now I changed the status bar message to be more readable. INS and OVR are just confusing obscure and an anachronism inherited from the AbiWord first step over 11 years ago mostly in trying to clone MS-Word with some of its atrocities. Now it is in plain $LANG (English here, but it is / will be localised, I hope).
  • I added a UI to enable the toggle. We had that option already in place, it was just on by default, not bound to any UI. I'm not a big fan of adding options, but that's just the best way to do it for now.

What can be done in the future?

  • Change the caret shape when in overwrite mode. I didn't want to do it that late in the release cycle has it seems to have been source of problems. Also it need to be well thought too as we also deal with bi-directional writing.

But that was a real paper cut for AbiWord. Not the only one, just one of them, and it was not that hard to fix. For the sake of it, I did it watching the BSG mini-series for the 3rd time.

Thursday 25 June 2009

8 years later

8 years after I filed bug 1349 in AbiWord, I implemented it. Definitely something that should have been done sooner.

It is about JPEG support. AbiWord has been able to import JPEG images for a good while, but it always converted the JPEG to PNG for internal storage. This is IMHO wrong, but at the time it was debated that it was the right tradeoff to allow using AbiWord on embedded platforms (I'm too lazy to dig up the archive). Anyway.

Tuesday I sat down and implemented the JPEG support, removing cruft, and cleaning up the rest. Basically when import and bitmap image, if the format is JPEG, the JPEG stream is kept as is, otherwise we convert to PNG, as usual. It hit SVN last night. The bonus is that a file with a JPEG in it will open properly in AbiWord 2.6 (and likely older), so even the issue I had with compatibility isn't.

This is probably the last real feature implemented for 2.8, and it will be in 2.7.6. Back to bug fixing.

Monday 8 June 2009

AbiWord 2.7.3

Marc released AbiWord 2.7.3 this week-end. I used the release to update the packages from Maemo since I fixed many of the issues that were filed against it. It should appear in the tablet's Application Manager if you have the extras-devel repository.

Note: there won't be any update to 2.6.x, as 2.7.3 should already be much better. Only Maemo 4.x is supported (only tested on 4.1 aka Diablo on a N800).

Update: if you don't have the extras-devel repository, you can install it. Diablo (4.1)

Update 2: and please, DO file bugs.

Monday 23 March 2009

AbiWord 2.6.8 and Summer of Code

AbiWord 2.6.8 has been released a few days ago. It include some memory leak fixes and other that I backported from trunk. Thanks to all the contributors.

Also this year, again, AbiWord take part of the summer of code. See the list of suggested ideas.

Tuesday 17 February 2009

Deprecated

Over the last week I spent some time rewriting bits of the AbiWord Gtk+ code to build with GTK_DISABLE_DEPRECATED defined. It took longer than expected as I originally thought that it was no more than 4 dialogs, but I was wrong... It is like a can of worm. I also fixed a few bugs in the mean time.

Wednesday 30 May 2007

Eight years ago

8 years ago, back when AbiWord was a commercial venture, the AbiWord folks from Sourcegear had a booth at Linux Expo 99.

Abi the ant was there. Marc dug up pictures that were still on the server. One did catch our eyes (along with the caption):

Miguel and his partners are starting a new company, International GNOME Support

Wednesday 18 April 2007

Intel UMPC running Linux

I'm sure you didn't miss Intel MID, Intel's answer to the wave of UMPC and Internet Tablet. It features Linux, Matchbox and Hildon, all glued together with a distribution based on the Chinese RedFlag. But the more interesting thing is that the core chipset is based on a Pentium M and GMA945, which is much more horse power than, say, the Nokia 800.

Read more about it here, here, here

Now interesting challenge: while I'm sure it will be relatively easy to have AbiWord and Gnumeric running on it, like they already do on Maemo, but how about porting OpenOffice.org? Sure it will be contrived, but we have a 256MB of RAM, 500MB of storage and a 600-800MHz CPU based on a Pentium M (with half the cache on than the one found in laptops) which should be more than enough.

Sunday 1 April 2007

Think tank

Pavel, this is a damn good idea.

Wednesday 14 February 2007

Thin clients

Another Linux thin client. This time it is the Canadian company Koolu (just ignore the proprietary Flash like I did). The interesting thing is that they run Ubuntu and OLPC (choice) on it, booting via PXE. OLPC is potentially the biggest deployment of AbiWord. The machine is still able to run big software like OpenOffice, and even has an optional hard drive.

Another interesting thing is that they have John "maddog" Hall as a CTO.

Update (02/15/07): the demo video is available in Ogg

See also: Linutop. BTW, about Linutop, if you are interested in it and are around Montreal, there will be a presentation of Linutop at the monthly meeting of FACIL. Info in French

Linux phone

The OpenMoko software platform has been released. OpenMoko is the the project initiated by FIC in Taiwan. Apparently the hardware is late, but they released the software still.

On the other hand, the Access Linux Platform has been released, and its openness looks much less appealing. ALP is the successor of PalmOS and is based on Linux.

They are both based on Gtk+ for their toolkit. That open more deployments for AbiWord.

Tuesday 23 January 2007

AbiWord 2.4.6 for Maemo 2.0

Thanks to tf, etrunko and uwog, there is now an AbiWord 2.4.6 package for Maemo 2.0 (the one that runs on the Nokia 770).

To get it set the following package source in the application manager:

  • Web address: http://www.abisource.com/downloads/apt/
  • Distribution: mistral
  • Components: user

Note: there are still bugs and I have barely anything to do with that version (due to lack of time to really get started on Maemo development).

Next step? Provide a build for the N800 (I don't know who in the team will have a device, but it won't be me). And build AbiWord 2.5.x

Adding features by removing code

Lately I have been removing code from AbiWord to make it cleaner. I have removed 3 different hash/map container classes, one set container class, with it associated RB Tree implementation, a pair container class. This for one feature: cleaner code. All of these have replaced by a standard container from libstdc++. I also templatized the string implementation class as it was already written in such a way that they were almost identical.

But why has this to be done? It is historical.

Originally, when the project started, back in 1997, it had to be build with compilers that were not up to the standard. So the project was started with its own implementation of standard containers, the same you find in the Standard C++ library. The use of templates was not allowed, nor namespaces. But today, this is no longer the case. We build with gcc everywhere, vendor compilers as an option, and it works well. In 2003, I did switch several of the major containers to use template instead of generic void* that go in the way when one decide to store integer in that. For 2.5.1 I decide it was time to remove the least used classed, even more, when in the case of map/hashtable, they were duplicating themselves. Yes, I removed 3 different instances of the same functionality! AbiWord should be less bloated.

More later...

Sunday 14 January 2007

Abi ports

So AbiWord 2.5.0 is out, heavy last minute hacking. Since the Linux version does not really need my help, being in good hands (don't worry I still have a couple of ideas under the hood), I spent some time setting the Mac build straight an square. The good news is that it finally compiles (maybe after 2.5.0 release, I don't remember if I had to commit more stuff), the bad is that it fails running, or at least running in something useful. I have to figure out what is going on.

One of the thing I had to spend some time is what I will call abiports: it is a ports system to provide the ever growing list of dependencies. As of today, AbiWord 2.5.0 depends on glib, wv, libgsf, libpng[1], popt[2] and enchant. The idea is to compile them with limited dependencies to embed them in the application bundle.

And for those who are wondering why I waste time on the Mac build instead of working on Free Software platforms, it is because somebody has to do it, for the sake of the project...

Notes

[1] yes this library is not part of MacOS X

[2] about to be ditched