You might want to read part 1 first.

Second round of optimization. I take the same sample document as in part 1, but this time I'll have it displayed. Here is the callee list in KCachegrind:

Obviously there is a list.

  1. UT_GenericVector<fp_Page*>::findItem()
  2. UT_GenericVector<fl_BlockLayout*>::findItem()
  3. UT_UTF8String::UTF8Iterator::sync()

We already know that UT_Vector::findItem() is as fast as it can be: it searches the vector linearly, and since the vector is unordered there is no other choice. So let's skip to item 2 in the list and look at how it is used, with the callee graph:

We see where it is called from. Looking more closely at the source code, you'll see that the vector is used as a queue, a queue of blocks to put through the spell checker, and that findItem() is used to locate an element in the vector by index, in order to move it into or out of the queue. That is where things go wrong. My solution is to doubly-link the fl_BlockLayout objects together for queueing. That way, when we need to change the queue position of a specific block, we do it really quickly: just update the two pointers of the fl_BlockLayout and of its neighbours. It is instant, whatever the number of items in the list. Memory-wise each block carries two extra pointers, but that is well worth it.

Now let's look at the profiling data after this optimization:

We clearly see that the top time consumers have changed: number 2 is gone. It is hard to give real numbers about the speedup, but at least I know it is faster where it should be. UT_GenericVector<fp_Page*>::findItem() is still number one; it will come next.