VBA string manipulation optimization with custom classes

Here’s a little bit more on the subject of VBA string manipulation. I showed in a post the other day how things take longer and longer when concatenating large strings in VBA. One reason I was digging into this was that for very large datasets, JSON serialization  – 500k strings and above, was taking an inexplicable amount of time. It turned out that string concatenation and the interaction with the garbage collector was at the heart of the problem. To deal with this, I introduced a new class ( cStringChunker ) which manipulates strings ‘in place’, where its possible to do so. The results were pretty impressive – here’s a 100 cycles of some complex string concatenations .. the chunk method (in red) are consistent and orders of magnitude faster than the straight string method.

However, when I came to apply this to the JSON serializer, I found that I needed to create a bunch more methods in order to avoid converting back and forwards between strings and the chunker. Before we get into that, here’s the results of improvements in the JSON serializer by using the chunker rather than regular string operations. Again – the new serialization using the chunker (in blue) is very much faster than concatenating strings.

Additional methods

I avoid back and forward string conversions by implementing common string methods such as right, left , mid and so on as well as few new useful ones, which I’ll probably add to over time. The list of methods are at the end of this post, but here’s some examples of how to use. They are all chainable for convenience.

Here’s an example of a complex string operation

               debug.print cs.add(“quick “) _
                    .add(“brown”) _
                    .insert(“the”) _
                    .shift(4, 1) _
                    .add(” fox”) _
                    .add(” gets the chicken”) _
                    .add(“)”) _
                    .insert (“(“) _
                    .content

Versus the string equivalent

                s = s & “quick “
                s = s & “brown”
                s = “the” & s
                s = left(s, 4) & ” ” & Mid(s, 4)
                s = s & ” fox”
                s = s & ” gets the chicken”
                s = “(” & s & “)”
                debug.print s

Both give this output
(the quick brown fox gets the chicken)

This would trim all spaces from the end of a line, trim a linefeed if it’s there, and chop a comma if it’s there.
               t.chopWhile(” “)
                .chopIf(vbLf)
                .chopIf (“,”)

We also have getMid(), getLeft(), getRight() functions along with size and content properties.

Short strings

Since these are built in order to optimize long string manipulations, it is likely that the overhead of a custom class with shorter strings will be less efficient. Here’s the results of a test using string manipulation like the example above. As expected, the chunker is consistent, but slightly slower – but the execution times are miniscule in any case

A note about the tests

The timings are calculated by repeating the operation multiple times, and timing how long each set of  operations took. More information is in on the Excel Liberation site.

cStringChunker class methods and properties

cStringChunker code

For more stuff like this, visit the Excel liberation site.

Author: brucemcp

Leave a Reply

Your email address will not be published. Required fields are marked *