Here’s a little bit more on the subject of VBA string manipulation. I showed in a post the other day how things take longer and longer when concatenating large strings in VBA. One reason I was digging into this was that for very large datasets, JSON serialization – 500k strings and above, was taking an inexplicable amount of time. It turned out that string concatenation and the interaction with the garbage collector was at the heart of the problem. To deal with this, I introduced a new class ( cStringChunker ) which manipulates strings ‘in place’, where its possible to do so. The results were pretty impressive – here’s a 100 cycles of some complex string concatenations .. the chunk method (in red) are consistent and orders of magnitude faster than the straight string method.
However, when I came to apply this to the JSON serializer, I found that I needed to create a bunch more methods in order to avoid converting back and forwards between strings and the chunker. Before we get into that, here’s the results of improvements in the JSON serializer by using the chunker rather than regular string operations. Again – the new serialization using the chunker (in blue) is very much faster than concatenating strings.
I avoid back and forward string conversions by implementing common string methods such as right, left , mid and so on as well as few new useful ones, which I’ll probably add to over time. The list of methods are at the end of this post, but here’s some examples of how to use. They are all chainable for convenience.
Here’s an example of a complex string operation
debug.print cs.add(“quick “) _
.shift(4, 1) _
.add(” fox”) _
.add(” gets the chicken”) _
.insert (“(“) _
Versus the string equivalent
s = s & “quick “
s = s & “brown”
s = “the” & s
s = left(s, 4) & ” ” & Mid(s, 4)
s = s & ” fox”
s = s & ” gets the chicken”
s = “(” & s & “)”
Both give this output
(the quick brown fox gets the chicken)
This would trim all spaces from the end of a line, trim a linefeed if it’s there, and chop a comma if it’s there.
We also have getMid(), getLeft(), getRight() functions along with size and content properties.
Since these are built in order to optimize long string manipulations, it is likely that the overhead of a custom class with shorter strings will be less efficient. Here’s the results of a test using string manipulation like the example above. As expected, the chunker is consistent, but slightly slower – but the execution times are miniscule in any case
A note about the tests
cStringChunker class methods and properties
For more stuff like this, visit the Excel liberation site.