Arrays & Structs in-depth Part V
by
, 23-Sep-2009 at 08:00 AM (5668 Views)
In Part IV we discovered that passing very large arrays around via parameters and return values is usually very fast thanks to the built-in copy-on-write optimization. We also discovered that if you modify the array, you incur a copy operation and lose the benefit of the copy-on-write optimization. It may come as no surprise then that the fewer copy operations you perform, the better performance you get.
Up until now we've really just worked with local array variables, but in most programs you will find the need to use array properties. We know that thanks to the copy-on-write performance optimization, accessing array properties alone is not a performance issue. Afterall, getting an array property is merely a function that returns the array, which utilizes copy-on-write and is therefore very fast. Likewise, setting a property is merely a procedure with an array parameter, again utilizing copy-on-write making it very fast. Which is why in most cases working with array properties is really no different from local array variables.
Because we're used to the old Array class, it's very tempting to write code for array properties that mimics the interface of using the Array class.
If you've ever done something like the above, don't worry, you're not alone. Practically every VDF developer I know have been extremely tempted to write code like the above. We can't help it, we're used to the old Array class, and it seems so neat to add some extra wrapper methods like that, so it even behaves like the old Array class.Code:Object oTest is a cObject Property String[] pTestArray Function GetTestItem Integer indx Returns String String[] myArray Get pTestArray to myArray Function_Return myArray[indx] End_Function Procedure SetTestItem Integer indx String sValue String[] myArray Get pTestArray to myArray Move sValue to myArray[indx] Set pTestArray to myArray End_Procedure ...
It's also a very classic phenomena, every time a software developer is facing new technology, there's this incredibly irresistible urge to apply and use the new technology in the same old way. We're automatically assuming that what was a good way to use the old technology must also be the best way to use the new technology. It's a little bit like using a shiny new screwdriver in the same way as the old trustworthy hammer, or writing a new shiny Windows application that works and behaves just like the old DOS application, you know what I mean, we've all been there, I know I have.
Now, remember that just as we discovered in Part IV, unless you're working with very large arrays or accessing these wrapper methods thousands of times, the actual performance impact is likely irrelevant. What we're discussing here is why the above is an area to concentrate on when you have determined that there's a performance problem, or if you know beforehand that performance will be an important concern for the specific code.
Let's put this to the test. If we assume that we will be populating an array property using the access methods above, ideally this should have linear complexity. That is, we'd expect that the time it takes to populate the array should be approximately linear to the number of elements we add. But when we test it, we find that the performance is most definitely not linear.
These numbers are terrible! It takes 400 times longer to populate the array with only 20 times the number of elements, approximately 1ms for 100 elements, but around 460ms for 2000 elements. This is quadratic rather than linear performance.
So what happened? The answer of course has to do with the access methods and the copy-on-write optimization. Every time we add an element the access method incurs a copy operation, because it gets the array to a local variable, modifies it (which first must perform a copy, copy-on-write, remember?), and then we set the property again.
So the access methods are the culprit here. Fortunately there's a very simple solution. It's quite obvious, one just has to come to terms with the idea that access methods are not really improving or simplifying the code in this particular case. In Part VI we'll examine the copy-on-write performance implications related to this scenario in detail, and then discuss how to turn it all into linear performance and make everything super fast again.