View RSS Feed

Development Team Blog

Arrays & Structs in-depth Part V

Rate this Entry
In Part IV we discovered that passing very large arrays around via parameters and return values is usually very fast thanks to the built-in copy-on-write optimization. We also discovered that if you modify the array, you incur a copy operation and lose the benefit of the copy-on-write optimization. It may come as no surprise then that the fewer copy operations you perform, the better performance you get.

Up until now we've really just worked with local array variables, but in most programs you will find the need to use array properties. We know that thanks to the copy-on-write performance optimization, accessing array properties alone is not a performance issue. Afterall, getting an array property is merely a function that returns the array, which utilizes copy-on-write and is therefore very fast. Likewise, setting a property is merely a procedure with an array parameter, again utilizing copy-on-write making it very fast. Which is why in most cases working with array properties is really no different from local array variables.

Because we're used to the old Array class, it's very tempting to write code for array properties that mimics the interface of using the Array class.

Code:
Object oTest is a cObject
    Property String[] pTestArray
    
    Function GetTestItem Integer indx Returns String
        String[] myArray
        Get pTestArray to myArray
        Function_Return myArray[indx]
    End_Function
    
    Procedure SetTestItem Integer indx String sValue
        String[] myArray
        Get pTestArray to myArray
        Move sValue to myArray[indx]
        Set pTestArray to myArray
    End_Procedure
...
If you've ever done something like the above, don't worry, you're not alone. Practically every VDF developer I know have been extremely tempted to write code like the above. We can't help it, we're used to the old Array class, and it seems so neat to add some extra wrapper methods like that, so it even behaves like the old Array class.

It's also a very classic phenomena, every time a software developer is facing new technology, there's this incredibly irresistible urge to apply and use the new technology in the same old way. We're automatically assuming that what was a good way to use the old technology must also be the best way to use the new technology. It's a little bit like using a shiny new screwdriver in the same way as the old trustworthy hammer, or writing a new shiny Windows application that works and behaves just like the old DOS application, you know what I mean, we've all been there, I know I have.

Now, remember that just as we discovered in Part IV, unless you're working with very large arrays or accessing these wrapper methods thousands of times, the actual performance impact is likely irrelevant. What we're discussing here is why the above is an area to concentrate on when you have determined that there's a performance problem, or if you know beforehand that performance will be an important concern for the specific code.

Let's put this to the test. If we assume that we will be populating an array property using the access methods above, ideally this should have linear complexity. That is, we'd expect that the time it takes to populate the array should be approximately linear to the number of elements we add. But when we test it, we find that the performance is most definitely not linear.



These numbers are terrible! It takes 400 times longer to populate the array with only 20 times the number of elements, approximately 1ms for 100 elements, but around 460ms for 2000 elements. This is quadratic rather than linear performance.

So what happened? The answer of course has to do with the access methods and the copy-on-write optimization. Every time we add an element the access method incurs a copy operation, because it gets the array to a local variable, modifies it (which first must perform a copy, copy-on-write, remember?), and then we set the property again.

So the access methods are the culprit here. Fortunately there's a very simple solution. It's quite obvious, one just has to come to terms with the idea that access methods are not really improving or simplifying the code in this particular case. In Part VI we'll examine the copy-on-write performance implications related to this scenario in detail, and then discuss how to turn it all into linear performance and make everything super fast again.
Attached Thumbnails Attached Thumbnails Click image for larger version. 

Name:	with-access-methods.jpg 
Views:	1959 
Size:	42.4 KB 
ID:	3772  
Attached Thumbnails Attached Files

Comments

  1. Frank Cheng's Avatar
    Thanks Sonny for the informative blog posts. I am pretty sure many of us are aware of the performance hit when you are changing/adding values in array properties that contain many elements, however I have not found any other way to optimize it besides sticking with the old array. Can't wait for Part VI of the series.

    Frank Cheng
  2. John Tuohy's Avatar
    Frank, I've seen it and Part VI is very cool!
  3. seanyboy's Avatar
    I too am quite excited about part 6.