
Development Team Blog

Structs, Arrays and Optimization Strategies (Part 2 of 2)

In Part One I described the problem of performing a lookup for each member of an array, and how to use a cache when the lookups are repeated across multiple arrays that share mostly the same members.

We left part one after describing a caching technique that required further optimization. In this article we will explore different array-searching techniques, using the built-in VDF array search functions, to optimize our cache algorithm.

Third Strategy: Using a VDF SearchArray Function

We will abandon the FOR loop, used in Strategy #2, to search for a name in the tNameData cache, as its cost grew linearly with the size of the cache, making the overall lookup time quadratic in the number of names. Instead we will use the VDF SearchArray function.

For simple data types, such as String, Integer, Number and Date, the SearchArray function uses fast comparison algorithms that are built into the runtime. However, our name data cache is defined as an array of the struct type tNameData.

Code:
Property tNameData[] pNameDataCache
This means that we need to define a special comparison function in VDF code and use that when calling the SearchArray function.

Code:
Function CompareNameData tNameData SearchName tNameData ReferenceName Returns Integer
    If (SearchName.sName = ReferenceName.sName) Function_Return (EQ)
    Else Function_Return (NE)
End_Function  // CompareNameData
And below is the code to test the new name lookup algorithm...

Code:
Function LookupNamesTest3 String[] sNames Integer ByRef iTime Returns tNameData[]
    Integer iStart iStop
    Integer iName icName
    Integer icCache iTopOfCache iFound
    Boolean bChanged
    tNameData SearchName
    tNameData[] NameData NameCacheOld NameCacheNew
    
    Move (GetTickCount()) to iStart
    
    // Get the cached data....
    Move False to bChanged
    Get pNameDataCache to NameCacheOld
    Move NameCacheOld to NameCacheNew           // make a copy of the cache
    Move (SizeOfArray(NameCacheOld)) to icCache
    Move icCache to iTopOfCache
    
    // Iterate the set of names....
    Move (SizeOfArray(sNames)) to icName
    Decrement icName
    
    For iName from 0 to icName
        Move -1 to iFound
        
        If (icCache > 0) Begin    // don't try to search the cache if it is empty. This speeds up the first search.
            Move sNames[iName] to SearchName.sName
            Move (SearchArray(SearchName, NameCacheOld, Self, get_CompareNameData)) to iFound
        End
        
        Move sNames[iName] to NameData[iName].sName
        
        If (iFound >= 0) Begin
            // If found, then use the cached value....
            Move NameCacheOld[iFound].dDateOfBirth to NameData[iName].dDateOfBirth
        End
        Else Begin
            // Otherwise we have to look it up and add the value to the cache....
            Get GetDateOfBirth sNames[iName] to NameData[iName].dDateOfBirth
            
            Move NameData[iName].sName        to NameCacheNew[iTopOfCache].sName
            Move NameData[iName].dDateOfBirth to NameCacheNew[iTopOfCache].dDateOfBirth
            Increment iTopOfCache
            Move True to bChanged
        End
    Loop
    
    // If the cache has been updated then we need to save the changes....
    If (bChanged) Begin
        Set pNameDataCache to NameCacheNew
    End
    
    Move (GetTickCount()) to iStop
    Move (iStop - iStart) to iTime
    Function_Return NameData
End_Function  // LookupNamesTest3
This function is substantially the same as in Strategy #2 except that the FOR loop is replaced by a single call to the SearchArray function...

Code:
Move (SearchArray(SearchName, NameCacheOld, Self, get_CompareNameData)) to iFound
You can see here that we need to pass the function identifier of our VDF function CompareNameData. This function will be called for each member of the name data cache that is searched, and each of those calls crosses from the runtime back into VDF code, which is far more expensive than the runtime's built-in comparisons.

So how does the SearchArray function compare to the FOR loop in strategy #2?

1,000 names
1st run: 5.023 sec
2nd run: 1.373 sec
3rd - 6th runs: 1.778, 1.997, 2.153, 2.247 sec.

This strategy suffers the same linear time increase with cache size as strategy #2, except that it is slightly worse. What about scalability?

10,000 names
1st run: 50.28 sec
2nd run: 90.746 sec
3rd run: 93.912 sec

I didn't even try testing 50,000 names. This strategy clearly suffers the same quadratic growth in time as the number of names increases as strategy #2, except once again, it is worse. We need another approach.

Fourth Strategy: Using the Runtime's Built-in Array Search

In this strategy we will reorganize the cached data so that, instead of storing the names and matching dates of birth in an array of tNameData structs, we will use two separate arrays: a String array to store the cached names, and a Date array to store the cached dates of birth.

Code:
Property String[] pNamesCache
Property Date[] pDatesCache
When we store a name/date pair in the cache we will carefully ensure that the array indexes of the name and the corresponding date of birth match. This means that we can search pNamesCache using SearchArray with the fast compare function built into the runtime. Once we have the array index of the cached name, we can use that to get the cached date of birth. This is slightly more convoluted but it should perform better. Below is the test algorithm...

Code:
Function LookupNamesTest4 String[] sNames Integer ByRef iTime Returns tNameData[]
    Integer iStart iStop
    Integer iName icName
    Integer icCache iTopOfCache iFound iVoid
    Boolean bChanged
    String[] NamesCacheOld NamesCacheNew
    Date[] DatesCacheOld DatesCacheNew
    tNameData[] NameData
    
    Move (TimeBeginPeriod(1)) to iVoid
    Move (GetTickCount()) to iStart
    
    // Get the cached data....
    Move False to bChanged
    Get pNamesCache to NamesCacheOld
    Get pDatesCache to DatesCacheOld
    Move NamesCacheOld to NamesCacheNew           // make a copy of the cache
    Move DatesCacheOld to DatesCacheNew
    Move (SizeOfArray(NamesCacheOld)) to icCache
    Move icCache to iTopOfCache
    
    // Iterate the set of names....
    Move (SizeOfArray(sNames)) to icName
    Decrement icName
    
    For iName from 0 to icName
        Move -1 to iFound
        
        If (icCache > 0) Begin    // don't try to search the cache if it is empty. This speeds up the first search.
            Move (SearchArray(sNames[iName], NamesCacheOld)) to iFound
        End
        
        Move sNames[iName] to NameData[iName].sName
        
        If (iFound >= 0) Begin
            // If found, then use the cached value....
            Move DatesCacheOld[iFound] to NameData[iName].dDateOfBirth
        End
        Else Begin
            // Otherwise we have to look it up and add the value to the cache....
            Get GetDateOfBirth sNames[iName] to NameData[iName].dDateOfBirth
            
            Move NameData[iName].sName        to NamesCacheNew[iTopOfCache]
            Move NameData[iName].dDateOfBirth to DatesCacheNew[iTopOfCache]
            Increment iTopOfCache
            Move True to bChanged
        End
    Loop
    
    // If the cache has been updated then we need to save the changes....
    If (bChanged) Begin
        Set pNamesCache to NamesCacheNew
        Set pDatesCache to DatesCacheNew
    End
    
    Move (GetTickCount()) to iStop
    Move (iStop - iStart) to iTime
    Move (TimeEndPeriod(1)) to iVoid
    Function_Return NameData
End_Function  // LookupNamesTest4
So how does this one perform against the other strategies?

1,000 names
1st run: 5.085 sec
2nd run: 561 milliseconds
3rd - 7th runs: 655, 624, 593, 577, 624 milliseconds

The first run shows us that the extra complexity in managing the cached data has a minimal effect. The second run took 561 milliseconds! This is pretty close to our theoretical minimum of 500 milliseconds, a good result!

The subsequent runs show a fluctuation between 577 and 655 milliseconds. If there is a linear increase with the cache size, then it is not evident in this test. What about scalability?

10,000 names
1st run: 50.326 sec
2nd - 6th runs: 4.196, 4.243, 4.337, 4.332, 4.415 sec

50,000 names
1st run: 251.068 sec
2nd & 3rd runs: 91.385, 94.662 sec

In both cases the first run confirms that the new caching technique has no negative effect on the time.

With 10,000 names, the cached searches show a huge improvement: from 72 seconds down to 4 seconds. This is at the borderline of our defined annoyance level. At this test size we also begin to see a linear increase as the cache grows; however, it is only a few hundred milliseconds each time. This is probably OK for our real-world application, as we would expect the cache to grow by smaller and smaller amounts each time it is used.

With 50,000 names the results are much better for cached searches: 35 minutes down to 91 seconds. However, 91 seconds is still outside our acceptable range. If our expected name lists are going to be around 50,000 names then we need to find more speed.

Fifth Strategy: Using BinarySearchArray

The global BinarySearchArray function is supposed to perform significantly faster searches on large arrays. It employs the classic "high-low" strategy: given a sorted list, it picks the middle item and tests whether the target is above or below (high or low) that item in the sort order. If it is above, it jumps to the midpoint between the tested position and the end of the list; if it is below, it takes the midpoint in the opposite direction. This is repeated until the target is found.

A normal search algorithm that scans from the beginning of the list until it finds the target will, on average, need to examine n/2 array members per search, where 'n' is the size of the array. The high-low strategy, on the other hand, requires only about log2(n) comparisons. This means that for an array of, say, 1,000,000 members, a linear search will require an average of 500,000 comparisons, while a binary search needs at most 20! So what is the catch?

The catch is that the array you are searching must be sorted! Our names cache is definitely not sorted. We could sort it using the SortArray function but then we cannot use it to lookup the corresponding data in the date of birth cache because the array indexes will no longer be synchronized in each array. This means that we would have to sort the dates cache 'manually' using some looping algorithm. Another problem is that each time the cache is updated it will need to be resorted.
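Before building the full strategy, here is a minimal sketch of BinarySearchArray on a sorted array of a simple type. The procedure name and the name values are invented purely for illustration; the call pattern matches the one used in the test function later in this article.

Code:
Procedure Test_BinarySearch
    String[] sSorted
    Integer iFound

    // A sorted array of simple-type members (values invented for illustration)
    Move "Adams"  to sSorted[0]
    Move "Baker"  to sSorted[1]
    Move "Carter" to sSorted[2]
    Move "Davis"  to sSorted[3]

    // BinarySearchArray returns the index of the matching member;
    // a negative result means the item was not found.
    Move (BinarySearchArray("Carter", sSorted)) to iFound
End_Procedure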

Here is the solution presented by strategy #5. The first time a lookup is performed, the cache is empty, so we build the cache of names and sort it. Then we look up each name in the sample to find its position in the sorted cache and store the corresponding date at the same position in the date-of-birth cache. Naturally we use the BinarySearchArray function to find each sorted name position.

This gives us our initial sorted cache. But what about subsequent searches? We can use the sorted cache, but then we would need to add new names to it, requiring a re-sort each time. I get around this by not storing new names in the sorted cache at all. All name/date pairs that are added after the initial sorted cache is built are stored separately in a non-sorted cache (as per strategy #4).

This means that there are three different situations that we have to handle when a name list lookup is performed.

  1. The first time a lookup is performed we take all of the names/dates and create a sorted cache. This cache is never changed, but it holds the majority of our total data set.
  2. The second time a lookup is performed we use the sorted cache and perform binary searches. All of the new names & dates are stored in a separate, unsorted, cache. This cache will grow but should remain much smaller than the sorted cache.
  3. All subsequent searches will first look in the sorted cache, using the binary search, then (if needed) look in the unsorted cache. All new names & dates are again added to the unsorted cache.

Here are the property declarations for the cache arrays used by this algorithm...

Code:
Property String[] pNamesCache
Property Date[] pDatesCache

Property String[] pSortedNamesCache
Property Date[] pSortedDatesCache
And the function that tests this strategy....

Code:
Function LookupNamesTest5 String[] sNames Integer ByRef iTime Returns tNameData[]
    Integer iStart iStop
    Integer iName icName
    Integer icSortedCache icUnsortedCache iTopOfUnsortedCache iFound iVoid
    Integer i iCount
    Boolean bChanged
    String[] NamesCacheOld NamesCacheNew SortedNamesCache
    Date[] DatesCacheOld DatesCacheNew SortedDatesCache
    tNameData[] NameData
    
    Move (TimeBeginPeriod(1)) to iVoid
    Move (GetTickCount()) to iStart
    
    // Get the cached data....
    Move False to bChanged
    Get pNamesCache to NamesCacheOld
    Get pDatesCache to DatesCacheOld
    Move NamesCacheOld to NamesCacheNew           // make a copy of the cache
    Move DatesCacheOld to DatesCacheNew
    Move (SizeOfArray(NamesCacheOld)) to icUnsortedCache
    Move icUnsortedCache to iTopOfUnsortedCache

    Get pSortedNamesCache to SortedNamesCache
    Get pSortedDatesCache to SortedDatesCache
    Move (SizeOfArray(SortedNamesCache)) to icSortedCache
    
    // Iterate the set of names....
    Move (SizeOfArray(sNames)) to icName
    Decrement icName
    
    For iName from 0 to icName
        Move sNames[iName] to NameData[iName].sName
        
        // First search the sorted cache....
        Move -1 to iFound

        // don't try to search the cache if it is empty. This speeds up the first search.        
        If (icSortedCache > 0) Begin
            Move (BinarySearchArray(sNames[iName], SortedNamesCache)) to iFound
        End
        
        // If found, then use the cached value....
        If (iFound >= 0) Begin
            Move SortedDatesCache[iFound] to NameData[iName].dDateOfBirth
        End
        Else Begin
            // if we didn't find anything then search the unsorted cache....
            If (icUnsortedCache > 0) Begin
                Move (SearchArray(sNames[iName], NamesCacheOld)) to iFound
            End

            // If found, then use the cached value....
            If (iFound >= 0) Begin
                Move DatesCacheOld[iFound] to NameData[iName].dDateOfBirth
            End
        End
        
        // If not found, then we have to look it up and add the value to the cache....
        If (iFound < 0) Begin
            Get GetDateOfBirth sNames[iName] to NameData[iName].dDateOfBirth
            
            Move NameData[iName].sName        to NamesCacheNew[iTopOfUnsortedCache]
            Move NameData[iName].dDateOfBirth to DatesCacheNew[iTopOfUnsortedCache]
            Increment iTopOfUnsortedCache
            Move True to bChanged
        End
    Loop
    
    // If the cache has been updated then we need to save the changes....
    If (bChanged) Begin
        // If the sorted cache is empty then we will initialize it with the data we have now....
        If (icSortedCache = 0) Begin
            // First sort the names....
            Move (SortArray(NamesCacheNew)) to SortedNamesCache
            
            // Now lookup each name in the Sorted list to create the corresponding sorted dates....
            Move (iTopOfUnsortedCache - 1) to iCount
            
            For i from 0 to iCount
                // get the index into the sorted names array
                Move (BinarySearchArray(NamesCacheNew[i], SortedNamesCache)) to iFound
                // Save the corresponding date in the sorted position
                Move (DatesCacheNew[i]) to SortedDatesCache[iFound]
            Loop
            
            // Save the sorted cache data....
            Set pSortedNamesCache to SortedNamesCache
            Set pSortedDatesCache to SortedDatesCache
        End
        Else Begin
            // Otherwise we just update the unsorted cache....
            Set pNamesCache to NamesCacheNew
            Set pDatesCache to DatesCacheNew
        End
    End
    
    Move (GetTickCount()) to iStop
    Move (iStop - iStart) to iTime
    Move (TimeEndPeriod(1)) to iVoid
    Function_Return NameData
End_Function  // LookupNamesTest5
So how does this compare to the other strategies?

1,000 names
1st run: 5.07 sec
2nd - 6th runs: 515, 530, 515, 546, 530 ms

10,000 names
1st run: 50.107 sec
2nd - 7th runs: 624, 562, 656, 561, 608, 577 ms

50,000 names
1st run: 250.772 sec
2nd - 6th runs: 765, 811, 827, 812, 795 ms

These figures speak for themselves. For cached lookups we remain close to the theoretical minimum of 500 ms regardless of the size of the names list we are testing. What is surprising, though in hindsight it shouldn't be, is that the time taken to set up the sorted cache in the first run is nearly identical to our baseline, even in the case where we have to create and sort two arrays of 50,000 members! The global SortArray function is obviously very fast; this is good to know. We use a FOR loop over each name to create the sorted dates cache, but inside that loop we are using BinarySearchArray, and we now know how fast that is.

If the unsorted cache was likely to grow significantly large then we would need to modify the algorithm to merge the sorted and unsorted caches whenever the unsorted cache reaches a certain size.
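That merge could re-use the same sort-then-lookup technique used to build the initial sorted cache. The sketch below is not part of the tested workspace; it assumes the Strategy #5 property names, a hypothetical procedure name MergeCaches, and that no name appears in both caches (which the lookup logic guarantees).

Code:
Procedure MergeCaches
    Integer i iTo iCount icSorted icUnsorted iFound
    String[] Names SortedNames UnsortedNames
    Date[] Dates SortedDates UnsortedDates

    Get pSortedNamesCache to Names            // start from the sorted data
    Get pSortedDatesCache to Dates
    Get pNamesCache to UnsortedNames
    Get pDatesCache to UnsortedDates
    Move (SizeOfArray(Names)) to icSorted
    Move (SizeOfArray(UnsortedNames)) to icUnsorted

    If (icUnsorted > 0) Begin
        // Append the unsorted entries to the combined arrays....
        Move (icUnsorted - 1) to iCount
        For i from 0 to iCount
            Move (icSorted + i) to iTo
            Move UnsortedNames[i] to Names[iTo]
            Move UnsortedDates[i] to Dates[iTo]
        Loop

        // Re-sort the names, then rebuild the dates in matching order....
        Move (SortArray(Names)) to SortedNames
        Move (icSorted + icUnsorted - 1) to iCount
        For i from 0 to iCount
            Move (BinarySearchArray(Names[i], SortedNames)) to iFound
            Move Dates[i] to SortedDates[iFound]
        Loop

        // Save the merged cache and empty the unsorted cache....
        Set pSortedNamesCache to SortedNames
        Set pSortedDatesCache to SortedDates
        Move (ResizeArray(UnsortedNames, 0)) to UnsortedNames
        Move (ResizeArray(UnsortedDates, 0)) to UnsortedDates
        Set pNamesCache to UnsortedNames
        Set pDatesCache to UnsortedDates
    End
End_Procedure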

Conclusions

Strategy #3, which used a VDF compare function in calls to the global SearchArray function, was quite disappointing. This search technique turned out to be significantly worse than searching by just looping through the cached data array.

Strategy #4, which implicitly used the runtime's string compare in calls to the global SearchArray function, met all of our goals for processing lists of 1,000 names. Beyond 10,000 names, the technique did not provide enough performance.

Strategy #5 was the most complicated, but it worked perfectly at all scales up to our limit of 50,000. In theory it should continue to perform well for much larger lists. The algorithm also has a lot of scope to be adapted to different data conditions; for example, if the unsorted cache grows too large then it can be merged with the sorted cache to maintain good performance.

An interesting variation on Strategy #5 would be to store the sorted cache in a single tNameData array. You would need to create a VDF function to perform the searching and sorting. The following is an example of the compare function you would need...

Code:
Function CompareNameData tNameData SearchName tNameData ReferenceName Returns Integer
    If (SearchName.sName > ReferenceName.sName) Function_Return (GT)
    Else If (SearchName.sName < ReferenceName.sName) Function_Return (LT)
    Else Function_Return (EQ)
End_Function  // CompareNameData
This function would be called by SortArray and BinarySearchArray. Searching should still perform very well here, because a binary search only calls CompareNameData about log2(n) times per lookup (around 16 comparisons even for a 50,000-member cache).
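As a sketch of how this variant might be wired up, assuming SortArray and BinarySearchArray accept the same object/function-id arguments that SearchArray used in Strategy #3 (the variable names and the search name here are illustrative, not from the tested workspace):

Code:
tNameData SearchName
tNameData[] NameDataCache SortedCache
Integer iFound

// Sort the struct array using our VDF compare function....
Move (SortArray(NameDataCache, Self, get_CompareNameData)) to SortedCache

// Search the sorted struct array with the same compare function....
Move "John Smith" to SearchName.sName
Move (BinarySearchArray(SearchName, SortedCache, Self, get_CompareNameData)) to iFound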

This article has explained different algorithms for attaining good performance; that is the "fish for a day" part. Perhaps more importantly, it also demonstrates profiling important operations, setting a benchmark, and taking a progressive, step-wise approach to identifying and addressing performance problems.

For more information about structs, arrays, searching and sorting take a look at the Arrays & Structs in-depth blog articles by Sonny Falk.

The Cached Lookup Tester Workspace

Attached to this blog article is the full workspace I created for testing the five strategies. All of the code presented in these articles can be found there.

Sidebar: Frozen Bananas and Sleep Function Variability

In part one of this blog I described how the operation to fetch a date of birth was simulated by a function called GetDateOfBirth. In this function I call the Win32 Sleep function to simulate a delay of X milliseconds.

Code:
Function GetDateOfBirth String sName Returns Date
    Integer iTestLatency iVoid iDay iMonth iYear
        
    // simulate waiting for the specified time....
    Get Value of oLatency_frm to iTestLatency
    Move (Sleep(iTestLatency)) to iVoid
    Move (Random(11)) to iDay
    Move (Random(11)) to iMonth
    Move (Random(9)) to iYear
    Increment iDay
    Increment iMonth
    Move (iYear + 2000) to iYear
        
    Function_Return (Date(String(iDay) + "/" + String(iMonth) + "/" + String(iYear)))
End_Function  // GetDateOfBirth
During the tests I set the latency period for each call to GetDateOfBirth to 5 milliseconds. This meant that the baseline algorithm took around 5 seconds to perform 1,000 searches, 50 seconds to perform 10,000 searches, and so on. A funny thing happened while I was halfway through writing up part one: suddenly, when I ran the test program, the baseline was returning 15 seconds for 1,000 searches and 150 seconds for 10,000.

I ran through the usual steps to try to get to the bottom of this discrepancy. I debugged the code to make sure that the correct latency of 5 milliseconds was being passed to the function; I rolled back any code I had written since the last time the test ran correctly; I shut down and restarted my laptop. It still came out wrong.

Then I noticed that my laptop was running pretty hot and the cooling fan was spinning full speed. I wondered whether Hewlett Packard had designed the laptop to slow down the processor speed when things start getting too hot. I lifted my laptop and the desk underneath was pretty darn hot. Then I had a brilliant idea: I elevated the laptop with some thick books, got a frozen banana out of the refrigerator and placed it underneath where the cooling fan sucked in fresh air. Pretty soon the laptop had cooled down and the fan had stopped spinning. I was pretty sure that the frozen banana would fix the problem with my test program, it was the ONLY explanation darn it!

Alas the banana did not fix my problem. Strangely though, after a couple of hours, my test application went back to posting 5 second results for 1,000 searches and I continued writing, but that was not the end of the story. The next day, while I was writing part two, at about the same time in the evening, my test application suddenly started giving the 15 second results per 1,000 searches again. Perhaps two bananas would do the trick!! No, there had to be another explanation.

I started playing around with the latency time being used by the GetDateOfBirth function. I found that I could set the latency to anything from 1 to 15 milliseconds and I would get the same result for 1,000 loops: 15 seconds. When I set the latency to 0 then 1,000 loops would drop to 10 milliseconds. If I put it up to 16 then, instead of getting a time of 16 seconds I got 30 seconds for 1,000 loops! Somehow, at this particular time of day, the Win32 API Sleep function had a minimum granularity of 15 milliseconds! This was ruining my tests.

After a bit of Googling I discovered that Windows has a built-in global timer resolution, and this affects system performance and all kinds of Windows functionality, such as time-out intervals and CPU power-saving modes. My frozen banana solution wasn't too far off the mark! For some reason, during certain periods of the day, my system's global timer resolution coarsens to 15 milliseconds!

Happily there is a solution to this. The Windows API provides another function, timeBeginPeriod, which you can call to override the global timer resolution. Calling timeBeginPeriod(1) sets the resolution to 1 millisecond. I call this function at the beginning of each test run, then call timeEndPeriod at the end of the run to reset the global timer resolution (that's important). You can see all of this in action in the TestApplication workspace which is attached to this blog article. The foibles of software development...
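For completeness, the external function declarations behind those calls look something like the following in VDF. Both functions live in winmm.dll; treat these declarations as a sketch and check the attached workspace for the ones actually used.

Code:
// winmm.dll exports timeBeginPeriod/timeEndPeriod (declarations are a sketch)
External_Function TimeBeginPeriod "timeBeginPeriod" Winmm.dll Integer iPeriod Returns Integer
External_Function TimeEndPeriod "timeEndPeriod" Winmm.dll Integer iPeriod Returns Integer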

Updated 26-Oct-2009 at 11:44 AM by John van Houten
