Structs, Arrays and Optimization Strategies (Part 2 of 2)
by
, 16-Oct-2009 at 05:50 AM (4111 Views)
In Part One I described the problem of performing lookups for each member of an array and how to use a cache when performing lookups repeatedly with multiple arrays sharing mostly the same members.
We left part one after describing a caching technique that required further optimization. In this article we will explore different array searching techniques, using the built-in VDF Search Array functions, to optimize our cache algorithm.
Third Strategy: Using a VDF SearchArray Function
We will abandon the FOR loop, used in Strategy #2, to search a name in the tNameData cache, as this caused an exponential increase in time with an increase in the number of names. Instead we will use the VDF SearchArray function.
For simple data types, such as string, integer, number, date, the SearchArray function uses fast comparison algorithms that are built into the runtime. However, our name data cache is defined as an array of struct type tNameData.
This means that we need to define a special comparison function in VDF code and use that when calling the SearchArray function.Code:Property tNameData[] pNameDataCache
And below is the code to test the new name lookup algorithm...Code:Function CompareNameData tNameData SearchName tNameData ReferenceName Returns Integer If (SearchName.sName = ReferenceName.sName) Function_Return (EQ) Else Function_Return (NE) End_Function // CompareNameData
This function is substantially the same as in Strategy #2 except that the FOR loop is replaced by a single call to the SearchArray function...Code:Function LookupNamesTest3 String[] sNames Integer ByRef iTime Returns tNameData[] Integer iStart iStop Integer iName icName Integer icCache iTopOfCache iFound Boolean bChanged tNameData SearchName tNameData[] NameData NameCacheOld NameCacheNew Move (GetTickCount()) to iStart // Get the cached data.... Move False to bChanged Get pNameDataCache to NameCacheOld Move NameCacheOld to NameCacheNew // make a copy of the cache Move (SizeOfArray(NameCacheOld)) to icCache Move icCache to iTopOfCache // Iterate the set of names.... Move (SizeOfArray(sNames)) to icName Decrement icName For iName from 0 to icName Move -1 to iFound If (icCache > 0) Begin // don't try to search the cache if it is empty. This speeds up the first search. Move sNames[iName] to SearchName.sName Move (SearchArray(SearchName, NameCacheOld, Self, get_CompareNameData)) to iFound End Move sNames[iName] to NameData[iName].sName If (iFound >= 0) Begin // If found, then use the cached value.... Move NameCacheOld[iFound].dDateOfBirth to NameData[iName].dDateOfBirth End Else Begin // Otherwise we have to look it up and add the value to the cache.... Get GetDateOfBirth sNames[iName] to NameData[iName].dDateOfBirth Move NameData[iName].sName to NameCacheNew[iTopOfCache].sName Move NameData[iName].dDateOfBirth to NameCacheNew[iTopOfCache].dDateOfBirth Increment iTopOfCache Move True to bChanged End Loop // If the cached has been updated then we need to save the changes.... If (bChanged) Begin Set pNameDataCache to NameCacheNew End Move (GetTickCount()) to iStop Move (iStop - iStart) to iTime Function_Return NameData End_Function // LookupNamesTest3
You can see here that we need to pass the function identifier of our VDF function CompareNameData. This function will be called for each member of the Name Data Cache that we need to search.Code:Move (SearchArray(SearchName, NameCacheOld, Self, get_CompareNameData)) to iFound
So how does the SearchArray function compare to the FOR loop in strategy #2?
1,000 names
1st run: 5.023 sec
2nd run: 1.373 sec
3rd - 6th runs: 1.778, 1.997, 2.153, 2.247 sec.
This strategy suffers the same linear time increase with cache size as strategy #2, except that it is slightly worse. What about scalability?
10,000 names
1st run: 50.28 sec
2nd run: 90.746 sec
3rd run: 93.912 sec
I didn't even try testing 50,000 names. This strategy clearly suffers the same exponential growth in time as the number of names increases as strategy #2, except once again, it is worse. We need another approach.
Fourth Strategy: Using the RuntimeArray Search Function
In this strategy we will reorganize the cached data so that, instead of storing the names and matching dates of birth in an array of tNameData structs, we will use two separate arrays. A string array to store the cached names, and a date array to store the cached dates of birth.
When we store a name/date pair in the cache we will carefully ensure that the array indexes of the name and corresponding date of birth match. This means that we can search the pNamesCache using SearchArray with the fast compare function built into the runtime. Once we have the array index of the cached name we can use that to get the cached date of birth. This is slightly more convoluted but it should perform better. Below is the test algorithim...Code:Property String[] pNamesCache Property Date[] pDatesCache
So how does this one perform against the other strategies?Code:Function LookupNamesTest4 String[] sNames Integer ByRef iTime Returns tNameData[] Integer iStart iStop Integer iName icName Integer icCache iTopOfCache iFound iVoid Boolean bChanged String[] NamesCacheOld NamesCacheNew Date[] DatesCacheOld DatesCacheNew tNameData[] NameData Move (TimeBeginPeriod(1)) to iVoid Move (GetTickCount()) to iStart // Get the cached data.... Move False to bChanged Get pNamesCache to NamesCacheOld Get pDatesCache to DatesCacheOld Move NamesCacheOld to NamesCacheNew // make a copy of the cache Move DatesCacheOld to DatesCacheNew Move (SizeOfArray(NamesCacheOld)) to icCache Move icCache to iTopOfCache // Iterate the set of names.... Move (SizeOfArray(sNames)) to icName Decrement icName For iName from 0 to icName Move -1 to iFound If (icCache > 0) Begin // don't try to search the cache if it is empty. This speeds up the first search. Move (SearchArray(sNames[iName], NamesCacheOld)) to iFound End Move sNames[iName] to NameData[iName].sName If (iFound >= 0) Begin // If found, then use the cached value.... Move DatesCacheOld[iFound] to NameData[iName].dDateOfBirth End Else Begin // Otherwise we have to look it up and add the value to the cache.... Get GetDateOfBirth sNames[iName] to NameData[iName].dDateOfBirth Move NameData[iName].sName to NamesCacheNew[iTopOfCache] Move NameData[iName].dDateOfBirth to DatesCacheNew[iTopOfCache] Increment iTopOfCache Move True to bChanged End Loop // If the cached has been updated then we need to save the changes.... If (bChanged) Begin Set pNamesCache to NamesCacheNew Set pDatesCache to DatesCacheNew End Move (GetTickCount()) to iStop Move (iStop - iStart) to iTime Move (TimeEndPeriod(1)) to iVoid Function_Return NameData End_Function // LookupNamesTest4
1,000 names
1st run: 5.085 sec
2nd run: 561 milliseconds
3rd - 6th runs: 655, 624, 593, 577, 624 milliseconds
The first run shows us that the extra complexity in managing the cached data has a minimal effect. The second run took 561 milliseconds! This is pretty close to our theoretical minimum of 500 milliseconds, a good result!
The 3rd - 6th tests show a fluctuation between 577 and 655 milliseconds. If there is a linear increase with the cache size, then it is not evident in this test. What about scalability?
10,000 names
1st run: 50.326 sec
2nd - 6th runs: 4.196, 4.243, 4.337, 4.332, 4.415 sec
50,000 names
1st run: 251.068 sec
2nd & 3rd runs: 91.385, 94.662 sec
In both cases the first run confirms that the new caching technique has no negative effect on the time.
With 10,000 names, the cached searches show a huge improvement. From 72 seconds down to 4 seconds. This is at the borderline our defined annoyance level. Also, at this test size we do begin to see a linear increase as the cache increases. However it is only a few hundred milliseconds each time. This is probably OK for our real-world application as we would expect the cache to grow by smaller and smaller amounts each time it is used.
With 50,000 names the results are much better for cached searches: 35 minutes down to 91 seconds. However 91 seconds is outside our acceptable range. If our expected name lists are going to be around 50,000 names then we need to find more speed.
Fifth Strategy: Using BinarySearchArray
The global BinarySearchArray function is supposed to perform significantly faster searches on large arrays. The way this function works is to employ the "high-low" strategy. With a sorted list it will pick the middle item and test if the target item is above or below (high or low) this in the sorted list. If it is above then it jumps to the next mid point between the tested position and the end of the list. If it was below then it takes the mid-point in the opposite direction. This strategy is repeated until the target is found.
A normal search algorithm that starts at the beginning of the list until it finds the target will, on average, need to search n/2 array members per search, where 'n' is the size of the array. The high low strategy, on the other hand, requires an average of only log(n). This means that for an array of, say 1,000,000 members, a linear search will require an average of 500,000 comparisons. The binary search only requires 6!!! The worst case scenario for 1,000,000 members is only 20 comparisons! So what is the catch?
The catch is that the array you are searching must be sorted! Our names cache is definitely not sorted. We could sort it using the SortArray function but then we cannot use it to lookup the corresponding data in the date of birth cache because the array indexes will no longer be synchronized in each array. This means that we would have to sort the dates cache 'manually' using some looping algorithm. Another problem is that each time the cache is updated it will need to be resorted.
Here is the solution presented by strategy #5: The first time a lookup is performed is when the cache is empty. We build the cache of names and sort it. Then we lookup each name in the sample to find its position in the sorted cache and store the corresponding date in the same position in the date of birth cache. Naturally we use the BinarySearchArray function to find each sorted name position.
This gives us our initial sorted cache. But what about subsequent searches. We can use the sorted cache but then we need to add new names to it, requiring a re-sort of the cache. I get around this by not storing new names in the sorted cache. All name/date pairs that are added after the initial sorted cache is built are stored separately in a non-sorted cache (as per strategy #4).
This means that there are three different situations that we have to handle when a name list lookup is performed.
- The first time a lookup is performed we take all of the names/dates and create a sorted cache. This cache is never changed, but it holds the majority of our total data set.
- The second time a lookup is performed we use the sorted cache and perform binary searches. All of the new names & dates are stored in a separate, unsorted, cache. This cache will grow but should remain much smaller than the sorted cache.
- All subsequent searches will first look in the sorted cache, using the binary search, then (if needed) look in the unsorted cache. All new names & dates are again added to the unsorted cache.
Here are the property declarations for the cache arrays use by this algorithm...
And the function that tests this strategy....Code:Property String[] pNamesCache Property Date[] pDatesCache Property String[] pSortedNamesCache Property Date[] pSortedDatesCache
So how does this compare to the other strategies?Code:Function LookupNamesTest5 String[] sNames Integer ByRef iTime Returns tNameData[] Integer iStart iStop Integer iName icName Integer icSortedCache icUnsortedCache iTopOfUnsortedCache iFound iVoid Integer i iCount Boolean bChanged String[] NamesCacheOld NamesCacheNew SortedNamesCache Date[] DatesCacheOld DatesCacheNew SortedDatesCache tNameData[] NameData Move (TimeBeginPeriod(1)) to iVoid Move (GetTickCount()) to iStart // Get the cached data.... Move False to bChanged Get pNamesCache to NamesCacheOld Get pDatesCache to DatesCacheOld Move NamesCacheOld to NamesCacheNew // make a copy of the cache Move DatesCacheOld to DatesCacheNew Move (SizeOfArray(NamesCacheOld)) to icUnsortedCache Move icUnsortedCache to iTopOfUnsortedCache Get pSortedNamesCache to SortedNamesCache Get pSortedDatesCache to SortedDatesCache Move (SizeOfArray(SortedNamesCache)) to icSortedCache // Iterate the set of names.... Move (SizeOfArray(sNames)) to icName Decrement icName For iName from 0 to icName Move sNames[iName] to NameData[iName].sName // First search the sorted cache.... Move -1 to iFound // don't try to search the cache if it is empty. This speeds up the first search. If (icSortedCache > 0) Begin Move (BinarySearchArray(sNames[iName], SortedNamesCache)) to iFound End // If found, then use the cached value.... If (iFound >= 0) Begin Move SortedDatesCache[iFound] to NameData[iName].dDateOfBirth End Else Begin // if we didn't find anything then search the unsorted cache.... If (icUnsortedCache > 0) Begin Move (SearchArray(sNames[iName], NamesCacheOld)) to iFound End // If found, then use the cached value.... If (iFound >= 0) Begin Move DatesCacheOld[iFound] to NameData[iName].dDateOfBirth End End // If not found, then we have to look it up and add the value to the cache.... If (iFound < 0) Begin Get GetDateOfBirth sNames[iName] to NameData[iName].dDateOfBirth Move NameData[iName].sName to NamesCacheNew[iTopOfUnsortedCache] Move NameData[iName].dDateOfBirth to DatesCacheNew[iTopOfUnsortedCache] Increment iTopOfUnsortedCache Move True to bChanged End Loop // If the cached has been updated then we need to save the changes.... If (bChanged) Begin // If the sorted cache is empty then we will initialize it with the data we have now.... If (icSortedCache = 0) Begin // First sort the names.... Move (SortArray(NamesCacheNew)) to SortedNamesCache // Now lookup each name in the Sorted list to create the corresponding sorted dates.... Move (iTopOfUnsortedCache - 1) to iCount For i from 0 to iCount // get the index into the sorted names array Move (BinarySearchArray(NamesCacheNew[i], SortedNamesCache)) to iFound // Save the corresponding date in the sorted position Move (DatesCacheNew[i]) to SortedDatesCache[iFound] Loop // Save the sorted cache data.... Set pSortedNamesCache to SortedNamesCache Set pSortedDatesCache to SortedDatesCache End Else Begin // Otherwise we just update the unsorted cache.... Set pNamesCache to NamesCacheNew Set pDatesCache to DatesCacheNew End End Move (GetTickCount()) to iStop Move (iStop - iStart) to iTime Move (TimeEndPeriod(1)) to iVoid Function_Return NameData End_Function // LookupNamesTest5
1,000 names
1st run: 5.07 sec
2nd - 6th runs: 515, 530, 515, 546, 530 ms
10,000 names
1st run: 50.107 sec
2nd - 6th runs: 624, 562, 656, 561, 608, 577 ms
50,000 names
1st run: 250.772 sec
2nd - 6th runs: 765, 811, 827, 812, 795 ms
These figures speak for themselves. For cached lookups we remain close to the theoretical minimum of 500ms regardless of the size of the names list that we are testing. What is surprising, but in hindsight shouldn't be, is that the time taken to set up the sorted cache in the first run is nearly identical to our baseline! Even in the case where we have to create and sort two arrays of 50,000 members! The global SortArray function is obviously very fast, this is good to know. We use a FOR loop for each name to create the sorted Dates cache, but inside the FOR loop we are using BinarySearchArray and we now know how fast that is.
If the unsorted cache was likely to grow significantly large then we would need to modify the algorithm to merge the sorted and unsorted caches whenever the unsorted cache reaches a certain size.
Conclusions
Strategy #3, which used a VDF compare function in calls to the global SearchArray function, was quite disapointing. This search technique turned out to be significantly worse than searching by just looping through the cached data array.
Strategy #4, which implicitly used the runtime's string compare in calls to the global SearchArray function, met all of our goals for processing lists of 1,000 names. Beyond 10,000 names, the technique did not provide enough performance.
Strategy #5 was the most complicated but it worked perfectly at all scales up to our limit of 50,000. In theory it should contine to perform well for much larger lists. The algorithim also has a lot of scope to be adapted to different data conditions; for example if the unsorted cache grows too large then it can be merged with the sorted cache to maintain good performance.
An interesting variation on Strategy #5 would be to store the sorted cache in a single tNameData array. You would need to create a VDF function to perform the searching and sorting. The following is an example of the compare function you would need...
This function would be called by SortArray and BinarySearchArray. The searching should still have very good performance here because the number of times that the CompareNameData function is ever called would be small (less than 10 times).Code:Function CompareNameData tNameData SearchName tNameData ReferenceName Returns Integer If (SearchName.sName > ReferenceName.sName) Function_Return (GT) Else If (SearchName.sName < ReferenceName.sName) Function_Return (LT) Else Function_Return (EQ) End_Function // CompareNameData
This article has explained different algorithims for attaining good performance. This can be thought of as the 'fish for a day'. Perhaps more importantly, it also explores profiling important operations, setting a benchmark and a progressive, step-wise approach to identify and address performance problems.
For more information about structs, arrays, searching and sorting take a look at the Arrays & Structs in-depth blog articles by Sonny Falk.
The Cached Lookup Tester Workspace
Attached to this blog article is the full workspace I created for testing the five strategies. All of the code presented in these articles can be found there.
Sidebar: Frozen Bananas and Sleep Function Variability
In part one of this blog I described how the operation to fetch a date of birth was simulated by a function called GetDateOfBirth. In this function I call the Win32 Sleep function to simulate a delay of X milliseconds.
During the tests I set the latency period for each call to GetDateOfBirth to 5 milliseconds. This meant that the baseline algorithm took around 5 seconds to perform 1,000 searches, 50 seconds to perform 10,000 searches and so on. A funny thing happened while I was half way through writing up part one: suddenly, when I ran the test program, the baseline was returning 15 seconds for 1,000 searches and 150 seconds for 10,000.Code:Function GetDateOfBirth String sName Returns Date Integer iTestLatency iVoid iDay iMonth iYear // simulate waiting for the specified time.... Get Value of oLatency_frm to iTestLatency Move (Sleep(iTestLatency)) to iVoid Move (Random(11)) to iDay Move (Random(11)) to iMonth Move (Random(9)) to iYear Increment iDay Increment iMonth Move (iYear + 2000) to iYear Function_Return (Date(String(iDay) + "/" + String(iMonth) + "/" + String(iYear))) End_Function // GetDateOfBirth
I ran through the usual steps to try to get to the bottom of this discrepancy. I debugged the code to make sure that the correct latency of 5 milliseconds was being passed to the function; I rolled back any code I had written since the last time the test ran correctly; I shut down and restarted my laptop, it still came out wrong.
Then I noticed that my laptop was running pretty hot and the cooling fan was spinning full speed. I wondered whether Hewlett Packard had designed the laptop to slow down the processor speed when things start getting too hot. I lifted my laptop and the desk underneath was pretty darn hot. Then I had a brilliant idea: I elevated the laptop with some thick books, got a frozen banana out of the refrigerator and placed it underneath where the cooling fan sucked in fresh air. Pretty soon the laptop had cooled down and the fan had stopped spinning. I was pretty sure that the frozen banana would fix the problem with my test program, it was the ONLY explanation darn it!
Alas the banana did not fix my problem. Strangely though, after a couple of hours, my test application went back to posting 5 second results for 1,000 searches and I continued writing, but that was not the end of the story. The next day, while I was writing part two, at about the same time in the evening, my test application suddenly started giving the 15 second results per 1,000 searches again. Perhaps two bananas would do the trick!! No, there had to be another explanation.
I started playing around with the latency time being used by the GetDateOfBirth function. I found that I could set the latency to anything from 1 to 15 milliseconds and I would get the same result for 1,000 loops: 15 seconds. When I set the latency to 0 then 1,000 loops would drop to 10 milliseconds. If I put it up to 16 then, instead of getting a time of 16 seconds I got 30 seconds for 1,000 loops! Somehow, at this particular time of day, the Win32 API Sleep function had a minimum granularity of 15 milliseconds! This was ruining my tests.
After a bit of Googling I discovered that Windows has a built-in global timer resolution and this affects the system performance and all kinds of windows functionality such as time-out intervals and CPU power management power saving modes. My frozen banana solution wasn't too far off the mark! For some reason, during certain periods of the day, my system's global timer resolution decreases to 15 milliseconds!
Happily there is a solution to this. The Windows API provides another function, timeBeginPeriod, which you can call to override the global timer resolution. Calling timeBeginPeriod(1) sets the resolution to 1 millisecond. I call this function at the beginning of each test run, then call timeEndPeriod at the end of the run to reset the global timer resolution (that's important). You can see all of this in action in the TestApplication workspace which is attached to this blog article. The foibles of software development...