View RSS Feed

Development Team Blog

Arrays & Structs in-depth Part II

Rating: 8 votes, 4.50 average.
In Part I we were just getting warmed up and started looking at the basics of arrays and struct types. Now things are about to get more complicated as we dig in deeper.

Sorting Arrays
If you have Integer[] myArray for example, then sorting is very simple and straightforward. You simply do Move (SortArray(myArray)) to myArray, and you're done. The runtime takes care of all the magic to sort the array elements. But if you have an array of struct type, then it gets a little more complicated as the runtime would need some help to determine the desired sort order of two elements. You provide that with a custom comparison function. Consider the following:

Code:
Struct tCustomerInfo
    String sFirstName
    String sLastName
End_Struct
...
tCustomerInfo[] customers
...
If you want to sort customers on first name, you'd write something like this:

Code:
Function CompareCustomerFirstName tCustomerInfo lhs tCustomerInfo rhs
    If (lhs.sFirstName < rhs.sFirstName) Function_Return (LT)
    If (lhs.sFirstName > rhs.sFirstName) Function_Return (GT)
    Function_Return (EQ)
End_Function

Procedure Foo
    tCustomerInfo[] customers
    ...
    Move (SortArray(customers,self,get_CompareCustomerFirstName)) to customers
    ...
End_Procedure
If you want to sort on last name you'd obviously compare sLastName instead. If you want to sort on last name followed by first name, you'd write something like this:
Code:
Function CompareCustomerFirstName tCustomerInfo lhs tCustomerInfo rhs
    If (lhs.sLastName < rhs.sLastName) Function_Return (LT)
    If (lhs.sLastName > rhs.sLastName) Function_Return (GT)
    If (lhs.sFirstName < rhs.sFirstName) Function_Return (LT)
    If (lhs.sFirstName > rhs.sFirstName) Function_Return (GT)
    Function_Return (EQ)
End_Function
SearchArray Function
Searching for an element in an array is just as easy as sorting if you have an Integer[] for example. You simply do Move (SearchArray(1234,myArray)) to iFoundIndex. Likewise, if you have an array of struct type, you must provide a comparison function to help the runtime compare two struct elements. You can use the same comparison function as in the example with sorting an array above. You can also use a slightly simpler function where you're returning only EQ or NE, like this:

Code:
Function CompareCustomerFirstName tCustomerInfo lhs tCustomerInfo rhs
    If (lhs.sFirstName = rhs.sFirstName) Function_Return (EQ)
    Function_Return (NE)
End_Function
Often it's easier to just reuse an existing comparison function that returns one of the three possible values LT, GT, EQ. But sometimes you're looking for something specific that may not be the same as the full sort order, such as if you're looking for the first match of sFirstName while the sort order is last name followed by first name. Or if you're looking for a match in either first name or last name. It's easy to see how you can implement such a comparison function, and that you can have many different comparison functions(even with the same struct type) and pick and choose which one you want to use at the moment.

BinarySearchArray Function
You can also search for an element using BinarySeachArray(), which can be more effective than SearchArray() if the array has a lot of elements. However, BinarySearchArray() is not as flexible. First, in order to use BinarySearchArray(), the array must already be sorted. Second, you must also use the same comparison function that was used to sort the array, and it must return one of the three possible values LT, GT or EQ.

In most cases it's preferred to just use SearchArray() as it's much more flexible and allows searching without first sorting. The advantage of BinarySearchArray() with regards to performance doesn't really come into play unless you have a very large array (think thousands or tens of thousands of elements at least), or you perform many searches in performance critical code without adding/changing elements.

When to use Struct vs. Array
Deciding whether to use a struct or array is obviously a design decision and it depends on the circumstances. Technically in many cases you can use either or, but very often it can lead to very different complexities. As a DataFlex developer it's often very easy to fall into the trap of picking the more familiar approach of using an array, even when a struct would perhaps be a better choice. Afterall, arrays have been around much longer in VDF. In that case it might be worthwhile to step back and reflect on whether a struct might actually be a better choice.

In general the recommendation would be to use a struct first and foremost, and only when necessary use an array. Of course, sometimes you don't really have a choice, such as when interfacing with an external control or dll, and you just have to go along with the design at hand. But when you do have a choice:
  • Consider whether what you're modeling is many of the same values(array is preferred), or whether it's many different values(struct is preferred). For example, a set of lottery numbers are many-of-the-same, whereas first name, last name and balance are many-different-values. If referring to the different values by index comes naturally, then array is a good choice. But if referring to the different values by name comes more natural, then you should really consider a struct.
  • Consider whether the multiple values are naturally of the same type(array is ok) or different type(struct preferred).
  • And finally consider whether the number of values are reasonably fixed for the specific problem(struct is ok), or if the number of values will vary all the time(array is required).

If you find that you need a multi-dimensional array, you should reconsider whether a struct is a better choice. Multi-dimensional arrays are more difficult to work with conceptually. It takes a lot more thinking to model Move foo[index1][index2] to xyz than using struct member names with dot notation.

If you need sorting capabilities, then multi-dimensional arrays are definitely not a good choice, and you should consider a single-dimensional array of struct type instead.

In Part III we'll look at multi-dimensional arrays and why it's often better to replace it with a single dimensional array of struct type instead.

Comments

  1. chuckatkinson's Avatar
    I'm really enjoying these blog entries. Great work !

    I have some confusion with the lastname & firstname sort comparison function. How does this work? It seems to me that a function_return will 'break out' of the function and not execute the code below it. So how does it do the firstname compares?
    Code:
    Function CompareCustomerFirstName tCustomerInfo lhs tCustomerInfo rhs
        If (lhs.sLastName < rhs.sLastName) Function_Return (LT)
        If (lhs.sLastName > rhs.sLastName) Function_Return (GT)
        If (lhs.sFirstName < rhs.sFirstName) Function_Return (LT)
        If (lhs.sFirstName > rhs.sFirstName) Function_Return (GT)
        Function_Return (EQ)
    End_Function
  2. Focus's Avatar
    I was only looking at this today and thought the same .... look again .... the first two if's will of course be false if both last names are the same .... ie you have already sorted them ...
  3. Focus's Avatar
    Interesting series Sonny

    Structs and Arrays are very good and make code more readable and organised.

    As a side note if you access the contents of a struct or an array more than once in a procedure or function then it is quicker to copy it to a local variable first

    It is only a tiny amount of time but when you do the same thing lots of times eg when sorting .... it all adds up.

    This little example shows some free juice when sorting. The quicker version on my PC is around 30% faster

    Code:
    Struct tTestStruct
        Integer iVal
    End_Struct
     
    Function CompareFunc tTestStruct stTestStruct1 tTestStruct stTestStruct2 Returns Integer
        If (stTestStruct1.iVal > stTestStruct2.iVal) Function_Return (GT)
        If (stTestStruct1.iVal < stTestStruct2.iVal) Function_Return (LT)
        Function_Return (EQ)
    End_Function
     
    Function CompareQuickerFunc tTestStruct stTestStruct1 tTestStruct stTestStruct2 Returns Integer
        Integer iVal1 iVal2
     
        Move stTestStruct1.iVal to iVal1
        Move stTestStruct2.iVal to iVal2
        If (iVal1 > iVal2) Function_Return (GT)
        If (iVal1 < iVal2) Function_Return (LT)
        Function_Return (EQ)
    End_Function
     
    Procedure TestStructSort
        tTestStruct[] stTestStruct stTestStructSorted
        Integer iLoop iLoop2 
        Number nTime1 nTime2 
        For iLoop from 0 to 10000
            Move (Random(1000000)) to stTestStruct[iLoop].iVal
        Loop
        For iLoop2 from 1 to 5
            Send vsDoStartTickStamp
            Move (SortArray(stTestStruct, Self, get_CompareFunc)) to stTestStructSorted
            Send vsDoEndTickStamp
            Get vsTicksElapsed to nTime1
     
            Send vsDoStartTickStamp
            Move (SortArray(stTestStruct, Self, get_CompareQuickerFunc)) to stTestStructSorted
            Send vsDoEndTickStamp
            Get vsTicksElapsed to nTime2
            Showln nTime1
            Showln nTime2
            Showln ((nTime2/nTime1)*100.0)
            Showln ""
     
        Loop
     
    End_Procedure
  4. Clive Richmond's Avatar
    Hi Chuck,

    Eventually the last name of both the lhs and the rhs match, or become (EQ), and therefore moves onto to comparing the first names.

    I must admit I found this a little tricky to begin with but it does, eventually, all make sense

    Clive