View RSS Feed

Development Team Blog

Arrays & Structs in-depth Part III

Rate this Entry
In Part II of this multi-part series we discussed sorting and searching arrays, and we also mentioned that multi-dimensional arrays aren't really suited for sorting and searching, we'll dig into that a little deeper here and see how a struct type is often a better solution.

Multi-dimensional Arrays
Despite what the documentation may seem to suggest, you really cannot sort multi-dimensional arrays. You cannot search a multi-dimensional array either, at least not in any sensible manner. But the documentation is not wrong, you can sort one dimension of a multi-dimensional array(but only one), and it's rarely what you want so it's better to forget all about it.

As mentioned before, multi-dimensional arrays really should be avoided if possible, struct types are much easier to model and work with. Using multi-dimensional arrays is inherently complex and difficult to understand. Most people can visualize a two dimensional array like a table with rows and columns, and perhaps even a three dimensional array like a cube. But adding a fourth dimension increases complexity enormously, visualizing a hypercube and such is something only few people can do easily. Yet most people have no trouble visualizing complex hierarchical tree structures using nested structs and single-dimensional arrays.

Let's examine this closer with the example of SearchArray(), it takes the value to search for and the array as parameters, and then it returns the index of the element matching the value. Easy enough, simple and straightforward to use with a single dimensional array. The function simply compares each element in the array, from the first element to the last, and returns the index of the matching item. But what about a multi-dimensional array? First, if it's a multi-dimensional array(and especially a jagged array), what is the type of an element in the first dimension anyway? Typically you'd visualize the element in the first dimension of a two-dimensional array as an array itself. But that probably makes no sense in this case, you really wanted to search for an element, didn't you? Here's the first problem, it's not even clear what you would be searching for in the first place.

Taking this further with multi-dimensional arrays, assuming that you want to search for an actual element, how is it supposed to search for the element? In a single-dimensional array it's obvious, you can start with the first element and stop with the last, left to right. But in a multi-dimensional array, how are you supposed to search? Every item, left to right, top to bottom? Well, sometimes you only want to search some items(e.g. a specific column). OK, so how would you specify that then? This is already starting to get so complicated that my head is spinning. How do you even identify the element if you could find it? SearchArray() simply returns the index of the specified element, but in a multi-dimensional array you cannot identify a specific element with a single index, so now it must return an array of indexes instead.

And what about sorting? Surely you don't want to sort all items left to right, top to bottom, do you? You probably want to sort a column, right? No, wait a second, surely you don't want to sort just one column, you probably want to sort the rows according to a column, right? Like sorting all the rows in alphabetical order based on the first column, not exchanging the first column of each row...

Clearly sorting/searching multi-dimensional arrays are much more complicated and difficult to deal with than single-dimensional arrays. Obviously it could be done, but we'd probably have a really hard time documenting it, and nobody would ever be able to understand how to use it. This is why we simply stayed away from that altogether, there's no built-in support for searching/sorting multi-dimensional arrays. There are far too many questions about how that should actually behave. And everybody will have different ideas about how it should work in different situations.

What about that stuff in the documentation explaining how to sort a multi-dimensional array? Remember how multi-dimensional arrays are like arrays of arrays? Well, the last dimension of a multi-dimensional array then is just like a single-dimensional array. So, if you pass the last dimension of a multi-dimensional array as a parameter, it's exactly the same as passing a single-dimensional array. To the function, its implementation, the runtime, or any function whatsoever, there's no difference, it's a single-dimensional array. That's why it works. It doesn't really support multi-dimensional arrays, you're actually passing a single-dimensional array, even though you might not realize it.

Sounds complicated? That's because it is complicated, which is why it's usually better to stay away from all this with multi-dimensional arrays.

Struct Instead of Multi-dimensional Arrays
In most cases you can just step back and quite easily find a slightly different way to organize your data into a struct type instead. Often you find that there's additional data needed anyway, which would make a struct a better choice regardless. Arrays tend to be considered more low-level, and struct types tend to model the data closer to the problem domain. Generally, the closer to the metal you are, and the further from the problem domain abstraction, the more difficult it usually is to visualize the data and how to manipulate it.

But what if you really need the low-level abstraction of multi-dimensional arrays? We can still use a struct to represent the row, and then just a single dimensional array to represent the columns for that row. By doing that you get rid of all the difficult to define behaviors, and you end up with something that's reasonably straightforward and intuitive, yet with the same low-level abstraction as a multi-dimensional array.

Basically, instead of using a multi-dimensional array, such as Integer[][], consider a struct like the following:
Code:
Struct tMyRow
    Integer[] columns
End_Struct
Now this struct represents a row. You can easily create an array of rows, tMyRow[], and voila, you have a table with rows and columns. Notice how you can now use names that makes it much easier to read and understand the code when referencing a particular column on a specified row, myRows[2].columns[3] for example.

OK, so you can use a struct to represent the row, it looks kind of the same as a multi-dimensional array. So now what? Let's see how you can sort all rows by the first column. By using a custom comparison function, you can basically write something like this now:

Code:
Function CompareRowFirstColumn tMyRow lhs tMyRow rhs Returns Integer
    If (lhs.columns[0]<rhs.columns[0]) Function_Return (LT)
    If (lhs.columns[0]>rhs.columns[0]) Function_Return (GT)
    Function_Return (EQ)
End_Function
Easy as pie! Everything just falls into place and becomes so much easier by just reorganizing the data into a struct and single-dimensional arrays.

In Part IV we'll examine array performance in general, as well as the internal copy-on-write performance optimization, and how to use it to your advantage (and why it rarely matters anyway).

Comments