Development Team Blog

What does Argument_Size really do?

The basic effect of Set_Argument_Size is that it limits the maximum size of strings that can be manipulated. Mark wonders if it specifies a fixed string variable size. A simplified explanation would be that it limits the upper size of String variables, but String variables do not have a fixed size.

A clue to the low-level technical explanation is in the name Argument_Size. It doesn't actually limit the size of String variables or String properties. It's really a limit on String type command arguments, not to be confused with method parameters.

The low level virtual machine commands in VDF, such as Move, take a number of arguments. When the VM executes the command, the arguments are copied to a storage area, sort of like virtual machine registers. The VDF command execution subsystem always uses these registers as intermediate storage, and at the end, the result is copied from the intermediate storage to the destination variable.

This means that for a Move command for example, string arguments are copied to the intermediate storage, manipulated, and then the result is copied to the destination variable.

So Argument_Size really refers to the internal buffer size of the intermediate storage for string command arguments. This is operating at a very low level of the VDF runtime, and it was designed this way a long, long time ago, back in character mode days.
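To make the idea concrete, here is a toy model written in Python rather than VDF. It is only a sketch of the concept, not how the runtime is actually implemented: the names `ARGUMENT_SIZE`, `load_argument` and `move` are made up for illustration, the buffer size of 16 is arbitrary, and the model assumes the truncating behaviour rather than a crash.

```python
# Toy model only: this is NOT the VDF runtime, just an illustration of the idea.

ARGUMENT_SIZE = 16  # hypothetical buffer size in characters, chosen for the demo


def load_argument(value: str, argument_size: int = ARGUMENT_SIZE) -> str:
    """Copy a string argument into the fixed-size intermediate storage."""
    # Anything beyond the buffer size is silently dropped in this model.
    return value[:argument_size]


def move(source: str, argument_size: int = ARGUMENT_SIZE) -> str:
    """Model of a Move command: source -> intermediate storage -> destination."""
    register = load_argument(source, argument_size)
    destination = register  # the result is copied out of the intermediate storage
    return destination


short = move("hello")
long_ = move("x" * 100)
print(len(short), len(long_))  # 5 16
```

The source string itself can be any length. The limit only bites when the value passes through the intermediate storage, which is why raising the buffer size makes the longer string survive the trip.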

Most of the time you never even care about Argument_Size, because the default value is good enough. When dealing with very large strings, you can use Set_Argument_Size to increase the buffer. The theoretical upper limit is around 1 GB, but it would be crazy to set it that high, and it's not possible in practice. If you find yourself wanting to set it higher than, say, 10 MB, it's probably better to rethink the design.

While it's possible to work with strings that are tens of megabytes in length, it's often a real performance drag with a whole lot of copying going on, and tons of CPU cache misses. A better and more dynamic solution is to work with data in smaller chunks. If you can split the work into smaller chunks, there will be no upper limit to how much data you can deal with in total. That's a double-win, since it will also typically be faster due to better use of the cache. Of course, you also have to choose a good chunk size. Too many chunks that are too small can also cause a performance problem.
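As a rough sketch of the chunking idea, again in Python rather than VDF, the following reads a stream in fixed-size pieces and processes each piece as it arrives, so memory use stays bounded no matter how large the input is. The helper names and the 64 KB default are assumptions for the example. A good chunk size is something you would find by measuring.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # hypothetical default; tune it by measurement


def read_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes from the stream."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        yield chunk


def count_newlines(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Count line breaks without ever holding the whole input in memory."""
    return sum(chunk.count(b"\n") for chunk in read_chunks(stream, chunk_size))


demo = io.BytesIO(b"line\n" * 1000)
print(count_newlines(demo, chunk_size=128))  # 1000
```

Counting a single byte works across chunk boundaries without extra care. If you search for multi-byte patterns, you would need to handle matches that straddle two chunks.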

The scenario I hear about the most is some kind of process where you import/export data, often transferring the data over the Internet. In those scenarios, transferring the data is another complicating factor. You run into all sorts of problems, like timeouts, dropped connections and other error recovery issues. In that case, splitting the data into smaller chunks is even better, because it also lets you manage error recovery. If you got 8 out of 10 chunks sent across and processed correctly, you only need to resend the 2 chunks that failed, instead of doing the whole thing all over again. So now it's faster, has no upper limit, makes error recovery easy, and gives you more reliable transfers, since smaller chunks are generally more reliable to transfer than really large amounts of data, given various timeout restrictions, network reliability and so on.
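The 8-out-of-10 example can be sketched like this, in Python rather than VDF. Everything here is hypothetical: `split_chunks`, `send_all` and the deliberately flaky `flaky_send` stand in for whatever transfer mechanism you actually use, and a real sender would talk to a network instead of a dictionary.

```python
from typing import Callable, Dict, List


def split_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def send_all(chunks: List[bytes],
             send: Callable[[int, bytes], bool],
             max_rounds: int = 3) -> List[int]:
    """Send every chunk, retrying only the ones that failed.

    Returns the indexes that still failed after max_rounds."""
    pending = list(range(len(chunks)))
    for _ in range(max_rounds):
        if not pending:
            break
        # Keep only the chunks whose send attempt reported failure.
        pending = [i for i in pending if not send(i, chunks[i])]
    return pending


# Simulated transport: chunks 3 and 7 fail on their first attempt only.
attempts: Dict[int, int] = {}
received: Dict[int, bytes] = {}


def flaky_send(index: int, chunk: bytes) -> bool:
    attempts[index] = attempts.get(index, 0) + 1
    if index in (3, 7) and attempts[index] == 1:
        return False  # simulated timeout
    received[index] = chunk
    return True


payload = bytes(range(256)) * 4      # 1024 bytes of demo data
chunks = split_chunks(payload, 103)  # 10 chunks
failed = send_all(chunks, flaky_send)
print(failed, sum(attempts.values()))  # [] 12
```

Ten chunks go out in the first round and only the two failures are resent, for 12 sends in total. Because each chunk carries its index, the receiver can reassemble the data in order regardless of which chunks needed a retry.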

If you really need to work with large amounts of data, and all you need to do is pass it around from one place to another, there's a simple solution for that too. You can allocate a memory buffer yourself using Alloc(), pass it around as a pointer, and clean it up using Free() when you're done. Many of the VDF classes already use that technique, like Get paXml, LoadXMLFromAddress and so on. Yet another option is to use native VDF arrays, which will automatically manage memory allocations. These techniques are very useful when you need to pass data around, and don't really need to manipulate it using common String functions.

Back to Argument_Size again: what happens when you exceed the argument string size? In most cases the commands truncate the argument or result, but in some cases the program may actually crash. Which is better? The answer of course is neither. Your customer is not going to be very happy regardless of whether data was lost because it was truncated, or because the program crashed. In many cases a crash is probably preferable, because at least then you know something went wrong. If the program accidentally stores truncated data, you may not even notice until much later when it's too late to recover the original data.

In summary, Argument_Size does not directly control the size of String variables or String properties. It does, however, place a limit on the size of strings you can manipulate, which often has the same effect. Local String variables and properties are completely dynamic in size[*]. Most of the time you never care about Argument_Size, but if you need to manipulate large strings, you typically use Set_Argument_Size to temporarily change the internal buffer size. If you need to work with data that's tens or hundreds of megabytes, consider splitting it up into smaller chunks, which will be faster and more reliable, and removes any upper limit.

For the quiz of the day: If you have a pointer to a null terminated C string, and you don't know if it's too long, or you need to find out how long it is in order to use Set_Argument_Size, how come you can't use Length() to find out? For extra credit: Which function can you use instead?

[*] Global string variables are actually fixed in size at compile time, but you'd never use global variables, would you?

Comments

  1. Focus
    I presume because using Length will cause it to be copied to the internal buffer before testing the length? You should use CStringLength instead?
  2. Sonny Falk
    That was really quick, you're of course absolutely correct. When you use Length() it first copies the string into the internal buffer, effectively truncating it at argument_size, which means the length reported cannot be more than argument_size.

    And you're also right about CStringLength() which can be used to test the length of a null terminated C string referenced by a pointer. Since it takes a Pointer/Address rather than a String, there's no copying going on, and therefore it's not limited by argument_size.
  3. Ian Smith
    I've learnt something today... Thanks Sonny
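
The quiz answer from the comments can be modelled in a few lines of Python. This is a toy model, not VDF code: `length_via_buffer` and `c_string_length` are made-up names that only mimic the behaviour described above for Length() and CStringLength(), and `memory` is a plain bytes object standing in for the memory a pointer refers to.

```python
def length_via_buffer(memory: bytes, address: int, argument_size: int) -> int:
    """Model of Length(): the string is copied into the intermediate
    buffer first, and the copy is truncated at argument_size."""
    end = memory.index(b"\x00", address)          # locate the null terminator
    copied = memory[address:end][:argument_size]  # the truncating copy
    return len(copied)


def c_string_length(memory: bytes, address: int) -> int:
    """Model of CStringLength(): scan in place from the address up to
    the null terminator, with no intermediate copy."""
    return memory.index(b"\x00", address) - address


memory = b"A" * 50 + b"\x00"  # a 50-character null-terminated string
print(length_via_buffer(memory, 0, 16), c_string_length(memory, 0))  # 16 50
```

With a buffer of 16, the copying version can never report more than 16, while the in-place scan reports the true length of 50.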