
Thread: Windows API wrappers and the state of brokenness

    I'm still working through the documentation (I've read most of it and skimmed the rest), so my understanding may still be a bit off, but after some more experimentation it looks to me like a lot of the old Windows API wrappers are in a certain state of brokenness.
    They might appear to work, because the lower range of UTF-8 looks just like ANSI, but as soon as there are "real" Unicode characters in the data, your code silently corrupts it.
    So the chances are much bigger that you find out via a deployed application, when a user pastes some Unicode into your app, than during testing...
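    To illustrate the failure mode, here is a sketch in Python (since the byte-level behaviour is language-agnostic; cp1252 stands in for the Windows ANSI code page):

    ```python
    # ASCII text survives being misread: UTF-8 and ANSI agree on the low range.
    assert "naive".encode("utf-8").decode("cp1252") == "naive"

    # Real Unicode does not: the UTF-8 bytes for "naïve", reinterpreted
    # as ANSI (Windows-1252), silently turn into mojibake.
    utf8_bytes = "naïve".encode("utf-8")      # b'na\xc3\xafve'
    corrupted = utf8_bytes.decode("cp1252")
    print(corrupted)                          # -> naïve
    ```

    No error is raised anywhere, which is exactly why this slips through testing.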

    That is not completely unexpected; however, DataFlex also isn't helping us as much as it could.
    We have a String data type which is now UTF-8 by default, but it is also a bucket that can contain many things.
    It might contain UTF-8, ANSI, or OEM data.
    Whatever the encoding is, DataFlex won't tell you; the debugger now shows you only UTF-8 strings... so no help from there either.
    So it's like the OEM/ANSI problem from before, except it is now worse: the String data type does not keep track of its current encoding, so nothing can show you what the encoding is.
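    The underlying problem is that a raw byte sequence carries no encoding information; the same bytes mean different characters depending on which code page you assume. A Python sketch, with cp1252 standing in for ANSI and cp437 for a typical OEM code page:

    ```python
    raw = bytes([0x82])            # one byte of unknown provenance
    print(raw.decode("cp1252"))    # -> ‚   (ANSI: low-9 quotation mark)
    print(raw.decode("cp437"))     # -> é   (OEM: e-acute)
    # Without an encoding tag on the data, no debugger can tell which is right.
    ```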

    For UCS-2 style string data we now have the WString data type (why not call it a WideString type? but ok), which does automatic string conversion.
    This is a huge help, as at least you know the data is not UTF-8.
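    The nice property of a wide-string type is that its in-memory layout is unambiguous: every code unit is 16 bits, matching the Windows WCHAR API surface. A Python sketch of the UTF-16LE layout such a type implies (the sample string is just an illustration):

    ```python
    s = "Aÿ€"
    wide = s.encode("utf-16-le")   # 2 bytes per code unit, little-endian
    print(wide.hex(" "))           # -> 41 00 ff 00 ac 20
    # There is exactly one way to interpret these bytes, unlike the
    # "bucket" String that might hold UTF-8, ANSI, or OEM.
    ```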

    For the native Windows API there is pretty much always a wide-string variant.
    As soon as you communicate with third-party DLLs, there might not be a wide-string variant.
    In that case we would need to communicate with them via ANSI/OEM.

    What I am missing is an AnsiString data type.
    If we had an AnsiString data type, then at least I could see from the data type alone that it is not holding Unicode data.
    If it also automatically did utf8-to-oem and oem-to-utf8 conversions, similar to WString, then at least I would know what the data is.
    Better yet, on each implicit conversion from UTF-8 to OEM, the compiler could show a "possible data loss" warning, so that you know where to pay more attention.
    IOW, I would then have a lot more visual clues in my code, thanks to the data type already knowing that I have to be careful.
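    The data loss such a warning would flag is real: an OEM code page covers only 256 characters, so converting UTF-8 text to OEM is lossy for anything outside that set. A Python sketch (cp437 standing in for the OEM code page):

    ```python
    text = "€100"
    try:
        text.encode("cp437")                 # strict mode refuses to lose data
    except UnicodeEncodeError as err:
        print("possible data loss at:", err.object[err.start])   # -> €

    # A lenient conversion silently substitutes '?' instead:
    print(text.encode("cp437", errors="replace").decode("cp437"))  # -> ?100
    ```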
    If OTOH we use "String everywhere", like it is now, then I have to go hunt for explicit utf8-to-oem / utf8-to-ansi conversions (and vice versa) to make sure they have been done correctly. Mix them up, or do a conversion twice, and the data gets corrupted.
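    The double-conversion corruption is easy to reproduce: run an ansi-to-utf8 step on data that is already UTF-8 and the text is mangled for good. A Python sketch (cp1252 standing in for ANSI):

    ```python
    text = "ä"
    once = text.encode("utf-8")                    # correct: b'\xc3\xa4'
    # Oops: someone "converts" ANSI to UTF-8 a second time.
    twice = once.decode("cp1252").encode("utf-8")  # b'\xc3\x83\xc2\xa4'
    print(twice.decode("utf-8"))                   # -> Ã¤  (corrupted)
    ```

    Nothing in the type system flags the second call as wrong, which is why a dedicated data type would help.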

    Instead of an AnsiString data type, it might also help if we had an encoding attribute on the String, although that is probably never going to happen.

    Another, much smaller, pain point is that the debugger no longer shows the character data in UChar arrays, only the decimal values.
    I still prefer to see the OEM (or ANSI) character for each cell, as it helps when working with non-Unicode data.
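    Mapping the raw decimal values back to OEM characters is trivial in principle, which is what makes the omission frustrating. A Python sketch of what one would like the array view to show (hypothetical byte values, cp437 standing in for the OEM code page):

    ```python
    uchar_array = [72, 130, 108, 108, 111]   # what the debugger shows today
    as_oem = bytes(uchar_array).decode("cp437")
    print(as_oem)                            # -> Héllo
    # Showing the decoded character next to each decimal cell would
    # make non-Unicode buffers readable again.
    ```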

    The way it is now, conversion is going to be very painful for any application that uses strings and raw Windows API calls, and while it kind of has to be painful, I also think it can be improved upon.

    Sorry for the long rant, I think I need some coffee now.
    Last edited by wila; 18-Nov-2019 at 05:31 AM.
