Windows API wrappers and the state of brokenness



wila
18-Nov-2019, 05:22 AM
Hi,

I'm still working through the documentation. I've read most of it but only skimmed other parts, so my understanding may still be a bit off. After some more experimentation, though, it looks to me like a lot of the old Windows API wrappers are in a certain state of brokenness.
They may appear to work, because the lower (ASCII) characters are encoded identically in UTF-8 and ANSI, but as soon as the data contains "real" Unicode characters your code silently corrupts it.
So you are much more likely to find out from a deployed application, when a user pastes some Unicode text into it, than while testing...
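To make the "looks fine until it doesn't" effect concrete, here is a minimal C++ sketch (not DataFlex) of what a non-Unicode API effectively does with a UTF-8 string: it reads every byte as one character. ISO-8859-1 (Latin-1) stands in for the ANSI code page here, which is an assumption; Windows-1252 agrees with it for ASCII and for bytes 0xA0 to 0xFF, which covers this example. The function name is illustrative only.

```cpp
#include <string>

// Reads every byte of a UTF-8 string as if it were one 8-bit "ANSI" character,
// the way a non-Unicode API would see it. Latin-1 is the stand-in code page:
// byte value N maps to code point U+00NN.
std::u32string misread_as_latin1(const std::string& utf8)
{
    std::u32string out;
    for (unsigned char byte : utf8)
        out.push_back(static_cast<char32_t>(byte));
    return out;
}
```

Plain ASCII comes back unchanged, which is why testing with ordinary text passes, while "café" (bytes 63 61 66 C3 A9) comes back as five characters, "cafÃ©".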

That is not completely unexpected; however, DataFlex also isn't helping us as much as it could.
We have a String data type that is now UTF-8 by default, but it is also a bucket that can hold many things.
It might contain UTF-8, ANSI or OEM.
Whatever the encoding is, DataFlex won't tell you, and the debugger now shows only UTF-8 strings, so no help from there either.
So it's like the old OEM/ANSI problem, except worse: the String data type does not keep track of its current encoding, so nothing can show you which encoding it holds.

For UCS-2 string data we now have the WString data type (why not call it a WideString type? but OK), which does automatic string conversion.
This is a huge help, as at least you know the data is not UTF-8.

For the native Windows API there is nearly always a wide-string variant.
As soon as you communicate with third-party DLLs, though, there might not be a wide-string variant.
In that case we need to communicate with them via ANSI/OEM.

What I am missing is an AnsiString data type.
With an AnsiString data type I could at least see from the type itself that a variable is not holding Unicode data.
If it also did automatic UTF-8-to-OEM and OEM-to-UTF-8 conversion, similar to WString, then at least I would know what the data is.
Better yet, on each implicit conversion from UTF-8 to OEM the compiler could show a "possible data loss" warning, so that you know where to pay more attention.
IOW, I would have a lot more visual clues in my code, because the data type itself already signals that I have to be careful.
If OTOH we use "String everywhere", as now, then I have to hunt down explicit Utf8ToOem / Utf8ToAnsi conversions (and vice versa) to make sure they have been done correctly. Mix them up, or convert twice, and the data will be corrupted.
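The "convert twice" trap can be sketched the same way. This is a hypothetical C++ helper, not DataFlex's own conversion function, and Latin-1 again stands in for ANSI. Converting a genuinely ANSI string once is correct; feeding the result through a second time treats the UTF-8 bytes as ANSI characters and corrupts them.

```cpp
#include <string>

// Hypothetical ANSI-to-UTF-8 conversion, with Latin-1 playing the ANSI code page.
// Every Latin-1 byte 0x80..0xFF becomes a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string out;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII: unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte (0xC2 or 0xC3)
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}
```

`latin1_to_utf8("caf\xE9")` gives the correct UTF-8 bytes `63 61 66 C3 A9`; running that result through again yields `63 61 66 C3 83 C2 A9`, which displays as "cafÃ©". Nothing fails at runtime, which is exactly why this is hard to spot.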

Instead of an AnsiString data type, an encoding attribute on the string might also help, although that is probably never going to happen.
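As an illustration of what an encoding attribute could buy you (purely hypothetical; DataFlex has no such type), here is a C++ sketch of a string that carries its encoding. The conversion refuses to run twice and flags the "possible data loss" case, substituting '?' for characters the target cannot hold. Latin-1 again stands in for the ANSI/OEM code page, and the UTF-8 decoder assumes well-formed input.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical tagged string: the bytes plus a note saying what they mean.
enum class Encoding { Utf8, Latin1 };

struct TaggedString {
    std::string bytes;
    Encoding encoding;
};

// Converts UTF-8 to Latin-1 (the stand-in for ANSI). A string that is already
// Latin-1 is rejected instead of being silently converted a second time.
// Characters above U+00FF become '?' and set lossy ("possible data loss").
TaggedString to_latin1(const TaggedString& s, bool& lossy)
{
    if (s.encoding != Encoding::Utf8)
        throw std::logic_error("string is not UTF-8; refusing to convert twice");
    lossy = false;
    std::string out;
    const std::string& b = s.bytes;
    for (std::size_t i = 0; i < b.size();) {
        unsigned char c = b[i];
        if (c < 0x80) { out += static_cast<char>(c); i += 1; continue; }
        // Decode one multi-byte UTF-8 sequence (well-formed input assumed).
        int len = (c >= 0xF0) ? 4 : (c >= 0xE0) ? 3 : 2;
        char32_t cp = c & (0x3F >> (len - 1));
        for (int k = 1; k < len && i + k < b.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(b[i + k]) & 0x3F);
        i += len;
        if (cp <= 0xFF) {
            out += static_cast<char>(cp);
        } else {
            out += '?';
            lossy = true;
        }
    }
    return {out, Encoding::Latin1};
}
```

Converting UTF-8 "café" succeeds with no loss, "€5" becomes "?5" with the loss flag set (real Windows-1252 does have €, Latin-1 does not), and passing the already-converted result back in throws instead of corrupting it.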

Another, much smaller, pain point is that the debugger no longer shows the character data in UChar arrays, only the decimal values.
I would still prefer to see the OEM (or ANSI) character in each cell, as that helps with non-Unicode data.

As things stand, conversion is going to be very painful for any application that uses strings with raw Windows API calls. While it kind of has to be painful, I also think it can be improved upon.

Sorry for the long rant, I think I need some coffee now.
--
Wil

Harm Wibier
18-Nov-2019, 07:56 AM
Hi Wil,

I just want to clarify (also for others reading this): by the brokenness of Windows API wrappers you mean external APIs declared with External_Function in custom packages that haven’t been converted yet, right? For the Windows APIs used in the DataFlex packages, we have provided wrapper functions that perform the necessary conversions for developers as much as possible.

It is true that the partial compatibility of UTF-8 with ANSI & OEM (for ASCII characters they are identical) is both an advantage and a disadvantage. The advantage is that if you don’t care about non-ASCII data, things simply work; the disadvantage is that issues are hard to spot (you’ll have to test with special characters to find them).

When converting your packages, I recommend adopting the mindset that strings in DataFlex are always UTF-8 and that you convert them, just before calling an external API, to whatever encoding that API works with. When converting packages that use ANSI APIs, you either adjust the conversions (there should already be ToANSI / ToOEM conversions in your source) or change the code to call the Unicode APIs. Note that we have added compiler warnings for ToOEM & ToANSI to help you find these conversions.
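The "convert just before calling the external API" rule, sketched in C++: the application keeps UTF-8 everywhere and produces UTF-16 only at the boundary, for a W function. On Windows you would normally let MultiByteToWideChar(CP_UTF8, ...) do this; the portable hand-written version below only shows what that conversion involves. It assumes well-formed UTF-8, and the function name is illustrative.

```cpp
#include <cstddef>
#include <string>

// UTF-8 (what a DataFlex String holds) to UTF-16 (what Windows W functions take).
// Well-formed UTF-8 input is assumed; no validation is done.
std::u16string utf8_to_utf16(const std::string& s)
{
    std::u16string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = s[i];
        int len = (c < 0x80) ? 1 : (c >= 0xF0) ? 4 : (c >= 0xE0) ? 3 : 2;
        char32_t cp = (len == 1) ? c : (c & (0x3F >> (len - 1)));
        for (int k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                                   // outside the BMP: surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Characters outside the Basic Multilingual Plane, such as emoji, become two UTF-16 code units (a surrogate pair), which is where UTF-16 differs from plain UCS-2.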

Also note that the average DataFlex developer is not a tool builder and will not work so close to the Windows APIs. We also tried to avoid overcomplicating the language by adding multiple ways to work with strings (initially WString was not in the plans). The reason WString is not called WideString is that we deliberately wanted to make it look a little geeky ;).

We’ll have a look at how UChar arrays are presented in the debugger; I did notice that they look a little funky with extended characters.

Regards,

wila
18-Nov-2019, 08:36 AM
Hi Harm,

Thanks for the reply.
Sorry, yes, this is about writing DLL wrappers, not the existing Windows API wrappers: DAW has wrapped most of the Windows API using the available wide-string versions, and the parts DAW did not wrap most likely also have a wide-string variant available, so there's no issue with those.
I wrote this post when waking up... and while I rewrote it a bit I forgot to adjust the subject line.

Obviously it can work without an AnsiString type, but things get really muddy once you start passing these strings as parameters to functions in your code.
With an AnsiString type it is clear what is in the string; without one... it can contain anything.



Harm wrote: "Also note that the average DataFlex developer is not a tool builder and will not work so close to the Windows APIs. We also tried to avoid overcomplicating the language by adding multiple ways to work with strings (initially WString was not in the plans)."

Which is why I understand that DAW only considered a wide-string variable type: as far as the Windows API is concerned, everything has simply moved over to wide strings, so you didn't need an AnsiString type to get this far.
Tool builders, and developers trying to understand their code, do however need a clearer separation.

In my view, by not wanting to overcomplicate the language, you are instead making the code we have to write more complicated whenever it interfaces with a component that has no Unicode support.

--
Wil

Frank Cheng
18-Nov-2019, 08:39 AM
It's a long road to DF 20 for us, since we do so much API wrapping. I also write a lot of API wrappers on a daily basis.

It would be nice to have a bit more documentation on the API front for "component writers". I can guess what those undocumented PointerToString/PointerToWString functions do, but it would be nice to have a section in the help showing EXACTLY what they do (and listing all the related functions for External_Function/API work). I can't even find them in FMAC. Yikes.
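For reference, and with the caveat that the exact behaviour of PointerToWString is not documented here, a pointer-to-wide-string helper presumably has to do something like the following: walk a null-terminated UTF-16 buffer and re-encode it as UTF-8. This C++ sketch is a guess at the work involved, not DAW's implementation; the function name is hypothetical and well-formed UTF-16 is assumed.

```cpp
#include <string>

// Hypothetical helper: copies a null-terminated UTF-16 buffer (e.g. filled in by
// a Windows W function) into a UTF-8 std::string. Assumes paired surrogates.
std::string utf16_ptr_to_utf8(const char16_t* p)
{
    std::string out;
    for (; *p != 0; ++p) {
        char32_t cp = *p;
        if (cp >= 0xD800 && cp <= 0xDBFF && p[1] >= 0xDC00 && p[1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (p[1] - 0xDC00);
            ++p;                                        // consumed the low surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```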

I don't even want to try to recompile our apps in DF 20, as we use tons of APIs (which deal exclusively with ANSI). Looking at the wrapper for PathRemoveExtensionW is enough to make me think about how many similar APIs I have in my apps and how long it will take to convert them all.

Frank Cheng

wila
18-Nov-2019, 08:47 AM
Frank,

Yep.

I did recompile The Hammer version 4 in DF20 and was surprised that it compiled just fine after a few minor changes.

The custom file open dialog we use in there no longer worked; well, it worked, but the selected files came back as gobbledygook.
OK, we only had it for multi-file support, so that was easy: I ripped it out and replaced it with DAW's new file open dialog.

Things instantly became less nice once I noticed that the treeview on the left now listed every method in "Chinese". That's when I got a bit of an "uh-oh" feeling and started to understand the implications of what is happening.
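A plausible explanation for the "Chinese" (an assumption, though it fits the later finding in this thread that it was a wide-string issue): if a control expecting UTF-16 is handed an 8-bit ANSI buffer, every two bytes are read as one 16-bit code unit. Two ASCII letters where the second one is lowercase always land in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The C++ sketch below reproduces that misreading; the function names are illustrative.

```cpp
#include <cstddef>
#include <string>

// Reinterprets an 8-bit (ANSI/ASCII) byte string as little-endian UTF-16 code
// units, which is what a wide-string control effectively does with an ANSI
// buffer. A trailing odd byte is ignored in this sketch.
std::u16string misread_as_utf16le(const std::string& bytes)
{
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}

// True for code units in the CJK Unified Ideographs block (U+4E00..U+9FFF).
bool is_cjk_ideograph(char16_t u) { return u >= 0x4E00 && u <= 0x9FFF; }
```

For example, the method keyword "Procedure" turns into the code units U+7250, U+636F, U+6465 and U+7275 (the final "e" is dropped here), all of them CJK ideographs.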

--
Wil

Frank Cheng
18-Nov-2019, 08:57 AM
Your app just got translated to Chinese for free, what more do you want? Lol, with tears...

Frank Cheng

Frank Cheng
18-Nov-2019, 10:27 AM
Here is another idea: write a wrapper DLL that wraps all the Windows APIs to take UTF-8 strings. It's a lot faster to do it in C++ than to convert our existing VDF API wrappers.

Frank Cheng

wila
18-Nov-2019, 11:18 AM
Frank,



But doesn't that only work if you have the source of all the components you use?

FWIW, the treeview issue in The Hammer 4 with DataFlex 20 has been resolved.
It was a wide-string thing: DAW has changed all the controls, so if you extend their functionality you must now pass any string as a wide string.

--
Wil

Frank Cheng
18-Nov-2019, 12:12 PM
Wil,

You don't have to have the source. Take PathRemoveExtension, for example: I definitely don't have the source for that function.

However, I could do this in VDF:


External_Function PathRemoveExtension "PathRemoveExtensionVDF" MyWin32API.dll Pointer pString Returns Integer


In C++, compiled with the UNICODE symbol defined:



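#include <windows.h>
#include <shlwapi.h>  // PathRemoveExtension; link with shlwapi.lib
// 32-bit builds: add a .def file so the name is exported undecorated (not _...@4)
EXTERN_C __declspec(dllexport) int WINAPI PathRemoveExtensionVDF(LPSTR lpString)
{
    WCHAR wszPath[MAX_PATH];
    int cbBuffer = lstrlenA(lpString) + 1;  // result is never longer than the input
    if (!MultiByteToWideChar(CP_UTF8, 0, lpString, -1, wszPath, MAX_PATH)) return 0;  // UTF-8 -> UTF-16
    PathRemoveExtension(wszPath);  // resolves to PathRemoveExtensionW because UNICODE is defined
    return WideCharToMultiByte(CP_UTF8, 0, wszPath, -1, lpString, cbBuffer, NULL, NULL);  // UTF-16 -> UTF-8, in place
}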


It still seems like quite a bit of work :p

Frank Cheng

wila
18-Nov-2019, 12:44 PM
Sure, you can add another layer of indirection, but I fail to see how that simplifies things, certainly when looking at it from a long-term perspective.
--
Wil

Frank Cheng
18-Nov-2019, 03:01 PM
The point of the layer of indirection (or abstraction) is that I can make the APIs work with OEM / UTF-8 without touching my VDF source files. I suspect it would take a few months to do a full regression test of my apps if we were to go to DF 20. (Hopefully we can use the same branch of VDF source code for both 19.1 and 20.)

Frank Cheng

wila
18-Nov-2019, 03:50 PM
Frank,

Of course I cannot speak for your code base, as I am completely unfamiliar with it, but to me it seems more logical to try and address this at the DataFlex level. If you think that is too difficult to do at the DataFlex level, then please speak up and tell DAW why and what you think they should do.

As for keeping one source base for both DF19.1 and DF20: that seems possible.
I'm doing exactly that now with The Hammer 4 and have not bumped into any problems, nor do I see issues for other applications.
For The Hammer 4, the lowest DataFlex version it will compile in is DF19.1 (or else it becomes too much work for my liking)

But do note that I edit with The Hammer 3, not with the DataFlex 20 Studio (no BOM markers in my source if I can prevent it).
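For anyone wondering what the BOM concern is about: a UTF-8 byte order mark is the three bytes EF BB BF at the very start of a file, and a tool that reads the file as OEM or ANSI sees them as three stray characters. A small C++ sketch for detecting and stripping it, for example in a script run over a workspace; the function name is illustrative and this is not how The Hammer or the Studio handle it.

```cpp
#include <string>

// Removes a leading UTF-8 byte order mark (EF BB BF) from text.
// Returns true if one was present.
bool strip_utf8_bom(std::string& text)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) {
        text.erase(0, bom.size());
        return true;
    }
    return false;
}
```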
--
Wil

Stephen W. Meeley
20-Dec-2019, 10:10 AM
Wil,

We've just published Technology Preview 2 (20.0.2) and the debugger should now deal with UChar arrays better. See the updated "Working with the DataFlex 2020 Technology Preview (https://www.dataaccess.eu/Working-with-the-DataFlex-2020-Technology-Preview.pdf-f736)" document for more information.

DaveR
15-Jan-2020, 06:23 PM
PMJI, but this is the only mention of ToANSI.

I use code that creates an atom to check whether a program is already running. I haven't found a replacement for 'ToAnsi' that doesn't give me an 'illegal datatype conversion' error (4381). Very frustrating.


//creates a New and Unique Atom, or returns False if This atom exists
Function CreateUniqueAtom String sAtom Returns Boolean
Integer hAtom
#IF (!@<200)
Move (ToANSI(sAtom)) to sAtom //19.1 working
#ELSE
Move (Utf8ToAnsi(sAtom)) to sAtom //always
#ENDIF
Move (szGlobalFindAtom(sAtom)) to hAtom
If (hAtom) Begin
Function_Return False
End
Move (szGlobalAddAtom(sAtom)) to hAtom
Function_Return True
End_Function

wila
15-Jan-2020, 06:41 PM
Dave,

Why are you using Utf8ToAnsi? You normally would not use that for Unicode.
For the DF2020 code you should have separate wide-char External_Function declarations for the GlobalFindAtom functions (GlobalFindAtomW, GlobalAddAtomW).

Edit:
PS: if needed, I'll be happy to rewrite those functions as Unicode- and 64-bit-compatible versions, but as you can guess I would have to charge for the time spent.

--
Wil

DaveR
16-Jan-2020, 09:23 AM
It wasn't a cry for help but a moan. :cool: Just seemed on topic.

I should say it's the first time I've opened the Studio, so everything is at its defaults. My DF20 life so far has been 'copy from 19.1 and mass compile' with my batch file.

We'll never need Unicode in my lifetime. I'm merely chasing down things that make noise when running under DF20. Among them, I'm trying to understand why, when the warning says 'Obsolete use of ToANSI, use Utf8ToAnsi or OemToUtf8 instead', either of those options makes the runtime fart when I use it...

DaveR
16-Jan-2020, 09:31 AM

And I answered my own issue: I missed the 'save source files in OEM' option added in Preview 2. Shouldn't the default for this be 'on'? Or at least, could it be a workspace setting, so that workspaces in version control are treated consistently by whichever machine handles them?
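On keeping workspaces consistent: a script can at least tell which source files are not UTF-8, because an OEM or ANSI file containing any non-ASCII character is almost never valid UTF-8 (for example, é is the single byte 0x82 in OEM code page 437). The C++ check below is a heuristic sketch, not what the Studio does; for brevity it rejects 2-byte overlong forms but not 3- and 4-byte overlongs or surrogates, and the function name is illustrative.

```cpp
#include <cstddef>
#include <string>

// Heuristic: true if bytes are structurally valid UTF-8. Pure ASCII passes,
// which is harmless because ASCII is identical in UTF-8, ANSI and OEM.
bool looks_like_utf8(const std::string& bytes)
{
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char c = bytes[i];
        int len = (c < 0x80) ? 1
                : (c >= 0xC2 && c <= 0xDF) ? 2
                : (c >= 0xE0 && c <= 0xEF) ? 3
                : (c >= 0xF0 && c <= 0xF4) ? 4 : 0;   // 0 = invalid lead byte
        if (len == 0 || i + len > bytes.size()) return false;
        for (int k = 1; k < len; ++k)                  // continuation bytes are 10xxxxxx
            if ((static_cast<unsigned char>(bytes[i + k]) & 0xC0) != 0x80) return false;
        i += len;
    }
    return true;
}
```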

wila
16-Jan-2020, 09:56 AM
Dave,

I wasn't reading it as a cry for help, but you did complain about why your code did not work.



Use Windows.pkg
Use CharTranslate.pkg

Procedure test
String sTest
Move "boo" to sTest
Move (utf8toansi(sTest)) to sTest
Send Info_Box sTest
End_Procedure

Send test
winput windowindex


This compiles and runs fine on DF20, in both 32-bit and 64-bit.

We never ever moan here, really honest, pinky promise. :rolleyes:

BTW, you might not need Unicode, but everything in DF20 is Unicode anyway.
It's a matter of going with the flow, as otherwise you are just creating more work for yourself down the line.
--
Wil

DaveR
16-Jan-2020, 10:02 AM

down the line is a rocking chair... :cool:

DaveR
16-Jan-2020, 11:05 AM

Weirdness on this continues. I'm going to break this out as another thread.