DF20 UCharArrayToString creates a different length string



wila
17-Aug-2021, 05:06 PM
Hi,

Jonas found the following issue.

Converting a UChar array with multiple zero bytes at the end produces a string of a different length on DF19.1 and earlier than it does on DF20.0.

One can argue that it is expected, but this is one of those surprises that can byte you.



Use Windows.pkg

// Discovered by Jonas Ekström
Procedure UCharArrayToString_Test1
    UChar[] uaCharacters
    String sResult
    Integer iExpectedLength

    Move (Ascii("t")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("e")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("s")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("t")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move 0 to uaCharacters[SizeOfArray(uaCharacters)]
    Move 0 to uaCharacters[SizeOfArray(uaCharacters)]
    Move (UCharArrayToString(uaCharacters)) to sResult

    // DataFlex 18.1 - 19.1 does not return the same result as 20.0
    // DFCompatibilityLayer returns the same result as 20.0
    Showln "Array has " (SizeOfArray(uaCharacters)) " characters"
    Showln "String has " (Length(sResult)) " characters"
End_Procedure

Send UCharArrayToString_Test1

Inkey WindowIndex


Run it on DF19.1 and the output is:


Array has 6 characters
String has 6 characters


Run it on DF20.0 and you'll get:


Array has 6 characters
String has 5 characters


--
Wil

wila
17-Aug-2021, 06:04 PM
Here's another variant..



Use Windows.pkg

Procedure UCharArrayToString_Test1
    UChar[] uaCharacters
    String sResult

    Move 239 to uaCharacters[SizeOfArray(uaCharacters)]
    Move 187 to uaCharacters[SizeOfArray(uaCharacters)]
    Move 191 to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("t")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("e")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("s")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("t")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move 0 to uaCharacters[SizeOfArray(uaCharacters)]
    Move 0 to uaCharacters[SizeOfArray(uaCharacters)]
    Move (UCharArrayToString(uaCharacters)) to sResult

    // DataFlex 18.1 - 19.1 does not return the same result as 20.0
    Showln "Array has " (SizeOfArray(uaCharacters)) " characters"
    Showln "String has " (Length(sResult)) " characters"
End_Procedure

Send UCharArrayToString_Test1

Inkey WindowIndex


Now the output on DF20 differs even more (hint: that's the BOM at the start).

edit... let me show the output as well (DF20 only)



Array has 9 characters
String has 6 characters


--
Wil

wila
17-Aug-2021, 06:11 PM
This one also does not return what you would expect...



Use Windows.pkg

Procedure UCharArrayToString_Test1
    UChar[] uaCharacters
    String sResult

    Move (Ascii("t")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("e")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("s")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("t")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move 0 to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("a")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("b")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("c")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move 0 to uaCharacters[SizeOfArray(uaCharacters)]
    Move (UCharArrayToString(uaCharacters)) to sResult

    Showln "Array has " (SizeOfArray(uaCharacters)) " characters"
    Showln "String has " (Length(sResult)) " characters"
End_Procedure

Send UCharArrayToString_Test1

Inkey WindowIndex


The output in DF19.1 is easy, but what is it in DF20?



Array has 9 characters
String has 8 characters


Nope, not what I had expected.

Just one of the zeros is ignored. Is the #0"a" transposed to a code point?

Note that I'm not calling ANY of the above a bug, just pointing out that things are not what you would expect when thinking in terms of good old DataFlex, and that binary data really should no longer be put in a string.
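
As a minimal sketch of what keeping binary data out of strings could look like: the bytes stay in the UChar array and are inspected there, so no UTF-8 decoding gets involved. The procedure name ZeroBytes_Test and its variables are made up for illustration, and this has not been run on every revision mentioned in this thread.

```dataflex
Use Windows.pkg

// Hypothetical test procedure, not one of the original tests.
Procedure ZeroBytes_Test
    UChar[] uaData
    Integer iIndex iByte iZeros

    // Same byte sequence as the test above: "test", 0, "abc", 0
    Move (Ascii("t")) to uaData[SizeOfArray(uaData)]
    Move (Ascii("e")) to uaData[SizeOfArray(uaData)]
    Move (Ascii("s")) to uaData[SizeOfArray(uaData)]
    Move (Ascii("t")) to uaData[SizeOfArray(uaData)]
    Move 0 to uaData[SizeOfArray(uaData)]
    Move (Ascii("a")) to uaData[SizeOfArray(uaData)]
    Move (Ascii("b")) to uaData[SizeOfArray(uaData)]
    Move (Ascii("c")) to uaData[SizeOfArray(uaData)]
    Move 0 to uaData[SizeOfArray(uaData)]

    // Count the zero bytes on the array itself, without converting to a String.
    Move 0 to iZeros
    For iIndex from 0 to (SizeOfArray(uaData) - 1)
        Move uaData[iIndex] to iByte
        If (iByte = 0) Increment iZeros
    Loop

    Showln "Array has " (SizeOfArray(uaData)) " bytes"
    Showln "Zero bytes found: " iZeros
End_Procedure

Send ZeroBytes_Test

Inkey WindowIndex
```

Because nothing is converted to a String here, the byte count should not depend on how a given revision treats zero bytes or multi-byte sequences.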
--
Wil

wila
17-Aug-2021, 06:37 PM
Ah... we can make it more fun by showing the contents of the string too.



Use Windows.pkg

Procedure UCharArrayToString_Test1
    UChar[] uaCharacters
    String sResult

    Move (Ascii("t")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("e")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("s")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("t")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move 0 to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("a")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("b")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move (Ascii("c")) to uaCharacters[SizeOfArray(uaCharacters)]
    Move 0 to uaCharacters[SizeOfArray(uaCharacters)]
    Move (UCharArrayToString(uaCharacters)) to sResult

    Showln "Array has " (SizeOfArray(uaCharacters)) " characters"
    Showln "String has " (Length(sResult)) " characters"
    Showln sResult
End_Procedure

Send UCharArrayToString_Test1

Inkey WindowIndex


So same as before except that we now also show the contents of the string..

and the result is... (again DF20)



Array has 9 characters
String has 8 characters
test


Clear as ... :)

FWIW here's the DF19.1 output


Array has 9 characters
String has 9 characters
test


--
Wil

Frank Cheng
17-Aug-2021, 10:49 PM
Hi Wil,

According to the online help, UCharArrayToString doesn't add a zero terminator if the UChar array's last byte is a zero terminator.
https://docs.dataaccess.com/dataflexhelp/#t=mergedProjects%2FLanguageReference%2FUCharArrayToString.htm

So that explains the behavior you saw.

However, if UCharArrayToString in DF 19.X and prior counted every character regardless, then that would be a compatibility issue.

Side note, this compiles fine:


UChar[] u
String s
Move "TEST" to s
Move (StringToUCharArray(s)) to u
Showln (SizeOfArray(u))


However, this gives me a compile error:


String s
Move "TEST" to s
Showln (SizeOfArray(StringToUCharArray(s))) // << ERROR: 54 Invalid symbol in expression Argument does not evaluate to array type


Frank Cheng

Vincent Oorsprong
18-Aug-2021, 01:12 AM
Frank,

The last code example fails to compile in 18.2 and 19.1 as well. I've logged it for improvement research.

wila
18-Aug-2021, 05:09 AM
Frank,

Nice find (both of them).

The reason I posted this was that Jonas's finding, a double zero at the end resulting in different string lengths, was weird.
Yes, that at the very least is a compatibility issue. Since the behavior is documented, we cannot call that a bug.

Of course with strings now being UTF8 and no longer a "bag of bytes" there will be a lot of compatibility issues if you try to use it to store binary data.
Initially it might seem to work & pass a few tests, but ultimately you'll find out that it does not work and that doing so is a "bad idea" (tm)

So the other posts were more to drive awareness that things have changed, and this applies even if you only use one language in your applications.
It was mostly just me playing "what if" with a few strings and seeing the differences between DF20 and DF19.1.
--
Wil

Frank Cheng
18-Aug-2021, 07:27 AM
Hi Wil,

Even though it's documented, it still seems fishy to me as I expect "UCharArrayToString" to be the exact opposite of "StringToUCharArray".


String s
UChar[] u
Integer i
Move (Repeat(Character(0),10)) to s
For i From 1 to 10
    Move (StringToUCharArray(s)) to u
    Move (UCharArrayToString(u)) to s // Will contain only 9 "Character(0)", repeat the translation 9 more times, then you get a zero-length string
Loop


Frank Cheng

wila
18-Aug-2021, 02:52 PM
Peculiar..

btw.. what's the value of i after that loop... do you know (without looking it up)?

--
Wil

Frank Cheng
18-Aug-2021, 03:16 PM
I would say 11 without actually testing it. I know that to be true in C++ and C#. I haven't verified that in other languages.

Frank Cheng

wila
18-Aug-2021, 03:31 PM
Yes, correct. 10 points, I mean 11, or well.. and it makes sense as well as it is the first value that is out of the range, but it is not literally what the code says.
--
Wil

Harm Wibier
31-Aug-2021, 02:43 AM
This will be fixed for next rev.

wila
31-Aug-2021, 05:00 AM
Thanks Harm,


"This will be fixed for next rev."

As it was documented behavior (as Frank pointed out), and a variety of things have been pointed out as working differently because of Unicode, can you tell us what will be fixed?

Is it the double zero not being added, i.e. the change in behavior?

--
Wil

Harm Wibier
31-Aug-2021, 07:25 AM
We'll remove the logic that removes the last byte if it is a 0 byte (and change the DOC accordingly).
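
A quick way to check the round trip once that logic is gone might look like the sketch below. It reuses the ten-zero string from Frank's example and does a single conversion in each direction; RoundTrip_Test is a made-up name, and no output is shown because it depends on the build being tested.

```dataflex
Use Windows.pkg

// Hypothetical check: String -> UChar[] -> String should preserve the length.
Procedure RoundTrip_Test
    String sIn sOut
    UChar[] uaBytes

    Move (Repeat(Character(0), 10)) to sIn
    Move (StringToUCharArray(sIn)) to uaBytes
    Move (UCharArrayToString(uaBytes)) to sOut

    Showln "Before: " (Length(sIn)) " characters"
    Showln "Array:  " (SizeOfArray(uaBytes)) " bytes"
    Showln "After:  " (Length(sOut)) " characters"
End_Procedure

Send RoundTrip_Test

Inkey WindowIndex
```

If "Before" and "After" differ, the conversion is still dropping or adding something along the way.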

wila
31-Aug-2021, 09:57 AM
"We'll remove the logic that removes the last byte if it is 0 byte (and change the DOC accordingly)."

Cool!

Thanks Harm.
--
Wil

Marcia Booth
11-Sep-2021, 04:52 PM
Hello! We have released an updated build of DataFlex 2021 (20.0.7.152) that addresses various issues reported after the release of DataFlex 2021. The post on this release (https://support.dataaccess.com/Forums/showthread.php?67724-DataFlex-2021-Updated-Release-(20-0-7-152)-Published-Update-Now!) includes links to the full release notes as well as to downloading the new installation programs.


The issue you reported here has been addressed in this new build. Thank you for helping us make DataFlex a better product for the whole community!

wila
13-Sep-2021, 08:01 AM
Tested again and can confirm it is fixed.

One interesting side note is that in one of my tests I also included a BOM and that it is counted as 1 character.
I think that makes perfect sense (just pointing it out).
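
For anyone who wants to reproduce the BOM observation on its own, here is a minimal sketch. The three bytes 239, 187, 191 (EF BB BF) are the UTF-8 encoding of the single code point U+FEFF, which is why a count of one character is what I would expect. BomLength_Test is a name made up for this example.

```dataflex
Use Windows.pkg

// Hypothetical test: a UChar array holding only the UTF-8 BOM.
Procedure BomLength_Test
    UChar[] uaBom
    String sBom

    Move 239 to uaBom[SizeOfArray(uaBom)] // 0xEF
    Move 187 to uaBom[SizeOfArray(uaBom)] // 0xBB
    Move 191 to uaBom[SizeOfArray(uaBom)] // 0xBF
    Move (UCharArrayToString(uaBom)) to sBom

    Showln "Array has " (SizeOfArray(uaBom)) " bytes"
    Showln "String has " (Length(sBom)) " characters"
End_Procedure

Send BomLength_Test

Inkey WindowIndex
```

On DF19.1 and earlier, where strings are not UTF-8, the string length will presumably follow the byte count instead.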

--
Wil

Harm Wibier
21-Sep-2021, 03:00 AM
Thanks Wil!