Sture,

You can read your UTF-8 string into a uChar array and then parse it, once you know the rules for the encoding.

See Here

The value of each individual byte indicates its UTF-8 function, as follows:

  • 00 to 7F hex (0 to 127): first and only byte of a sequence.
  • 80 to BF hex (128 to 191): continuing byte in a multi-byte sequence.
  • C2 to DF hex (194 to 223): first byte of a two-byte sequence.
  • E0 to EF hex (224 to 239): first byte of a three-byte sequence.
  • F0 to F4 hex (240 to 244): first byte of a four-byte sequence.
  • C0, C1 and F5 to FF hex (192, 193 and 245 to 255): never valid in UTF-8.
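
As a rough illustration of how those byte ranges turn into a parser, here is a minimal sketch in C. The names utf8_seq_len and utf8_decode are made up for this example and do not come from any library, and plain unsigned char stands in for your uChar type. It only checks the lead-byte ranges and the 80 to BF continuation range; it does not reject overlong three- and four-byte forms, surrogates, or values above 10FFFF, so treat it as a starting point rather than a full validator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes in the sequence that starts with
   lead byte b, or 0 if b cannot start a sequence. */
static int utf8_seq_len(unsigned char b)
{
    if (b <= 0x7F) return 1;              /* 00..7F: single byte (ASCII) */
    if (b >= 0xC2 && b <= 0xDF) return 2; /* C2..DF: two-byte lead */
    if (b >= 0xE0 && b <= 0xEF) return 3; /* E0..EF: three-byte lead */
    if (b >= 0xF0 && b <= 0xF4) return 4; /* F0..F4: four-byte lead */
    return 0;                             /* 80..BF, C0, C1, F5..FF */
}

/* Hypothetical helper: decode one code point from buf[0..len).
   Stores it in *cp and returns the number of bytes consumed,
   or 0 if the input is malformed or truncated. */
static size_t utf8_decode(const unsigned char *buf, size_t len, uint32_t *cp)
{
    /* Payload bits kept from the lead byte for lengths 1..4 */
    static const unsigned char lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};

    assert(buf != NULL && cp != NULL);
    if (len == 0) return 0;

    int n = utf8_seq_len(buf[0]);
    if (n == 0 || (size_t)n > len) return 0;

    uint32_t value = buf[0] & lead_mask[n];
    for (int i = 1; i < n; i++) {
        if ((buf[i] & 0xC0) != 0x80) return 0;  /* must be 80..BF */
        value = (value << 6) | (buf[i] & 0x3F); /* append 6 payload bits */
    }
    *cp = value;
    return (size_t)n;
}
```

To walk a whole buffer, call utf8_decode in a loop and advance by the returned count; a return of 0 means you hit a byte that does not fit the rules above, and you can decide whether to stop or skip one byte and resynchronise.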

UTF-8 remains a simple, single-byte, ASCII-compatible encoding as long as no characters above 127 are directly present. This means that an HTML document declared as UTF-8 can remain a normal single-byte ASCII file. It can stay that way even if it contains Unicode characters above 127, as long as all of those characters are written indirectly as ampersand entities (character references such as &#233; or &eacute;).
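
If you want to know up front whether a buffer falls into that plain-ASCII case, a check along these lines should do. The function name is_pure_ascii is invented for this sketch, and it assumes the bytes are already in an unsigned char array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every byte is in 00..7F, i.e. the buffer
   is valid ASCII and therefore also valid single-byte UTF-8. */
static bool is_pure_ascii(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F) return false; /* 80..FF: part of a multi-byte sequence */
    }
    return true;
}
```

When this returns true you can treat each byte as one character and skip multi-byte parsing altogether.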