Unicode

MoneyWorks 7 stores all text internally using the Unicode UTF-8 encoding.

This allows Roman text outside the ASCII range to be exchanged correctly between Mac and Windows (previously it was not translated between MacRoman and WinLatin). It also allows input and output of non-Roman text (Chinese, Japanese, exotic symbols, emoji, etc.).
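To see why untranslated text was a problem, here is a small Python sketch. Python's built-in `mac_roman` and `cp1252` codecs stand in for MacRoman and WinLatin. It shows that the same byte decodes to different characters under each codepage, while UTF-8 gives one byte sequence per character on every platform.

```python
# The same single byte means different characters in the two legacy codepages.
raw = bytes([0x8E])
mac_char = raw.decode("mac_roman")  # MacRoman interpretation
win_char = raw.decode("cp1252")     # WinLatin (Windows-1252) interpretation

# UTF-8 gives one platform-independent byte sequence per character.
utf8_bytes = "\u00e9".encode("utf-8")  # "é"

print(mac_char, win_char, utf8_bytes.hex())
```

A file written on a Mac containing "é" (byte 0x8E) would therefore have shown up as "Ž" on Windows.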

The main areas that will affect developers and advanced users are import/export and a few rare cases of calculations involving text.

Importing

XML is always UTF-8.

Plain-text import assumes that text files are UTF-8. If a byte sequence that is not valid UTF-8 is found, the text is reinterpreted using the default system codepage (usually MacRoman on Mac or WinLatin on Windows).
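The fallback rule can be sketched in Python as follows. This is an illustration of the behaviour described above, not MoneyWorks' implementation. The function name `decode_import_text` is hypothetical, and the `fallback` parameter stands in for the system codepage.

```python
def decode_import_text(data: bytes, fallback: str = "mac_roman") -> str:
    """Decode imported bytes as UTF-8, falling back to a legacy codepage.

    `fallback` is a stand-in for the system codepage: "mac_roman" on a Mac,
    "cp1252" (WinLatin) on Windows. Hypothetical helper for illustration.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Reinterpret the whole file using the legacy codepage instead.
        return data.decode(fallback)
```

For example, `decode_import_text(b"caf\xe9", "cp1252")` fails the UTF-8 check (0xE9 is not followed by continuation bytes) and is reread as WinLatin, giving "café".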

Calculations

In general, most things should just work as expected.

Char() was originally added to support hand-encoding of Code128 barcodes. It is now a general codepoint-to-character function: it takes a 32-bit Unicode code point and returns the corresponding character as a UTF-8 string.
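Char(x) is conceptually like Python's `chr(x)` followed by UTF-8 encoding. The sketch below spells out the standard UTF-8 rules for turning one code point into 1 to 4 bytes. Rejecting surrogates and out-of-range values is this sketch's choice; how MoneyWorks handles such inputs is not stated here.

```python
def utf8_encode_codepoint(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # up to 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For instance, the euro sign U+20AC encodes to the three bytes E2 82 AC, and the emoji U+1F600 to the four bytes F0 9F 98 80.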

Code128() returns strings whose characters lie in the codepoint range 0 to 106, entirely within ASCII. Since UTF-8 encodes every ASCII character as the same single byte, there is no issue.
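A one-line Python check of that claim: every code point in the Code128() range encodes to a single UTF-8 byte equal to the code point itself.

```python
# Every code point Code128() can return (0 to 106) is below 128 (ASCII),
# so each encodes as one UTF-8 byte with the same value.
single_byte = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(107))
print(single_byte)  # True
```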

The following string functions take their arguments and return their results in Unicode code point counts, not bytes:

Length(). Note that Length(Char(x)) is 1 for any codepoint x.

Left(), Mid(), Right(), Pad(), PositionInText().

These operate on Unicode characters (code points), regardless of how many bytes are used to encode them.
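Python strings are also indexed by code point, so they make a convenient model of the difference between counting characters and counting bytes. This is an analogy only: the slicing below is Python's, not the exact argument conventions of Left() or Mid().

```python
text = "Caf\u00e9 \u2615"  # "Café ☕"

# Length() counts code points, like Python's len() on a str ...
codepoints = len(text)                # 6
# ... whereas the UTF-8 encoding is longer: "é" takes 2 bytes, "☕" takes 3.
utf8_len = len(text.encode("utf-8"))  # 9

# Taking the first 4 characters works on code points, so "é" stays whole.
left4 = text[:4]                      # "Café"

# Cutting the raw bytes at 4 would split "é" in half (invalid UTF-8).
broken = text.encode("utf-8")[:4]

# The analogue of Length(Char(x)) == 1: one code point, whatever its byte length.
one_each = all(len(chr(x)) == 1 for x in (0x41, 0xE9, 0x2615, 0x1F600))
```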

Cases that depend on the encoding:

HexEncode(): this exposes the UTF-8 encoding of the text.
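A Python sketch of what "exposes the UTF-8 encoding" means: the hex digits are those of the UTF-8 bytes, so a non-ASCII character produces more than two digits. The function name is hypothetical, and lowercase digits are Python's default; MoneyWorks' digit case is not specified here.

```python
def hex_encode_utf8(text: str) -> str:
    # Hex digits of the UTF-8 bytes (lowercase is an assumption of this sketch).
    return text.encode("utf-8").hex()

print(hex_encode_utf8("A\u00e9"))          # "41c3a9": "é" becomes two bytes
print("A\u00e9".encode("mac_roman").hex())  # "418e": one byte in MacRoman
```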

Because the output of HexDecode() is a text string, decoding to invalid UTF-8 is treated as an error: invalid characters are transcoded as '?'.
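The '?' substitution can be modelled in Python with a custom codec error handler. This is a sketch under one assumption: one '?' is produced per invalid sequence as Python's UTF-8 decoder delimits them, and MoneyWorks may count invalid bytes differently. The names `hex_decode_utf8` and `question_mark` are hypothetical.

```python
import codecs

def _question_mark(err: UnicodeDecodeError):
    # Replace the offending bytes with "?" and resume decoding after them.
    return ("?", err.end)

codecs.register_error("question_mark", _question_mark)

def hex_decode_utf8(hex_text: str) -> str:
    """Turn hex digits back into text, replacing invalid UTF-8 with '?'."""
    return bytes.fromhex(hex_text).decode("utf-8", errors="question_mark")
```

With this sketch, `hex_decode_utf8("41c3a9")` gives "Aé", while `hex_decode_utf8("41ff42")` gives "A?B", because 0xFF can never appear in UTF-8.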

Checksum() checksums the UTF-8 encoding of the text, so its result will differ from one calculated by v6 or earlier if the text contains any non-ASCII characters.
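The reason is that the bytes being checksummed change. In the Python sketch below, `zlib.crc32` is only a stand-in, since the algorithm behind Checksum() is not described here, and MacRoman stands in for the v6 encoding. ASCII text has identical bytes in both encodings, but "é" is C3 A9 in UTF-8 and 8E in MacRoman.

```python
import zlib

def checksum(data: bytes) -> int:
    # Stand-in algorithm (CRC-32); what matters is which bytes are checksummed.
    return zlib.crc32(data)

ascii_text, accented = "cafe", "caf\u00e9"

# ASCII text: same bytes in UTF-8 and MacRoman, so the checksums agree.
same = checksum(ascii_text.encode("utf-8")) == checksum(ascii_text.encode("mac_roman"))
# Non-ASCII text: different bytes, so the checksums differ.
differ = checksum(accented.encode("utf-8")) != checksum(accented.encode("mac_roman"))
```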

Slice() and Dice() delimiters must be ASCII (i.e. single-byte encoded).
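One property that makes ASCII delimiters convenient is that they can be matched byte by byte in UTF-8 text without ever landing inside a multi-byte character. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, and every ASCII byte is below 0x80. The hypothetical `split_utf8_on_ascii` below illustrates this and does not reproduce the semantics of Slice() or Dice().

```python
def split_utf8_on_ascii(data: bytes, delim: str) -> list[str]:
    """Split UTF-8 bytes on an ASCII delimiter and decode each piece.

    ASCII bytes (< 0x80) never occur inside a multi-byte UTF-8 sequence,
    so every piece is guaranteed to be valid UTF-8 on its own.
    """
    if not delim or not delim.isascii():
        raise ValueError("delimiter must be non-empty ASCII")
    return [part.decode("utf-8") for part in data.split(delim.encode("ascii"))]

fields = split_utf8_on_ascii("caf\u00e9,na\u00efve,\u2615".encode("utf-8"), ",")
print(fields)  # ['café', 'naïve', '☕']
```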

Sort() currently performs a bytewise sort, so it does not respect accents or other language-specific collation rules.
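A Python sketch of what a bytewise sort of UTF-8 text produces, assuming Sort() compares raw bytes the way Python compares `bytes` objects. For UTF-8 this is the same as code point order: uppercase ASCII sorts before lowercase, and accented letters sort after "z".

```python
words = ["\u00e9clair", "zebra", "apple", "Zebra"]

# Compare the UTF-8 bytes of each word; for UTF-8 this equals code point order.
bytewise = sorted(words, key=lambda w: w.encode("utf-8"))
print(bytewise)  # ['Zebra', 'apple', 'zebra', 'éclair']
```

A locale-aware collation would instead place "éclair" near the other "e" words.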
