[unixODBC-support] How driver manager converts between Unicode and ANSI
nick at lurcher.org
Thu Apr 11 21:46:16 BST 2013
On 11/04/13 17:32, Daniel Vogelbacher wrote:
> as far as I understand the official ODBC spec, the DM must convert
> between wide-strings and ansi-strings. On Windows this is done by
> converting from unicode to the current code page (locale setting) and
> vice versa.
> For example, if I use an ANSI-only driver and call SQLExecDirectW(),
> the unicode string gets converted to my local code page (iso8859-1 or
> something else) and passed to the driver's SQLExecDirect().
> In the real world, I discovered two issues:
> 1.) Most drivers' ANSI functions expect strings not in code page
> encoding, but in a driver-specific encoding, for example a character
> set specified inside the DSN (like CharSet=utf8).
> If a user loads an ANSI-only driver which expects strings in encoding
> XY, how does the DM know about that to perform the correct
> conversion between unicode and XY? (this is more a windows issue, but
> related to the next issue)
> 2.) The DM from unixODBC seems to do something totally curious when
> converting between unicode and ansi. I expected that it uses
> mbstowcs() & co. for conversion regarding the locale setting
> (en_US.utf8 or something else).
> But a lot of tests and a final look into the code later I discovered
> that the DM just chooses iso8859-1... ?!
> This breaks the usage of the wide API on the application side with an
> ANSI-only driver (like sqliteodbc) which expects UTF-8 strings.
> Is this really intended?
> But even if the DM used the locale information (as I expected),
> there is still issue no. 1 for drivers which expect a specific
> charset (like the sqlite odbc driver).
> I hope someone could help me with this. It's very confusing.
TBH, your questions mirror the confusion and compromises that are
involved. The default is ISO 8859-1, but as you say Windows does much
the same. You can specify other iconv targets when you configure the
build, but I had to pick something for a default.
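To make the effect of an ISO 8859-1 default concrete, here is a minimal
sketch in C. It is not the driver manager's code (which goes through
iconv when that is available); demo_narrow_latin1 is a hypothetical
helper, and the 16-bit input unit is an assumption. It shows that 'é'
comes out as the single byte 0xE9, which is not valid UTF-8 (that would
be 0xC3 0xA9), so a driver expecting UTF-8 sees garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow NUL-terminated 16-bit units to ISO 8859-1.
   Code points below 0x100 map to the same byte value; anything else
   cannot be represented and is replaced by '?'.
   Returns the number of bytes written, excluding the terminator. */
size_t demo_narrow_latin1(const uint16_t *src, char *dst, size_t dst_cap)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    if (dst_cap == 0)
        return 0;

    while (src[i] != 0 && i + 1 < dst_cap) {
        dst[i] = (src[i] < 0x100) ? (char)(unsigned char)src[i] : '?';
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

Anything outside the first 256 code points (the euro sign, say) is
simply lost here, which is the price of any single-byte default.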
It can't use mbstowcs() because sizeof( SQLWCHAR ) != sizeof( wchar_t ).
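A small sketch of what that size mismatch means in practice, assuming
a 16-bit SQLWCHAR (the usual unixODBC default) and a platform where
wchar_t is wider, as on most Linux systems. The names demo_sqlwchar and
demo_widen are made up so the example compiles without sql.h. A buffer
of 16-bit units cannot be handed to the wcs*/mbs* functions as it is;
each unit has to be copied into a wchar_t first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Stand-in for SQLWCHAR; assumed to be a 16-bit unit. */
typedef uint16_t demo_sqlwchar;

/* Copy NUL-terminated 16-bit units into a wchar_t buffer, one unit per
   wchar_t. Surrogate pairs are not combined, so this assumes input
   limited to the Basic Multilingual Plane.
   Returns the number of characters written, excluding the terminator. */
size_t demo_widen(const demo_sqlwchar *src, wchar_t *dst, size_t dst_cap)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    if (dst_cap == 0)
        return 0;

    while (src[i] != 0 && i + 1 < dst_cap) {
        dst[i] = (wchar_t)src[i];
        i++;
    }
    dst[i] = L'\0';
    return i;
}
```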
But you finally point out the real problem: it doesn't matter what the
driver manager uses, as the driver may ignore all that and do something
else. And of course, the driver manager can only convert those bits it
has access to; call SQLGetData( SQL_C_CHAR ) on a unicode column and
the driver manager has no say in what happens.
And then there are the multibyte encodings like UTF-8. The Easysoft
drivers have options to use UTF-8, and so do others, but unlike other
DMs unixODBC doesn't treat them as WCHAR types; there is no point, and
it would contradict X/Open if it did. But you still have the problem of
what to do with partial reads breaking a character sequence.
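One way an application might cope with a partial read that ends in the
middle of a UTF-8 character is sketched below. This is an assumption
about what a caller could do, not something the driver manager or any
particular driver does; demo_utf8_complete_prefix is a hypothetical
helper. It reports how many leading bytes of a chunk form whole
characters, so the remaining bytes can be held back and prepended to
the next chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest n <= len such that buf[0..n) does not end inside
   a UTF-8 multibyte sequence. Assumes otherwise valid UTF-8 cut at an
   arbitrary byte; malformed input is returned unchanged (n == len). */
size_t demo_utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t p = len;
    size_t steps = 0;

    assert(buf != NULL || len == 0);

    /* Walk back over at most three continuation bytes (10xxxxxx). */
    while (p > 0 && steps < 3 && (buf[p - 1] & 0xC0) == 0x80) {
        p--;
        steps++;
    }
    if (p == 0)
        return len; /* empty, or continuation bytes only */

    unsigned char lead = buf[p - 1];
    size_t need;
    if (lead < 0x80)                need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return len; /* not a valid lead byte: leave it to the caller */

    size_t have = len - (p - 1);
    return (have < need) ? (p - 1) : len;
}
```

The caller would pass on the first n bytes and carry the remaining
len - n bytes (at most three) over to the next read.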
As you say, it's confusing, but I don't know of any way of simplifying
it without breaking something or losing something that someone needs.