[unixODBC-dev] ANSI to Unicode mapping issues (resend)

Nick Gorham nick at lurcher.org
Fri May 2 13:51:54 BST 2014


On 02/05/14 02:31, David Brown wrote:
> (Pardon me if this is a duplicate - I tried sending it a few days ago 
> from a different address, but it didn't appear to go through)
>
> We have been building and shipping an older ANSI version of our ODBC driver
> (StarSQL) in Unix/Linux environments. We recently ported our current Unicode
> ODBC driver (which has been running on Windows for several years) to Linux,
> and ran into some issues that appear to be related to the unixODBC Driver
> Manager mappings from ANSI entry point to the driver's Unicode entry points
> when an ANSI application invokes ODBC calls to a Unicode driver.
>
> Has anyone else encountered any of these issues?  Thoughts on a solution?

I had a look at this over lunch.
>
> We are using the 2.3.2 release.
>
> Here is a list of the issues encountered by the developer of our driver:
>
> 1)      The Driver Manager does not map calls from an ANSI application's
> call to SQLGet/SetStmtOption to a Unicode driver's SQLGet/SetStmtAttrW entry
> points. It only does the mapping to SQLGet/SetStmtAttr for ANSI drivers. We
> were able to work around this by adding SQLGet/SetStmtOption function entry
> points in our driver, but we shouldn't have to do that.

Not sure whether by "Driver Manager" you mean unixODBC here, but if you
do, I am not sure this is right. The DM code for SQLSetStmtOption does
have a

else if ( CHECK_SQLSETSTMTATTRW( statement -> connection ))

branch, which should (or at least that's the intent) map that call to
SQLSetStmtAttrW.

SQLGetStmtOption is missing the mapping; I will add that.
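
As a rough illustration of the fallback order described above, here is a
self-contained sketch. It is not the unixODBC source: the struct, the
function names and the error constant are hypothetical stand-ins for the
driver's entry-point table, the CHECK_* macros and an IM001 error, and the
order shown (SQLSetStmtOption, then SQLSetStmtAttr, then SQLSetStmtAttrW)
is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one driver entry point. */
typedef int (*set_fn)(void *stmt, int option, long value);

/* Hypothetical entry-point table; NULL means the driver does not export it. */
struct driver_entry_points {
    set_fn set_stmt_option;  /* SQLSetStmtOption (ODBC 2.x) */
    set_fn set_stmt_attr;    /* SQLSetStmtAttr (ANSI) */
    set_fn set_stmt_attr_w;  /* SQLSetStmtAttrW (Unicode) */
};

enum { DISPATCH_NOT_SUPPORTED = -1 };  /* stands in for an IM001 error */

/* Route an ANSI application's SQLSetStmtOption call to whichever
   entry point the driver provides, falling back to the W variant. */
int dispatch_set_stmt_option(const struct driver_entry_points *drv,
                             void *stmt, int option, long value)
{
    assert(drv != NULL);

    if (drv->set_stmt_option)
        return drv->set_stmt_option(stmt, option, value);
    else if (drv->set_stmt_attr)
        return drv->set_stmt_attr(stmt, option, value);
    else if (drv->set_stmt_attr_w)
        return drv->set_stmt_attr_w(stmt, option, value);

    return DISPATCH_NOT_SUPPORTED;
}

/* Example Unicode-only driver stub: records the value in *stmt. */
int example_set_attr_w(void *stmt, int option, long value)
{
    (void)option;
    *(long *)stmt = value;
    return 0;
}
```

With a table that only fills in set_stmt_attr_w, the call lands in the
Unicode entry point; with an empty table it reports "not supported". The
same three-way fallback is what SQLGetStmtOption would need.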


>
> 2)      SQLSetDescField does not alter the length supplied by the
> application ("buffer_length") when the field supplied is a string whose
> value gets converted to Unicode before being passed to the Unicode Driver.
> In this particular Unicode ODBC API, the buffer_length should be a
> byte-count, not a character-count. The implementation of SQLGetDescField in
> the unixODBC driver manager does deal with this better and divides
> string_length by sizeof(SQLWCHAR) before returning to the application. That
> works better, but is too simplistic for multi-byte ANSI data (e.g. UTF-8).
> See #3.
>
> 3)      Conversions between Unicode and ANSI are almost universally assuming
> that one byte of ANSI data will produce two bytes of Unicode data (when
> sizeof(SQLWCHAR) is 2). The code needs to check the length of the resulting
> string (ANSI or Unicode) whenever such a conversion occurs and then use the
> resulting length when passing it on to the driver or calling application.
> Functions like the ANSI versions of SQLPrepare and SQLExecDirect can't
> just perform an ansi-to-unicode translation and then pass the application
> supplied length to the Unicode driver.
>
>
> Looking at the unixODBC code, it seems clear that we were exposed to similar
> issues with our old ANSI driver when called from a Unicode application.
> Applications using parameter markers rather than string literals would be
> less sensitive to the limitations of the current unixODBC driver manager
> implementation since keywords and identifiers are less likely to contain
> "problematic" characters, but it would seem important to address this
> nonetheless.
>
> Any suggestions would be appreciated.

Looking at the code, I can see it's possible to have the calls via iconv
modify the buffer length before it's passed to the driver for the
SQLSet* and SQLPrepare type of calls. It gets a bit (lot) more of a
problem if we want the driver manager to do the same thing going back.
In the simple case it's fine: I can convert from W to A, get the new
length, and pass that length to the app. But what about the cases where
the target buffer is not large enough? I can't convert data I don't
have, so the length in that case will be a guess at best.
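
To make the length question concrete, here is a minimal sketch using the
standard iconv calls. The helper name convert_bytes and its truncated flag
are hypothetical and not part of unixODBC; the encoding names assume an
iconv implementation that accepts "UTF-8" and "UTF-16LE". It returns the
number of bytes actually produced instead of assuming one ANSI byte maps
to sizeof(SQLWCHAR) bytes.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in_bytes of `from`-encoded data into out.
   Returns the number of bytes written, or (size_t)-1 on a conversion error.
   *truncated is set to 1 when out was too small to hold everything. */
size_t convert_bytes(const char *to, const char *from,
                     const char *in, size_t in_bytes,
                     char *out, size_t out_cap, int *truncated)
{
    assert(in != NULL && out != NULL && truncated != NULL);
    *truncated = 0;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;  /* iconv takes char **, input is only read */
    char *dst = out;
    size_t src_left = in_bytes;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    int err = errno;         /* save before iconv_close can change it */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        if (err != E2BIG)
            return (size_t)-1;  /* invalid or incomplete input */
        *truncated = 1;         /* only part of the input was converted */
    }
    return out_cap - dst_left;
}
```

Converting the five UTF-8 bytes of "café" to UTF-16LE yields 8 bytes, not
the 10 that a "bytes times two" rule would pass on. When the target buffer
is too small, the function can report how much fitted and that truncation
happened, but the full converted length is still unknown without converting
the rest of the data somewhere else.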

Generally, this is another case where ODBC and UTF encodings are a problem.

What have you found the Microsoft DM does under these conditions? Does
it handle (for example) converting UTF-8 to UCS-2 or UTF-16 with the
change in length, or does it avoid the problem?

-- 
Nick

