[unixODBC-support] Unicode Support for UCS-2 databases
Ingmar Koecher [ NETIKUS.NET ltd ]
ingmar.koecher at netikus.net
Sun Jul 26 02:59:36 BST 2009
> >>>> -----Original Message-----
> >>>> From: unixodbc-support-bounces at mailman.unixodbc.org
> >>>> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On
> >>>> Behalf Of Nick Gorham
> >>>> Sent: Friday, July 24, 2009 5:55 PM
> >>>> To: Support for the unixODBC project
> >>>> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
> >>>>
> >>>> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
> >>>>
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I am having some difficulties adding UTF-16 encoded data to UCS-2
> >>>>> databases (e.g. SQL Server) using unixODBC. Most of the problems
> >>>>> seem to appear as soon as I attempt to use any of the W()
> >>>>> functions (e.g. SQLDriverConnectW()) as opposed to the ASCII
> >>>>> counterparts.
> >>>>>
> >>>>> If I read a UTF-8 encoded file on Linux, for example, and add it
> >>>>> to a MySQL UTF-8 database, then it will work and I don't even
> >>>>> have to do anything (other than enclosing the field with N'').
> >>>>> So this works well.
> >>>>>
> >>>>> If I try to write to a UTF-16/UCS-2 database however, I start
> >>>>> having all sorts of problems. If I store a SQL statement:
> >>>>>
> >>>>> wchar_t sqlStmt[] = L"INSERT INTO MyTable (field) values (?)";
> >>>>>
> >>>>> then SQLExecDirectW will complain (or rather the database will),
> >>>>> as it only sees the first character, the "I". Almost as if it's
> >>>>> expecting an ASCII string.
> >>>>>
> >>>>> However, at this point I can't even connect using
> >>>>> SQLDriverConnectW() when passing a wchar_t string:
> >>>>>
> >>>>> SQLDriverConnectW (hdbc, 0, (SQLWCHAR *) L"MyDsnName", ....);
> >>>>>
> >>>>> as it complains that it's not a valid DSN. My guess is that it's
> >>>>> only looking at the first character here as well - or does the
> >>>>> odbc.ini file actually need to be UTF-16 encoded? Is there
> >>>>> something else that I need to do to get this to work?
> >>>>>
> >>>>> Is there any sample code that shows how to deal with UTF-16/UCS-2
> >>>>> data, or is this not very common?
> >>>>>
> >>>>> I'm pretty much at a loss here. The biggest problem I cannot seem
> >>>>> to resolve is how I can get UTF-8 data on non-Windows platforms
> >>>>> into a UCS-2 database. I can convert the UTF-8 string into a
> >>>>> UTF-16 string, but that's about it.
> >>>>>
> >>>>> I've tried to find information about wchar_t handling in
> >>>>> unixODBC, and the ....W() functions, with little success though.
> >>>>>
> >>>>> Any insight that can be provided would be greatly appreciated.
> >>>>>
> >>>>>
> >>>>> Thank you,
> >>>>> Ingmar.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>> I would check, but wchar_t is often 32 bits, not the 16 bits
> >>>> ODBC expects.
> >>>>
> >>>> --
> >>>> Nick
> >>>>
> >>>>
> >>> Thanks Nick. Yes, wchar_t is 4 bytes on OS X (and Linux as well, it
> >>> appears), so I suppose that could be an issue. I thought, however,
> >>> that the ODBC implementation would take that into consideration. I
> >>> figured that the ....W() functions on OS X / Linux would work
> >>> correctly with a wchar_t, regardless of its storage size.
> >>>
> >>> Am I mistaken?
> >>>
> >>>
> >> Yep, sorry, it's 16 bit as in Windows. At least for unixODBC; other
> >> driver managers seem to vary from 8 to 32 bit. You can build unixODBC
> >> to use 4 byte unicode, but then you need to find drivers that do the
> >> same.
> >>
> >
> > Thanks for your responses.
> >
> > OK, that explains why I'm having so many issues. I just figured that
> > the driver manager would adapt to whichever platform I am using, and
> > convert to the driver accordingly. But I guess that's not the case.
> >
> > Do you know how people usually work around this problem in an
> > efficient manner, on platforms where wchar_t uses 4 bytes?
> >
> They don't use wchar_t; unsigned short [] is fine to hold the text. You
> just need a small bunch of functions to work on it as required.
OK, that's what I did (unsigned short). However, I "discovered" a
compiler option for gcc that allows you to set the size of "wchar_t" to
be 2 bytes. I have no experience with it yet, but activating it allows
me to use wchar_t instead of unsigned short.
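The symptom described earlier in the thread (only the leading "I" being seen) is consistent with a 4-byte wchar_t buffer being read as 16-bit units. The sketch below illustrates that effect and can be used to check what a given build does, for example with and without GCC's -fshort-wchar (presumably the option meant above). The function name visible_len_as_16bit is hypothetical, and unsigned short is only a stand-in for a 16-bit SQLWCHAR; it assumes unsigned short is 16 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count how many 16-bit units a consumer would read from a wchar_t string
 * before reaching a zero unit. Hypothetical helper for illustration only;
 * unsigned short stands in for a 16-bit SQLWCHAR.
 */
size_t visible_len_as_16bit(const wchar_t *s)
{
    const unsigned char *bytes = (const unsigned char *)s;
    unsigned short unit;
    size_t n = 0;

    assert(s != NULL);
    for (;;) {
        /* memcpy avoids aliasing problems when reinterpreting the buffer. */
        memcpy(&unit, bytes + n * sizeof unit, sizeof unit);
        if (unit == 0)
            break;
        n++;
    }
    return n;
}
```

With a 2-byte wchar_t, visible_len_as_16bit(L"INSERT") returns 6. With a 4-byte wchar_t on a little-endian machine it returns 1, because the upper half of the first character is already a zero unit.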
After setting that, I'm still not having any luck. If I use the ASCII
variant, SQLConnect(), all is well:

rv = SQLConnect(hdlConn, (SQLCHAR*) "MyDSN", SQL_NTS,
                (SQLCHAR*) "dbuser", SQL_NTS,
                (SQLCHAR*) "verysecret", SQL_NTS);

and I can connect. If I use the wide version, I can't:

rv = SQLConnectW(hdlConn, (SQLWCHAR*) L"MyDSN", SQL_NTS,
                 (SQLWCHAR*) L"dbuser", SQL_NTS,
                 (SQLWCHAR*) L"verysecret", SQL_NTS);

I have also created unsigned short arrays and manually populated them,
with the same results. I get this error:

[unixODBC][ (48) (ret=-1)
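For reference, one way to populate such unsigned short arrays from UTF-8 input, without relying on wchar_t at all, is sketched below. It is not taken from unixODBC: the type u16unit and the function utf8_to_utf16 are hypothetical names, and the code assumes the driver manager and driver both expect 16-bit units, as described earlier in the thread. It does not reject overlong UTF-8 encodings, and it is not claimed to resolve the connection error above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 16-bit SQLWCHAR. */
typedef unsigned short u16unit;

/*
 * Convert a NUL-terminated UTF-8 string into NUL-terminated UTF-16 units.
 * Returns the number of units written (excluding the terminator), or
 * (size_t)-1 on malformed input or if dst_cap units are not enough.
 */
size_t utf8_to_utf16(const char *src, u16unit *dst, size_t dst_cap)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (*p != 0) {
        unsigned long cp;
        int extra;

        /* Decode the lead byte and work out how many bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;    /* truncated or broken sequence */
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;        /* outside the Unicode scalar range */

        if (cp < 0x10000) {
            if (n + 1 >= dst_cap)     /* keep one slot for the terminator */
                return (size_t)-1;
            dst[n++] = (u16unit)cp;
        } else {
            if (n + 2 >= dst_cap)
                return (size_t)-1;
            cp -= 0x10000;            /* split into a surrogate pair */
            dst[n++] = (u16unit)(0xD800 | (cp >> 10));
            dst[n++] = (u16unit)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= dst_cap)
        return (size_t)-1;
    dst[n] = 0;
    return n;
}
```

A buffer filled this way (for example from "MyDSN") could then be passed as (SQLWCHAR*) with SQL_NTS, provided SQLWCHAR really is 16 bits in the build being used.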
I also discovered SQL_NTSL, but I'm not sure what its purpose is - I
couldn't find any documentation. I'm not sure what I'm doing wrong at
this point.
I do have two other questions though:
1. Do I have to use SQLConnectW() in order to submit any UCS-2 data
(e.g. using SQLExecDirectW()) to the database using ODBC? Or can I use
SQLConnect() and then SQLExecDirectW()?
2. If I use SQLConnectW(), then does the odbc.ini file also have to be
encoded in UTF-16/UCS-2?
Thanks again,
Ingmar.