[unixODBC-support] Unicode Support for UCS-2 databases

Ingmar Koecher [ NETIKUS.NET ltd ] ingmar.koecher at netikus.net
Sun Jul 26 02:59:36 BST 2009


> >>>> -----Original Message-----
> >>>> From: unixodbc-support-bounces at mailman.unixodbc.org
> >>>> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On
> >>>> Behalf Of Nick Gorham
> >>>> Sent: Friday, July 24, 2009 5:55 PM
> >>>> To: Support for the unixODBC project
> >>>> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
> >>>>
> >>>> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I am having some difficulties adding UTF-16 encoded data to UCS-2
> >>>>> databases (e.g. SQL Server) using unixODBC. Most of the problems
> >>>>> seem to appear as soon as I attempt to use any of the W() functions
> >>>>> (e.g. SQLDriverConnectW()) as opposed to the ASCII counterparts.
> >>>>>
> >>>>> If I read a UTF-8 encoded file on Linux for example, and add it to
> >>>>> a MySQL UTF-8 database for example, then it will work and I don't
> >>>>> even have to do anything (other than enclosing the field with N'').
> >>>>> So this works well.
> >>>>>
> >>>>> If I try to write to a UTF-16/UCS-2 database however, I start
> >>>>> having all sorts of problems. If I store a SQL statement:
> >>>>>
> >>>>> wchar_t sqlStmt[] = L"INSERT INTO MyTable (field) values (?)";
> >>>>>
> >>>>> then SQLExecDirectW will complain (or rather the database will), as
> >>>>> it only sees the first character, the "I". Almost as if it's
> >>>>> expecting an ASCII string.
> >>>>>
> >>>>> However, at this point I can't even connect using
> >>>>> SQLDriverConnectW() when passing a wchar_t string:
> >>>>>
> >>>>> SQLDriverConnectW (hdbc, 0, (SQLWCHAR *) L"MyDsnName", ....);
> >>>>>
> >>>>> as it complains that it's not a valid DSN. My guess is that it's
> >>>>> only looking at the first character here as well - or does the
> >>>>> odbc.ini file actually need to be UTF-16 encoded? Is there
> >>>>> something else that I need to do, to get this to work?
> >>>>>
> >>>>> Is there any sample code that shows how to deal with UTF-16/UCS-2
> >>>>> data, or is this not very common?
> >>>>>
> >>>>> I'm pretty much at a loss here. The biggest problem I cannot seem
> >>>>> to resolve is how I can get UTF-8 data on non-Windows platforms
> >>>>> into a UCS-2 database. I can convert the UTF-8 string into a UTF-16
> >>>>> string, but that's about it.
> >>>>>
> >>>>> I've tried to find information about wchar_t handling of unixODBC,
> >>>>> and the ....W() functions, with little success though.
> >>>>>
> >>>>> Any insight that can be provided would be greatly appreciated.
> >>>>>
> >>>>>
> >>>>> Thank you,
> >>>>> Ingmar.
> >>>>>
> >>>> I would check, but wchar_t is often 32 bits, not the 16 bits
> >>>> ODBC expects.
> >>>>
> >>>> -- 
> >>>> Nick
> >>>>
> >>> Thanks Nick. Yes, wchar_t is 4 bytes on OS X (and on Linux as well, it
> >>> appears), so I suppose that could be an issue. I thought, however, that
> >>> the ODBC implementation would take that into consideration. I figured
> >>> that the ....W() functions on OS X / Linux would work correctly with a
> >>> wchar_t, regardless of its storage size.
> >>>
> >>> Am I mistaken?
> >>>
> >> Yep, sorry, it's 16 bit as in Windows. At least for unixODBC; other
> >> driver managers seem to vary from 8 to 32 bit. You can build unixODBC
> >> to use 4 byte Unicode, but then you need to find drivers that do the
> >> same.
> >>
> >
> > Thanks for your responses.
> >
> > OK, that explains why I'm having so many issues. I just figured that the
> > driver manager would adapt to whichever platform I am using, and convert
> > to the driver accordingly. But I guess that's not the case.
> >
> > Do you know how people usually work around this problem in an efficient
> > manner, on platforms where wchar_t uses 4 bytes?
> >

> They don't use wchar_t, unsigned short [] is fine to hold the text, you
> just need a small bunch of functions to work on it as required.

OK, that's what I did (unsigned short). However, I "discovered" a
compiler option for gcc that sets the size of wchar_t to 2 bytes. I have
no experience with it yet, but activating it allows me to use wchar_t
instead of unsigned short.
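
For the UTF-8 to UTF-16 step, here is a minimal sketch of the kind of helper
I mean, not code taken from unixODBC. The type name u16char and the function
utf8_to_utf16() are hypothetical; u16char only stands in for the 16-bit
SQLWCHAR, and a real program would presumably include the ODBC headers and
use SQLWCHAR directly. The sketch does not reject overlong UTF-8 encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a 16-bit SQLWCHAR (assumption: the driver manager was built
   with 2-byte wide characters, as described in this thread). */
typedef unsigned short u16char;

/* Hypothetical helper: decode NUL-terminated UTF-8 into NUL-terminated
   UTF-16 code units. Returns the number of units written (excluding the
   terminator), or (size_t)-1 on malformed input or if dst is too small. */
size_t utf8_to_utf16(const char *src, u16char *dst, size_t dst_units)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (*s != 0) {
        unsigned long cp;
        int extra, i;

        /* The lead byte tells us how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        s++;

        for (i = 0; i < extra; i++, s++) {
            /* A truncated sequence hits the NUL and fails this check too. */
            if ((*s & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
        }

        /* Code points in the surrogate range or beyond U+10FFFF are invalid. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return (size_t)-1;

        if (cp < 0x10000) {
            /* One unit, and still leave room for the terminator. */
            if (n + 1 >= dst_units) return (size_t)-1;
            dst[n++] = (u16char)cp;
        } else {
            /* Outside the BMP: emit a surrogate pair. */
            if (n + 2 >= dst_units) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (u16char)(0xD800 | (cp >> 10));
            dst[n++] = (u16char)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (n >= dst_units) return (size_t)-1;  /* no room for the terminator */
    dst[n] = 0;
    return n;
}
```

The resulting buffer could then be passed (cast to SQLWCHAR*) to the ....W()
functions, assuming SQLWCHAR really is 16 bits in the build being used.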

After setting that, I'm still not having any luck. If I use the ASCII
variant, SQLConnect(), all is well:

rv = SQLConnect(hdlConn, (SQLCHAR*) "MyDSN", SQL_NTS,
                (SQLCHAR*) "dbuser", SQL_NTS,
                (SQLCHAR*) "verysecret", SQL_NTS);

And I can connect. If I use the wide version, the connection fails:

rv = SQLConnectW(hdlConn, (SQLWCHAR*) L"MyDSN", SQL_NTS,
                 (SQLWCHAR*) L"dbuser", SQL_NTS,
                 (SQLWCHAR*) L"verysecret", SQL_NTS);

I have also created unsigned short arrays and manually populated them,
with the same results. I get this error:

[unixODBC][ (48) (ret=-1)
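
For reference, this is roughly how the unsigned short arrays can be
populated. It is only a sketch under assumptions: u16char, u16_strlen() and
u16_from_ascii() are hypothetical names, u16char stands in for a 16-bit
SQLWCHAR, and only 7-bit ASCII input is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a 16-bit SQLWCHAR (assumption, see the note above). */
typedef unsigned short u16char;

/* Count the 16-bit units before the terminating 0 (like strlen). */
size_t u16_strlen(const u16char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Widen a 7-bit ASCII string into a 0-terminated 16-bit string.
   Returns 0 on success, -1 if a byte is not ASCII or dst is too small. */
int u16_from_ascii(const char *src, u16char *dst, size_t dst_units)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; src[i] != '\0'; i++) {
        unsigned char c = (unsigned char)src[i];

        /* Need room for this unit plus the terminator. */
        if (c > 0x7F || i + 1 >= dst_units)
            return -1;
        dst[i] = c;
    }
    if (i >= dst_units)
        return -1;
    dst[i] = 0;
    return 0;
}
```

A buffer filled this way (for example from "MyDSN") holds one 16-bit unit
per character plus a 16-bit terminator, which is the layout a 16-bit
SQLWCHAR string with SQL_NTS would be expected to have.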

I also discovered SQL_NTSL, but I'm not sure what its purpose is - I
couldn't find any documentation. I'm not sure what I'm doing wrong at
this point.

I do have two other questions though:

1. Do I have to use SQLConnectW() in order to submit any UCS-2 data
(e.g. using SQLExecDirectW()) to the database using ODBC? Or can I use
SQLConnect() and then SQLExecDirectW()?

2. If I use SQLConnectW(), then does the odbc.ini file also have to be
encoded in UTF-16/UCS-2?


Thanks again,
Ingmar.

