[unixODBC-support] Unicode Support for UCS-2 databases

Ingmar Koecher [ NETIKUS.NET ltd ] ingmar.koecher at netikus.net
Sat Jul 25 15:02:53 BST 2009


> -----Original Message-----
> From: unixodbc-support-bounces at mailman.unixodbc.org 
> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On 
> Behalf Of Nick Gorham
> Sent: Saturday, July 25, 2009 3:04 AM
> To: Support for the unixODBC project
> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
> 
> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
> >> -----Original Message-----
> >> From: unixodbc-support-bounces at mailman.unixodbc.org 
> >> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On 
> >> Behalf Of Nick Gorham
> >> Sent: Friday, July 24, 2009 5:55 PM
> >> To: Support for the unixODBC project
> >> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
> >>
> >> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
> >>> Hello,
> >>>
> >>> I am having some difficulties adding UTF-16 encoded data to UCS-2
> >>> databases (e.g. SQL Server) using unixODBC. Most of the problems seem
> >>> to appear as soon as I attempt to use any of the W() functions (e.g.
> >>> SQLDriverConnectW()) opposed to the ASCII counterparts.
> >>>
> >>> If I read a UTF-8 encoded file on Linux for example, and add it to a
> >>> MySQL UTF-8 database for example, then it will work and I don't even
> >>> have to do anything (other than enclosing the field with N''). So this
> >>> works well.
> >>>
> >>> If I try to write to a UTF-16/UCS-2 database however, I start having
> >>> all sorts of problems. If I store a SQL statement:
> >>>
> >>> wchar_t sqlStmt[] = L"INSERT INTO MyTable (field) values (?)";
> >>>
> >>> then SQLExecDirectW will complain (or better the database will) as it
> >>> only sees the "I" characters, the first one. Almost as if it's
> >>> expecting an ASCII string.
> >>>
> >>> However, at this point I can't even connect using SQLDriverConnectW()
> >>> when passing a wchar_t string:
> >>>
> >>> SQLDriverConnectW (hdbc, 0, (SQLWCHAR *) L"MyDsnName", ....);
> >>>
> >>> as it complains that it's not a valid DSN. My guess is that it's only
> >>> looking for the first string as well here - or does the odbc.ini file
> >>> actually need to be UTF-16 encoded? Is there something else that I
> >>> need to do, to get this to work?
> >>>
> >>> Is there any sample code that shows how to deal with UTF-16/UCS-2
> >>> data, or is this not very common?
> >>>
> >>> I'm pretty much at a loss here. The biggest problem I cannot seem to
> >>> resolve, is how I can get UTF-8 data on Non-Windows platforms into a
> >>> UCS-2 database. I can convert the UTF-8 string into a UTF-16 string,
> >>> but that's about it.
> >>>
> >>> I've tried to find information about wchar_t handling of unixODBC, and
> >>> the ....W() functions, with little success though.
> >>>
> >>> Any insight that can be provided would be greatly appreciated.
> >>>
> >>>
> >>> Thank you,
> >>> Ingmar.
> >>>
> >> I would check, but wchar_t is often 32bits not the 16 bits 
> >> ODBC expects.
> >>
> >> -- 
> >> Nick
> >>     
> >
> > Thanks Nick. Yes, wchar_t is 4 bytes on OS X (and Linux as well it
> > appears), so I suppose that could be an issue. I thought however, that
> > the ODBC implementation would take that into consideration. I figured
> > that the ....W() functions on OS X / Linux would work correctly with a
> > wchar_t, regardless of its storage size.
> >
> > Am I mistaken?
> >   
> Yep, sorry, it's 16 bit as in Windows. At least for unixODBC, other
> driver managers seem to vary from 8 to 32 bit. You can build unixODBC
> to use 4 byte unicode, but then you need to find drivers that do the same.

Thanks for your responses.

OK, that explains why I'm having so many issues. I just figured that the
driver manager would adapt to whichever platform I am using, and convert
the strings for the driver accordingly. But I guess that's not the case.
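
A minimal sketch of why only the "I" would be seen, assuming the consumer
reads the buffer as NUL-terminated 16-bit units. This is not unixODBC code;
len_as_16bit_units is a hypothetical helper made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the 16-bit units that precede the first
 * zero unit, i.e. the length a consumer expecting 16-bit characters
 * would see when handed this buffer. */
static size_t len_as_16bit_units(const void *buf)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t n = 0;
    uint16_t unit;

    for (;;) {
        /* memcpy avoids alignment and aliasing problems */
        memcpy(&unit, p + n * sizeof unit, sizeof unit);
        if (unit == 0)
            break;
        n++;
    }
    return n;
}
```

With a 4-byte wchar_t on a little-endian machine, L"INSERT" is stored as
49 00 00 00 4E 00 00 00 ..., so the second 16-bit unit is already zero and
the helper returns 1. With a 2-byte wchar_t it returns 6.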

Do you know how people usually work around this problem in an efficient
manner, on platforms where wchar_t uses 4 bytes?
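
One possible approach is to convert the wchar_t string into a separate
16-bit buffer before calling the W() functions. A rough sketch follows,
assuming wchar_t holds one Unicode code point per element (as with the
4-byte wchar_t on Linux / OS X). The names utf16_unit and wide_to_utf16 are
hypothetical, and utf16_unit only stands in for a 16-bit SQLWCHAR:

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Assumption: stand-in for a 16-bit SQLWCHAR. */
typedef uint16_t utf16_unit;

/* Converts a NUL-terminated wchar_t string to NUL-terminated UTF-16.
 * Returns the number of 16-bit units written (excluding the terminator),
 * or (size_t)-1 if dst is too small or src holds an invalid code point. */
static size_t wide_to_utf16(const wchar_t *src, utf16_unit *dst,
                            size_t dst_cap)
{
    size_t out = 0;

    for (; *src != L'\0'; src++) {
        uint32_t cp = (uint32_t)*src;

        /* Reject values outside Unicode and lone surrogates. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            /* One unit, plus room for the terminator. */
            if (out + 1 >= dst_cap)
                return (size_t)-1;
            dst[out++] = (utf16_unit)cp;
        } else {
            /* Surrogate pair, plus room for the terminator. */
            if (out + 2 >= dst_cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (utf16_unit)(0xD800 | (cp >> 10));
            dst[out++] = (utf16_unit)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (out >= dst_cap)
        return (size_t)-1;
    dst[out] = 0;
    return out;
}
```

The resulting buffer could then presumably be cast to SQLWCHAR * and passed
to SQLExecDirectW() with SQL_NTS, provided unixODBC and the driver are both
built with a 16-bit SQLWCHAR as Nick describes. That part is untested here.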


Thanks,
Ingmar. 

