[unixODBC-support] Unicode Support for UCS-2 databases

Nick Gorham nick at lurcher.org
Sat Jul 25 15:25:15 BST 2009


Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
>> -----Original Message-----
>> From: unixodbc-support-bounces at mailman.unixodbc.org 
>> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On 
>> Behalf Of Nick Gorham
>> Sent: Saturday, July 25, 2009 3:04 AM
>> To: Support for the unixODBC project
>> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
>>
>> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
>>     
>>>> -----Original Message-----
>>>> From: unixodbc-support-bounces at mailman.unixodbc.org 
>>>> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On 
>>>> Behalf Of Nick Gorham
>>>> Sent: Friday, July 24, 2009 5:55 PM
>>>> To: Support for the unixODBC project
>>>> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
>>>>
>>>> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
>>>>> Hello,
>>>>>
>>>>> I am having some difficulties adding UTF-16 encoded data to UCS-2
>>>>> databases (e.g. SQL Server) using unixODBC. Most of the problems seem
>>>>> to appear as soon as I attempt to use any of the W() functions (e.g.
>>>>> SQLDriverConnectW()) as opposed to their ASCII counterparts.
>>>>>
>>>>> If I read a UTF-8 encoded file on Linux, for example, and add it to a
>>>>> MySQL UTF-8 database, then it works and I don't even have to do
>>>>> anything (other than enclosing the field with N''). So this works well.
>>>>>
>>>>> If I try to write to a UTF-16/UCS-2 database, however, I start having
>>>>> all sorts of problems. If I store a SQL statement:
>>>>>
>>>>> wchar_t sqlStmt[] = L"INSERT INTO MyTable (field) values (?)";
>>>>>
>>>>> then SQLExecDirectW will complain (or rather, the database will), as it
>>>>> only sees the "I" character, the first one. Almost as if it's
>>>>> expecting an ASCII string.
>>>>>
>>>>> However, at this point I can't even connect using SQLDriverConnectW()
>>>>> when passing a wchar_t string:
>>>>>
>>>>> SQLDriverConnectW (hdbc, 0, (SQLWCHAR *) L"MyDsnName", ....);
>>>>>
>>>>> as it complains that it's not a valid DSN. My guess is that it's only
>>>>> looking at the first character here as well, or does the odbc.ini file
>>>>> actually need to be UTF-16 encoded? Is there something else that I
>>>>> need to do to get this to work?
>>>>>
>>>>> Is there any sample code that shows how to deal with UTF-16/UCS-2
>>>>> data, or is this not very common?
>>>>>
>>>>> I'm pretty much at a loss here. The biggest problem I cannot seem to
>>>>> resolve is how I can get UTF-8 data on non-Windows platforms into a
>>>>> UCS-2 database. I can convert the UTF-8 string into a UTF-16 string,
>>>>> but that's about it.
>>>>>
>>>>> I've tried to find information about unixODBC's handling of wchar_t
>>>>> and the ....W() functions, with little success though.
>>>>>
>>>>> Any insight that can be provided would be greatly appreciated.
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Ingmar.
>>>>>
>>>>
>>>> I would check, but wchar_t is often 32 bits, not the 16 bits ODBC
>>>> expects.
>>>>
>>>> -- 
>>>> Nick
>>>>
>>> Thanks Nick. Yes, wchar_t is 4 bytes on OS X (and Linux as well, it
>>> appears), so I suppose that could be an issue. I thought, however, that
>>> the ODBC implementation would take that into consideration. I figured
>>> that the ....W() functions on OS X / Linux would work correctly with a
>>> wchar_t, regardless of its storage size.
>>>
>>> Am I mistaken?
>>>
>> Yep, sorry, it's 16-bit as in Windows, at least for unixODBC; other
>> driver managers seem to vary from 8 to 32 bit. You can build unixODBC
>> to use 4-byte Unicode, but then you need to find drivers that do the
>> same.
>>
>
> Thanks for your responses.
>
> OK, that explains why I'm having so many issues. I just figured that the
> driver manager would adapt to whichever platform I am using, and convert
> for the driver accordingly. But I guess that's not the case.
>
> Do you know how people usually work around this problem in an efficient
> manner on platforms where wchar_t uses 4 bytes?
>

They don't use wchar_t. An unsigned short [] is fine to hold the text; you
just need a small set of functions to work on it as required.
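A minimal sketch of one such helper, assuming the default unixODBC build where SQLWCHAR is a 16-bit unit as described above. It decodes UTF-8 (for example, text read from a file on Linux) and writes UTF-16 code units into an unsigned short array. The names utf8_to_utf16 and utf16_len are hypothetical and not part of the unixODBC API. Characters outside the Basic Multilingual Plane become surrogate pairs; a strictly UCS-2 database may store such a pair as two separate units.

```c
#include <stddef.h>

/* Hypothetical helper (not part of unixODBC): decode `in_len` bytes of
 * UTF-8 from `in` and store them as UTF-16 code units in `out`, which has
 * room for `out_cap` units including the terminating 0.
 * Returns the number of units written (excluding the terminator), or -1
 * if the input is malformed or `out` is too small. */
long utf8_to_utf16(const unsigned char *in, size_t in_len,
                   unsigned short *out, size_t out_cap)
{
    /* Smallest code point allowed per sequence length (rejects overlong forms). */
    static const unsigned long min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    size_t i = 0, n = 0;

    while (i < in_len) {
        unsigned char c = in[i];
        unsigned long cp;
        size_t extra, k;

        /* The lead byte gives the sequence length and the top bits. */
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return -1;

        if (extra >= in_len - i)
            return -1;                      /* sequence is truncated */
        for (k = 1; k <= extra; k++) {
            unsigned char b = in[i + k];
            if ((b & 0xC0) != 0x80)
                return -1;                  /* not a continuation byte */
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;                      /* overlong, out of range, or surrogate */
        i += extra + 1;

        if (cp < 0x10000) {
            if (n + 1 >= out_cap) return -1;  /* keep room for the terminator */
            out[n++] = (unsigned short) cp;
        } else {
            /* Outside the BMP: encode as a UTF-16 surrogate pair. */
            if (n + 2 >= out_cap) return -1;
            cp -= 0x10000;
            out[n++] = (unsigned short) (0xD800 | (cp >> 10));
            out[n++] = (unsigned short) (0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= out_cap) return -1;
    out[n] = 0;
    return (long) n;
}

/* Hypothetical helper: length in 16-bit units of a 0-terminated string. */
size_t utf16_len(const unsigned short *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

If utf8_to_utf16 returns a non-negative length, the buffer could then be handed to a W() call as (SQLWCHAR *) buf with SQL_NTS as the length, for example as the statement text of SQLExecDirectW, again assuming SQLWCHAR is 16 bits in the driver manager and driver being used.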

--
Nick
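The wide literals in the original question, such as L"MyDsnName", run into the same size mismatch: with a 4-byte wchar_t, casting the literal straight to SQLWCHAR * makes a 16-bit reader on a little-endian machine see "M" followed by a 0 unit, which is consistent with the "only sees the first character" symptom described above. A minimal sketch of narrowing such a string, assuming wchar_t holds UTF-32 code points (as with a 4-byte wchar_t on Linux and OS X); wide_to_utf16 is a hypothetical name, not a unixODBC function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of unixODBC): copy a 0-terminated wchar_t
 * string into 16-bit code units, ASSUMING wchar_t holds UTF-32 code points.
 * `out` has room for `out_cap` units including the terminator.
 * Returns the number of units written (excluding the terminator), or -1
 * on an invalid code point or if `out` is too small. */
long wide_to_utf16(const wchar_t *in, unsigned short *out, size_t out_cap)
{
    size_t n = 0;

    for (; *in != 0; in++) {
        /* A negative (signed) wchar_t converts to a huge value and is rejected. */
        unsigned long cp = (unsigned long) *in;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        if (cp < 0x10000) {
            if (n + 1 >= out_cap) return -1;  /* keep room for the terminator */
            out[n++] = (unsigned short) cp;
        } else {
            /* Outside the BMP: split into a surrogate pair. */
            if (n + 2 >= out_cap) return -1;
            cp -= 0x10000;
            out[n++] = (unsigned short) (0xD800 | (cp >> 10));
            out[n++] = (unsigned short) (0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= out_cap) return -1;
    out[n] = 0;
    return (long) n;
}
```

After wide_to_utf16(L"MyDsnName", dsn, 64) succeeds on an unsigned short dsn[64], dsn is the kind of buffer that could be cast to (SQLWCHAR *) in place of the raw L"MyDsnName" literal.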

