[unixODBC-support] Unicode Support for UCS-2 databases
Nick Gorham
nick at lurcher.org
Sat Jul 25 15:25:15 BST 2009
Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
> > -----Original Message-----
>
>> From: unixodbc-support-bounces at mailman.unixodbc.org
>> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On
>> Behalf Of Nick Gorham
>> Sent: Saturday, July 25, 2009 3:04 AM
>> To: Support for the unixODBC project
>> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
>>
>> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
>>
>>>> -----Original Message-----
>>>> From: unixodbc-support-bounces at mailman.unixodbc.org
>>>> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On
>>>> Behalf Of Nick Gorham
>>>> Sent: Friday, July 24, 2009 5:55 PM
>>>> To: Support for the unixODBC project
>>>> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
>>>>
>>>> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
>>>>
>>>>
>>>>> Hello,
>>>>>
>>>>> I am having some difficulties adding UTF-16 encoded data to UCS-2
>>>>> databases (e.g. SQL Server) using unixODBC. Most of the problems seem
>>>>> to appear as soon as I attempt to use any of the W() functions (e.g.
>>>>> SQLDriverConnectW()) opposed to the ASCII counterparts.
>>>>>
>>>>> If I read a UTF-8 encoded file on Linux for example, and add it to a
>>>>> MySQL UTF-8 database for example, then it will work and I don't even
>>>>> have to do anything (other than enclosing the field with N''). So this
>>>>> works well.
>>>>>
>>>>> If I try to write to a UTF-16/UCS-2 database however, I start having
>>>>> all sorts of problems. If I store a SQL statement:
>>>>>
>>>>> wchar_t sqlStmt[] = L"INSERT INTO MyTable (field) values (?)";
>>>>>
>>>>> then SQLExecDirectW will complain (or better the database will) as it
>>>>> only sees the "I" characters, the first one. Almost as if it's
>>>>> expecting an ASCII string.
>>>>>
>>>>> However, at this point I can't even connect using SQLDriverConnectW()
>>>>> when passing a wchar_t string:
>>>>>
>>>>> SQLDriverConnectW (hdbc, 0, (SQLWCHAR *) L"MyDsnName", ....);
>>>>>
>>>>> as it complains that it's not a valid DSN. My guess is that it's only
>>>>> looking for the first string as well here - or does the odbc.ini file
>>>>> actually need to be UTF-16 encoded? Is there something else that I
>>>>> need to do, to get this to work?
>>>>>
>>>>> Is there any sample code that shows how to deal with UTF-16/UCS-2
>>>>> data, or is this not very common?
>>>>>
>>>>> I'm pretty much at a loss here. The biggest problem I cannot seem to
>>>>> resolve, is how I can get UTF-8 data on Non-Windows platforms into a
>>>>> UCS-2 database. I can convert the UTF-8 string into a UTF-16 string,
>>>>> but that's about it.
>>>>>
>>>>> I've tried to find information about wchar_t handling of unixODBC, and
>>>>> the ....W() functions, with little success though.
>>>>>
>>>>> Any insight that can be provided would be greatly appreciated.
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Ingmar.
>>>>>
>>>> I would check, but wchar_t is often 32 bits, not the 16 bits
>>>> ODBC expects.
>>>>
>>>> --
>>>> Nick
>>>>
>>>>
>>> Thanks Nick. Yes, wchar_t is 4 bytes on OS X (and Linux as well it
>>> appears), so I suppose that could be an issue. I thought, however, that
>>> the ODBC implementation would take that into consideration. I figured
>>> that the ....W() functions on OS X / Linux would work correctly with a
>>> wchar_t, regardless of its storage size.
>>>
>>> Am I mistaken?
>>>
>>>
>> Yep, sorry, it's 16-bit as in Windows, at least for unixODBC; other
>> driver managers seem to vary from 8 to 32 bits. You can build unixODBC
>> to use 4-byte Unicode, but then you need to find drivers that do the same.
>>
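The symptom reported earlier in the thread (only the leading "I" of the statement being seen) is consistent with a 4-byte wchar_t buffer being scanned as 16-bit units. A minimal sketch of that effect follows; the function name units16_before_nul is hypothetical, and the code assumes unsigned short is 2 bytes, which is common but not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the 16-bit units that precede the first
 * zero unit, the way a consumer expecting 2-byte characters would scan
 * the buffer. Assumes sizeof(unsigned short) == 2. */
static size_t units16_before_nul(const void *buf, size_t max_units)
{
    const unsigned char *p = buf;
    size_t n = 0;

    assert(buf != NULL);
    while (n < max_units) {
        unsigned short u;
        /* memcpy sidesteps alignment and aliasing concerns */
        memcpy(&u, p + n * sizeof u, sizeof u);
        if (u == 0)
            break;
        n++;
    }
    return n;
}
```

Where wchar_t is 4 bytes, calling this on L"INSERT" should report at most one unit before the first zero (one on a little-endian machine, none on a big-endian one), because the upper half of every character is zero. Where wchar_t is 2 bytes, it should report all six.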
>
> Thanks for your responses.
>
> OK, that explains why I'm having so many issues. I just figured that the
> driver manager would adapt to whichever platform I am using, and convert
> for the driver accordingly. But I guess that's not the case.
>
> Do you know how people usually work around this problem in an efficient
> manner, on platforms where wchar_t uses 4 bytes?
>
>
They don't use wchar_t. An unsigned short [] is fine to hold the text; you
just need a small set of functions to work on it as required.
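
As a rough illustration of what such a set of functions might look like, here is a minimal sketch with a length function and a UTF-8 to UTF-16 converter working on unsigned short buffers. The names u16, u16_len and utf8_to_utf16 are hypothetical and not part of unixODBC. The sketch assumes unsigned short is 2 bytes and that native byte order is wanted, and it does not reject overlong UTF-8 encodings.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short u16;  /* hypothetical alias; assumes a 2-byte unsigned short */

/* Length in 16-bit units, excluding the terminating zero unit. */
static size_t u16_len(const u16 *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Convert NUL-terminated UTF-8 to NUL-terminated UTF-16 (native byte order).
 * cap is the capacity of dst in 16-bit units, terminator included.
 * Returns the units written (excluding the terminator), or (size_t)-1 on
 * malformed input or insufficient space. */
static size_t utf8_to_utf16(const char *src, u16 *dst, size_t cap)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*p) {
        unsigned long cp;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        p++;
        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;    /* truncated or malformed sequence */
            cp = (cp << 6) | (*p++ & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;        /* not a Unicode scalar value */

        if (cp < 0x10000) {
            if (out + 1 >= cap) return (size_t)-1;  /* keep room for terminator */
            dst[out++] = (u16)cp;
        } else {
            if (out + 2 >= cap) return (size_t)-1;
            cp -= 0x10000;            /* encode as a surrogate pair */
            dst[out++] = (u16)(0xD800 | (cp >> 10));
            dst[out++] = (u16)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (out >= cap) return (size_t)-1;
    dst[out] = 0;
    return out;
}
```

With a default unixODBC build, where SQLWCHAR is 16 bits, a buffer filled this way could then be cast to SQLWCHAR * and passed to SQLExecDirectW() with SQL_NTS as the length. Whether the driver handles surrogate pairs or only UCS-2 is a separate question.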
--
Nick