[unixODBC-support] Unicode Support for UCS-2 databases
Martin J. Evans
martin.evans at easysoft.com
Sun Jul 26 10:10:26 BST 2009
Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
>>>>>> -----Original Message-----
>>>>>> From: unixodbc-support-bounces at mailman.unixodbc.org
>>>>>> [mailto:unixodbc-support-bounces at mailman.unixodbc.org] On
>>>>>> Behalf Of Nick Gorham
>>>>>> Sent: Friday, July 24, 2009 5:55 PM
>>>>>> To: Support for the unixODBC project
>>>>>> Subject: Re: [unixODBC-support] Unicode Support for UCS-2 databases
>>>>>>
>>>>>> Ingmar Koecher [ NETIKUS.NET ltd ] wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am having some difficulties adding UTF-16 encoded data to UCS-2
>>>>>>> databases (e.g. SQL Server) using unixODBC. Most of the problems seem
>>>>>>> to appear as soon as I attempt to use any of the W() functions (e.g.
>>>>>>> SQLDriverConnectW()) opposed to the ASCII counterparts.
>>>>>>>
>>>>>>> If I read a UTF-8 encoded file on Linux for example, and add it to a
>>>>>>> MySQL UTF-8 database for example, then it will work and I don't even
>>>>>>> have to do anything (other than enclosing the field with N''). So this
>>>>>>> works well.
>>>>>>>
>>>>>>> If I try to write to a UTF-16/UCS-2 database however, I start having
>>>>>>> all sorts of problems. If I store a SQL statement:
>>>>>>>
>>>>>>> wchar_t sqlStmt[] = L"INSERT INTO MyTable (field) values (?)";
>>>>>>>
>>>>>>> then SQLExecDirectW will complain (or better the database will) as it
>>>>>>> only sees the "I" characters, the first one. Almost as if it's
>>>>>>> expecting an ASCII string.
>>>>>>>
>>>>>>> However, at this point I can't even connect using SQLDriverConnectW()
>>>>>>> when passing a wchar_t string:
>>>>>>>
>>>>>>> SQLDriverConnectW (hdbc, 0, (SQLWCHAR *) L"MyDsnName", ....);
>>>>>>>
>>>>>>> as it complains that it's not a valid DSN. My guess is that it's only
>>>>>>> looking for the first string as well here - or does the odbc.ini file
>>>>>>> actually need to be UTF-16 encoded? Is there something else that I
>>>>>>> need to do, to get this to work?
>>>>>>>
>>>>>>> Is there any sample code that shows how to deal with UTF-16/UCS-2
>>>>>>> data, or is this not very common?
>>>>>>>
>>>>>>> I'm pretty much at a loss here. The biggest problem I cannot seem to
>>>>>>> resolve, is how I can get UTF-8 data on Non-Windows platforms into a
>>>>>>> UCS-2 database. I can convert the UTF-8 string into a UTF-16 string,
>>>>>>> but that's about it.
>>>>>>>
>>>>>>> I've tried to find information about wchar_t handling of unixODBC, and
>>>>>>> the ....W() functions, with little success though.
>>>>>>>
>>>>>>> Any insight that can be provided would be greatly appreciated.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Ingmar.
>>>>>>>
>>>>>> I would check, but wchar_t is often 32bits not the 16 bits
>>>>>> ODBC expects.
>>>>>>
>>>>>> --
>>>>>> Nick
>>>>>>
>>>>>>
>>>>>>
>>>>> Thanks Nick. Yes, wchar_t is 4 bytes on OS X (and Linux as well it
>>>>> appears), so I suppose that could be an issue. I thought, however, that
>>>>> the ODBC implementation would take that into consideration. I figured
>>>>> that the ....W() functions on OS X / Linux would work correctly with a
>>>>> wchar_t, regardless of its storage size.
>>>>>
>>>>> Am I mistaken?
>>>>>
>>>> Yep, sorry, it's 16 bit as in Windows. At least for unixODBC; other
>>>> driver managers seem to vary from 8 to 32 bit. You can build unixODBC
>>>> to use 4 byte unicode, but then you need to find drivers that do the
>>>> same.
>>>>
>>> Thanks for your responses.
>>>
>>> OK, that explains why I'm having so many issues. I just figured that the
>>> driver manager would adapt to whichever platform I am using, and convert
>>> to the driver accordingly. But I guess that's not the case.
>>>
>>> Do you know how people usually work around this problem in an efficient
>>> manner, on platforms where wchar_t uses 4 bytes?
>>>
>> They don't use wchar_t, unsigned short [] is fine to hold the text, you
>> just need a small bunch of functions to work on it as required.
>>
> OK, that's what I did (unsigned short). However, I "discovered" a
> compiler option for gcc, that allows you to set the size of "wchar_t" to
> be 2 bytes. I have no experience with it yet, but activating it allows
> me to use wchar_t instead of unsigned short.
>
> After setting that, I'm still not having any luck. If I use the ASCII
> variant of the SQLConnect(), all is well:
>
> rv = SQLConnect(hdlConn, (SQLCHAR*) "MyDSN", SQL_NTS, (SQLCHAR*)
> "dbuser", SQL_NTS, (SQLCHAR*) "verysecret", SQL_NTS);
>
> And I can connect. If I use the wide version, it doesn't:
>
> rv = SQLConnectW(hdlConn, (SQLWCHAR*) L"MyDSN", SQL_NTS, (SQLWCHAR*)
> L"dbuser", SQL_NTS, (SQLWCHAR*) L"verysecret", SQL_NTS);
>
> I have also created unsigned short arrays and manually populated them,
> with the same results. I get this error:
>
> [unixODBC][ (48) (ret=-1)
>
> I also discovered SQL_NTSL, but I'm not sure what its purpose is - I
> couldn't find any documentation. I'm not sure what I'm doing wrong at
> this point.
>
> I do have two other questions though:
>
> 1. Do I have to use SQLConnectW() in order to submit any UCS-2 data
> (e.g. using SQLExecDirectW()) to the database using ODBC? Or can I use
> SQLConnect() and then SQLExecDirectW()?
>
> 2. If I use SQLConnectW(), then does the odbc.ini file also have to be
> encoded in UTF-16/UCS-2?
>
>
> Thanks again,
> Ingmar.
Ingmar,
Perl's DBD::ODBC uses unicode with ODBC drivers under unixODBC; however,
for various reasons I could go into, it does not do it in the
traditional way. Normally, to build an application that uses unicode
with unixODBC, you define the UNICODE macro (see sqlucode.h), and this
maps SQLxxx calls to SQLxxxW calls automatically (in your app code, via
the C preprocessor). If, like DBD::ODBC, you need to use both the ANSI
and Wide versions of the functions, you need to leave the UNICODE macro
undefined and specifically call SQLxxx (or SQLxxxA) or SQLxxxW. Also
note that you need a driver which has the SQLxxxW functions; otherwise,
when you make a Wide call, unixODBC will attempt to translate your
unicode characters to ANSI (obviously lossy) and call SQLxxx in the
driver.
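
To make the preprocessor mapping concrete, here is a rough sketch of the
mechanism only. Nothing in it is real ODBC: demo_connectA, demo_connectW,
demo_connect, DEMO_UNICODE and demo_wchar are hypothetical names I made
up to mimic the SQLxxx / SQLxxxW / UNICODE pattern, so do not expect to
find them in sqlucode.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 16-bit character type, standing in for SQLWCHAR. */
typedef unsigned short demo_wchar;

enum demo_variant { DEMO_ANSI = 0, DEMO_WIDE = 1 };

/* Hypothetical ANSI entry point: takes a narrow string. */
static enum demo_variant demo_connectA(const char *dsn)
{
    assert(dsn != NULL);
    return DEMO_ANSI;
}

/* Hypothetical Wide entry point: takes 16-bit code units. */
static enum demo_variant demo_connectW(const demo_wchar *dsn)
{
    assert(dsn != NULL);
    return DEMO_WIDE;
}

/* The mapping itself: with DEMO_UNICODE defined, the generic name is
   rewritten to the W function before the compiler ever sees it.
   With it undefined, the generic name means the ANSI function. */
#ifdef DEMO_UNICODE
#define demo_connect demo_connectW
#else
#define demo_connect demo_connectA
#endif
```

With DEMO_UNICODE left undefined, demo_connect("MyDSN") ends up in
demo_connectA, while demo_connectA and demo_connectW can still be called
by their explicit names in the same file. That is the situation
described above for an application that needs both variants.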
As for the odbc ini files, I don't think UTF-16/UCS-2 is expected, but I
have not confirmed this in the source (Nick will know though). I can
confirm I call SQLDriverConnectW on unix with a unicode aware driver,
passing ASCII-only strings encoded in UTF-16, and unixODBC happily finds
the DSNs in the ini file (Note, I've not tried DSN names etc with
character values > 127). If I were you I'd use SQLDriverConnect instead
of SQLConnect - it is more flexible. I'd also pass the string lengths
explicitly, in characters (not bytes, if calling SQLConnectW), instead of
SQL_NTS - there was a bug a while ago which you will avoid if you do
that.
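
Since the lengths have to be counted in 16-bit units, here is a minimal
sketch of the kind of helper functions involved. The names u16, u16_len
and utf8_to_u16 are mine and purely illustrative (they are not part of
unixODBC), and I am assuming unsigned short is 16 bits on your platform.
The decoder accepts well-formed UTF-8 but does not reject every malformed
sequence (overlong forms, for instance), so treat it as a starting point.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short u16;   /* assumption: 16 bits wide */

/* Number of 16-bit units before the terminating zero unit. */
static size_t u16_len(const u16 *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Convert NUL-terminated UTF-8 into zero-terminated UTF-16 units.
   cap is the size of dst in units, terminator included.
   Returns the number of units written (terminator excluded), or
   (size_t)-1 on malformed input or if dst is too small. */
static size_t utf8_to_u16(const char *src, u16 *dst, size_t cap)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;

    while (*p) {
        unsigned long cp;
        int extra;

        /* The lead byte tells us how many continuation bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;    /* missing continuation byte */
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair (two units). */
            if (cp > 0x10FFFF || n + 2 >= cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (u16)(0xD800 | (cp >> 10));
            dst[n++] = (u16)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            dst[n++] = (u16)cp;
        }
    }
    dst[n] = 0;
    return n;
}
```

The value returned by utf8_to_u16 (or by u16_len on an existing buffer)
is the count you would pass in place of SQL_NTS, with the buffer itself
cast to SQLWCHAR *.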
You should call SQLDriverConnectW if you are later going to use Wide
functions - I cannot remember exactly why but it caused an issue later
if you didn't. It /may/ have been that if you didn't call
SQLDriverConnectW then unixODBC ignored the W functions in the driver -
perhaps Nick will remember - unfortunately I omitted to comment on this
in DBD::ODBC :-(
To ensure compatibility with ODBC you really ought to use SQLWCHAR but
as Nick says it is an unsigned short.
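
If it helps to see why the width matters, the following sketch walks a
wchar_t string in 16-bit steps, the way code expecting 2-byte characters
would. The function name len_seen_as_u16 is my own invention for this
illustration, and I am again assuming unsigned short is 16 bits. It is
only a demonstration of the mismatch, not something to ship.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

typedef unsigned short u16;   /* assumption: 16 bits wide */

/* Length of a wchar_t string as it appears to code that reads the same
   memory as zero-terminated 16-bit units. */
static size_t len_seen_as_u16(const wchar_t *ws)
{
    const unsigned char *bytes;
    size_t total;
    size_t n = 0;

    assert(ws != NULL);
    bytes = (const unsigned char *)ws;
    /* Bytes occupied by the string including its terminator. */
    total = (wcslen(ws) + 1) * sizeof(wchar_t);

    while ((n + 1) * sizeof(u16) <= total) {
        u16 unit;
        /* memcpy avoids aliasing and alignment problems. */
        memcpy(&unit, bytes + n * sizeof(u16), sizeof unit);
        if (unit == 0)
            break;            /* the reader stops at the first zero unit */
        n++;
    }
    return n;
}
```

On a little-endian machine with a 4-byte wchar_t, L"INSERT" is stored as
'I' followed by zero padding, so the function reports a length of 1,
which matches the report that only the "I" was seen. Where wchar_t is
2 bytes, it reports 6.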
Lastly, I'd make sure you are using the latest unixODBC as some older
versions had various problems when using unicode (specifically 2.2.11
and a few later versions in the cursor library).
BTW, what was the compiler flag that changes wchar_t to 2 bytes?
Martin