[unixODBC-dev] iconv/ICU

Peter Harvey pharvey at codebydesign.com
Tue Nov 6 10:07:53 GMT 2007


On Tuesday 06 November 2007 00:02:20 Nick Gorham wrote:
> Peter Harvey wrote:
> > Steve Langasek wrote:
> >> On Mon, Nov 05, 2007 at 02:05:57PM -0800, Peter Harvey wrote:
> >>> I have to wonder if using ICU instead of iconv would better support
> >>> loss-less string handling within unixODBC? Would probably mean that
> >>> some string handling code could be removed from __info.c and that
> >>> strings would never have to be dumbed down due to a lack of feature
> >>> support in iconv.
> >>>
> >>> I know that ICU is not best represented in C (as compared to C++ and
> >>> Java) but may be a better fit than iconv?
> >>
> >> Hrm, what's the problem needing solving here?  AFAIK, ICU is primarily
> >> needed if you have to do canonicalization of Unicode strings; is that
> >> relevant to UnixODBC?
> >>
> >> ICU's a pretty heavy library to pull in if you don't really have use
> >> for it.
> >
> > Yes - agreed. I am not sure it is needed but let me throw out some
> > potential reasons and see if they can (or even need to) be addressed
> > via ICU etc. Here is a starter...
> >
> > Dumbing Down
> >
> > At the moment - some of the string processing occurs only after the
> > string is dumbed down (to ascii) - allowing the use of standard C
> > string processing. Is this lossless, given our use?  I suspect it's
> > lossless for ASCII stuff like the standard keywords but are there
> > similar circumstances where we could lose characters?
> >
> > --
> > Peter
>
> Can you give an example of this loss of information? I think the code
> (for example) in SQLConnectW passes the entire string to the driver, the
> DM looks for ASCII versions of the keywords like DRIVER= but they are
> ASCII anyway. I guess, if we had UTF8 DSN names it might cause a
> problem, but I don't know of anyone doing that at the moment.
>
> I do think I checked that it works though when I did the W versions of
> the odbcinst API.

From cvs...

A. SQLPrepareW.c Line 149 appears to dumb down the SQL text for an error
message. This could result in some character loss. In fact the trace file
content seems to be limited to 8-bit characters at the moment.

B. At the moment anything going to/from the ini files must be 8-bit characters.
A DSN value string could lose characters and so can any other value string.

C. SQLDriverConnectW.c Line 142 appears to dumb down the entire connect
string - which will probably be fine for the keywords but not always for the
values. This applies to SQLBrowseConnectW as well.
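
To make the worry in C concrete, here is a minimal sketch. It is not the
unixODBC code: WCH, narrow_to_8bit and ucs2_to_utf8 are hypothetical names,
and it assumes the dumbing down just keeps the low byte of each 16-bit unit.
It contrasts that with a conversion that keeps the character (UTF-8 here,
BMP only, no surrogate handling).

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a 16-bit SQLWCHAR; the width is an assumption. */
typedef unsigned short WCH;

/* Hypothetical "dumb down": keep only the low 8 bits of each unit. */
static void narrow_to_8bit(const WCH *in, char *out, size_t outlen)
{
    size_t i = 0;

    if (outlen == 0)
        return;
    while (in[i] != 0 && i + 1 < outlen) {
        out[i] = (char)(in[i] & 0xFF);   /* high byte is discarded here */
        i++;
    }
    out[i] = '\0';
}

/* Hypothetical lossless alternative: encode each unit as UTF-8. */
static void ucs2_to_utf8(const WCH *in, char *out, size_t outlen)
{
    size_t i, o = 0;

    if (outlen == 0)
        return;
    for (i = 0; in[i] != 0; i++) {
        unsigned c = in[i];

        if (c < 0x80) {                  /* 1 byte: plain ASCII */
            if (o + 1 >= outlen) break;
            out[o++] = (char)c;
        } else if (c < 0x800) {          /* 2 bytes */
            if (o + 2 >= outlen) break;
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else {                         /* 3 bytes */
            if (o + 3 >= outlen) break;
            out[o++] = (char)(0xE0 | (c >> 12));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
}
```

Given the units for "DSN=" followed by U+0416, the first helper leaves the
keyword intact but turns the value into the single byte 0x16, while the
second yields the bytes 0xD0 0x96 for it. The keyword survives either way,
which matches Nick's point; it is the value that gets damaged.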

Also...

1. My understanding is that ICU supports many more encodings than does iconv.
2. I bet some string handling code in unixODBC may be simplified or removed as 
ICU does support some string handling beyond conversion.
3. ICU supports translation tables which could be handy for diagnostic 
messages and trace messages.

Anyway... just more thinking out loud. It's very late here so I am off to bed.
I may look for more (better) examples tomorrow.

--
Peter


