[unixODBC-support] Work arround Linux Debian, Microsoft SQL server ODBC driver and charset (ISO8859, UTF8, UCS2)

Edouard Gaulué listes at e-gaulue.com
Tue Sep 20 10:42:21 BST 2016


Nice to have come back on it to explain it more.

On the web you can find this: 
https://msdn.microsoft.com/en-us/library/hh568448(v=sql.110).aspx that 
may be clearer for you than for me. The Driver Manager recommanded and 
installed by Microsoft on Linux is unixODBC.

I have enabled logging, and it's driving me mad. I've got two version of 
unixODBC, one with --with-iconv-char-enc=UTF-8 (version1) and the other 
not (version2).

Here are the results comparing logs and what I see in my bash which is 
utf8 locale (hex code for 'è' is tabulated on the right):

UNICODE Using encoding ASCII 'UTF-8' and UNICODE 'UCS-2LE'
     sqlcmd
         execute UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' WHERE 
id=1105270
             bash-utf8: UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' 
WHERE id=1105270                c3a8
             log: SQL = [UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La 
pi?ce' WHERE id=1105270                e8
         request SELECT EU_ENUMERE FROM EXP_F_DOCLIGNES WHERE id=1105270
             log: Buffer = [La pi?ce](unicode)                         
                 e8
             bash-utf8: La pièce                                         
c3a8
     sqli
         request SELECT EU_ENUMERE FROM EXP_F_DOCLIGNES WHERE id=1105270
             log: Buffer = [La pièce]                                 
         c3a8
             bash-utf8: La pièce                                         
c3a8
         execute UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' WHERE 
id=1105270
             bash-utf8: UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' 
WHERE id=1105270                c3a8
             SQL = [UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' 
WHERE id=1105270][length = 66]        c3a8
         request SELECT EU_ENUMERE FROM EXP_F_DOCLIGNES WHERE id=1105270
             Buffer = [La pièce]                                         
c3a8
             bash-utf8: La pièce                                         
c3a8

UNICODE Using encoding ASCII 'ISO8859-1' and UNICODE 'UCS-2LE'
     sqlcmd
         execute UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' WHERE 
id=1105270
             bash-utf8: UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' 
WHERE id=1105270                c3a8
             SQL = [UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' 
WHERE id=1105270                      c3a8
         request SELECT EU_ENUMERE FROM EXP_F_DOCLIGNES WHERE id=1105270
             Buffer = [La pièce](unicode)                             
             c3a8
             bash-utf8: La pièce                                         
c3a8
     sqli
         request SELECT EU_ENUMERE FROM EXP_F_DOCLIGNES WHERE id=1105270
             Buffer = [La pièce]                                     
     c383 c2a8    <--- why ?
             bash-utf8: La pièce                                         
c3a8         <-----|
         execute UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' WHERE 
id=1105270
             bash-utf8: UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' 
WHERE id=1105270                c3a8
             SQL = [UPDATE EXP_F_DOCLIGNES SET EU_ENUMERE='La pièce' 
WHERE id=1105270][length = 66]        c383 c2a8
         request SELECT EU_ENUMERE FROM EXP_F_DOCLIGNES WHERE id=1105270
             Buffer = [La pièce]                                     
        c383 c283 c382 c2a8
             bash-utf8: La pièce                                     
     c383 c2a8


Does anybody know:
  * if logging is done before or after charset conversion? For execute? 
For request results?
  * if logging is done on converted request/execute even if conversion 
is not involved?

It looks like sqlcmd use a lot of W functions and SQL result is said to 
be SQL_WCHAR, so I think you are totaly right: unixODBC driver manager 
conversion may be never used. But if I look at the logs, hex codes used 
for 'è' are different for version1 and version2. If conversion is not 
involved, I should see the same hex code in the logs, no?

For isql, that the contrary (as you said), it only uses ansi api calls. 
But as the logs look different, I've got the feeling logging is done 
after the charset conversion, as the hex code are not the same for 'è' 
with version1 and version2.

As you can see, with version 2, I still miss something, or my bash 
auto-convert on its own. Please help.

Regards, EG

Le 19/09/2016 à 14:05, Nick Gorham a écrit :
> On 15/09/16 16:52, Edouard Gaulué wrote:
>>
>> *** Analyse ***
>>
>> sqlcmd maybe integrates something to detect your encoding and to 
>> comply with the connexion one. isql doesn't, nor PHP through PDO. 
>> But, if default is set to UTF-8 then those applications do the job.
>
> i think its possible that sqlcmd is reading the input and converting 
> in itself into wide characters, and then calling the W functions the 
> driver so UTF8 is not being seen by the driver. And likewise with data 
> coming back. isql is a simple test tool that is not specific to any 
> driver so just takes a string from the terminal and passes it to the 
> ansi api calls.
>>
>>
>> *** Questions ***
>>
>> How can read (SELECT) be OK and not write (INSERT, UPDATE) ? That the 
>> first thing I found strange. Why applications understand the output 
>> should be in UTF-8 whereas thay are said to be in ISO8859-1. Is it a 
>> kind of OS feature, knowing you are displaying UTF-8 ?
>
> Not sure, I would turn on unixODBC logging and see what the app is 
> doing with the driver.
>>
>> As far as I understand, --with-iconv-char-enc=UTF-8 changes the 
>> default character encoding but it can be overrided by 
>> directives/options, if implemented. Is this true?
>>
>> If not, what is the risk of compiling with 
>> --with-iconv-char-enc=UTF-8 ? on a UTF-8 OS ?
>
> Normally none. The only time the driver manager will use that info is 
> when it needs to convert from wide to ansi and vice versa. So for 
> example if the driver only has the W functions, and the application 
> calls the A points, the the driver manager will call teh W function on 
> the applications behalf converting the data using the selected encoding.
>
> I am not sure about the MS driver, but the Easysoft SQL Server driver 
> has the ability to set the client side encoding, so the driver knows 
> 1. Client character set, 2. Server character set (from the server 
> info), so can convert correctly from 1 to 2, and back.
>



More information about the unixODBC-support mailing list