[unixODBC-support] Shared lib crash on load when linked with -lodbc or dlopen called

Martin J. Evans martin.evans at easysoft.com
Thu Mar 3 09:17:07 GMT 2005


A couple of suggestions to look at:

o Is unixODBC built threaded with -D_REENTRANT and -pthread (the default) but
  your code is not? The reason I ask is that on some glibc systems I think the
  structures used by dlopen are different when built the two ways. This can
  cause crashes on unloading.

o What arguments are you passing to dlopen? RTLD_GLOBAL?
  You need to be careful which flags you pass in case there are symbol
  clashes, e.g.

  your code defines MYFUNC
            |
            v
     you dlopen unixODBC
            |
            v
  unixODBC dlopens driver
  The driver needs MYFUNC in a dependent shared object fred.so
  where MYFUNC in fred.so has different arguments

  Depending on which flags you pass to dlopen, the MYFUNC reference in the ODBC
  driver can get resolved in your code or in the dependent fred.so. Solaris has
  RTLD_GROUP for this, which forces resolution downwards in the chain from
  where the unresolved symbol is found. dlopen on my Linux machine does not
  have RTLD_GROUP, but I think you could see the effect above if you used
  RTLD_GLOBAL.

o Do any of the shared objects you are loading (or that are loaded through
  dependencies) define _init or _fini entry points? The dynamic linker will
  call these on loading and unloading the libraries, and they might contain
  something causing the problem you are seeing. If you use the debugging method
  below, I think you will get to see which _init and _fini functions are called.
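
On the symbol-clash point, a rough static check is to compare the dynamic
symbols that two objects define. The following is only a sketch: it assumes GNU
nm (binutils) is installed, the function names defined_syms, common_names and
clash_candidates are made up for illustration, and the file names in the
example call are placeholders. A name that shows up in both objects is only a
candidate for a clash, not proof of one.

```shell
#!/bin/sh
# Sketch: list dynamic symbols that two shared objects both define.

# Print the names of the dynamic symbols an object defines, one per line.
# "nm -D" reads the dynamic symbol table and "--defined-only" skips
# undefined references. The symbol name is the last field of each line.
defined_syms() {
    nm -D --defined-only "$1" | awk '{ print $NF }' | sort -u
}

# Print the names that appear in both of two sorted name lists.
common_names() {
    comm -12 "$1" "$2"
}

# Compare two shared objects; both paths are supplied by the caller.
clash_candidates() {
    cc_a=$(mktemp) && cc_b=$(mktemp) || return 1
    defined_syms "$1" > "$cc_a"
    defined_syms "$2" > "$cc_b"
    common_names "$cc_a" "$cc_b"
    rm -f "$cc_a" "$cc_b"
}

# Example call, with hypothetical file names:
#   clash_candidates ./myplugin.so ./fred.so
```

If MYFUNC is printed for your plugin and fred.so, that is the situation drawn
in the diagram above and worth a closer look.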

To see what the dynamic linker is doing, you can:

export LD_DEBUG=files
export LD_DEBUG_OUTPUT=/tmp/ld.out
run your program

Afterwards /tmp/ld.out.<pid> (glibc appends the process ID to the name) will
contain debugging output from the dynamic linker showing you which files are
loaded, unloaded, etc. There are other options - just do:

LD_DEBUG=help ls

and you will see them.
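
To pick the init/fini activity out of that output, something like the
following may help. It is a sketch, assuming a glibc dynamic linker whose
LD_DEBUG=files output contains "calling init:" and "calling fini:" lines; the
function names trace_files and init_fini_calls and the program name in the
example are placeholders of mine.

```shell
#!/bin/sh
# Sketch: run a command under LD_DEBUG=files and list init/fini calls.

# Run a command with dynamic linker debugging written to <prefix>.<pid>.
# glibc appends the process ID to the LD_DEBUG_OUTPUT name, so a program
# that forks leaves one file per process.
trace_files() {
    tf_prefix=$1
    shift
    env LD_DEBUG=files LD_DEBUG_OUTPUT="$tf_prefix" "$@"
}

# Print only the "calling init:" / "calling fini:" lines from the logs.
init_fini_calls() {
    grep -h -E 'calling (init|fini):' "$@"
}

# Example, with a placeholder program name:
#   trace_files /tmp/ld.out ./your_program
#   init_fini_calls /tmp/ld.out.*
```

The last "calling fini:" line before the crash would be the first place I
would look.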

Let us know what the problem was.

Martin
--
Martin J. Evans
Easysoft Ltd, UK
Development


On 03-Mar-2005 Zachary Bedell wrote:
> Hello all!
> 
> I really hate to post this to the mailing list because I feel too much 
> like a begging newbie.  It seems like I must be missing something 
> obvious, but after three days of solid hacking, I'm still in the dark.  
> That said, I *am* a newbie when it comes to ODBC as this is only the 
> second project I've tried with unixODBC.  (The first project is working 
> pretty well now, so I know how to do something right.)
> 
> I'm working on porting some Windows ODBC code to run under Linux 
> (kernel 2.4, libc 2.3.4, gcc 3.3.4 on Gentoo with unixODBC 2.2.10).  
> The code is part of a shared object that is loaded as a plugin to the 
> Helix DNA server from RealNetworks.  The Windows version of this code 
> is known to work.  I thought this would be an easy port job, but I 
> guess that's what I get for thinking...
> 
> The Windows code uses LoadLibrary to load odbc32.dll and GetProcAddress 
> to get function pointers.  My first attempt was to convert the function 
> pointer calls to direct calls and link with -lodbc.  That compiled 
> fine, but as soon as Helix loaded the module, it would crash with 
> SIGCHLD.  Linking that without -lodbc didn't crash Helix, but it of 
> course complained as soon as it tried to access any of the ODBC 
> functions since the symbols didn't resolve.
> 
> I switched to manually calling dlopen to load the library and dlsym to 
> get pointers to the various functions.  I probably should have started 
> that way as it was much less work.  A few macros to map the parameters 
> from LoadLibrary, GetProcAddress, and FreeLibrary onto dlopen, dlsym, 
> and dlclose; and I was done.  That way, the library loads successfully 
> and properly queries the database.  Unfortunately right after Helix 
> deconstructs the class, it SIGCHLD's again.  I found that if I run with 
> the call to dlopen commented, the server does not crash.  Also, if I 
> comment out all the function bodies and ONLY call dlopen, then the 
> server still crashes after deconstructing the last instance of the 
> class.
> 
> I've managed to narrow down the original cause of my crashes to the act 
> of loading the unixODBC libodbc.so.  If I don't load the library, I 
> don't crash.  If I do load the library, even if I don't do anything 
> else, I still crash.  I've also tried using both FreeTDS and MyODBC as 
> the drivers with unixODBC.  The behavior's the same with both, so I'm 
> pretty confident that rules out FreeTDS as the cause.
> 
> I'll try to describe as much as I can about the environment that I'm 
> calling unixODBC from, in case any of these details are helpful:
> The code has a global void * that holds a handle returned from dlopen 
> and a number of global function pointers for the ODBC API functions.  
> The .cpp file for this object contains static functions (not class 
> members) that call dlopen and dlsym before the first time the ODBC 
> functions are needed.  The LoadODBCLibrary function is called each time 
> the database portion of the class is initialized, and each time it 
> checks to see if the global handle to the library is NULL.  If the 
> handle is non-NULL, it does nothing as some previous invocation has 
> already loaded the library.  I've confirmed (with printf's) that dlopen 
> is only called once.
> 
> The global function pointers survive multiple instantiations of the 
> class and dlclose is finally called when Helix sends a shutdown message 
> to the plugin (another static function defined by Helix's plugin API).  
> I've got the code filled up with printf's at the moment, and I've 
> confirmed that Helix is never sending the shutdown message and that 
> nothing is trying to call dlclose before the crash (that's as it 
> should be since Helix would ordinarily keep running and continue using 
> the ODBC library).  I can also see that the object is created and 
> destroyed multiple times during Helix's startup routines without any 
> problem.  The catch is that those startup routines don't load the ODBC 
> library.  That happens later only when the database portion of the 
> class is init'ed.  Once the library IS loaded, then once all the 
> instances of the class are released, Helix crashes.  I've confirmed 
> using printf's in my code and with additional debugging messages that 
> Helix produces that control has returned to Helix (outside of my 
> object) before it crashes.  The deconstructor method of my class 
> completes and no other functions in my code are called between the end 
> of the deconstructor and the crash.  There is a debug message from 
> Helix that announces it's deleting the Client object that was using my 
> code, and a few milliseconds later it crashes.  The deconstructor 
> doesn't call dlclose nor touch the function pointers at all.
> 
> Up until the crash, the ODBC library is functioning correctly.  It does 
> successfully query and return data from my MS SQL Server or MySQL.
> 
> 
> So I feel like a heel begging for help here.  The project in question 
> is only sort-of open source as you need to execute a clickwrap 
> agreement to get it.  That said, I'm totally clueless and would REALLY 
> appreciate any debugging pointers that might help me track down what's 
> going on here.  Is there anything magic I can do to debug what might be 
> happening in libodbc or elsewhere after my code is done?  Is there any 
> code in the ODBC libraries that could execute "on its own" after all of 
> the various handles are released?  I'll take anything at all!
> 
> Thanks in advance,
> Zac Bedell
> 


