[unixODBC-support] Contention problem

Reza Taheri rtaheri at vmware.com
Fri Apr 7 15:19:55 BST 2017


I am running into an ODBC performance problem that looks like mutex contention and getting serialized behind some resource. We are experienced performance engineers, but not very experienced with odbc, which we have been using for a few years during the development of this TPC benchmark.

We use unixODBC-2.3.4, built with: ./configure --prefix=/usr --sysconfdir=/etc --enable-threads=yes --enable-drivers=yes --enable-driver-conf=yes

My config files have:

[root at w1-tpcv-vm-49 ~]# cat /etc/odbcinst.ini
# Example driver definitions

# Driver from the postgresql-odbc package
# Setup from the unixODBC package
[PostgreSQL]
Description   = ODBC for PostgreSQL
Driver        = /usr/lib/psqlodbcw.so
Setup         = /usr/lib/libodbcpsqlS.so
Driver64      = /usr/pgsql-9.3/lib/psqlodbcw.so
Setup64              = /usr/lib64/libodbcpsqlS.so
FileUsage     = 1
Threading     = 0
[root at w1-tpcv-vm-49 ~]# cat /etc/odbc.ini
[PSQL2]
Description     = PostgreSQL
Driver          = PostgreSQL
Database = tpcv
ServerName = w1-tpcv-vm-50
UserName = tpcv
Password = tpcv
Port            = 5432

We have plenty of CPU and disk resources available in our driver, client, and server VMs (all with high vCPU counts) on a dedicated server. As we increase the number of threads of execution in the benchmark, not only doesn’t throughput go up, it actually degrades. Naturally we chased the usual database performance problem suspects, but hit dead ends. We then divided the client workload between 4 separate VMs (all pointing to the same backend server), and got over 5X the throughput. Looking more closely at the client tier, below is what I see in the strace output when everything flows through one VM. The output is almost nothing but futex calls:
17:52:52.763616 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)                                        = 1 <0.000076>
17:52:52.763738 futex(0x7fae463a9f00, FUTEX_WAIT_PRIVATE, 2, NULL)                                  = -1 EAGAIN (Resource temporarily unavailable) <0.000016>
17:52:52.763793 futex(0x7fae463a9f00, FUTEX_WAKE_PRIVATE, 1)                                        = 1 <0.000010>
17:52:52.763867 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)                                  = -1 EAGAIN (Resource temporarily unavailable) <0.000038>
17:52:52.763982 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)                                        = 1 <0.000037>
17:52:52.764078 futex(0x7fae18000020, FUTEX_WAKE_PRIVATE, 1)                                        = 0 <0.000010>
17:52:52.764182 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)                                  = -1 EAGAIN (Resource temporarily unavailable) <0.000030>
17:52:52.764264 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)                                        = 1 <0.000075>
17:52:52.764401 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)                                        = 1 <0.000014>
17:52:52.764455 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)                                  = -1 EAGAIN (Resource temporarily unavailable) <0.000011>
17:52:52.764507 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)                                        = 1 <0.000025>
17:52:52.764579 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)                                        = 1 <0.000010>
17:52:52.764821 sendto(227, "\x51\x00\x00\x00\x0b\x43\x4f\x4d\x4d\x49\x54\x00", 12, MSG_NOSIGNAL, NULL, 0) = 12 <0.000029>
17:52:52.764911 recvfrom(227, 0x7fae18058760, 4096, 16384, 0, 0)                                    = -1 EAGAIN (Resource temporarily unavailable) <0.000107>
17:52:52.765065 poll([{fd=227, events=POLLIN}], 1, 4294967295)                                      = 1 ([{fd=227, revents=POLLIN}]) <0.000017>
17:52:52.765185 recvfrom(227, "\x43\x00\x00\x00\x0b\x43\x4f\x4d\x4d\x49\x54\x00\x5a\x00\x00\x00\x05\x49", 4096, MSG_NOSIGNAL, NULL, NULL) = 18 <0.000018>
17:52:52.765258 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)                                  = 0 <0.000470>
17:52:52.765764 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)                                  = -1 EAGAIN (Resource temporarily unavailable) <0.000052>
17:52:52.765908 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)                                        = 1 <0.000073>

And perf top shows a similar picture:

    9.89%  [kernel]                         [k] _raw_spin_unlock_irqrestore
  4.86%  [kernel]                         [k] finish_task_switch
  3.53%  [kernel]                         [k] _raw_spin_lock
  3.00%  libodbc.so.2.0.0                 [.] __check_stmt_from_dbc_v
  2.80%  [kernel]                         [k] __do_softirq
  2.43%  psqlodbcw.so                     [.] 0x000000000003b146
  1.95%  libodbc.so.2.0.0                 [.] __validate_stmt
  1.93%  libc-2.17.so                     [.] _IO_vfscanf
  1.91%  libodbc.so.2.0.0                 [.] __validate_dbc
  1.90%  psqlodbcw.so                     [.] copy_and_convert_field
  1.80%  libc-2.17.so                     [.] _int_malloc
  1.58%  libc-2.17.so                     [.] malloc
  1.42%  libc-2.17.so                     [.] _int_free
  1.36%  libodbc.so.2.0.0                 [.] __release_desc

Any ideas what is going wrong?  This might be a red herring, but the response time degradation appears worse for transactions with a higher number of output parameters and more calls to SQLBindCol() and SQLFetch()

Thanks,
Reza

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.unixodbc.org/pipermail/unixodbc-support/attachments/20170407/9a73d817/attachment-0001.html>


More information about the unixODBC-support mailing list