A little bit of untruthiness about MySQL and Threads

Curt Monash has an interesting post. As with everything on the web, I agree and disagree with various bits - which is one of the great things about blogging ... but this comment concerns me enough to respond, since it's not so much an opinion but incorrect technical information...
As for your distinction between too many connections and memory pools — lack of memory is the reason for limits on connections.
Lack of memory is actually _NOT_ the reason for limits on connections in a well tuned MySQL installation. MySQL is a multi-threaded application and as such in its current implementation allocates a thread to a connection. In the standard web hosting platform, which is Linux, there is a point at which Linux itself can't actually deal with the threads effectively, thus getting you into an overloaded run queue situation. The answer to high-traffic and high concurrency situations is actually often to lower the number of connections to decrease the amount of time linux spends managing the thread queue and thus lowering your individual thread service time. People fight this a lot, since they think "I'm maxing out my connections, I should increase the setting", but it is quite easy to demonstrate.
There is a patch coming up to decouple the two which should help the few people out there who cannot, for whatever reason, manage a sensible number of connections. But it is very rare that I see a client for whom this "problem" can't be managed or fixed by simple education.
In any case, the issue is almost never (I say almost because you can always grossly misconfigure anything) memory related.
What’s more, my web host, who to my knowledge doesn’t handle any terribly busy sites, finds that MySQL will at times eat up all his RAM even so.
In that case, I would suggest that you tell your web host to learn how to configure MySQL properly. MySQL will only eat up all his RAM if he configures it to do so. If he isn't sure how to do that, I'd be happy to help him out!
I would also like to add that there is often an unfair double standard applied to MySQL as far as this goes. No one seems to doubt that a well functioning Oracle system needs a team of well-trained Oracle DBAs to tune and run. But many times when I suggest that a problem can be solved by learning a little more about how MySQL works, people tell me that I have just some how pointed out a deficiency in MySQL. Try running an top-10 web property with 100+ Oracle databases with a team of 1 DBA and see how long it remains running. I'd be happy to take the challenge of running the same thing on MySQL.
Technorati Tags:
0 comments
Tags: performance

further malloc() scaling

As a quick follow up to the last post on malloc() speed comparisons, I decided to extend the size out a bit to cover malloc()ing up to 500M.
Time for        128k:   0.035259
Time for        256k:   0.009718
Time for        1M:     0.478129
Time for        5M:     0.968945
Time for        10M:    0.965172
Time for        50M:    0.674316
Time for        500M:   1.018901

As you can see, once you make the jump up to mmap() (>256k), the cost is fairly well constant (give or take fluctuations). So it's not that huge memory buffers are terrible, just that there is a cost difference between the smaller and larger buffer sizes that may or may not matter in your case.
0 comments
Tags: performance

more on malloc() speed

The recent read_buffer issue got me curious, so I hacked up a quick (ugly) test program to see if I could show the different speeds of malloc()ing different buffer sizes. Here's the test code:

 

 

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define LOOP 100000

int main(void) {

  int x[4] = { 128,256,1024,5*1024 } ;
  int f = 0;
  for(f=0;f<4;f++)
  {
    int val = x[f];
    struct timespec before;
    struct timespec after;
    time_t diff;
    float ndiff;
    int loop;

    clock_gettime(CLOCK_MONOTONIC, &before);
    for(loop=0; loop < LOOP; ++loop)
    {
      void *buffer= malloc(val);
    }
    clock_gettime(CLOCK_MONOTONIC, &after);

    if (before.tv_nsec > after.tv_nsec)
    {
      ndiff = (float)(after.tv_nsec + 10000000000 - before.tv_nsec)/10000000000.0;
      diff = after.tv_sec - 1 - before.tv_sec;
    }
    else
    {
      ndiff = (float)(after.tv_nsec - before.tv_nsec)/1000000000.0;
      diff = after.tv_sec - before.tv_sec;
    }
    printf("Time for %dk:\t%f\n",val,((float)diff+ndiff));
  }
  return 0;
}

And the results.

 

Time for 128k:  0.002641 Time for 256k:  0.002628 Time for 1024k: 0.097950 Time for 5120k: 0.060240 

Which is rather huge and in line with the read buffer performance we were talking about. malloc()ing 1M isn't 4x as slow as doing 256k... it's 37x as slow. Obviously, there is some variance here, and I should probably make this run a little longer. So setting LOOP to 100000, I get this:
Time for 128k:  0.035105 Time for 256k:  0.028631 Time for 1024k: 0.942634 Time for 5120k: 0.985856 

Luckily this shows about 10x the performance of before, so we're at least on the right track here.
The long and short of this is that bigger memory buffers aren't always better. Much depends on how and when they are used. Much also depends on where your bottleneck is. Many database loads are disk bound, so the cost of 37x isn't much, since we're talking microseconds here anyway. But when you're talking high concurrency and everything is in memory, shaving microseconds from your queries can actually be crucial.
7 comments
Tags: performance

Read Buffer performance hit

I had some fun yesterday with some odd performance problems. So I did a run with oprofile and got this:
561612   25.0417  /lib64/tls/libc-2.3.4.so memset
429457   19.1491  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux clear_page
214268    9.5540  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_page_fault
144293    6.4339  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_no_page
94410     4.2097  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux buffered_rmqueue
64998     2.8982  /lib64/tls/libc-2.3.4.so memcpy
59565     2.6559  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __down_read_trylock
59369     2.6472  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux handle_mm_fault
47312     2.1096  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux free_hot_cold_page
39161     1.7462  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux release_pages
39140     1.7452  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux unmap_vmas
27200     1.2128  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __alloc_pages
21520     0.9596  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __pagevec_lru_add_active
20772     0.9262  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux page_add_anon_rmap
19852     0.8852  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux page_remove_rmap
16424     0.7323  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux bad_range
11993     0.5348  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __might_sleep
Which seemed to me like an awful lot of page faults, especially on a system that wasn't swapping at all.
Alexander and I got to talking about it, and I learned a few new things. The first thing is that mmap() appears as page_faults, because the machinery for doing mmap() and for doing swap is actually the same. Fun.
Back to the problem at hand... the following was in the my.cnf: read_buffer=5M As I understand it, if read_buffer set to over 256K, it uses mmap() instead of malloc() for memory allocation. Actually - this is a libc malloc thing and is tunable, but defaults to 256k. From the manual:
Each thread that does a sequential scan allocates a buffer of this size (in bytes) for each table it scans.
In this case, the server is pretty much handling click logging, so all of the queries are pure inserts. Why then would the read buffer get used at all?
Well, it just so happens that this is a PHP app and is not using any sort of persistent connections, so the app is connecting and disconnecting 60 - 100 times a second. Each of those connections is going to have to authenticate against (at least) the mysql.user table. On the other hand, that _should_ be cached, so it shouldn't be a problem. However, lowering the read buffer is quite easy ... and it'll be interesting.
So, theory to the fire - I reduced read_buffer to 128k (the default) and then got this output from oprofile:
2220     15.7760  /lib64/tls/libc-2.3.4.so memset
785       5.5785  /lib64/tls/libc-2.3.4.so memcpy
608       4.3206  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux thread_return
398       2.8283  /usr/libexec/mysqld      yyparse(void*)
337       2.3948  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux system_call
234       1.6629  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_gettimeoffset_hpet
197       1.3999  /lib64/tls/libc-2.3.4.so _int_malloc
183       1.3005  /lib64/tls/libpthread-2.3.4.so pthread_mutex_lock
181       1.2862  /lib64/tls/libpthread-2.3.4.so pthread_mutex_unlock
181       1.2862  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux clear_page
176       1.2507  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux wake_futex
143       1.0162  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux sys_tgkill
138       0.9807  /usr/libexec/mysqld      yylex(void*, void*)
130       0.9238  /usr/libexec/mysqld      update_sys_var_str(sys_var_str*, _pthread_rwlock_t*, set_var*)
126       0.8954  /jbd                     (no symbols)
123       0.8741  /usr/libexec/mysqld      my_strntoul_ucs2
106       0.7533  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux __might_sleep
106       0.7533  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux do_futex
104       0.7391  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux tcp_recvmsg
Which is more what we would expect.
The results on the system as a whole were that with read buffer set to 5M, extra 3-5% usr and 6-12% sys cpu time taken on an 8 cpu box (as reported by top) or, to look at it another way, about 100% of a single CPU taken by mysqld.
With read_buffer reduced to 128k, the CPU usage dropped to just about nothing.
3 comments
Tags: performance

OProfile, Kernel Images and InnoDB (oh my!)

Kristian Köhntopp has a wonderful article about using oprofile to track down problems in running programs. I thought I'd add a few thoughts.
If you need to get a vmlinux kernel on redhat, apparently you just need to install kernel-debuginfo, which will provide a vmlinux image you can profile against.
If you are using debian, unfortunately there is no package I could find to allow you to get a vmlinux. so what I did was: ($kver isn't a real variable - it's your kernel version. tab completion probably comes in handy at some point)
  1. Install linux-tree-$kver - which gets you the debian kernel sources
  2. Unpack the tar.bz2 file that is now in /usr/src
  3. Copy /boot/config-$kver to /usr/src/linux-$kver/.config
  4. cd /usr/src/linux-$kver
  5. make oldconfig
  6. make prepare
  7. make vmlinux
  8. cp vmlinux /boot
And now I get things that look like this:
322451    5.7214  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  cmp_dtuple_rec_with_match
304963    5.4111  /lib/libpthread-0.60.so                                                                            __pthread_rwlock_rdlock_internal
275748    4.8927  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  my_lengthsp_8bit
275503    4.8884  /boot/vmlinux                                                                                       try_to_wake_up
221124    3.9235  /lib/libpthread-0.60.so                                                                            __pthread_rwlock_wrlock_internal
188777    3.3496  /boot/vmlinux                                                                                       futex_wait
170249    3.0208  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  btr_search_guess_on_hash
144030    2.5556  /lib/libc-2.3.2.so                                                                                   _wordcopy_bwd_aligned
131726    2.3373  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  rec_get_nth_field
125462    2.2261  /lib/libpthread-0.60.so                                                                            __pthread_mutex_unlock_internal
122389    2.1716  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  safe_mutex_lock
116276    2.0631  /boot/vmlinux                                                                                       futex_wake
115654    2.0521  /usr/local/mysql-debug-4.1.22-unknown-linux-gnu-x86_64-glibc23/bin/mysqld  safe_mutex_unlock

Which - btw - shows what your system looks like when it's experiencing the InnoDB thread concurrency bug.
0 comments
Tags: performance