In August Productions
Some text regarding the suitability of chickens for foreign service.
mysqlprestore for parallel restores
Posted by Monty Taylor on September 13, 2007 at 11:51 AM
I really need to stick it in version control or something, and I'm sure there are bugs, but it's working for me so far. Maybe we can merge the two into a single great tool?
mysqlpdump for parallel dumps
Posted by Monty Taylor on September 12, 2007 at 11:29 PM
Multi-threaded mysqldump is not a utopia any more. mysqlpdump can dump all your tables and databases in parallel, so it can be much faster on systems with multiple CPUs.
I ran mysqlpdump (with one patch I'll send in soon to put quotes around table names) today with 16 threads on a 4-core system and did all 300G in ~3.5 hours. Additionally, since it wraps mysqldump but iterates over the tables, I got a SQL file for each table, which is going to make writing a restore script a piece of cake. It understood that I wanted to do --master-data, and it had an option to gzip each SQL file as it went.
All in all, I'm thrilled. kudos! And thanks for the tool.
mysqlpdump
Multiple bond interfaces in CentOS/RHEL
Posted by Monty Taylor on September 11, 2007 at 09:28 PM
I had a machine with 4 NICs that I wanted to bond two by two. I had no problem getting the bond0 device up with any of the interfaces; however, getting a bond1 up always resulted in the above error.
The friendly guys from #centos on freenode pointed me to the missing config:
options bonding mode=4 max_bonds=4
An important thing to keep in mind here is that in the RHEL/CentOS initscripts package, these options are global. There is no way to set a different set of options for each bond. So, if for instance, you had 4 NICs and wanted to have 2 of them bonded in mode 1 and 2 of them in mode 4, you're SOL. (Unless, of course, you go for insmodding everything by hand. But that's ugly)
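As a rough illustration of where that line would live, here is a sketch of the module configuration for two bonds. It assumes the CentOS/RHEL releases of that era, where module options go in /etc/modprobe.conf; the alias lines are an assumption about the rest of the setup and are not taken from the post. Only the options line comes from the fix described above.

```
# /etc/modprobe.conf (assumed location)
alias bond0 bonding
alias bond1 bonding
# Global for every bond device: same mode for bond0 and bond1
options bonding mode=4 max_bonds=4
```

Because the options line applies to the bonding module as a whole, both bond0 and bond1 end up in mode 4 here.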
further malloc() scaling
Posted by Monty Taylor on September 10, 2007 at 02:45 PM
Time for 128k: 0.035259
Time for 256k: 0.009718
Time for 1M: 0.478129
Time for 5M: 0.968945
Time for 10M: 0.965172
Time for 50M: 0.674316
Time for 500M: 1.018901
As you can see, once you make the jump up to mmap() (>256k), the cost is fairly well constant (give or take fluctuations). So it's not that huge memory buffers are terrible, just that there is a cost difference between the smaller and larger buffer sizes that may or may not matter in your case.
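To poke at the mmap() crossover directly, something like the following sketch might help. It assumes glibc, where mallopt() and M_MMAP_THRESHOLD are available from malloc.h; the helper names time_malloc_free and raise_mmap_threshold are made up for illustration. Unlike the original test, it frees each buffer, so it measures malloc()/free() pairs rather than malloc() alone. glibc documents a default threshold of 128 kB and may adjust it dynamically, so the exact crossover on a given system may differ from the one seen above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std */
#endif
#include <malloc.h>   /* mallopt(), M_MMAP_THRESHOLD (glibc-specific) */
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: time `count` malloc()/free() pairs of `bytes` each.
 * Returns elapsed seconds, or -1.0 if an allocation fails. */
double time_malloc_free(size_t bytes, int count)
{
    struct timespec before, after;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &before);
    for (i = 0; i < count; ++i) {
        volatile char *p = malloc(bytes);
        if (p == NULL)
            return -1.0;
        p[0] = 0;            /* touch the buffer so the call is not optimized out */
        free((void *)p);
    }
    clock_gettime(CLOCK_MONOTONIC, &after);

    return (double)(after.tv_sec - before.tv_sec)
         + (double)(after.tv_nsec - before.tv_nsec) / 1e9;
}

/* Hypothetical helper: ask glibc to serve requests smaller than `bytes`
 * from the heap instead of mmap(). Returns 1 on success, 0 on failure. */
int raise_mmap_threshold(size_t bytes)
{
    return mallopt(M_MMAP_THRESHOLD, (int)bytes);
}
```

One way to use it: time a 1M allocation with time_malloc_free(1024 * 1024, 100000), call raise_mmap_threshold(8 * 1024 * 1024), then time it again and compare the two numbers.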
more on malloc() speed
Posted by Monty Taylor on September 10, 2007 at 01:18 PM
The recent read_buffer issue got me curious, so I hacked up a quick (ugly) test program to see if I could show the different speeds of malloc()ing different buffer sizes. Here's the test code:
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#define LOOP 100000
int main(void)
{
    /* Buffer sizes to test, in kilobytes. */
    int sizes_kb[4] = { 128, 256, 1024, 5 * 1024 };
    int f;

    for (f = 0; f < 4; f++)
    {
        int val = sizes_kb[f];
        struct timespec before;
        struct timespec after;
        time_t diff;
        double ndiff;
        int loop;

        clock_gettime(CLOCK_MONOTONIC, &before);
        for (loop = 0; loop < LOOP; ++loop)
        {
            /* Deliberately never freed: only malloc() itself is timed. */
            void *buffer = malloc((size_t)val * 1024);
            (void)buffer;
        }
        clock_gettime(CLOCK_MONOTONIC, &after);

        /* Borrow one second when the nanosecond field has wrapped. */
        if (before.tv_nsec > after.tv_nsec)
        {
            ndiff = (double)(after.tv_nsec + 1000000000L - before.tv_nsec) / 1000000000.0;
            diff = after.tv_sec - 1 - before.tv_sec;
        }
        else
        {
            ndiff = (double)(after.tv_nsec - before.tv_nsec) / 1000000000.0;
            diff = after.tv_sec - before.tv_sec;
        }
        printf("Time for %dk:\t%f\n", val, (double)diff + ndiff);
    }
    return 0;
}
Time for 128k: 0.002641
Time for 256k: 0.002628
Time for 1024k: 0.097950
Time for 5120k: 0.060240
Which is rather huge and in line with the read buffer performance we were talking about. malloc()ing 1M isn't 4x as slow as doing 256k... it's 37x as slow. Obviously, there is some variance here, and I should probably make this run a little longer. So setting LOOP to 100000, I get this:
Time for 128k: 0.035105
Time for 256k: 0.028631
Time for 1024k: 0.942634
Time for 5120k: 0.985856
Luckily this shows about 10x the time of before, which is what you'd expect from 10x the iterations, so we're at least on the right track here.
The long and short of this is that bigger memory buffers aren't always better. Much depends on how and when they are used. Much also depends on where your bottleneck is. Many database loads are disk bound, so the cost of 37x isn't much, since we're talking microseconds here anyway. But when you're talking high concurrency and everything is in memory, shaving microseconds from your queries can actually be crucial.
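The fiddliest part of the test program is the timespec subtraction, where the nanosecond field may need to borrow a second. A minimal sketch of pulling that into a helper follows; the name elapsed_seconds is hypothetical and not part of the original program.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for struct timespec under strict -std */
#endif
#include <time.h>

/* Hypothetical helper: seconds elapsed from `before` to `after`. */
double elapsed_seconds(struct timespec before, struct timespec after)
{
    time_t sec = after.tv_sec - before.tv_sec;
    long nsec = after.tv_nsec - before.tv_nsec;

    /* Borrow one second (1,000,000,000 ns) if the nanosecond part went negative. */
    if (nsec < 0) {
        sec -= 1;
        nsec += 1000000000L;
    }
    return (double)sec + (double)nsec / 1e9;
}
```

With this in place, the timing loop would only need the two clock_gettime() calls and a single call to elapsed_seconds(before, after).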