more on malloc() speed

The recent read_buffer issue got me curious, so I hacked up a quick (ugly) test program to see if I could show the different speeds of malloc()ing different buffer sizes. Here's the test code:

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define LOOP 100000

int main(void) {

  int x[4] = { 128,256,1024,5*1024 } ;
  int f = 0;
  for(f=0;f<4;f++)
  {
    int val = x[f];
    struct timespec before;
    struct timespec after;
    time_t diff;
    float ndiff;
    int loop;

    clock_gettime(CLOCK_MONOTONIC, &before);
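    /* Allocate LOOP buffers of val kilobytes each. The buffers are never
       freed, so this times allocation only (and leaks until exit). */
    for(loop=0; loop < LOOP; ++loop)
    {
      void *buffer = malloc((size_t)val * 1024);
      (void)buffer;
    }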
    clock_gettime(CLOCK_MONOTONIC, &after);

    /* Borrow one second (1,000,000,000 ns) if the nanosecond field wrapped. */
    if (before.tv_nsec > after.tv_nsec)
    {
      ndiff = (float)(after.tv_nsec + 1000000000 - before.tv_nsec)/1000000000.0;
      diff = after.tv_sec - 1 - before.tv_sec;
    }
    else
    {
      ndiff = (float)(after.tv_nsec - before.tv_nsec)/1000000000.0;
      diff = after.tv_sec - before.tv_sec;
    }
    printf("Time for %dk:\t%f\n",val,((float)diff+ndiff));
  }
  return 0;
}
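
The borrow handling in the timing code is easy to get wrong, so here is a minimal sketch of the same elapsed-time calculation folded into one helper. It is not part of the original test program: the name elapsed_seconds is hypothetical, and it uses double rather than float purely as an illustration.

```c
#include <time.h>

/* Hypothetical helper (not in the original test): seconds elapsed
   between two clock_gettime() samples, as a double. */
static double elapsed_seconds(struct timespec before, struct timespec after)
{
    /* The nanosecond difference may be negative. Adding it to the
       whole-second difference performs the borrow without a branch. */
    return (double)(after.tv_sec - before.tv_sec)
         + (double)(after.tv_nsec - before.tv_nsec) / 1e9;
}
```

With this in place, the if/else block in the loop could shrink to a single call such as elapsed_seconds(before, after), passed straight to printf with %f.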

And the results:

Time for 128k:  0.002641
Time for 256k:  0.002628
Time for 1024k: 0.097950
Time for 5120k: 0.060240

That difference is rather huge, and in line with the read buffer performance we were talking about. malloc()ing 1M isn't 4x as slow as doing 256k... it's 37x as slow (0.097950 vs. 0.002628 seconds). Obviously, there is some variance here (the 5120k run even came out faster than the 1024k one), and I should probably make this run a little longer. So with LOOP set to 100000, as in the code above, I get this:

Time for 128k:  0.035105
Time for 256k:  0.028631
Time for 1024k: 0.942634
Time for 5120k: 0.985856

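Luckily the totals are about 10x what they were before, scaling with the longer loop, and the big-buffer penalty is still there (about 33x between 256k and 1024k), so we're at least on the right track here.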
The long and short of this is that bigger memory buffers aren't always better. Much depends on how and when they are used. Much also depends on where your bottleneck is. Many database loads are disk bound, so a malloc() that is 37x slower doesn't cost much, since we're talking about microseconds per allocation here anyway. But when you're talking high concurrency and everything is in memory, shaving microseconds from your queries can actually be crucial.
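
To put "microseconds" in perspective, the totals from the LOOP = 100000 run can be divided by the loop count to get an average cost per malloc() call. The sketch below does only that arithmetic; usec_per_call is a hypothetical helper name, and the inputs are the figures printed above.

```c
/* Hypothetical helper: average microseconds per call, given the total
   wall-clock seconds for a run and the number of calls in it. */
static double usec_per_call(double total_seconds, long calls)
{
    return total_seconds / (double)calls * 1e6;
}

/* Example inputs, taken from the LOOP = 100000 results:
     usec_per_call(0.028631, 100000)   for the 256k buffers
     usec_per_call(0.942634, 100000)   for the 1024k buffers */
```

That works out to roughly 0.29 microseconds per 256k allocation against roughly 9.4 microseconds per 1024k allocation: negligible next to a disk seek, but not next to an in-memory lookup.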