02/9/18

When Julia is faster than C, digging deeper

In my earlier post I showed an example where Julia is significantly faster than c. I got this insightful response

So I decided to dig deeper. Basically the standard c rand() is not that good. So instead I searched for the fastest Mersenne Twister there is. I downloaded the latest code and compiled it in the fastest way for my architecture.

/* eurler2.c */
#include <stdio.h>      /* printf, NULL */
#include <stdlib.h>     /* srand, rand */
#include "SFMT.h"       /* fast Mersenne Twister */
 
sfmt_t sfmt;
 
double r2()
{
    return sfmt_genrand_res53(&sfmt);
 
}
 
double euler(long int n)
{
    long int m=0;
    long int i;
    for(i=0; i<n; i++){
        double the_sum=0;
        while(1) {
            m++;
            the_sum+=r2();
            if(the_sum>1.0) break;
        }
    }
    return (double)m/(double)n;
}
 
 
int main ()
{
  sfmt_init_gen_rand(&sfmt,123456);
  printf ("Euler : %2.5f\n", euler(1000000000));
 
  return 0;
}

I had to compile with a whole bunch of flags which I induced from SFMT‘s Makefile to get faster performance.

gcc -O3 -finline-functions -fomit-frame-pointer -DNDEBUG -fno-strict-aliasing --param max-inline-insns-single=1800  -Wall -std=c99 -msse2 -DHAVE_SSE2 -DSFMT_MEXP=1279 -ISFMT-src-1.5.1 -o eulerfast SFMT.c euler2.c

And after all that trouble we got the performance down to 18 seconds. Still slower that Julia‘s 16 seconds.

$ time ./eulerfast 
Euler : 2.71824
 
real    0m18.075s
user    0m18.085s
sys 0m0.001s

Probably, we could do a bit better with more tweaks, and probably exceed Julia‘s performance with some effort. But at that point, I got tired of pushing this further. The thing I love about Julia is how well it is engineered and hassle free. It is quite phenomenal the performance you get out of it, with so little effort. And for basic technical computing things, like random number generation, you don’t have to dig hard for a better library. The “batteries included” choices in the Julia‘s standard library are pretty good.

02/8/18

When Julia is faster than C

On e-day, I came across this cool tweet from Fermat’s library

So I spend a few minutes coding this into Julia

function euler(n)
    m=0
    for i=1:n
        the_sum=0.0
        while true
            m+=1
            the_sum+=rand()
            (the_sum>1.0) && break;
        end
    end
    m/n
end

Timing this on my machine, I got

julia> @time euler(1000000000)
 15.959913 seconds (5 allocations: 176 bytes)
2.718219862

Gave a little under 16 seconds.

Tried a c implementation

#include <stdio.h>      /* printf, NULL */
#include <stdlib.h>     /* srand, rand */
#include <time.h>       /* time */
 
double r2()
{
    return (double)rand() / (double)((unsigned)RAND_MAX + 1);
}
 
double euler(long int n)
{
    long int m=0;
    long int i;
    for(i=0; i<n; i++){
        double the_sum=0;
        while(1) {
            m++;
            the_sum+=r2();
            if(the_sum>1.0) break;
        }
    }
    return (double)m/(double)n;
}
 
 
int main ()
{
  printf ("Euler : %2.5f\n", euler(1000000000));
 
  return 0;
}

and compiling with either gcc

gcc  -Ofast euler.c

or clang

clang  -Ofast euler.c

gave a timing twice as long

$ time ./a.out 
Euler : 2.71829
 
real    0m36.213s
user    0m36.238s
sys 0m0.004s

For the curios, I am using this version of Julia

julia> versioninfo()
Julia Version 0.6.3-pre.0
Commit 93168a6 (2017-12-18 07:11 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

Now one should not put too much emphasis on such micro benchmarks. However, I found this a very curious examples when a high level language like Julia could be twice as fast a c. The Julia language authors must be doing some amazing mojo.