11/9/19

Evaluating Arbitrary Precision Integer Expressions in Julia using Metaprogramming

While watching the Mathologer masterclass on power sums

I came across a challenge to evaluate the following sum
\(1^{10}+2^{10}+\cdots+1000^{10}\)

This can be easily evaluated via brute force in Julia to yield

function power_cal(n)
  a=big(0)
  for i=1:n
    a+=big(i)^10
  end
  a 
end
 
julia> power_cal(1000)
91409924241424243424241924242500

Note the I had to use big to makes sure we are using BigInt in the computation. Without that, we would be quickly running in into an overflow issue and we will get a very wrong number.

In the comment section of the video I found a very elegant solution to the above sum, expressed as

(1/11) * 1000^11 + (1/2) * 1000^10 + (5/6) * 1000^9 – 1000^7 + 1000^5-(1/2) * 1000^3 + (5/66) * 1000 = 91409924241424243424241924242500

If I try to plug this into the Julia, I get

julia> (1/11) * 1000^11 + (1/2) * 1000^10 + (5/6) * 1000^9 - 1000^7 + 1000^5-(1/2) * 1000^3 + (5/66) * 1000
-6.740310541071357e18

This negative answer is not surprising at all, because we obviously ran into an overflow. We can, of course, go through that expression and modify all instances of Int64 with BigInt by wrapping it in the big function. But that would be cumbersome to do by hand.

The power of Metaprogramming

In Julia, metaprogramming allows you to write code that creates code, the idea here to manipulate the abstract syntax tree (AST) of a Julia expression. We start to by “quoting” our original mathematical expressing into a Julia expression. In the at form it is not evaluated yet, however we can always evaluate it via eval.

julia> ex1=:((1/11) * 1000^11 + (1/2) * 1000^10 + (5/6) * 1000^9 - 1000^7 + 1000^5-(1/2) * 1000^3 + (5/66) * 1000)
 
:((((((1 / 11) * 1000 ^ 11 + (1 / 2) * 1000 ^ 10 + (5 / 6) * 1000 ^ 9) - 1000 ^ 7) + 1000 ^ 5) - (1 / 2) * 1000 ^ 3) + (5 / 66) * 1000)
 
julia> dump(ex1)
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol +
    2: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol -
        2: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol +
            2: Expr
              head: Symbol call
              args: Array{Any}((3,))
                1: Symbol -
                2: Expr
                3: Expr
            3: Expr
              head: Symbol call
              args: Array{Any}((3,))
                1: Symbol ^
                2: Int64 1000
                3: Int64 5
 
 
julia> eval(ex1)
-6.740310541071357e18

The output of dump show the full AST in all its glory (…well almost the depth is a bit truncated). Notice that here all our numbers are interpreted as Int64.
Now we walk through the AST and change all occurrences of Int64 with BigInt by using the big function.

function makeIntBig!(ex::Expr)
  args=ex.args
  for i in eachindex(args)
       if args[i] isa Int64
          args[i]=big(args[i])
       end
       if args[i] isa Expr
          makeIntBig!(args[i])
       end
   end
end
 
julia> ex2=copy(ex1)
:((((((1 / 11) * 1000 ^ 11 + (1 / 2) * 1000 ^ 10 + (5 / 6) * 1000 ^ 9) - 1000 ^ 7) + 1000 ^ 5) - (1 / 2) * 1000 ^ 3) + (5 / 66) * 1000)
 
julia> makeIntBig!(ex2)
 
julia> eval(ex2)
9.14099242414242434242419242425000000000000000000000000000000000000000000000014e+31

We see an improvement here, but the results are not very satisfactory. The divisions yield BigFloat results, which had a tiny bit of floating point errors. Can we do better?
Julia has support for Rational expressions baked in. We can use that improve the results. We just need to search for call expressions the / symbol and replace it by the // symbol. For safety we just have to makes sure the operands are as subtype of Integer.

function makeIntBig!(ex::Expr)
    args=ex.args
    if ex.head == :call && args[1]==:/ && 
              length(args)==3 && 
              all(x->typeof(x) <: Integer,args[2:end]) 
        args[1]=://
        args[2]=big(args[2])
        args[3]=big(args[3])
    else 
        for i in eachindex(args)
           if args[i] isa Int64
              args[i]=big(args[i])
           end
           if args[i] isa Expr
              makeIntBig!(args[i])
           end
        end
    end
end
 
julia> ex2=copy(ex1);
 
julia> makeIntBig!(ex2)
 
julia> eval(ex2)
91409924241424243424241924242500//1

Now that is much better! We have not lost any precision and we ended us with a Rational expression.

Finally, we can build a macro so the if we run into such expressions in the future and we want to evaluate them, we could just conveniently call it.

macro eval_bigInt(ex)
   makeIntBig!(ex)
   quote # Removing the denominator if it is redundant 
      local x=$ex
      (x isa Rational && x.den==1) ? x.num : x 
   end
end

and we can now simply evaluate our original expression as

julia> @eval_bigInt (1/11) * 1000^11 + (1/2) * 1000^10 + (5/6) * 1000^9 - 1000^7 + 1000^5-(1/2) * 1000^3 + (5/66) * 1000
91409924241424243424241924242500

07/28/18

Exploring left truncatable primes

Recently I came across a fascinating Numberphile video on truncatable primes

I immediately thought it would be cool to whip a quick Julia code to get the full enumeration of all left truncatable primes, count the number of branches and also get the largest left truncatable prime.

using Primes
 
function get_left_primes(s::String)
    p_arr=Array{String,1}()
    for i=1:9
        number_s="$i$s"
        if isprime(parse(BigInt, number_s))
            push!(p_arr,number_s)
        end
    end
    p_arr
end
 
function get_all_left_primes(l)
    r_l= Array{String,1}()
    n_end_points=0
    for i in l
        new_l=get_left_primes(i)
        isempty(new_l) && (n_end_points+=1)
        append!(r_l,new_l)
        next_new_l,new_n=get_all_left_primes(new_l)
        n_end_points+=new_n # counting the chains
        append!(r_l,next_new_l)
    end
    r_l, n
end

The first function just prepends a number (expressed in String for convenience) and checks for it possible primes that can emerge from a single digit prepending. For example:

julia> get_left_primes("17")
2-element Array{String,1}:
 "317"
 "617"

The second function, just makes extensive use of the first to get all left truncatable primes and also count the number of branches.

julia> all_left_primes, n_branches=get_all_left_primes([""])
(String["2", "3", "5", "7", "13", "23", "43", "53", "73", "83""6435616333396997", "6633396997", "76633396997", "963396997", "16396997", "96396997", "616396997", "916396997", "396396997", "4396396997"], 1442)
 
julia> n_branches
1442
 
julia> all_left_primes
4260-element Array{String,1}:
 "2"
 "3"
 "5"
 "7"
 "13"
 "23""96396997"
 "616396997"
 "916396997"
 "396396997"
 "4396396997"

So we the full list of possible left truncatable primes with a length 4260. Also the total number of branches came to 1442.

We now get the largest left truncatable primes with the following one liner:

julia> largest_left_prime=length.(all_left_primes)|>indmax|> x->all_left_primes[x]
"357686312646216567629137"

After this fun exploration, I found an implementation in Julia for just getting the largest left truncatable prime for any base in Rosseta Code.

04/1/18

Iterating with Dates and Time in Julia

Julia has good documentation on dealing with Dates and Time, however that is often in the context constructing and Date and Time objects. In this post, I am focus on the ability to iterate over Dates and Times. This is very useful in countless application.

We start of by capturing this moment and moving ahead into the future

julia> using Dates
julia> this_moment=now()
2018-04-01T23:13:33.437

In one hour that will be

julia> this_moment+Dates.Hour(1)
2018-04-02T00:13:33.437

Notice that Julia was clever enough properly interpret that we will be on the in another day after exactly one hour. Thanks to it multiple dispatch of the DateTime type to be able to do TimeType period arithmatic.

You can then write a nice for loop that does something every four hours for the next two days.

julia> for t=this_moment:Dates.Hour(4):this_moment+Dates.Day(2)
    println(t)
    #or somethings special with that time
end
2018-04-01T23:13:33.437
2018-04-02T03:13:33.437
2018-04-02T07:13:33.437
2018-04-02T11:13:33.437
2018-04-02T15:13:33.437
2018-04-02T19:13:33.437
2018-04-02T23:13:33.437
2018-04-03T03:13:33.437
2018-04-03T07:13:33.437
2018-04-03T11:13:33.437
2018-04-03T15:13:33.437
2018-04-03T19:13:33.437
2018-04-03T23:13:33.437

Often we are not so interested in the full dates. For example if we are reading a video file and we want to get a frame every 5 seconds while using VideoIO.jl. We can deal here with the simpler Time type.

julia> video_start=Dates.Time(0,5,20)
00:05:20


Here we are interested in starting 5 minutes and 20 seconds into the video.
Now we can make a nice loop from the start to finish

for t=video_start:Dates.Second(5):video_start+Dates.Hour(2)
    h=Dates.Hour(t).value
    m=Dates.Minute(t).value
    s=Dates.Second(t).value
    ms=Dates.Millisecond(t).value
    # Do something interesting with ffmpeg seek on the video
end

02/9/18

When Julia is faster than C, digging deeper

In my earlier post I showed an example where Julia is significantly faster than c. I got this insightful response

So I decided to dig deeper. Basically the standard c rand() is not that good. So instead I searched for the fastest Mersenne Twister there is. I downloaded the latest code and compiled it in the fastest way for my architecture.

/* eurler2.c */
#include <stdio.h>      /* printf, NULL */
#include <stdlib.h>     /* srand, rand */
#include "SFMT.h"       /* fast Mersenne Twister */
 
sfmt_t sfmt;
 
double r2()
{
    return sfmt_genrand_res53(&sfmt);
 
}
 
double euler(long int n)
{
    long int m=0;
    long int i;
    for(i=0; i<n; i++){
        double the_sum=0;
        while(1) {
            m++;
            the_sum+=r2();
            if(the_sum>1.0) break;
        }
    }
    return (double)m/(double)n;
}
 
 
int main ()
{
  sfmt_init_gen_rand(&sfmt,123456);
  printf ("Euler : %2.5f\n", euler(1000000000));
 
  return 0;
}

I had to compile with a whole bunch of flags which I induced from SFMT‘s Makefile to get faster performance.

gcc -O3 -finline-functions -fomit-frame-pointer -DNDEBUG -fno-strict-aliasing --param max-inline-insns-single=1800  -Wall -std=c99 -msse2 -DHAVE_SSE2 -DSFMT_MEXP=1279 -ISFMT-src-1.5.1 -o eulerfast SFMT.c euler2.c

And after all that trouble we got the performance down to 18 seconds. Still slower that Julia‘s 16 seconds.

$ time ./eulerfast 
Euler : 2.71824
 
real    0m18.075s
user    0m18.085s
sys 0m0.001s

Probably, we could do a bit better with more tweaks, and probably exceed Julia‘s performance with some effort. But at that point, I got tired of pushing this further. The thing I love about Julia is how well it is engineered and hassle free. It is quite phenomenal the performance you get out of it, with so little effort. And for basic technical computing things, like random number generation, you don’t have to dig hard for a better library. The “batteries included” choices in the Julia‘s standard library are pretty good.

02/8/18

When Julia is faster than C

On e-day, I came across this cool tweet from Fermat’s library

So I spend a few minutes coding this into Julia

function euler(n)
    m=0
    for i=1:n
        the_sum=0.0
        while true
            m+=1
            the_sum+=rand()
            (the_sum>1.0) && break;
        end
    end
    m/n
end

Timing this on my machine, I got

julia> @time euler(1000000000)
 15.959913 seconds (5 allocations: 176 bytes)
2.718219862

Gave a little under 16 seconds.

Tried a c implementation

#include <stdio.h>      /* printf, NULL */
#include <stdlib.h>     /* srand, rand */
#include <time.h>       /* time */
 
double r2()
{
    return (double)rand() / (double)((unsigned)RAND_MAX + 1);
}
 
double euler(long int n)
{
    long int m=0;
    long int i;
    for(i=0; i<n; i++){
        double the_sum=0;
        while(1) {
            m++;
            the_sum+=r2();
            if(the_sum>1.0) break;
        }
    }
    return (double)m/(double)n;
}
 
 
int main ()
{
  printf ("Euler : %2.5f\n", euler(1000000000));
 
  return 0;
}

and compiling with either gcc

gcc  -Ofast euler.c

or clang

clang  -Ofast euler.c

gave a timing twice as long

$ time ./a.out 
Euler : 2.71829
 
real    0m36.213s
user    0m36.238s
sys 0m0.004s

For the curios, I am using this version of Julia

julia> versioninfo()
Julia Version 0.6.3-pre.0
Commit 93168a6 (2017-12-18 07:11 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

Now one should not put too much emphasis on such micro benchmarks. However, I found this a very curious examples when a high level language like Julia could be twice as fast a c. The Julia language authors must be doing some amazing mojo.

08/6/17

Solving the code lock riddle with Julia

I came across a neat math puzzle involving counting the number of unique combinations in a hypothetical lock where digit order does not count. Before you continue, please watch at least the first minute of following video:

The rest of the video describes two related approaches for carrying out the counting. Often when I run into complex counting problems, I like to do a sanity check using brute force computation to make sure I have not missed anything. Julia is fantastic choice for doing such computation. It has C like speed, and with an expressiveness that rivals many other high level languages.

Without further ado, here is the Julia code I used to verify my solution the problem.

  1. function unique_combs(n=4)
  2.     pat_lookup=Dict{String,Bool}()
  3.     for i=0:10^n-1
  4.         d=digits(i,10,n) # The digits on an integer in an array with padding
  5.         ds=d |> sort |> join # putting the digits in a string after sorting
  6.         get(pat_lookup,ds,false) || (pat_lookup[ds]=true)
  7.     end
  8.     println("The number of unique digits is $(length(pat_lookup))")
  9. end

In line 2 we create a dictionary that we will be using to check if the number fits a previously seen pattern. The loop starting in line 3, examines all possible ordered combinations. The digits function in line 4 takes any integer and generate an array of its constituent digits. We generate the unique digit string in line 5 using pipes, by first sorting the integer array of digits and then combining them in a string. In line 6 we check if the pattern of digits was seen before and make use of quick short short-circuit evaluation to avoid an if-then statement.

07/14/17

Julia calling C: A more minimal example

Earlier I presented a minimal example of Julia calling C. It mimics how one would go about writing C code, wrapping it a library and then calling it from Julia. Today I came across and even more minimal way of doing that while reading an excellent blog on Julia’s syntactic loop fusion. Associated with the blog was notebook that explores the matter further.

Basically, you an write you C in a string and pass it directly to the compiler. It goes something like

using Libdl
C_code= raw"""
       double mean(double a, double b) {
         return (a+b) / 2;
       }
       """
const Clib=tempname()
open(`gcc -fPIC -O3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
     print(f, C_code)
end

The tempname function generate a unique temporary file path. On my Linux system Clib will be string like "/tmp/juliaivzRkT". That path is used to generate a library name "/tmp/juliaivzRkT.so" which will then used in the ccall:

meanc(a,b)=ccall((:mean,Clib),Float64,(Float64,Float64),a,b)
julia> meanc(3,4)
3.5

This approach would not be recommended if you are writing anything sophisticated in C. However, it is fun to experiment with for short bits of C code that you might like to call from Julia. Saves you the hassle of creating a Makefile, compiling, etc…

06/29/17

Solving the Fish Riddle with JuMP

Recently I came across a nice Ted-Ed video presenting a Fish Riddle.

I thought it would be fun to try solving it using Julia’s award winning JuMP package. Before we get started, please watch the above video-you might want to pause at 2:24 if you want to solve it yourself.

To attempt this problem in Julia, you will have to install the JuMP package.

julia> Pkg.add("JuMP")

JuMP provides an algebraic modeling language for dealing with mathematical optimization problems. Basically, that allows you to focus on describing your problem in a simple syntax and it would then take care of transforming that description in a form that can be handled by any number of solvers. Those solvers can deal with several types of optimization problems, and some solvers are more generic than others. It is important to pick the right solver for the problem that you are attempting.

The problem premises are:
1. There are 50 creatures in total. That includes sharks outside the tanks and fish
2. Each SECTOR has anywhere from 1 to 7 sharks, with no two sectors having the same number of sharks.
3. Each tank has an equal number of fish
4. In total, there are 13 or fewer tanks
5. SECTOR ALPHA has 2 sharks and 4 tanks
6. SECTOR BETA has 4 sharsk and 2 tanks
We want to find the number of tanks in sector GAMMA!

Here we identify the problem as mixed integer non-linear program (MINLP). We know that because the problem involves an integer number of fish tanks, sharks, and number of fish inside each tank. It also non-linear (quadratic to be exact) because it involves multiplying two two of the problem variables to get the total number or creatures. Looking at the table of solvers in the JuMP manual. pick the Bonmin solver from AmplNLWriter package. This is an open source solver, so installation should be hassle free.

julia> Pkg.add("AmplNLWriter")

We are now ready to write some code.

using JuMP, AmplNLWriter
 
# Solve model
m = Model(solver=BonminNLSolver())
 
# Number of fish in each tank
@variable(m, n>=1, Int)
 
# Number of sharks in each sector
@variable(m, s[i=1:3], Int)
 
# Number of tanks in each sector
@variable(m, nt[i=1:3]>=0, Int)
 
@constraints m begin
    # Constraint 2
    sharks[i=1:3], 1 <= s[i] <= 7
    numfish[i=1:3], 1 <= nt[i]
      # Missing uniqueness in restriction
    # Constraint 4
    sum(nt) <= 13
    # Constraint 5
    s[1] == 2
    nt[1] == 4
    # Constraint 6
    s[2] == 4
    nt[2] == 2
end
 
# Constraints 1 & 3
@NLconstraint(m, s[1]+s[2]+s[3]+n*(nt[1]+nt[2]+nt[3]) == 50)
 
# Solve it
status = solve(m)
 
sharks_in_each_sector=getvalue(s)
fish_in_each_tank=getvalue(n)
tanks_in_each_sector=getvalue(nt)
 
@printf("We have %d fishes in each tank.\n", fish_in_each_tank)
@printf("We have %d tanks in sector Gamma.\n",tanks_in_each_sector[3])
@printf("We have %d sharks in sector Gamma.\n",sharks_in_each_sector[3])

In that representation we could not capture the restriction that “no two sectors having the same number of sharks”. We end up with the following output:

We have 4 fishes in each tank.
We have 4 tanks in sector Gamma.
We have 4 sharks in sector Gamma.

Since the problem domain is limited, we can possible fix that by adding a constrain that force the number of sharks in sector Gamma to be greater than 4.

@constraint(m,s[3]>=5)

This will result in an answer that that does not violate any of the stated constraints.

We have 3 fishes in each tank.
We have 7 tanks in sector Gamma.
We have 5 sharks in sector Gamma.

However, this seems like a bit of kludge. The proper way go about it is represent the number of sharks in the each sector as binary array, with only one value set to 1.

# Number of sharks in each sector
@variable(m, s[i=1:3,j=1:7], Bin)

We will have to modify our constraint block accordingly

@constraints m begin
    # Constraint 2
    sharks[i=1:3], sum(s[i,:]) == 1
    u_sharks[j=1:7], sum(s[:,j]) <=1 # uniquness
    # Constraint 4
    sum(nt) <= 13
    # Constraint 5
    s[1,2] == 1
    nt[1] == 4
    # Constraint 6
    s[2,4] == 1
    nt[2] == 2
end

We invent a new variable array st to capture the number of sharks in each sector. This simply obtained by multiplying the binary array by the vector \([1,2,\ldots,7]^\top\)

@variable(m,st[i=1:3],Int)
@constraint(m, st.==s*collect(1:7))

We rewrite our last constraint as

# Constraints 1 & 3
@NLconstraint(m, st[1]+st[2]+st[3]+n*(nt[1]+nt[2]+nt[3]) == 50)

After the model has been solved, we extract our output for the number of sharks.

sharks_in_each_sector=getvalue(st)

…and we get the correct output.

This problem might have been an overkill for using a full blown mixed integer non-linear optimizer. It can be solved by a simple table as shown in the video. However, we might not alway find ourselves in such a fortunate position. We could have also use mixed integer quadratic programming solver such as Gurobi which would be more efficient for that sort of problem. Given the small problem size, efficiency hardly matters here.

06/12/17

Reading DataFrames with non-UTF8 encoding in Julia

Recently I ran into problem where I was trying to read a CSV files from a Scandinavian friend into a DataFrame. I was getting errors it could not properly parse the latin1 encoded names.

I tried running

using DataFrames
dataT=readtable("example.csv", encoding=:latin1)

but the got this error

ArgumentError: Argument 'encoding' only supports ':utf8' currently.

The solution make use of (StringEncodings.jl)[https://github.com/nalimilan/StringEncodings.jl] to wrap the file data stream before presenting it to the readtable function.

f=open("example.csv","r")
s=StringDecoder(f,"LATIN1", "UTF-8")
dataT=readtable(s)
close(s)
close(f)

The StringDecoder generates an IO stream that appears to be utf8 for the readtable function.

06/12/17

Tupper’s self-referential formula in Julia

I was surprised when I came across on Tupper’s formula on twitter. I felt the compulsion to implement it in Julia.

The formula is expressed as

\({1\over 2} < \left\lfloor \mathrm{mod}\left(\left\lfloor {y \over 17} \right\rfloor 2^{-17 \lfloor x \rfloor – \mathrm{mod}(\lfloor y\rfloor, 17)},2\right)\right\rfloor\)

and yields bitmap facsimile of itself.

In [1]:
k=big"960 939 379 918 958 884 971 672 962 127 852 754 715 004 339 660 129 306 651 505 519 271 702 802 395 266 424 689 642 842 174 350 718 121 267 153 782 770 623 355 993 237 280 874 144 307 891 325 963 941 337 723 487 857 735 749 823 926 629 715 517 173 716 995 165 232 890 538 221 612 403 238 855 866 184 013 235 585 136 048 828 693 337 902 491 454 229 288 667 081 096 184 496 091 705 183 454 067 827 731 551 705 405 381 627 380 967 602 565 625 016 981 482 083 418 783 163 849 115 590 225 610 003 652 351 370 343 874 461 848 378 737 238 198 224 849 863 465 033 159 410 054 974 700 593 138 339 226 497 249 461 751 545 728 366 702 369 745 461 014 655 997 933 798 537 483 143 786 841 806 593 422 227 898 388 722 980 000 748 404 719"
setprecision(BigFloat,10000);

In the above, the big integer is the magic number that lets us generate the image of the formula. I also need to setprecision of BigFloat to be very high, as rounding errors using the default precision does not get us the desired results. The implementation was inspired by the one in Python, but I see Julia a great deal more concise and clearer.

In [2]:
function tupper_field(k)
    field=Array{Bool}(17,106)
    for (ix,x) in enumerate(0.0:1:105.0), (iy,y) in enumerate(k:k+16)
        field[iy,107-ix]=1/2<floor(mod(floor(y/17)*2^(-17*floor(x)-mod(floor(y),17)),2))
    end
   field 
end
In [3]:
f=tupper_field(k);
using Images
img = colorview(Gray,.!f)
Out[3]:

I just inverted the boolean array here to get the desired bitmap output.