07/14/17

Julia calling C: A more minimal example

Earlier I presented a minimal example of Julia calling C. It mimics how one would go about writing C code, wrapping it a library and then calling it from Julia. Today I came across and even more minimal way of doing that while reading an excellent blog on Julia’s syntactic loop fusion. Associated with the blog was notebook that explores the matter further.

Basically, you an write you C in a string and pass it directly to the compiler. It goes something like

C_code= """
double mean(double a, double b) {
return (a+b) / 2;
}
"""
const Clib=tempname()
open(gcc -fPIC -O3 -xc -shared -o $(Clib * "." * Libdl.dlext) -, "w") do f print(f, C_code) end The tempname function generate a unique temporary file path. On my Linux system Clib will be string like "/tmp/juliaivzRkT". That path is used to generate a library name "/tmp/juliaivzRkT.so" which will then used in the ccall: julia> x=ccall((:mean,Clib),Float64,(Float64,Float64),2.0,5.0) 3.5 This approach would be be recommended if are writing anything sophisticated in C. However, it fun to experiment with for short bits of C code that you might like to call from Julia. Saves you the hassle of creating a Makefile, compiling, etc… 06/29/17 Solving the Fish Riddle with JuMP Recently I came across a nice Ted-Ed video presenting a Fish Riddle. I thought it would be fun to try solving it using Julia’s award winning JuMP package. Before we get started, please watch the above video-you might want to pause at 2:24 if you want to solve it yourself. To attempt this problem in Julia, you will have to install the JuMP package. julia> Pkg.add("JuMP") JuMP provides an algebraic modeling language for dealing with mathematical optimization problems. Basically, that allows you to focus on describing your problem in a simple syntax and it would then take care of transforming that description in a form that can be handled by any number of solvers. Those solvers can deal with several types of optimization problems, and some solvers are more generic than others. It is important to pick the right solver for the problem that you are attempting. The problem premises are: 1. There are 50 creatures in total. That includes sharks outside the tanks and fish 2. Each SECTOR has anywhere from 1 to 7 sharks, with no two sectors having the same number of sharks. 3. Each tank has an equal number of fish 4. In total, there are 13 or fewer tanks 5. SECTOR ALPHA has 2 sharks and 4 tanks 6. SECTOR BETA has 4 sharsk and 2 tanks We want to find the number of tanks in sector GAMMA! Here we identify the problem as mixed integer non-linear program (MINLP). We know that because the problem involves an integer number of fish tanks, sharks, and number of fish inside each tank. It also non-linear (quadratic to be exact) because it involves multiplying two two of the problem variables to get the total number or creatures. Looking at the table of solvers in the JuMP manual. pick the Bonmin solver from AmplNLWriter package. This is an open source solver, so installation should be hassle free. julia> Pkg.add("AmplNLWriter") We are now ready to write some code. using JuMP, AmplNLWriter # Solve model m = Model(solver=BonminNLSolver()) # Number of fish in each tank @variable(m, n>=1, Int) # Number of sharks in each sector @variable(m, s[i=1:3], Int) # Number of tanks in each sector @variable(m, nt[i=1:3]>=0, Int) @constraints m begin # Constraint 2 sharks[i=1:3], 1 <= s[i] <= 7 numfish[i=1:3], 1 <= nt[i] # Missing uniqueness in restriction # Constraint 4 sum(nt) <= 13 # Constraint 5 s[1] == 2 nt[1] == 4 # Constraint 6 s[2] == 4 nt[2] == 2 end # Constraints 1 & 3 @NLconstraint(m, s[1]+s[2]+s[3]+n*(nt[1]+nt[2]+nt[3]) == 50) # Solve it status = solve(m) sharks_in_each_sector=getvalue(s) fish_in_each_tank=getvalue(n) tanks_in_each_sector=getvalue(nt) @printf("We have %d fishes in each tank.\n", fish_in_each_tank) @printf("We have %d tanks in sector Gamma.\n",tanks_in_each_sector[3]) @printf("We have %d sharks in sector Gamma.\n",sharks_in_each_sector[3]) In that representation we could not capture the restriction that “no two sectors having the same number of sharks”. We end up with the following output: We have 4 fishes in each tank. We have 4 tanks in sector Gamma. We have 4 sharks in sector Gamma. Since the problem domain is limited, we can possible fix that by adding a constrain that force the number of sharks in sector Gamma to be greater than 4. @constraint(m,s[3]>=5) This will result in an answer that that does not violate any of the stated constraints. We have 3 fishes in each tank. We have 7 tanks in sector Gamma. We have 5 sharks in sector Gamma. However, this seems like a bit of kludge. The proper way go about it is represent the number of sharks in the each sector as binary array, with only one value set to 1. # Number of sharks in each sector @variable(m, s[i=1:3,j=1:7], Bin) We will have to modify our constraint block accordingly @constraints m begin # Constraint 2 sharks[i=1:3], sum(s[i,:]) == 1 u_sharks[j=1:7], sum(s[:,j]) <=1 # uniquness # Constraint 4 sum(nt) <= 13 # Constraint 5 s[1,2] == 1 nt[1] == 4 # Constraint 6 s[2,4] == 1 nt[2] == 2 end We invent a new variable array st to capture the number of sharks in each sector. This simply obtained by multiplying the binary array by the vector $[1,2,\ldots,7]^\top$ @variable(m,st[i=1:3],Int) @constraint(m, st.==s*collect(1:7)) We rewrite our last constraint as # Constraints 1 & 3 @NLconstraint(m, st[1]+st[2]+st[3]+n*(nt[1]+nt[2]+nt[3]) == 50) After the model has been solved, we extract our output for the number of sharks. sharks_in_each_sector=getvalue(st) …and we get the correct output. This problem might have been an overkill for using a full blown mixed integer non-linear optimizer. It can be solved by a simple table as shown in the video. However, we might not alway find ourselves in such a fortunate position. We could have also use mixed integer quadratic programming solver such as Gurobi which would be more efficient for that sort of problem. Given the small problem size, efficiency hardly matters here. 06/12/17 Reading DataFrames with non-UTF8 encoding in Julia Recently I ran into problem where I was trying to read a CSV files from a Scandinavian friend into a DataFrame. I was getting errors it could not properly parse the latin1 encoded names. I tried running using DataFrames dataT=readtable("example.csv", encoding=:latin1) but the got this error ArgumentError: Argument 'encoding' only supports ':utf8' currently. The solution make use of (StringEncodings.jl)[https://github.com/nalimilan/StringEncodings.jl] to wrap the file data stream before presenting it to the readtable function. f=open("example.csv","r") s=StringDecoder(f,"LATIN1", "UTF-8") dataT=readtable(s) close(s) close(f) The StringDecoder generates an IO stream that appears to be utf8 for the readtable function. 06/12/17 Tupper’s self-referential formula in Julia I was surprised when I came across on Tupper’s formula on twitter. I felt the compulsion to implement it in Julia. The formula is expressed as ${1\over 2} < \left\lfloor \mathrm{mod}\left(\left\lfloor {y \over 17} \right\rfloor 2^{-17 \lfloor x \rfloor - \mathrm{mod}(\lfloor y\rfloor, 17)},2\right)\right\rfloor$ and yields bitmap facsimile of itself. In [1]: k=big"960 939 379 918 958 884 971 672 962 127 852 754 715 004 339 660 129 306 651 505 519 271 702 802 395 266 424 689 642 842 174 350 718 121 267 153 782 770 623 355 993 237 280 874 144 307 891 325 963 941 337 723 487 857 735 749 823 926 629 715 517 173 716 995 165 232 890 538 221 612 403 238 855 866 184 013 235 585 136 048 828 693 337 902 491 454 229 288 667 081 096 184 496 091 705 183 454 067 827 731 551 705 405 381 627 380 967 602 565 625 016 981 482 083 418 783 163 849 115 590 225 610 003 652 351 370 343 874 461 848 378 737 238 198 224 849 863 465 033 159 410 054 974 700 593 138 339 226 497 249 461 751 545 728 366 702 369 745 461 014 655 997 933 798 537 483 143 786 841 806 593 422 227 898 388 722 980 000 748 404 719" setprecision(BigFloat,10000);  In the above, the big integer is the magic number that lets us generate the image of the formula. I also need to setprecision of BigFloat to be very high, as rounding errors using the default precision does not get us the desired results. The implementation was inspired by the one in Python, but I see Julia a great deal more concise and clearer. In [2]: function tupper_field(k) field=Array{Bool}(17,106) for (ix,x) in enumerate(0.0:1:105.0), (iy,y) in enumerate(k:k+16) field[iy,107-ix]=1/2<floor(mod(floor(y/17)*2^(-17*floor(x)-mod(floor(y),17)),2)) end field end  In [3]: f=tupper_field(k); using Images img = colorview(Gray,.!f)  Out[3]: I just inverted the boolean array here to get the desired bitmap output. 05/17/17 Exploring Fibonacci Fractions with Julia Recently, I came across a fascinating blog and video from Mind your Decisions. It is about how a fraction $\frac{1}{999,999,999,999,999,999,999,998,,999,999,999,999,999,999,999,999}$ would show the Fibonacci numbers in order when looking at its decimal output. On a spreadsheet and most standard programming languages, such output can not be attained due to the limited precision for floating point numbers. If you try this on R or Python, you will get an output of 1e-48. Wolfram alpha,however,allows arbitrary precision. In Julia by default we get a little better that R and Python julia> 1/999999999999999999999998999999999999999999999999 1.000000000000000000000001000000000000000000000002000000000000000000000003000002e-48 julia> typeof(ans) BigFloat We observe here that we are getting the first few Fibonacci numbers $1, 1, 2, 3$. We need more precision to get more numbers. Julia has arbitrary precision arithmetic baked into the language. We can crank up the precision of the BigFloat type on demand. Of course, the higher the precision, the slower the computation and the greater the memory we use. We do that by setprecision. julia> setprecision(BigFloat,10000) 10000 Reevaluating, we get julia> 1/999999999999999999999998999999999999999999999999 1.00000000000000000000000100000000000000000000000200000000000000000000000300000000000000000000000500000000000000000000000800000000000000000000001300000000000000000000002100000000000000000000003400000000000000000000005500000000000000000000008900000000000000000000014400000000000000000000023300000000000000000000037700000000000000000000061000000000000000000000098700000000000000000000159700000000000000000000258400000000000000000000418100000000000000000000676500000000000000000001094600000000000000000001771100000000000000000002865700000000000000000004636800000000000000000007502500000000000000000012139300000000000000000019641800000000000000000031781100000000000000000051422900000000000000000083204000000000000000000134626900000000000000000217830900000000000000000352457800000000000000000570288700000000000000000922746500000000000000001493035200000000000000002415781700000000000000003908816900000000000000006324598600000000000000010233415500000000000000016558014100000000000000026791429600000000000000043349443700000000000000070140873300000000000000113490317000000000000000183631190300000000000000297121507300000000000000480752697600000000000000777874204900000000000001258626902500000000000002036501107400000000000003295128009900000000000005331629117300000000000008626757127200000000000013958386244500000000000022585143371700000000000036543529616200000000000059128672987900000000000095672202604100000000000154800875592000000000000250473078196100000000000405273953788100000000000655747031984200000000001061020985772300000000001716768017756500000000002777789003528800000000004494557021285300000000007272346024814100000000011766903046099400000000019039249070913500000000030806152117012900000000049845401187926400000000080651553304939300000000130496954492865700000000211148507797805000000000341645462290670700000000552793970088475700000000894439432379146400000001447233402467622100000002341672834846768500000003788906237314390600000006130579072161159100000009919485309475549700000016050064381636708800000025969549691112258500000042019614072748967300000067989163763861225800000110008777836610193100000177997941600471418900000288006719437081612000000466004661037553030900000754011380474634642900001220016041512187673800001974027421986822316700003194043463499009990500005168070885485832307200008362114348984842297700013530185234470674604900021892299583455516902600035422484817926191507500057314784401381708410100092737269219307899917600150052053620689608327700242789322839997508245300392841376460687116573000635630699300684624818301028472075761371741391301664102775062056366209602692574850823428107600904356677625885484473810507049252476708912581411411405930102594397055221918455182579303309636633329861112681897706691855248316295261201016328488578177407943098723020343826493703204299739348832404671111147398462369176231164814351698201718008635835925499096664087184867000739850794865805193502836665349891529892378369837405200686395697571872674070550577925589950242511475751264321287522115185546301842246877472357697022053e-48 That is looking much better. However it we be nice if we could extract the Fibonacci numbers that are buried in that long decimal. Using the approach in the original blog. We define a function y(x)=one(x)-x-x^2 and calculate the decimal a=1/y(big"1e-24") Here we use the non-standard string literal big"..." to insure proper interpretation of our input. Using BigFloat(1e-24)) would first construct at floating point with limited precision and then do the conversion. The initial loss of precision will not be recovered in the conversion, and hence the use of big. Now we extract our Fibonacci numbers by this function function extract_fib(a) x=string(a) l=2 fi=BigInt[] push!(fi,1) for i=1:div(length(x)-24,24) j=parse(BigInt,x[l+1+(i-1)*24:l+i*24]) push!(fi,j) end fi end Here we first convert our very long decimal number of a string and they we exploit the fact the Fibonacci numbers occur in blocks that 24 digits in length. We get out output in an array of BigInt. We want to compare the output with exact Fibonacci numbers, we just do a quick and non-recursive implementation. function fib(n) f=Vector{typeof(n)}(n+1) f[1]=f[2]=1; for i=3:n+1 f[i]=f[i-1]+f[i-2] end f end Now we compare… fib_exact=fib(200); fib_frac=extract_fib(a); for i in eachindex(fib_frac) println(fib_exact[i], " ", fib_exact[i]-fib_frac[i]) end We get a long sequence, we just focused here on when the discrepancy happens. ... 184551825793033096366333 0 298611126818977066918552 0 483162952612010163284885 0 781774079430987230203437 -1 1264937032042997393488322 999999999999999999999998 2046711111473984623691759 1999999999999999999999997 ... The output shows that just before the extracted Fibonacci number exceeds 24 digits, a discrepancy occurs. I am not quite sure why, but this was a fun exploration. Julia allows me to do mathematical explorations that would take one or even two orders of magnitude of effort to do in any other language. 03/27/17 Using pipes while running external programs in Julia Recently I was using Julia to run ffprobe to get the length of a video file. The trouble was the ffprobe was dumping its output to stderr and I wanted to take that output and run it through grep. From a bash shell one would typically run: ffprobe somefile.mkv 2>&1 |grep Duration This would result in an output like  Duration: 00:04:44.94, start: 0.000000, bitrate: 128 kb/s This works because we used 2>&1 to redirect stderr to stdout which would in be piped to grep. If you were try to run this in Julia julia> run(ffprobe somefile.mkv 2>&1 |grep Duration) you will get errors. Julia does not like pipes | inside the backticks command (for very sensible reasons). Instead you should be using Julia’s pipeline command. Also the redirection 2>&1 will not work. So instead, the best thing to use is and instance of Pipe. This was not in the manual. I stumbled upon it in an issue discussion on GitHub. So a good why to do what I am after is to run. julia> p=Pipe() Pipe(uninit => uninit, 0 bytes waiting) julia> run(pipeline(ffprobe -i somefile.mkv,stderr=p)) This would create a pipe object p that is then used to capture stderr after the execution of the command. Next we need to close the input end of the pipe. julia> close(p.in) Finally we can use the pipe with grep to filter the output. julia> readstring(pipeline(p,grep Duration)) " Duration: 00:04:44.94, start: 0.000000, bitrate: 128 kb/s\n" We can then do a little regex magic to get the duration we are after. julia> matchall(r"(\d{2}:\d{2}:\d{2}.\d{2})",ans)[1] "00:04:44.94" 01/13/17 Kaperkar’s Constant I was recently introduced to Kaperkar’s Constant. It is quite magical. You take any four digit number $A$, sort the digits from highest to lowest to create a new number $A^{\text{high}}$, sort the digits from lowest to highest to get $A^{\text{low}}$, and calculate and new number $A= A^{\text{high}}- A^{\text{low}}$. You repeat this procedure enough times and you end up with $A=6174$. I made a nifty implementation of that in Julia below. 09/25/16 Julia calling C: A minimal example This blog is a “Hello World” example of Julia calling C. We start of by at bit of C code we want to call from Julia. We write the following in calc_mean.c double mean(double a, double b) { return (a+b) / 2; } To build the library, we need to create a Makefile CC=gcc CFLAGS=-c -Wall -fPIC SOURCES=calc_mean.c OBJECTS=$(SOURCES:.c=.o)

.c.o:
$(CC)$(CFLAGS) $< -o$@

lib: $(OBJECTS)$(CC) -shared -fPIC -o libmean.so \$(OBJECTS)

clean:
rm *.o *.so

The option fPIC and -shared are essential for Julia to be able to resolve the function in our library. Now we are almost ready to build our library. From the bash terminal we invoke:

make lib

This will generate a libmean.so file.

In Julia we call the function in our c library by

x=ccall((:mean,"libmean"),Float64,(Float64,Float64),2.0,5.0)
println(x)
3.5

For this to work,

• Julia must be running either on the same path where libmean.so resides,
• the path to libmean.so is in LD_LIBRARY_PATH, or
• the path to the library is pushed to Libdl.DL_LOAD_PATH via

push!(Libdl.DL_LOAD_PATH,"path_to_libmean.so")

P.S. Thanks to Christopher Rackauckas for tips on Julia highlighting.