Reading DataFrames with non-UTF8 encoding in Julia

Recently I ran into a problem while trying to read CSV files from a Scandinavian friend into a DataFrame: the reader could not properly parse the Latin-1 encoded names.

I tried running

using DataFrames
dataT=readtable("example.csv", encoding=:latin1)

but got this error:

ArgumentError: Argument 'encoding' only supports ':utf8' currently.

The solution is to use StringEncodings.jl to wrap the file's data stream before passing it to the readtable function:

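using DataFrames, StringEncodings
dataT = readtable(StringDecoder(open("example.csv"), "LATIN1", "UTF-8"))  # decode the Latin-1 file stream to UTF-8 as it is read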

StringDecoder returns an IO stream that converts the Latin-1 bytes as they are read, so the readtable function sees ordinary UTF-8 text.
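To make concrete what the decoder does for this particular encoding: Latin-1 (ISO-8859-1) maps every byte 0x00 to 0xFF directly to the Unicode code point with the same value, so converting it to UTF-8 only means re-encoding each byte as a character. The sketch below does that in plain Julia with no packages. It assumes the input really is Latin-1, the helper name latin1_to_utf8 is hypothetical, and it is not how StringEncodings.jl works internally. It would also be wrong for Windows-1252, which puts printable characters in the 0x80 to 0x9F range.

```julia
# Hypothetical helper, for illustration only: converts Latin-1 (ISO-8859-1)
# bytes to a UTF-8 String. Each Latin-1 byte b is exactly code point U+00bb,
# so Char(b) is the right character and String re-encodes it as UTF-8.
latin1_to_utf8(bytes::AbstractVector{UInt8}) = String(map(Char, bytes))

# Simulate the raw contents of a Latin-1 file: UInt8(c) gives the single
# Latin-1 byte for any character c at or below U+00FF (e.g. 'ø' -> 0xf8).
raw = UInt8.(collect("name,city\nBjørn,Tromsø\n"))

println(isvalid(String, raw))       # prints false: 0xf8 is not valid UTF-8
io = IOBuffer(latin1_to_utf8(raw))  # an in-memory UTF-8 IO stream
println(readline(io))               # prints name,city
println(readline(io))               # prints Bjørn,Tromsø
```

The IOBuffer could be handed to any reader that accepts an IO, much like the StringDecoder stream above; the difference is that this sketch reads the whole file into memory first and handles only Latin-1, whereas StringDecoder converts incrementally and supports many encodings.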
