There are a couple of very good sites (kevinweil, hadoop-gpl-compression) about how to set up LZO compression on Hadoop 0.20, so I won't go into detail here. I just wanted to mention a couple of problems I ran into because I'm using Snow Leopard (Mac OS X 10.6).
First install the LZO libraries for Mac OS X from the tarball:
tar -xzf lzo-2.04.tar.gz
env CFLAGS="-arch x86_64" ./configure --build=x86_64-darwin --enable-shared --disable-asm --prefix=/path_to_lzo-2.04/
make; make install
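Before moving on, it can help to confirm that the install really put the header and the shared library under the prefix. The following is only a minimal sketch: `check_lzo_prefix` is a hypothetical helper name, and it assumes the usual LZO 2.x layout (`include/lzo/lzo1x.h` and `lib/liblzo2.*`).

```shell
# Hypothetical helper (not part of LZO or Hadoop): checks that an LZO
# install prefix contains the header and a liblzo2 library file.
check_lzo_prefix() {
    prefix="$1"
    if [ ! -f "$prefix/include/lzo/lzo1x.h" ]; then
        echo "missing header: $prefix/include/lzo/lzo1x.h"
        return 1
    fi
    found=no
    # The glob stays literal when nothing matches, so test each candidate.
    for f in "$prefix"/lib/liblzo2.*; do
        if [ -e "$f" ]; then
            found=yes
        fi
    done
    if [ "$found" = no ]; then
        echo "missing library: $prefix/lib/liblzo2.*"
        return 1
    fi
    echo "ok: LZO headers and library found under $prefix"
}

# Usage (with the prefix passed to ./configure):
#   check_lzo_prefix /path_to_lzo-2.04
```

If either message starts with "missing", the C_INCLUDE_PATH and LIBRARY_PATH values used in the next step will not point at anything useful.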
The problem comes now. You have to compile the LZO compression for Hadoop:
env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/ C_INCLUDE_PATH=/path_to_lzo-2.04/include LIBRARY_PATH=/path_to_lzo-2.04/lib CFLAGS="-arch x86_64" ant clean compile-native test tar
On Snow Leopard this build fails, because the soft link that should point to the Java headers is broken and the headers themselves are not installed. To get them you have to install the Mac OS X developer tools, which include them. Then you just have to remove the broken soft link and create the correct one:
rm /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/include
ln -s /Developer/SDKs/MacOSX10.6.sdk/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Headers /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/include
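A quick way to see whether the link now resolves to real headers is to look for jni.h through it. This is just a sketch: `check_jni_headers` is a made-up helper name, and the path is the one used in this post, so adjust it if your Java home differs.

```shell
# Hypothetical helper: succeeds if the given directory (or a symlink to it)
# exposes jni.h, the header the native build needs.
check_jni_headers() {
    include_dir="$1"
    if [ -f "$include_dir/jni.h" ]; then
        echo "ok: jni.h found in $include_dir"
    else
        echo "missing: no jni.h in $include_dir"
        return 1
    fi
}

# Only run the check when the path from the post exists on this machine
# (-L also catches a link that exists but is still broken).
JAVA_INCLUDE=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/include
if [ -e "$JAVA_INCLUDE" ] || [ -L "$JAVA_INCLUDE" ]; then
    check_jni_headers "$JAVA_INCLUDE"
fi
```

If it reports "missing", the link is still pointing at a directory that does not exist and the ant build will not find the headers.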
Once compiled, you'll have the jar file and the compiled LZO libraries (which you'll have to add to JAVA_LIBRARY_PATH in hadoop-env.sh) in ~/kevinweil-hadoop-lzo-0e70051/build/native/Mac_OS_X-x86_64-64
The Hadoop native libraries are needed too. The precompiled ones shipped with Hadoop don't work on Mac OS X, so you will have to compile them as well. If the headers are set up properly, this should go through without problems:
env JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home CFLAGS="-arch x86_64" LDFLAGS=-L/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Libraries ANT_OPTS=-d64 ant clean compile-native
The compiled native libraries (you'll have to add them to JAVA_LIBRARY_PATH too) will be created in ~/hadoop-0.20.2+737/build/native/Mac_OS_X-x86_64-64/lib
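Putting the two directories together, the lines added to hadoop-env.sh could look roughly like this. It is only a sketch, assuming both source trees were unpacked in your home directory; the variable names `LZO_NATIVE` and `HADOOP_NATIVE` are mine, just for readability.

```shell
# Directories taken from the post; assumed to live under $HOME.
# Adjust them to wherever you unpacked and built the sources.
LZO_NATIVE="$HOME/kevinweil-hadoop-lzo-0e70051/build/native/Mac_OS_X-x86_64-64"
HADOOP_NATIVE="$HOME/hadoop-0.20.2+737/build/native/Mac_OS_X-x86_64-64/lib"

# Append both directories, keeping any value that was already set.
export JAVA_LIBRARY_PATH="${JAVA_LIBRARY_PATH:+$JAVA_LIBRARY_PATH:}$LZO_NATIVE:$HADOOP_NATIVE"
```

The `${VAR:+...}` form avoids a stray leading colon when JAVA_LIBRARY_PATH was empty before.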
Now you just have to set the library paths properly and configure Hadoop to use LZO compression as explained on the sites mentioned above. Then you're done!


