Friday, January 23, 2015

Hadoop for Cassandra: CqlInputFormat != CqlPagingInputFormat != ColumnFamilyInputFormat


We haven't had cause to write a Hadoop job against Cassandra since the old days of Thrift (that is, since we introduced Elasticsearch into our system). But this week, we found ourselves needing to get some metrics on data stored in the actual C* tables.

I went to the documentation and found this page:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configHadoop.html

That page references:
"CQL partition input format: ColumnFamilyInputFormat class"

I was familiar with the ColumnFamilyInputFormat class from the old Thrift days, and I was pretty sure that a newer InputFormat was available that used CQL. I headed over to the code, dropped down to the 2.0 branch and found this:
https://github.com/apache/cassandra/blob/cassandra-2.0/examples/hadoop_cql3_word_count/src/WordCount.java

Notice that WordCount.java imports:
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat

I went happily along my way and implemented the MapReduce code using this InputFormat, but the compiler kept complaining that CqlPagingInputFormat could not be found. After some investigation, it looks like that class was removed from cassandra-all sometime between 2.0.3 and 2.0.11; the 2.0.11 jar contains CqlInputFormat instead. See below:

➜  tusk  unzip -l /Users/bone/.m2/repository/org/apache/cassandra/cassandra-all/2.0.11/cassandra-all-2.0.11.jar | grep Cql | grep Input
     2882  10-21-14 16:31   org/apache/cassandra/hadoop/cql3/CqlInputFormat.class
➜  tusk  unzip -l /Users/bone/.m2/repository/org/apache/cassandra/cassandra-all/2.0.3/cassandra-all-2.0.3.jar | grep Cql | grep Input
     1359  11-22-13 08:56   org/apache/cassandra/hadoop/cql3/CqlPagingInputFormat$1.class
     2875  11-22-13 08:56   org/apache/cassandra/hadoop/cql3/CqlPagingInputFormat.class
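
If you would rather script that check than pipe unzip through grep, here is a minimal sketch in Python. The function name find_classes is made up for illustration; it relies only on the standard zipfile module, since a jar is just a zip archive.

```python
import zipfile


def find_classes(jar_path, *needles):
    """Return the .class entries in a jar whose path contains every needle.

    Roughly equivalent to: unzip -l <jar> | grep A | grep B
    (hypothetical helper, not part of any Cassandra or Hadoop tooling).
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.endswith(".class") and all(n in name for n in needles)
        )
```

Calling find_classes(path_to_jar, "Cql", "Input") against each version of the cassandra-all jar in your local Maven repository should list whichever CQL input format classes that version actually ships.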

It looks like the crew is already addressing it: https://github.com/apache/cassandra/commit/e550ea60212e933f3849a11717ba4ef916fc4aa3

Hopefully no one else runs into this. ;)

Tuesday, October 21, 2014

Brew trouble w/ Yosemite (gcc47 and subversion)


Mileage may vary on this one, but after upgrading to Yosemite, I encountered two issues with brew.

The first was with gcc47. Running "brew upgrade" got stuck on the gcc47 formula, which errored with:

==> Upgrading gcc47
gcc47: OS X Mavericks or older is required for stable.
Use `brew install devel or --HEAD` for newer.
Error: An unsatisfied requirement failed this build.

I decided it was better to move to gcc48 than to bother trying to get gcc47 working. To do that, just install gcc48 and remove gcc47 with:


➜  ~  brew install gcc48
➜  ~  brew uninstall gcc47
That fixed that. Then I ran into an issue with subversion, which relied on serf. Serf failed to compile with:
#include <apr_pools.h>
         ^
1 error generated.
scons: *** [context.o] Error 1
scons: building terminated because of errors.
There is an open issue with brew for that one. (https://github.com/Homebrew/homebrew/issues/33422)

But you can get around the issue by installing the command line tools for Xcode with:

➜ xcode-select --install

If you run into the same issues, hopefully that fixes them for you.