Thursday, June 21, 2012

Indexing JSON in Cassandra


Cassandra has native indexing capabilities, but they only work if the values stored in your columns are the values you want indexed.  If the column values are structured documents (e.g. JSON or Protobuf), Cassandra's out-of-the-box indexing cannot index the individual fields inside them.

A while back we had a discussion on the dev list regarding indexing attributes stored in columns in Cassandra.  I think Jeremiah Jordan had the best description of the problem.  Lately, we ran into the same problem ourselves, so we built support for it into our cassandra-indexing extension.

Our cassandra-indexing module is an AOP-based extension to Cassandra.  Drop the jar file in the Cassandra lib directory and update the start script to include the extension, and the aspect will take care of indexing information as you mutate the data in Cassandra.

To support indexing a specific field within the column value, we added the ability for a user to specify a field to index (not just a column name).  We then parse the JSON document as it is written in the mutation, extract the values and create entries in the wide-row index for that column family.
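
To make the idea concrete, here is a minimal sketch in Python.  It is not the extension's actual code (the extension is an aspect running inside Cassandra); it only illustrates parsing a JSON column value and deriving wide-row index entries from it.  The function name, the "column_family:field" index row key and the (field value, row key) column name are all assumptions made for illustration.

```python
import json


def index_entries(column_family, row_key, column_value, indexed_fields):
    """Return (index_row_key, index_column_name) pairs for one written column.

    Hypothetical layout: one wide index row per (column family, field),
    with one column per indexed value, named (field value, source row key).
    """
    try:
        document = json.loads(column_value)
    except ValueError:
        return []  # value is not JSON, so there is nothing to index
    if not isinstance(document, dict):
        return []  # only top-level JSON objects have named fields

    entries = []
    for field in indexed_fields:
        if field in document:
            index_row = f"{column_family}:{field}"
            entries.append((index_row, (str(document[field]), row_key)))
    return entries


# A write to row "user-42" whose column value is a JSON document.
entries = index_entries(
    "users", "user-42", '{"name": "Ann", "city": "Boston"}', ["city", "zip"]
)
```

In this sketch only "city" produces an index entry, because "zip" is configured for indexing but absent from the document.  Putting the field value first in the index column name keeps equal values adjacent within the wide row, so a lookup by value becomes a column slice.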

Specifics for the new JSON-based indexing configuration can be found on the wiki.  Admittedly, we are still light on documentation.  If you have any trouble using the new capability, let us know.

Tuesday, June 19, 2012

Cassandra as an integral part of a Big Data platform


There is more to a successful Big Data platform than simply deploying a NoSQL database.  That database needs to integrate with the rest of your architecture, and nowadays that often requires a polyglot approach to persistence. 

As part of the release of the new Apache Cassandra RefCard, we did an interview on Cassandra's role in a complete Big Data platform.  The interview covers the challenges of and solutions for integrating NoSQL into a Big Data enterprise solution.



Monday, June 18, 2012

Apache Cassandra Cheat Sheet / RefCard


After months in the making, we just published the Apache Cassandra RefCard!
http://refcardz.dzone.com/refcardz/apache-cassandra

Thanks to all who helped out.

Specific shout outs to:
David Strauss @ Pantheon
Kris Hahn @ DataStax
and Jonathan Ellis @ DataStax

Thanks, guys.