Home | Looking for something? Sign In | New here? Sign Up | Log out

Dec 7, 2010

SQL for the NoSQL

But there is one slight issue with NoSQL. You can’t use SQL.

This is, unfortunately for the time being, a non-starter in certain business domains. For instance, I recently designed a data mart for a large financial services company. A non-relational, schema-less datastore made a lot of sense here — the client wanted extreme flexibly (that is, as the data mart evolved, they wanted to add new sources of data — i.e. more columns) and they wanted speed (both speed of development and speed of reads). The resultant data model turned out to be highly de-normalized and was, in many respects, an obvious candidate for a non-relational, aka NoSQL, datastore. There was one requirement, however, that made the decision to use a NoSQL datastore quite difficult to make: ad hoc queries. Ad hoc queries performed by non-technical people otherwise known as financial analysts.

SQL is the lingua franca of the business analyst. SQL is easy. SQL has been around for an eternity. SQL is taught in many courses at both the undergraduate and graduate level. There are myriad books on SQL. SQL is what a lot of people use to mine data at institutions the world over (because Oracle, IBM’s DB2, and Microsoft SQLServer (to name a few) have been installed and used for years at these places — entire infrastructures have been built around them).

As a result, moving to a NoSQL datastore effectively means that the ad hoc ease at which data was previously queried goes away. True, one can build bridges to translate native protocols (i.e. MongoDB‘s JSON) to SQL, but, that involves a cost; what’s more, you can train business analysts to use a native protocol (warning: an MBA doesn’t imply ease at learning MapReduce). Alas, in some cases, these costs outweigh an immediate benefit: an existing option in the form of relational databases.

This frustration of essential data lock-in with existing NoSQL datastores in certain business scenarios lead me to explore building a JDBC bridge, specifically for Amazon’s SimpleDB, dubbed SimpleJDBC.

SimpleJDBC is a JDBC driver for SimpleDB that supports basic SQL inserts, selects, updates, and deletes with the standard JDBC interfaces you’ve worked with for years. What’s more, PreparedStatments are also supported (although, there isn’t any performance boost as of yet). All that’s required to use the driver is an Amazon Web Services account and Java 5.

At this point, the driver isn’t 100% ANSI SQL compliant — there are features inherent in SimpleDB that don’t translate directly into SQL and vice versa. For example, joins aren’t supported (yet) due to SimpleDB’s lack of joins; that is, you can only query one domain at a time. Other common SQL clauses like order by, having, union for instance, aren’t supported either (but count(*) is supported).

Nevertheless, a simple insert in SimpleDB can be achieved as follows:

Class.forName("com.beacon50.jdbc.aws.SimpleDBDriver");
Connection conn = DriverManager
.getConnection("jdbc:simpledb://sdb.amazonaws.com", prop);
Statement st = conn.createStatement();
String insert = "INSERT INTO users (name, age) VALUES ('Ann Smith', 33)";
int val = st.executeUpdate(insert);

In the above code, prop is a Properties object, which must hold AWS credentials —

Properties prop = new Properties();
prop.setProperty("secretKey", ...);
prop.setProperty("accessKey", ...);

PreparedStatements also work:

PreparedStatement pstmt = conn
.prepareStatement("INSERT INTO users (name, age) VALUES (?, ?)");
pstmt.setString(1, "Anny Smith");
pstmt.setInt(2, 33);
int val = pstmt.executeUpdate();

In both inserts from above, if an ID/id (i.e. “INSERT INTO users (id, name, age) VALUES (?, ?, ?)“) property isn’t provided in the insert statement, SimpleJDBC will automatically create it; thus, the record in SimpleDB for each insert will have a UUID generated for the underlying Item (in SimpleDB speak: the item’s name).

SQL selects work much the same — like with any other JDBC driver, a ResultSet is returned and you can obtain various types via their respective get methods:

String qry = "select * from users where name = 'Joe Smith'";
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery(qry);

while (rs.next()) {
String age = rs.getString("age");
int iage = rs.getInt("age");
//....
}

Under the covers, SimpleJDBC encodes numbers (due to SimpleDB’s simple datatype: strings) and decodes numbers upon selects.

The intent of SimpleJDBC is to make it possible to use the myriad tools available today that rely on JDBC for functionality. For instance, with SimpleJDBC, ETL processes that either move data into or out of SimpleDB can be realized (for example with Scriptella). What’s more, now data that is in SimpleDB can be easily retrieved. Once again, ad hoc queries are possible. The analyst can still mine data as needed (to the extent that their mining efforts conform to what the driver supports!).

Label