Choosing an SQL Engine for Analytics

I’ve been struggling for a while on which database to use for my working data. I used to use MS Access quite a lot. The problems with MS Access include but are not limited to:

  • 2 GB file size limit, at least historically
  • Versions change with each edition of MS Office
  • Sort of tough to write SQL scripts
  • Very little automation, ie compression, backup, etc.
  • Windows only

I used Oracle for a few years as a result of my previous employer being an Oracle shop. I then switched to SQL Server when I changed jobs. A full blown client/server DB really does not make a lot of sense for much of what I do. I don’t run a transactional data store. I don’t need to have dozens of users hooked to the DB. And I do sometimes need access to my data when I am not hooked to the mother-ship. So I could run the free version of SQL Server on my laptop or run MySQL on my laptop, but both of these options rub me the wrong way. Why? I do a lot of data analysis in R which is RAM intensive. Running a DB server on my laptop means that some fraction of my RAM is going to be taken up by the db server software which is hanging out waiting for me to throw requests at it. I could manually hack around this by starting the server before I load data and then killing it after the data is loaded. That’s just too big of a pain in my rectum. Oh yeah, one more design requirement: I want to be able to push the whole DB out to a storage blob at Amazon and pound on it using EC2 machines, running Linux. Plus I am cheap and don’t want to pay a lot.

I’ll probably end up with a model where I keep some master data sets on a client/server DB and then I will replicate chunks of that to my laptop into my serverless db. I’ll probably also put output from my desktop db back into the server after analytic work is  done.

I knew about SQLite because of an interview with its author, Richard Hipp on FLOSS Weekly. There’s also a video of Hipp talking at the Googleplex. I wish that guy was my neighbor. He seems like the type of guy who would shovel your walk for you then apologize for not doing it perfectly by sending over homemade cookies. Unrelated to the cookies, I really like that SQLite is weakly typed.  I’m a free spirit like that.

I did some digging for SQLite alternatives and came up with some stuff at StackOverflow. You can read the post but it reminded me of Firebird. I’m immediately drawn to FireBird since their logo looks so dang much like the Ruger logo:

downloads

fb-facts

But is Firebird able to be run severless?  If I have to install a server then I would just as well run MySQL.

Berkeley DB seems like another option worth investigating, although I am not sure if I can use it without really embedding it in another program the way that I can with SQLite.

SQLite gets bonus points for having native R drivers meaning that I don’t have to go through a connector technology like ODBC. This is important enough that I should probably make that a requirement. I think Berekley DB has support in R as well. I know for a fact that writing back to SQL Server through the R ODBC package (RODBC) is like pushing a car with a rope, but only slower. Plus I don’t know how to make ODBC work on Linux. Not rocket science, I am sure, but still one more thing I would have to learn before I do that which I am paid to do.

I’m going to do some testing, but it looks like I should test real life performance of SQLlite and Firebird with my data.  More to come on this, I am sure.

3 Comments

  1. J DeLong says:

    I just read the intro to the Chris Lewis book SQLite published by Sams. In the intro by Richard Hipp. Here’s the final paragraph:

    “I have formally dedicated my contributions to SQLite to the public domain, and I insist that other contributors do likewise. (I keep a file of signed copyright releases in the fire safe at my office.) This means that there are no legal restrictions on copying SQLite. My name appears nowhere in the code. You are free to do with SQLite whatever you want. I hope that you find SQLite to be helpful and that you will use it for good. You are welcome to use SQLite for commercial purposes and if you do, I wish you a good profit. Regardless of how you use it, I hope the fact that you have received SQLite freely will inspire you to give something back in return—not to me personally but to the community or to the world as a whole. Finally, and most importantly, I encourage you to find forgiveness for your own soul and to forgive others in return. Be at peace.”

    How can you not like this guy? Geeeesh.

  2. [...] I have been struggling for what type of database to use for my analytics work. SQL Server is a really good database but I always get the feeling it [...]

  3. to domain name says:

    hi guys…

    hi guysI would like to thank you for the efforts you have made in writing this article. I am hoping the same best work from you in the future as well and i have start my own blog now, , thanks for your effort…

Leave a Reply