Archive for the ‘Uncategorized’ Category

Fitting Distribution X to Data From Distribution Y

I had someone ask me about fitting a beta distribution to data drawn from a gamma distribution and how well the distribution would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process, then we changed [...]

Shell scripting EC2 for fun and profit

Lately I’ve been doing some work with creating ad-hoc clusters of EC2 machines. My ultimate goal is to create a simple way to spin up a cluster of EC2 machines for use with Bryan Lewis’s very cool doRedis backend for the R foreach package. But that’s a whole other post. What I was scratching my [...]

The best interview question I’ve ever been asked

In 2005 I was interviewing for a job as Risk Manager with Genworth Financial. I was working a gig up in Armonk, NY, so I hopped a car to the GNW office and met with Mark Griffin, at that point the Chief Risk Officer (CRO) for GNW. After some small talk, Mark asked me the [...]

Details of two-way sync between two Ubuntu machines

In a previous post I discussed my frustrations with trying to get Dropbox or Spideroak to perform BOTH encrypted remote backup AND fast two-way file syncing. This post details how I set up two machines, both running Ubuntu 10.10, to perform two-way sync where a file change on either machine [...]

Fast Two Way Sync in Ubuntu!

I love the portability of a laptop. I have a 45-minute train ride twice a day and I fly a little too, so having my work with me on my laptop is very important. But I hate doing long-running analytics on my laptop when I’m in the office because it bogs down my [...]

Where the heck has JD been?

It’s been pointed out to me that I haven’t had any blog posts in a while. It’s true. I’m fairly slack. But in the last few months I’ve changed jobs (same firm, new role), written an R abstraction on top of Hadoop, been to China, and managed to stay married. While that sounds pretty awesome, [...]

Controlling Amazon Web Services using rJava and the AWS Java SDK

I’ve been messing around with using Amazon Web Services for a while. I’ve had some projects where I wanted to upload files to S3 or fire off EMR jobs. I’ve been controlling AWS services using a hodgepodge of command line tools and the R system() function to call the tools from the command line. [...]

The O’Reilly Safari Books Online app broke my heart

I’m a huge O’Reilly Media fanboy. I can’t hide it. I hear Tim O’Reilly speak at conferences and I think to myself, “Screw being president, I want to be Tim O’Reilly.” I’ve been a subscriber to their online book service, Safari Books Online, for years. Every month I see the bill for $43 [...]

Connecting to SQL Server from R using RJDBC

A few months ago I switched my laptop from Windows to Ubuntu Linux. I had been connecting to my corporate SQL Server database using RODBC on Windows, so I attempted to get ODBC connectivity up and running on Ubuntu. ODBC on Ubuntu turned into an exercise in futility. I spent many hours over many days [...]

Principal Component Analysis (PCA) vs Ordinary Least Squares (OLS): A Visual Explanation

Over at stats.stackexchange.com recently, a really interesting question was raised about principal component analysis (PCA). The gist was “Thanks to my college class I can do the math, but what does it MEAN?”
I have felt like this a number of times in my life. Many of my classes were focused on the technical implementations they kinda [...]