Posts Tagged ‘R’

Solving easy problems the hard way

There’s a charming little brain teaser that’s going around the Interwebs. It’s got various forms, but they all look something like this:
This problem can be solved by pre-school children in 5-10 minutes, by programer – in 1 hour, by people with higher education … well, check it yourself! 
8809=6
7111=0
2172=0
6666=4
1111=0
3213=0
7662=2
9313=1
0000=4
2222=0
3333=0
5555=0
8193=3
8096=5
7777=0
9999=4
7756=1
6855=3
9881=5
5531=0
2581=?
SPOILER ALERT…
The answer has to do with how many [...]

Fitting Distribution X to Data From Distribution Y

I had someone ask me about fitting a beta distribution to data drawn from a gamma distribution and how well the distribution would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process, then we changed [...]

Shell scripting EC2 for fun and profit

Lately I’ve been doing some work with creating ad-hoc clusters of EC2 machines. My ultimate goal is to create a simple way to spin up a cluster of EC2 machines for use with Bryan Lewis’s very cool doRedis backend for the R foreach package. But that’s a whole other post. What I was scratching my [...]

Details of two-way sync between two Ubuntu machines

In a previous post I discussed my frustrations with trying to get Dropbox or Spideroak to perform BOTH encrypted remote backup AND fast two-way file syncing. This is the detail of how I set up two machines, both running Ubuntu 10.10, to perform two-way sync where a file change on either machine [...]

Fast Two Way Sync in Ubuntu!

I love the portability of a laptop. I have a 45-minute train ride twice a day and I fly a little too, so having my work with me on my laptop is very important. But I hate doing long-running analytics on my laptop when I’m in the office because it bogs down my [...]

Where the heck has JD been?

It’s been pointed out to me that I haven’t had any blog posts in a while. It’s true. I’m fairly slack. But in the last few months I’ve changed jobs (same firm, new role), written an R abstraction on top of Hadoop, been to China, and managed to stay married. While that sounds pretty awesome, [...]

Controlling Amazon Web Services using rJava and the AWS Java SDK

I’ve been messing around with using Amazon Web Services for a while. I’ve had some projects where I wanted to upload files to S3 or fire off EMR jobs. I’ve been controlling AWS services using a hodgepodge of command-line tools, called from R with the system() function. [...]

Connecting to SQL Server from R using RJDBC

A few months ago I switched my laptop from Windows to Ubuntu Linux. I had been connecting to my corporate SQL Server database using RODBC on Windows, so I attempted to get ODBC connectivity up and running on Ubuntu. ODBC on Ubuntu turned into an exercise in futility. I spent many hours over many days [...]

Principal Component Analysis (PCA) vs Ordinary Least Squares (OLS): A Visual Explanation

Over at stats.stackexchange.com recently, a really interesting question was raised about principal component analysis (PCA). The gist was “Thanks to my college class I can do the math, but what does it MEAN?”
I have felt like this a number of times in my life. Many of my classes were focused on the technical implementations they kinda [...]

Third, and Hopefully Final, Post on Correlated Random Normal Generation (Cholesky Edition)

When I did a brief post three days ago I had no plans to write two more posts on correlated random number generation. But I’ve gotten a couple of emails, a few comments, and some Twitter feedback. In response to my first post, Gappy calls me out and says, “the way mensches do multivariate (log)normal [...]