<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cerebral Mastication</title>
	<atom:link href="http://www.cerebralmastication.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cerebralmastication.com</link>
	<description>Something to Chew On</description>
	<lastBuildDate>Thu, 02 Sep 2010 18:47:24 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Third, and Hopefully Final, Post on Correlated Random Normal Generation (Cholesky Edition)</title>
		<link>http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/</link>
		<comments>http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 18:03:21 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[risk]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=824</guid>
		<description><![CDATA[When I did a brief post three days ago I had no plans on writing two more posts on correlated random number generation. But I&#8217;ve gotten a couple of emails, a few comments, and some Twitter feedback. In response to my first post, Gappy, calls me out and says, &#8220;the way mensches do multivariate (log)normal [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_825" class="wp-caption alignleft" style="width: 260px"><a href="http://www.sabix.org/bulletin/b39/vie.html" onclick="pageTracker._trackPageview('/outgoing/www.sabix.org/bulletin/b39/vie.html?referer=');"><img class="size-medium wp-image-825 " title="39-cholesky" src="http://www.cerebralmastication.com/wp-content/uploads/2010/09/39-cholesky-250x300.jpg" alt="" width="250" height="300" /></a><p class="wp-caption-text">André-Louis Cholesky is my homeboy</p></div>
<p>When I did a <a href="http://www.cerebralmastication.com/2010/08/stochastic-simulation-with-copulas-in-r/">brief post three days ago</a> I had no plans to write two more posts on correlated random number generation. But I&#8217;ve gotten a couple of emails, a few comments, and some Twitter feedback. In response to my first post, <a href="http://www.cerebralmastication.com/2010/08/stochastic-simulation-with-copulas-in-r/comment-page-1/#comment-5068">Gappy calls me out</a> and says, &#8220;the way mensches do multivariate (log)normal variates is via Cholesky. It’s simple, instructive, and fast.&#8221; And I think we&#8217;re all smart enough to read through Mr. Gappy&#8217;s comment and see that he&#8217;s saying I&#8217;m a complicated, opaque, and slow goy. My wife called and said his list would be more accurate if he added &#8216;emotionally detached.&#8217; I have no idea what she means.</p>
<p>At any rate, in response to Gappy&#8217;s comment, here is the third verse (same as the first). The crux of the change is in the following lines:</p>
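<blockquote>
<pre>
library(Matrix)  # for nearPD()

# assumes ourData (a data frame of numeric columns) and n (the number of
# draws) are already defined, as they are in the full script

# shift the mean of ourData to zero
# (cov() centers each column internally anyway, so this doesn't change C)
ourData0 &lt;- as.data.frame(sweep(ourData, 2, colMeans(ourData), "-"))

# Cholesky decomposition of the covariance matrix; nearPD() first
# replaces it with the nearest positive definite matrix
C &lt;- chol(nearPD(cov(ourData0))$mat)

# create a matrix of random standard normals: one row per variable, one column per draw
Z &lt;- matrix(rnorm(n * ncol(ourData)), ncol(ourData))

# multiply the standard normals by the transpose of the Cholesky factor
X &lt;- t(C) %*% Z

myDraws &lt;- data.frame(as.matrix(t(X)))
names(myDraws) &lt;- names(ourData)

# the draws are centered on zero, so shift their means
# over to match the starting data
myDraws &lt;- as.data.frame(sweep(myDraws, 2, colMeans(ourData), "+"))
</pre>
</blockquote>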
<p><em><strong>Edit: </strong>When I first published this example, I didn&#8217;t shift the means prior to taking the cov(). I&#8217;ve since corrected that. Also, thanks to @fdaapproved on Twitter, who pointed out that I could replace my original loop with myDraws &lt;- as.data.frame(sweep(t(X),2,colMeans(ourData),"+"))</em></p>
<p>This method, which uses Cholesky decomposition, is how I initially learned to create correlated random draws. I think it is comparable to the mvrnorm() method. mvrnorm() is handy because it wraps everything above in a single line of code, but the method above relies only on the Matrix package, and only for the nearPD() function. If you are familiar with the guts of the mvrnorm() and chol() functions, I&#8217;d love for you to comment on any technical differences. I looked briefly at the code for both and quickly realized my matrix math was rusty enough that it was going to take a while for me to sort through it.</p>
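<p>A quick sketch on the mvrnorm() vs. chol() question. As far as I can tell from reading the MASS source, mvrnorm() doesn&#8217;t use Cholesky at all. It factors Sigma with eigen() instead. The base-R snippet below uses a made-up 3x3 covariance matrix and my own variable names. It shows that both factorizations give a matrix A with A %*% t(A) equal to Sigma, and that is all correlated draws require. The two factors map the same standard normals to different points, but the resulting covariance is the same.</p>
<pre>
# Two ways to factor a covariance matrix Sigma as A %*% t(A):
#   Cholesky (as in the snippet above) and eigen (what MASS::mvrnorm
#   appears to use internally). Sigma here is a made-up example.
Sigma = matrix(c(1.0, 0.6, 0.3,
                 0.6, 2.0, 0.5,
                 0.3, 0.5, 1.5), nrow = 3)

# Cholesky: chol() returns upper-triangular R with t(R) %*% R == Sigma
R = chol(Sigma)
A_chol = t(R)

# Eigen: Sigma = V diag(lambda) t(V), so A = V diag(sqrt(lambda));
# pmax() clips tiny negative eigenvalues from rounding to zero
e = eigen(Sigma, symmetric = TRUE)
A_eigen = e$vectors %*% diag(sqrt(pmax(e$values, 0)))

# Both factors reproduce Sigma
all.equal(A_chol %*% t(A_chol), Sigma)
all.equal(A_eigen %*% t(A_eigen), Sigma)

# Same standard normals, two factorizations: different draws,
# same covariance (up to sampling noise)
set.seed(42)
n = 1e5
Z = matrix(rnorm(3 * n), nrow = 3)
X_chol = t(A_chol %*% Z)
X_eigen = t(A_eigen %*% Z)
round(cov(X_chol), 2)
round(cov(X_eigen), 2)
</pre>
<p>One practical difference is how each handles bad input. chol() stops with an error on a matrix that isn&#8217;t positive definite, which is why nearPD() sits in front of it above. The eigen route can instead clip small negative eigenvalues to zero.</p>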
<p>If you want the whole script you can find it embedded below <a href="http://gist.github.com/562567" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/562567?referer=');">and on Github</a>.</p>
<script src="http://gist.github.com/562567.js"></script>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/09/cholesk-post-on-correlated-random-normal-generation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Even Simpler Multivariate Correlated Simulations</title>
		<link>http://www.cerebralmastication.com/2010/08/even-simpler-multivariate-correlated-simulations/</link>
		<comments>http://www.cerebralmastication.com/2010/08/even-simpler-multivariate-correlated-simulations/#comments</comments>
		<pubDate>Tue, 31 Aug 2010 15:17:27 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[risk]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=804</guid>
		<description><![CDATA[So after yesterday&#8217;s post on Simple Simulation using Copulas I got a very nice email that basically begged the question, &#8220;Dude, why are you making this so hard?&#8221; The author pointed out that if what I really want is a Gaussian correlation structure for Gaussian distributions then I could simply use the mvrnorm() function from [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/08/Screenshot-Untitled-Window-3.png"><img class="alignleft size-full wp-image-803" title="mvrnorm example" src="http://www.cerebralmastication.com/wp-content/uploads/2010/08/Screenshot-Untitled-Window-3.png" alt="" width="341" height="221" /></a>So after yesterday&#8217;s post on <a href="http://www.cerebralmastication.com/2010/08/stochastic-simulation-with-copulas-in-r/">Simple Simulation using Copulas</a> I got a very nice email that basically begged the question, &#8220;Dude, why are you making this so hard?&#8221; The author pointed out that if what I really want is a Gaussian correlation structure for Gaussian distributions then I could simply use the mvrnorm() function from the MASS package. Well I did a quick</p>
<blockquote><p>?mvrnorm</p></blockquote>
<p>and, I&#8217;ll be damned, he&#8217;s right! The advantage of using a copula is the ability to simulate dependence structures where the correlation differs across the range of values. That gives you the flexibility to, for example, make the tails of the distributions more correlated than their centers. But my example yesterday was purposefully simple&#8230; so simple that a copula was not even needed.</p>
<p>After creating my sample data all I really needed to do was this:</p>
<blockquote><p>myDraws &lt;- mvrnorm(1e5, mu=colMeans(ourData), Sigma=cov(ourData))</p></blockquote>
<p>So I took my example from yesterday and updated it using the mvrnorm() code and, as is my custom, put up a <a href="http://gist.github.com/559082" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/559082?referer=');">Github gist.</a> The code is embedded below as well. I added a little ggplot2 code at the end that creates a facet plot of the four variables, showing the shape of both the starting data and the simulated data. The plot in the upper left of this post is the ggplot output.</p>
<p><em><strong>EDIT: </strong></em>The email hipping me to this was sent by <a href="http://dirk.eddelbuettel.com" onclick="pageTracker._trackPageview('/outgoing/dirk.eddelbuettel.com?referer=');">Dirk Eddelbuettel</a>, who&#8217;s been very helpful to me more times than I can count. I initially omitted his name, but Dirk has since confirmed that it&#8217;s OK to mention him by name in this post.</p>
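<p>If you want to try the one-liner without grabbing the gist, here&#8217;s a self-contained sketch. The four-column data frame is made up from a shared factor plus noise. It only stands in for ourData and is not the sample data from the post. It uses colMeans() for mu because mean() on a data frame no longer returns column means in current versions of R.</p>
<pre>
library(MASS)  # mvrnorm() lives here; MASS ships with R as a recommended package

set.seed(1)
# Hypothetical stand-in for ourData: 4 columns, some sharing a common factor
n_obs = 500
common = rnorm(n_obs)
ourData = data.frame(a = common + rnorm(n_obs),
                     b = 2 * common + rnorm(n_obs),
                     c = rnorm(n_obs, mean = 5),
                     d = -common + rnorm(n_obs, sd = 0.5))

# Simulate from a multivariate normal with the sample means and covariance
myDraws = mvrnorm(1e5, mu = colMeans(ourData), Sigma = cov(ourData))
myDraws = as.data.frame(myDraws)  # columns keep the names a, b, c, d

# The simulated draws should reproduce the sample means and covariance
round(colMeans(myDraws) - colMeans(ourData), 2)
round(cov(myDraws) - cov(ourData), 2)
</pre>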
<script src="http://gist.github.com/559082.js"></script>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/08/even-simpler-multivariate-correlated-simulations/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Stochastic Simulation With Copulas in R</title>
		<link>http://www.cerebralmastication.com/2010/08/stochastic-simulation-with-copulas-in-r/</link>
		<comments>http://www.cerebralmastication.com/2010/08/stochastic-simulation-with-copulas-in-r/#comments</comments>
		<pubDate>Mon, 30 Aug 2010 20:12:34 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[risk]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=782</guid>
		<description><![CDATA[A friend of mine gave me a call last week and was wondering if I had a little R code that could illustrate how to do a Cholesky decomposition. He ultimately wanted to build a Monte Carlo model with correlated variables. I pointed him to a number of packages that do Cholesky decomp but then [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cafepress.com/+ringer_t,350602392" onclick="pageTracker._trackPageview('/outgoing/www.cafepress.com/+ringer_t_350602392?referer=');"><img class="alignleft size-full wp-image-792" style="margin: 5px; border: 2px solid black;" title="econModels" src="http://www.cerebralmastication.com/wp-content/uploads/2010/08/econModels.jpg" alt="You know we do! " width="206" height="162" /></a>A friend of mine gave me a call last week and was wondering if I had a little R code that could illustrate how to do a <a href="http://en.wikipedia.org/wiki/Cholesky_decomposition" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Cholesky_decomposition?referer=');">Cholesky decomposition</a>. He ultimately wanted to build a Monte Carlo model with correlated variables. I pointed him to a number of packages that do Cholesky decomp but then I recommended he consider just using a Gaussian <a href="http://en.wikipedia.org/wiki/Copula_%28statistics%29" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Copula_28statistics_29?referer=');">Copula </a> and R for the whole simulation. For most of my copula needs in R, I use the <a href="http://cran.r-project.org/web/packages/QRMlib/index.html" onclick="pageTracker._trackPageview('/outgoing/cran.r-project.org/web/packages/QRMlib/index.html?referer=');">QRMlib package</a> which is a code companion to the book <a href="http://www.amazon.com/gp/product/0691122555?ie=UTF8&amp;tag=riskthou-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0691122555" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0691122555?ie=UTF8_amp_tag=riskthou-20_amp_linkCode=as2_amp_camp=1789_amp_creative=390957_amp_creativeASIN=0691122555&amp;referer=');"><span style="text-decoration: underline;">Quantitative Risk Management: Concepts, Techniques and Tools</span></a> by Alexander J. McNeil, Rudiger Frey and Paul Embrechts. 
The book is only loosely coupled (pun intended) with the code in the QRMlib package. I really wish the book had been written with code examples and tight linkage between the book and the code. Of course I&#8217;m the type of guy who prefers code snip-its to mathematical notation.</p>
<p>I had some code where I used the QRMlib package, but it was really messy and fairly specific to my use case. So I whipped up a very simple example of how to create correlated random draws from a multivariate distribution. In this example I used normally distributed marginals and Gaussian correlation to keep things simple and easy to follow. Rather than blogging through the code, I added a shit load (metric ass ton, if you&#8217;re in Canada) of comments. The code is designed to be stepped through, so don&#8217;t just run the whole blob and wonder what happened.</p>
<p>Walk through the code, and if you find any errors, be sure to let me know.</p>
<p>The code is embedded in a Github gist below, but if you are reading this in an aggregator (shout out to <a href="http://www.r-bloggers.com/" onclick="pageTracker._trackPageview('/outgoing/www.r-bloggers.com/?referer=');">R-Bloggers</a>) you&#8217;ll need to <a href="http://gist.github.com/557900" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/557900?referer=');">manually go to the gist</a>.</p>
<script src="http://gist.github.com/557900.js"></script>
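<p>The gist is the real walk-through, and it uses QRMlib. Below is a rough base-R sketch of the same Gaussian-copula idea. It is not the gist&#8217;s code. There are three steps: draw correlated standard normals, push them through pnorm() to get correlated uniforms, then push those through the inverse CDF of whatever marginal you like. The 0.7 correlation and the normal/lognormal marginals (named yield and price) are hypothetical choices for illustration. Swapping in non-normal marginals is the reason to use a copula at all.</p>
<pre>
# Gaussian copula by hand, base R only; all parameters are made up
set.seed(2010)
n = 1e5
rho = matrix(c(1.0, 0.7,
               0.7, 1.0), nrow = 2)

# 1. Correlated standard normals: rows of Z have covariance
#    t(chol(rho)) %*% chol(rho) == rho
Z = matrix(rnorm(2 * n), ncol = 2) %*% chol(rho)

# 2. The normal CDF turns each column into uniforms that keep
#    the Gaussian dependence structure
U = pnorm(Z)

# 3. Inverse CDFs of the marginals you actually want (hypothetical choices)
draws = data.frame(
  yield = qnorm(U[, 1], mean = 150, sd = 20),
  price = qlnorm(U[, 2], meanlog = log(4), sdlog = 0.25)
)

# Rank correlation survives the monotone marginal transforms
cor(draws, method = "spearman")
</pre>
<p>Spearman (rank) correlation is the number to check here, because the monotone transforms in steps 2 and 3 leave it unchanged. For a Gaussian copula it should land near (6/pi) * asin(rho/2), which is about 0.68 for rho = 0.7.</p>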
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/08/stochastic-simulation-with-copulas-in-r/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Give Away Something Then Sell Something</title>
		<link>http://www.cerebralmastication.com/2010/08/give-away-something-then-sell-something/</link>
		<comments>http://www.cerebralmastication.com/2010/08/give-away-something-then-sell-something/#comments</comments>
		<pubDate>Thu, 19 Aug 2010 15:02:06 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[hardware]]></category>
		<category><![CDATA[strategy]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=774</guid>
		<description><![CDATA[My wife and I bought a foreclosed house a few months ago. This house had been part of mortgage fraud and we bought it at auction. Interesting life experience, to say the least. The finished basement was built with radiant heat tubing poured into the concrete. These pipes are designed to be hooked to a [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_775" class="wp-caption alignleft" style="width: 260px"><a href="http://www.radiantcompany.com/system/closed.shtml" onclick="pageTracker._trackPageview('/outgoing/www.radiantcompany.com/system/closed.shtml?referer=');"><img class="size-full wp-image-775" style="border: 2px solid black; margin: 5px;" title="HOUSE RENOVATION 11 045 A" src="http://www.cerebralmastication.com/wp-content/uploads/2010/08/HOUSE-RENOVATION-11-045-A.jpg" alt="" width="250" height="260" /></a><p class="wp-caption-text">Radiant Heat System. Not in my house... yet! </p></div>
<p>My wife and I bought a foreclosed house a few months ago. This house had been part of mortgage fraud and we bought it at auction. Interesting life experience, to say the least. The finished basement was built with radiant heat tubing poured into the concrete. These pipes are designed to be hooked to a hot water heater so the warm water can provide radiant heat through the floors in the basement. I love radiant heat in basements. It makes the floor warm and the whole basement feels cozy. The heat radiates up to the rest of the house and it&#8217;s fairly energy efficient.</p>
<p>The radiant heat system in our basement was never finished, however. The pipes were there but there was no hot water heater, pumps, or thermostat. I started scouring the Web for information on radiant heat systems. There are lots of sites selling radiant heat related bits, but it was VERY hard to find detailed info on the types of systems or what the options are for radiant heat. I wanted to educate myself on the pros and cons of different systems. Do I use a hot water heater? Maybe a boiler? Should I hook it up to my potable hot water heater? I was full of questions and struggled to find anything useful. That is, until I stumbled on <a href="http://www.radiantcompany.com/" onclick="pageTracker._trackPageview('/outgoing/www.radiantcompany.com/?referer=');">Radiant Floor Company</a>. Their whole business model is built around selling assemblies to help the DIY market install/upgrade/maintain their radiant floor systems. And their site has, hands down, the best information about radiant floor systems. It&#8217;s not the prettiest site in the world, but they have a great <a href="http://www.radiantcompany.com/system/overview.shtml" onclick="pageTracker._trackPageview('/outgoing/www.radiantcompany.com/system/overview.shtml?referer=');">intro</a>, followed by sections on each major type of system.</p>
<p>They provided me information that I couldn&#8217;t get anywhere else. And as a result I&#8217;m probably going to buy my parts from them (they are working up a quote for me today!). I&#8217;ll probably go with an on-demand hot water heater and a fully closed system. And they will get my business because they gave me the best information AND they have good prices. It takes BOTH info and price to make a sale on-line. Amazon gives me info through their customer reviews and ratings. Radiant Floor Co gives me info through great documentation and background info. This combo of price + info means that some providers will compete on information + reasonable price while others will compete on absolute lowest price with little info. I love this. It gives me, as a consumer, options.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/08/give-away-something-then-sell-something/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Starting an EC2 Machine Then Setting Up a Socks Proxy&#8230; From R!</title>
		<link>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/</link>
		<comments>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 22:07:12 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=748</guid>
		<description><![CDATA[I do some work from home, some work from an office in Chicago and some work on the road. It&#8217;s not uncommon for me to want to tunnel all my web traffic through a VPN tunnel. In one of my previous blog posts I alluded to using Amazon EC2 as a way to get around [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/07/firewallkat.jpg"><img class="alignleft size-full wp-image-765" title="firewallkat" src="http://www.cerebralmastication.com/wp-content/uploads/2010/07/firewallkat.jpg" alt="" width="361" height="312" /></a>I do some work from home, some work from an office in Chicago and some work on the road. It&#8217;s not uncommon for me to want to tunnel all my web traffic through a VPN tunnel. In one of my previous blog posts I <a href="http://www.cerebralmastication.com/2009/11/using-amazon-ec2-to-thwart-crappy-internal-it-services/">alluded to using Amazon EC2 as a way to get around your corporate IT</a> <span style="text-decoration: line-through;">mind control voyeurs</span> service providers. This tunneling method is one of the 5 or so ways I have used EC2 to set up a tunnel. I used to fire these tunnels up manually using the <a href="https://console.aws.amazon.com" onclick="pageTracker._trackPageview('/outgoing/console.aws.amazon.com?referer=');">Amazon AWS Management Console</a> then opening a shell prompt and entering:</p>
<blockquote>
<pre>ssh -i ~/MyPersonalKey.pem -D 9999 root@ec2-184-73-41-72.compute-1.amazonaws.com</pre>
</blockquote>
<p>The -i switch tells ssh to use my RSA identity file stored in ~/MyPersonalKey.pem.</p>
<p>I get the machine name (ec2-184-73-41-72.compute-1.amazonaws.com) from the AWS Management Console.</p>
<p>The -D is the magic. -D opens a dynamic port forwarding tunnel between my Linux box and the EC2 machine. This is, for all intents and purposes, an encrypted SOCKS proxy on port 9999 of localhost (ssh speaks both SOCKS4 and SOCKS5 on that port). Then I just have to change my proxy settings in Firefox to use a SOCKS host.</p>
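<p>Once you want to script this, the ssh call is just a string. Here is a hypothetical R helper, socks_ssh_command(), that assembles it from the pieces explained above. The name is mine and it is not taken from the script later in the post. The key path, port, and root user simply mirror the example command.</p>
<pre>
# Hypothetical helper: build the ssh SOCKS-tunnel command described above
socks_ssh_command = function(host, key_file = "~/MyPersonalKey.pem",
                             port = 9999, user = "root") {
  # -i: identity (private key) file
  # -D: dynamic port forwarding, i.e. a local SOCKS proxy on `port`
  sprintf("ssh -i %s -D %d %s@%s", key_file, as.integer(port), user, host)
}

cmd = socks_ssh_command("ec2-184-73-41-72.compute-1.amazonaws.com")
cmd
# system(cmd)  # uncomment to actually open the tunnel
</pre>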
<p>Now that&#8217;s all pretty easy. And I like easy. But it&#8217;s not easy ENOUGH. You see, I&#8217;m lazy. I&#8217;m not just lazy in the &#8220;I&#8217;ll do it mañana&#8221; sort of way, but in the &#8220;I&#8217;m too damn lazy to click my mouse 5 times&#8221; way.</p>
<p>So I want this to be easier. Well, I can make the proxy settings in Firefox easier through the use of the <a href="https://addons.mozilla.org/en-US/firefox/addon/1557/" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/1557/?referer=');">Quick Proxy extension for Firefox</a>. That&#8217;s a good start. It turns the proxy on and off with a single mouse click. But I still have to go into the AWS management web site, fire up a machine, and then log in via SSH. Let&#8217;s make that part easier!</p>
<p>The EC2 command line tools aren&#8217;t simple to install and configure, but they&#8217;re required in order to write a script that fires up an EC2 instance and then connects to it with ssh. I struggled to get the tools running until I found <a href="http://linuxsysadminblog.com/2009/06/howto-get-started-with-amazon-ec2-api-tools/" onclick="pageTracker._trackPageview('/outgoing/linuxsysadminblog.com/2009/06/howto-get-started-with-amazon-ec2-api-tools/?referer=');">this tutorial</a>.</p>
<p>Your file locations and names may differ from the tutorial&#8217;s, so adjust accordingly. I followed the tutorial instructions, but I created a key named ec2ApiTools, which will come in handy later.</p>
<p>Once you have the EC2 tools up and running and can do something like list the available AMIs without an error, you can stop with the tutorial. I&#8217;ve been doing a lot of shell scripting lately so I said to myself, &#8220;Self, let&#8217;s script the ssh connection in R!&#8221; For the record, I always end my imperatives with an exclamation point, which I verbally pronounce as &#8220;BANG!&#8221; As a result, when I talk to myself it sounds like two 10-year-old boys playing cops and robbers. Anyhow, I did script it with R using Rscript. Because I&#8217;m a man who listens to myself.</p>
<p>And since you were kind enough to slog through my channeling the drunken ghost of James Joyce, here&#8217;s my script:</p>
<script src="http://gist.github.com/478930.js"></script>
<p>If you&#8217;re reading this in an RSS reader or for some other reason don&#8217;t see an R script above, <a href="http://gist.github.com/478930#file_start_ec2_instance_ssh.r" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/478930_file_start_ec2_instance_ssh.r?referer=');">here&#8217;s your link</a>.</p>
<p>The only two EC2 API commands I use in the script are <em>ec2-run-instances</em>, which starts the instance, and <em>ec2-describe-instances</em>, which gives me a list of running instances and their details. The rest of the script is simply parsing the output and figuring out which instance was started last.</p>
<p>I&#8217;ve now set up a launcher panel item that starts the script. When I see the xterm window come up, I click the little red button in the lower right corner of my browser, which switches on the Firefox proxy. Then I&#8217;m safe to surf <a href="http://www.sofmag.com/" onclick="pageTracker._trackPageview('/outgoing/www.sofmag.com/?referer=');">Soldier of Fortune Magazine</a> without the interference of my corporate firewall.</p>
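<p>The gist has the real parsing code. Here is a hedged sketch of the &#8220;which instance started last&#8221; step. The input lines are a made-up, simplified stand-in for ec2-describe-instances output: tab-separated INSTANCE rows with an id, public DNS name, state, and launch time. The real tool prints more columns, in different positions, so check your own output before reusing these column indexes. latest_instance() is a hypothetical helper.</p>
<pre>
# Made-up, simplified stand-in for ec2-describe-instances output
raw = c(
  "RESERVATION\tr-0001\t111122223333\tdefault",
  "INSTANCE\ti-aaaa1111\tec2-1-2-3-4.compute-1.amazonaws.com\trunning\t2010-07-16T20:01:00+0000",
  "RESERVATION\tr-0002\t111122223333\tdefault",
  "INSTANCE\ti-bbbb2222\tec2-5-6-7-8.compute-1.amazonaws.com\trunning\t2010-07-16T21:45:00+0000"
)

# Hypothetical helper: return the most recently launched running instance
latest_instance = function(lines) {
  inst = grep("^INSTANCE\t", lines, value = TRUE)   # keep INSTANCE rows only
  fields = strsplit(inst, "\t", fixed = TRUE)
  field = function(i) vapply(fields, "[", character(1), i)
  df = data.frame(
    id     = field(2),
    dns    = field(3),
    state  = field(4),
    # the trailing "+0000" is ignored by the format; times are read as UTC
    launch = as.POSIXct(field(5), format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
    stringsAsFactors = FALSE
  )
  df = df[df$state == "running", ]
  df[which.max(as.numeric(df$launch)), ]
}

latest_instance(raw)$dns
</pre>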
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bootstrapping the latest R into Amazon Elastic Map Reduce</title>
		<link>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/</link>
		<comments>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 15:38:42 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[EMR]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=736</guid>
		<description><![CDATA[I&#8217;ve been continuing to muck around with using R inside of Amazon Elastic Map reduce jobs. I&#8217;ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the Stack Overflow [r] community. I have no [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/06/boot.jpg"><img class="alignleft size-full wp-image-737" style="margin: 6px; border: 2px solid black;" title="boot" src="http://www.cerebralmastication.com/wp-content/uploads/2010/06/boot.jpg" alt="" width="210" height="294" /></a>I&#8217;ve been continuing to muck around with using R inside of Amazon Elastic Map reduce jobs. I&#8217;ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the <a href="http://stackoverflow.com/questions/tagged/r" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/tagged/r?referer=');">Stack Overflow [r] </a>community. I have no idea how crappy coders like me got anything at all done before the Interwebs.</p>
<p>One of the immediate hurdles faced when trying to use AMZN EMR in anger is that the default version of R on EMR is 2.7.1. Yes, that is indeed the version that Moses taught the Israelites to use while they wandered in the desert. I&#8217;m impressed by your religious knowledge. At any rate, all kinds of things go to hell when you try to run code and load packages in 2.7.1. When I first started fighting with EMR the only solution was to backport my code and alter any packages so they would run in 2.7.1. Yes, that is, as Moses would say, a Nudnik. Nudnik also happens to be the pet name my neighbors have given me. They love me. Where was I? Oh yeah, Methusla&#8217;s R version. Recently Amazon released a neat feature called &#8220;Bootstrapping&#8221; for EMR. Before you start thinking about sampling and resampling and all that  crap, let me clarify. This is NOT statistical bootstrapping. It&#8217;s called bootstrapping because it&#8217;s code that runs after each node boots up, but before the mapper procedure runs. So to get a more modern version of R loaded on to each node I set up a little script that updates the sources.list file and then installs the latest version of R. And since I&#8217;m a caring, sharing guy, here&#8217;s my script:</p>
<script src="http://gist.github.com/455962.js"></script>
<p>And if that doesn&#8217;t show up for some reason, you can find all <a href="http://gist.github.com/455962" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/455962?referer=');">5 lines of its bash glory over at GitHub</a>.</p>
<p>If you&#8217;re not conveniently located in Chicago, IL, you may want to change your R mirror location. The bootstrap action can be set up from the EMR web GUI, or, if you&#8217;re firing the jobs off using the elastic-mapreduce command line tools, you just add the following option: &#8220;--bootstrap-action s3://myBucket/bootstrap.sh&#8221;, assuming myBucket is the bucket with your script in it and bootstrap.sh contains your bootstrap shell script. And then, as my buddies in Dublin say, &#8220;Bob&#8217;s your mother&#8217;s brother.&#8221;</p>
<p>And before you ask, yes, this slows crap down. I&#8217;ll probably hack together a script that will take the R binaries and other needed upgrades out of Amazon S3 and load them in a bootstrap action which will greatly speed things up. The above example has one clear advantage over loading binaries from S3: It works right now. And remember folks, code that works right now kicks code that &#8220;might work someday&#8221; right in the balls. And then mocks it while it cries.</p>
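<p>Code breaks in odd ways on 2.7.1, so one cheap safeguard is to have the R mapper script check which R it is running under before it does anything else. This is a hypothetical helper, check_r_version(), which is not part of the gist. The 2.10.0 floor is an arbitrary example, so set it to whatever your code actually needs.</p>
<pre>
# Hypothetical guard for the top of an R mapper script: stop early if the
# bootstrap action didn't replace the stock R on this node.
# The default minimum is an arbitrary example value.
check_r_version = function(minimum = "2.10.0") {
  if (getRversion() >= minimum) {
    return(invisible(TRUE))
  }
  stop(sprintf("R %s found; need at least %s. Did the bootstrap action run?",
               as.character(getRversion()), minimum))
}

check_r_version()
</pre>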
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Chicago R Meetup: Healthier than Drinking Alone</title>
		<link>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/</link>
		<comments>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/#comments</comments>
		<pubDate>Mon, 24 May 2010 20:20:34 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[meetup]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=728</guid>
		<description><![CDATA[I&#8217;m kinda blown away by the number of folks who have joined the Chicago R User Group (RUG) in the last few weeks. As of this morning we have 65 people signed up for the group and 25 who have said that they are planning on attending the meetup this Thursday (yes, only 3 days [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');"><img class="alignleft" title="meetup, y'all" src="http://cvillevegan.com/wp-content/uploads/2009/06/Meetup-Logo-1.jpg" alt="" width="251" height="185" /></a>I&#8217;m kinda blown away by the number of folks who have joined the <a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');">Chicago R User Group (RUG)</a> in the last few weeks. As of this morning we have 65 people signed up for the group and 25 who have said that they are planning on attending the meetup this Thursday (yes, only 3 days away!) I&#8217;m very pleased that this many people in Chicago find the R language interesting and/or valuable. Of course, there is the possibility that some of the 25 who are attending are simply hoping for some free beer. I was a member of a vegan society for 2 years because they had free beer. The week I accidentally showed up with a six pack of White Castle sliders really blew my cover. That&#8217;s how I discovered that you can scare off angry vegans by waving a steaming hot onion covered meat-like patty in their face. True story. And when I say &#8220;true story&#8221; I mean &#8220;total lie&#8221;.</p>
<p>By the way, I&#8217;m already recruiting presenters for next month&#8217;s RUG meetup. And I&#8217;m also looking for locations. So if you have an idea for either, let me know. I promise to not throw any mini burgers at you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Virtual Conference: R the Language</title>
		<link>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/</link>
		<comments>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/#comments</comments>
		<pubDate>Tue, 04 May 2010 02:27:03 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[vconf]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=722</guid>
		<description><![CDATA[On Tuesday May 4th at 9:30 PM central, 10:30 eastern, I&#8217;ll be giving a live online presentation as part of the Vconf.org open conference series. I&#8217;ll be speaking about R and why I started using R a couple years ago. This is NOT going to be a technical presentation but rather an illustration of how [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" style="margin: 11px;" title="vconf logo" src="http://vconf.org/media/images/boxxee.jpg" alt="" width="100" height="100" />On Tuesday, May 4th, at 9:30 PM Central (10:30 PM Eastern), I&#8217;ll be giving a <a href="http://vconf.org/presentation/r-the-language/" onclick="pageTracker._trackPageview('/outgoing/vconf.org/presentation/r-the-language/?referer=');">live online presentation</a> as part of the Vconf.org open conference series. I&#8217;ll be speaking about R and why I started using it a couple of years ago. This is NOT going to be a technical presentation but rather an illustration of how an R convert was created and why R became part of my daily tool set.</p>
<p>If you&#8217;re not familiar with the vconf.org project, you should <a href="http://vconf.org/be-a-speaker/" onclick="pageTracker._trackPageview('/outgoing/vconf.org/be-a-speaker/?referer=');">read a little about it</a>. It&#8217;s just getting started, but I love that it&#8217;s not for profit and that all presentations are released under a Creative Commons license. You know that cool new technology you&#8217;ve been playing with? Yeah, that one. You really should give a vconf about it. I know I&#8217;d like to hear about it!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simulating Dart Throws in R</title>
		<link>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/</link>
		<comments>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 18:05:20 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[darts]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=435</guid>
		<description><![CDATA[Back in November 2009 Wired wrote an article about some grad students who decided to try to stochastically model throwing darts. Because I don&#8217;t actually read printed material I didn&#8217;t see the article until a couple of months ago. My immediate thought was, &#8220;hey, I drink beer. I throw darts. I build stochastic models. Why [...]]]></description>
			<content:encoded><![CDATA[<p>Back in November 2009 <a href="http://www.wired.com/magazine/2009/11/st_darts/" onclick="pageTracker._trackPageview('/outgoing/www.wired.com/magazine/2009/11/st_darts/?referer=');">Wired wrote an article</a> about some grad students who decided to try to stochastically model throwing darts. Because I don&#8217;t actually read printed material, I didn&#8217;t see the article until a couple of months ago. My immediate thought was, &#8220;Hey, I drink beer. I throw darts. I build stochastic models. Why haven&#8217;t I done this?&#8221; Well, we all know why I haven&#8217;t done this: I have a job, a 2-year-old daughter, and I like my wife. Then a funny thing happened a few weeks ago. I sat down to think about this problem, and 5 hours later I had a working dart simulator in my text editor. I don&#8217;t remember writing it. <a href="http://en.wikipedia.org/wiki/Occam%27s_razor" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Occam_27s_razor?referer=');">Occam&#8217;s Razor</a> says that the simplest explanation is the most likely one. So clearly I was abducted by aliens and someone broke into my office and built a dart simulator.</p>
<p>I do reinsurance modeling to pay the bills, and it immediately hit me that this type of modeling is very similar to what I do for work. That similarity became the impetus for my presentation at <a href="http://www.rinfinance.com/agenda/" onclick="pageTracker._trackPageview('/outgoing/www.rinfinance.com/agenda/?referer=');">R in Finance 2010</a>, which starts today.</p>
<p>I dumped the dart board code into a github gist which can be found here:</p>
<script src="http://gist.github.com/278148.js"></script>
<p>If the embedded code is not showing up, you can get to it <a href="http://gist.github.com/278148" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/278148?referer=');">directly on Github</a>.</p>
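<p>The gist above is the real simulator. For readers who just want the core idea, here is a much smaller sketch in Python; it is not the code in the gist. It assumes each throw lands at the aim point plus independent normal errors in x and y with a common standard deviation <code>sigma</code>, then counts how often throws land within some <code>radius</code> of the target. Both values below are hypothetical, not measurements of any real player or board.</p>

```python
import math
import random


def simulate_throws(n, sigma, seed=None):
    """Simulate n dart throws aimed at the centre (0, 0).

    Assumption: horizontal and vertical aiming errors are independent
    normal draws with the same standard deviation sigma (in mm).
    """
    rng = random.Random(seed)
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]


def hit_fraction(throws, radius):
    """Fraction of throws landing within `radius` of the aim point."""
    hits = sum(1 for x, y in throws if math.hypot(x, y) <= radius)
    return hits / len(throws)


def analytic_hit_prob(radius, sigma):
    """Closed form for the same model: miss distance is Rayleigh distributed."""
    return 1.0 - math.exp(-radius ** 2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    sigma = 20.0   # hypothetical player skill, mm
    radius = 16.0  # hypothetical target radius, mm
    throws = simulate_throws(100_000, sigma, seed=42)
    print(f"simulated hit rate: {hit_fraction(throws, radius):.3f}")
    print(f"analytic hit rate:  {analytic_hit_prob(radius, sigma):.3f}")
```

<p>Under this simple model the miss distance follows a Rayleigh distribution, so the simulated hit rate can be sanity-checked against 1 - exp(-r^2 / (2 sigma^2)). A real dart model would add things like aiming at different targets and different skill in x and y.</p>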
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>I don&#8217;t even know how wrong I am!</title>
		<link>http://www.cerebralmastication.com/2010/04/i-dont-even-know-how-wrong-i-am/</link>
		<comments>http://www.cerebralmastication.com/2010/04/i-dont-even-know-how-wrong-i-am/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 15:21:29 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[descisions]]></category>
		<category><![CDATA[risk]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=664</guid>
		<description><![CDATA[I&#8217;ve been a long-time reader of the blog &#8220;Messy Matters&#8221; (which evokes terrible images now that I am potty training a toddler). The authors, Sharad Goel and Daniel Reeves, are academics who work in the Microeconomics and Social Systems (get it, MESS?!?) lab funded by Yahoo!. (What does Strunk and [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_705" class="wp-caption alignleft" style="width: 293px"><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/04/rummy.jpg"><img class="size-full wp-image-705 " title="rummy" src="http://www.cerebralmastication.com/wp-content/uploads/2010/04/rummy.jpg" alt="" width="283" height="204" /></a><p class="wp-caption-text">&quot;as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don&#39;t know we don&#39;t know.&quot; US Defense Secretary Donald Rumsfeld, February 12, 2002</p></div>
<p>I&#8217;ve been a long-time reader of the blog &#8220;Messy Matters&#8221; (which evokes terrible images now that I am potty training a toddler). The authors, <a href="http://www.cam.cornell.edu/%7Esharad" onclick="pageTracker._trackPageview('/outgoing/www.cam.cornell.edu/_7Esharad?referer=');">Sharad Goel</a> and <a href="http://ai.eecs.umich.edu/people/dreeves" onclick="pageTracker._trackPageview('/outgoing/ai.eecs.umich.edu/people/dreeves?referer=');">Daniel Reeves</a>, are academics who work in the Microeconomics and Social Systems (get it, MESS?!?) lab funded by Yahoo!. (What does Strunk and White say about punctuation after a proper noun that includes punctuation as part of the proper noun?) Anyhow, the Messy Matters blog had a very interesting post recently about testing whether you are overconfident. The gist is this: take a test, but instead of answering each question exactly, give a lower and an upper bound that you think represents a 90% confidence band around the right answer. If you haven&#8217;t seen this done, you should <a href="http://messymatters.com/2010/02/28/calibration/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/messymatters.com/2010/02/28/calibration/?referer=');">go take a look</a> and then read the rest of this blog post.</p>
<p>I didn&#8217;t do worth a shit on their &#8220;overconfidence&#8221; test. I think I got 5 of the ranges right; the other 5 times, the real answer fell outside my bounds. As I was answering the questions I had a strong feeling of not being confident at all. I was very tempted to answer with HUGE ranges on some of the questions because I felt totally unable to make a good guess. But when I didn&#8217;t know the answer, I took a swag and tried to put in big ranges, but not TOO big. I&#8217;m not the only one who struggled with this test. In their <a href="http://messymatters.com/2010/03/31/calibration-results/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/messymatters.com/2010/03/31/calibration-results/?referer=');">summary of results</a>, I fall in the 76th percentile. Hey, I&#8217;m above average&#8230; or at least above the median. Clearly I didn&#8217;t know how wrong I was in many cases. But does this mean I am &#8220;overconfident&#8221;? I don&#8217;t think so. I think it means something a bit more subtle. This exercise reminded me of building a forecasting model and trying to predict values far outside the training data.</p>
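<p>Why is 5 out of 10 more than bad luck? If every 90% band really had a 90% chance of containing the answer, and the questions were independent (both are assumptions), the number of hits would be Binomial(10, 0.9), with 9 hits expected. This Python sketch scores a set of bounds and computes how likely 5 or fewer hits would be for a perfectly calibrated test taker. The function names and the sample bounds are hypothetical, not taken from the Messy Matters test.</p>

```python
from math import comb


def score_calibration(bounds, truths):
    """Count how many true answers fall inside the (low, high) bounds."""
    return sum(1 for (lo, hi), t in zip(bounds, truths) if lo <= t <= hi)


def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer hits."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # Hypothetical answers: three (low, high) guesses and the true values.
    bounds = [(10, 20), (100, 500), (1, 2)]
    truths = [15, 600, 1.5]
    print(score_calibration(bounds, truths))  # 2

    # Chance that a perfectly calibrated taker gets 5 or fewer of 10 right.
    print(f"{prob_at_most(5, 10, 0.9):.4f}")  # 0.0016
```

<p>Under those assumptions, a calibrated taker scores 5 or fewer only about 0.16% of the time, so a score like that says the bands were too narrow, not that the dice were unkind.</p>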
<p>Having read the book <a href="http://www.amazon.com/gp/product/0805078533?ie=UTF8&amp;tag=riskthou-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0805078533" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0805078533?ie=UTF8_amp_tag=riskthou-20_amp_linkCode=as2_amp_camp=1789_amp_creative=390957_amp_creativeASIN=0805078533&amp;referer=');"><em>On Intelligence</em></a>, I am convinced that one of the main functions of the human brain (or at least the prefrontal cortex) is to be a pattern-matching machine. We all build little mental models in our heads all the time. These models are trained, by definition, on the situations we run into day in and day out, and they are VERY accurate around the mean (i.e., around the experiences we are used to having). For example, how small a piece of sand can you feel between your teeth? Our brains have a &#8216;model&#8217; of what it normally feels like when our teeth close against each other. The slightest unexpected disruption in that pattern triggers our brain to notice. Ever miss a step when walking down stairs? When did you know you were in trouble? Probably when your foot was about 2 inches past where you expected the next step to be. You didn&#8217;t have to wait for your face to hit the railing before your mental model of stair walking was ringing warning bells. We humans are freaking amazing mental model makers!</p>
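<p>The &#8220;very accurate around the mean, lost far from it&#8221; idea is easy to reproduce with a toy forecasting model. In this hypothetical Python sketch, the real world is the curve y = x squared, but the training data only cover x between 0 and 1, where a straight line fits nicely. The model&#8217;s own prediction band (the usual least-squares prediction-interval formula, with z = 2 as a rough stand-in for 95%) is tight and honest near the data, and wildly overconfident at x = 10.</p>

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def residual_sd(xs, ys, a, b):
    """Residual standard error with n - 2 degrees of freedom."""
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    return (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5


def prediction_halfwidth(x_new, xs, s, z=2.0):
    """Half-width of an approximate OLS prediction band at x_new.

    The band widens with distance from the training mean, but it only
    knows about the data it was fit on.
    """
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return z * s * (1 + 1 / n + (x_new - mx) ** 2 / sxx) ** 0.5


def truth(x):
    # Hypothetical "real world": curved, but nearly straight on [0, 1].
    return x * x


if __name__ == "__main__":
    xs = [i / 100 for i in range(101)]  # training data only covers 0..1
    ys = [truth(x) for x in xs]
    a, b = fit_line(xs, ys)
    s = residual_sd(xs, ys, a, b)
    for x_new in (0.5, 10.0):
        guess = a + b * x_new
        hw = prediction_halfwidth(x_new, xs, s)
        print(f"x={x_new}: guess {guess:.2f} +/- {hw:.2f}, truth {truth(x_new):.2f}")
```

<p>Near the training data the band contains the truth; at x = 10 the guess is off by about 90 while the band claims roughly half a unit of uncertainty. The model cannot know how much it sucks out there, because nothing it was trained on tells it.</p>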
<p>Well, we&#8217;re amazing&#8230; except when we suck. We suck when we are faced with trying to predict something that is orders of magnitude outside our experience. The question on the MESS test I struggled with the most was the one about how much an empty 747 weighs. I don&#8217;t ever deal with massive weights. Ever. I could only come up with two reference points: 1) my first car was a &#8217;69 Cadillac, which I know weighed 5,040 lbs (we used to call it &#8220;Two and a half tons of fun&#8221;), and 2) a hopper-bottom rail car carries ~3,500 bushels of corn, which is ~196,000 lbs. And I&#8217;ve never been up next to a 747. But they are HUGE. I&#8217;ve seen pictures of the space shuttle riding around on the back of one of those bad boys. But they have to be pretty light relative to their volume because they have a lot of cargo room. Then I did the math on how many 1969 Cadillacs = 1 rail car of corn&#8230; almost 39!?!? But rail cars are not that big. I&#8217;ve climbed up on rail cars of grain. It kinda seems like one should be about 10 Cadillacs big. At that point I was pretty perplexed and just guessed a range, which turned out to be WAAAY too high. It turns out that an empty 747 weighs around 360,000 lbs, which is less than 2 rail cars of corn (not including the actual cars, just the weight of the corn!). My intuition, as trained by my two data points, didn&#8217;t do worth a tinker&#8217;s damn at guessing the weight of airplanes.</p>
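<p>For anyone who wants to check the back-of-the-envelope math, here it is in Python. The one number not stated in the post is 56 lb per bushel, the standard weight used for shelled corn; the other figures are the approximate ones quoted above.</p>

```python
LB_PER_BUSHEL_CORN = 56     # standard weight of a bushel of shelled corn
CADILLAC_LB = 5_040         # the 1969 Cadillac from the post
BUSHELS_PER_HOPPER = 3_500  # approximate load of one hopper-bottom rail car
EMPTY_747_LB = 360_000      # approximate empty 747 weight quoted in the post

corn_lb = BUSHELS_PER_HOPPER * LB_PER_BUSHEL_CORN  # weight of one car of corn
cadillacs_per_car = corn_lb / CADILLAC_LB           # Cadillacs per car of corn
hoppers_per_747 = EMPTY_747_LB / corn_lb            # cars of corn per empty 747

print(f"corn in one hopper car: {corn_lb:,} lb")             # 196,000 lb
print(f"Cadillacs per hopper load: {cadillacs_per_car:.1f}")  # 38.9
print(f"hopper loads per empty 747: {hoppers_per_747:.2f}")   # 1.84
```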
<p>But here&#8217;s the whole point of that last paragraph: if a human has no reference points and no experience with a domain, we (or at least I) can&#8217;t make good guesses and, more importantly, we can&#8217;t know how bad our guesses are! <strong>We CAN&#8217;T know how much we suck!</strong> If you think in terms of distributions, this exercise is akin to having a very small sample and trying to guess the distribution&#8217;s spread: its second central moment (the variance) or, equivalently, the square root of that (the standard deviation). Well shit, we know in practice that with small samples the estimate of the mean has a big error term, and the estimate of the standard deviation is itself very noisy, frequently even noisier in relative terms, especially when the distribution has fat tails.</p>
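<p>A quick simulation makes the small-sample point concrete. This Python sketch uses hypothetical parameters (normal data with a true mean of 100 and a true standard deviation of 10), draws many samples of size n, and measures how much the sample mean and the sample standard deviation bounce around, each relative to the true value it is estimating. With n = 5 the spread of the sample SD is roughly a third of the true SD. The mean-versus-SD comparison depends on the mean-to-SD ratio chosen here, and fat-tailed data would make the SD estimate worse still.</p>

```python
import random
import statistics


def relative_errors(n, reps=5_000, mu=100.0, sigma=10.0, seed=1):
    """Spread of the sample mean and sample SD over many samples of size n.

    Each spread is divided by the true value it estimates (mu or sigma),
    so the two numbers are relative errors. Data are assumed normal.
    """
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))
    rel_mean = statistics.stdev(means) / mu
    rel_sd = statistics.stdev(sds) / sigma
    return rel_mean, rel_sd


if __name__ == "__main__":
    for n in (5, 30):
        rel_mean, rel_sd = relative_errors(n)
        print(f"n={n}: mean off by ~{rel_mean:.1%}, SD off by ~{rel_sd:.1%}")
```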
<p>So simply put, <strong>providing confidence bands around a guess that is outside my area of experience is really hard, and I&#8217;m not good at it</strong>. The biggest problem is knowing when I&#8217;m out of my domain. In both <em>The Black Swan</em> and <em>Fooled by Randomness</em>, Nassim Nicholas Taleb points out that the single strongest predictor of how badly someone will do at <em>the confidence band game</em> is whether they hold a PhD. If anyone has a reference for the study he refers to, I&#8217;d love to see it. I&#8217;m resisting the temptation to throw stones at both actuaries and finance quants right here. And if I didn&#8217;t live in a glass house, I would!</p>
<p>My takeaway from all this is that confidence bands around a guess should <strong>not</strong> be expected to be statistically accurate. That&#8217;s the very nature of not knowing something at all. We don&#8217;t even know what we don&#8217;t know (thank you, Donald Rumsfeld). The very definition of an expert might be someone who, if they don&#8217;t know the exact answer, can at least put accurate confidence bands around their guess. In other words, <span style="color: #ff0000;"><strong>you have to have some level of knowledge to put accurate confidence bands around a guess</strong></span>. And being unable to do that is not necessarily overconfidence. It might just be ignorance.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/i-dont-even-know-how-wrong-i-am/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
