<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cerebral Mastication &#187; Uncategorized</title>
	<atom:link href="http://www.cerebralmastication.com/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cerebralmastication.com</link>
	<description>Something to Chew On</description>
	<lastBuildDate>Fri, 16 Jul 2010 22:07:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Starting an EC2 Machine Then Setting Up a Socks Proxy&#8230; From R!</title>
		<link>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/</link>
		<comments>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 22:07:12 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=748</guid>
		<description><![CDATA[I do some work from home, some work from an office in Chicago and some work on the road. It&#8217;s not uncommon for me to want to tunnel all my web traffic through a VPN tunnel. In one of my previous blog posts I alluded to using Amazon EC2 as a way to get around [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/07/firewallkat.jpg"><img class="alignleft size-full wp-image-765" title="firewallkat" src="http://www.cerebralmastication.com/wp-content/uploads/2010/07/firewallkat.jpg" alt="" width="361" height="312" /></a>I do some work from home, some work from an office in Chicago and some work on the road. It&#8217;s not uncommon for me to want to tunnel all my web traffic through a VPN tunnel. In one of my previous blog posts I <a href="http://www.cerebralmastication.com/2009/11/using-amazon-ec2-to-thwart-crappy-internal-it-services/">alluded to using Amazon EC2 as a way to get around your corporate IT</a> <span style="text-decoration: line-through;">mind control voyeurs</span> service providers. This tunneling method is one of the 5 or so ways I have used EC2 to set up a tunnel. I used to fire these tunnels up manually using the <a href="https://console.aws.amazon.com" onclick="pageTracker._trackPageview('/outgoing/console.aws.amazon.com?referer=');">Amazon AWS Management Console</a> then opening a shell prompt and entering:</p>
<blockquote>
<pre>ssh -i ~/MyPersonalKey.pem -D 9999 root@ec2-184-73-41-72.compute-1.amazonaws.com</pre>
</blockquote>
<p>The -i switch tells ssh to use my RSA identity file stored in ~/MyPersonalKey.pem.</p>
<p>I get the machine name (ec2-184-73-41-72.compute-1.amazonaws.com) from the AWS Management Console.</p>
<p>The -D is the magic. -D opens a dynamic port forwarding tunnel between my Linux box and the EC2 machine. This is, for all intents and purposes, an encrypted SOCKS proxy on port 9999 of localhost (ssh speaks both SOCKS4 and SOCKS5). Then I just have to change my proxy settings in Firefox to use a SOCKS host of localhost, port 9999.</p>
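<p>For anyone who wants to script the tunnel itself, here is a minimal shell sketch that wraps the same ssh invocation. The function name <em>open_socks_tunnel</em> and the <em>SSH</em> override variable are hypothetical. They exist only so the command can be dry-run. The key path, port and root login come from the command above. The <em>-N</em> flag, which tells ssh to forward ports without running a remote command, is an addition to the original command.</p>

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): open an encrypted SOCKS proxy via ssh -D.
# Usage: open_socks_tunnel HOST [PORT] [KEYFILE]
open_socks_tunnel() {
    host="$1"                             # public DNS name from the AWS console
    port="${2:-9999}"                     # local port Firefox uses as its SOCKS host
    key="${3:-$HOME/MyPersonalKey.pem}"   # RSA identity file passed with -i
    # -N: forward only, do not run a remote command
    # -D: dynamic port forwarding, i.e. a SOCKS proxy on localhost:$port
    # SSH defaults to ssh; pointing it at another command gives a dry run.
    "${SSH:-ssh}" -i "$key" -N -D "$port" "root@$host"
}
```

<p>Once the tunnel is up, curl&#8217;s <em>--socks5-hostname localhost:9999</em> option is a quick way to confirm that traffic really flows through it. That option also lets the remote end resolve host names.</p>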
<p>Now that&#8217;s all pretty easy. And I like easy. But it&#8217;s not easy ENOUGH. You see, I&#8217;m lazy. I&#8217;m not just lazy in the &#8220;I&#8217;ll do it mañana&#8221; sort of way, but in the &#8220;I&#8217;m too damn lazy to click my mouse 5 times&#8221; way.</p>
<p>So I want this to be easier. Well, I can make switching the Firefox proxy settings easier with the <a href="https://addons.mozilla.org/en-US/firefox/addon/1557/" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/1557/?referer=');">Quick Proxy extension for Firefox</a>. That&#8217;s a good start. It turns the proxy on and off with a single mouse click. But I still have to go into the AWS management web site, fire up a machine, then log in via SSH. Let&#8217;s make that part easier!</p>
<p>The EC2 command line tools aren&#8217;t simple to install and configure, but they are required to write a script that fires up an EC2 instance and then connects to it with ssh. I struggled to get the tools running until I found <a href="http://linuxsysadminblog.com/2009/06/howto-get-started-with-amazon-ec2-api-tools/" onclick="pageTracker._trackPageview('/outgoing/linuxsysadminblog.com/2009/06/howto-get-started-with-amazon-ec2-api-tools/?referer=');">this tutorial</a>.</p>
<p>Your file locations and names may differ from the tutorial&#8217;s, so change them appropriately. I followed the tutorial instructions, but I created a key named ec2ApiTools, which will come in handy later.</p>
<p>Once you have the EC2 tools up and running and can do something like list the available AMIs without an error, you can stop with the tutorial. I&#8217;ve been doing a lot of shell scripting lately, so I said to myself, &#8220;Self, let&#8217;s script the ssh connection in R!&#8221; For the record, I always end my imperatives in an exclamation point, which I verbally pronounce as &#8220;BANG!&#8221; As a result, when I talk to myself it sounds like two 10-year-old boys playing cops and robbers. Anyhow, I did script it in R using Rscript. Because I&#8217;m a man who listens to myself.</p>
<p>And since you were kind enough to slog through my channeling the drunken ghost of James Joyce, here&#8217;s my script:</p>
<script src="http://gist.github.com/478930.js"></script>
<p>If you&#8217;re reading this in an RSS reader or for some other reason don&#8217;t see an R script above, <a href="http://gist.github.com/478930#file_start_ec2_instance_ssh.r" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/478930_file_start_ec2_instance_ssh.r?referer=');">here&#8217;s your link</a>.</p>
<p>The only two EC2 API commands I use in the script are <em>ec2-run-instances</em>, which starts the instance, and <em>ec2-describe-instances</em>, which lists my instances and their details. The rest of the script simply parses that output and figures out which instance was started last.</p>
<p>I&#8217;ve now set up a launcher panel item that starts the R script. Then, when I see the xterm window come up, I click the little red button in the lower right corner of my browser, which switches on the Firefox proxy. Then I&#8217;m safe to surf <a href="http://www.sofmag.com/" onclick="pageTracker._trackPageview('/outgoing/www.sofmag.com/?referer=');">Soldier of Fortune Magazine</a> without the interference of my corporate firewall.</p>
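<p>The post&#8217;s script does this parsing in R. A rough shell equivalent might look like the sketch below. It reads <em>ec2-describe-instances</em> output on standard input and prints the public DNS name of the most recently launched running instance. It does not depend on exact column positions. It does assume that each instance appears on a whitespace-separated line starting with INSTANCE, and that this line carries the state (running), a public DNS name ending in amazonaws.com, and an ISO-8601 launch time. All launch times must share one time zone so that they sort correctly as plain text. The function name <em>newest_running_host</em> is hypothetical.</p>

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the public DNS name of the most recently
# launched running instance, reading ec2-describe-instances output on stdin.
# Assumes INSTANCE lines carry the state, a public ec2-*.amazonaws.com name and
# an ISO-8601 launch time (all in the same time zone) somewhere on the line.
newest_running_host() {
    awk '
        $1 == "INSTANCE" {
            running = 0; dns = ""; launched = ""
            for (i = 2; i <= NF; i++) {
                if ($i == "running") running = 1
                else if ($i ~ /^ec2-.*\.amazonaws\.com$/) dns = $i
                else if ($i ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/) launched = $i
            }
            # Emit "launch-time<TAB>dns" for running instances only
            if (running && dns != "" && launched != "") print launched "\t" dns
        }' |
        sort | tail -n 1 | cut -f 2
}
```

<p>You could then hand the result straight to ssh, for example <em>ssh -i ~/MyPersonalKey.pem -D 9999 root@$(ec2-describe-instances | newest_running_host)</em>.</p>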
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bootstrapping the latest R into Amazon Elastic Map Reduce</title>
		<link>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/</link>
		<comments>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 15:38:42 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[EMR]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=736</guid>
		<description><![CDATA[I&#8217;ve been continuing to muck around with using R inside of Amazon Elastic Map reduce jobs. I&#8217;ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the Stack Overflow [r] community. I have no [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/06/boot.jpg"><img class="alignleft size-full wp-image-737" style="margin: 6px; border: 2px solid black;" title="boot" src="http://www.cerebralmastication.com/wp-content/uploads/2010/06/boot.jpg" alt="" width="210" height="294" /></a>I&#8217;ve been continuing to muck around with using R inside of Amazon Elastic Map reduce jobs. I&#8217;ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the <a href="http://stackoverflow.com/questions/tagged/r" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/tagged/r?referer=');">Stack Overflow [r] </a>community. I have no idea how crappy coders like me got anything at all done before the Interwebs.</p>
<p>One of the immediate hurdles faced when trying to use AMZN EMR in anger is that the default version of R on EMR is 2.7.1. Yes, that is indeed the version that Moses taught the Israelites to use while they wandered in the desert. I&#8217;m impressed by your religious knowledge. At any rate, all kinds of things go to hell when you try to run code and load packages in 2.7.1. When I first started fighting with EMR the only solution was to backport my code and alter any packages so they would run in 2.7.1. Yes, that is, as Moses would say, a Nudnik. Nudnik also happens to be the pet name my neighbors have given me. They love me. Where was I? Oh yeah, Methusla&#8217;s R version. Recently Amazon released a neat feature called &#8220;Bootstrapping&#8221; for EMR. Before you start thinking about sampling and resampling and all that  crap, let me clarify. This is NOT statistical bootstrapping. It&#8217;s called bootstrapping because it&#8217;s code that runs after each node boots up, but before the mapper procedure runs. So to get a more modern version of R loaded on to each node I set up a little script that updates the sources.list file and then installs the latest version of R. And since I&#8217;m a caring, sharing guy, here&#8217;s my script:</p>
<script src="http://gist.github.com/455962.js"></script>
<p>And if that doesn&#8217;t show up for some reason, you can find all <a href="http://gist.github.com/455962" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/455962?referer=');">5 lines of its bash glory over at GitHub</a>.</p>
<p>If you&#8217;re not conveniently located in Chicago, IL, you may want to change the R mirror location in the script. You can set up the bootstrap action from the EMR web GUI. If you&#8217;re firing the jobs off with the elastic-mapreduce command line tools instead, you just add the option <em>--bootstrap-action s3://myBucket/bootstrap.sh</em>. This assumes myBucket is the S3 bucket that holds your script and bootstrap.sh is the bootstrap shell script itself. And then, as my buddies in Dublin say, &#8220;Bob&#8217;s your mother&#8217;s brother.&#8221;</p>
<p>And before you ask, yes, this slows crap down. I&#8217;ll probably hack together a script that will take the R binaries and other needed upgrades out of Amazon S3 and load them in a bootstrap action, which should greatly speed things up. The above example has one clear advantage over loading binaries from S3: It works right now. And remember folks, code that works right now kicks code that &#8220;might work someday&#8221; right in the balls. And then mocks it while it cries.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Chicago R Meetup: Healthier than Drinking Alone</title>
		<link>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/</link>
		<comments>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/#comments</comments>
		<pubDate>Mon, 24 May 2010 20:20:34 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[meetup]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=728</guid>
		<description><![CDATA[I&#8217;m kinda blown away by the number of folks who have joined the Chicago R User Group (RUG) in the last few weeks. As of this morning we have 65 people signed up for the group and 25 who have said that they are planning on attending the meetup this Thursday (yes, only 3 days [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');"><img class="alignleft" title="meetup, y'all" src="http://cvillevegan.com/wp-content/uploads/2009/06/Meetup-Logo-1.jpg" alt="" width="251" height="185" /></a>I&#8217;m kinda blown away by the number of folks who have joined the <a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');">Chicago R User Group (RUG)</a> in the last few weeks. As of this morning we have 65 people signed up for the group and 25 who have said that they are planning on attending the meetup this Thursday (yes, only 3 days away!) I&#8217;m very pleased that this many people in Chicago find the R language interesting and/or valuable. Of course, there is the possibility that some of the 25 who are attending are simply hoping for some free beer. I was a member of a vegan society for 2 years because they had free beer. The week I accidentally showed up with a six pack of White Castle sliders really blew my cover. That&#8217;s how I discovered that you can scare off angry vegans by waving a steaming hot onion covered meat-like patty in their face. True story. And when I say &#8220;true story&#8221; I mean &#8220;total lie&#8221;.</p>
<p>By the way, I&#8217;m already recruiting presenters for next month&#8217;s RUG meetup. And I&#8217;m also looking for locations. So if you have an idea for either, let me know. I promise to not throw any mini burgers at you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Virtual Conference: R the Language</title>
		<link>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/</link>
		<comments>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/#comments</comments>
		<pubDate>Tue, 04 May 2010 02:27:03 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[vconf]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=722</guid>
		<description><![CDATA[On Tuesday May 4th at 9:30 PM central, 10:30 eastern, I&#8217;ll be giving a live online presentation as part of the Vconf.org open conference series. I&#8217;ll be speaking about R and why I started using R a couple years ago. This is NOT going to be a technical presentation but rather an illustration of how [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" style="margin: 11px;" title="vconf logo" src="http://vconf.org/media/images/boxxee.jpg" alt="" width="100" height="100" />On Tuesday May 4th at 9:30 PM central, 10:30 eastern, I&#8217;ll be giving a <a href="http://vconf.org/presentation/r-the-language/" onclick="pageTracker._trackPageview('/outgoing/vconf.org/presentation/r-the-language/?referer=');">live online presentation</a> as part of the Vconf.org open conference series. I&#8217;ll be speaking about R and why I started using R a couple years ago. This is NOT going to be a technical presentation but rather an illustration of how an R convert was created and why R became part of my daily tool set.</p>
<p>If your not familiar with the vconf.org project, you should <a href="http://vconf.org/be-a-speaker/" onclick="pageTracker._trackPageview('/outgoing/vconf.org/be-a-speaker/?referer=');">read a little about it</a>. It&#8217;s just getting started but I love the idea that it&#8217;s not for profit and all presentations are Creative Commons license. You know that cool new technology you&#8217;ve been playing with? Yeah that one. You really should give a vconf about it. I know I&#8217;d like to hear about it!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simulating Dart Throws in R</title>
		<link>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/</link>
		<comments>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 18:05:20 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[darts]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=435</guid>
		<description><![CDATA[Back in November 2009 Wired wrote an article about some grad students who decided to try to stochastically model throwing darts. Because I don&#8217;t actually read printed material I didn&#8217;t see the article until a couple of months ago. My immediate thought was, &#8220;hey, I drink beer. I throw darts. I build stochastic models. Why [...]]]></description>
			<content:encoded><![CDATA[<p>Back in November 2009 <a href="http://www.wired.com/magazine/2009/11/st_darts/" onclick="pageTracker._trackPageview('/outgoing/www.wired.com/magazine/2009/11/st_darts/?referer=');">Wired wrote an article </a>about some grad students who decided to try to stochastically model throwing darts. Because I don&#8217;t actually read printed material I didn&#8217;t see the article until a couple of months ago. My immediate thought was, &#8220;hey, I drink beer. I throw darts. I build stochastic models. Why haven&#8217;t I done this?&#8221; Well we all know why I haven&#8217;t done this. I have a job and a 2 year old daughter and I like my wife. Well a funny thing happened a few weeks ago. I sat down and was thinking about this problem and then 5 hours later I had a working dart simulator in my text editor. I don&#8217;t remember writing this. So <a href="http://en.wikipedia.org/wiki/Occam%27s_razor" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Occam_27s_razor?referer=');">Occam&#8217;s Razor </a>says that the most likely explanation is the simplest explanation. So clearly I was abducted by aliens and someone broke into my office and built a dart simulator.</p>
<p>I do reinsurance modeling to pay the bills and it immediacy hit me that this type of modeling is very similar to what I do for work. This similarity became the impetus for my presentation at <a href="http://www.rinfinance.com/agenda/" onclick="pageTracker._trackPageview('/outgoing/www.rinfinance.com/agenda/?referer=');">R in Finance 2010 </a>which starts today.</p>
<p>I dumped the dart board code into a github gist which can be found here:</p>
<script src="http://gist.github.com/278148.js"></script>
<p>If the embedded code is not showing up, you can get to it <a href="http://gist.github.com/278148" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/278148?referer=');">directly on Github</a>.</p>
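<p>The gist is the real simulator. To show what &#8220;stochastically modeling a dart throw&#8221; means at its very simplest, here is a separate and much smaller Monte Carlo sketch that is not taken from the gist. It assumes each throw lands at the aim point plus independent normal errors, with the same standard deviation in both directions. It then estimates how often a throw lands within a given radius of the aim point. The function name <em>simulate_hit_rate</em> and its parameters are hypothetical, and the exact digits depend on awk&#8217;s random number generator.</p>

```shell
#!/bin/sh
# Monte Carlo sketch (hypothetical helper, not the post's simulator):
# each dart lands at the aim point plus independent N(0, sigma^2) errors in x and y.
# Usage: simulate_hit_rate SIGMA RADIUS THROWS [SEED]
# Prints the share of throws landing within RADIUS of the aim point.
simulate_hit_rate() {
    awk -v sigma="$1" -v radius="$2" -v n="$3" -v seed="${4:-42}" 'BEGIN {
        srand(seed)
        pi = atan2(0, -1)
        hits = 0
        for (i = 0; i < n; i++) {
            # Box-Muller transform: two uniforms -> two independent N(0, 1) draws
            r = sqrt(-2 * log(1 - rand()))
            theta = 2 * pi * rand()
            x = sigma * r * cos(theta)
            y = sigma * r * sin(theta)
            if (x * x + y * y <= radius * radius) hits++
        }
        printf "%.4f\n", hits / n
    }'
}
```

<p>Under this simple model the hit rate has a closed form, 1 - exp(-radius^2 / (2 sigma^2)), so <em>simulate_hit_rate 1 1 200000</em> should print something close to 0.39. A real board adds scoring regions and aiming choices on top of this.</p>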
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>I don&#8217;t even know how wrong I am!</title>
		<link>http://www.cerebralmastication.com/2010/04/i-dont-even-know-how-wrong-i-am/</link>
		<comments>http://www.cerebralmastication.com/2010/04/i-dont-even-know-how-wrong-i-am/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 15:21:29 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[descisions]]></category>
		<category><![CDATA[risk]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=664</guid>
		<description><![CDATA[I&#8217;ve been a long time reader of the blog &#8220;Messy Matters&#8221; (which  invokes terrible images now that I am potty training a toddler). The  authors, Sharad Goel and Daniel Reeves are  academics who work in the Microeconomics and Social Systems (get it,  MESS?!?) lab funded by Yahoo!. (What does Strunk and [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_705" class="wp-caption alignleft" style="width: 293px"><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/04/rummy.jpg"><img class="size-full wp-image-705 " title="rummy" src="http://www.cerebralmastication.com/wp-content/uploads/2010/04/rummy.jpg" alt="" width="283" height="204" /></a><p class="wp-caption-text">&quot;as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don&#39;t know we don&#39;t know.&quot; US Defense Secretary Donald Rumsfeld, February 12, 2002</p></div>
<p>I&#8217;ve been a long-time reader of the blog &#8220;Messy Matters&#8221; (which invokes terrible images now that I am potty training a toddler). The authors, <a href="http://www.cam.cornell.edu/%7Esharad" onclick="pageTracker._trackPageview('/outgoing/www.cam.cornell.edu/_7Esharad?referer=');">Sharad Goel</a> and <a href="http://ai.eecs.umich.edu/people/dreeves" onclick="pageTracker._trackPageview('/outgoing/ai.eecs.umich.edu/people/dreeves?referer=');">Daniel Reeves</a>, are academics who work in the Microeconomics and Social Systems (get it, MESS?!?) lab funded by Yahoo!. (What does Strunk and White say about punctuation after a proper noun which includes punctuation as part of the proper noun?) Anyhow, the Messy Matters blog had a very interesting post recently about testing to see if you are overconfident. The gist is this: take a test, but instead of answering each question exactly, give a lower and an upper bound that you think represents a 90% confidence band around the right answer. If you haven&#8217;t seen this done, you should <a href="http://messymatters.com/2010/02/28/calibration/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/messymatters.com/2010/02/28/calibration/?referer=');">go take a look</a> and then read the rest of this blog post.</p>
<p>I didn&#8217;t do worth a shit on their &#8220;overconfidence&#8221; test. I think I got 5 of the ranges right. The other 5 times the real answer fell outside my bounds. As I was answering the questions I had this strong feeling of not being confident at all. I was very tempted to answer HUGE ranges on some of the questions because I felt totally unable to make a good guess. But I took a swag and tried to put in big ranges, but not TOO big, if I didn&#8217;t know the answer. I&#8217;m not the only one who struggled with this test. In their <a href="http://messymatters.com/2010/03/31/calibration-results/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/messymatters.com/2010/03/31/calibration-results/?referer=');">summary of results</a>, I fall in the 76th percentile. Hey, I&#8217;m above average&#8230; or at least above the median. Clearly I didn&#8217;t know how wrong I was in many cases. But does this mean I am &#8220;overconfident&#8221;? I don&#8217;t think so. I think this means something a bit more subtle. This exercise reminded me of creating a forecasting model and trying to predict values far outside the training data.</p>
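<p>Scoring a test like this comes down to counting how often the true answer lands inside your band. Here is a minimal shell sketch of that count. It assumes one question per line in the form &#8220;low high truth&#8221;. Both the function name <em>score_calibration</em> and that input format are hypothetical. A well-calibrated set of 90% bands should capture roughly 9 answers out of 10. My score was 5 out of 10.</p>

```shell
#!/bin/sh
# Sketch (hypothetical helper): score 90% confidence bands.
# Input, one question per line: LOW HIGH TRUTH (whitespace-separated numbers).
# Output: HITS/TOTAL, where a hit means LOW <= TRUTH <= HIGH.
score_calibration() {
    awk 'NF >= 3 {
        total++
        if ($3 + 0 >= $1 + 0 && $3 + 0 <= $2 + 0) hits++
    }
    END { printf "%d/%d\n", hits, total }'
}
```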
<p>Having read the book <a href="http://www.amazon.com/gp/product/0805078533?ie=UTF8&amp;tag=riskthou-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0805078533" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0805078533?ie=UTF8_amp_tag=riskthou-20_amp_linkCode=as2_amp_camp=1789_amp_creative=390957_amp_creativeASIN=0805078533&amp;referer=');"><em>On Intelligence</em></a>, I am convinced that one of the main functions of the human brain (or at least the neocortex) is to be a pattern matching machine. We all build little mental models in our heads all the time. And these models are trained, by definition, on the situations we run into day in and day out. And these models are VERY accurate around the mean (i.e. around the experiences we are used to having). For example, how small a piece of sand can you feel between your teeth? Our brains have a &#8216;model&#8217; of what it normally feels like when our teeth close against each other. The slightest unexpected disruption in that pattern triggers our brain to notice. Ever miss a step when walking down stairs? When did you know you were in trouble? Probably when your foot was about 2 inches past where you expected the next step to be. You didn&#8217;t have to wait for your face to hit the railing before your mental model of step walking was ringing warning bells. We humans are freaking amazing mental model makers!</p>
<p>Well, we&#8217;re amazing&#8230; except when we suck. When we suck is when we are faced with trying to predict something that is orders of magnitude outside our experience. The question on the MESS test I struggled with the most was the one about how much an empty 747 weighs. I don&#8217;t ever deal with massive weights. Ever. I only had two reference points I could think up: 1) my first car was a &#8217;69 Cadillac, which I know weighed 5,040 lbs (we used to call it &#8220;Two and a half tons of fun&#8221;), and 2) a hopper bottom rail car carries ~3,500 bushels of corn, which is ~196,000 lbs. And I&#8217;ve never been up next to a 747. But they are HUGE. I&#8217;ve seen pictures of the space shuttle riding around on the back of one of those bad boys. But they have to be pretty light relative to their volume because they have a lot of cargo room. And then I did the math on how many 1969 Cadillacs = 1 rail car of corn&#8230; almost 39!?!? But rail cars are not that big. I&#8217;ve climbed up on rail cars of grain. Kinda seems like it should be about 10 Cadillacs big. At that point I was pretty perplexed and just guessed a range, which turned out to be WAAAY too high. It turns out that a 747 weighs around 360,000 lbs, which is less than 2 rail cars of corn (not including the actual cars, just the weight of the corn!). My intuition, as trained by my two data points, didn&#8217;t do worth a tinker&#8217;s damn at guessing the weight of airplanes.</p>
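<p>Spelled out, the back-of-the-envelope arithmetic in the 747 paragraph looks like this. It uses the standard 56 lbs per bushel of shelled corn, which is what turns ~3,500 bushels into ~196,000 lbs:</p>

```latex
\begin{aligned}
3{,}500\ \text{bu} \times 56\ \text{lb/bu} &= 196{,}000\ \text{lb of corn per rail car} \\
\frac{196{,}000\ \text{lb}}{5{,}040\ \text{lb per Cadillac}} &\approx 38.9\ \text{Cadillacs per rail car} \\
\frac{360{,}000\ \text{lb}}{196{,}000\ \text{lb per rail car}} &\approx 1.84\ \text{rail cars of corn per empty 747}
\end{aligned}
```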
<p>But here&#8217;s the whole point of the 747 story: If a human has no reference points and no experience with a domain, we (or at least I) can&#8217;t make good guesses and, more importantly, we can&#8217;t know how bad our guesses are! <strong>We CAN&#8217;T know how much we suck!</strong> If you think in terms of distributions, this exercise is akin to having a very small sample and trying to guess the distribution&#8217;s spread. That spread is its second central moment, the variance, or its square root, the standard deviation. Well shit, we know in practice that with small samples the mean has a big error term, and the standard deviation estimate can be even shakier, especially when the data have fat tails.</p>
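<p>To get a feel for how shaky a spread estimate from a handful of observations is, here is a minimal Monte Carlo sketch. It assumes normally distributed data with a true standard deviation of 1. It draws many samples of size N, computes each sample&#8217;s standard deviation, and prints the 5th and 95th percentiles of those estimates. The function name <em>sd_spread</em> is hypothetical. For N = 5, the textbook chi-square result puts that range at roughly 0.42 to 1.54. In other words, about one run in ten lands below 0.42 or above 1.54 times the true value.</p>

```shell
#!/bin/sh
# Monte Carlo sketch (hypothetical helper): how noisy is a sample standard
# deviation computed from N normal draws with true sigma = 1?
# Usage: sd_spread N REPS [SEED]   -> prints "P5 P95" of the REPS estimates
sd_spread() {
    awk -v n="$1" -v reps="$2" -v seed="${3:-7}" 'BEGIN {
        srand(seed)
        pi = atan2(0, -1)
        for (r = 0; r < reps; r++) {
            sum = 0; sumsq = 0
            for (i = 0; i < n; i++) {
                # Box-Muller draw from N(0, 1)
                z = sqrt(-2 * log(1 - rand())) * cos(2 * pi * rand())
                sum += z; sumsq += z * z
            }
            v = (sumsq - sum * sum / n) / (n - 1)   # sample variance, n - 1 denominator
            if (v < 0) v = 0                        # guard against rounding
            printf "%.6f\n", sqrt(v)
        }
    }' |
        sort -n |
        awk -v reps="$2" '
            NR == int(0.05 * reps) { lo = $1 }
            NR == int(0.95 * reps) { hi = $1 }
            END { printf "%.2f %.2f\n", lo, hi }'
}
```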
<p>So simply put, <strong>providing confidence bands around a guess which is out of my area of experience is really hard and I&#8217;m not good at it</strong>. The biggest problem is knowing when I&#8217;m out of my domain. In both <em>The Black Swan</em> and <em>Fooled by Randomness</em>, Nassim Nicholas Taleb points out that the single strongest predictor of how badly someone is going to do at <em>the confidence band game</em> is whether they hold a PhD. If anyone has a reference on the study he refers to, I&#8217;d love to see it. I&#8217;m resisting the temptation to throw stones at both actuaries and finance quants right here. And if I didn&#8217;t live in a glass house, I would!</p>
<p>My takeaway from all this is that confidence bands around a guess should <strong>not</strong> be expected to be statistically accurate. That&#8217;s the very nature of not knowing something at all. We don&#8217;t even know what we don&#8217;t know (thank you, Donald Rumsfeld). The very definition of an expert might be someone who, if they don&#8217;t know the exact answer, can at least put confidence bands around their guess. In other words, <span style="color: #ff0000;"><strong>you have to have some level of knowledge to put accurate confidence bands around a guess</strong></span>. And failing to be able to do that is not necessarily overconfidence. It might just be ignorance.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/i-dont-even-know-how-wrong-i-am/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Chicago R User Group&#8230; It&#8217;s for the sexy people!</title>
		<link>http://www.cerebralmastication.com/2010/04/chicago-r-user-group-its-for-the-sexy-people/</link>
		<comments>http://www.cerebralmastication.com/2010/04/chicago-r-user-group-its-for-the-sexy-people/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 16:42:07 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[RUG]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=672</guid>
		<description><![CDATA[I think we all know what Morris Day was talking about when he wrote the lyrics to &#8220;The Bird&#8221;:
Yes! Hold on now, this dance ain&#8217;t for everybody.
Just the sexy people.
White folks, you&#8217;re much too tight.
You gotta shake your head like the black folks.
You might get some tonight.
Look out!
That&#8217;s right, he was talking about the new [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_673" class="wp-caption alignleft" style="width: 179px"><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/04/Morris-Day-Article.jpg"><img class="size-full wp-image-673 " style="border: 2px solid black; margin: 3px;" title="Morris-Day-Article" src="http://www.cerebralmastication.com/wp-content/uploads/2010/04/Morris-Day-Article.jpg" alt="Give it up for Morris Day and The Mother Fucking Time!!!!" width="169" height="228" /></a><p class="wp-caption-text">Morris Day, y&#39;all! </p></div>
<p>I think we all know what Morris Day was talking about when he wrote the lyrics to &#8220;The Bird&#8221;:</p>
<blockquote><p>Yes! Hold on now, this dance ain&#8217;t for everybody.<br />
Just the sexy people.<br />
White folks, you&#8217;re much too tight.<br />
You gotta shake your head like the black folks.<br />
You might get some tonight.<br />
Look out!</p></blockquote>
<p>That&#8217;s right, he was talking about the new <a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');">R User Group in Chicago</a>! a.k.a. Chicago RUG! We know that R is sexy because <a href="http://www.cerebralmastication.com/2009/02/hal-varian-google%E2%80%99s-chief-economist-thinks-i-am-sexy/">statistical analysis is sexy</a>. That is, if you&#8217;re doing it right! Even Mike Driscoll at Dataspora knows that <a href="http://dataspora.com/blog/sexy-data-geeks/" onclick="pageTracker._trackPageview('/outgoing/dataspora.com/blog/sexy-data-geeks/?referer=');">Data Geeks have to get their sexy on</a>. There is no doubt that Chicago is sexy. The Second City is so damned sexy that Karen Abbott wrote <a href="http://www.amazon.com/gp/product/0812975995?ie=UTF8&amp;tag=riskthou-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0812975995" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0812975995?ie=UTF8_amp_tag=riskthou-20_amp_linkCode=as2_amp_camp=1789_amp_creative=390957_amp_creativeASIN=0812975995&amp;referer=');"><em>Sin in the Second City</em></a> and managed to get it on the NYT best sellers list. She makes me reconsider my agrarian interpretation of Chicago&#8217;s &#8220;meat packing&#8221; heritage. <em><strong>*rim shot*</strong></em> Thank you, thank you. I&#8217;ll be here all week. Try the veal!</p>
<p>If you&#8217;re in Chicagoland and reading this blog, then you have every reason to get over to the <a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');">Chicago R User Group web site</a> and sign up! I&#8217;m looking forward to meeting all the Chicago R users in the near future. In case you&#8217;re afraid you won&#8217;t recognize me, I&#8217;ll be the one who looks just like Morris Day&#8230; only white&#8230; and not as well dressed&#8230; and kinda nerdy. But otherwise, just like Morris.</p>
<p>Now shut up and dance!</p>
<p><a href="http://listen.grooveshark.com/#/s/The+Bird/2G8yQK" onclick="pageTracker._trackPageview('/outgoing/listen.grooveshark.com/_/s/The+Bird/2G8yQK?referer=');">Morris Day and the Time on Grooveshark! </a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/chicago-r-user-group-its-for-the-sexy-people/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Future of Math is Statistics</title>
		<link>http://www.cerebralmastication.com/2010/04/the-future-of-math-is-statistics/</link>
		<comments>http://www.cerebralmastication.com/2010/04/the-future-of-math-is-statistics/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 16:01:40 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[TED]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=668</guid>
		<description><![CDATA[The future of math is statistics&#8230; and the language of that future is R:

I&#8217;ve often thought there was way too little &#8220;statistical intuition&#8221; in the workplace. I think Arthur Benjamin would agree. 
]]></description>
			<content:encoded><![CDATA[<p>The future of math is statistics&#8230; and the language of that future is R:</p>
<p><object width="446" height="326"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"></param><param name="allowFullScreen" value="true" /><param name="wmode" value="transparent"></param><param name="bgColor" value="#ffffff"></param><param name="flashvars" value="vu=http://video.ted.com/talks/dynamic/ArthurBenjamin_2009-medium.flv&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/ArthurBenjamin-2009.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=587&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=arthur_benjamin_s_formula_for_changing_math_education;year=2009;theme=how_we_learn;theme=numbers_at_play;theme=bold_predictions_stern_warnings;theme=ted_in_3_minutes;event=TED2009;&#038;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgColor="#ffffff" width="446" height="326" allowFullScreen="true" flashvars="vu=http://video.ted.com/talks/dynamic/ArthurBenjamin_2009-medium.flv&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/ArthurBenjamin-2009.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=587&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=arthur_benjamin_s_formula_for_changing_math_education;year=2009;theme=how_we_learn;theme=numbers_at_play;theme=bold_predictions_stern_warnings;theme=ted_in_3_minutes;event=TED2009;"></embed></object></p>
<p>I&#8217;ve often thought there was way too little &#8220;statistical intuition&#8221; in the workplace. I think Arthur Benjamin would agree.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/the-future-of-math-is-statistics/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Lookup Performance in R</title>
		<link>http://www.cerebralmastication.com/2010/04/lookup-performance-in-r/</link>
		<comments>http://www.cerebralmastication.com/2010/04/lookup-performance-in-r/#comments</comments>
		<pubDate>Fri, 02 Apr 2010 13:05:12 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=660</guid>
		<description><![CDATA[Rumor has it that Joe Adler, author of the O&#8217;Reilly book R in a Nutshell, has joined LinkedIn as a data scientist. But that does not keep him from still pumping out some interesting content over at OReilly.com. His latest article is about lookup performance in R. He does a great job giving code [...]]]></description>
			<content:encoded><![CDATA[<p>Rumor has it that Joe Adler, author of the O&#8217;Reilly book <a href="http://www.amazon.com/gp/product/059680170X?ie=UTF8&amp;tag=riskthou-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=059680170X" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/059680170X?ie=UTF8_amp_tag=riskthou-20_amp_linkCode=as2_amp_camp=1789_amp_creative=390957_amp_creativeASIN=059680170X&amp;referer=');">R in a Nutshell</a>, has joined LinkedIn as a data scientist. But that does not keep him from still pumping out some interesting content over at OReilly.com. His <a href="http://broadcast.oreilly.com/2010/03/lookup-performance-in-r.html" onclick="pageTracker._trackPageview('/outgoing/broadcast.oreilly.com/2010/03/lookup-performance-in-r.html?referer=');">latest article is about lookup performance in R</a>. He does a great job giving code samples and explaining what he is doing. Worth reading, for sure.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/lookup-performance-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Real-World, Real-Time Analytics</title>
		<link>http://www.cerebralmastication.com/2010/02/real-world-real-time-analytics/</link>
		<comments>http://www.cerebralmastication.com/2010/02/real-world-real-time-analytics/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 16:54:32 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rockstars]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=645</guid>
		<description><![CDATA[Stop wasting time reading my drivel. You need to head over to the DataWrangling.com blog and read Peter Skomoroch&#8217;s interview with Bradford Cross of FlightCaster.
Peter wrote up this interview back in August 2009, so I&#8217;m a little late to this party. There are some really great quotes in this interview. Here are a few of my fav [...]]]></description>
			<content:encoded><![CDATA[<p>Stop wasting time reading my drivel. You need to head over to the DataWrangling.com blog and <a href="http://www.datawrangling.com/how-flightcaster-squeezes-predictions-from-flight-data" onclick="pageTracker._trackPageview('/outgoing/www.datawrangling.com/how-flightcaster-squeezes-predictions-from-flight-data?referer=');">read Peter Skomoroch&#8217;s interview with Bradford Cross</a> of <a href="http://www.flightcaster.com/" onclick="pageTracker._trackPageview('/outgoing/www.flightcaster.com/?referer=');">FlightCaster</a>.</p>
<p>Peter wrote up this interview back in August 2009, so I&#8217;m a little late to this party. There are some really great quotes in this interview. Here are a few of my fav quotes from Cross:</p>
<blockquote><p>At Google, the research scientists prototype in python and R, and then port to C++ for the real scalable map reduce runs.</p></blockquote>
<blockquote><p>Building layer upon layer of abstraction is a big key&#8230; The technical term for this is “wrap the crap.”</p></blockquote>
<p>Here&#8217;s a problem I think anyone who works with data and models can relate to:</p>
<blockquote><p>I made a lot of mistakes early in my career in building trading models where I let my theories get too far ahead of what I could really test in practice. That is not a good place to be. Unfortunately, this is an easy mistake to make.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/02/real-world-real-time-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
