<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cerebral Mastication &#187; R</title>
	<atom:link href="http://www.cerebralmastication.com/tag/r/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cerebralmastication.com</link>
	<description>Something to Chew On</description>
	<lastBuildDate>Fri, 16 Jul 2010 22:07:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Starting an EC2 Machine Then Setting Up a Socks Proxy&#8230; From R!</title>
		<link>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/</link>
		<comments>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 22:07:12 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=748</guid>
		<description><![CDATA[I do some work from home, some work from an office in Chicago and some work on the road. It&#8217;s not uncommon for me to want to tunnel all my web traffic through a VPN tunnel. In one of my previous blog posts I alluded to using Amazon EC2 as a way to get around [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/07/firewallkat.jpg"><img class="alignleft size-full wp-image-765" title="firewallkat" src="http://www.cerebralmastication.com/wp-content/uploads/2010/07/firewallkat.jpg" alt="" width="361" height="312" /></a>I do some work from home, some work from an office in Chicago and some work on the road. It&#8217;s not uncommon for me to want to tunnel all my web traffic through a VPN tunnel. In one of my previous blog posts I <a href="http://www.cerebralmastication.com/2009/11/using-amazon-ec2-to-thwart-crappy-internal-it-services/">alluded to using Amazon EC2 as a way to get around your corporate IT</a> <span style="text-decoration: line-through;">mind control voyeurs</span> service providers. This tunneling method is one of the 5 or so ways I have used EC2 to set up a tunnel. I used to fire these tunnels up manually using the <a href="https://console.aws.amazon.com" onclick="pageTracker._trackPageview('/outgoing/console.aws.amazon.com?referer=');">Amazon AWS Management Console</a> then opening a shell prompt and entering:</p>
<blockquote>
<pre>ssh -i ~/MyPersonalKey.pem -D 9999 root@ec2-184-73-41-72.compute-1.amazonaws.com</pre>
</blockquote>
<p>the -i switch tells ssh to use my RSA identity file stored in ~/MyPersonalKey.pem</p>
<p>the machine name (ec2-184-73-41-72.compute-1.amazonaws.com) I get from the AWS Management Console</p>
<p>the -D is the magic. -D opens a dynamic port forwarding tunnel between my Linux box and the EC2 machine. This is, for all intents and purposes, an encrypted SOCKS4 proxy on port 9999 of localhost. Then I just have to change my proxy settings in Firefox to use a SOCKS host at localhost, port 9999.</p>
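<p>If you want to check the tunnel before touching Firefox, here is a minimal sketch. It assumes the ssh command above is still running in another terminal and that curl is installed. The URL is just an example of a page that echoes back the IP address it sees; any similar service would do.</p>
<blockquote>
<pre># send one request through the SOCKS4 proxy that ssh -D opened on localhost:9999
curl --socks4 localhost:9999 http://checkip.amazonaws.com/</pre>
</blockquote>
<p>If the tunnel is working, the address that comes back should belong to the EC2 machine rather than to the network you are sitting on.</p>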
<p>Now that&#8217;s all pretty easy. And I like easy. But it&#8217;s not easy ENOUGH. You see, I&#8217;m lazy. I&#8217;m not just lazy in the &#8220;I&#8217;ll do it mañana&#8221; sort of way, but in the &#8220;I&#8217;m too damn lazy to click my mouse 5 times&#8221; way.</p>
<p>So I want this easier. Well, I can make the proxy settings in Firefox easier through the use of the <a href="https://addons.mozilla.org/en-US/firefox/addon/1557/" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/1557/?referer=');">Quick Proxy extension for Firefox</a>. That&#8217;s a good start. It turns on and off the proxy with a single mouse click. But I still have to go into the AWS management web site, fire up a machine then log in via SSH. Let&#8217;s make that part easier!</p>
<p>While it&#8217;s not simple to install and configure, the EC2 command line tools are going to be required in order to make a script that fires up EC2 and then connects to the instance with ssh. I struggled getting the tools to run until I found <a href="http://linuxsysadminblog.com/2009/06/howto-get-started-with-amazon-ec2-api-tools/" onclick="pageTracker._trackPageview('/outgoing/linuxsysadminblog.com/2009/06/howto-get-started-with-amazon-ec2-api-tools/?referer=');">this tutorial</a>.</p>
<p>Your file locations and names may be different than the tutorial. Change appropriately. I followed the tutorial instructions but I created a key named ec2ApiTools which will come in handy later.</p>
<p>After you get the EC2 tools up and running and you can do something like list the available AMIs without an error you can stop with the tutorial. I&#8217;ve been doing a lot of shell scripting lately so I said to myself, &#8220;Self, let&#8217;s script the ssh connection in R!&#8221; For the record, I always end my imperatives in an exclamation point which I verbally pronounce as, &#8220;BANG!&#8221; As a result, when I talk to myself it sounds like two 10-year-old boys playing cops and robbers. Anyhow, I did script it with R using Rscript. Because I&#8217;m a man who listens to myself.</p>
<p>And since you were kind enough to slog through my channeling the drunken ghost of James Joyce, here&#8217;s my script:</p>
<script src="http://gist.github.com/478930.js"></script>
<p>If you&#8217;re reading this in an RSS reader or for some other reason don&#8217;t see an R script above, <a href="http://gist.github.com/478930#file_start_ec2_instance_ssh.r" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/478930_file_start_ec2_instance_ssh.r?referer=');">here&#8217;s your link</a>.</p>
<p>The only two EC2 API commands I use in the script are <em>ec2-run-instances</em>, which starts the instance, and <em>ec2-describe-instances</em>, which gives me a list of running instances and their details. The rest of the script is simply parsing the output and figuring out which instance was started last.</p>
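<p>Stripped of the parsing, the flow looks roughly like the shell commands below. Treat this as a sketch rather than the script itself: the AMI ID is a placeholder, the key file path assumes the ec2ApiTools key was saved in your home directory, and you have to copy the public DNS name out of the ec2-describe-instances output by hand.</p>
<blockquote>
<pre># start one instance using the ec2ApiTools key pair (ami-XXXXXXXX is a placeholder)
ec2-run-instances ami-XXXXXXXX -k ec2ApiTools
# list instances and their details; wait until the new one shows as running
ec2-describe-instances
# open the SOCKS tunnel to the new machine (PUBLIC-DNS-NAME is a placeholder)
ssh -i ~/ec2ApiTools.pem -D 9999 root@PUBLIC-DNS-NAME</pre>
</blockquote>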
<p>I&#8217;ve now set up a launcher panel item that starts the script. Then when I see the xterm window come up I click the little red button in the lower right corner of my browser which switches on the Firefox proxy. Then I&#8217;m safe to surf <a href="http://www.sofmag.com/" onclick="pageTracker._trackPageview('/outgoing/www.sofmag.com/?referer=');">Soldier of Fortune Magazine</a> without the interference of my corp firewall.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bootstrapping the latest R into Amazon Elastic Map Reduce</title>
		<link>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/</link>
		<comments>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 15:38:42 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[EMR]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=736</guid>
		<description><![CDATA[I&#8217;ve been continuing to muck around with using R inside of Amazon Elastic Map reduce jobs. I&#8217;ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the Stack Overflow [r] community. I have no [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/06/boot.jpg"><img class="alignleft size-full wp-image-737" style="margin: 6px; border: 2px solid black;" title="boot" src="http://www.cerebralmastication.com/wp-content/uploads/2010/06/boot.jpg" alt="" width="210" height="294" /></a>I&#8217;ve been continuing to muck around with using R inside of Amazon Elastic Map reduce jobs. I&#8217;ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the <a href="http://stackoverflow.com/questions/tagged/r" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/tagged/r?referer=');">Stack Overflow [r] </a>community. I have no idea how crappy coders like me got anything at all done before the Interwebs.</p>
<p>One of the immediate hurdles faced when trying to use AMZN EMR in anger is that the default version of R on EMR is 2.7.1. Yes, that is indeed the version that Moses taught the Israelites to use while they wandered in the desert. I&#8217;m impressed by your religious knowledge. At any rate, all kinds of things go to hell when you try to run code and load packages in 2.7.1. When I first started fighting with EMR the only solution was to backport my code and alter any packages so they would run in 2.7.1. Yes, that is, as Moses would say, a Nudnik. Nudnik also happens to be the pet name my neighbors have given me. They love me. Where was I? Oh yeah, Methuselah&#8217;s R version. Recently Amazon released a neat feature called &#8220;Bootstrapping&#8221; for EMR. Before you start thinking about sampling and resampling and all that crap, let me clarify. This is NOT statistical bootstrapping. It&#8217;s called bootstrapping because it&#8217;s code that runs after each node boots up, but before the mapper procedure runs. So to get a more modern version of R loaded onto each node I set up a little script that updates the sources.list file and then installs the latest version of R. And since I&#8217;m a caring, sharing guy, here&#8217;s my script:</p>
<script src="http://gist.github.com/455962.js"></script>
<p>And if that doesn&#8217;t show up for some reason, you can find all<a href="http://gist.github.com/455962" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/455962?referer=');"> 5 lines of its bash glory here over at github</a>.</p>
<p>If you&#8217;re not conveniently located in Chicago, IL you may want to change your R mirror location. The bootstrap action can be set up from the EMR web GUI or if you&#8217;re firing the jobs off using the elastic-mapreduce command line tools you just add the following option: &#8220;--bootstrap-action s3://myBucket/bootstrap.sh&#8221; assuming myBucket is the bucket with your script in it and bootstrap.sh contains your bootstrap shell script. And then, as my buddies in Dublin say, &#8220;Bob&#8217;s your mother&#8217;s brother.&#8221;</p>
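<p>For the command line route, a job launch might look something like this sketch. The job flow name is made up, myBucket and bootstrap.sh are the same stand-ins as above, and the other flags are the ones I would expect from the elastic-mapreduce tool, so check its help output before trusting them.</p>
<blockquote>
<pre># create a job flow that stays alive and runs the bootstrap script on every node
elastic-mapreduce --create --alive --name r-bootstrap-test --bootstrap-action s3://myBucket/bootstrap.sh</pre>
</blockquote>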
<p>And before you ask, yes, this slows crap down. I&#8217;ll probably hack together a script that will take the R binaries and other needed upgrades out of Amazon S3 and load them in a bootstrap action which will greatly speed things up. The above example has one clear advantage over loading binaries from S3: It works right now. And remember folks, code that works right now kicks code that &#8220;might work someday&#8221; right in the balls. And then mocks it while it cries.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Chicago R Meetup: Healthier than Drinking Alone</title>
		<link>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/</link>
		<comments>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/#comments</comments>
		<pubDate>Mon, 24 May 2010 20:20:34 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[meetup]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=728</guid>
		<description><![CDATA[I&#8217;m kinda blown away by the number of folks who have joined the Chicago R User Group (RUG) in the last few weeks. As of this morning we have 65 people signed up for the group and 25 who have said that they are planning on attending the meetup this Thursday (yes, only 3 days [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');"><img class="alignleft" title="meetup, y'all" src="http://cvillevegan.com/wp-content/uploads/2009/06/Meetup-Logo-1.jpg" alt="" width="251" height="185" /></a>I&#8217;m kinda blown away by the number of folks who have joined the <a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');">Chicago R User Group (RUG)</a> in the last few weeks. As of this morning we have 65 people signed up for the group and 25 who have said that they are planning on attending the meetup this Thursday (yes, only 3 days away!) I&#8217;m very pleased that this many people in Chicago find the R language interesting and/or valuable. Of course, there is the possibility that some of the 25 who are attending are simply hoping for some free beer. I was a member of a vegan society for 2 years because they had free beer. The week I accidentally showed up with a six pack of White Castle sliders really blew my cover. That&#8217;s how I discovered that you can scare off angry vegans by waving a steaming hot onion covered meat-like patty in their face. True story. And when I say &#8220;true story&#8221; I mean &#8220;total lie&#8221;.</p>
<p>By the way, I&#8217;m already recruiting presenters for next month&#8217;s RUG meetup. And I&#8217;m also looking for locations. So if you have an idea for either, let me know. I promise to not throw any mini burgers at you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/05/chicago-r-meetup-healthier-than-drinking-alone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Virtual Conference: R the Language</title>
		<link>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/</link>
		<comments>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/#comments</comments>
		<pubDate>Tue, 04 May 2010 02:27:03 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[vconf]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=722</guid>
		<description><![CDATA[On Tuesday May 4th at 9:30 PM central, 10:30 eastern, I&#8217;ll be giving a live online presentation as part of the Vconf.org open conference series. I&#8217;ll be speaking about R and why I started using R a couple years ago. This is NOT going to be a technical presentation but rather an illustration of how [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" style="margin: 11px;" title="vconf logo" src="http://vconf.org/media/images/boxxee.jpg" alt="" width="100" height="100" />On Tuesday May 4th at 9:30 PM central, 10:30 eastern, I&#8217;ll be giving a <a href="http://vconf.org/presentation/r-the-language/" onclick="pageTracker._trackPageview('/outgoing/vconf.org/presentation/r-the-language/?referer=');">live online presentation</a> as part of the Vconf.org open conference series. I&#8217;ll be speaking about R and why I started using R a couple years ago. This is NOT going to be a technical presentation but rather an illustration of how an R convert was created and why R became part of my daily tool set.</p>
<p>If you&#8217;re not familiar with the vconf.org project, you should <a href="http://vconf.org/be-a-speaker/" onclick="pageTracker._trackPageview('/outgoing/vconf.org/be-a-speaker/?referer=');">read a little about it</a>. It&#8217;s just getting started but I love the idea that it&#8217;s not for profit and all presentations are Creative Commons licensed. You know that cool new technology you&#8217;ve been playing with? Yeah that one. You really should give a vconf about it. I know I&#8217;d like to hear about it!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/05/virtual-conference-r-the-language/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simulating Dart Throws in R</title>
		<link>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/</link>
		<comments>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 18:05:20 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[darts]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=435</guid>
		<description><![CDATA[Back in November 2009 Wired wrote an article about some grad students who decided to try to stochastically model throwing darts. Because I don&#8217;t actually read printed material I didn&#8217;t see the article until a couple of months ago. My immediate thought was, &#8220;hey, I drink beer. I throw darts. I build stochastic models. Why [...]]]></description>
			<content:encoded><![CDATA[<p>Back in November 2009 <a href="http://www.wired.com/magazine/2009/11/st_darts/" onclick="pageTracker._trackPageview('/outgoing/www.wired.com/magazine/2009/11/st_darts/?referer=');">Wired wrote an article </a>about some grad students who decided to try to stochastically model throwing darts. Because I don&#8217;t actually read printed material I didn&#8217;t see the article until a couple of months ago. My immediate thought was, &#8220;hey, I drink beer. I throw darts. I build stochastic models. Why haven&#8217;t I done this?&#8221; Well we all know why I haven&#8217;t done this. I have a job and a 2 year old daughter and I like my wife. Well a funny thing happened a few weeks ago. I sat down and was thinking about this problem and then 5 hours later I had a working dart simulator in my text editor. I don&#8217;t remember writing this. So <a href="http://en.wikipedia.org/wiki/Occam%27s_razor" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Occam_27s_razor?referer=');">Occam&#8217;s Razor </a>says that the most likely explanation is the simplest explanation. So clearly I was abducted by aliens and someone broke into my office and built a dart simulator.</p>
<p>I do reinsurance modeling to pay the bills and it immediately hit me that this type of modeling is very similar to what I do for work. This similarity became the impetus for my presentation at <a href="http://www.rinfinance.com/agenda/" onclick="pageTracker._trackPageview('/outgoing/www.rinfinance.com/agenda/?referer=');">R in Finance 2010 </a>which starts today.</p>
<p>I dumped the dart board code into a github gist which can be found here:</p>
<script src="http://gist.github.com/278148.js"></script>
<p>If the embedded code is not showing up, you can get to it <a href="http://gist.github.com/278148" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/278148?referer=');">directly on Github</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/simulating-dart-throws-in-r-part-1-of-many/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Chicago R User Group&#8230; It&#8217;s for the sexy people!</title>
		<link>http://www.cerebralmastication.com/2010/04/chicago-r-user-group-its-for-the-sexy-people/</link>
		<comments>http://www.cerebralmastication.com/2010/04/chicago-r-user-group-its-for-the-sexy-people/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 16:42:07 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[RUG]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=672</guid>
		<description><![CDATA[I think we all know what Morris Day was talking about when he wrote the lyrics to &#8220;The Bird&#8221;:
Yes! Hold on now, this dance ain&#8217;t for everybody.
Just the sexy people.
White folks, you&#8217;re much too tight.
You gotta shake your head like the black folks.
You might get some tonight.
Look out!
That&#8217;s right, he was talking about the new [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_673" class="wp-caption alignleft" style="width: 179px"><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/04/Morris-Day-Article.jpg"><img class="size-full wp-image-673 " style="border: 2px solid black; margin: 3px;" title="Morris-Day-Article" src="http://www.cerebralmastication.com/wp-content/uploads/2010/04/Morris-Day-Article.jpg" alt="Give it up for Morris Day and The Mother Fucking Time!!!!" width="169" height="228" /></a><p class="wp-caption-text">Morris Day, y&#39;all! </p></div>
<p>I think we all know what Morris Day was talking about when he wrote the lyrics to &#8220;The Bird&#8221;:</p>
<blockquote><p>Yes! Hold on now, this dance ain&#8217;t for everybody.<br />
Just the sexy people.<br />
White folks, you&#8217;re much too tight.<br />
You gotta shake your head like the black folks.<br />
You might get some tonight.<br />
Look out!</p></blockquote>
<p>That&#8217;s right, he was talking about the new <a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');">R User Group in Chicago</a>! a.k.a. Chicago RUG! We know that R is sexy because <a href="http://www.cerebralmastication.com/2009/02/hal-varian-google%E2%80%99s-chief-economist-thinks-i-am-sexy/">statistical analysis is sexy</a>. That is, if you&#8217;re doing it right! Even Mike Driscoll at Dataspora knows that <a href="http://dataspora.com/blog/sexy-data-geeks/" onclick="pageTracker._trackPageview('/outgoing/dataspora.com/blog/sexy-data-geeks/?referer=');">Data Geeks have to get their sexy on</a>. There is no doubt that Chicago is sexy. The second city is so damned sexy that Karen Abbott wrote <a href="http://www.amazon.com/gp/product/0812975995?ie=UTF8&amp;tag=riskthou-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0812975995" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0812975995?ie=UTF8_amp_tag=riskthou-20_amp_linkCode=as2_amp_camp=1789_amp_creative=390957_amp_creativeASIN=0812975995&amp;referer=');"><em>Sin in the Second City </em></a>and managed to get it on the NYT best sellers list. She makes me reconsider my agrarian interpretation of Chicago&#8217;s &#8220;meat packing&#8221; heritage. <em><strong>*rim shot* </strong></em>Thank you, thank you.<em><strong> </strong></em>I&#8217;ll be here all week. Try the veal!<em><strong><br />
</strong></em></p>
<p>If you&#8217;re in Chicagoland and reading this blog then you have every reason to get over to the <a href="http://www.meetup.com/ChicagoRUG/" onclick="pageTracker._trackPageview('/outgoing/www.meetup.com/ChicagoRUG/?referer=');">Chicago R User Group web site </a>and sign up! I&#8217;m looking forward to meeting all the Chicago R users in the near future. In case you&#8217;re afraid you won&#8217;t recognize me I&#8217;ll be the one that looks just like Morris Day&#8230; only white&#8230; and not as well dressed&#8230; and kinda nerdy. But otherwise, just like Morris.</p>
<p>Now shut up and dance!</p>
<p><a href="http://listen.grooveshark.com/#/s/The+Bird/2G8yQK" onclick="pageTracker._trackPageview('/outgoing/listen.grooveshark.com/_/s/The+Bird/2G8yQK?referer=');">Morris Day and the Time on Grooveshark! </a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/chicago-r-user-group-its-for-the-sexy-people/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Future of Math is Statistics</title>
		<link>http://www.cerebralmastication.com/2010/04/the-future-of-math-is-statistics/</link>
		<comments>http://www.cerebralmastication.com/2010/04/the-future-of-math-is-statistics/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 16:01:40 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[TED]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=668</guid>
		<description><![CDATA[The future of math is statistics&#8230; and the language of that future is R:

I&#8217;ve often thought there was way too little &#8220;statistical intuition&#8221; in the workplace. I think Arthur Benjamin would agree. 
]]></description>
			<content:encoded><![CDATA[<p>The future of math is statistics&#8230; and the language of that future is R:</p>
<p><object width="446" height="326"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"></param><param name="allowFullScreen" value="true" /><param name="wmode" value="transparent"></param><param name="bgColor" value="#ffffff"></param><param name="flashvars" value="vu=http://video.ted.com/talks/dynamic/ArthurBenjamin_2009-medium.flv&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/ArthurBenjamin-2009.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=587&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=arthur_benjamin_s_formula_for_changing_math_education;year=2009;theme=how_we_learn;theme=numbers_at_play;theme=bold_predictions_stern_warnings;theme=ted_in_3_minutes;event=TED2009;&#038;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgColor="#ffffff" width="446" height="326" allowFullScreen="true" flashvars="vu=http://video.ted.com/talks/dynamic/ArthurBenjamin_2009-medium.flv&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/ArthurBenjamin-2009.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=587&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=arthur_benjamin_s_formula_for_changing_math_education;year=2009;theme=how_we_learn;theme=numbers_at_play;theme=bold_predictions_stern_warnings;theme=ted_in_3_minutes;event=TED2009;"></embed></object></p>
<p>I&#8217;ve often thought there was way too little &#8220;statistical intuition&#8221; in the workplace. I think Arthur Benjamin would agree. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/the-future-of-math-is-statistics/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Lookup Performance in R</title>
		<link>http://www.cerebralmastication.com/2010/04/lookup-performance-in-r/</link>
		<comments>http://www.cerebralmastication.com/2010/04/lookup-performance-in-r/#comments</comments>
		<pubDate>Fri, 02 Apr 2010 13:05:12 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=660</guid>
		<description><![CDATA[Rumor has it that Joe Adler, author of the O&#8217;Reilly book R in a Nutshell, has joined LinkedIn as a data scientist. But that does not keep him from still pumping out some interesting content over at OReilly.com. His latest article is about lookup performance in R. He does a great job giving code [...]]]></description>
			<content:encoded><![CDATA[<p>Rumor has it that Joe Adler, author of the O&#8217;Reilly book <a href="http://www.amazon.com/gp/product/059680170X?ie=UTF8&amp;tag=riskthou-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=059680170X" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/059680170X?ie=UTF8_amp_tag=riskthou-20_amp_linkCode=as2_amp_camp=1789_amp_creative=390957_amp_creativeASIN=059680170X&amp;referer=');">R in a Nutshell</a>, has joined LinkedIn as a data scientist. But that does not keep him from still pumping out some interesting content over at OReilly.com. His <a href="http://broadcast.oreilly.com/2010/03/lookup-performance-in-r.html" onclick="pageTracker._trackPageview('/outgoing/broadcast.oreilly.com/2010/03/lookup-performance-in-r.html?referer=');">latest article is about lookup performance in R</a>. He does a great job giving code samples and explaining what he is doing. Worth reading, for sure.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/04/lookup-performance-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Real-World, Real-Time Analytics</title>
		<link>http://www.cerebralmastication.com/2010/02/real-world-real-time-analytics/</link>
		<comments>http://www.cerebralmastication.com/2010/02/real-world-real-time-analytics/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 16:54:32 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rockstars]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=645</guid>
		<description><![CDATA[Stop wasting time reading my drivel. You need to head over to the DataWrangling.com blog and read Peter Skomoroch&#8217;s interview with Bradford Cross of FlightCaster.
Peter wrote up this interview back in August 2009, so I&#8217;m a little late to this party. There are some really great quotes in this interview. Here are a few of my fav [...]]]></description>
			<content:encoded><![CDATA[<p>Stop wasting time reading my drivel. You need to head over to the DataWrangling.com blog and <a href="http://www.datawrangling.com/how-flightcaster-squeezes-predictions-from-flight-data" onclick="pageTracker._trackPageview('/outgoing/www.datawrangling.com/how-flightcaster-squeezes-predictions-from-flight-data?referer=');">read Peter Skomoroch&#8217;s interview with Bradford Cross </a>of <a href="http://www.flightcaster.com/" onclick="pageTracker._trackPageview('/outgoing/www.flightcaster.com/?referer=');">FlightCaster</a>.</p>
<p>Peter wrote up this interview back in August 2009, so I&#8217;m a little late to this party. There are some really great quotes in this interview. Here are a few of my fav quotes from Cross:</p>
<blockquote><p>At Google, the research scientists prototype in python and R, and then port to C++ for the real scalable map reduce runs.</p></blockquote>
<blockquote><p>Building layer upon layer of abstraction is a big key&#8230;    The technical term for this is “wrap the crap.”</p></blockquote>
<p>Here&#8217;s a problem I think anyone who works with data and models can relate to:</p>
<blockquote><p>I made a lot of mistakes early in my career in building trading models where I let my theories get too far ahead of what I could really test in practice. That is not a good place to be. Unfortunately, this is an easy mistake to make.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/02/real-world-real-time-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>You can Hadoop it! It&#8217;s elastic! Boogie woogie woog-ie!</title>
		<link>http://www.cerebralmastication.com/2010/02/you-can-hadoop-it-its-elastic-boogie-woogie-woog-ie/</link>
		<comments>http://www.cerebralmastication.com/2010/02/you-can-hadoop-it-its-elastic-boogie-woogie-woog-ie/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 18:31:23 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=592</guid>
		<description><![CDATA[I just came back from the future and let me be the first to tell you this: Learn some Chinese. And more than just cào nǐ niáng  (肏你娘) which your friend in grad school told you means &#8220;Live happy with many blessings&#8221;. Trust me, I&#8217;ve been hanging with Madam Wu and she told me [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_594" class="wp-caption alignleft" style="width: 271px"><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/02/bad_egg.png"><img class="size-full wp-image-594 " style="border: 1px solid black; margin: 3px;" title="I paid an old man in Chinatown $200 for this!" src="http://www.cerebralmastication.com/wp-content/uploads/2010/02/bad_egg.png" alt="" width="261" height="144" /></a><p class="wp-caption-text">This blog&#39;s name in Chinese! </p></div>
<p>I just came back from the future and let me be the first to tell you this: Learn some Chinese. And more than just cào nǐ niáng  (肏你娘) which your friend in grad school told you means &#8220;Live happy with many blessings&#8221;. Trust me, I&#8217;ve been hanging with Madam Wu and she told me it doesn&#8217;t mean that.</p>
<p>So how did I travel to the future to visit with Madam Wu, you ask? Well the short answer is Hadoop. Yeah, the cute little elephant. <a href="http://www.cerebralmastication.com/2010/02/using-the-r-multicore-package-in-linux-with-wild-and-passionate-abandon/">As I have told you before</a>, multicore makes your R code run fast by using worm holes to shoot your results back from the future. Well Hadoop actually takes you to the future on the back of an elephant and you can bring your own results back! I couldn&#8217;t make this up if I tried, so you know it&#8217;s true! And what&#8217;s fantastic about all of this is Hadoop works with R! And Amazon will let you rent a time traveling elephant through their <a href="http://aws.amazon.com/elasticmapreduce/" onclick="pageTracker._trackPageview('/outgoing/aws.amazon.com/elasticmapreduce/?referer=');">Elastic MapReduce service</a>! I think Amazon coined the term &#8220;Time Travel as a Service&#8221; or TTaaS  generally pronounced as &#8220;ta-tas&#8221; in <a href="http://www.savethetatas.com/" onclick="pageTracker._trackPageview('/outgoing/www.savethetatas.com/?referer=');">the industry</a>. If you are a CTO be sure and use this in your next &#8220;vision statement&#8221; pitch so everyone will know you&#8217;re hip to all this cloud stuff.</p>
<p>So you use R and you want to travel into the future on the back of an elephant to visit Madam Wu and get your model results back, don&#8217;t you? Well it&#8217;s a damn good thing you read this blog because I&#8217;m going to give you the keys to the Wu dynasty and a little 福寿 while we&#8217;re at it.</p>
<p>I&#8217;ve never had an original thought in my life so I started with <a href="http://developer.amazonwebservices.com/connect/thread.jspa?messageID=128995&amp;#128995" onclick="pageTracker._trackPageview('/outgoing/developer.amazonwebservices.com/connect/thread.jspa?messageID=128995_amp_128995&amp;referer=');">this discussion </a>over at the AMZN E M/R discussion forum. Peter Skomoroch from <a href="http://www.datawrangling.com/" onclick="pageTracker._trackPageview('/outgoing/www.datawrangling.com/?referer=');">Data Wrangling </a>gives a very good example with all the data and code provided so you can run it yourself.  Pete&#8217;s example really shakes the  yáng guǐzi, as we say in the future. In addition I read the documentation for David Rosenberg&#8217;s <a href="http://docs.google.com/viewer?url=http%3A%2F%2Fcran.r-project.org%2Fweb%2Fpackages%2FHadoopStreaming%2FHadoopStreaming.pdf" onclick="pageTracker._trackPageview('/outgoing/docs.google.com/viewer?url=http_3A_2F_2Fcran.r-project.org_2Fweb_2Fpackages_2FHadoopStreaming_2FHadoopStreaming.pdf&amp;referer=');">HadoopStreaming package</a> which was good for insight, but I didn&#8217;t use the package as it&#8217;s really focused on the &#8216;big data&#8217; problem.</p>
<div id="attachment_639" class="wp-caption alignleft" style="width: 218px"><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/02/hadoop-elephant.jpeg"><img class="size-full wp-image-639 " style="border: 1px solid black; margin: 3px;" title="hadoop elephant" src="http://www.cerebralmastication.com/wp-content/uploads/2010/02/hadoop-elephant.jpeg" alt="" width="208" height="156" /></a><p class="wp-caption-text">That elephant is so freaking cute! </p></div>
<p>Prior to my foray into time travel, I knew that Hadoop could be used to process big text files and do something like rip out all the links and count them. But I thought that Hadoop was all about processing big data. I never paid attention to the big Hadoop elephant in the room because I don&#8217;t have big data. I have big CPU hogging models (mostly slow because I don&#8217;t code worth a shit). What got me reconsidering my world view was <a onclick="pageTracker._trackPageview('/outgoing/www.johnmyleswhite.com/?referer=');pageTracker._trackPageview('/outgoing/www.johnmyleswhite.com?referer=http%3A%2F%2Fwww.cerebralmastication.com%2F');" rel="external nofollow" href="http://www.johnmyleswhite.com/">John Myles White</a>&#8217;s comment on my <a href="http://www.cerebralmastication.com/2010/02/using-the-r-multicore-package-in-linux-with-wild-and-passionate-abandon/">multicore post</a> earlier. John encouraged me to look into running my simulations on AMZN&#8217;s E M/R service using Hadoop streaming. So instead of giving Hadoop a big fat text file to parse, I just gave it a text file with 10,000 rows, each containing one integer from 1 to 10,000. Then I refactored my R code to read a line from stdin, trim it down to just the integer, and then go run the simulation with that number. When done I had it serialize the resulting model output and write that to stdout. Hadoop takes care of chopping up the input and pulling together the output.</p>
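<p>Here is a minimal sketch of a mapper along those lines. It is an illustration, not the original script: <code>run_simulation()</code>, <code>to_line()</code>, <code>from_line()</code> and <code>run_mapper()</code> are made-up names, the simulation is a placeholder, and hex-encoding the serialized bytes is just one way to keep each result on a single line of text. It is written for a current version of R and is untested on R 2.7.1.</p>

```r
# Hypothetical stand-in for the real simulation; takes one integer.
run_simulation <- function(i) {
  set.seed(i)
  list(id = i, estimate = mean(rnorm(100)))
}

# Serialize any R object to one line of hex text (no tabs or newlines).
to_line <- function(obj) {
  paste(as.character(serialize(obj, connection = NULL)), collapse = "")
}

# Reverse of to_line(): hex text back to the original R object.
from_line <- function(txt) {
  n <- nchar(txt)
  pairs <- substring(txt, seq(1, n, by = 2), seq(2, n, by = 2))
  unserialize(as.raw(strtoi(pairs, base = 16L)))
}

# Trim one raw input line down to the integer it carries.
parse_input_line <- function(line) {
  as.integer(gsub("[^0-9]", "", line))
}

# Map one input line to one "key<TAB>value" output line.
map_line <- function(line) {
  i <- parse_input_line(line)
  paste(i, to_line(run_simulation(i)), sep = "\t")
}

# Read every line from a connection and write mapper output to stdout.
run_mapper <- function(con) {
  for (line in readLines(con)) {
    if (nchar(line) > 0) cat(map_line(line), "\n", sep = "")
  }
}

# In a real mapper script the last line would be:
# run_mapper(file("stdin"))
```

<p>Once the job is done, each output line can be split on the tab and the second field handed to <code>from_line()</code> to get the R object back.</p>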
<p>I learned a few &#8220;gotchas&#8221; or, as we say in the future: 臭婊子 (I think that should be plural). I&#8217;ll do a whole blog post on gotchas soon, but here are the bullet points:</p>
<ul>
<li>AMZN is currently running the version of Debian Linux named Lenny which has version 2.7.1 of R installed. No matter what the documentation says, don&#8217;t let Lenny tend to the rabbits.</li>
<li>Test all code by firing up an interactive Pig instance and logging in as &#8216;hadoop&#8217;. Instead of running Pig, run R and test your code. And as it says in the FAQ: &#8220;The Pig don&#8217;t care either way.&#8221; Which, despite sounding like buggery, is the truth.</li>
<li>If your code runs inside of R on a Hadoop instance, drop back to the command line on the Hadoop instance and run &#8216;cat infile.txt | yourMapper.R | sort | yourReducer.R &gt; outfile.txt&#8217;. This pipes your input file into your mapper file which does its thing and then pipes the sorted results to your reducer file which then &#8220;pumps up the jam&#8221; into an output file.  What you see in the outfile.txt is what Hadoop will produce. So if you don&#8217;t like what you see, you better do some more coding.</li>
<li>You CAN load packages into R in a Hadoop instance running in AMZN E M/R. There are a few caveats, of course:</li>
</ul>
<ol>
<li>Your package has to work in R 2.7.1 (until AMZN upgrades to the next stable version of Debian).</li>
<li>As far as I can tell, all the output has to come out of stdout. So if you want to end up with R objects which you use for other things, you should get comfortable with the serialize() command and reading text files back into R. Which, as you can see <a href="http://stackoverflow.com/questions/2258511/r-serialize-objects-to-text-file-and-back-again" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/2258511/r-serialize-objects-to-text-file-and-back-again?referer=');">from this question</a>, I am not yet comfortable with.</li>
<li>There will be multiple instances of R running on every machine. So if they are all trying to download a package to the same directory, you are going to get file lock errors. One solution is to have each R instance create its own package directory with that instance&#8217;s PID in the name. That way there&#8217;s no possibility for a conflict! Here&#8217;s an example of how I load the Hmisc package:</li>
<p><script src="http://gist.github.com/304262.js?file=AMZNloadPackage.R"></script></ol>
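<p>For anyone reading this in a feed reader where the gist does not show up, the PID trick can be sketched roughly as below. This is not a copy of the gist: <code>make_private_lib()</code> and <code>load_into_private_lib()</code> are hypothetical helper names, the base directory is assumed to be R&#8217;s temp directory, and the CRAN mirror URL is an assumption.</p>

```r
# Create a package library private to this R process and put it first on
# the library search path. The directory name includes the process ID, so
# two R instances on the same machine never share a directory.
make_private_lib <- function(base = tempdir()) {
  libdir <- file.path(base, paste("Rlib", Sys.getpid(), sep = "_"))
  dir.create(libdir, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(libdir, .libPaths()))
  libdir
}

# Install a package into the private library, then attach it from there.
# The mirror URL is an assumption; use whichever mirror is reachable.
load_into_private_lib <- function(pkg, libdir,
                                  repos = "http://cran.r-project.org") {
  install.packages(pkg, lib = libdir, repos = repos)
  library(pkg, lib.loc = libdir, character.only = TRUE)
}

libdir <- make_private_lib()

# Needs network access, so it is left commented out here:
# load_into_private_lib("Hmisc", libdir)
```

<p>Every R instance pays the download and install cost once, which is the price of never fighting over a lock file.</p>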
<ul>
<li>You&#8217;ll probably want to provide some data to R. This is done by uploading your files to S3 and then passing the &#8220;-cacheFile&#8221; option to Hadoop. To get the plyr package to load in R 2.7.1 I had to edit the package. I then uploaded the altered package thusly:</li>
</ul>
<blockquote><p>-cacheFile s3n://rdata/plyr_0.1.9.tar.gz#plyr_0.1.9.tar.gz</p></blockquote>
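<p>On the R side, installing that cached tarball could look something like the sketch below. It assumes Hadoop exposes the cached file in the task&#8217;s working directory under the name given after the <code>#</code>; <code>install_cached_package()</code> and the library location are hypothetical, chosen only for illustration.</p>

```r
# Install a source package from a local tarball into a given library.
# Fails early with a clear message if the cached file is not where the
# mapper expects it.
install_cached_package <- function(tarball, libdir) {
  if (!file.exists(tarball)) {
    stop("cached file not found: ", tarball)
  }
  dir.create(libdir, recursive = TRUE, showWarnings = FALSE)
  # repos = NULL tells install.packages() to treat the argument as a file
  install.packages(tarball, lib = libdir, repos = NULL, type = "source")
  invisible(libdir)
}

# Per-process library directory (hypothetical location).
libdir <- file.path(tempdir(), paste("Rlib", Sys.getpid(), sep = "_"))

# In the mapper, before the code that needs plyr:
# install_cached_package("plyr_0.1.9.tar.gz", libdir)
# library(plyr, lib.loc = libdir)
```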
<p>More to come later. I&#8217;ve gotta get back to the future.</p>
<div id="attachment_631" class="wp-caption alignleft" style="width: 314px"><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/02/christopher_lloyd.jpg"><img class="size-full wp-image-631" style="border: 1px solid black; margin: 3px;" title="christopher_lloyd" src="http://www.cerebralmastication.com/wp-content/uploads/2010/02/christopher_lloyd.jpg" alt="" width="304" height="224" /></a><p class="wp-caption-text">You hold the elephant and I&#39;ll plug this in. </p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/02/you-can-hadoop-it-its-elastic-boogie-woogie-woog-ie/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
