<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cerebral Mastication &#187; EMR</title>
	<atom:link href="http://www.cerebralmastication.com/tag/emr/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cerebralmastication.com</link>
	<description>Something to Chew On</description>
	<lastBuildDate>Wed, 07 Dec 2011 13:08:46 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Bootstrapping the latest R into Amazon Elastic Map Reduce</title>
		<link>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/</link>
		<comments>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 15:38:42 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[EMR]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=736</guid>
		<description><![CDATA[I&#8217;ve been continuing to muck around with using R inside of Amazon Elastic MapReduce jobs. I&#8217;ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the Stack Overflow [r] community. I have no [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/06/boot.jpg"><img class="alignleft size-full wp-image-737" style="margin: 6px; border: 2px solid black;" title="boot" src="http://www.cerebralmastication.com/wp-content/uploads/2010/06/boot.jpg" alt="" width="210" height="294" /></a>I&#8217;ve been continuing to muck around with using R inside of Amazon Elastic MapReduce jobs. I&#8217;ve been working on abstracting the lapply() logic so that R will farm the pieces out to Amazon EMR. This is coming along really well, thanks in no small part to the <a href="http://stackoverflow.com/questions/tagged/r" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/tagged/r?referer=');">Stack Overflow [r]</a> community. I have no idea how crappy coders like me got anything at all done before the Interwebs.</p>
<p>One of the immediate hurdles you face when trying to use AMZN EMR in anger is that the default version of R on EMR is 2.7.1. Yes, that is indeed the version that Moses taught the Israelites to use while they wandered in the desert. I&#8217;m impressed by your religious knowledge. At any rate, all kinds of things go to hell when you try to run code and load packages in 2.7.1. When I first started fighting with EMR the only solution was to backport my code and alter any packages so they would run in 2.7.1. Yes, that is, as Moses would say, a Nudnik. Nudnik also happens to be the pet name my neighbors have given me. They love me. Where was I? Oh yeah, Methuselah&#8217;s R version. Recently Amazon released a neat feature called &#8220;Bootstrapping&#8221; for EMR. Before you start thinking about sampling and resampling and all that crap, let me clarify. This is NOT statistical bootstrapping. It&#8217;s called bootstrapping because it&#8217;s code that runs after each node boots up, but before the mapper procedure runs. So to get a more modern version of R loaded onto each node I set up a little script that updates the sources.list file and then installs the latest version of R. And since I&#8217;m a caring, sharing guy, here&#8217;s my script:</p>
<script src="http://gist.github.com/455962.js"></script>
<p>And if that doesn&#8217;t show up for some reason, you can find all <a href="http://gist.github.com/455962" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/455962?referer=');">5 lines of its bash glory here over at github</a>.</p>
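<p>For anyone who can&#8217;t load the gist, a bootstrap script of this kind boils down to roughly the following. This is a minimal sketch, not the gist itself: it assumes Debian-based nodes, and the mirror URL (cran.example.org), the suite name (lenny-cran/) and the DRY_RUN switch are all placeholders made up for illustration.</p>

```shell
#!/bin/bash
# Sketch of a bootstrap action that points apt at a CRAN mirror and installs R.
# ASSUMPTIONS (not taken from the original gist): Debian-based nodes, the
# mirror URL, the suite name and the DRY_RUN switch are all illustrative.

CRAN_MIRROR="${CRAN_MIRROR:-http://cran.example.org}"  # hypothetical mirror; pick one near you
CRAN_SUITE="${CRAN_SUITE:-lenny-cran/}"                # assumed suite for the node's Debian release
DRY_RUN="${DRY_RUN:-1}"                                # 1 = only print the commands

# Build the line that gets appended to /etc/apt/sources.list.
cran_source_line() {
  echo "deb ${CRAN_MIRROR}/bin/linux/debian ${CRAN_SUITE}"
}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Add the CRAN repository, refresh the package index, install R.
install_latest_r() {
  run sudo sh -c "echo '$(cran_source_line)' | tee -a /etc/apt/sources.list"
  run sudo apt-get update
  run sudo apt-get install --yes r-base r-base-dev
}

install_latest_r
```

<p>With DRY_RUN left at 1 the script only prints what it would do; set DRY_RUN=0 in the copy you upload to S3 so the commands actually run on each node.</p>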
<p>If you&#8217;re not conveniently located in Chicago, IL, you may want to change your R mirror location. The bootstrap action can be set up from the EMR web GUI or, if you&#8217;re firing the jobs off using the elastic-mapreduce command line tools, you just add the following option: &#8220;--bootstrap-action s3://myBucket/bootstrap.sh&#8221; assuming myBucket is the bucket with your script in it and bootstrap.sh contains your bootstrap shell script. And then, as my buddies in Dublin say, &#8220;Bob&#8217;s your mother&#8217;s brother.&#8221;</p>
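<p>Put together, a full call might look something like the sketch below. Only the --bootstrap-action option comes from this post; the other flags are recalled from the legacy elastic-mapreduce Ruby client and should be checked against its help output, and the input, output and mapper paths are hypothetical.</p>

```shell
#!/bin/bash
# Sketch: assemble an elastic-mapreduce call that registers a bootstrap action.
# ASSUMPTIONS: only --bootstrap-action is from the post. The remaining flags are
# recalled from the legacy Ruby CLI, and the input/output/mapper paths are made up.

BUCKET="${BUCKET:-myBucket}"  # the bucket holding bootstrap.sh

# Print the command on one line instead of running it.
build_emr_command() {
  echo "elastic-mapreduce --create --stream" \
       "--bootstrap-action s3://${BUCKET}/bootstrap.sh" \
       "--input s3://${BUCKET}/input" \
       "--output s3://${BUCKET}/output" \
       "--mapper s3://${BUCKET}/mapper.R"
}

# Review the printed command first, then paste it into a shell to launch the job.
build_emr_command
```

<p>Printing the command rather than running it keeps the sketch harmless: nothing gets launched, or billed, until you have looked it over.</p>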
<p>And before you ask, yes, this slows crap down. I&#8217;ll probably hack together a script that will take the R binaries and other needed upgrades out of Amazon S3 and load them in a bootstrap action which will greatly speed things up. The above example has one clear advantage over loading binaries from S3: It works right now. And remember folks, code that works right now kicks code that &#8220;might work someday&#8221; right in the balls. And then mocks it while it cries.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/06/bootstrapping-the-latest-r-into-amazon-elastic-map-reduce/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

