<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cerebral Mastication &#187; ec2</title>
	<atom:link href="http://www.cerebralmastication.com/tag/ec2/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cerebralmastication.com</link>
	<description>Something to Chew On</description>
	<lastBuildDate>Fri, 16 Jul 2010 22:07:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Starting an EC2 Machine Then Setting Up a Socks Proxy&#8230; From R!</title>
		<link>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/</link>
		<comments>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 22:07:12 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=748</guid>
		<description><![CDATA[I do some work from home, some work from an office in Chicago and some work on the road. It&#8217;s not uncommon for me to want to tunnel all my web traffic through a VPN tunnel. In one of my previous blog posts I alluded to using Amazon EC2 as a way to get around [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/07/firewallkat.jpg"><img class="alignleft size-full wp-image-765" title="firewallkat" src="http://www.cerebralmastication.com/wp-content/uploads/2010/07/firewallkat.jpg" alt="" width="361" height="312" /></a>I do some work from home, some work from an office in Chicago and some work on the road. It&#8217;s not uncommon for me to want to tunnel all my web traffic through a VPN tunnel. In one of my previous blog posts I <a href="http://www.cerebralmastication.com/2009/11/using-amazon-ec2-to-thwart-crappy-internal-it-services/">alluded to using Amazon EC2 as a way to get around your corporate IT</a> <span style="text-decoration: line-through;">mind control voyeurs</span> service providers. This tunneling method is one of the 5 or so ways I have used EC2 to set up a tunnel. I used to fire these tunnels up manually using the <a href="https://console.aws.amazon.com" onclick="pageTracker._trackPageview('/outgoing/console.aws.amazon.com?referer=');">Amazon AWS Management Console</a> then opening a shell prompt and entering:</p>
<blockquote>
<pre>ssh -i ~/MyPersonalKey.pem -D 9999 root@ec2-184-73-41-72.compute-1.amazonaws.com</pre>
</blockquote>
<p>the -i switch tells ssh to use my RSA identity file stored in ~/MyPersonalKey.pem</p>
<p>the machine name (ec2-184-73-41-72.compute-1.amazonaws.com) I get from the AWS Management Console</p>
<p>the -D is the magic. -D opens a dynamic port forwarding tunnel between my Linux box and the EC2 machine. This is, for all intents and purposes, an encrypted SOCKS proxy (ssh supports both SOCKS4 and SOCKS5) on port 9999 of localhost. Then I just have to change my proxy settings in Firefox to use a SOCKS host.</p>
<p>Now that&#8217;s all pretty easy. And I like easy. But it&#8217;s not easy ENOUGH. You see, I&#8217;m lazy. I&#8217;m not just lazy in the &#8220;I&#8217;ll do it mañana&#8221; sort of way, but in the &#8220;I&#8217;m too damn lazy to click my mouse 5 times&#8221; way.</p>
<p>So I want this easier. Well, I can make the proxy settings in Firefox easier through the use of the <a href="https://addons.mozilla.org/en-US/firefox/addon/1557/" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/1557/?referer=');">Quick Proxy extension for Firefox</a>. That&#8217;s a good start. It turns on and off the proxy with a single mouse click. But I still have to go into the AWS management web site, fire up a machine then log in via SSH. Let&#8217;s make that part easier!</p>
<p>While it&#8217;s not simple to install and configure, the EC2 command line tools are going to be required in order to make a script that fires up EC2 and then connects to the instance with ssh. I struggled getting the tools to run until I found <a href="http://linuxsysadminblog.com/2009/06/howto-get-started-with-amazon-ec2-api-tools/" onclick="pageTracker._trackPageview('/outgoing/linuxsysadminblog.com/2009/06/howto-get-started-with-amazon-ec2-api-tools/?referer=');">this tutorial</a>.</p>
<p>Your file locations and names may be different than the tutorial. Change appropriately. I followed the tutorial instructions but I created a key named ec2ApiTools which will come in handy later.</p>
<p>After you get the EC2 tools up and running and can do something like list the available AMIs without an error, you can stop following the tutorial. I&#8217;ve been doing a lot of shell scripting lately so I said to myself, &#8220;Self, let&#8217;s script the ssh connection in R!&#8221; For the record, I always end my imperatives with an exclamation point, which I verbally pronounce as &#8220;BANG!&#8221; As a result, when I talk to myself it sounds like two 10 year old boys playing cops and robbers. Anyhow, I did script it with R using Rscript. Because I&#8217;m a man who listens to myself.</p>
<p>And since you were kind enough to slog through my channeling the drunken ghost of James Joyce, here&#8217;s my script:</p>
<script src="http://gist.github.com/478930.js"></script>
<p>If you&#8217;re reading this in an RSS reader or for some other reason don&#8217;t see an R script above, <a href="http://gist.github.com/478930#file_start_ec2_instance_ssh.r" onclick="pageTracker._trackPageview('/outgoing/gist.github.com/478930_file_start_ec2_instance_ssh.r?referer=');">here&#8217;s your link</a>.</p>
<p>The only two EC2 API commands I use in the script are <em>ec2-run-instances</em>, which starts the instance, and <em>ec2-describe-instances</em>, which gives me a list of running instances and their details. The rest of the script is simply parsing the output and figuring out which instance was started last.</p>
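<p>For the curious, the parsing step can be sketched roughly like this. This is not the gist itself, just a minimal illustration. The sample lines, the function name latestInstance and the column positions (public DNS in field 4, state in field 6, launch time in field 11) are my assumptions about the tab-delimited output of ec2-describe-instances, so check them against what your version of the tools actually prints.</p>
<blockquote>
<pre># Hypothetical sample of tab-delimited ec2-describe-instances output.
# In a real script these lines would come from:
#   system("ec2-describe-instances", intern = TRUE)
sampleOutput &lt;- c(
  "RESERVATION\tr-11111111\t123456789012\tdefault",
  "INSTANCE\ti-aaaaaaaa\tami-00000000\tec2-1-1-1-1.compute-1.amazonaws.com\tip-10-0-0-1.ec2.internal\trunning\tec2ApiTools\t0\t\tm1.small\t2010-07-16T20:01:00+0000",
  "RESERVATION\tr-22222222\t123456789012\tdefault",
  "INSTANCE\ti-bbbbbbbb\tami-00000000\tec2-2-2-2-2.compute-1.amazonaws.com\tip-10-0-0-2.ec2.internal\trunning\tec2ApiTools\t0\t\tm1.small\t2010-07-16T21:30:00+0000"
)

latestInstance &lt;- function(lines) {
  # Keep only the INSTANCE rows and split each one on tabs
  instanceLines &lt;- grep("^INSTANCE", lines, value = TRUE)
  fields &lt;- strsplit(instanceLines, "\t", fixed = TRUE)
  state &lt;- sapply(fields, function(f) f[6])
  # Only running machines can accept an ssh connection
  running &lt;- fields[state == "running"]
  if (length(running) == 0) return(NULL)
  launched &lt;- sapply(running, function(f) f[11])
  # ISO 8601 timestamps in the same time zone sort correctly as plain text
  newest &lt;- running[[order(launched, decreasing = TRUE)[1]]]
  list(id = newest[2], host = newest[4])
}

latestInstance(sampleOutput)$host</pre>
</blockquote>
<p>The host name that comes back is what gets pasted into the ssh -D command. Sorting on the launch time is just one way to pick the newest machine; grabbing the instance id straight from the ec2-run-instances output would work too.</p>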
<p>I&#8217;ve now set up a launcher panel item that starts the script. Then when I see the xterm window come up I click the little red button in the lower right corner of my browser which switches on the Firefox proxy. Then I&#8217;m safe to surf <a href="http://www.sofmag.com/" onclick="pageTracker._trackPageview('/outgoing/www.sofmag.com/?referer=');">Soldier of Fortune Magazine</a> without the interference of my corp firewall.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/07/starting-an-ec2-machine-then-setting-up-a-socks-proxy-from-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Using the R multicore package in Linux with wild and passionate abandon</title>
		<link>http://www.cerebralmastication.com/2010/02/using-the-r-multicore-package-in-linux-with-wild-and-passionate-abandon/</link>
		<comments>http://www.cerebralmastication.com/2010/02/using-the-r-multicore-package-in-linux-with-wild-and-passionate-abandon/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 19:57:20 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=562</guid>
		<description><![CDATA[One of my primary uses for R is to build stochastic simulations of insurance portfolios and reinsurance treaties. It&#8217;s not uncommon for each of my simulations to take 20 seconds or more to complete (if you&#8217;re doing the math, that&#8217;s 55 hours for 10K sims or, approximately 453 games of solitaire) . Initially I ran [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/02/amd_mc_processing.jpg"><img class="alignleft size-full wp-image-586" style="border: 0pt none; margin: 20px;" title="amd_mc_processing" src="http://www.cerebralmastication.com/wp-content/uploads/2010/02/amd_mc_processing.jpg" alt="" width="214" height="193" /></a>One of my primary uses for R is to build stochastic simulations of insurance portfolios and reinsurance treaties. It&#8217;s not uncommon for each of my simulations to take 20 seconds or more to complete (if you&#8217;re doing the math, that&#8217;s 55 hours for 10K sims or, approximately 453 games of solitaire) . Initially I ran my sims in R running on an <a href="http://www.virtualbox.org/" onclick="pageTracker._trackPageview('/outgoing/www.virtualbox.org/?referer=');">Oracle VirtualBox </a>(Oracle now owns Virtualbox! *gasp* ) running Ubuntu. Lately I&#8217;ve moved to running my sims on EC2 machines. I&#8217;m not yet doing RMPI clustering, although that is on my roadmap. Currently I just fire up a couple of 8 core instances and run 5K sims on each one then FTP the results back to my desktop. It&#8217;s not very sexy, but it gets the job done&#8230; I guess the same could be said of myself, except substitute &#8220;makes slurping sounds eating udon&#8221; in the place of &#8220;gets the job done.&#8221;</p>
<p>When running processor intensive crap (that&#8217;s a stochastic modeling term) the single threaded nature of R is painful. In Linux or Mac (i.e. NOT Windows) the <a href="http://www.rforge.net/doc/packages/multicore/multicore.html" onclick="pageTracker._trackPageview('/outgoing/www.rforge.net/doc/packages/multicore/multicore.html?referer=');">multicore package </a>is a real godsend. I did a quick code review and, from what I can tell, multicore exploits worm holes to travel back in time and reports your results in a fraction of the time you would expect it to take. Seriously. I expect that as the code matures my computer will fill up with simulation results from simulations which I have not even coded yet. It&#8217;s almost like magic, except without the rabbit and hat.</p>
<p>The crux of the package is a parallel-ized version of lapply() called mclapply(). I believe the mc stands for &#8216;magic carpet&#8217; and is an allusion to the worm hole technology. So how does one harness this package for <span style="text-decoration: line-through;">nefarious self interest </span>doing parallel operations in R? The ultra short answer is: write your R code so that the most processor intensive bit is done with an lapply() function. Then replace the lapply() with mclapply().  Of course you have to load the multicore package before you run it. But that&#8217;s basically it.</p>
<p>How I implement mclapply() is thusly: I build a table with all my random draws for my simulations. So if I have 20 variables and want to run 10,000 simulations then I&#8217;ll build a data frame with all 200,000 values (generally 10K rows and 21 columns for 20 variables + an index). The index keeps track of the draw number. Then I have code that performs the &#8216;valuation&#8217; based on a single observation of the 20 variables. I wrap the valuation step in a function and then call the valuation process 10,000 times with mclapply(). So it might look something like this:</p>
<blockquote><p>myOutput &lt;- mclapply( drawList, function(x) valuationReturns(drawNumber=x))</p></blockquote>
<p>The drawList object is simply a list of the possible indexes (i.e. 1:10000). When the code has iterated over each value from drawList the results will be in the myOutput object. Tada!</p>
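<p>Here is a slightly fuller sketch of that pattern, small enough to run as is. The draws table and the toy valuationReturns() are made up for illustration, and I load the parallel package (bundled with current versions of R, and it provides mclapply()) as an assumption; with the original package you would call library(multicore) instead.</p>
<blockquote>
<pre>library(parallel)  # assumption: bundled parallel package; originally library(multicore)

set.seed(42)
nSims &lt;- 1000
# Hypothetical table of random draws: one row per simulation, an index plus 3 variables
draws &lt;- data.frame(index = 1:nSims,
                    v1 = rnorm(nSims),
                    v2 = rnorm(nSims),
                    v3 = rnorm(nSims))

# Toy stand-in for the real valuation: look up one row of draws and value it
valuationReturns &lt;- function(drawNumber) {
  d &lt;- draws[draws$index == drawNumber, ]
  d$v1 + d$v2 + d$v3
}

drawList &lt;- as.list(1:nSims)

# Serial version first, then the parallel drop-in replacement
serialOutput &lt;- lapply(drawList, function(x) valuationReturns(drawNumber = x))
myOutput &lt;- mclapply(drawList, function(x) valuationReturns(drawNumber = x),
                     mc.cores = 2)

# Check that both versions produced the same list of results
identical(serialOutput, myOutput)</pre>
</blockquote>
<p>Because the random draws are generated once, up front, each call to valuationReturns() only reads from the table. That keeps the parallel results reproducible no matter which core handles which draw.</p>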
<p>I recommend the <a href="http://htop.sourceforge.net/" onclick="pageTracker._trackPageview('/outgoing/htop.sourceforge.net/?referer=');">htop program </a>for tracking what&#8217;s going on with processor utilization in Linux (I presume Mac too if you ask Steve Jobs nicely). If everything is cranking well, and you have 8 cores, you might see an image that looks something like this:</p>
<p><a href="http://www.cerebralmastication.com/wp-content/uploads/2010/02/r-on-ec21.png"><img class="size-full wp-image-564 alignnone" title="r on ec2" src="http://www.cerebralmastication.com/wp-content/uploads/2010/02/r-on-ec21.png" alt="" width="535" height="400" /></a></p>
<p>I don&#8217;t understand time travel, but I&#8217;ve found that I have better luck if I set mc.preschedule=FALSE. Apparently prescheduled magic carpets are finicky. If I leave mc.preschedule to the default of TRUE then I find that often some of my cores go underutilized.</p>
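<p>A small experiment makes the difference visible. This is a sketch with an invented workload (unevenTask and its sleep times are made up), again assuming the bundled parallel package. As I understand it, with mc.preschedule=TRUE the tasks are divided among the cores up front, so a core that happens to draw the slow tasks keeps grinding while the others sit idle; with FALSE, one job is forked per task.</p>
<blockquote>
<pre>library(parallel)  # assumption: bundled parallel package; originally library(multicore)

# Invented workload: every 4th task is slow, the rest are quick
unevenTask &lt;- function(i) {
  Sys.sleep(if (i %% 4 == 0) 0.4 else 0.01)
  i^2
}

tasks &lt;- 1:8

prescheduled &lt;- system.time(
  r1 &lt;- mclapply(tasks, unevenTask, mc.cores = 4, mc.preschedule = TRUE)
)["elapsed"]

onDemand &lt;- system.time(
  r2 &lt;- mclapply(tasks, unevenTask, mc.cores = 4, mc.preschedule = FALSE)
)["elapsed"]

# Same answers either way; only the wall-clock time should differ
identical(r1, r2)
c(prescheduled = unname(prescheduled), onDemand = unname(onDemand))</pre>
</blockquote>
<p>The trade-off is that forking a process per task has its own overhead, so for thousands of very quick tasks the default may still be the better choice. My sims take 20 seconds apiece, so the forking cost is noise.</p>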
<p>Let me know if you have other multicore tips and tricks.</p>
<p>If you want to give me shit for running my simulations as root, feel free. I&#8217;m impervious to your &#8220;best practices&#8221; mumbo jumbo. La la la la la la!! Not listening!</p>
<p>Special thanks to <a href="http://www.cis.udel.edu/~cavazos/index.php?page=multicore-programming" onclick="pageTracker._trackPageview('/outgoing/www.cis.udel.edu/_cavazos/index.php?page=multicore-programming&amp;referer=');">John Cavazos over at the University of Delaware</a> from whom I stole the MC for Dummies image. John, you&#8217;re a gentleman and a humble scholar. Damn few of us left.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2010/02/using-the-r-multicore-package-in-linux-with-wild-and-passionate-abandon/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Using Amazon EC2 to Thwart Crappy Internal IT Services</title>
		<link>http://www.cerebralmastication.com/2009/11/using-amazon-ec2-to-thwart-crappy-internal-it-services/</link>
		<comments>http://www.cerebralmastication.com/2009/11/using-amazon-ec2-to-thwart-crappy-internal-it-services/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 15:28:26 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=391</guid>
		<description><![CDATA[
The alternative title of this blog post is &#8220;How to get your sorry ass fired by violating your internal IT policies.&#8221; So keep that in mind as you read this.
I say lots of silly crap. Twitter allows me the pleasure of sharing this blather with the world. I was a little surprised that of all [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://twitter.com/CMastication/status/5294564298" onclick="pageTracker._trackPageview('/outgoing/twitter.com/CMastication/status/5294564298?referer=');"><img class="alignleft size-full wp-image-393" style="margin: 6px;" title="ec2 tweet" src="http://www.cerebralmastication.com/wp-content/uploads/2009/11/ec2-tweet.PNG" alt="ec2 tweet" width="417" height="233" /></a></p>
<p>The alternative title of this blog post is &#8220;How to get your sorry ass fired by violating your internal IT policies.&#8221; So keep that in mind as you read this.</p>
<p>I say lots of silly crap. Twitter allows me the pleasure of sharing this blather with the world. I was a little surprised that of all the things I have said over the last few months the above Tweet received the most discussion. Apparently this tweet captured the imagination and consternation of some fellow Tweeters. I had people follow up with me and basically ask, &#8220;what do you mean?&#8221; Twitter is good for a sound bite, but less so for an elaborate answer. Which brings us to this:</p>
<p>What are the top ways Amazon EC2 can allow a business user to escape the manipulative and counterproductive grip of corporate IT? Well I&#8217;m glad you asked!</p>
<p><strong>1) Over-restrictive web filtering policies</strong>:  When I worked as a risk manager for a Fortune 500 insurance firm I was shocked on the first day when I could not search Google Groups. At the time Google Groups was one of my favorite resources for figuring out everything from SQL syntax to Excel formulas. The firm, like most firms, outsourced the filtering of web content. Apparently they signed up for &#8220;Super Freaking Restrictive&#8221; filtering. I could not even search the web for &#8220;Ubuntu&#8221; as all sites with the word Ubuntu in the title or with the word &#8220;Ubuntu&#8221; passed as a form submission were blocked. Apparently Ubuntu is not just a Linux distro, but also a militant organization of African computer programmers, or something. So how did I get around this with EC2? I would fire up an EC2 Ubuntu instance running Squid proxy before I left home, then ssh into the cloud from work and use a little SSH port forwarding to route my web traffic through the ssh connection and out via Squid. I set up my EC2 instance to listen for ssh on port 443 and my firm&#8217;s firewall would let the connection pass as it assumed it was simply SSL traffic into Amazon. Brilliant!</p>
<p><strong>2) Underpowered database servers: </strong>At another point I was responsible for data analytics on a portfolio of insurance policies. I had to join together data from multiple systems (underwriting, admin, claims, etc.). The firm was an Oracle shop and none of the Oracle machines had enough user space for me to make the big ass join that had to be made in order to cobble together my analytics. For a while I hobbled along using PROC SQL in SAS to bring all the data together inside of SAS running on a PC. Finally I just gave up and built my own data mart in the cloud. And I could totally cut my internal IT politics out of the system. Whew, once the politics and begging for resources were over I could kick ass at analytics without having to beg, borrow and plead for permissions and space.</p>
<p><strong>3) Failure to back up desktop machines / inadequate shared drive space: </strong>Another experience I had was with a firm that decided it was a good policy to NOT back up desktop PCs at all. Each department was given shared drive space on a central server where &#8220;business critical&#8221; files were supposed to be kept (whatever the hell that means). Only the files on the central server were backed up. I was in the risk management department (ironically) and we had a whopping 100 MB allocated to us. Yes, this was 2004 and 100 MB was not enough to hold 2 years of risk reviews. Not to mention any ad hoc analysis and all the supporting documents. So everyone had their desktop drives, at least one USB drive, and no off site backup. It was during this period that I discovered <a href="http://www.jungledisk.com/" onclick="pageTracker._trackPageview('/outgoing/www.jungledisk.com/?referer=');">Jungle Disk </a>which allows client side encrypted data to be backed up to Amazon! Off site backup problem solved! And, once again, corp IT cut out of the system. (Yes, this is a use of S3, not EC2.) By the way, I paid for backups out of my own pocket because I felt it was very important. Well, I did have the firm buy me books which I happily kept when I left. We&#8217;ll call it even.</p>
<p>Let me reiterate that all three of the above uses <span style="text-decoration: line-through;">may have</span> <span style="color: #000000;">put me in direct violation of my corporate IT policies. And let me also state that ultimately I found a job at a firm where internal IT sees their job as helping the business units get crap done. If you are an IT professional and you find yourself thinking, &#8220;damn, I have to make sure I restrict my users from all of these crafty uses of EC2&#8221; then, <strong><span style="color: #993300;">jackass, you are the problem with your firm&#8217;s IT department</span></strong>. If you see your job as stopping users then you are a useless burden on your firm and you should be not only fired, but spat upon. The way to prevent users from doing these, and other &#8220;shadow IT&#8221; behaviors is to <strong><span style="color: #993300;">provide the IT services that help your users be awesom<span style="color: #993300;">e</span></span><span style="color: #993300;">!</span></strong> If you do that then you don&#8217;t have to worry about what your users are up to. They&#8217;ll be too damn busy being awesome to have time to mess with Amazon EC2.</span></p>
<p>All the examples above took place at previous places of employment. I currently use Amazon EC2 in order to scale some of my analytics, but it is done with the knowledge and support of my internal IT team. They fully understand what I am doing and they want to help me be awesome at analysis. It&#8217;s amazing how much less time I am wasting these days now that I don&#8217;t have to be so creative about avoiding the manipulative and counterproductive intervention of my internal IT team.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2009/11/using-amazon-ec2-to-thwart-crappy-internal-it-services/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>
