<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cerebral Mastication &#187; plyr</title>
	<atom:link href="http://www.cerebralmastication.com/tag/plyr/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cerebralmastication.com</link>
	<description>Something to Chew On</description>
	<lastBuildDate>Fri, 16 Jul 2010 22:07:12 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Struggling with apply() in R</title>
		<link>http://www.cerebralmastication.com/2009/12/struggling-with-apply-in-r/</link>
		<comments>http://www.cerebralmastication.com/2009/12/struggling-with-apply-in-r/#comments</comments>
		<pubDate>Fri, 11 Dec 2009 19:30:55 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[apply]]></category>
		<category><![CDATA[plyr]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=432</guid>
		<description><![CDATA[It&#8217;s common knowledge that I struggle wrapping my head around the apply functions in R. That is illustrated very clearly in the following discussion on Stack Overflow:

Dirk&#8217;s comment is actually spot on. I&#8217;ve asked the same damn question at least 4-5 times. Only I didn&#8217;t really understand it was the same question. That&#8217;s one of [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s common knowledge that I struggle wrapping my head around the apply functions in R. That is illustrated very clearly in the <a href="http://stackoverflow.com/questions/1355355/how-to-avoid-a-loop-in-r-selecting-items-from-a-list" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/1355355/how-to-avoid-a-loop-in-r-selecting-items-from-a-list?referer=');">following discussion </a>on Stack Overflow:</p>
<p><a href="http://stackoverflow.com/questions/1355355/how-to-avoid-a-loop-in-r-selecting-items-from-a-list" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/1355355/how-to-avoid-a-loop-in-r-selecting-items-from-a-list?referer=');"><img class="alignnone size-full wp-image-433" style="border: 2px solid black; margin: 2px;" title="apply_struggle" src="http://www.cerebralmastication.com/wp-content/uploads/2009/12/apply_struggle.PNG" alt="apply_struggle" width="536" height="217" /></a></p>
<p>Dirk&#8217;s comment is actually spot on. I&#8217;ve asked the same damn question at least 4-5 times. Only I didn&#8217;t really understand it was the same question. That&#8217;s one of the problems of not really being good at something; it&#8217;s hard to think abstractly about it. I&#8217;m not really good at R, so sometimes I don&#8217;t realize that multiple concepts are related. As I talk with other new users of R it&#8217;s clear that unless they come from a programming language with an apply-esque construct they likely are struggling with R. I think most of the confusion comes from a) not understanding what data format apply() is going to return and b) not understanding anonymous functions.</p>
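<p>To make those two sticking points concrete, here is a minimal sketch in base R. The names yearVector and add_one and the example years are made up for illustration. It shows that lapply() always hands back a list, that sapply() simplifies the same result to a plain vector when it can, and that an anonymous function is just a function written inline without a name.</p>
<pre># hypothetical example data: a plain numeric vector, not a list
yearVector &lt;- c(2007, 2008, 2009)

# a named function, defined once and reused
add_one &lt;- function(y) y + 1

# lapply() always returns a list, one element per input
as_list &lt;- lapply(yearVector, add_one)

# sapply() does the same work but simplifies the result to a vector when it can
as_vector &lt;- sapply(yearVector, add_one)

# an anonymous function: the same logic written inline, never given a name
anon &lt;- sapply(yearVector, function(y) y + 1)

is.list(as_list)            # TRUE
is.numeric(as_vector)       # TRUE
identical(as_vector, anon)  # TRUE</pre>
<p>When an apply call surprises you, checking the result with str() or class() is usually the quickest way to see which shape you got back.</p>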
<p>With this in mind I did a little screencast illustrating how this struggle plays out for new users. I also show why I use the plyr package for much of the stuff other folks use apply() for.</p>
<p>Any feedback you have is appreciated. This is my first stab at a screencast, so I am still trying to figure out the best approach/method as well as how many drinks put me on the <a href="http://xkcd.com/323/" onclick="pageTracker._trackPageview('/outgoing/xkcd.com/323/?referer=');">Ballmer Peak</a>.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/tdoIwXT_lP8" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/tdoIwXT_lP8"></embed></object></p>
<p><strong>EDIT</strong>: it&#8217;s been pointed out that I misuse some terminology a number of times. I should have named my year vector &#8220;yearVector.&#8221; By calling it &#8220;yearList&#8221; I then refer to the vector as a list. I was using &#8220;list&#8221; in the vernacular, but since list is a specific R data structure it is confusing that I named a vector a name with &#8220;list&#8221; in it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2009/12/struggling-with-apply-in-r/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>A Fast Intro to PLYR for R</title>
		<link>http://www.cerebralmastication.com/2009/08/a-fast-intro-to-plyr-for-r/</link>
		<comments>http://www.cerebralmastication.com/2009/08/a-fast-intro-to-plyr-for-r/#comments</comments>
		<pubDate>Thu, 27 Aug 2009 20:00:52 +0000</pubDate>
		<dc:creator>JD Long</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[plyr]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.cerebralmastication.com/?p=339</guid>
		<description><![CDATA[I&#8217;m not dead yet! Although it has been rumored that I am. The new job is going great and I&#8217;m thrilled to be with a new firm doing interesting work alongside smart people. It makes me seem smarter by simple association.
There&#8217;s been a lot going on recently in the R user community. There was an [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-340" title="pliers" src="http://www.cerebralmastication.com/wp-content/uploads/2009/08/pliers.jpg" alt="pliers" width="235" height="87" />I&#8217;m not dead yet! Although it has been rumored that I am. The new job is going great and I&#8217;m thrilled to be with a new firm doing interesting work alongside smart people. It makes me seem smarter by simple association.</p>
<p>There&#8217;s been a lot going on recently in the R user community. There was an <a href="http://en.oreilly.com/oscon2009/public/schedule/detail/10432" onclick="pageTracker._trackPageview('/outgoing/en.oreilly.com/oscon2009/public/schedule/detail/10432?referer=');">R flash mob of Stack Overflow</a>, which resulted in a noticeable increase in the number of <a href="http://stackoverflow.com/questions/tagged/r" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/tagged/r?referer=');">R questions and answers</a> on SO. I&#8217;ve been blown away by the quality of the <a href="http://stackoverflow.com/questions/tagged?tagnames=r&amp;sort=stats&amp;pagesize=50" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/tagged?tagnames=r_amp_sort=stats_amp_pagesize=50&amp;referer=');">participants</a>. There have also been more high-quality discussions on Twitter, which are being <a href="http://twitter.com/#search?q=%23rstats" onclick="pageTracker._trackPageview('/outgoing/twitter.com/_search?q=_23rstats&amp;referer=');">tagged with #rstats</a>. These changes in the community have <a href="http://www.iq.harvard.edu/blog/sss/archives/2009/08/the_changing_na.shtml" onclick="pageTracker._trackPageview('/outgoing/www.iq.harvard.edu/blog/sss/archives/2009/08/the_changing_na.shtml?referer=');">not gone unnoticed</a>.</p>
<p>Recently I posted a question about how to do a &#8216;group by&#8217; in a regression with R. I had a way I had been doing this but I was suspicious there was a better way. <a href="http://stackoverflow.com/questions/1169539/linear-regression-and-group-by-in-r/1214432#1214432" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions/1169539/linear-regression-and-group-by-in-r/1214432_1214432?referer=');">One of the answers</a> proposed using the PLYR package. I think I had seen the plyr package a few times but never really understood it. Although I didn&#8217;t select this as my top answer, it prompted me to look into PLYR more. What I discovered was really interesting.</p>
<p>The <a href="http://had.co.nz/plyr/" onclick="pageTracker._trackPageview('/outgoing/had.co.nz/plyr/?referer=');">PLYR package </a>is a tool for doing split-apply-combine (SAC) procedures. I&#8217;m very fluent in SQL so the best analogy for me was the GROUP BY statement in SQL. PLYR adds very little new functionality to R. What it does do is take the process of SAC and make it cleaner, more tidy and easier. I think I&#8217;m not the only one who wants a clean and tidy SAC. Here&#8217;s a quick example of making some summary stats using PLYR:</p>
<pre># install.packages("plyr") # run this if you don't have the package already
library(plyr)

# make some example data: three numeric columns plus two grouping columns
dd &lt;- data.frame(matrix(rnorm(216), 72, 3),
                 c(rep("A", 24), rep("B", 24), rep("C", 24)),
                 c(rep("J", 36), rep("K", 36)))
colnames(dd) &lt;- c("v1", "v2", "v3", "dim1", "dim2")

# ddply is the plyr function: split dd by dim1 and dim2, then take the mean of v1 in each piece
ddply(dd, c("dim1", "dim2"), function(df) mean(df$v1))</pre>
<p>result:</p>
<blockquote>
<pre>  dim1 dim2          V1
1    A    J  0.02554362
2    B    J -0.15839675
3    B    K -0.06077399
4    C    K -0.02326776</pre>
</blockquote>
<p>PLYR functions have a neat naming convention. The first two letters of the function name tell you the input and output data types, respectively. The one I use the most is ddply, which takes a data frame in and spits out a data frame. Let me see if I can explain what ddply is doing. The first argument, dd, is the input data frame. The next argument is the &#8220;group by&#8221; variables. Since I want to group by two variables I send them as a vector (that&#8217;s what the c() bit does).</p>
<p>What threw me for a loop initially was the third argument, the function. What I found myself trying (unsuccessfully) was just using mean(v1) as the third argument. If I did that, R would spit at me and bring the marital status of my parents into question. I discovered that the problem was that ddply splits the data by my &#8216;group by&#8217; variables and then wants to pass each of the resulting data frames to a function. So what does it mean to pass a data frame to mean(v1)? Yeah, it means Jack Crap, that&#8217;s what it means: mean(v1) is not a function at all, it is a call that R tries to evaluate on its own, where no v1 exists.</p>
<p>So in one of the PLYR examples I saw they were using these inline functions. The idea behind function(df)mean(df$v1) is to create a function to which we can pass a data frame and get out a meaningful result. The subset (or split) of the data gets passed to the function and that subset is then known as df. mean(df$v1) calculates the mean of v1 and returns an answer. ddply holds on to the answers of each split and then reassembles them all in the end. Slick, eh?</p>
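<p>If it helps to see the three steps spelled out, here is a rough base R sketch of split-apply-combine. This is not how ddply is implemented, just the same idea done by hand. The toy data frame and the names pieces, means and result are made up for illustration.</p>
<pre># hypothetical toy data: one numeric column and one grouping column
toy &lt;- data.frame(v1 = c(1, 3, 10, 20, 100, 300),
                  dim1 = c("A", "A", "B", "B", "C", "C"))

# split: one small data frame per value of dim1
pieces &lt;- split(toy, toy$dim1)

# apply: hand each piece to an inline function, where it is known as df
means &lt;- lapply(pieces, function(df) mean(df$v1))

# combine: reassemble the answers into a single data frame
result &lt;- data.frame(dim1 = names(means), V1 = unlist(means), row.names = NULL)
result</pre>
<p>The middle step is the one that tripped me up: the function never sees v1 directly, only a data frame that happens to contain a v1 column.</p>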
<p>As with most things in R the idea can be extended: the function can return a vector of results in order to perform many operations on each split:</p>
<pre>ddply(dd, c("dim1", "dim2"), function(df) c(mean(df$v1), mean(df$v2), mean(df$v3),
                                            sd(df$v1), sd(df$v2), sd(df$v3)))</pre>
<p>The result looks like this:</p>
<blockquote>
<pre>  dim1 dim2          V1        V2         V3        V4        V5       V6
1    A    J  0.02554362 0.3400250  0.1206980 0.9326424 1.0044120 1.100762
2    B    J -0.15839675 0.3662559 -0.1784193 0.7447807 0.8752162 1.105258
3    B    K -0.06077399 0.5184403 -0.2076024 1.0385107 1.0609706 1.153153
4    C    K -0.02326776 0.2639328  0.1352895 0.7940938 0.9025207 1.072460</pre>
</blockquote>
<p>Pretty nifty.</p>
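<p>The V1 through V6 column names are not very telling. One way around that, sketched here on the assumption that plyr is installed, is to have the inline function return a named vector so that ddply carries the names into the output columns. The data frame below is a smaller stand-in for dd, and mean_v1, sd_v1 and named are names I made up for the example.</p>
<pre>library(plyr)  # assumes plyr is already installed

# stand-in example data with the same grouping layout as dd
dd &lt;- data.frame(v1 = rnorm(72),
                 dim1 = rep(c("A", "B", "C"), each = 24),
                 dim2 = rep(c("J", "K"), each = 36))

# naming the elements of the returned vector names the result columns
named &lt;- ddply(dd, c("dim1", "dim2"), function(df) {
  c(mean_v1 = mean(df$v1), sd_v1 = sd(df$v1))
})
names(named)</pre>
<p>The values change on every run because the data are random, but the column names should come out as dim1, dim2, mean_v1 and sd_v1.</p>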
<p>The author of PLYR is Hadley Wickham, who is also the man behind <a href="http://had.co.nz/ggplot2/" onclick="pageTracker._trackPageview('/outgoing/had.co.nz/ggplot2/?referer=');">GGPLOT2</a>. If you like PLYR or GGPLOT2 then you should immediately <a href="http://www.amazon.com/gp/product/0387981403?ie=UTF8&amp;tag=hadlwick-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0387981403" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0387981403?ie=UTF8_amp_tag=hadlwick-20_amp_linkCode=as2_amp_camp=1789_amp_creative=390957_amp_creativeASIN=0387981403&amp;referer=');">buy Hadley&#8217;s GGPLOT2 book on Amazon</a>. But be sure to use the link on this site or the link on <a href="http://had.co.nz/ggplot2/book/" onclick="pageTracker._trackPageview('/outgoing/had.co.nz/ggplot2/book/?referer=');">Hadley&#8217;s site</a> so he gets the Amazon Associates payment. The authors I have talked to told me they get more from the Associates program than they get from publishing royalties.</p>
<p>My father is a retired pilot turned crop farmer. He ALWAYS carries a pair of pliers in a nylon pouch on his belt. I can see that Hadley&#8217;s PLYR package is going to become my proverbial &#8216;belt pliers.&#8217;</p>
<p>Of course if I wrote an R package I&#8217;d have to name it <a href="http://www.paratech.us/html/FET/Crw/CrwSRB/ParatechNFSRB.htm" onclick="pageTracker._trackPageview('/outgoing/www.paratech.us/html/FET/Crw/CrwSRB/ParatechNFSRB.htm?referer=');">Super RamBar</a>, cause that&#8217;s just how I roll.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cerebralmastication.com/2009/08/a-fast-intro-to-plyr-for-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
