Struggling with apply() in R

It’s common knowledge that I struggle to wrap my head around the apply functions in R. That is illustrated very clearly in the following discussion on Stack Overflow:

[Image: apply_struggle]

Dirk’s comment is actually spot on. I’ve asked the same damn question at least 4-5 times. Only I didn’t really understand it was the same question. That’s one of the problems of not really being good at something: it’s hard to think abstractly about it. I’m not really good at R, so sometimes I don’t realize that multiple concepts are related. As I talk with other new users of R, it’s clear that unless they come from a programming language with an apply-esque construct, they are likely struggling with R. I think most of the confusion comes from a) not understanding what data format apply() is going to return and b) not understanding anonymous functions.
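Both sources of confusion can be seen in a few lines of base R. This is only a minimal sketch: the `years` vector and the `add_one` function are made up for illustration and are not from the screencast.

```r
years <- 2007:2009  # hypothetical input vector

# An anonymous function is just a function defined inline, without a name.
add_one   <- function(x) x + 1
named     <- sapply(years, add_one)
anonymous <- sapply(years, function(x) x + 1)
identical(named, anonymous)  # TRUE

# lapply() always returns a list with one element per input.
as_list <- lapply(years, add_one)
is.list(as_list)  # TRUE

# sapply() simplifies when it can. Length-one results become a plain vector
# (as in `named` above); equal-length results become a matrix with one
# column per input...
mat <- sapply(years, function(x) c(year = x, next_year = x + 1))
dim(mat)  # 2 3

# ...and results of differing lengths are left as a list.
ragged <- sapply(1:3, function(n) seq_len(n))
is.list(ragged)  # TRUE
```

The same call, `sapply()`, hands back three different shapes depending on what the function returns, which is why it is so easy to be surprised by the result.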

With this in mind I did a little screencast illustrating how this struggle plays out for new users. I also show why I use the plyr package for much of the stuff other folks use apply() for.
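For readers who can't watch the screencast, here is a rough sketch of the kind of plyr call involved. It assumes the plyr package is installed, and the `getValues()` below is a hypothetical stand-in, since the real function only appears in the video.

```r
# Hypothetical stand-in for getValues(): returns a small data frame per year.
getValues <- function(year) {
  data.frame(year = year, value = c(1, 2) * year)
}

yearVector <- 2007:2009  # hypothetical years

if (requireNamespace("plyr", quietly = TRUE)) {
  # ldply(): list (or vector) in, data frame out. The per-year data frames
  # are stacked into a single data frame.
  myData <- plyr::ldply(yearVector, getValues)
  print(nrow(myData))  # 6: two rows for each of the three years
}
```

The function name spells out the input and output types (`l` for list in, `d` for data frame out), so there is less guessing about what comes back.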

Any feedback you have is appreciated. This is my first stab at a screencast, so I am still trying to figure out the best approach/method as well as how many drinks put me on the Ballmer Peak.

EDIT: it’s been pointed out that I misuse some terminology a number of times. I should have named my year vector “yearVector.” By calling it “yearList” I then refer to the vector as a list. I was using “list” in the vernacular, but since a list is a specific R data structure, giving a vector a name with “list” in it is confusing.
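The distinction is easy to check at the console. A small sketch, with made-up years:

```r
yearVector <- c(2007, 2008, 2009)  # an atomic (numeric) vector
is.list(yearVector)    # FALSE
is.atomic(yearVector)  # TRUE

yearList <- as.list(yearVector)    # an actual R list
is.list(yearList)      # TRUE
length(yearList)       # 3

# lapply() and sapply() accept either one, which is part of why the
# sloppy naming goes unnoticed.
identical(sapply(yearVector, sqrt), sapply(yearList, sqrt))  # TRUE
```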

12 Comments

  1. Greg says:

    It is really easy with lapply:

    myData <- do.call("rbind", lapply(yearList, getValues))

    I think that starting with lapply makes it easier to understand how apply works.

  2. Greg says:

    The simplify is what got you. If you had added a simplify=FALSE argument to the sapply() call, it would have returned a list with each year’s worth of data as one element. The do.call(“rbind”, …) then concatenates those list elements into one data frame.

  3. J says:

    Great point Greg! This is exactly why I blog.

    Also, as was pointed out to me on Twitter: I don’t need the c() when I create yearList, and sapply(yearList, getValues) is the same as sapply(yearList, function(x) getValues(x)).

  4. Scot says:

    Nice, JD.
    What did you use for the screencast?

  5. Jason says:

    Great screencast. It’s good to know that I’m not the only one with apply-blindness. I don’t know how many times I’ve banged my head against the apply functions, only to give up and do it with a loop, “just this once.”

    My new goal is to grok plyr and then be able to say, “Oh, you’re still using apply? How quaint.”

  6. J says:

    Scot, I used Microsoft Expression Encoder 3. I found it easy to use and, best of all, free.

    http://www.microsoft.com/Expression/try-it/default.aspx?filter=encoder3

  7. J says:

    Jason, Glad that you got some value from the post! The hardest part of apply() is that all the different incarnations seem to have different syntax and different idiosyncrasies. What a pain. With plyr there’s a unified abstraction, and thus, less syntax to have to remember.

    So you watching the Army/Navy game today?

    -JD

  8. McKay says:

    Great screencast — thanks for taking the time to make it. I remember battling with the apply functions when I was first learning R. Although I’ve grown fairly comfortable with the apply functions, plyr looks slick enough that I may convert (if only to avoid being called quaint…)

  9. haha that was great

    split and lapply nearly brought me to tears

    ggplot2 and plyr are the best contributions to R, maybe ever

  10. doug says:

    I’m new to R (from Python, etc.) and so I’ve been collecting links to good R blogs. I found this one through a link from another R blog. The subject of the particular post I landed on involved you and Google’s chief economist…after a few paragraphs I thought that perhaps this was a celebrity gossip blog or something of that sort. So I glanced at your tag cloud: “Agriculture,” “Beer,” “Rockstars,” among others. Well, I’ve put it in my RSS feed anyway. (:

  11. JD Long says:

    Yeah, it’s a bit of a mixed bag here. Glad you found me and I hope that some of my blathering is of value!

  12. Ken says:

    Nice screencast. Thanks. I also find the apply() family opaque, and it looks like the plyr package could help.
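
Greg's suggestion in comments 1 and 2 can be sketched in base R as follows. The `getValues()` here is a hypothetical stand-in that returns a two-row data frame per year, and `yearList` is (as in the post) really a vector; neither is the code from the screencast.

```r
# Hypothetical stand-in for getValues().
getValues <- function(year) {
  data.frame(year = year, half = 1:2)
}

yearList <- 2007:2009  # a vector, despite the name

# Default sapply(): each data frame is treated as a list of 2 columns, so
# the result is "simplified" into a 2 x 3 matrix whose cells are lists.
simplified <- sapply(yearList, getValues)
is.matrix(simplified)  # TRUE
is.list(simplified)    # TRUE

# With simplify = FALSE, sapply() returns a plain list of data frames,
# just as lapply() does.
pieces <- sapply(yearList, getValues, simplify = FALSE)
identical(pieces, lapply(yearList, getValues))  # TRUE

# do.call("rbind", ...) stacks the list elements into one data frame.
myData <- do.call("rbind", pieces)
nrow(myData)  # 6
```

The list-matrix from the default call is the awkward result; asking for a list and binding it yourself keeps every step's return type predictable.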
