## Third, and Hopefully Final, Post on Correlated Random Normal Generation (Cholesky Edition)

When I did a brief post three days ago I had no plans on writing two more posts on correlated random number generation. But I’ve gotten a couple of emails, a few comments, and some Twitter feedback. In response to my first post, Gappy, calls me out and says, “the way mensches do multivariate (log)normal variates is via Cholesky. It’s simple, instructive, and fast.” And I think we’re all smart enough to read through Mr. Gappy’s comment and see that he’s saying I’m a complicated, opaque, and slow, גוי. My wife called and said his list would be more accurate if he added ‘emotionally detached.’ I have no idea what she means.

At any rate, in response to Gappy’s comment, here is the third verse (same as the first). The crux of the change is the following lines:

# shift the mean of ourData to zero ourData0 <- as.data.frame(sweep(ourData,2,colMeans(ourData),"-")) #Cholesky Decomposition of the covariance matrix C <- chol(nearPD(cov(ourData0))$mat) #create a matrix of random standard normals Z <- matrix(rnorm(n * ncol(ourData)), ncol(ourData)) #multiply the standard normals by the transpose of the Cholesky X <- t(C) %*% Z myDraws <- data.frame(as.matrix(t(X))) names(myDraws) <- names(ourData) # we still need to shift the means of the samples. # shift the mean of the draws over to match the starting data myDraws <- as.data.frame(sweep(myDraws,2,colMeans(ourData),"+"))

**Edit: **When I first publishes this example, I didn’t shift the means prior to taking the cov(). I’ve sense corrected that. Also thanks to @fdaapproved on Twitter who pointed out that I can replace the loop above with myDraws <- as.data.frame(sweep(t(X),2,colMeans(ourData),”+”))

This method, which uses Cholesky decomposition, is how I initially learned to create correlated random draws. I think this method is comparable to the mvrnorm() method. mvrnorm() is handy because it wraps everything above in one single line of code. But the above method is reliant only on the Matrix package and that’s only for the nearPD() function. If you are familiar with the guts of the mvrnorm() function and the chol() function, I’d love for you to comment on any technical differences. I looked briefly at the code for both and quickly realized my matrix math was rusty enough that it was going to take a while for me to sort through the code.

If you want the whole script you can find it embedded below and on Github.

I guess this shows that with R, there is always more than one way to skin a cat!

Also, I just wanted to throw this out there… a while back I wrote some javascript code that generates multivariate normals from scratch starting from the Math.random() method to generate uniforms (sometimes engineers just like to do things the hard way!). I think it shows all the guts of multivariate normal generation without matrix math.

http://stotastic.com/wordpress/2010/06/multivariate-normal-generato-in-javascript/

I’ve been using cholesky for some time – but was interested to see how the copula approach could better replicate the tails of distributions. Through cholesky I’ve been able to generate correlated distributions – but what I haven’t been able to do is preserve an original seeding distribution and create two further random normal distributions that are correlated to a specified degree with the seeding distribution and each other. This seems to be mathematically quite complex. Any ideas anyone?

[...] rmvnorm in the mvtnorm package, and using the Cholesky decomposition. [...]