Fitting Distribution X to Data From Distribution Y

I had someone ask me about fitting a beta distribution to data drawn from a gamma distribution and how well the distribution would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process, and then we changed the parameters of the gamma distribution to see how that affected the fit. An exercise like this is what I call building a “toy model,” and I think it is invaluable for building intuition and a visceral understanding of data.
Here’s some example code which we played with:

set.seed(3)
# 100,000 draws from a gamma with shape = 2 and rate = .2
x <- rgamma(1e5, shape = 2, rate = .2)
plot(density(x))

# normalize the gamma draws so they fall between 0 & 1
# .0001 added because a value of exactly 1 makes the beta fit fail
xt <- x / ( max( x ) + .0001 )

# fit a beta distribution to xt
# (start gives the optimizer initial guesses for the two shape parameters)
library( MASS )
fit.beta <- fitdistr( xt, "beta", start = list( shape1=2, shape2=5 ) )

# simulate from the fitted beta
x.beta <- rbeta(1e5,fit.beta$estimate[[1]],fit.beta$estimate[[2]])

## plot the pdfs on top of each other
plot(density(xt))
lines(density(x.beta), col="red" )

## plot the qqplots
qqplot(xt, x.beta)


It’s not illustrated above, but it’s probably useful to transform the simulated data (x.beta) back into pre-normalized space by multiplying by max( x ) + .0001. (I swore I’d never say this, but I lied.) I’ll leave that as an exercise for the reader.
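For readers who want to check their answer, here is one possible sketch of that back-transformation. It reruns the simulation so it stands on its own. The names scale.factor and x.back are made up for this illustration, and the suppressWarnings() wrapper is my own addition, on the assumption that the optimizer may emit NaN warnings while it searches.

```r
library(MASS)

set.seed(3)
x <- rgamma(1e5, shape = 2, rate = .2)

# keep the normalizing constant around so the scaling can be undone later
scale.factor <- max(x) + .0001
xt <- x / scale.factor

# same fit as in the post (warnings suppressed; see the note above)
fit.beta <- suppressWarnings(
  fitdistr(xt, "beta", start = list(shape1 = 2, shape2 = 5))
)
x.beta <- rbeta(1e5,
                fit.beta$estimate[["shape1"]],
                fit.beta$estimate[["shape2"]])

# multiply by the same constant to get back to the original gamma scale
x.back <- x.beta * scale.factor

# compare the original gamma draws with the rescaled beta draws
plot(density(x))
lines(density(x.back), col = "red")
qqplot(x, x.back)
```

Because the rescaling is just multiplication by a constant, the shape of the comparison is the same as in the normalized plots. Only the axes change, which can make the mismatch easier to read in the units of the original data.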

Another very useful tool in building a mental road map of distributions is the graphical chart of distribution relationships that John Cook introduced me to.

6 Comments

  1. Muhammad Rahiz says:

    Thanks for this post.

    I’m trying to fit dataset y to the distribution of dataset x. I’ve followed your post but I can’t get it to work. In particular, how do I get the values of shape1 and shape2 from dataset x?

    x <- abs(rnorm(100))
    y <- abs(rnorm(100))

    plot(density(x), type="l")
    lines(density(y), col="red")

  2. JD Long says:

    Hey Muhammad,

    I’m not completely sure what you are asking. What do you mean by “values of shape1”?

    I don’t know what shape1 and shape2 are. I don’t create objects by those names in my example. Can you help me understand? I’m happy to help if I can grasp what you are asking!

    -JD

  3. Muhammad Rahiz says:

    Hi JD,

    The shape1 and shape2 are in the following line (as part of the inputs of fitdistr);

    fit.beta <- fitdistr( xt, "beta", start = list( shape1=2, shape2=5 ) )

    I am not sure how the values 2 and 5 for shape1 and shape2, respectively, are derived.

    -Muhammad

  4. JD Long says:

    Ohhhh. Sorry. I got it now.

    If you look at the help for fitdistr you’ll see that the beta distribution requires a starting point for searching for the shape1 & shape2. Those aren’t derived, they are just sane guesses. HTH
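If guessing feels too arbitrary, one common alternative (not what the post does) is to take the starting values from method-of-moments estimates of the beta parameters. The sketch below assumes the data have already been scaled into (0, 1). The names m, v, k and start.mom are hypothetical, and suppressWarnings() is my own addition.

```r
library(MASS)

set.seed(3)
x  <- rgamma(1e5, shape = 2, rate = .2)
xt <- x / (max(x) + .0001)

# method-of-moments estimates for a beta distribution:
# match the sample mean and variance to the beta's mean and variance
m <- mean(xt)
v <- var(xt)
k <- m * (1 - m) / v - 1   # must be positive for the estimates to make sense
start.mom <- list(shape1 = m * k, shape2 = (1 - m) * k)

# use those as the starting point instead of hand-picked guesses
fit.beta <- suppressWarnings(fitdistr(xt, "beta", start = start.mom))
fit.beta$estimate
```

The starting values only tell the optimizer where to begin searching, so any reasonable positive guess may well lead to the same estimates. Moment-based starts just remove the guesswork.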

  5. Rick Wicklin says:

    In your case, you can look at the parametric forms of the beta and gamma distributions to compare them. But since you mentioned John’s blog, your readers might also enjoy reading
    http://www.johndcook.com/blog/2010/08/11/what-distribution-does-my-data-have/
    in which John asks a different question: given observational data, why should any famous distribution fit the data?

  6. Danny says:

    You can also use the Kolmogorov-Smirnov test if you don’t feel like doing any graphing.
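A minimal sketch of that suggestion, using ks.test from base R's stats package against the fitted beta CDF. The simulation and fit are repeated so the snippet runs on its own, and suppressWarnings() is my own addition. Two caveats apply. The p-value is not strictly valid when the parameters were estimated from the same data. With 1e5 points, even small discrepancies are likely to produce a tiny p-value, so the statistic itself may be the more informative number.

```r
library(MASS)

set.seed(3)
x  <- rgamma(1e5, shape = 2, rate = .2)
xt <- x / (max(x) + .0001)

fit.beta <- suppressWarnings(
  fitdistr(xt, "beta", start = list(shape1 = 2, shape2 = 5))
)

# one-sample K-S test of the normalized data against the fitted beta CDF
ks <- ks.test(xt, "pbeta",
              shape1 = fit.beta$estimate[["shape1"]],
              shape2 = fit.beta$estimate[["shape2"]])

ks$statistic   # largest gap between the empirical and fitted CDFs
ks$p.value
```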
