
Estimation of the mean of a log-normally distributed variable

Summary

I present below how to estimate the mean (and the standard deviation) of a log-normally distributed variable by maximum likelihood. We start by simulating such a variable, then define the log-likelihood function and estimate the parameters. Finally, we compute the standard errors of our estimates.

Simulating data

Let us consider the case of a log-normally distributed variable for which we want to estimate the (unobserved) log-scale mean \(\mu\). We simulate the data by taking the exponential of a normally distributed variable with mean 0.5 and standard deviation 1.2.

mu=0.5     #Mean of the underlying normal distribution (log scale)
sigma=1.2  #St. dev. of the underlying normal distribution (log scale)
y_lognorm=exp(rnorm(n=2000,mean=mu,sd=sigma))
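As a quick sanity check, the log of the simulated variable should be approximately normal, with a sample mean close to 0.5 and a sample standard deviation close to 1.2 (the exact values depend on the random draw):

mean(log(y_lognorm)) #Should be close to mu=0.5
sd(log(y_lognorm))   #Should be close to sigma=1.2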

Likelihood function

We now define the log-likelihood function. This function takes as inputs the data \(y\) and the vector of parameters theta_funct.

The function proceeds as follows:
(1) We retrieve the number of observations in n_funct.
(2) We set the mean mu_funct equal to the first element of the vector theta_funct.
(3) We set the standard deviation sigma_funct equal to the exponential of the second element of the vector theta_funct. (Note: we take the exponential to ensure positive values.)
(4) We compute the log-likelihood of each observation.
(5) We return the sum of the log-likelihood of the dataset.

llfunction=function(y_funct,theta_funct){
  n_funct=length(y_funct)         #Number of observations
  mu_funct=theta_funct[1]         #Mean
  sigma_funct=exp(theta_funct[2]) #St. dev. (exponential ensures positivity)
  #Log-likelihood of each observation
  ll_funct=log(1/(y_funct*sigma_funct*sqrt(2*pi))*exp(-(log(y_funct)-mu_funct)^2/(2*sigma_funct^2)))
  #Sum over the dataset
  return(sum(ll_funct))
}
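As a side note, since log(exp(x))=x, the expression inside ll_funct can be expanded analytically. This avoids exponentiating and then taking the logarithm, which is numerically safer for extreme parameter values; an algebraically equivalent sketch:

  ll_funct=-log(y_funct)-log(sigma_funct)-0.5*log(2*pi)-(log(y_funct)-mu_funct)^2/(2*sigma_funct^2)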

An alternative way to compute each observation's contribution to the log-likelihood is to use the dedicated density function in R. One would need to replace the definition of ll_funct in the previous code by:

  ll_funct=dlnorm(y_funct,meanlog=mu_funct,sdlog=sigma_funct,log=TRUE)
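To verify that the manual formula and dlnorm() agree, one can compare them at an arbitrary point, here y=2 with the true parameters (a quick check, not part of the estimation):

manual=log(1/(2*1.2*sqrt(2*pi))*exp(-(log(2)-0.5)^2/(2*1.2^2)))
builtin=dlnorm(2,meanlog=0.5,sdlog=1.2,log=TRUE)
all.equal(manual,builtin) #Should return TRUE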

Estimation

We can now estimate the mean and the standard deviation of our variable of interest y. To do so, we need to prepare a function for the maximization procedure that takes a single argument, namely the parameters to be estimated.

my_mle=function(theta_funct){ #Single-argument wrapper around llfunction
  return(llfunction(y_lognorm,theta_funct))
}

We can evaluate the function at the true parameter values to check that it works. Since the second element of the parameter vector is the logarithm of the standard deviation, we pass log(sigma):

my_mle(c(mu,log(sigma)))

We can now run the maximization. We first need to define initial values (here we set both elements of the parameter vector equal to one). We also add fnscale=-1 to tell optim() to maximize my_mle() (the default is minimization).

theta_init=c(1,1) #Initial values: Mean 1 and st. dev. exp(1)
optimResults=optim(theta_init,my_mle,method="BFGS",control=list(fnscale=-1))
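As an aside, the wrapper can be avoided by passing the data through optim()'s ... argument. In the sketch below (optimResults2 is just an illustrative name), y_funct is matched by name, so the parameter vector falls through to theta_funct:

optimResults2=optim(theta_init,llfunction,y_funct=y_lognorm,method="BFGS",control=list(fnscale=-1))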

We first must check whether the algorithm achieved convergence: the convergence code equals 0 in case of successful convergence.

optimResults$convergence
## [1] 0

We can then retrieve the estimated parameters:

estimated_mu=optimResults$par[1]
estimated_mu
## [1] 0.4949707
estimated_sd=exp(optimResults$par[2])
estimated_sd
## [1] 1.206872

Our estimates are very close to the true values \(0.5\) and \(1.2\).
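Note that \(\mu\) is the mean of log(y), not of y itself. For a log-normal variable, the mean of y equals \(\exp(\mu + \sigma^2/2)\), so an estimate of it can be obtained from our parameter estimates and compared with the sample mean:

exp(estimated_mu+estimated_sd^2/2) #Implied mean of y
mean(y_lognorm)                    #Sample mean, for comparison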

Standard errors

Last, we can estimate the standard errors of our estimates. To do so, we compute the variance-covariance matrix, i.e., the inverse of the Fisher information matrix, where the information is approximated by the negative of the Hessian of the log-likelihood at the optimum. We need the package numDeriv to compute the Hessian matrix numerically.

library("numDeriv")
H=hessian(my_mle,optimResults$par)
CovarianceMatrix=solve(-H)
CovarianceMatrix
##               [,1]          [,2]
## [1,]  7.282706e-04 -8.886646e-11
## [2,] -8.886646e-11  2.500002e-04

The standard error of our estimate of the mean is given by the square root of the first diagonal element of this matrix:

se_mu=sqrt(CovarianceMatrix[1,1])
se_mu
## [1] 0.02698649
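
A standard error for the estimated standard deviation can be derived in the same spirit. Since that estimate is the exponential of the second parameter, the delta method gives se(sigma) ≈ exp(theta[2])·se(theta[2]); a sketch using the second diagonal element (se_sd is an illustrative name):

se_sd=exp(optimResults$par[2])*sqrt(CovarianceMatrix[2,2]) #Delta method
se_sd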