applet-magic.com
Thayer Watkins
Silicon Valley & Tornado Alley
USA

as a Function of Sample Size
For the sample mean, the dispersion of the distribution is given by the rule

$$\sigma_n = \frac{\sigma_1}{\sqrt{n}}$$

where n is the sample size and σ_n is the standard deviation of the sample mean for samples of size n, so that σ_1 is the standard deviation of the underlying random variable itself. Thus as the sample size increases the distribution of sample means becomes less dispersed. The material below analyzes the dispersion of the sample maximum and minimum as a function of sample size. The analysis for the two extremes is the same, but for definiteness the maximum will be used.
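As a quick numerical check of the rule for the sample mean, the following Python sketch draws repeated samples from a uniform distribution between −0.5 and +0.5 (an assumed test distribution, whose standard deviation is σ_1 = 1/√12) and compares the observed spread of the sample means with σ_1/√n. The function name, seed, and trial count are illustrative choices, not part of the original analysis.

```python
import math
import random
import statistics

def sd_of_sample_means(n, trials=10000, seed=1):
    """Monte Carlo standard deviation of the mean of n uniform(-0.5, +0.5) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.uniform(-0.5, 0.5) for _ in range(n))
             for _ in range(trials)]
    return statistics.pstdev(means)

sigma_1 = 1 / math.sqrt(12)  # standard deviation of a single uniform(-0.5, +0.5) value
for n in (1, 4, 16, 64):
    # observed spread of sample means versus the sigma_1 / sqrt(n) rule
    print(n, round(sd_of_sample_means(n), 4), round(sigma_1 / math.sqrt(n), 4))
```

The two columns should agree to within Monte Carlo noise, and each fourfold increase in n should roughly halve the spread.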
Consider the distribution of sample maximums for samples of a random variable uniformly distributed between -0.5 and +0.5. For n=1 the sample maximum is just the sample value.
If p(x) is the probability density function for a random variable x, let P(x) be the cumulative probability function; i.e.,

$$P(x) = \int_{-\infty}^{x} p(s)\,ds$$
The probability density that the maximum of a sample of size n is x is given by

$$q(x) = n\,P(x)^{n-1}\,p(x)$$
This is the probability density function q(x) for the sample maximum.
The quantity P(x) represents the probability that the random variable has a value less than or equal to x. The (n−1)-th power of P(x) is the probability that the other n−1 values of the sample all have values less than or equal to x, and p(x) gives the probability density for the remaining value being at x. The factor of n represents the fact that the maximum could occur for any one of the n sample values.
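Let Q(x) represent the cumulative probability function for the sample maximum. Since the derivative of the cumulative probability function is just the probability density function, the relation for q(x) is

$$\frac{dQ}{dx} = n\,P(x)^{n-1}\,\frac{dP}{dx} = \frac{d}{dx}\left[P(x)^n\right]$$

and therefore

$$Q(x) = P(x)^n$$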
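The pair q(x) = n P(x)^(n−1) p(x) and Q(x) = P(x)^n can be checked two ways: integrating q numerically should reproduce Q, and the fraction of simulated samples whose maximum is at most x should approach Q(x). The sketch below assumes the uniform distribution on [−0.5, +0.5], for which P(x) = x + 0.5; all helper names, the seed, and the step and trial counts are hypothetical choices for illustration.

```python
import random

def uniform_pdf(x):
    """p(x) for the uniform distribution on [-0.5, +0.5]."""
    return 1.0 if -0.5 <= x <= 0.5 else 0.0

def uniform_cdf(x):
    """P(x) for the uniform distribution on [-0.5, +0.5]."""
    return min(max(x + 0.5, 0.0), 1.0)

def max_pdf(x, n):
    """q(x) = n * P(x)**(n-1) * p(x), the density of the maximum of n draws."""
    return n * uniform_cdf(x) ** (n - 1) * uniform_pdf(x)

def max_cdf(x, n):
    """Q(x) = P(x)**n, the cumulative probability of the maximum."""
    return uniform_cdf(x) ** n

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def empirical_max_cdf(x, n, trials=20000, seed=2):
    """Fraction of simulated samples of size n whose maximum is <= x."""
    rng = random.Random(seed)
    hits = sum(max(rng.uniform(-0.5, 0.5) for _ in range(n)) <= x
               for _ in range(trials))
    return hits / trials

n = 10
for x in (0.2, 0.4, 0.45):
    integral = integrate(lambda t: max_pdf(t, n), -0.5, x)
    # integral of q, the closed form Q = P**n, and the simulated fraction
    print(x, round(integral, 4), round(max_cdf(x, n), 4), round(empirical_max_cdf(x, n), 4))
```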
If the probability density function is nonzero only over a finite range, say [x_min, x_max], then P(x_min) = 0.0 and P(x_max) = 1.0. The cumulative probability function for the sample maximum is defined over the same range; i.e., Q(x_min) = 0.0 and Q(x_max) = 1.0. The larger the sample size, the closer the median of the distribution of the sample maximum lies to x_max, and thus the more concentrated the probability density function is near x_max. Let x_med(n) represent the median point for the probability distribution of the sample maximum for samples of size n. Then

$$Q(x_{med}(n)) = P(x_{med}(n))^n = 0.5 \quad\Longrightarrow\quad P(x_{med}(n)) = 0.5^{1/n}$$
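For example, P(x_med(10)) = $0.5^{1/10}$ = 0.933 and P(x_med(100)) = $0.5^{1/100}$ = 0.9931. For samples of size 100, half of the probability for the sample maximum therefore lies in the top 0.69 percent of the underlying distribution. Thus q(x) is increasingly squeezed against x_max, and its dispersion decreases with the sample size.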
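The median condition P(x_med(n)) = 0.5^(1/n) is easy to tabulate. Assuming the uniform distribution on [−0.5, +0.5], where P(x) = x + 0.5, the median itself is x_med(n) = 0.5^(1/n) − 0.5. This minimal sketch prints both quantities for several sample sizes; the function names are hypothetical.

```python
def median_cdf_level(n):
    """P(x_med(n)): the underlying cumulative probability at the median of the sample maximum."""
    return 0.5 ** (1.0 / n)

def median_of_uniform_max(n):
    """x_med(n) for draws uniform on [-0.5, +0.5], where P(x) = x + 0.5."""
    return median_cdf_level(n) - 0.5

for n in (1, 10, 100, 1000):
    print(n, round(median_cdf_level(n), 4), round(median_of_uniform_max(n), 4))
```

The rows for n = 10 and n = 100 reproduce the values 0.933 and 0.9931 used in the text, and the median climbs toward the upper limit 0.5 as n grows.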
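To see the relation between dispersion, as measured by the standard deviation, and sample size, consider a simple probability density function p(x), the uniform distribution of width σ centered on zero:

$$p(x) = \begin{cases} 1/\sigma & \text{for } -\sigma/2 \le x \le +\sigma/2 \\ 0 & \text{otherwise} \end{cases}$$

With σ = 1 this is the distribution between -0.5 and +0.5 considered earlier.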
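Then P(x) = (x−(−σ/2))/σ = (x+σ/2)/σ for −σ/2 ≤ x ≤ +σ/2. Thus the probability density function for the sample maximum is given by

$$q(x) = n\left(\frac{x+\sigma/2}{\sigma}\right)^{n-1}\frac{1}{\sigma} = \frac{n\,(x+\sigma/2)^{n-1}}{\sigma^n}$$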
Let z = x + σ/2. Then the probability density function for z is

$$q(z) = \frac{n\,z^{n-1}}{\sigma^n} \quad\text{for } 0 \le z \le \sigma$$
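The expected value of z, E{z}, is given by

$$E\{z\} = \int_0^\sigma z\,q(z)\,dz = \frac{n}{\sigma^n}\int_0^\sigma z^{n}\,dz = \frac{n}{n+1}\,\sigma$$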
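Likewise the second moment, E{z²}, is given by

$$E\{z^2\} = \int_0^\sigma z^2\,q(z)\,dz = \frac{n}{\sigma^n}\int_0^\sigma z^{n+1}\,dz = \frac{n}{n+2}\,\sigma^2$$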
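The variance of z is given by

$$\operatorname{Var}\{z\} = E\{z^2\} - (E\{z\})^2 = \sigma^2\left[\frac{n}{n+2} - \frac{n^2}{(n+1)^2}\right] = \frac{n\,\sigma^2}{(n+2)(n+1)^2}$$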
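Finally, the standard deviation of z for a sample of size n, σ_n, reduces to

$$\sigma_n = \frac{\sigma\sqrt{n}}{(n+1)\sqrt{n+2}} = \frac{\sigma}{(n+1)(1+2/n)^{1/2}}$$

Because z differs from x only by the constant σ/2, this is also the standard deviation of the sample maximum x.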
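To double-check the algebra of the moments, the sketch below integrates z·q(z) and z²·q(z) numerically with a midpoint rule for q(z) = n z^(n−1)/σ^n on [0, σ], taking σ = 1 as an arbitrary choice, and compares the results with n/(n+1), n/(n+2), and the closed-form standard deviation σ√n/((n+1)√(n+2)). The helper names and step count are illustrative assumptions.

```python
import math

def moments_of_max(n, sigma=1.0, steps=20000):
    """Numerical E{z} and E{z^2} for q(z) = n z^(n-1) / sigma^n on [0, sigma] (midpoint rule)."""
    h = sigma / steps
    m1 = m2 = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        density = n * z ** (n - 1) / sigma ** n
        m1 += z * density * h
        m2 += z * z * density * h
    return m1, m2

def closed_form_sd(n, sigma=1.0):
    """sigma_n = sigma * sqrt(n) / ((n + 1) * sqrt(n + 2))."""
    return sigma * math.sqrt(n) / ((n + 1) * math.sqrt(n + 2))

for n in (1, 5, 25):
    m1, m2 = moments_of_max(n)
    numeric_sd = math.sqrt(m2 - m1 ** 2)
    print(n, round(m1, 5), round(n / (n + 1), 5),
          round(m2, 5), round(n / (n + 2), 5),
          round(numeric_sd, 5), round(closed_form_sd(n), 5))
```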
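The standard deviation for the probability density function p(x) is not the parameter σ. Instead, because the variance of a uniform distribution of width σ is σ²/12,

$$\sigma_1 = \frac{\sigma}{\sqrt{12}}$$

which agrees with the formula for σ_n at n = 1, as it must, since the maximum of a sample of one is just the sample value. Thus

$$\sigma_n = \frac{\sqrt{12}\,\sigma_1}{(n+1)(1+2/n)^{1/2}}$$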
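A Monte Carlo check of this formula can be sketched as follows, assuming again the uniform distribution on [−0.5, +0.5] (σ = 1, so σ_1 = 1/√12). It compares the observed standard deviation of simulated sample maxima with √12 σ_1/((n+1)(1+2/n)^(1/2)); the function names, seed, and trial count are hypothetical.

```python
import math
import random
import statistics

def sd_of_sample_max(n, trials=10000, seed=3):
    """Monte Carlo standard deviation of the maximum of n uniform(-0.5, +0.5) draws."""
    rng = random.Random(seed)
    maxima = [max(rng.uniform(-0.5, 0.5) for _ in range(n)) for _ in range(trials)]
    return statistics.pstdev(maxima)

def predicted_sd_of_max(n, sigma_1=1 / math.sqrt(12)):
    """sqrt(12) * sigma_1 / ((n + 1) * (1 + 2/n)**0.5) for a uniform parent distribution."""
    return math.sqrt(12) * sigma_1 / ((n + 1) * math.sqrt(1 + 2 / n))

for n in (1, 2, 10, 50):
    print(n, round(sd_of_sample_max(n), 4), round(predicted_sd_of_max(n), 4))
```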
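So, in contrast to the case of the sample mean, in which the dispersion of the sample mean is inversely proportional to √n, the dispersion of the sample maximum for the simple case being considered is inversely proportional to $(n+1)(1+2/n)^{1/2}$. Since $(1+2/n)^{1/2}$ approaches 1 as n grows, the standard deviation of the sample maximum eventually falls off roughly as 1/n, much faster than the 1/√n of the sample mean.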
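The contrast can be made concrete by tabulating both standard deviations in units of σ_1: 1/√n for the sample mean and √12/((n+1)(1+2/n)^(1/2)) for the maximum of a uniform sample. This short sketch does only that arithmetic; the function names are illustrative.

```python
import math

def sd_mean_ratio(n):
    """sigma_n / sigma_1 for the sample mean: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def sd_max_ratio(n):
    """sigma_n / sigma_1 for the maximum of a uniform sample: sqrt(12) / ((n+1) * sqrt(1 + 2/n))."""
    return math.sqrt(12) / ((n + 1) * math.sqrt(1 + 2 / n))

print(f"{'n':>6} {'mean':>8} {'maximum':>8}")
for n in (1, 2, 7, 8, 10, 100, 1000):
    print(f"{n:>6} {sd_mean_ratio(n):8.4f} {sd_max_ratio(n):8.4f}")
```

By this arithmetic the maximum is actually slightly more dispersed than the mean for small samples (up to n = 7); from n = 8 on the roughly 1/n decay of the maximum wins, and by n = 100 its standard deviation is about a third of that of the mean.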
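Since z = x + σ/2 and E{z} = (n/(n+1))σ, the expected value of the sample maximum x is

$$E\{x\} = \frac{n}{n+1}\,\sigma - \frac{\sigma}{2} = \frac{\sigma}{2} - \frac{\sigma}{n+1}$$

The population maximum is σ/2, so the bias of the sample maximum is −σ/(n+1), which goes to zero as n increases. Thus the sample maximum is an asymptotically unbiased estimator of the population maximum.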
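The shrinking bias can also be seen by simulation. Since E{x} = E{z} − σ/2 = σ/2 − σ/(n+1), the average of many sample maxima should fall short of the population maximum σ/2 by about σ/(n+1). The sketch below assumes once more the uniform distribution on [−0.5, +0.5] (σ = 1); the function name, seed, and trial count are arbitrary.

```python
import random
import statistics

def mean_of_sample_max(n, trials=10000, seed=4):
    """Monte Carlo average of the maximum of n uniform(-0.5, +0.5) draws."""
    rng = random.Random(seed)
    return statistics.fmean(
        max(rng.uniform(-0.5, 0.5) for _ in range(n)) for _ in range(trials)
    )

sigma = 1.0
for n in (1, 5, 20, 100):
    expected = sigma / 2 - sigma / (n + 1)  # E{x_max}; the bias is -sigma/(n+1)
    print(n, round(mean_of_sample_max(n), 4), round(expected, 4))
```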
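For the sample minimum the same relations apply. The only modification is that the appropriate cumulative probability function is the probability that the random variable exceeds x; i.e.,

$$1 - P(x) = \int_{x}^{\infty} p(s)\,ds$$

so that the probability density function for the sample minimum is

$$q_{min}(x) = n\,[1 - P(x)]^{n-1}\,p(x)$$

and the probability that the sample minimum is less than or equal to x is 1 − [1 − P(x)]^n.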
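For the sample minimum, replacing P(x) by 1 − P(x) gives the cumulative probability 1 − [1 − P(x)]^n that the minimum of n draws is at most x. A minimal sketch of the corresponding simulation check, again assuming the uniform distribution on [−0.5, +0.5] with P(x) = x + 0.5, follows; the names, seed, and trial count are hypothetical.

```python
import random

def empirical_min_cdf(x, n, trials=20000, seed=5):
    """Fraction of simulated samples of size n (uniform on [-0.5, +0.5]) whose minimum is <= x."""
    rng = random.Random(seed)
    hits = sum(min(rng.uniform(-0.5, 0.5) for _ in range(n)) <= x
               for _ in range(trials))
    return hits / trials

def exact_min_cdf(x, n):
    """1 - (1 - P(x))**n with P(x) = x + 0.5, clipped to [0, 1]."""
    p = min(max(x + 0.5, 0.0), 1.0)
    return 1 - (1 - p) ** n

n = 10
for x in (-0.45, -0.4, -0.2):
    print(x, round(empirical_min_cdf(x, n), 4), round(exact_min_cdf(x, n), 4))
```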