San José State University Department of Economics
applet-magic.com, Thayer Watkins, Silicon Valley & Tornado Alley, USA
The Determination of Trends of Temperature and Other Variables Which are the Cumulative Sum of Random Disturbances
This material is concerned with the objective determination of trends for data generated by a process of the form

Y(t) = Y(t−1) + U(t)
where U(t) is a random variable. Such data will appear to have trends even if the expected value of U(t) is zero for all t. This is illustrated below; each new random sample yields a time series with a different apparent trend.
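As a stand-in for the interactive illustration on the original page, here is a minimal Python sketch (the sample size and the normal distribution are illustrative assumptions, not from the original) that accumulates zero-mean disturbances; rerunning it plays the role of clicking REFRESH, and the fitted slope is frequently far from zero even though no trend is present.

```python
import numpy as np

rng = np.random.default_rng()   # a fresh generator plays the role of the REFRESH button

n = 150                                          # number of observations Y(0), ..., Y(n-1)
U = rng.normal(loc=0.0, scale=1.0, size=n - 1)   # zero-mean disturbances U(1), ..., U(n-1)
Y = np.concatenate(([0.0], np.cumsum(U)))        # Y(t) = Y(t-1) + U(t), starting from Y(0) = 0

# Regress the levels on time; the fitted slope is often far from zero
# even though the process has no trend at all.
t = np.arange(n)
slope, intercept = np.polyfit(t, Y, 1)
print(f"apparent trend (regression slope) = {slope:.4f}")
```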
Temperature statistics have such a stochastic structure because the rate of change of temperature T for some region is given by

C·(dT/dt) = v(t)
where T and t are temperature and time, respectively, v(t) is the net energy inflow and C is the heat capacity coefficient of the region. Thus

T(t) = T(0) + (1/C)·∫ v(s) ds, the integral running from 0 to t,

so the temperature is the cumulative sum of the scaled net energy inflows.
This stochastic structure also applies to humidity and soil moisture. It also might apply to wind as a vector quantity, but that is a separate topic.
When there is no variation in the parameters of the probability distribution of the random disturbances, the temperature reaches a level at which the average value of the temperature changes is zero.
For comparison with the trends shown above for a variable which is the cumulative sum of random disturbances, the average global temperatures for the period 1855 to 2003 are shown below.
The changes in global average temperature from one year to another for 1856 to 2003 are shown below:
In general, when confronted with time series data and asked about possible trends, a statistical analyst would regress the data on time and ascertain whether the regression slope coefficient is statistically significantly different from zero. For data of this type that procedure is inappropriate: if there were only one unusually large random increment part way through the data interval it would make the data appear to have a permanent shift, when all that was involved was one atypical value. Instead the proper procedure is to compute the first differences of the data and carry out the statistical analysis on those first differences. This analysis could include a regression on time and tests of the statistical significance of the regression intercept and slope coefficients, but it would also include simpler analyses.
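The contrast between the two procedures can be sketched in a few lines of Python (an illustrative example, not from the original page, using a simulated driftless random walk): regress the levels on time, and separately test whether the mean of the first differences differs from zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 150
Y = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, 1.0, n - 1))))  # driftless random walk
t = np.arange(n)

# Naive procedure: regress the levels on time.
level_fit = stats.linregress(t, Y)
print(f"levels:      slope = {level_fit.slope:.4f}, p-value = {level_fit.pvalue:.3g}")

# Proper procedure: analyze the first differences U(t) = Y(t) - Y(t-1).
U = np.diff(Y)
diff_test = stats.ttest_1samp(U, popmean=0.0)
print(f"differences: mean  = {U.mean():.4f}, p-value = {diff_test.pvalue:.3g}")
```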
The raw data consist of n observations on a variable Y. The values of Y are labeled from 0 to (n−1). The number of random increments is (n−1) and their values are labeled from 1 to (n−1). The variable Y can be considered the cumulative sum of the random changes; i.e.,

Y(t) = Y(0) + U(1) + U(2) + … + U(t)
For simplicity let y(t) and u(t) denote the deviations from the interval means.
The time variable t can also be expressed as a deviation from its interval average; denote this deviation Δt, which is a function of time. If there are n sequential observations labeled 0 to n−1, then the average value of t is (n−1)/2 and Δt = t−(n−1)/2. Thus the Δt values range from −(n−1)/2 to +(n−1)/2. Note that their sum over t = 0 to n−1, written ΣΔt below, is necessarily zero.
In the above notation the regression slope coefficient b is

b = [Σ Δt·y(t)] / [Σ (Δt)²]

with both sums running over t = 0 to n−1.
For convenience let Σ(Δt)², the sum over t = 0 to n−1, be denoted as V.
Let the interval average value of Y be denoted as Ȳ. Then

y(t) = Y(t) − Ȳ and b = (1/V)·Σ Δt·[Y(t) − Ȳ]

Note that

Y(t) = Y(0) + U(1) + U(2) + … + U(t)

This means that b is a weighted sum of Y(0), Ȳ and the disturbances U(1) through U(n−1). Since ΣΔt = 0, the terms involving Y(0) and Ȳ drop out, and collecting the coefficient on each U(s) gives

b = (1/V)·Σ U(s)·[sum of Δt from t = s to t = n−1]

with the outer sum running over s = 1 to n−1. This latter expression, the sum of Δt from t = s to t = n−1, evaluates to

s(n−s)/2

Thus the weight of U(s) in the computation of the regression trend is

κ(s) = s(n−s)/(2V)

This is a parabolic function of s with a maximum near the middle of the interval.
The sum of the squared deviations for the time variable, V, evaluates to n(n²-1)/12.
The regression slope can be expressed as

b = Σ κ(s)·U(s), summed over s = 1 to n−1, where κ(s) = s(n−s)/(2V) = 6s(n−s)/[n(n²−1)]
The profile of these weights is a parabola, starting at 0 at s=0, rising to a maximum at s=n/2 and falling back to 0 at s=n. This means that the random changes in the middle of the interval have an unduly large influence on the regression estimate of the trend in the temperature.
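As a numerical check on this decomposition (a sketch under the assumption of standard normal disturbances; the function and variable names are illustrative, not from the original), the weights κ(s) can be computed and the weighted sum of the disturbances compared with the ordinary least-squares slope fitted to the levels.

```python
import numpy as np

def kappa_weights(n):
    """Weights kappa(s) = s(n-s)/(2V) for s = 1..n-1, with V = n(n^2-1)/12."""
    s = np.arange(1, n)
    V = n * (n**2 - 1) / 12.0
    return s * (n - s) / (2.0 * V)

n = 7
rng = np.random.default_rng(1)
U = rng.normal(size=n - 1)                       # disturbances U(1)..U(n-1)
Y = np.concatenate(([0.0], np.cumsum(U)))        # cumulative sums Y(0)..Y(n-1)

ols_slope = np.polyfit(np.arange(n), Y, 1)[0]    # regression of the levels on time
weighted_sum = kappa_weights(n) @ U              # sum of kappa(s)*U(s)

print(kappa_weights(n))          # 0.1071, 0.1786, 0.2143, 0.2143, 0.1786, 0.1071 for n = 7
print(ols_slope, weighted_sum)   # the two agree up to rounding
```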
It is worthwhile to verify the above algebraic relationships for a couple of cases. Let n=5. Then the data are:
Time t | Δt | (Δt)² | Y | ΣΔt (t to n−1) | κ(t)
---|---|---|---|---|---
0 | −2 | 4 | Y(0) | |
1 | −1 | 1 | Y(0)+U(1) | 2 | 0.2
2 | 0 | 0 | Y(0)+U(1)+U(2) | 3 | 0.3
3 | 1 | 1 | Y(0)+U(1)+U(2)+U(3) | 3 | 0.3
4 | 2 | 4 | Y(0)+U(1)+U(2)+U(3)+U(4) | 2 | 0.2
Sums | 0 | 10 | | 10 | 1.0
For n=7 the data are:
Time t | Δt | (Δt)² | Y | ΣΔt (t to n−1) | κ(t)
---|---|---|---|---|---
0 | −3 | 9 | Y(0) | |
1 | −2 | 4 | Y(0)+U(1) | 3 | 0.1071
2 | −1 | 1 | Y(0)+U(1)+U(2) | 5 | 0.1786
3 | 0 | 0 | Y(0)+U(1)+U(2)+U(3) | 6 | 0.2143
4 | 1 | 1 | Y(0)+U(1)+U(2)+U(3)+U(4) | 6 | 0.2143
5 | 2 | 4 | Y(0)+U(1)+U(2)+U(3)+U(4)+U(5) | 5 | 0.1786
6 | 3 | 9 | Y(0)+U(1)+U(2)+U(3)+U(4)+U(5)+U(6) | 3 | 0.1071
Sums | 0 | 28 | | 28 | 1.0
It is statistically inappropriate and inefficient to give a much higher weight to the random terms in the middle of the interval compared to those at the ends of the interval. For n=5 the weight for the middle terms is 50 percent higher than the weight for the ends. For n=7 the middle terms have weights which are 100 percent larger than the weights for the end terms. For n=21 the weights for the middle terms are 450 percent larger than the weights for the end terms.
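The ratio of the middle weight to the end weight quoted above can be reproduced directly from κ(s) ∝ s(n−s); a small check with illustrative names:

```python
import numpy as np

def middle_to_end_ratio(n):
    """Ratio of the largest weight kappa(s) to the weight at s = 1 (or s = n-1)."""
    s = np.arange(1, n)
    w = s * (n - s)            # proportional to kappa(s); the factor 1/(2V) cancels
    return w.max() / w[0]

for n in (5, 7, 21):
    r = middle_to_end_ratio(n)
    print(f"n = {n:2d}: middle weight is {100 * (r - 1):.0f} percent larger than the end weight")
# prints 50, 100 and 450 percent, matching the figures in the text
```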
Nevertheless the fact that the weights κ(t) sum to unity indicates that the regression coefficient is an unbiased estimate of the trend in Y(t). Thus the problem is one of statistical efficiency rather than unbiasedness.
The graphs below illustrate the effect by showing three cases in which there is only one nonzero disturbance over the data interval but differing in when during the interval the disturbance occurs. The regression lines are shown in blue.
As seen in the graphs the regression estimate of the trend is much higher for the disturbance in the middle of the interval than for a disturbance at either end of the interval.
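The same effect can be reproduced numerically (a hypothetical illustration: a single unit disturbance is placed near the start, in the middle, and near the end of an interval with n = 21):

```python
import numpy as np

n = 21
t = np.arange(n)

for position in (1, n // 2, n - 1):              # disturbance near the start, middle and end
    U = np.zeros(n - 1)
    U[position - 1] = 1.0                        # single unit disturbance U(position) = 1
    Y = np.concatenate(([0.0], np.cumsum(U)))
    slope = np.polyfit(t, Y, 1)[0]
    print(f"disturbance at s = {position:2d}: regression slope = {slope:.4f}")
# the slope is largest when the single disturbance falls in the middle of the interval
```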
However there is another unbiased estimate of the trend which gives equal weight to all of the random changes. That is

b* = [Y(n−1) − Y(0)]/(n−1) = [U(1) + U(2) + … + U(n−1)]/(n−1)
This estimate of trend is illustrated for the three cases previously considered.
As seen above the trend is the same for all three cases in contrast to the trend estimated using regression.
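Continuing the same hypothetical single-disturbance cases, the estimate b* = [Y(n−1) − Y(0)]/(n−1) depends only on the endpoints, so the location of the disturbance does not matter:

```python
import numpy as np

n = 21
for position in (1, n // 2, n - 1):
    U = np.zeros(n - 1)
    U[position - 1] = 1.0                        # the same single unit disturbance as before
    Y = np.concatenate(([0.0], np.cumsum(U)))
    b_star = (Y[-1] - Y[0]) / (n - 1)            # equal-weight estimate of the trend
    print(f"disturbance at s = {position:2d}: b* = {b_star:.4f}")
# b* = 1/(n-1) = 0.05 in all three cases
```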
This estimate of the trend is unbiased only if the interval of analysis is not selected with reference to the trend. There are some infamous cases of climatologists selecting an interval which had an extreme trend and presenting that trend estimate as the long-term trend in the variable. Such selection of the interval of analysis produces a figure that is statistically unrelated to the parameters of the distribution of the variable; such estimates are meaningless.
In general the statistical analysis of time series of variables which are the cumulative sum of random disturbances should be carried out on the values of U(t), the first differences Y(t)−Y(t−1). Whether there is a trend in the Y variable depends upon the expected value of the random variables U(t); i.e., the Y variable has a trend if and only if

E{U(t)} ≠ 0
Take the general case in which the trend estimate b# is a weighted average of the disturbances; i.e.,

b# = w(1)·U(1) + w(2)·U(2) + … + w(n−1)·U(n−1)
Assume that the expected values of the random disturbances and their variances and covariances are constant over time and given by

E{U(s)} = μ, Var(U(s)) = σ², Cov(U(r), U(s)) = γ for r ≠ s

Then

b# − E{b#} = Σ w(s)·[U(s) − μ]

Therefore the expected value and variance of b# are given by

E{b#} = μ·Σ w(s)
Var(b#) = σ²·Σ w(s)² + γ·Σ w(r)·w(s), the second sum running over all pairs r ≠ s
Now the question can be asked as to which weights give the smallest variance. Without a constraint on the sum of the weights the answer would be all-zero weights. Assume instead that the weights must sum to unity, i.e., Σ w(s) = 1. This guarantees that b# is an unbiased estimate of μ, since then E{b#} = μ·Σ w(s) = μ.
The question is now: what values of the weights, subject to the condition that their sum is unity, will minimize Var(b#)? The Lagrangian multiplier method can be used to answer this question. The first order condition for a constrained minimum is

2σ²·w(s) + 2γ·[1 − w(s)] − λ = 0 for each s

where λ is the Lagrangian multiplier. This condition reduces to

w(s) = [λ − 2γ]/[2(σ² − γ)], the same value for every s
The second order conditions for a minimum are satisfied as well.
Since the weights sum to unity this means that the weights must all be equal to 1/(n-1). (Note that n is the number of observations, n-1 is the number of random disturbances.)
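As a numerical sanity check (a sketch assuming uncorrelated disturbances, so that γ = 0 and Var(b#) is proportional to Σ w(s)²), the equal weights can be compared against the regression weights κ(s) and against arbitrary weights normalized to sum to one:

```python
import numpy as np

def variance_factor(w):
    """Var(b#)/sigma^2 = sum of w(s)^2 when the disturbances are uncorrelated."""
    return np.sum(np.asarray(w) ** 2)

n = 7
s = np.arange(1, n)
V = n * (n**2 - 1) / 12.0

equal_w = np.full(n - 1, 1.0 / (n - 1))          # the efficient, equal weights
kappa_w = s * (n - s) / (2.0 * V)                # the regression weights
rng = np.random.default_rng(2)
random_w = rng.random(n - 1)
random_w /= random_w.sum()                       # arbitrary weights normalized to sum to one

for name, w in [("equal", equal_w), ("kappa", kappa_w), ("random", random_w)]:
    print(f"{name:7s} weights: Var(b#)/sigma^2 = {variance_factor(w):.4f}")
# the equal weights give the smallest value, 1/(n-1) = 0.1667 for n = 7
```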
Thus the most efficient estimate of the value of E{U(t)}, the trend rate, is the mean value of the random variable given previously as

b* = [U(1) + U(2) + … + U(n−1)]/(n−1) = [Y(n−1) − Y(0)]/(n−1)
This estimate can be compared with the standard deviation of U computed from the values for the interval, divided by the square root of the number of disturbances to give the standard error of the mean, to establish whether or not the estimated trend is significantly different from zero.
If the U(t)'s are not correlated with each other then the variance of b* is equal to σ²/(n−1), where σ² is the variance of the U(t)'s. Presuming no correlation of the U(t)'s, the variance of the regression estimate of the trend is 1.04 times that of b* for n=5 and 1.071 times for n=7. The reciprocal of this ratio of variances may be called the statistical efficiency of the linear regression estimate. Thus the efficiency of the regression estimate is 96 percent for n=5 and 93 percent for n=7. The linear regression estimate of the trend becomes progressively less efficient as the sample size increases but asymptotically approaches a limiting value of about 83 percent. The case of n = 20 is computed in the sketch below, along with several other sample sizes.
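These efficiency figures can be reproduced from the weights (again a sketch assuming uncorrelated disturbances): the variance ratio is (n−1)·Σκ(s)² and the efficiency is its reciprocal.

```python
import numpy as np

def regression_efficiency(n):
    """Efficiency of the OLS trend estimate relative to b*, assuming uncorrelated U(t)."""
    s = np.arange(1, n)
    V = n * (n**2 - 1) / 12.0
    kappa = s * (n - s) / (2.0 * V)
    variance_ratio = (n - 1) * np.sum(kappa**2)   # Var(regression slope) / Var(b*)
    return 1.0 / variance_ratio

for n in (5, 7, 20, 100, 10_000):
    print(f"n = {n:5d}: efficiency = {100 * regression_efficiency(n):.1f} percent")
# 96.2 and 93.3 percent for n = 5 and n = 7, declining toward roughly 83 percent
# (the limit under these assumptions works out to 5/6) as n grows large
```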
(To be continued.)