U.S. Energy Information Administration

Commercial Buildings Energy Consumption Survey (CBECS)

Estimation of Standard Errors

Sampling error is the difference between the survey estimate and the true population value due to the use of a random sample to estimate the population. This difference arises because a random subset, rather than the whole population, is observed. The typical magnitude of the sampling error is measured by the standard error of the estimate. The standard error is the root-mean-square difference between the estimate based on a particular sample and the value that would be obtained by averaging estimates over all possible samples.

If the estimates are unbiased, meaning there is no systematic error, this average over all possible samples is the true population value. In this case, the standard error is simply the root-mean-square difference between the survey estimate and the true population value. If systematic error is present, however, this bias is not included in the error measured by the standard error. Thus, the standard error tends to understate the total estimation error if there are non-negligible biases.
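The distinction between the standard error and the total estimation error can be illustrated with a small simulation. The following is only a sketch under stated assumptions: it uses simple random sampling from a made-up population (not the CBECS design), approximates "all possible samples" with a finite number of random draws, and injects a hypothetical constant bias. The function name `simulate_errors` and its parameters are illustrative and do not come from the CBECS documentation.

```python
import math
import random
import statistics


def simulate_errors(population, sample_size, bias=0.0, n_samples=2000, seed=0):
    """Approximate the standard error, bias, and total RMS error of a sample mean.

    Assumption: simple random sampling without replacement, with a constant
    `bias` added to every estimate to mimic a systematic error.
    """
    rng = random.Random(seed)
    true_value = statistics.fmean(population)

    # One estimate per simulated sample (stand-in for "all possible samples").
    estimates = [
        statistics.fmean(rng.sample(population, sample_size)) + bias
        for _ in range(n_samples)
    ]
    average_estimate = statistics.fmean(estimates)

    # Standard error: RMS difference from the average over samples.
    standard_error = math.sqrt(
        statistics.fmean((e - average_estimate) ** 2 for e in estimates)
    )
    # Total error: RMS difference from the true population value.
    total_error = math.sqrt(
        statistics.fmean((e - true_value) ** 2 for e in estimates)
    )
    return standard_error, average_estimate - true_value, total_error


population = [float(v) for v in range(1, 1001)]  # hypothetical population
se, est_bias, total = simulate_errors(population, sample_size=50, bias=20.0)
print(f"standard error={se:.2f}  bias={est_bias:.2f}  total RMS error={total:.2f}")
```

In this sketch the squared total error equals the squared standard error plus the squared bias, so whenever the bias is non-negligible the standard error alone understates the total error.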

In principle, random errors can be introduced into the estimate by sources other than the sampling process. Such additional sources of random error include random errors by respondents and data entry staff and random unit nonresponse. To recognize these additional sources of variation, the definition of the sampling process can be expanded to include not just the selection of buildings but all steps required to obtain a set of responses. Under this expanded definition, all random errors can be regarded as sampling errors. The procedures designed to estimate the sampling error for CBECS incorporate all random components of the estimation process.

Jackknife Replication

Throughout this report, standard errors are given as percents of their estimated values, that is, as relative standard errors (RSEs). Computations of standard errors are more conveniently described, however, in terms of the estimation variance, which is the square of the standard error.

For some types of surveys, a convenient algebraic formula for computing variances can be obtained. The CBECS used a list-supplemented, multistage area sample design of such complexity that it is virtually impossible to construct an exact algebraic expression for estimating variances. In particular, convenient formulas based on an assumption of simple random sampling, typical of most standard statistical packages, are entirely inappropriate for the CBECS estimates. Such formulas tend to give severely understated standard errors, making the estimates appear much more accurate than is the case.

The method used to estimate sampling variances for this survey was a jackknife replication method. The idea behind replication methods is to form several pseudoreplicates of the sample by selecting subsets of the full sample. The subsets are selected in such a way that the observed variance of estimates based on the different pseudoreplicates estimates the sampling variance in the overall estimate.

The sampling strata are first divided into K groups, and each group is divided into 2 (or occasionally 3) members. The kth jackknife pseudoreplicate sample (k = 1, ..., K) is obtained by deleting all observations from one of the members in the kth group and multiplying the weights on all cases in the remaining members of that group by 2 if the group has 2 members and by 1.5 if it has 3 members. Observations in all other groups are unaffected. The kth pseudoestimate is then obtained from this pseudoreplicate sample by following all the steps used to construct the full-sample estimate.
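The weight adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the CBECS production procedure: the function name `replicate_weights`, the input layout (parallel lists of weights, group labels, and member labels), and the choice to delete the lowest-labeled member of each group are all assumptions made for the example. The text only says that "one of the members" is deleted.

```python
from collections import defaultdict

# Weight multipliers stated in the text: 2 for 2-member groups, 1.5 for 3-member groups.
FACTORS = {2: 2.0, 3: 1.5}


def replicate_weights(weights, group_ids, member_ids):
    """Return {group: replicate weights}, one pseudoreplicate per group.

    Assumption: the member with the smallest label in each group is the one deleted.
    """
    members_in_group = defaultdict(set)
    for g, m in zip(group_ids, member_ids):
        members_in_group[g].add(m)

    replicates = {}
    for g, members in members_in_group.items():
        if len(members) not in FACTORS:
            raise ValueError(f"group {g!r} must have 2 or 3 members")
        dropped = min(members)
        factor = FACTORS[len(members)]

        rep = []
        for w, gi, mi in zip(weights, group_ids, member_ids):
            if gi != g:
                rep.append(w)           # other groups are unaffected
            elif mi == dropped:
                rep.append(0.0)         # deleted member contributes nothing
            else:
                rep.append(w * factor)  # remaining members are weighted up
        replicates[g] = rep
    return replicates


# Hypothetical data: group 1 has 2 members, group 2 has 3 members.
weights = [10.0, 12.0, 8.0, 6.0, 4.0]
group_ids = [1, 1, 2, 2, 2]
member_ids = [1, 2, 1, 2, 3]
for g, rep in replicate_weights(weights, group_ids, member_ids).items():
    print(g, rep)
```

Each pseudoestimate would then be computed from one set of replicate weights in the same way as the full-sample estimate is computed from the original weights.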

The variances are estimated from the pseudoestimates in the following way. Let X' be a survey estimate (based on the full sample) of characteristic X for a certain category of buildings. For example, X may be the total square footage of buildings using natural gas in the Midwest. Let X'_k be the pseudoestimate of X based on the kth pseudoreplicate sample. The estimated variance of the full-sample estimate X' is then given by:

\[ \widehat{\mathrm{Var}}(X') = \sum_{k} \left( X'_k - X' \right)^2 \]

The standard error of X' is given by:

\[ \mathrm{SE}(X') = \sqrt{\widehat{\mathrm{Var}}(X')} \]

The relative standard error (percent) of X' is obtained from this standard error as:

\[ \mathrm{RSE}(X') = 100 \times \frac{\mathrm{SE}(X')}{X'} \]
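The three quantities above can be computed from a full-sample estimate and its pseudoestimates as in the sketch below. It assumes the variance is the sum, over all pseudoreplicates, of the squared differences between each pseudoestimate and the full-sample estimate, which is the usual form for this kind of delete-one-member jackknife. The function name `jackknife_rse` and the numbers in the usage lines are hypothetical.

```python
import math


def jackknife_rse(full_estimate, pseudoestimates):
    """Return (variance, standard error, RSE in percent) for a full-sample estimate.

    Assumption: variance = sum over replicates of (pseudoestimate - full estimate)^2.
    """
    variance = sum((xk - full_estimate) ** 2 for xk in pseudoestimates)
    standard_error = math.sqrt(variance)
    rse_percent = 100.0 * standard_error / full_estimate
    return variance, standard_error, rse_percent


# Hypothetical full-sample estimate and three pseudoestimates.
variance, se, rse = jackknife_rse(100.0, [103.0, 96.0, 100.0])
print(variance, se, rse)
```

With these made-up inputs the squared differences are 9, 16, and 0, so the variance is 25, the standard error is 5, and the RSE is 5 percent.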

Specific questions on these topics may be directed to:

Jay Olsen
jay.olsen@eia.doe.gov