# Estimation of Standard Errors

The standard error, or sampling error, of a survey estimate measures the variation among the estimates from all possible samples, so it indicates how precisely an estimate from a particular sample approximates the average over all possible samples. An estimate from a particular sample differs from that average because we observe a random subset rather than the whole population. The standard error of the estimate measures the typical magnitude of this sampling error.

If the estimates are unbiased, which means they have no systematic error, this average over all possible samples is the true population value. If systematic error is present, however, this bias is not included in the error measured by the standard error. So, the standard error tends to understate the total estimation error if non-negligible biases exist.
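One standard way to make this point precise, which is not part of the original text, is the textbook decomposition of mean squared error. Writing $X'$ for the survey estimate, $X$ for the true population value, and $E(X')$ for the average of the estimate over all possible samples (symbols chosen here for illustration), the total expected squared error splits into a variance term and a squared-bias term:

$$E\left[(X' - X)^2\right] = \operatorname{Var}(X') + \left[E(X') - X\right]^2$$

The standard error is the square root of the first term only, so it equals the root mean squared error when the bias term is zero and is smaller than it otherwise.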

In principle, sources other than the sampling process can cause random errors. These sources could include random errors by respondents and data entry staff, as well as nonresponse from buildings. To include these sources, we can expand the definition of the sampling process to include not just the building selection but all steps required to obtain a set of responses. Under this expanded definition, we can regard all random errors as sampling errors. The procedures for estimating the sampling error for CBECS incorporate all random components of the estimation process.

## Jackknife replication

Throughout the CBECS data tables, we represent standard errors as percentages of their estimated values (that is, as relative standard errors [RSEs]). Computations of standard errors are more conveniently described, however, in terms of the estimation variance, which is the square of the standard error.

For some types of surveys, a simple algebraic formula can be used to compute variances. For CBECS, however, we used a list-supplemented, multistage area sample design so complex that it’s almost impossible to construct an exact algebraic expression for estimating variances. In particular, formulas based on an assumption of simple random sampling, typical of most standard statistical packages, don’t work for CBECS estimates. These formulas understate standard errors, making the estimates appear much more accurate than they are.

We used the jackknife replication method to estimate sampling variances in CBECS. Replication methods are used to form several pseudoreplicates of the sample by selecting subsets of the full sample. The subsets are selected in such a way that the observed variance of estimates, based on the different pseudoreplicates, is an approximation of the sampling variance in the overall estimate.

The sampling strata are first divided into groups, and each group is divided into two (or occasionally three) members. We obtain the $k$th jackknife pseudoreplicate sample by deleting all observations from one of the members in the $k$th group and multiplying the weights on all cases in the other members of that group by 2 if the group has two members or by 1.5 if it has three. Observations in all other groups are unaffected. The $k$th pseudoestimate is then obtained from this pseudoreplicate sample by following all the steps used to construct the full-sample estimate.
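The weight adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the CBECS production procedure: the record layout (`weight`, `group`, `member`), the function name `replicate_weights`, and the choice of which member to delete are all assumptions made for the example.

```python
def replicate_weights(cases, k, dropped_member=None):
    """Return adjusted weights for the pseudoreplicate built from group k.

    `cases` is a list of dicts with hypothetical keys 'weight', 'group',
    and 'member'. The text does not say which member is deleted, so by
    default this sketch drops the lowest-numbered member (an assumption).
    """
    group_members = sorted({c["member"] for c in cases if c["group"] == k})
    n = len(group_members)
    if n < 2:
        raise ValueError("group must have at least two members")
    if dropped_member is None:
        dropped_member = group_members[0]

    # n / (n - 1) gives 2.0 for two members and 1.5 for three members,
    # matching the multipliers stated in the text.
    factor = n / (n - 1)

    weights = []
    for c in cases:
        if c["group"] != k:
            weights.append(c["weight"])           # other groups unaffected
        elif c["member"] == dropped_member:
            weights.append(0.0)                   # deleted observations
        else:
            weights.append(c["weight"] * factor)  # remaining members scaled up
    return weights


# Made-up example: group 1 has two members, group 2 has three.
cases = [
    {"weight": 10.0, "group": 1, "member": 1},
    {"weight": 12.0, "group": 1, "member": 2},
    {"weight": 8.0, "group": 2, "member": 1},
    {"weight": 6.0, "group": 2, "member": 2},
    {"weight": 4.0, "group": 2, "member": 3},
]

print(replicate_weights(cases, 1))  # [0.0, 24.0, 8.0, 6.0, 4.0]
print(replicate_weights(cases, 2))  # [10.0, 12.0, 0.0, 9.0, 6.0]
```

Setting a deleted case's weight to zero has the same effect on a weighted estimate as removing the case, and it keeps every pseudoreplicate the same length as the full sample.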

The variances are estimated from the pseudoestimates in the following way. Let $X'$ be a survey estimate (based on the full sample) of characteristic $X$ for a certain category of buildings. For example, $X$ may be the total square footage of buildings using natural gas in the Midwest. Let $R$ be the number of pseudoreplicates (151 for the 2018 CBECS). Let $X_k'$ be the pseudoestimate of $X$ based on the $k$th pseudoreplicate sample. The estimated variance of the full-sample estimate $X'$ is then given by:

$$\operatorname{Var}(X') = \sum_{k=1}^{R} \left(X_k' - X'\right)^2$$

The standard error of $X'$ is given by:

$$S(X') = \sqrt{\operatorname{Var}(X')}$$

The relative standard error of $X'$ (standard error as a percentage of $X'$) is:

$$\operatorname{RSE}(X') = \frac{S(X')}{X'} \times 100$$
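As a rough illustration of the three quantities just defined, the following Python sketch computes them from a full-sample estimate and a list of pseudoestimates. It assumes the variance is the plain sum of squared deviations of the pseudoestimates from the full-sample estimate, with no additional multiplier. The function name `jackknife_rse` and the numbers are invented for the example and are not CBECS data.

```python
import math


def jackknife_rse(full_estimate, pseudoestimates):
    """Return (variance, standard error, RSE in percent).

    Assumes the variance is the sum over pseudoreplicates of the squared
    difference between each pseudoestimate and the full-sample estimate.
    """
    if full_estimate == 0:
        raise ValueError("RSE is undefined when the estimate is zero")
    variance = sum((x_k - full_estimate) ** 2 for x_k in pseudoestimates)
    standard_error = math.sqrt(variance)
    rse = 100.0 * standard_error / full_estimate
    return variance, standard_error, rse


# Made-up numbers: a full-sample estimate and three pseudoestimates.
variance, standard_error, rse = jackknife_rse(100.0, [103.0, 96.0, 100.0])
print(variance, standard_error, rse)  # 25.0 5.0 5.0
```

With real data, the list would hold one pseudoestimate per pseudoreplicate, each computed with the same estimation steps as the full-sample estimate.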

Specific questions on these topics may be directed to:

Jay Olsen

jay.olsen@eia.gov