Technical information


Estimation of Standard Errors
Sampling error is the difference between the survey estimate and the true population value due to the use of a random sample to estimate the population. This difference arises because a random subset, rather than the whole population, is observed. The typical magnitude of the sampling error is measured by the standard error of the estimate. The standard error is the root-mean-square difference between the estimate based on a particular sample and the value that would be obtained by averaging estimates over all possible samples.

If the estimates are unbiased, meaning there is no systematic error, this average over all possible samples is the true population value. In this case, the standard error is simply the root-mean-square difference between the survey estimate and the true population value. If systematic error is present, however, this bias is not included in the error measured by the standard error. Thus, the standard error tends to understate the total estimation error if there are non-negligible biases.
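The distinction between the standard error and the total estimation error can be written as a short decomposition. The notation here is introduced only for illustration: X' is the survey estimate, X is the true population value, and E[.] denotes the average over all possible samples.

```latex
\mathrm{SE}(X')^2 = E\big[(X' - E[X'])^2\big],
\qquad
E\big[(X' - X)^2\big] = \mathrm{SE}(X')^2 + \big(E[X'] - X\big)^2 .
```

The second term on the right is the squared bias. When it is zero the two quantities coincide; otherwise the standard error alone understates the root-mean-square error.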

In principle, random errors can be introduced into the estimate by sources other than the sampling process. Such additional sources include random errors by respondents and data entry staff, and random unit nonresponse. To recognize these additional sources of variation, the definition of the sampling process can be expanded to include not just the selection of buildings but all steps required to obtain a set of responses. Under this expanded definition, all random errors can be regarded as sampling errors. The procedures designed to estimate the sampling error for CBECS incorporate all random components of the estimation process.

Jackknife Replication

Throughout this report, standard errors are given as percents of their estimated values, that is, as relative standard errors (RSEs). Computations of standard errors are more conveniently described, however, in terms of the estimation variance, which is the square of the standard error.

For some types of surveys, a convenient algebraic formula for computing variances can be obtained. The CBECS used a list-supplemented, multistage area sample design (see “How the Survey Was Conducted”) of such complexity that it is virtually impossible to construct an exact algebraic expression for estimating variances. In particular, the convenient formulas based on an assumption of simple random sampling, which most standard statistical packages use by default, are entirely inappropriate for the CBECS estimates. Such formulas tend to give severely understated standard errors, making the estimates appear much more accurate than they are.

The method used to estimate sampling variances for this survey was a jackknife replication method. The idea behind replication methods is to form several pseudoreplicates of the sample by selecting subsets of the full sample. The subsets are selected in such a way that the observed variance of estimates based on the different pseudoreplicates estimates the sampling variance in the overall estimate.

The k^th jackknife pseudoreplicate sample is obtained by deleting all observations from one member of the k^th group and multiplying the weights of all cases in the remaining members of that group by 2 if the group has 2 members, or by 1.5 if it has 3 members. Observations in all other groups are unaffected. The k^th pseudoestimate is then obtained from this pseudoreplicate sample by following all the steps used to construct the full-sample estimate.
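The weight adjustment for one pseudoreplicate can be sketched in a few lines of Python. This is a minimal illustration, not the survey's production code: the record layout (`group`, `member`, `weight`), the function name, and the choice of which member to delete are all assumptions made for the example.

```python
def pseudoreplicate_weights(cases, k, dropped_member):
    """Return the case weights for the k-th pseudoreplicate.

    cases: list of dicts with hypothetical keys "group", "member", "weight".
    dropped_member: the member of group k whose observations are deleted.
    """
    members = {c["member"] for c in cases if c["group"] == k}
    if dropped_member not in members:
        raise ValueError("dropped_member is not in group k")
    # Multipliers stated in the text: 2 for two-member groups, 1.5 for three.
    factor = {2: 2.0, 3: 1.5}[len(members)]

    new_weights = []
    for c in cases:
        if c["group"] != k:
            new_weights.append(c["weight"])           # other groups unaffected
        elif c["member"] == dropped_member:
            new_weights.append(0.0)                   # deleted observations
        else:
            new_weights.append(c["weight"] * factor)  # remaining members of group k
    return new_weights


# Made-up example data: one two-member group and one three-member group.
cases = [
    {"group": 1, "member": "a", "weight": 10.0},
    {"group": 1, "member": "b", "weight": 12.0},
    {"group": 2, "member": "a", "weight": 8.0},
    {"group": 2, "member": "b", "weight": 6.0},
    {"group": 2, "member": "c", "weight": 4.0},
]
replicate_1 = pseudoreplicate_weights(cases, 1, "a")
replicate_2 = pseudoreplicate_weights(cases, 2, "c")
```

A deleted case is given weight zero here rather than being removed, so each replicate keeps the same length as the full sample and can be passed through the same estimation steps.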

The variances are estimated from the pseudoestimates in the following way. Let X' be a survey estimate (based on the full sample) of characteristic X for a certain category of buildings. For example, X may be the total square footage of buildings using natural gas in the Midwest. Let X_k' be the pseudoestimate of X based on the k^th pseudoreplicate sample. The estimated variance of the full-sample estimate X' is then given by:

     Var(X') = \sum_{k=1}^{K} (X_k' - X')^2

where K is the number of pseudoreplicates. The standard error of X' is given by:

     SE(X') = \sqrt{Var(X')}

The relative standard error (percent) of X' is obtained from this standard error as:

     RSE(X') = 100 \times SE(X') / X'
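The chain from pseudoestimates to RSE can be illustrated with a short Python sketch. It assumes the variance estimator is the sum of squared deviations of the pseudoestimates from the full-sample estimate, with no additional scale factor; the function name and the numbers are made up for the example.

```python
import math


def jackknife_rse(full_estimate, pseudoestimates):
    """Return (variance, standard error, RSE in percent) for one estimate.

    Assumes variance = sum over k of (X_k' - X')^2, with no extra multiplier.
    """
    variance = sum((x_k - full_estimate) ** 2 for x_k in pseudoestimates)
    standard_error = math.sqrt(variance)
    rse_percent = 100.0 * standard_error / full_estimate
    return variance, standard_error, rse_percent


# Made-up numbers: a full-sample estimate and four pseudoestimates.
variance, standard_error, rse_percent = jackknife_rse(
    1000.0, [1030.0, 980.0, 1010.0, 960.0]
)
```

With these inputs the squared deviations are 900, 400, 100, and 1600, so the variance is 3000, the standard error is about 54.8, and the RSE is about 5.5 percent.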

Generalized Variances

For every estimate in this report, the RSE was computed by the methods described above. This was the RSE used for any statistical tests or confidence intervals given in the text or to determine if the estimate had too much variation to publish (an RSE greater than 50 percent).
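As an illustration of how an RSE is used, the sketch below converts an estimate and its RSE into a standard error, an approximate confidence interval, and a publication flag. The 50 percent threshold comes from the text; the multiplier 1.96 (a 95 percent interval under a normal approximation) and the function name are assumptions for the example, not something this document specifies.

```python
def summarize_estimate(estimate, rse_percent, z=1.96):
    """Summarize an estimate given its RSE in percent.

    z=1.96 assumes a 95 percent interval under a normal approximation.
    """
    standard_error = estimate * rse_percent / 100.0
    half_width = z * standard_error
    return {
        "standard_error": standard_error,
        "interval": (estimate - half_width, estimate + half_width),
        # Estimates with an RSE greater than 50 percent are not published.
        "publishable": rse_percent <= 50.0,
    }


# Made-up example: an estimate of 200 with an RSE of 10 percent.
example = summarize_estimate(200.0, 10.0)
```

For this example the standard error is 20 and the interval runs from roughly 160.8 to 239.2.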

Space limitations prevent publishing the complete set of RSEs with this document. Instead, a generalized variance technique is provided by which the reader can compute an approximate RSE for each of the estimates in the main summary tables. For an estimate in the i^th row and j^th column of a particular table, the approximate RSE is given by the simple formula:

     RSE_i,j = R_i C_j

where R_i is the RSE row factor given in the last column of row i, and C_j is the RSE column factor given at the top of column j.
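A reader working from a published table might apply the formula as in the sketch below. The row factor, the column labels, and the column factors are invented for illustration and do not come from any actual CBECS table.

```python
def approximate_rse(row_factor, column_factor):
    """Approximate RSE (percent) for the cell in row i and column j."""
    return row_factor * column_factor


# Hypothetical factors for one table row and three columns.
row_factor = 12.0
column_factors = {"Column A": 0.5, "Column B": 1.0, "Column C": 1.5}

approx_row = {
    label: approximate_rse(row_factor, c_j)
    for label, c_j in column_factors.items()
}
```

Here the approximate RSEs for the row are 6, 12, and 18 percent, so the same row factor yields a more or less precise cell depending on the column.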

Derivation of Row and Column Factors

The row and column factors are determined from a two-factor analysis of the table of RSEs on the basis of the model:

     log(RSE_i,j) = m + a_i + b_j

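Least-squares estimates for this model are given by:

     \hat{m} = \bar{y}_{..}
     \hat{a}_i = \bar{y}_{i.} - \bar{y}_{..}
     \hat{b}_j = \bar{y}_{.j} - \bar{y}_{..}

where y_i,j = log(RSE_i,j), \bar{y}_{..} is the mean of log(RSE_i,j) over all rows i and columns j, \bar{y}_{i.} is the mean over all columns j for a particular row i, and \bar{y}_{.j} is the mean over all rows i for a particular column j. The row and column RSE factors are then computed as

     R_i = \exp(\hat{m} + \hat{a}_i)
     C_j = \exp(\hat{b}_j)

Here exp inverts the logarithm used in the model.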

The RSE row factor, R_i, is thus the geometric mean of the RSEs in row i, and the RSE column factor, C_j, is an adjustment factor with geometric mean equal to 1.0.
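The factor computation for a complete table can be sketched as follows. The sketch assumes natural logarithms and a table with no missing cells, and it relies on the fact that the least-squares fit of an additive two-way model on a complete table reduces to row means, column means, and the grand mean. The function name and the example table are illustrative only.

```python
import math


def rse_factors(rse_table):
    """Return (row_factors, column_factors) for a complete table of RSEs."""
    logs = [[math.log(v) for v in row] for row in rse_table]
    n_rows, n_cols = len(logs), len(logs[0])

    grand_mean = sum(sum(row) for row in logs) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in logs]
    col_means = [
        sum(logs[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]

    # R_i = exp(m + a_i) = exp(row mean): the geometric mean of row i.
    row_factors = [math.exp(r) for r in row_means]
    # C_j = exp(b_j) = exp(column mean - grand mean); geometric mean is 1.
    column_factors = [math.exp(c - grand_mean) for c in col_means]
    return row_factors, column_factors


# Made-up table that is exactly multiplicative: rows 10 and 20, columns 0.5 and 2.
row_factors, column_factors = rse_factors([[5.0, 20.0], [10.0, 40.0]])
```

Because the example table is exactly the product of its factors, the fit recovers row factors near 10 and 20 and column factors near 0.5 and 2. In a real table the product R_i C_j only approximates each directly computed RSE.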

For a few table cells, there were no sample cases, hence, no estimate and no RSE. As a result, some of the arrays of direct estimates RSE_i,j had a few missing values. In such cases, the formulas given above for row and column factors still apply, but only after appropriate estimates have been substituted for the missing values. In cases where a statistic was not publishable because of a large RSE or small cell sample size, the value of RSE_i,j was set to missing and an appropriate estimate substituted so that the computed row and column factors are based only on statistics where the RSE is small enough to allow publication. Additionally, RSE column factors are not included for the median statistics found in Detailed Tables BC-2 and CE-19, or for all data in Detailed Tables EU-1 through EU-7.
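The text does not say how the substituted values were chosen. One common approach, shown here purely as an assumption, is to fill each missing cell with the value the additive model itself predicts and to repeat the fit until the filled values stop changing. The function name, the use of `None` to mark missing cells, and the stopping rule are all illustrative.

```python
import math


def fill_missing_rse(rse_table, max_iterations=500, tolerance=1e-10):
    """Fill missing cells (None) with fitted values from the additive log model.

    Illustrative assumption only; not necessarily the method used for CBECS.
    Every row and every column needs at least one observed cell.
    """
    n_rows, n_cols = len(rse_table), len(rse_table[0])
    missing = [
        (i, j)
        for i in range(n_rows)
        for j in range(n_cols)
        if rse_table[i][j] is None
    ]
    observed = [math.log(v) for row in rse_table for v in row if v is not None]
    start = sum(observed) / len(observed)  # initial guess: mean observed log RSE
    logs = [
        [start if v is None else math.log(v) for v in row] for row in rse_table
    ]

    for _ in range(max_iterations):
        grand = sum(sum(row) for row in logs) / (n_rows * n_cols)
        row_means = [sum(row) / n_cols for row in logs]
        col_means = [
            sum(logs[i][j] for i in range(n_rows)) / n_rows
            for j in range(n_cols)
        ]
        largest_change = 0.0
        for i, j in missing:
            fitted = row_means[i] + col_means[j] - grand  # m + a_i + b_j
            largest_change = max(largest_change, abs(fitted - logs[i][j]))
            logs[i][j] = fitted
        if largest_change < tolerance:
            break

    return [[math.exp(v) for v in row] for row in logs]


# Made-up table with one missing cell.
completed = fill_missing_rse([[5.0, 20.0], [10.0, None]])
```

In this small example the filled value converges to about 40, the value that makes the table exactly multiplicative. The completed table can then be passed to the row and column factor formulas unchanged.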


Specific questions on these topics may be directed to:

Jay Olsen
jay.olsen@eia.doe.gov

-or-

Joelle Michaels
joelle.michaels@eia.doe.gov
CBECS Manager

File last modified November 16, 1999

URL: http://www.eia.gov/emeu/cbecs/tech_std_errors.html
