Commercial Buildings Energy Consumption Survey (CBECS)

How We Reviewed Data to Ensure Quality of the 2018 CBECS

Release date: April 12, 2023

We carefully review all CBECS data to ensure quality. Data review and processing occur in both the CBECS Buildings Survey and the Energy Supplier Survey (ESS). The work is iterative and continues throughout the survey process, from initial data validation to analysis of estimates for consistency and comparability before publication.

Buildings Survey data editing

The 2018 CBECS Buildings Survey computerized survey instrument included features to help reduce data errors. The instrument prevented skip pattern errors and the entry of ineligible responses. It provided acceptable ranges of entry for numeric items.

With the introduction of self-administered web response in 2018, we removed most real-time consistency checks between items so we would not frustrate respondents as they completed the survey. However, post-collection processing included reviews and edits for consistency.

Post-collection processing started with preliminary edits. The goal of preliminary edits was to identify cases where the wrong building was interviewed, where the building should have been classified as out of scope, or where we needed to recontact the respondent to ask for more information. These edits included:

  • Flagging and investigating cases where the square footage category reported by the respondent differed by more than one size class from the size category used for sampling.
  • Flagging and investigating cases where the sampled address did not match the address reported in the interview to confirm that the correct building was interviewed.
  • Reviewing open-ended "other, specify" responses to the building activity question to make sure the building was eligible for CBECS.
  • Checking for interview comments that would suggest that the case was ineligible for CBECS.
  • Checking for critical data items that were left unanswered to determine whether the respondent should be recontacted.
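
As an illustration only, the size-class, address, and critical-item edits above could be sketched as follows. The field names, the list of critical items, and the integer coding of size classes are hypothetical; the only rule taken from the text is that a case is flagged when the reported and sampled size categories differ by more than one class.

```python
# Hypothetical list of critical items; the actual CBECS list is not given in the text.
CRITICAL_ITEMS = ("square_feet", "building_activity")


def normalize(address):
    """Lowercase an address and collapse whitespace before comparing."""
    return " ".join(address.lower().split())


def needs_size_review(reported_class, sampled_class, tolerance=1):
    """Flag when two ordinal size classes differ by more than `tolerance` classes."""
    return abs(reported_class - sampled_class) > tolerance


def preliminary_flags(case):
    """Return the list of preliminary edit flags raised by one case (a dict)."""
    flags = []
    if needs_size_review(case["reported_size_class"], case["sampled_size_class"]):
        flags.append("size_class_mismatch")
    if normalize(case["reported_address"]) != normalize(case["sampled_address"]):
        flags.append("address_mismatch")
    if any(case.get(item) is None for item in CRITICAL_ITEMS):
        flags.append("critical_item_missing")
    return flags
```

A flagged case is only a candidate for investigation; deciding whether the wrong building was interviewed still requires the manual research described in this section.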

We used resources such as mapping tools, targeted internet searches, and sample frame data to resolve data inconsistencies revealed during editing. In a limited number of situations, we recontacted respondents to gather further information.

The next phase of editing involved case-by-case reviews. We ran more than 100 edit checks across all the cases. For example, the checks included:

  • Comparing the square footage of the building to the number of employees, and other measures of size reported, to determine if the ratios fell within expected parameters based on building activity. A high or low ratio might indicate a mistake with either of the values reported.
  • Checking consistency across questions, such as a building that reports cooking but no food preparation or serving areas.
  • Coding open-ended responses, especially those that would affect skip patterns throughout the questionnaire, such as building activity.
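
A minimal sketch of two such checks follows. The expected ranges of square feet per worker and all field names are made-up placeholders, not CBECS parameters; the sketch only shows the shape of a ratio check and a cross-question consistency check.

```python
# Hypothetical expected ranges (square feet per worker) by building activity.
EXPECTED_SQFT_PER_WORKER = {
    "office": (100, 1_500),
    "warehouse": (500, 20_000),
}


def ratio_edit_check(activity, square_feet, workers):
    """Classify the square-feet-per-worker ratio as 'low', 'high', or 'ok'."""
    if workers <= 0:
        return "no_workers"  # ratio is undefined; left for a reviewer to handle
    low, high = EXPECTED_SQFT_PER_WORKER[activity]
    ratio = square_feet / workers
    if ratio < low:
        return "low"
    if ratio > high:
        return "high"
    return "ok"


def cooking_is_consistent(case):
    """A building that reports cooking should also report a food preparation or serving area."""
    return not (case["cooking"] and not case["food_prep_area"])
```

A "low" or "high" result does not say which of the two reported values is wrong, which is why flagged cases went to the editing team rather than being corrected automatically.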

Our editing team reviewed every flagged value for each case that failed at least one edit check. The team resolved flags using established rules or, when a situation fell outside those rules, adjudicated as a group. Edits were often resolved either by coding the variable as missing and flagging it for item imputation or by running batch code to change cases programmatically.

A similar editing process occurred for strip shopping centers (for the establishment and mall manager interviews), followed by the estimation process that determined characteristics of the strip shopping building as a whole.

Buildings Survey item nonresponse and imputation

Prior to publication of the 2018 CBECS, we made adjustments for item nonresponse. Item nonresponse occurs when a specific piece of information is missing in an otherwise completed interview. In building interviews, the usual causes of item nonresponse were that the building respondent lacked the knowledge to answer a questionnaire item or that the value failed edits during review and was set to missing.

We treated questions with item nonresponse using a technique known as hot-deck imputation, the same technique used in previous CBECS cycles. In hot-decking, when a response to a question is missing for a given building, another similar building (a donor building) is randomly chosen to furnish its reported value for that missing item. That value is then assigned to the building with the item nonresponse (a receiver building). This procedure is used to reduce the bias caused by nonresponse for particular survey items.

A donor building has to be similar to the nonresponding building in characteristics correlated with the missing item. The characteristics used to define a similar building depend on the nature of the item to be imputed. The most frequently used characteristics are principal building activity, square footage category, year constructed category, and census region. Numeric items that have a categorical follow-up question use the categorical follow-up to find appropriate donor cases. Other characteristics, such as type of heating fuel and type of heating and cooling equipment, are used for specific items. To hot-deck values for a particular item, all buildings are first grouped according to the values of the matching characteristics specified for that item. Within each group defined by the matching variables, donor buildings are then assigned randomly to receiver buildings.

With hot-deck imputation, the building that donated a particular item to a receiver also donates certain related items if any of these items are missing. Thus, a vector of values, rather than a single value, is copied from the donor to the receiver. This procedure helps to keep the hot-decked values internally consistent, avoiding the generation of implausible combinations of building characteristics.
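
The procedure can be sketched roughly as below. This is a simplified illustration, not the production CBECS code: the function name, the dictionary record layout, and the handling of cells with no donor are assumptions. It groups buildings by the matching characteristics, picks a random donor within the group, and copies the related items along with the target item when they are also missing.

```python
import random
from collections import defaultdict


def hot_deck(buildings, match_keys, item, related_items=(), seed=0):
    """Return copies of `buildings` with missing `item` values filled from random donors.

    buildings: list of dicts; a value of None means the item is missing.
    match_keys: characteristics that define a "similar" building for this item.
    related_items: items the same donor also supplies when the receiver lacks them.
    """
    rng = random.Random(seed)

    # Group potential donors (buildings that reported the item) by matching cell.
    donors = defaultdict(list)
    for b in buildings:
        if b.get(item) is not None:
            donors[tuple(b[k] for k in match_keys)].append(b)

    result = [dict(b) for b in buildings]  # leave the input untouched
    for receiver in result:
        if receiver.get(item) is not None:
            continue
        pool = donors.get(tuple(receiver[k] for k in match_keys))
        if not pool:
            continue  # assumption: no donor in this cell, so the value stays missing
        donor = rng.choice(pool)
        receiver[item] = donor[item]
        # Copy a vector of related values from the same donor to keep them consistent.
        for related in related_items:
            if receiver.get(related) is None:
                receiver[related] = donor.get(related)
    return result
```

For example, calling `hot_deck(buildings, ["activity", "region"], "heat_fuel", ["heat_equip"])` would fill a missing heating fuel from a building with the same activity and region, and take the heating equipment from that same donor if it is also missing.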

Among the variables that are on the 2018 CBECS public use microdata file, most were eligible for imputation; however, item imputation rates (that is, the number of cases that were imputed for a specific question divided by the total number of cases receiving the question) were generally low. Across all variables on the public use file that were eligible for imputation, the average item imputation rate was 8.3%, and the median imputation rate was 5.2%.
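
The rate definition in the parenthetical above amounts to a one-line calculation. In this sketch the three status labels are hypothetical stand-ins for however a case's status is coded; cases marked inapplicable are treated as not having received the question.

```python
def item_imputation_rate(statuses):
    """Imputed cases divided by cases that received the question.

    statuses: iterable of "reported", "imputed", or "inapplicable" (hypothetical labels).
    Returns None when no case received the question.
    """
    asked = [s for s in statuses if s != "inapplicable"]
    if not asked:
        return None
    return sum(s == "imputed" for s in asked) / len(asked)
```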

ESS data editing

The ESS collected fewer data items than the Buildings Survey. For each building (or each account within a building), the energy supplier was asked for 16 months of billing data, including the beginning and ending date of each billing period, the amount of energy consumed, the unit of measurement, the dollar cost, and whether the energy was sold only, delivered only, or both.

We processed the data using a program containing a series of edit checks, including checks for insufficient data, missing units, low costs, and high or low prices. We addressed errors by updating the data, recontacting the supplier to verify the data, or overriding the edit failure if the data were deemed correct. We also reviewed all comments provided by suppliers and made any necessary changes to the data.

We checked and resolved problems with the data using programmatic batch edits for issues such as insufficient data, billing records without dates, and matching start and end dates. The remaining edits were resolved through analyst review; those edits included checking for records with inconsistent billing dates, identical data submitted for cases with the same supplier, and missing or other units of measurement.
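
As a rough sketch of what record-level checks of this kind might look like, the function below tests one billing record for missing or inconsistent dates, a missing unit, a low cost, and an implausible price. The thresholds and field names are invented for illustration, and "matching start and end dates" is read here as a record whose two dates are identical, which is an assumption.

```python
from datetime import date


def check_billing_record(rec, price_bounds=(0.02, 1.00), min_cost=1.0):
    """Return the names of the edit checks that a billing record fails.

    rec: dict with "start", "end" (datetime.date or None), "amount", "unit", "cost".
    price_bounds and min_cost are hypothetical thresholds in dollars.
    """
    failures = []
    start, end = rec.get("start"), rec.get("end")
    if start is None or end is None:
        failures.append("missing_dates")
    elif start == end:
        failures.append("matching_start_end_dates")
    elif end < start:
        failures.append("inconsistent_dates")

    if not rec.get("unit"):
        failures.append("missing_unit")

    cost, amount = rec.get("cost"), rec.get("amount")
    if cost is not None and cost < min_cost:
        failures.append("low_cost")
    if cost is not None and amount:
        price = cost / amount  # dollars per unit of energy
        if price < price_bounds[0]:
            failures.append("low_price")
        elif price > price_bounds[1]:
            failures.append("high_price")
    return failures
```

A failure is a prompt for action rather than proof of an error: as described above, a record could be updated, verified with the supplier, or accepted as correct with the edit failure overridden.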

Buildings Survey and ESS consumption data reconciliation, imputation, and nonresponse

We collected energy consumption information in both the Buildings Survey and the ESS. The main difference was that the Buildings Survey requested one annual figure from building respondents, while the ESS collected monthly data from energy suppliers. We collected data in both surveys because the best source for the data can vary by building. For example, a Buildings Survey respondent may be able to provide energy data for a single building on a campus, whereas a supplier might only have data for the whole campus or might not be able to distinguish the building for which we were requesting data.

Data from both sources had to be annualized (for calendar year 2018) and disaggregated before the final stages of review. We disaggregated data for cases where the Buildings Survey respondent reported that consumption from other buildings was included in the figures. In such cases, we prorated the consumption estimate using the square footage provided for the other buildings.
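
The two steps can be illustrated with a small sketch. The proration follows the square footage rule stated above. The annualization method is not described in the text, so the day-weighted allocation of billing periods to the calendar year shown here is purely an assumption, as are the function names.

```python
from datetime import date


def annualize(bills, year=2018):
    """Sum billed amounts over a calendar year, weighting each bill by its days in that year.

    bills: list of (start, end, amount) with inclusive datetime.date bounds.
    Day weighting is an assumed method, not one stated in the source.
    """
    year_start, year_end = date(year, 1, 1), date(year, 12, 31)
    total = 0.0
    for start, end, amount in bills:
        period_days = (end - start).days + 1
        overlap_days = (min(end, year_end) - max(start, year_start)).days + 1
        if overlap_days > 0:
            total += amount * overlap_days / period_days
    return total


def prorate_consumption(total_consumption, building_sqft, other_buildings_sqft):
    """Allocate a combined consumption figure to one building by its share of square footage."""
    combined = building_sqft + other_buildings_sqft
    if combined <= 0:
        raise ValueError("combined square footage must be positive")
    return total_consumption * building_sqft / combined
```

For instance, if a respondent reports 1,000 units for a 25,000-square-foot sampled building plus 75,000 square feet of other buildings, `prorate_consumption(1000.0, 25_000, 75_000)` assigns one quarter of the total, or 250 units, to the sampled building.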

The consumption data could be collected in both the Buildings Survey and the ESS, in one but not the other, or not provided at all. The following table shows the initial breakdown of consumption data by the survey from which it came. Notably, 20% of both electricity and natural gas cases (among those using the source) had data from both the Buildings Survey and the ESS.

Table 1. Initial consumption data sources for each energy source (before editing and reconciliation)
Percentages are shares of the buildings that use the given energy source.

| Consumption data sources                         | Electricity: buildings | Electricity: % | Natural gas: buildings | Natural gas: % | Fuel oil: buildings | Fuel oil: % | District heat: buildings | District heat: % |
|--------------------------------------------------|------------------------|----------------|------------------------|----------------|---------------------|-------------|--------------------------|------------------|
| Energy source not used                           | 79                     |                | 1,883                  |                | 4,519               |             | 5,668                    |                  |
| Both Buildings Survey and Energy Supplier Survey | 1,220                  | 20%            | 889                    | 20%            | 56                  | 3%          | 49                       | 9%               |
| Only Buildings Survey                            | 419                    | 7%             | 193                    | 4%             | 438                 | 26%         | 128                      | 23%              |
| Only Energy Supplier Survey                      | 3,619                  | 59%            | 2,558                  | 59%            | 47                  | 3%          | 49                       | 9%               |
| No consumption data provided                     | 891                    | 14%            | 705                    | 16%            | 1,168               | 68%         | 334                      | 60%              |

Data source: U.S. Energy Information Administration, Commercial Buildings Energy Consumption Survey.
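
The percentages in Table 1 are computed among buildings that use the energy source, so the "energy source not used" row is excluded from the denominator. A short sketch using the electricity counts from the table (the dictionary keys are just labels chosen for this example):

```python
def shares_among_users(counts):
    """Whole-number percentage of each category among buildings using the energy source."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


electricity = {
    "both_surveys": 1220,
    "buildings_survey_only": 419,
    "ess_only": 3619,
    "no_data": 891,
}
print(shares_among_users(electricity))
# {'both_surveys': 20, 'buildings_survey_only': 7, 'ess_only': 59, 'no_data': 14}
```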

Next, we reviewed the data to determine which data source should be used as the final value for the case. We made our decision using two checks: comparing the intensity (consumption per square foot) against the 2012 CBECS intensity for the same building activity, and comparing the reported usage to the usage expected from engineering-based models and the characteristics of the building. For cases with data from both the ESS and the Buildings Survey, if both values passed the intensity and expected usage checks, we used whichever value was closer to the expected usage. Sometimes we needed to manually review the building characteristics and monthly consumption to determine which source to use or whether to impute the consumption. Imputation used the engineering-based models and a regression model that included consumption from reported cases.

For a case with data from only one source (just the ESS or just the Buildings Survey), if the provided value passed both the intensity and expected usage checks, it was used. If it failed both, it was imputed. If it passed one of the checks but failed the other, we manually reviewed the case to determine whether or not to use the value.
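
The decision rules in this section can be summarized in a sketch like the one below. The function name and input layout are hypothetical, the two checks are passed in as precomputed pass/fail results because their thresholds are not given, and two branches are assumptions rather than stated rules: two-source cases where at least one value fails a check are routed to manual review, and cases with no data are imputed.

```python
def choose_final_value(candidates, expected_usage):
    """Pick the final consumption source for one case.

    candidates: dict mapping a source name ("ess", "buildings") to a tuple
                (value, passes_intensity_check, passes_expected_usage_check).
    Returns (decision, value) where decision is a source name, "impute",
    or "manual_review".
    """
    if not candidates:
        return ("impute", None)  # assumption: no reported data means imputation

    if len(candidates) == 1:
        source, (value, ok_intensity, ok_expected) = next(iter(candidates.items()))
        if ok_intensity and ok_expected:
            return (source, value)
        if not ok_intensity and not ok_expected:
            return ("impute", None)
        return ("manual_review", None)  # passed one check, failed the other

    passing = {
        source: value
        for source, (value, ok_intensity, ok_expected) in candidates.items()
        if ok_intensity and ok_expected
    }
    if len(passing) == len(candidates):
        # Both sources pass: use the value closer to the expected usage.
        best = min(passing, key=lambda s: abs(passing[s] - expected_usage))
        return (best, passing[best])
    return ("manual_review", None)  # assumption: mixed results go to an analyst
```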

The following table shows the number and percentage of cases in the final data set by energy source and by the data source for each case, after editing and reconciliation.

Table 2. Final consumption data sources for each energy source (after editing and reconciliation)
Percentages are shares of the buildings that use the given energy source.

| Final consumption data source | Electricity: buildings | Electricity: % | Natural gas: buildings | Natural gas: % | Fuel oil: buildings | Fuel oil: % | District heat: buildings | District heat: % |
|-------------------------------|------------------------|----------------|------------------------|----------------|---------------------|-------------|--------------------------|------------------|
| Energy source not used        | 79                     |                | 1,883                  |                | 4,519               |             | 5,668                    |                  |
| Buildings Survey              | 964                    | 16%            | 575                    | 13%            | 462                 | 27%         | 145                      | 26%              |
| Energy Supplier Survey        | 3,692                  | 60%            | 2,696                  | 62%            | 73                  | 4%          | 63                       | 11%              |
| Consumption data imputed      | 1,467                  | 24%            | 1,074                  | 25%            | 1,174               | 69%         | 352                      | 63%              |

Data source: U.S. Energy Information Administration, Commercial Buildings Energy Consumption Survey.

Final data file

For all variables that were eligible for imputation, a corresponding Z variable on the data file indicates whether the variable was reported, imputed, or inapplicable. In addition to the data collected from the Buildings Survey and the ESS, the final CBECS data set includes known geographic information (census region and division) and weather data acquired from the National Oceanic and Atmospheric Administration (NOAA).