Commercial Buildings Energy Consumption Survey (CBECS)

2003 CBECS Survey Data

Public Use Microdata

The 2003 CBECS Public Use Files are comma separated value (.csv) files that each contain 5,215 records. They represent commercial buildings from the 50 States and the District of Columbia. Each record corresponds to a single responding, in-scope sampled building. These files contain data for all buildings, including malls. There are 395 mall building records. For the mall buildings, a limited amount of information was collected, so there are many blank fields in the data for these cases. For all buildings, these files contain information such as the building size, year constructed, types of energy used, energy consumption and expenditures. For non-mall buildings, additional data items such as types of energy-using equipment and conservation features are also included in the files.

The smallest level of geographic detail available is the Census division, of which there are nine in the U.S. No state level data are available.

2003 CBECS Building Characteristics
File | Layout File | Data File | Revised Date
File 1: General Building Information and Energy End Uses | TXT | CSV | 12/06
File 2: Building Activities, Special Measures of Size, and Multibuilding Facilities | TXT | CSV | 12/06
File 3: Heating and Cooling Equipment and Conservation Features | TXT | CSV | 12/06
File 4: Water Heating, Refrigeration, Office Equipment and Special Space Uses | TXT | CSV | 12/06
File 5: End Uses of Major Energy Sources, Electricity Generation, and Purchasing of Electricity and Natural Gas | TXT | CSV | 12/06
File 6: Minor Energy Sources and End Uses for Minor Energy Sources | TXT | CSV | 12/06
File 7: Lighting Percents, Equipment, and Conservation Features | TXT | CSV | 12/06

Imputation Flags (or Z variables)
File | Layout File | Data File | Revised Date
File 8: Imputation Flags for File 1 | TXT | CSV | 12/06
File 9: Imputation Flags for File 2 | TXT | CSV | 12/06
File 10: Imputation Flags for File 3 | TXT | CSV | 12/06
File 11: Imputation Flags for File 4 | TXT | CSV | 12/06
File 12: Imputation Flags for File 5 | TXT | CSV | 12/06
File 13: Imputation Flags for File 6 | TXT | CSV | 12/06
File 14: Imputation Flags for File 7 | TXT | CSV | 12/06

Consumption & Expenditures
File | Layout File | Data File | Revised Date
File 15: Consumption and Expenditures for Sum of Major Fuels and Electricity (Includes Imputation Flags) | TXT | CSV | 12/06
File 16: Consumption and Expenditures for Natural Gas, Fuel Oil, and District Heat (Includes Imputation Flags) | TXT | CSV | 12/06
File 17: Consumption of Major Fuels by End Use | TXT | CSV | 11/08
File 18: Consumption of Electricity by End Use | TXT | CSV | 11/08
File 19: Consumption of Natural Gas by End Use | TXT | CSV | 11/08
File 20: Consumption of Fuel Oil and District Heat by End Use | TXT | CSV | 11/08

All Format Codes (text file, 12 KB) | TXT | | 12/06
All Layout Files and Format Codes (pdf file, 92 KB) | PDF | | 11/08

File Organization

Because of the size of the CBECS questionnaire, the variables were separated into groups by subject matter. These 20 smaller files make it easier to manipulate the data.

Several variables are frequently used in the analysis of commercial energy data. These core variables are included in each group of variables:

  • PUBID8: building identifier, which is the link between files;
  • ADJWT8: adjusted sampling weight;
  • STRATUM8 and PAIR8: variance stratum and pair member which can be used for calculating variances;
  • REGION8 and CENDIV8: Census region and division;
  • SQFT8 and SQFTC8: square footage, both exact and category;
  • PBA8: principal building activity;
  • YRCONC8: year constructed category; and
  • ELUSED8, NGUSED8, FKUSED8, PRUSED8, STUSED8, HWUSED8: a set of variables indicating whether electricity, natural gas, fuel oil, propane, district steam or district hot water were used in the building.

For each group of variables, there are two items: a layout file and a data file. The layout file is a text file which gives, for each variable on a file: the variable name, a description, the position on the file, and the corresponding format. The data file is a comma separated value file.

To determine what the different values of each variable represent, use the provided text file of all the format codes (alternatively, a PDF document contains all the layout files and format codes). The formats are arranged in alphabetical order and are written so that they can easily be turned into a SAS format library.

Each of these 20 files can be used by itself or be merged with other files for more complex analyses. By merging files together, a new file can be created that contains, for each respondent, variables from two or more files. The variable PUBID8 should be used to link the files.
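The linking step can be sketched in a few lines of Python. This is only an illustration using the standard library: the two in-memory "files" are made-up stand-ins with a handful of columns, the helper names are hypothetical, and placing COOL8 in the second file is an assumption made for the example.

```python
import csv
import io

# Made-up stand-ins for two public use files; real files have many more columns.
FILE_A = """PUBID8,ADJWT8,SQFT8
1,100.5,5000
2,250.0,12000
3,75.25,800
"""
FILE_B = """PUBID8,COOL8
1,1
2,2
3,1
"""

def read_by_pubid(text):
    """Return {PUBID8: row dict} for the contents of one CSV file."""
    return {row["PUBID8"]: row for row in csv.DictReader(io.StringIO(text))}

def merge_on_pubid(left, right):
    """Inner-join two {PUBID8: row} mappings; shared columns keep the left value."""
    merged = {}
    for pubid, row in left.items():
        if pubid in right:
            merged[pubid] = {**right[pubid], **row}
    return merged

merged = merge_on_pubid(read_by_pubid(FILE_A), read_by_pubid(FILE_B))
print(merged["2"])
```

Values read this way are strings, so numeric columns such as ADJWT8 and SQFT8 would need converting before any arithmetic.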

To find the national estimate for... | Do this... | And you should get...
Total number of buildings | Sum ADJWT8 | 4,858,749.82 (or 4,859 thousand)
Total number of office buildings | Sum ADJWT8 for cases where PBA8="02" | 823,805.47 (or 824 thousand)
Total floorspace | Create a new variable (weighted square footage) by multiplying ADJWT8 by SQFT8 for each case, then sum this new variable | 71,657,900,522 (or 71,658 million square feet)
Total floorspace in buildings with air conditioning | Sum the new weighted square footage variable (see above) for cases where COOL8="1" | 63,559,999,624 (or 63,560 million square feet)

The CBECS sample was designed so that survey responses could be used to estimate characteristics of the entire commercial buildings stock nationwide. All published CBECS tables report national estimates.

In order to arrive at national estimates from the CBECS sample, base sampling weights were calculated for each building (these are the reciprocal of the probability of that building being selected into the sample). For example, a building with a base weight of 1,000 represents itself and 999 similar but unsampled buildings in the total building stock. The base weight is further adjusted to account for nonresponse bias. The variable ADJWT8 in the data file is the final weight. To obtain a national estimate, each sample building's value must be multiplied by the building's weight, and the products summed across buildings.
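As a rough sketch of the weighting arithmetic, the Python below computes weighted totals over three made-up records. The helper weighted_total is hypothetical, and the values are invented for illustration, so the results will not match the published estimates.

```python
# Three made-up records; real data would be read from the CSV files
# and converted from strings to numbers first.
records = [
    {"ADJWT8": 1000.0, "SQFT8": 5000, "PBA8": "02", "COOL8": "1"},
    {"ADJWT8": 500.0, "SQFT8": 20000, "PBA8": "14", "COOL8": "2"},
    {"ADJWT8": 250.0, "SQFT8": 100000, "PBA8": "02", "COOL8": "1"},
]

def weighted_total(rows, value=lambda r: 1, where=lambda r: True):
    """Sum ADJWT8 * value(row) over the rows that satisfy where(row)."""
    return sum(r["ADJWT8"] * value(r) for r in rows if where(r))

# Number of buildings: each record counts as ADJWT8 buildings.
total_buildings = weighted_total(records)

# Number of office buildings: restrict to PBA8 == "02".
office_buildings = weighted_total(records, where=lambda r: r["PBA8"] == "02")

# Floorspace: weight each building's square footage.
total_floorspace = weighted_total(records, value=lambda r: r["SQFT8"])

# Floorspace in buildings with air conditioning: COOL8 == "1".
cooled_floorspace = weighted_total(
    records, value=lambda r: r["SQFT8"], where=lambda r: r["COOL8"] == "1"
)

print(total_buildings, office_buildings, total_floorspace, cooled_floorspace)
```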

Imputation Flags or Z variables

Files 8 through 16 contain variables that begin with the letter Z. These "Z variables" are also referred to as "imputation flags." Imputation is a statistical procedure used to fill in values for missing items. Missing values for many, but not all, of the variables were imputed in 2003. The imputation flag indicates whether the corresponding non-Z variable was reported, imputed, or inapplicable. There are no corresponding "Z variables" for variables from the CBECS questionnaire that were not imputed, variables for which there were no missing data, or variables that were derived from other variables.
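One way to use an imputation flag is to tabulate it, or to keep only the reported values of its companion variable. The sketch below assumes a flag named ZSQFT8 (formed by prefixing Z to SQFT8) and uses placeholder codes "0", "1", and "9"; both the flag name and the codes are hypothetical, and the real values should be taken from the layout files and format codes.

```python
from collections import Counter

# HYPOTHETICAL flag codes, for illustration only; look up the real ones
# in the format codes file.
REPORTED, IMPUTED, INAPPLICABLE = "0", "1", "9"

# Made-up rows after merging a data file with its imputation-flag file.
rows = [
    {"PUBID8": "1", "SQFT8": "5000", "ZSQFT8": REPORTED},
    {"PUBID8": "2", "SQFT8": "12000", "ZSQFT8": IMPUTED},
    {"PUBID8": "3", "SQFT8": "800", "ZSQFT8": REPORTED},
]

def flag_counts(rows, flag_var):
    """Count how many records carry each value of an imputation flag."""
    return Counter(row[flag_var] for row in rows)

def reported_only(rows, flag_var, reported_code):
    """Keep only records whose flag marks the companion variable as reported."""
    return [row for row in rows if row[flag_var] == reported_code]

counts = flag_counts(rows, "ZSQFT8")
kept = reported_only(rows, "ZSQFT8", REPORTED)
print(counts, [row["PUBID8"] for row in kept])
```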

Confidentiality of Survey Respondents

The names or addresses of individual respondents or any other individually identifiable data that could be specifically linked to an individual sample building or building respondent are seen only by employees of EIA, its survey contractor, and EIA-approved agents. The data are kept secure and confidential at all times. The 2003 CBECS is the first survey cycle for which EIA took possession of specific building identifiers. This change, which gives EIA greater capability to handle and manage its data, was the result of a new Federal law, the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA). The legislation gives EIA both the authority and the responsibility to protect from disclosure identifiable data which respondents have been promised would be kept confidential and used exclusively for statistical purposes. The CBECS meets these criteria, and the 2003 cycle was collected under CIPSEA protection.

In addition, these specific variables that could possibly identify a particular responding building have been masked to protect the confidentiality of respondents:

Square Footage: For buildings over one million square feet, the numeric square footage was replaced with the weighted average square footage of all responding buildings over one million square feet. Separate weighted means were calculated for each of the four Census regions. For buildings one million square feet or less, the numeric square footage was rounded to within 5 percent of the upper limit of the buildings' square footage categories. If the rounded value fell below the lower limit of the category, the value was coded at the lower limit. For example, buildings in the range of 5,001 to 10,000 square feet were rounded to the nearest 500 square feet (except that buildings rounding to 5,000 were coded as 5,001).
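The rounding rule for buildings of one million square feet or less can be read as: round to the nearest multiple of 5 percent of the category's upper limit, then raise the result to the category's lower limit if needed. The Python below is a sketch of that reading, not EIA's actual procedure. The function name is hypothetical, ties are rounded upward as an assumption, and the example uses a category of 5,001 to 10,000 square feet, for which 5 percent of the upper limit is 500.

```python
def mask_square_footage(sqft, lower, upper):
    """Round sqft to the nearest multiple of 5 percent of the category's
    upper limit, then code the result at the lower limit if it fell below."""
    step = upper * 5 // 100                      # e.g. 500 for an upper limit of 10,000
    rounded = (sqft + step // 2) // step * step  # nearest multiple of step, ties upward
    return max(rounded, lower)

# Illustrative category: 5,001 to 10,000 square feet.
print(mask_square_footage(7300, 5001, 10000))
print(mask_square_footage(9800, 5001, 10000))
print(mask_square_footage(5200, 5001, 10000))
```

The last call shows the exception in the text: 5,200 rounds to 5,000, which is below the category's lower limit, so it is coded as 5,001.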

Climate Zone: This variable has been withheld for buildings larger than a million square feet.

Number of Workers: For buildings where the numeric number of workers was between 2,500 and 4,999, the reported number was rounded to the nearest 250. For buildings where the numeric number of workers was 5,000 or more, the reported numeric number of workers was replaced with the weighted average number of workers of all responding buildings with 5,000 or more workers. Separate weighted means were calculated for each of the four Census regions.
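Replacing large values with a regional weighted mean can be sketched as follows. This is an illustration only: the function name and the "workers" column are hypothetical, the records are made up, and using ADJWT8 as the weight in the average is an assumption, since the text does not say which weight was used.

```python
from collections import defaultdict

def top_code_by_region(rows, var, threshold, weight="ADJWT8", region="REGION8"):
    """Replace values of var at or above threshold with the weighted mean of
    such values within the same Census region. Returns new row dicts."""
    sums = defaultdict(lambda: [0.0, 0.0])  # region -> [sum of w*x, sum of w]
    for r in rows:
        if r[var] >= threshold:
            acc = sums[r[region]]
            acc[0] += r[weight] * r[var]
            acc[1] += r[weight]
    masked = []
    for r in rows:
        new = dict(r)
        if r[var] >= threshold:
            weighted_sum, weight_sum = sums[r[region]]
            new[var] = weighted_sum / weight_sum
        masked.append(new)
    return masked

# Made-up records; "workers" is a hypothetical column name.
rows = [
    {"REGION8": "1", "ADJWT8": 10.0, "workers": 6000},
    {"REGION8": "1", "ADJWT8": 30.0, "workers": 10000},
    {"REGION8": "2", "ADJWT8": 5.0, "workers": 8000},
    {"REGION8": "1", "ADJWT8": 20.0, "workers": 300},
]
masked = top_code_by_region(rows, "workers", threshold=5000)
print([r["workers"] for r in masked])
```

The same pattern would apply to square footage over one million square feet, with a different variable and threshold.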

Type of Government Ownership: For inpatient health care buildings, the type of government ownership (Federal, State or local) variable has been withheld.

Number of Floors: The upper range of the number of floors was replaced with two categories: 15 to 25 floors (coded as 994 on the file) and over 25 floors (coded as 995 on the file).

Special Measures of Occupancy: Seven special measures of occupancy are included in the 2003 CBECS (seating capacity for religious buildings, public assembly buildings, education buildings, and food service buildings; licensed bed capacity for inpatient health care and skilled nursing buildings; and number of guest rooms for lodging buildings). These numbers were rounded as follows: fewer than 25 units (no rounding performed); 25-49 units (rounded to nearest 5); 50-99 units (rounded to nearest 10); 100-249 units (rounded to nearest 25); 250-499 units (rounded to nearest 50); 500-999 units (rounded to nearest 100); 1,000-2,499 units (rounded to nearest 250); 2,500-4,999 units (rounded to nearest 500); 5,000 or more units (rounded to nearest 1,000). In addition, for inpatient health care buildings, buildings with over 150 beds have been collapsed into one category (coded as 9991).

Heating Degree Days and Cooling Degree Days: These values have been masked so that it is not possible to identify the exact weather station for an individual building.

End Use Estimates

The last 4 public use files (Files 17-20) have information regarding the amount of each major fuel used for specific end uses. In a few cases, the estimates for a specific fuel and end use combination do not agree with the results of the building characteristics survey.

There are 3 cases where the building characteristics would seem to indicate natural gas for secondary heating, but the end use estimation procedure indicated that other fuels accounted for all of the heating in the building. In these cases, the variables indicating that natural gas was used for secondary heating were left as "Yes," but the estimate of the quantity was set to 0. Similarly, there is one case where electricity secondary heating is "Yes," but the amount is 0, and one case where electric cooling is "Yes," and the amount is 0.

Finally, there is one case where all of the heating, cooling, water heating, and cooking were accounted for by fuels other than district heat, so the estimated district heat consumption is assumed to be for other, unspecified uses.

Questions about CBECS may be directed to:

Joelle Michaels
Survey Manager