U.S. Energy Information Administration - EIA - Independent Statistics and Analysis
Commercial Buildings Energy Consumption Survey (CBECS)
How Will Buildings Be Selected for the 2012 CBECS?
Background and Overview
Did You Know?
In the CBECS, "commercial" refers to any structure that is not residential, manufacturing/industrial, or agricultural. "Building" refers to a structure that is totally enclosed by walls that extend from the foundation to the roof.
Data collection for the 2012 Commercial Buildings Energy Consumption Survey (CBECS) will begin in April 2013, collecting data for reference year 2012. The goal of the CBECS is to provide basic statistical information about energy consumption and expenditures in U.S. commercial buildings and information about energy-related characteristics of these buildings.
The 2003 CBECS estimated that there were 4.9 million commercial buildings in the US. Because conducting interviews at all 4.9 million buildings would be impractical and prohibitively expensive, the Energy Information Administration (EIA) uses a statistical sample that is designed to represent the entire population. For the 2012 CBECS, the target sample size is 8,400 completed building interviews (about a 50 percent increase over the number of interviews in previous rounds of CBECS). Trained field staff will conduct the interviews to collect data at each of these 8,400 buildings.
To select a statistically valid sample that will produce accurate statistics about the commercial buildings population, each building must have one and only one chance (probability) of selection, and that probability must be known. This requires a frame, or list, of commercial buildings. No comprehensive frame of US commercial buildings currently exists, so EIA must create one. The majority of this frame is called the area frame portion; it consists of all commercial buildings in statistically selected geographic areas. Trained field staff create the list by walking or driving through the selected areas and recording information about every commercial building. The other part of the frame (as much as 20 percent) is called the list frames portion and is made up of lists of buildings from five different sources. The multi-frame approach ensures that all types and sizes of commercial buildings have a chance of selection. More detailed descriptions of the frame types are provided in the sections below. The 2003 CBECS design is also described because it is the foundation of the 2012 CBECS design.
The area frame construction is the most difficult, time-consuming, and expensive piece of the CBECS frame, but it is a well-established method that ensures the frame is representative of all commercial buildings in the US. As mentioned above, trained field workers visit commercial areas in person and record all commercial buildings. It is not feasible within EIA's budget for field staff to walk every street in the US to record all commercial buildings for the area frame, so the country is broken up into multiple levels of smaller, more manageable pieces that are statistically selected. This is called multi-stage area probability sampling.
When the area frame was created in 2003, the US was first divided into 687 geographic areas called primary sampling units (PSUs); PSUs are counties or groups of counties. A measure of commercial business activity within each PSU was estimated using the Census Bureau's County Business Patterns. This measure of business activity is highly correlated with the number of commercial buildings, so it is used as a measure of size to statistically select PSUs with a method called probability proportionate to size (PPS) sampling. PSUs in major cities with substantial commercial activity are selected with certainty. In the 2003 CBECS, 108 PSUs out of a total of 687 were selected. These 108 PSUs were divided further into 7,031 secondary sampling units (SSUs or segments); SSUs are Census tracts or groups of Census tracts. Ultimately, 511 segments were sampled. The field staff (called listers) then constructed the area frame in these selected segments by walking the blocks (or driving the roads in more suburban or rural areas) and carefully writing down name, address, and some other descriptive information about all commercial buildings within the designated boundaries. There were over 140,000 buildings listed on the 2003 CBECS area frame.
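The PSU selection step can be illustrated with a minimal Python sketch. It assumes systematic PPS sampling, which is one common way to implement PPS; the text does not say which variant was used for CBECS. The function name `pps_systematic` and the PSU names and activity figures are hypothetical.

```python
import random

def pps_systematic(sizes, n, rng=None):
    """Select n units with probability proportional to size (PPS).

    sizes: {unit: measure of size}, e.g. estimated business activity.
    Returns {selected unit: selection probability}.
    """
    rng = rng or random.Random()
    remaining = dict(sizes)
    selected = {}

    # Step 1: units so large that n * size / total >= 1 are taken with
    # certainty. Repeat, because removing them can push others over the line.
    while True:
        slots = n - len(selected)
        total = sum(remaining.values())
        certain = [u for u, s in remaining.items() if slots * s >= total]
        if not certain:
            break
        for u in certain:
            selected[u] = 1.0
            del remaining[u]

    # Step 2: systematic PPS draw among the non-certainty units.
    slots = n - len(selected)
    if slots > 0 and remaining:
        interval = sum(remaining.values()) / slots
        start = rng.random() * interval  # random start in [0, interval)
        points = iter(start + k * interval for k in range(slots))
        point = next(points, None)
        cumulative = 0.0
        for u, s in remaining.items():
            cumulative += s
            while point is not None and point < cumulative:
                selected[u] = s / interval  # equals slots * s / total
                point = next(points, None)
    return selected

# Hypothetical PSUs with made-up business-activity measures.
psus = {"A": 1000, "B": 50, "C": 30, "D": 20, "E": 100}
print(pps_systematic(psus, 3, random.Random(0)))
```

In this made-up example the two largest PSUs are certainty selections, and the third slot goes to one of the smaller PSUs with probability proportional to its size.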
The area frame is limited to the commercial buildings located in the selected areas, but the multi-stage area probability sampling method ensures that it is representative of the commercial building population: selected PSUs represent the population at the PSU level, and selected SSUs represent their PSU at the SSU level.
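One way to see why the multi-stage design keeps selection probabilities known is that the overall probability is the product of the conditional probabilities at each stage. The following is a small sketch under that standard assumption; the function name and the probabilities are hypothetical and are not CBECS values.

```python
def overall_selection_probability(stage_probabilities):
    """Multiply the conditional selection probabilities of each sampling stage."""
    p = 1.0
    for stage_p in stage_probabilities:
        if not 0.0 < stage_p <= 1.0:
            raise ValueError("each stage probability must be in (0, 1]")
        p *= stage_p
    return p

# Hypothetical: the PSU was drawn with probability 0.25, and the SSU
# (segment) within that PSU was drawn with probability 0.1.
p_segment = overall_selection_probability([0.25, 0.1])
print(p_segment)
```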
It is desirable to sample large buildings at a higher rate than small buildings because of the relatively large and variable amount of energy that is consumed in these buildings; square footage is highly correlated with energy consumption. However, the area sampling procedure cannot guarantee that a sufficient number of very large buildings will be available for sampling, so on its own it does not provide an efficient mix of large and small buildings.
To compensate for this inefficiency of the area sample, special lists of large buildings are used to augment the area frame. EIA purchases or obtains five administrative database lists from organizations such as government agencies and trade associations and processes them for sampling. A building could appear in more than one frame, so de-duplicating the frames is essential: each building must have only one chance of selection, or the estimates will be biased. After data cleaning and de-duplication, the compilation of these lists provides sufficient coverage of large, energy-intensive buildings. The special lists currently being used are for hospitals, Federal buildings, airports, college campus buildings, and other large (>200,000 square feet) buildings such as offices, shopping malls, and hotels.
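The de-duplication idea can be sketched as follows. This is only an illustration, assuming buildings are matched on a crudely normalized street address; the text does not describe EIA's actual matching rules, and the function names, record fields, and sample records are hypothetical.

```python
import re

def normalize_address(address):
    """Crude matching key: lowercase, drop punctuation, collapse whitespace."""
    key = re.sub(r"[^a-z0-9 ]", "", address.lower())
    return " ".join(key.split())

def deduplicate(frames):
    """Merge frames so that each building appears exactly once.

    frames: list of (frame_name, records) in priority order; when a building
    appears in several frames, the copy from the earliest frame is kept.
    """
    seen = set()
    unique = []
    for frame_name, records in frames:
        for record in records:
            key = normalize_address(record["address"])
            if key in seen:
                continue  # already covered by a higher-priority frame
            seen.add(key)
            unique.append({**record, "frame": frame_name})
    return unique

# Hypothetical records: the hospital appears on a list frame and the area frame.
frames = [
    ("hospital_list", [{"name": "General Hospital", "address": "100 Main St."}]),
    ("area_frame", [
        {"name": "Gen. Hosp.", "address": "100  MAIN ST"},
        {"name": "Corner Deli", "address": "102 Main St"},
    ]),
]
for building in deduplicate(frames):
    print(building["frame"], building["name"])
```

Real address matching is usually fuzzier than this (abbreviations, suite numbers, campus buildings), so the exact-key match here should be read as a placeholder for whatever matching procedure is actually applied.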
Updating the Frames for 2012
The area frame and administrative lists used for the 2003 CBECS are a good foundation for the 2012 CBECS frame, but new buildings have been constructed and old buildings demolished since 2003. Because an area frame is so expensive and time-consuming to develop, EIA does not create a new one for each CBECS (in fact, prior to 2003, the last time a frame was developed from scratch was 1986). However, it is essential to include new buildings in the sample because newly constructed buildings use energy differently than old buildings due to new technologies and construction standards. Updating the list frames to include new construction is relatively easy; EIA simply obtains the most recent version of each list and replaces the older version. Updating the area frame is trickier: EIA will select a sample of the SSUs from the 2003 CBECS, and field listers will again visit these areas in person to update the frame by adding newly constructed buildings that do not appear on the 2003 frame. These new buildings will be sampled at a rate such that the entire stock of buildings constructed since 2003 will be adequately represented. As mentioned above, the sample size for the 2012 CBECS will be about 50 percent larger than that of the 2003 CBECS. In addition to updating the 2003 area frame, EIA will select 43 new PSUs, which will be divided into almost 200 new SSUs, and listers will build the area frame components for these areas from scratch. The field work to update the area frame will begin in the fall of 2012 after training sessions for field staff.
Drawing a Sample from the Frames
After the administrative lists and area frames are updated to reflect the commercial building population in 2012, the sample will be selected.
In selecting the buildings, the goal is to balance survey cost against sample accuracy. Using the information available on the sampling frame, the buildings will be sorted by size and type into subgroups of similar buildings. The number of sampled buildings within each subgroup will be calculated so that the variance of estimated energy consumption is minimized, subject to total sample size and budget constraints. Subgroups whose total energy consumption is highly variable, specifically large buildings, will be sampled at a higher rate than subgroups whose total energy consumption is less variable. Each sampled building will be assigned a weight equal to 1 / (probability of selection). This weight is the number of buildings in the population that the sampled building represents (itself plus similar non-sampled buildings). The estimated total number of commercial buildings will be calculated by adding up the weights of all the buildings in the sample.
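The allocation and weighting logic can be sketched in Python. The sketch assumes Neyman allocation (sample size proportional to subgroup size times the standard deviation of energy use) with equal interview cost per building and simple random sampling within each subgroup; the actual CBECS optimization also reflects budget constraints and is not specified here. The function names and all subgroup counts and standard deviations are hypothetical.

```python
def neyman_allocation(strata, n_total):
    """Allocate n_total sampled buildings across subgroups (strata).

    strata: {name: (N_h, S_h)} with N_h the number of buildings on the frame
    and S_h the standard deviation of energy consumption in the subgroup.
    Returns {name: n_h}, with n_h proportional to N_h * S_h.
    """
    products = {h: n_h * s_h for h, (n_h, s_h) in strata.items()}
    total = sum(products.values())
    return {h: max(1, round(n_total * p / total)) for h, p in products.items()}

def design_weights(strata, allocation):
    """Weight = 1 / (probability of selection) = N_h / n_h in each subgroup."""
    return {h: strata[h][0] / allocation[h] for h in strata}

# Hypothetical frame: many small buildings with low variability in energy
# use, and few large buildings with high variability.
strata = {"small": (900_000, 10.0), "large": (10_000, 900.0)}
allocation = neyman_allocation(strata, 1000)
weights = design_weights(strata, allocation)

for h in strata:
    rate = allocation[h] / strata[h][0]
    print(h, allocation[h], rate, weights[h])

# Adding up the weights of all sampled buildings estimates the total
# number of buildings in the population.
estimated_total = sum(allocation[h] * weights[h] for h in strata)
print(estimated_total)
```

In this made-up example both subgroups receive the same number of sampled buildings, so the large buildings are sampled at a much higher rate and each carries a much smaller weight than a small building.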
The increased sample size for the 2012 CBECS will provide estimates with smaller relative standard errors (RSEs) and finer levels of detail compared to past rounds of CBECS.
The initial sample for the 2012 CBECS is expected to include slightly more than 12,000 buildings in order to yield the targeted 8,400 completed building interviews. The higher number of sampled buildings is needed to account for buildings that will turn out to be ineligible and those that will not respond to the survey.
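The arithmetic behind the inflated initial sample can be sketched as below. The figures 8,400 and 12,000 come from the text; the separate eligibility and response rates are not given there, so the function `required_initial_sample` and any rates passed to it are hypothetical.

```python
import math

def implied_yield(completed_target, initial_sample):
    """Share of initially sampled buildings expected to end as completed interviews."""
    return completed_target / initial_sample

def required_initial_sample(completed_target, eligibility_rate, response_rate):
    """Initial sample needed when some buildings are ineligible or do not respond."""
    return math.ceil(completed_target / (eligibility_rate * response_rate))

# With roughly 12,000 sampled buildings and 8,400 targeted completes, the
# implied overall yield is about 70 percent (slightly less, since the
# initial sample is expected to be a little over 12,000).
print(implied_yield(8400, 12000))

# Hypothetical rates, for illustration only.
print(required_initial_sample(8400, eligibility_rate=0.9, response_rate=0.8))
```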