# Methodology for EIA Weekly Retail Gasoline Price Estimates

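Beginning with the May 14, 2018, release, EIA’s weekly retail gasoline price estimates incorporate the following methodological changes:

- Updated sampling frame
- New sampling and estimation methodologies
- Updated city definitions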

As a result of these methodological changes, the estimates published for May 14, 2018, will not be directly comparable to those in the May 7, 2018, release, which were based on EIA’s previous sample. For comparison, a table on eia.gov shows estimates produced from both samples for two weeks (April 30, 2018, and May 7, 2018) by selected geography: United States, Petroleum Administration for Defense Districts (PADDs), and selected states and cities.

Documentation for the previous sample is archived on eia.gov. The sections below describe EIA’s methodology for the new sample.

## Weekly gasoline price and annual sales volume data collection

Every Monday, EIA collects information on retail prices for regular, midgrade, and premium grades of gasoline from a sample of retail gasoline outlets across the United States using Form EIA-878, *Motor Gasoline Price Survey Schedule A*. The weekly survey is designed to collect data on the cash price offered at the pump (including taxes) to consumers for each grade of gasoline. The collected data represent the price as of 8:00 a.m. local time on Monday. EIA collects the self-serve price, except in areas that have only full-serve, and the cash price, except at outlets that accept only credit cards.

The prices are collected via telephone, email, text, fax, or the internet from a sample of outlets. All collected prices are subjected to automated error checks during data collection and data processing. Data flagged for potential errors are verified with the respondents. Imputation (a statistical replacement process) is used to estimate prices for outlets that cannot be reached during the data collection or validation.

The price data from the sample are used to calculate volume-weighted average gasoline price estimates at the national, regional, and selected city and state levels for all gasoline grades and formulations. The volumes are based on the most recently available annual sales volume data obtained from the sampled outlets and top suppliers of retail gasoline using Form EIA-878, *Motor Gasoline Price Survey Schedule B*. The volume data are used only for statistical purposes and are not published.

The average gasoline price estimates are published at approximately 5:00 p.m. ET Monday, except on government holidays, when the data are released on Tuesday (but still represent Monday's prices). For more information, see Form EIA-878, instructions, and frequently asked questions.

## Gasoline sampling methodology

The target population is all active retail gasoline outlets in the United States for a given week. The population includes two types of outlets—big-box[1] and non-big-box outlets. Big-box outlets typically sell large volumes of gasoline at discounted prices.

The new sample for the *Motor Gasoline Price Survey* was drawn from a frame of approximately 130,000 retail gasoline outlets in the United States that were active in 2016. The gasoline outlet frame was constructed by combining information from a private commercial source with information contained on existing EIA petroleum product frames and surveys, federal and state administrative records, and other publicly available sources.

Outlet names, physical addresses, and ZIP codes were obtained from the private commercial data source. The individual outlets in the frame were assigned to counties after converting the physical addresses to geographic coordinates. The outlets were then assigned to either reformulated or conventional gasoline areas based on the published geographic areas defined by the U.S. Environmental Protection Agency program and by some state reformulated gasoline programs. The outlets were further assigned to city areas based on geographic areas defined by EIA.

The new gasoline outlet sample is a stratified systematic sample with a total size of 1,000 retail outlets. Retail gasoline outlets are assigned to primary sampling strata based on physical address. These primary sampling strata are nonoverlapping, and one or more primary sampling strata may be combined to correspond to a publication cell.

The primary sampling strata are further substratified by retail gasoline outlet type (big-box or non-big-box). The total sample size is allocated to the sampling substrata in proportion to the number of outlets in the cell after weighting the big-box substrata in recognition of larger annual sales volume per outlet compared with non-big-box substrata.
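The allocation step can be sketched in Python as follows. This is a minimal illustration, not EIA's implementation: the source does not publish the big-box weighting factor, so `BIG_BOX_FACTOR` is a purely hypothetical value, and the largest-remainder rounding and the `(stratum, outlet_type)` keys are assumptions made for the example.

```python
from math import floor

# Hypothetical multiplier: the source does not state the actual weighting.
BIG_BOX_FACTOR = 3.0


def allocate_sample(outlet_counts, total_sample, big_box_factor=BIG_BOX_FACTOR):
    """Allocate total_sample across substrata in proportion to weighted outlet counts.

    outlet_counts maps (stratum, outlet_type) -> number of outlets in the frame,
    where outlet_type is "big_box" or "non_big_box" (assumed labels).
    """
    # Size measure: outlet count, inflated for big-box substrata.
    sizes = {
        key: count * (big_box_factor if key[1] == "big_box" else 1.0)
        for key, count in outlet_counts.items()
    }
    total_size = sum(sizes.values())
    exact = {key: total_sample * size / total_size for key, size in sizes.items()}

    # Largest-remainder rounding (an assumption) so allocations sum to total_sample.
    alloc = {key: floor(value) for key, value in exact.items()}
    leftover = total_sample - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda key: exact[key] - alloc[key], reverse=True)
    for key in by_remainder[:leftover]:
        alloc[key] += 1
    return alloc
```

The sketch does not cap an allocation at the number of outlets available in a substratum; a production allocation would need to handle that case.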

Sampling within each sampling substratum is performed by ordering the outlets by county and ZIP code and selecting an independent systematic random sample without replacement. This procedure results in adequate sample representation by ZIP code within a given substratum.
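A systematic selection of this kind might look like the sketch below. It assumes each outlet is a dictionary with hypothetical `county` and `zip` keys and uses a fractional skip interval with a random start, which is one common way to draw a systematic sample; the source does not specify the exact procedure EIA uses.

```python
import random


def systematic_sample(outlets, n, rng=None):
    """Select n outlets by systematic sampling from a list ordered by county and ZIP.

    Each outlet is a dict with hypothetical keys "county" and "zip".
    """
    if not 0 < n <= len(outlets):
        raise ValueError("n must be between 1 and the number of outlets")
    rng = rng or random.Random()

    # Ordering by county, then ZIP, spreads the sample across geography.
    ordered = sorted(outlets, key=lambda o: (o["county"], o["zip"]))

    interval = len(ordered) / n      # fractional skip interval (>= 1)
    start = rng.random() * interval  # random start in [0, interval)

    # Because interval >= 1, successive truncated positions are distinct,
    # so the sample is drawn without replacement.
    last = len(ordered) - 1
    return [ordered[min(int(start + i * interval), last)] for i in range(n)]
```

Under this scheme every outlet in the substratum has the same selection probability, `n / len(outlets)`, and the reciprocal of that probability is the outlet's sampling weight.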

Each year, the sample will be augmented to account for new outlets that are established. Also, each year, some geographic regions may experience relatively higher annual rates of outlets going out of business. Those geographic regions with relatively higher rates of sample attrition will be oversampled to account for this impact.

## Imputation and estimation

EIA calculates the survey response rate based on the annual volumes represented by the reporting outlets in the sample. For regular grade at the U.S. level, the outlets reporting in the weekly survey account for at least 80% of total weighted annual sales volume.

Item and unit nonresponse to weekly gasoline prices and annual sales volumes are handled at the outlet level by imputation using two sources: (a) survey data reported from other outlets in the sample; and (b) weekly price data obtained from a private commercial source.

The estimation for weekly prices uses two sources of data from the *Motor Gasoline Price Survey*: annual sales volumes for each outlet in the sample and weekly price data for those outlets. Prior to implementing the new weekly sample, EIA collected annual sales volumes and ethanol content for regular, midgrade, and premium gasoline for the retail gasoline outlets in the sample from owners of the outlets and top suppliers of retail gasoline.

The sampling weight for a given sampled outlet is the reciprocal of the outlet’s probability of selection in the sample. To estimate average prices, EIA constructs a volume weight for each sampled outlet by multiplying its sampling weight by its annual sales volume. These volume weights are applied each week to the reported or imputed outlet gasoline prices to obtain weighted average price estimates for the formulations, grades, and geographic areas that EIA publishes.
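The weighting described above can be illustrated with a short Python sketch. The dictionary keys (`selection_prob`, `annual_volume`, `price`) and the function name are hypothetical, chosen only for this example; the calculation is restricted to the outlets belonging to one grade, formulation, and geographic area.

```python
def volume_weighted_price(outlets):
    """Volume-weighted average price for one grade, formulation, and area.

    Each outlet is a dict with hypothetical keys:
      "selection_prob" - probability the outlet was selected into the sample
      "annual_volume"  - annual sales volume (for example, gallons)
      "price"          - reported or imputed weekly price, dollars per gallon
    """
    numerator = 0.0
    denominator = 0.0
    for outlet in outlets:
        # Sampling weight: reciprocal of the selection probability.
        sampling_weight = 1.0 / outlet["selection_prob"]
        # Volume weight: sampling weight times annual sales volume.
        volume_weight = sampling_weight * outlet["annual_volume"]
        numerator += volume_weight * outlet["price"]
        denominator += volume_weight
    if denominator == 0:
        raise ValueError("total volume weight is zero")
    return numerator / denominator
```

For instance, an outlet with selection probability 0.01, annual volume 1,000,000 gallons, and price $2.50 combined with an outlet with selection probability 0.02, annual volume 500,000 gallons, and price $2.90 gives volume weights of 100,000,000 and 25,000,000 and a weighted average of about $2.58.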

For quality assurance purposes, average price estimates are withheld from publication if at least half of the weighted annual sales volume is based on outlets for which the weekly gasoline prices are imputed.
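As a rough sketch, the withholding rule amounts to comparing the imputed share of weighted volume with one half. The keys `volume_weight` and `imputed` are hypothetical names, and the treatment of a cell with zero total volume is an assumption, since the source does not address it.

```python
def should_withhold(outlets):
    """Return True if at least half of the weighted volume has an imputed price.

    Each outlet is a dict with hypothetical keys "volume_weight"
    (sampling weight times annual sales volume) and "imputed" (bool).
    """
    total = sum(o["volume_weight"] for o in outlets)
    if total == 0:
        # Assumption: with no weighted volume there is nothing to publish.
        return True
    imputed = sum(o["volume_weight"] for o in outlets if o["imputed"])
    return imputed / total >= 0.5
```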

## Sampling error, measures of sampling variability, and confidence intervals

Sampling error is a statistical term for the error caused by observing a sample instead of the entire sampling frame. Statistics based on a sample, such as averages, generally differ from statistics for the entire frame because the sample includes only a subset of the frame.

Statisticians use measures of sampling variability, such as the standard error and the coefficient of variation, to measure the sampling error. These measures of sampling variability are typically estimated from the sample that was selected. The standard error, which is measured in the same units (current dollars per gallon for weekly gasoline prices) as the estimate, is a measure of the sampling variability of the estimate based on all possible samples that could have been selected using the chosen sample design. The coefficient of variation, which may also be referred to as the relative standard error, is the standard error expressed as a fraction of the estimate.

Each average price estimate published by EIA has a corresponding estimated standard error published in the *Detailed Price and Standard Error Report*. For quality assurance purposes, average price estimates are flagged if the corresponding estimated coefficient of variation exceeds 5%.
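The flagging rule can be expressed in a few lines of Python. The function names are hypothetical; the 5% threshold and the definition of the coefficient of variation (standard error divided by the estimate) come from the text above.

```python
def coefficient_of_variation(estimate, standard_error):
    """Standard error expressed as a fraction of the estimate."""
    if estimate == 0:
        raise ValueError("estimate must be nonzero")
    return standard_error / estimate


def is_flagged(estimate, standard_error, threshold=0.05):
    """Return True if the coefficient of variation is more than the threshold."""
    return coefficient_of_variation(estimate, standard_error) > threshold
```

An estimate of $1.670 with a standard error of about $0.023 has a coefficient of variation of roughly 1.4%, so it would not be flagged.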

Data users can use the estimated standard error to compute a confidence interval centered about the corresponding published average price estimate with a desired level of confidence. EIA selected only one of many possible samples for the *Motor Gasoline Price Survey*. If a confidence interval were constructed for each of these possible samples, the percentage of confidence intervals containing the census value (the value that would be obtained by surveying the entire sampling frame) would be expected to equal the level of confidence. For example, if one could construct a 95% confidence interval for each possible sample that could be selected, then one would expect that 95% of these confidence intervals would contain the value obtained from taking a census of the sampling frame.

To determine the bounds of the confidence interval for a given published average price estimate, users can compute the margin of error (MOE) using the estimated standard error. The MOE, which is half the width of the interval, is defined as the estimated standard error of the estimate multiplied by the standard normal percentile for the level of confidence, rounded up to the nearest unit used in publishing the corresponding estimate. The lower bound of the confidence interval is the estimate minus the MOE, and the upper bound of the confidence interval is the estimate plus the MOE. For the standard normal percentile, 1.645 is used for a 90% confidence interval, and 1.96 is used for a 95% confidence interval.

For example, suppose an average price estimate of $1.670 has an estimated standard error of $0.0230482625709464 in the *Detailed Price and Standard Error Report*. The 95% margin of error would be 1.96 × $0.0230482625709464 ≈ $0.0452, which rounds up to $0.046. The 95% confidence interval would then be $1.670 ± $0.046, or $1.624 to $1.716.
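The worked example can be reproduced with the Python sketch below. The function name and signature are hypothetical. It uses the standard library `decimal` module so that the round-up step is exact, and it assumes the publication unit is $0.001, as in the example.

```python
from decimal import Decimal, ROUND_CEILING

# Standard normal percentiles given in the text.
Z_VALUES = {90: Decimal("1.645"), 95: Decimal("1.96")}


def confidence_interval(estimate, standard_error, level=95, unit="0.001"):
    """Return (lower, upper, moe) for a published price estimate.

    estimate and standard_error are passed as strings (or Decimals) to
    avoid binary floating-point rounding; unit is the publication unit.
    """
    estimate = Decimal(estimate)
    raw_moe = Decimal(standard_error) * Z_VALUES[level]
    # Round the margin of error up to the nearest publication unit.
    moe = raw_moe.quantize(Decimal(unit), rounding=ROUND_CEILING)
    return estimate - moe, estimate + moe, moe
```

Calling `confidence_interval("1.670", "0.0230482625709464")` returns bounds of 1.624 and 1.716 with a margin of error of 0.046, matching the example. Passing `level=90` applies the 1.645 percentile instead.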

## Nonsampling errors

Potential errors unrelated to sampling, called nonsampling errors, include various response and operational errors, such as those related to data collection, respondent reporting, transcription, and nonresponse. These types of errors could occur even if every known outlet had been surveyed under the same conditions as the sample survey. Although nonsampling error is not measured directly, EIA employs quality control procedures throughout the survey process.

[1] Big-box outlets include warehouse clubs and large discount retail establishments that sell general merchandise such as groceries, apparel, and gasoline. Non-big-box outlets are all other retail establishments.