Methodology for EIA Weekly On-Highway Diesel Fuel Price Estimates

On June 13, 2022, we implemented a new sample based on methodological changes made to improve the accuracy of our estimates of weekly on-highway diesel fuel prices. These methodological changes include:

Creating an updated sampling frame
Using new sampling and estimation methodologies
Publishing a history of standard errors for the weekly price estimates produced from the new sample

We will monitor the results from this sample and the performance of the methods.

As a result of these methodological changes, the published estimates for June 13, 2022, will not be directly comparable to those published in the June 6, 2022, release, which was based on the previous sample. We present separate estimates produced from the two samples in a table for two weeks (week of May 30, 2022, and week of June 6, 2022) by selected geographies: the contiguous United States and regional breakouts of the five Petroleum Administration for Defense Districts (PADDs). For more information on the methodological changes, see frequently asked questions.

You can find documentation on the previous sample on our website. The sections below discuss our methodology for the new sample.

Weekly on-highway diesel fuel price and annual sales volume data collection

Every Monday, we collect information on retail prices for on-highway diesel fuel from a sample of retail diesel fuel outlets across the contiguous United States using Form EIA-888, On-Highway Diesel Fuel Price Survey Schedule A. The sample includes a combination of truck stops and service stations that sell retail on-highway diesel fuel. We designed the weekly survey to collect data on the cash self-serve price offered at the pump (including taxes) to consumers for on-highway diesel fuel as of 8:00 a.m. local time on Monday. The data represent the price of ultra-low sulfur diesel (ULSD), which contains less than 15 parts-per-million sulfur.

For the sample of outlets that we selected for the On-Highway Diesel Fuel Price Survey, we collect weekly prices via telephone, email, text, fax, web survey, or manual retrieval from company websites. All collected prices are subjected to automated error checks during data collection and data processing. We verify data flagged for potential errors with the respondents. We use imputation (a statistical replacement process) to predict prices for outlets that cannot be reached during data collection or validation.

We use the price data from the sample to calculate volume-weighted average diesel fuel price estimates at the national, regional, and California-state levels. The volumes are based on the most recently available annual sales volume data, which we collect on Form EIA-888, On-Highway Diesel Fuel Price Survey Schedule B, only once, when outlets are selected for a new sample. We use the volume data only for statistical purposes, and we do not publish these data. All collected volumes are subjected to error checks during data validation. We verify data flagged for potential errors with the respondents. Volume data reported for only part of the year are inflated to approximate sales volumes for a full calendar year. We use imputation to predict volumes for outlets that could not be reached during the data collection period.

We publish the average diesel fuel price estimates around 4:00 p.m. ET Monday, except on government holidays, when the data are released on Tuesday (but still represent Monday's prices). For more information, see Form EIA-888, instructions, and frequently asked questions.

Diesel fuel sampling methodology

The target population is all active retail on-highway diesel fuel outlets in the contiguous United States for a given week. Due to statistical and operational considerations, we exclude outlets in Alaska and Hawaii from the target population. The population includes two types of outlets—truck stops and service stations. For the sole purpose of sampling efficiency, we define a truck stop as an on-highway diesel fuel retail outlet that has diesel fuel bays designed to accommodate and serve large trucks and may also offer amenities such as restaurants, showers, truck maintenance and repair, and laundry services. Truck stops typically sell larger volumes of diesel fuel and have designated diesel fueling bays that are separate from gasoline or diesel fuel pumps intended for automobiles and light trucks. In contrast, service stations sell diesel fuel in the same area as gasoline.

The new sample for the On-Highway Diesel Fuel Price Survey was drawn from a frame of approximately 73,000 service stations and 9,500 truck stops in the contiguous United States that were active in 2021. We constructed the diesel fuel outlet frame by combining information from a private commercial source with information contained on our existing petroleum product frames and surveys and other publicly available sources. We obtained outlet names, physical addresses, and ZIP codes from the private commercial data source, and outlets in the frame were assigned to counties after converting the physical addresses to geographic coordinates.

The new diesel fuel outlet sample is a stratified systematic sample with a total size of 590 retail outlets. Retail diesel fuel outlets are assigned to eight primary sampling strata based on physical address. These primary sampling strata are nonoverlapping, and they correspond to the published regional breakouts of PADDs, the most detailed geographic levels used to define publication cells.

Each primary sampling stratum is further substratified into a certainty substratum of truck stops and up to four noncertainty substrata: large truck stops, medium truck stops, other truck stops, and service stations. Truck stops that are selected in certainty substrata are included in the sample with a probability of 1. Substratifying the primary strata produced 38 sampling strata.

We developed a model using historical sales data collected with Form EIA-821, Annual Fuel Oil and Kerosene Sales Report, to estimate annual diesel fuel sales volumes at the outlet level. This model helped us stratify the truck stops into up to four substrata for a given primary sampling stratum based on auxiliary data at the outlet level on the frame that included truck diesel fuel lane counts, traffic volumes on nearby roadways, truck parking availability at the outlet, and sales of diesel exhaust fluid (DEF) from a pump.

We allocated sample sizes to the noncertainty substrata based on the relative estimated diesel fuel sales volumes of the substrata, subject to constraints on minimum sample sizes and maximum sampling weights. Sampling within each noncertainty sampling substratum is performed by ordering the outlets by county and ZIP code and selecting an independent systematic random sample without replacement using a fractional interval. This procedure results in adequate sample representation by ZIP code within a given substratum.
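
The following Python sketch illustrates one common way to draw a systematic sample with a fractional interval from an ordered list. It is not our production code: the function name systematic_sample, the outlet records, and the county and ZIP code values are hypothetical, and the random start and rounding conventions shown are assumptions made for illustration.

```python
import random

def systematic_sample(ordered_outlets, n, rng=None):
    """Draw n units from an ordered list using a fractional sampling interval."""
    population = len(ordered_outlets)
    if not 0 < n <= population:
        raise ValueError("n must be between 1 and the number of outlets")
    rng = rng or random.Random()
    interval = population / n           # fractional: need not be a whole number
    start = rng.random() * interval     # random start in [0, interval)
    picks = []
    for k in range(n):
        # Truncating each selection point gives the list index of the chosen unit.
        index = min(int(start + k * interval), population - 1)
        picks.append(ordered_outlets[index])
    return picks

# Hypothetical substratum of 23 outlets, ordered by county and then ZIP code.
hypothetical_frame = [
    {"id": i, "county": f"{i % 5:03d}", "zip": f"{10000 + (i * 37) % 90:05d}"}
    for i in range(23)
]
ordered = sorted(hypothetical_frame, key=lambda o: (o["county"], o["zip"]))
sample = systematic_sample(ordered, 5, random.Random(2022))
```

Because the interval is at least 1, no outlet can be selected twice, and each outlet in the substratum has the same selection probability of n divided by the number of outlets. Sorting before selection is what spreads the sample across counties and ZIP codes.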

Based on annual assessments of the diesel fuel frame, the sample may be augmented to account for new outlets that we have identified since the construction of the initial frame. In addition, some geographic regions may experience relatively higher annual rates of outlets going out of business. Those geographic regions with relatively higher rates of sample attrition may be oversampled for newly identified outlets, compared with other regions, to help offset the resulting smaller sample sizes.

Imputation and estimation

Before implementing the new weekly sample, we collected recent annual sales volumes only once from owners of the sampled outlets using On-Highway Diesel Fuel Price Survey Schedule B. For a given week, we calculate the survey response rate based on the annual volumes represented by the reporting outlets in the sample. In terms of total weighted annual sales volume, the volumes represented by the reporting outlets in the weekly survey account for at least 80% of diesel fuel sold at the contiguous U.S. level.

We handle item and unit nonresponse to weekly diesel fuel prices and annual sales volumes at the outlet level by imputation. Depending on available information, the imputation procedure is based on a model that incorporates some combination of previous survey data reported by the outlet, survey data reported by similar outlets in the sample, and data obtained from a private commercial source.

The estimation for weekly prices uses weekly price data for outlets in the sample that are collected from Form EIA-888, On-Highway Diesel Fuel Price Survey Schedule A, and recent annual sales volumes for each of these outlets that are collected from Form EIA-888, On-Highway Diesel Fuel Price Survey Schedule B. The sampling weight for a given sampled outlet is the reciprocal of the outlet’s probability of selection in the sample. To estimate average prices, we first calculate the weighted volume for a given sampled outlet by multiplying its sampling weight by its annual sales volume. We then apply these volume weights each week to the reported or imputed outlet diesel fuel prices to obtain weighted average price estimates for the geographic areas that we publish.
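
The weighting steps described above can be sketched in Python as follows. The Outlet class, its field names, and the example values are hypothetical and chosen only to illustrate the arithmetic; this is a minimal sketch rather than our production estimation code.

```python
from dataclasses import dataclass

@dataclass
class Outlet:
    price: float           # reported or imputed weekly price, dollars per gallon
    annual_volume: float   # annual sales volume, gallons
    selection_prob: float  # probability of selection into the sample

def weighted_average_price(outlets):
    """Volume-weighted average price for one publication area."""
    numerator = 0.0
    denominator = 0.0
    for outlet in outlets:
        sampling_weight = 1.0 / outlet.selection_prob        # reciprocal of selection probability
        weighted_volume = sampling_weight * outlet.annual_volume
        numerator += weighted_volume * outlet.price
        denominator += weighted_volume
    return numerator / denominator

# Hypothetical area with one certainty truck stop and one sampled service station.
example = [
    Outlet(price=4.00, annual_volume=1_000_000, selection_prob=1.0),
    Outlet(price=5.00, annual_volume=100_000, selection_prob=0.1),
]
average = weighted_average_price(example)
```

In this made-up example, the service station sells one-tenth the volume of the truck stop but carries a sampling weight of 10, so both outlets have the same weighted volume and the estimate is the midpoint of the two prices, $4.50.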

For the few outlets in the sample that report selling on-highway diesel fuel to automobiles and trucks at different prices, we collect only the weekly truck prices from these outlets and weight these prices by the recent annual truck sales volumes collected from Form EIA-888, On-Highway Diesel Fuel Price Survey Schedule B. We do not collect the weekly automobile diesel prices for these outlets because we do not believe that these automobile sales would have a meaningful impact on the published price estimates to justify the additional response burden.

For quality assurance purposes, we withhold average price estimates from publication if at least half of the weighted annual sales volume is based on outlets for which the weekly diesel fuel prices are imputed.
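
As an illustration, the withholding rule could be checked as in the Python sketch below. The function name withhold_estimate and the tuple layout are hypothetical; only the one-half threshold on weighted annual sales volume comes from the rule stated above.

```python
def withhold_estimate(outlets):
    """Return True if at least half of the weighted annual volume has imputed prices.

    Each outlet is a tuple: (sampling_weight, annual_volume, price_is_imputed).
    """
    total = sum(weight * volume for weight, volume, _ in outlets)
    imputed = sum(weight * volume for weight, volume, is_imputed in outlets if is_imputed)
    return imputed >= 0.5 * total

# Hypothetical publication cell: 40% of the weighted volume has an imputed price.
cell = [
    (1.0, 600_000, False),
    (4.0, 100_000, True),
]
```

Here the imputed share is 400,000 of 1,000,000 weighted gallons, so the estimate for this hypothetical cell would be published.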

Sampling error, measures of sampling variability, and confidence intervals

Sampling error is a statistical term for the error caused by observing a sample instead of the entire sampling frame. Statistics based on a sample, such as averages, generally differ from statistics for the entire frame because the sample includes only a subset of the frame.

Statisticians use measures of sampling variability, such as the standard error and the coefficient of variation, to measure the sampling error. These measures of sampling variability are typically estimated from the sample that was selected. The standard error, which is measured in the same units (current dollars per gallon for weekly diesel fuel prices) as the estimate, is a measure of the sampling variability of the estimate based on all possible samples that could have been selected using the chosen sample design. The coefficient of variation, which may also be referred to as the relative standard error, is the standard error expressed as a fraction of the estimate.

Each average price estimate we publish has a corresponding estimated standard error published in the Detailed Price and Standard Error Report. For quality assurance purposes, we flag average price estimates if the corresponding estimated coefficient of variation is more than 5%.
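
The flagging rule can be expressed in a few lines of Python. The function names below are hypothetical, and the sketch assumes the coefficient of variation is computed as the estimated standard error divided by the published estimate.

```python
def coefficient_of_variation(estimate, standard_error):
    """Standard error expressed as a fraction of the estimate."""
    return standard_error / estimate

def is_flagged(estimate, standard_error, threshold=0.05):
    """Flag the estimate when its coefficient of variation exceeds 5%."""
    return coefficient_of_variation(estimate, standard_error) > threshold
```

For instance, an estimate of $3.670 with a standard error of about $0.023 has a coefficient of variation of roughly 0.6%, well below the 5% threshold.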

Data users can use the estimated standard error to compute a confidence interval centered on the corresponding published average price estimate with a desired level of confidence. We selected only one of many possible samples for the On-Highway Diesel Fuel Price Survey. If we constructed a confidence interval for each of these possible samples, we would expect the percentage of confidence intervals containing the census value (if we had surveyed the entire sampling frame) to equal the level of confidence.

For example, if one could construct a 95% confidence interval for each possible sample that could be selected, then one would expect that 95% of these confidence intervals would contain the value obtained from taking a census of the sampling frame.

To determine the width of the confidence interval for a given published average price estimate, users can compute the margin of error (MOE) using the estimated standard error. The MOE is defined as the estimated standard error of the estimate multiplied by the standard normal percentile for the level of confidence, rounded up to the nearest precision unit used in publishing the corresponding estimate. The lower bound of the confidence interval is the estimate minus the MOE, and the upper bound of the confidence interval is the estimate plus the MOE. For the standard normal percentile, we use 1.645 for a 90% confidence interval and 1.96 for a 95% confidence interval.

For example, suppose an average price estimate of $3.670 has an estimated standard error of $0.0230482625709464 in the Detailed Price and Standard Error Report. The 95% margin of error would be 1.96 x $0.0230482625709464, which rounds up to $0.046. The 95% confidence interval would then be $3.670 +/- $0.046, or $3.624 to $3.716.
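
Data users who want to reproduce this calculation could use a Python sketch like the one below. The function name confidence_interval and its parameters are hypothetical; the sketch uses the decimal module so that rounding up to the published precision (assumed here to be $0.001) is exact.

```python
from decimal import Decimal, ROUND_CEILING

# Standard normal percentiles given in the text above.
Z_VALUES = {90: Decimal("1.645"), 95: Decimal("1.96")}

def confidence_interval(estimate, standard_error, level=95, precision="0.001"):
    """Return (lower bound, upper bound, margin of error) as Decimal values.

    Pass the estimate and standard error as strings to avoid binary
    floating-point rounding.
    """
    raw_moe = Z_VALUES[level] * Decimal(standard_error)
    # Round the margin of error up to the nearest publication precision unit.
    moe = raw_moe.quantize(Decimal(precision), rounding=ROUND_CEILING)
    point = Decimal(estimate)
    return point - moe, point + moe, moe

lower, upper, moe = confidence_interval("3.670", "0.0230482625709464")
# lower == Decimal("3.624"), upper == Decimal("3.716"), moe == Decimal("0.046")
```

Passing level=90 applies the 1.645 percentile instead and yields a narrower interval.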

Nonsampling errors

Potential errors unrelated to sampling, called nonsampling errors, include various response and operational errors, such as those related to data collection, respondent reporting, transcription, and nonresponse. All these types of errors could also occur even if every known outlet had been surveyed under the same conditions as the sample survey. Although nonsampling error is not measured directly, we employ quality control procedures throughout the survey process.
