Summary of the Spring 2006 ASA Meetings

Summaries of the American Statistical Association (ASA) Committee on Energy Statistics Advice and Energy Information Administration (EIA) Responses at the Spring 2006 Meeting

1. How Can Modeling Suggest Data Needs? Open discussion between the Committee and EIA. This session was prompted by Committee remarks in the fall 2005 meeting. Nancy Kirkendall, Chair, and Margot Anderson, Director, EMEU.

See transcript for discussion on EIA’s Home Page:

http://www.eia.gov/calendar/asa_overview.htm

2. Measuring Perceptions of Applying Alternative Disclosure Limitation Methods, Jake Bournazian, SMG

Suppression is the most common method that federal agencies use to protect the confidentiality of reported data when releasing an information product. During the past 15 years, alternative disclosure limitation methodologies have been developed for protecting tabular data and microdata. These methodologies give statistical agencies new options for releasing data products while protecting the confidentiality of the reported data. Although these alternative methods reduce the information loss caused by suppression, they also affect the utility of the information product to the data user. Research is needed to measure how the data user community and the survey respondents perceive the application of alternative disclosure limitation methods to confidential EIA data.
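As a minimal illustration of the suppression approach described above, the Python sketch below applies a simple threshold rule to a small table: any cell built from fewer than a minimum number of respondents is withheld. The threshold of three, the cell labels, and the values are hypothetical and are not EIA's actual disclosure rules. Complementary suppression, which keeps withheld cells from being recovered by subtraction from totals, is not shown.

```python
# Hypothetical threshold rule for primary cell suppression (not EIA's actual rule).
MIN_RESPONDENTS = 3  # assumed threshold, for illustration only


def suppress_cells(cells, min_respondents=MIN_RESPONDENTS):
    """Return a publishable table, replacing sensitive cells with 'W' (withheld).

    `cells` maps a cell label to a (respondent_count, total_value) pair.
    """
    published = {}
    for label, (count, value) in cells.items():
        # A cell with too few contributors could reveal an individual report.
        published[label] = value if count >= min_respondents else "W"
    return published


# Made-up example table: label -> (number of respondents, reported total)
table = {
    "Region A": (12, 4500.0),
    "Region B": (2, 310.0),
    "Region C": (5, 980.0),
}
published = suppress_cells(table)
print(published)
```

The withheld cell is where the information loss mentioned above arises: the user sees that a value exists but learns nothing about its size.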

ASA Committee Advice

The ASA Committee recommended collecting some preliminary information as a starting point. They indicated that it would be useful to know the perceptions of users and survey respondents on the current methodology of applying cell suppression to tabular data. The Committee suggested that finding answers to preliminary questions would be useful in designing a study. For example, do respondents understand the current practice of using cell suppression? Also, do respondents care about which methodology is applied if the risk of re-identification is the same? Answers to these questions could be collected inexpensively through telephone calls, focus groups, or EIA’s web site. These qualitative approaches could also be used to explore the concerns and knowledge of users and respondents in this area. If further investigation is warranted, this information would be useful in designing a study to collect data to measure perceptions.

EIA Intended Response to Committee Advice

EIA will consider various qualitative approaches to collect feedback from data users and survey respondents regarding the current methodology of applying cell suppression to protect the confidentiality of reported survey values. The first step will be to select a target group whose concerns and knowledge of cell suppression methodology can be explored. The second step will be to draft the questions to ask the target group. Based on the information collected, EIA will then reassess the need to collect statistical data for measuring data users’ and survey respondents’ perceptions on these issues.

3. A New Oil Production Representation for the SAGE Model: Methodology and Producer Behavior Assumptions, Justine Barden, John Staub, Glen Sweetnam, OIAF, EIA

EIA uses the System for the Analysis of Global Energy Markets (SAGE) model for its annual International Energy Outlook. The model has 16 regions, each with well-defined demand, supply, technology, and import/export representations. The SAGE oil production representation has received more scrutiny recently as world oil prices approach $70 per barrel. Model results show that the current approach of using supply steps to represent oil production can overstate the supply response of oil in a world where demand grows at an unexpectedly high rate.

EIA plans to test a new oil production representation for the non-OPEC regions and examine the performance of the model in response to various demand and price scenarios. For each time period, the new model will incorporate key factors such as exploratory drilling, reserve additions, development drilling, production per well, oil prices, optimal drilling activities, and resource constraints, in its dynamic production analysis.

This paper describes a linear/non-linear modeling methodology that EIA is considering and raises the issue of price expectations in the determination of optimal drilling and production activities. The handling of price expectations may have a profound effect on the projection of oil production and will be tested extensively when the model is fully operational.

ASA Committee Advice

The committee recommended that any assumptions in the oil representation be made explicit, with information on how sensitive the model results are to these assumptions. The committee counseled careful use of the USGS resource estimates, which are the basis of the resource data used for the model. The committee pointed out that we should be aware of the assumptions inherent in the USGS data and that, because the USGS estimates have grown over time, we should specify how we account for this growth when using current estimates to forecast supply 25 or more years out. Finally, as this model is still under development, the committee requested that we provide, at a future meeting, the details that were not yet available.

EIA Intended Reaction to Committee Advice

EIA has received additional information from the contractor working on the model and is currently evaluating it before possibly soliciting further advice from the committee.

4. Improving the SAGE Petroleum Refinery Model, John Staub, OIAF, Phillip Tseng, SMG, EIA

EIA completed initial development of the System for the Analysis of Global Energy Markets (SAGE) in early 2003. The model is built on a linear programming platform and is solved for the least cost of meeting a predetermined set of energy service demands, period by period in five-year intervals from 2005 through 2030. SAGE includes sixteen world demand and supply regions. For each region, there are four end-use demand sectors (commercial, industrial, residential, transportation), petroleum refining, power generation, and supply of both fossil fuels and renewable energy. The structure of SAGE is generic. A modeler can relatively easily increase the number of demands for energy services and introduce new technologies into the system.

Recent developments in the world market for crude oil and petroleum products prompted the need to enhance the refinery representation of the SAGE model. The U.S. refinery acquisition cost (RAC) of crude oil rose from less than $26 per barrel in January 2000 to more than $55 in October 2005. Price differentials between light sweet and heavy sour crude oils also widened in the same period. In January 2000, the U.S. FOB cost of crude oil with an API gravity of 20 degrees or less was $20.78 per barrel, and the cost of crude oil with a gravity of 40.1 to 45 degrees was $26.90 per barrel. In October 2005, the costs were $44.21 and $59.24 per barrel for the heavy and light crude oils, respectively. The price difference thus increased from about $6 per barrel in 2000 to more than $15 per barrel in 2005. The increased price differentials reflect several important market interactions: the demand share of light products increased more than that of heavier products, the supply of heavier crude oils was relatively more abundant than that of light crude oils, and refineries lack the downstream capacity to process heavy crude oils.
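The differential arithmetic quoted above can be checked with a few lines of Python. This sketch only restates the four FOB costs given in the text; the dictionary keys and the function name are illustrative choices.

```python
# F.O.B. costs quoted in the text, in dollars per barrel.
costs = {
    "2000-01": {"heavy": 20.78, "light": 26.90},
    "2005-10": {"heavy": 44.21, "light": 59.24},
}


def light_heavy_differential(month):
    """Light-minus-heavy crude price differential for one month, in $/barrel."""
    return round(costs[month]["light"] - costs[month]["heavy"], 2)


diff_2000 = light_heavy_differential("2000-01")
diff_2005 = light_heavy_differential("2005-10")
print(f"January 2000 differential: ${diff_2000:.2f} per barrel")
print(f"October 2005 differential: ${diff_2005:.2f} per barrel")
```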

The world demand for petroleum products is projected to increase by almost 50 percent between 2005 and 2030. Most of the increase will be in gasoline, diesel, and jet fuel. The improved refinery representation will help EIA capture several important features of the future petroleum market. They include investment requirements, product pricing, product trade flows, the price differential between light and heavy crude oils, and a more reasonable forecast of the long-term supply of petroleum products.

The key to developing a manageable refinery model is adding only the essential elements of refinery operations and minimizing unnecessary detail. The refinery representation needs to mimic only the essential physical structure of refineries. For example, an abstract representation may remove sulfur before rather than after distillation.

John Staub discussed (1) historical trends in worldwide and U.S. refinery capacity, crude oil inputs and refined petroleum product outputs, (2) the existing petroleum refinery methodology in the SAGE model, and (3) a proposed enhanced modeling approach for refineries.

ASA Committee Advice

The ASA Committee acknowledged that modeling worldwide petroleum refining is a difficult task because it attempts to match a diverse slate of refinery inputs and outputs in multiple locations. The challenge is to introduce enough complexity to characterize how decisions are made in the market, but not so much that the model becomes unmanageable. Moreover, political decisions often override market economics in the refinery sector. The ASA Committee recommended that, whatever level of refinery model detail is finally selected, EIA run the model multiple times with different plausible parameter values to see how sensitive the outcome is to the input parameters. This will provide insight into the sensitivity of the results to stochasticity in the model and its assumptions.

An outstanding issue is how far EIA should go in discussing the risk associated with doing business in different parts of the world, which is going to be one of the driving forces behind investment in these large, capital-intensive projects.

EIA Intended Reaction to Committee Advice

EIA intends to follow the Committee’s advice to run the model multiple times with different parameters to estimate the model’s sensitivity to various parameters. In terms of the level of model complexity, EIA will test more detailed refinery representations as time permits.
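The idea of repeated runs with varied parameters can be sketched in Python on a toy problem. The example below is not SAGE: it is a hypothetical one-period, least-cost supply problem with three made-up supply steps, solved by filling the cheapest steps first. The perturbation range of plus or minus 10 percent, the number of runs, and all names are assumptions chosen only for illustration.

```python
import random
import statistics


def least_cost_supply(steps, demand):
    """Meet `demand` from supply steps at least cost.

    `steps` is a list of (unit_cost, capacity) pairs. Filling the cheapest
    steps first is optimal for this simple one-period problem.
    Raises ValueError if total capacity cannot cover demand.
    """
    remaining = demand
    total_cost = 0.0
    for unit_cost, capacity in sorted(steps):
        used = min(capacity, remaining)
        total_cost += used * unit_cost
        remaining -= used
        if remaining <= 0:
            return total_cost
    raise ValueError("demand exceeds total capacity")


def sensitivity_runs(base_steps, base_demand, n_runs=200, spread=0.10, seed=0):
    """Re-run the toy model with costs and demand perturbed by up to +/- `spread`."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    results = []
    for _ in range(n_runs):
        steps = [(cost * rng.uniform(1 - spread, 1 + spread), cap)
                 for cost, cap in base_steps]
        demand = base_demand * rng.uniform(1 - spread, 1 + spread)
        results.append(least_cost_supply(steps, demand))
    return results


# Made-up supply steps: (cost per unit, capacity)
base_steps = [(20.0, 50.0), (35.0, 30.0), (60.0, 40.0)]
base_cost = least_cost_supply(base_steps, 90.0)
runs = sensitivity_runs(base_steps, 90.0)
print(f"base cost: {base_cost:.1f}")
print(f"range over runs: {min(runs):.1f} to {max(runs):.1f}")
print(f"standard deviation: {statistics.stdev(runs):.1f}")
```

The spread of the run results relative to the base case is one simple summary of how sensitive the outcome is to the inputs. A fuller study would vary one parameter at a time to attribute the variation.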

5. 2006 Manufacturing Energy Consumption Survey (MECS): Looking at Past Performance Statistics to Motivate New Methods of Collection, Robert Adler and Tom Lorenz, EMEU, EIA

The paper and presentation looked at three aspects of the 2002 Manufacturing Energy Consumption Survey (MECS) to see what changes would be in order for the 2006 MECS. First, a sample of data status flags was examined to see what the flags showed about troublesome aspects of the questionnaire content. Second, the status flag data for respondents who completed an Excel version of the questionnaire were compared with the data for respondents who completed a written version. Together, those two sections point to the desirability of an electronic version of the 2006 MECS. Finally, the presenters examined differential response and coverage rates and described the non-response follow-up procedures.

ASA Committee Comments and General Discussion

Dr. Feder led the discussion by asking why we do the survey every four years rather than continuously through the cycle, going to a portion of the sample every year.

There was then considerable group discussion about the possible benefits of using the electronic MECS: cleaner data and fewer callbacks. There was general agreement that this would be helpful to have and that there is potential for cost and time savings.

The discussion then turned to response rates and the possibility of using incentives for respondents. One committee member talked about incentives she used for a facility study. The point-of-contact got a nice portfolio while related unidentified staff received a food incentive. She said that both could be obtained relatively cheaply.

Discussion turned briefly to the respondents’ capability to save incomplete survey work to Census Taker and return to it later. They will be able to do that.

The response rate discussion turned to the idea of coverage rates versus traditional response rates. There was general agreement that the coverage rate is the more important measure for national-level estimates. However, there was some concern that coverage rates may not be as relevant for smaller-area estimates or for sub-samples chosen by researchers using micro-data. There was also some discussion about the type of non-respondents, as well as their number, and about the fact that we “stopped” our follow-up when we reached our coverage rate. (We do have some characteristics by size and NAICS, and we also send written follow-ups before we go to the telephone.) It was suggested that we experiment with the letters and the mandatory response to see if that can elicit a better response.
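The distinction between the two rates can be made concrete with a short Python sketch. It assumes, for illustration, that the coverage rate is the share of a size measure (for example, energy consumption) accounted for by responding establishments; the summary above does not give the exact MECS definition, and the sample values below are made up.

```python
def response_and_coverage(units):
    """Compute the unit response rate and the size-weighted coverage rate.

    `units` is a list of (size_measure, responded) pairs for sampled
    establishments. The size-weighted definition of coverage is an
    assumption made for this illustration.
    """
    n_responded = sum(1 for _, responded in units if responded)
    total_size = sum(size for size, _ in units)
    size_responded = sum(size for size, responded in units if responded)
    return n_responded / len(units), size_responded / total_size


# Made-up sample: large establishments respond, most small ones do not.
sample = [
    (500.0, True),
    (300.0, True),
    (20.0, False),
    (15.0, True),
    (10.0, False),
    (5.0, False),
]
response_rate, coverage_rate = response_and_coverage(sample)
print(f"response rate: {response_rate:.1%}")
print(f"coverage rate: {coverage_rate:.1%}")
```

In this made-up sample only half of the units respond, yet they account for most of the size measure. That is why a high coverage rate can support a national total while saying little about a subgroup made up of small establishments.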

EIA’s Response to Committee Advice and Comments:

Regarding doing the survey on a continuous basis, the main hindrance is cost. The cost could not simply be split into fourths, as certain operations require a base level of expenditure regardless of the size of the sample. Second, one would never have an actual yearly estimate in the same way; there would instead be a series of rolling averages. Third, splitting the sample in such a skewed population would be difficult. There are certain large refineries, chemical companies, and primary metal companies that would almost always have to be part of any sample to ensure the accuracy of any estimate.

While those things are difficult to overcome, they may not be impossible. EIA would have to desire the changes enough to shoulder a rather large initial cost. We have also been exploring more compromise solutions like a large MECS every four or five years and a “mini-MECS” that would occur once or twice in the interim. The plan in this case is to field a small set of questions with a smaller sample size independently drawn. All of this becomes more possible if we can reduce the cost and time for one survey. We hope switching to an electronic form as the primary data collection mode will accomplish that.

Regarding response rates and non-response follow-up, EIA intends to keep the same general call-back procedure and to emphasize the coverage rate over the simple response rate. However, for the 2006 MECS, the sample will be drawn earlier so that the actual people responding at the establishment can be identified. They will be told about the merits of using the Census Taker version of the MECS and how to submit their data, no matter what mode they choose. In this way, we hope to identify many of the potential problems before the actual fielding of the survey. As for incentives, we will investigate cheap alternatives to mail to them at the time of the survey mail-out.

EIA will also reexamine the non-response follow-up letters that precede the telephone follow-up. If alternative versions seem desirable, we may try an experiment to see which version of the follow-up letter achieves a better conversion rate.

EIA’s 2002 Manufacturing Energy Consumption Survey (MECS) data file has a wealth of “metadata” available in the StEPS database in which it is housed. For each data item collected or derived, a flag indicates whether the data is reported, corrected by analyst intervention, or the result of using an alternate data source. The number of analyst corrections for collected data items gives an indication of data quality.

This paper will present statistics for some key items that may indicate a problem in the wording of the question, conceptual understanding, or other problems in respondent reporting. This metadata examination will yield potential changes to the 2006 MECS.

The 2002 MECS also had an electronic option for reporting for certain classes of respondents. For those classes of respondents, we will compare the analyst intervention flags and other performance statistics between the electronic and non-electronic reporting groups. The results will be used to justify the use and clearance of an Internet Data Collection.

Performance statistics, especially non-response statistics for industry and size classes, will be used to demonstrate the desirability of:

· Form specialization based on type of industry;

· Shifting the sample away from smaller respondents and allowing their weights to rise.

If available in time for the presentation, we will provide an update of frame and sample changes anticipated for 2006.

6. EIA 914: Data Expansion Challenges to Include Crude Oil Production, John Wood, OOG. (John Wood is at 214-720-6160)

Abstract, summary of ASA advice and summary of EIA response are outstanding. Materials related to this talk may be found at:

http://www.eia.gov/calendar/asa_overview.htm

7. Making Adjustments to Survey Data When the Collected Data Do Not Meet Expectations, Stan Kaplan, CNEAF, EIA

The paper was on the EIA-920 data and information challenges. Statisticians will be interested because the form was previously changed on the basis of cognitive testing but still presents some challenges. The Committee’s modelers and energy members may have useful ideas about the concept we are trying to collect and model.

Reminder sent 9/24/06 for summary of suggestions and EIA response.

8. Preliminary Research Results on Respondent Cut-off Dates for EIA Electricity Data Collections, Howard Bradsher-Fredrick and Alethea Jennings, SMG, EIA

In order to achieve high response rates on establishment surveys, EIA expends significant resources in administering non-response follow-up to those surveys. For example, our analysis of the submission dates related to the EIA-860, Annual Electric Generator Report, shows that over 95% of the volume has been reported within two months of the deadline for submission, while EIA continues to conduct non-response follow-up for over four months following the final deadline. Considering tightening budgets, the issue can be raised as to whether EIA will be able to continue to expend significant resources to achieve near 100% coverage by volume.

In order to make rational decisions on this issue, it is advisable to study past data collections, first to assess when data were submitted and then to determine the character of the respondents and lost respondents associated with an array of alternative cut-off dates. This paper summarizes these preliminary analyses of 2004 submissions of EIA-861, Annual Electric Power Industry Report, and EIA-860, Annual Electric Generator Report, data. In addition to overall summaries, some analyses were also conducted on various strata important to data users.
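The kind of tabulation described above, the share of volume received by each candidate cut-off date, can be sketched in Python as follows. The submission dates, volumes, and cut-off dates are made up for illustration and are not taken from the EIA-860 or EIA-861 files.

```python
from datetime import date


def coverage_by_cutoff(submissions, total_volume, cutoffs):
    """Share of total volume received on or before each candidate cut-off date.

    `submissions` is a list of (submission_date, volume) pairs;
    `total_volume` also includes units that never responded.
    """
    return {
        cutoff: sum(vol for day, vol in submissions if day <= cutoff) / total_volume
        for cutoff in cutoffs
    }


# Made-up submissions: (date received, reported volume)
submissions = [
    (date(2005, 2, 15), 600.0),
    (date(2005, 3, 20), 250.0),
    (date(2005, 5, 10), 100.0),
    (date(2005, 8, 1), 40.0),
]
total_volume = 1000.0  # includes 10.0 from units that never reported
cutoffs = [date(2005, 3, 31), date(2005, 5, 31), date(2005, 9, 30)]

shares = coverage_by_cutoff(submissions, total_volume, cutoffs)
for cutoff, share in shares.items():
    print(cutoff.isoformat(), f"{share:.1%}")
```

Running the same tabulation within strata, rather than over the whole file, would show which groups of respondents are lost under each alternative cut-off date.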

In addition, we would like to discuss with the Committee some of the challenges, such as that both the EIA-861 and EIA-860 surveys are used as frames for sample surveys. We would also like to discuss plans for future analyses, such as the use of imputation to obtain data for the missing respondents. We would like to get the Committee’s comments on the work we have done so far and on our plans. Because this is work in progress, the paper may differ slightly from the abstract.

ASA Committee Advice

The ASA-Energy Committee discussed some ideas concerning how to handle the problem of late respondents and non-respondents on the EIA-860 and EIA-861 surveys. These ideas were generated in the context of a discussion more specifically about the use of alternative respondent cut-off dates. The ideas of the Committee included the following:

· Is there really a need to release data at an earlier date? The presentation focused on the cost to EIA of continuing to collect data for an extended period of time and consequently releasing data at a later date than would be required if an earlier cut-off date were employed. The Committee posed the question of the cost to EIA data users: “Is there a significant research cost to EIA’s data users in waiting as long as they presently do for the release of the EIA-860 and EIA-861 data?” If not, maybe EIA should not change its procedures.

· Telephone interviews could be conducted in order to help identify the primary causes of late responses to these surveys and remedies could perhaps be identified.

· The non-respondents (after a specified cut-off date) could be sampled rather than trying to elicit responses from all non-respondents. This could enable EIA to concentrate their resources on a few non-respondents with greater vigor. However, it was pointed out by EIA that it is very important to receive data from as many respondents as possible since the EIA-861 data, in particular, is very important for other surveys. For example, the results of the EIA-861 are used for imputation purposes involving the EIA-826 (a monthly survey of a sample of the EIA-861 frame). Thus, sampling non-respondents would cause poorer results on other surveys.

· EIA could conduct a cost/benefit analysis of what is gained and lost by not collecting the last 1, 5 or 10% of volume.

· EIA could think more about their approach on repeated contacts to bring in more completed surveys earlier in the process.

· EIA could investigate mechanisms for reducing the cost of non-response follow-up. For example, repeat offenders could be identified as part of an approach to solving the problem of untimely submissions. As part of an effort to obtain their completed surveys earlier in the year, resources could be dedicated to assisting respondents in identifying and resolving, if possible, the issues that contribute to the lack of timeliness in their data submissions.

EIA Response to Committee Advice

EIA was already aware of the ideas presented by the Committee on this issue. It was, however, interesting to discuss these issues again. The presenters will review, make any necessary changes and submit their paper to CNEAF management. This document could be useful in the event that the issue of early cut-off of non-respondents or other related issues (e.g., sampling a portion of the small respondents on a rotating basis) are again raised in the future.

9. Functional Requirements for EIA’s Internet Data Collection System, Stanley R. Freedman, SMG, EIA

An EIA team has been working to develop functional requirements for an EIA-wide Internet Data Collection (IDC) system. These requirements will serve as a basis for developing an IDC that will meet the needs of EIA’s respondents and survey managers. The work of the team is nearing completion, as reflected in the accompanying PowerPoint presentation given to the Goal 4 subcommittee for EIA’s Strategic Plan. The team would like input from the ASA Committee on the requirements we have developed to this point.

Summary of Committee Comments & Advice

The committee felt that EIA was taking the correct approach to developing requirements for an Internet Data Collection System. They believed that the functional areas were adequately addressed. Most of their comments and suggestions related to interaction between respondents and the IDC. Many of their suggestions had already been addressed in the functional requirements document.

The Committee had the following specific concerns and suggestions.

A. One committee member had some concerns about protecting the confidentiality of historical information retrieved from the database. Specifically, if a company were bought, sold, or partially acquired by another company, would there be any way to safeguard against unauthorized access to historical information?

Response – Access to a company’s data through the IDC is controlled by a user ID and password. EIA has no control over who in the company is granted access to that password, or whether multiple people in a company have the ability to access the IDC. If a respondent has concerns about access to historical data by its own staff, we will look at ways of disabling that functionality for that particular respondent. This may not be technically possible. EIA will also need to look at the impact it has on the on-line editing functionality and any other impacts on other functional requirements.

B. The committee suggested that some form of a respondent feedback questionnaire be available to get feedback from respondents.

Response – EIA thinks this is a good idea that will complement the required usability testing. We will explore the most appropriate way of administering that questionnaire.

C. The committee asked whether respondents would be notified if their data were changed after submission to EIA.

Response – This is not a function of the IDC. The changes to which the committee is referring take place during the survey follow-up and data cleaning periods and are a function of specific survey operations. Some surveys do routinely notify their respondents if they change a submission. The issue does affect the IDC, however, in those instances where a respondent sees or has access to its historical data. If a respondent sees a data value in the IDC that it did not submit, it will cause concern and confusion. EIA must examine this issue and provide a mechanism for notifying respondents of changes to their submissions, although that notification may be outside the IDC.

D. The committee suggested that there be functionality in the IDC for companies to be able to transmit their data directly to EIA in a prescribed format, compiled from their data bases, rather than having to key data into browser screens.

Response – This functionality will be part of the IDC. Once the data are received by EIA, they will be loaded into the IDC so that the on-line edits can be run. Respondents who submit this way will then have to be called to resolve any edit flags that result.

E. The committee thought it would be a good idea to have “tailored” technical assistance for the IDC. By this, the committee meant that there needed to be assistance to handle IT-related questions, general survey issues, and questions on specific items on the survey.

Response – There are several types of assistance that will be part of the IDC. The first is a help line dedicated to providing assistance with the single sign-on system and other strictly IT issues that span all the surveys in the IDC. The second type of help line will be survey specific. Each survey or group of surveys will have a number that respondents can use to ask questions about specific data elements. These are subject-matter questions and are not appropriate for IT to answer. In addition to these human interfaces, there will be extensive built-in help in the IDC itself. This help will be survey specific and will link a user to assistance for specific data items, definitions of terms, or guidance in completing the survey.

F. The committee thought it is important for respondents to receive feedback on the status of their data submission.

Response – Feedback functionality will exist in the IDC. When a respondent submits their data to EIA they will receive immediate notification that their data have been submitted. Secondly, they will receive a confirmation email that their data have been received at EIA. Surveys in the IDC will, at the option of the survey manager, have progress bars to show how much of the survey has been completed.

G. One of the most significant comments we got from the committee was to explore what value-added features we can add to the IDC for respondents. Without some benefit to respondents for switching from their current reporting mode, it may be difficult to bring participation up to desired levels.

Response – The IDC team has discussed this issue several times over the development of the functional requirements. We believe it is important to address this issue to increase participation. Clearly, one incentive is that the respondent may receive fewer calls from EIA about data anomalies. Other suggestions for incentives include special reports and tabulations of a company’s data vis-à-vis other aggregated survey respondents. We will include the development of these incentives in the implementation plan and explore them with respondents during the testing phase.

10. An Empirical Evaluation of the Relationship Between Crude Oil and Natural Gas Prices, Jose A. Villar, OOG, EIA

This paper sought to develop an understanding of the salient characteristics of the economic and statistical relationship between oil and gas prices. The analysis identifies the economic factors suggesting how crude oil and natural gas prices are related, and assesses the statistical significance of the relationship between the two over time. A vector error correction model is estimated to distinguish between the long-run and short-run effects of changes in natural gas prices on oil prices, and vice versa. A significant, stable relationship between the two price series is identified. Oil prices are found to influence the long-run development of natural gas prices, but are not influenced by them.

The purpose of the analysis was to examine the time series econometric relationship between the Henry Hub natural gas price and the West Texas Intermediate (WTI) crude oil price. Typically, this relationship has been approached using simple correlations and deterministic trends. When data have unit roots, as in this case, such analysis is faulty and subject to spurious results. The analysis found a co-integrating relationship relating Henry Hub prices to the WTI and a trend capturing the relative demand and supply effects over the 1989 through 2005 period. The dynamics of the relationship suggest that a one-month temporary shock to the WTI of 20 percent has a 5 percent contemporaneous impact on natural gas prices, but the effect dissipates to zero in two months. A permanent shock of 20 percent in the WTI leads to a 15 percent increase in the Henry Hub price one year out, all else equal.
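The difference between temporary and permanent shocks in an error-correction setting can be illustrated with a small Python simulation. This is a single-equation simplification, not the estimated VECM from the paper, and the coefficients (alpha, beta, gamma) are hypothetical values chosen only to produce magnitudes of a similar order to those quoted above. It will not reproduce the paper's timing, which depends on the estimated lag structure and trend.

```python
def simulate_gas_response(oil_path, alpha=-0.2, beta=0.75, gamma=0.25):
    """Simulate log gas price deviations from a simple error-correction model.

    g[t] - g[t-1] = alpha * (g[t-1] - beta * o[t-1]) + gamma * (o[t] - o[t-1])

    `oil_path` holds log oil price deviations from baseline, starting at 0.
    alpha (adjustment speed), beta (long-run elasticity), and gamma
    (short-run impact) are hypothetical, not the paper's estimates.
    """
    gas = [0.0]
    for t in range(1, len(oil_path)):
        gap = gas[t - 1] - beta * oil_path[t - 1]  # deviation from long-run relation
        change = alpha * gap + gamma * (oil_path[t] - oil_path[t - 1])
        gas.append(gas[t - 1] + change)
    return gas


periods = 200
shock = 0.2  # a shock of 0.2 in log terms, roughly a 20 percent price change

permanent = [0.0] + [shock] * (periods - 1)        # oil stays higher
temporary = [0.0, shock] + [0.0] * (periods - 2)   # oil returns to baseline

gas_permanent = simulate_gas_response(permanent)
gas_temporary = simulate_gas_response(temporary)

print(f"impact effect: {gas_permanent[1]:.3f}")
print(f"long-run effect, permanent shock: {gas_permanent[-1]:.3f}")
print(f"long-run effect, temporary shock: {gas_temporary[-1]:.3f}")
```

In this sketch the gas price response to a temporary oil shock decays back to zero, while a permanent shock moves the gas price toward a new level set by the long-run coefficient. That is the qualitative pattern the co-integrating relationship implies.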

The analysis was presented to the Committee to get input on the econometric modeling, and to explore the implications of the Vector Error Correction Model (VECM) for EIA forecasting efforts.

Committee Advice:

In general, the Committee encouraged EIA to continue and expand its use of similar state-of-the-art time series analysis techniques. With respect to the analysis of the relationship between oil and gas prices, the Committee expressed considerable interest in the time trend term used in the model, and suggested that further work be done to investigate the real-world phenomena underlying that term.

EIA Intended Reaction to Committee Advice:

The current research analysis, as presented to the ASA, will be completed and published. EIA will utilize the report to examine opportunities to use what has been learned in its forecasting efforts. A future study of oil and gas prices will be considered to analyze the nature of the time trend, and attempt to determine what is driving this significant link between crude oil and natural gas prices.