AMERICAN STATISTICAL ASSOCIATION (ASA)
ASA ENERGY STATISTICS COMMITTEE - ENERGY INFORMATION ADMINISTRATION (EIA) MEETING
Alexandria, Virginia
Thursday, April 22, 2004

ASA COMMITTEE ON ENERGY STATISTICS:
JAY F. BREIDT, Chair, Colorado State University
NICOLAS HENGARTNER, Vice Chair, Los Alamos National Laboratory
MARK BERNSTEIN, RAND Corporation
MARK BURTON, Marshall University
MOSHE FEDER, Research Triangle Institute
BARBARA FORSYTH, Westat
NEHA KHANNA, Binghamton University
NAGARAJ K. NEERCHAL, University of Maryland
SUSAN M. SEREIKA, University of Pittsburgh
RANDY R. SITTER, Simon Fraser University

ALSO PRESENT:
COLLEEN BLESSING, EIA
ERIN BOEDECKER, EIA
HOWARD BRADSHER-FREDERICK, EIA
TOM BROENE, EIA
PHILLIP BUDZIK, EIA
GUY CARUSO, EIA
STACEY COLE, Bureau of the Census
DAVE COSTELLO, EIA
STAN FREEDMAN, EIA
FRED FREEM, EIA
CAROL FRENCH, EIA
JANET GORDON, EIA
DOUG HALE, EIA
TAMMY HEPPNER, EIA
RICK HOGUE, Bureau of the Census
PAUL HOLTBERG, EIA
SUSAN HOLTE, EIA
CRAWFORD HONEYCUTT, EIA
ALTHEA JENNINGS, EIA
DIANE KEARNEY, EIA
NANCY KIRKENDALL, EIA
TANCRED LIDDERDALE, EIA
EDWIN LU, EIA
RUEY-PYNG LU, EIA
KEN MARTIN, PricewaterhouseCoopers
PRESTON McDOWNEY, EIA
HERB MILLER, EIA
RENEE MILLER, EIA
KARA NORMAN, EIA
JOE SEDRANSK, Case Western Reserve University/EIA
TOM SPERL, EIA
YVONNE TAYLOR, EIA
PHILLIP TSENG, EIA
KEN VAGTS, EIA
SHAUNA WAUGH, EIA
BILL WIENIG, EIA
LORIE WIJNTJIS, PricewaterhouseCoopers
NATHAN WILSON, EIA

* * * * *

C O N T E N T S

AGENDA SESSIONS:
Greetings and Remarks
Updates Since Fall 2003 Meeting
Introduction to Short-Term Forecasting
Background on Short-Term Forecasting
Issues in Short-Term Energy Modeling
Summary of Recommendations
Natural Gas Prices and Industrial Sector Responses
Summary of Recommendations
Electricity Transmission
Electricity 2005
Transmission Data for Public Policy
Estimating Weekly Other Oils Stock
Summary of Recommendations
Natural Gas Production Monthly Survey

* * * * *

P R O C E E D I N G S
(8:32 a.m.)

MR. BREIDT: Welcome. This meeting is being held under the provisions of the Federal Advisory Committee Act. This is an American Statistical Association (ASA) committee, not an EIA committee, which periodically provides advice to EIA. The meeting is open to the public and public comments are welcome. Time will be set aside for comments at the end of each morning and afternoon session. Written comments are welcome and may be sent to either ASA or EIA. All attendees, including guests and EIA employees, should sign the register in the hall and include their e-mail addresses. Rest rooms are at the end of the hall toward the back of this room. A fountain is in the same corridor on the way. Telephones in this room share a single number. That number is (202) 586-3071. Kathleen Wert and Tara Stull with the ASA Meetings Department are here and available to committee members for questions on expense reimbursements. In commenting, each participant is asked to speak toward a microphone. The transcriber will appreciate that, and we have microphones both along the desktop and we have the standing microphones here. You have to speak clearly and into a microphone. These microphones are reasonably sensitive, so you may not need to lean into them. Speakers can use this podium microphone. There's also a lapel microphone. A pointer is available around here somewhere. I'd like to start out now by having us introduce ourselves and I'll start out. My name is Jay Breidt. 
I'm a professor in the Department of Statistics at Colorado State University and chair of this committee. DR. NEERCHAL: I'm Nagaraj Neerchal, Professor of Statistics at UMBC, University of Maryland, Baltimore County. MS. FORSYTH: I'm Barbara Forsyth. I'm a survey methodologist at Westat. DR. FEDER: I'm Moshe Feder, a statistician from the Research Triangle Institute. DR. HENGARTNER: My name is Nick Hengartner. I'm with LANL and I'm the Vice Chair of the committee. MS. KHANNA: I'm Neha Khanna. I'm an assistant professor of Economics and Environmental Studies at Binghamton University. DR. BURTON: I'm Mark Burton. I'm an associate professor of Economics from Marshall University. MR. BERNSTEIN: I'm Mark Bernstein. I'm from RAND Corporation. MR. CARUSO: Guy Caruso, Energy Information Administration. DR. KIRKENDALL: Nancy Kirkendall, Energy Information Administration. MR. BREIDT: If we could go back into the audience, and please use the microphones. MR. TSENG: My name is Phillip Tseng. I'm with the EIA. MR. HALE: I'm Doug Hale. I'm with EIA. MS. BLESSING: Colleen Blessing, EIA. MS. HOLTE: Susan Holte, EIA. MS. WIJNTJIS: Lorie Wijntjis, Price Waterhouse Coopers. MR. MARTIN: Ken Martin, Price Waterhouse coopers. MR. VAGTS: Ken Vagts, EIA. MS. FRENCH: Carol French, EIA. MS. MILLER: Renee Miller, EIA. MS. HEPPNER: Tammy Heppner, EIA. MR. WILSON: Nathan Wilson, EIA. MS. KEARNEY: Diane Kearney, EIA. MR. BROEM: Tom Broem, EIA. MS. GORDON: Janet Gordon, EIA. MR. FREEDMAN: Stan Freedman, EIA. MR. SEDRANSK: Joe Sedransk, SWRU and EIA. MR. HONEYCUTT: Crawford Honeycutt, EIA. MS. TAYLOR: Yvonne Taylor, EIA. MS. BOEDECKER: Erin Boedecker, EIA. MS. JENNINGS: Alethea Jennings, EIA. MR. PYNG LU: Ruey-Pyng Lu, EIA. MR. LU: Edwain Lu, EIA. MR. MILLER: Herb Miller, EIA. MR. COSTELLO: Dave Costello, EIA. MR. WIENIG: Bill Wienig, EIA. MR. BREIDT: Thank you. For those of you who haven't been here before that mobile microphone is a huge technological advance over the old system so thanks to EIA for that. For your information Nancy J. Kirkendall is the designated federal officer for the advisory committee. In this capacity Dr. Kirkendall may chair but must attend each meeting and she is authorized to adjourn the meeting if she determines this to be in the public interest. She must approve all meetings of the advisory committee and every agenda. Also, she may designate a substitute in her absence. We have an interesting agenda today and tomorrow. We have some plenary sessions and we have a number of breakout sessions, eight of them. This is probably a record. We'll be fairly busy. We'll try to keep on time and, of course, we're already not on time but we'll try to catch up. So the first section of the meeting will begin with a briefing by Guy Caruso, EIA's administrator. Then Nancy Kirkendall, the Director of the Statistics and Methods Group, will comment on EIA work since the Fall 2003 meeting. After Nancy's comments we'll begin the first morning's breakout sessions. Those will be on EIA's work in short-term forecasting as well as measuring the quality of EIA analysis, part of EIA's strategic plan goals for 2004 to 2008. Lunch for the committee and invited guests will be on the first floor at 12:30. It will be in Room 1E226 in the same corridor as we are in now. That's the usual place. This evening we have a reservation for dinner at the Little Viet Garden in Clarendon in Arlington, Virginia. 
So we need a show of hands but there's a note here that it's a short Metro line on the Orange Line and we'll probably need to stick together. The reservations are at 6:15 if anyone's interested. So could I see a show of hands for committee members for dinner tonight? Bill, did you get that? Tomorrow morning, breakfast for the committee and the EIA participants and management will be here again beginning at about 8:00 so we can meet in the hotel lobby around 7:45 if you care to do that. We will resume tomorrow at 8:30 here in this room. A couple of other announcements. We have two new members on the committee. One is here already. This is Barb Forsyth. And one had a late class last night in Pittsburgh. That's Susan Sereika, who will be joining us later. Bill Moss and Jae Edmonds will not be attending today. Finally, the usual meeting mechanic is that if the panelists have a comment you turn your name tag vertically to be recognized. Now its my pleasure to recognize Guy Caruso, administrator of EIA. MR. CARUSO: Thanks, Jay, and welcome, everyone, especially Barbara to your first meeting. We appreciate your joining the committee. As usual it's been a busy six months since we last met in October. Since then we've done the usual core products that you are all familiar with. The annual energy outlook came out right after our meeting last time. Since then we have had, of course, the international energy outlook just last week. But I think the thing that's been shall I say driving a lot of the activities of the EIA since October have been energy markets particularly for natural gas and gasoline. Some of the topics on todays agenda, in fact several of the topics, will address what we are doing in those areas that we think this committee can be helpful in. I'll talk a little bit more in detail on gasoline in the short term, which, of course, is instrumental in our short-term outlook, and just two weeks ago we did our summer driving season outlook conference using the short-term energy outlook model which Dave Costello is going to talk about a little later. Natural gas prices continue to be very firm with the prospects of them staying there which has important implications for our work on both the supply and the demand side. Both of those areas will be discussed today. Ill talk a little bit about our budget which we submitted in February. We have had all of our hearings and we certainly hope that the Congress will vote on that before they adjourn. I'll mention that as we talk about the budget in particular, where we think that's headed. And the trend that I mentioned probably in all of the meetings that we have been in together that I have had the privilege of addressing the committee on is that the continued trend of being asked to do more policy analysis, what-if analysis, using both the NEMS and the short-term model, and that's continued and accelerated probably in the last six to eight months, mainly because of the energy bill that's still being debated in Congress. The high gasoline prices, of course, are an analytical issue but in particular at this time a political issue. With the Presidential election coming up its taken on even a greater significance and so EIA, the numbers we produce each week, whether they be prices or inventory numbers, and our analysis each month, I think more attention is being paid to that than ever. 
In particular the media interest in what EIA does has also intensified and so all the more reason why, I think, we want to refine the work we do in the short-term energy outlook and in the collection of the gasoline data which, as some of you know, we have increased the detail in which we collect the weekly gasoline data to include more blending components, which have become very important given the introduction of ethanol in substituting for MTBE. We have redesigned our sample in collecting that gasoline data and the new forms were just reported on April 14. So that was the first weekend. We certainly had the usual running in problems and I'm not sure we'll have a chance to talk about that in more detail later. This next slide just shows that kind of run-up in not only gasoline but crude oil prices that has occurred over the last several years and began to intensify the interest in what we do. The next one shows something that there has been a lot of interest in, not only among our customers but among the regulatory agencies such as FERC and CFTC and that is how EIA data moves markets and there have even been several investigations called for by Congress as to whether there's any possibility of manipulation of the market that certainly, I think, once again shows how important the data we collect and the way we produce and analyze and disseminate it are important not only to the customers but to efficient markets. Natural gas prices have been rising steadily and now are moving hand and glove with crude oil prices. We have been trying to improve and I know Beth Campbell in one of the first meetings I was at in this committee, talked about the way we collect production data. We are now almost ready to release a Federal Register notice on the possibility of a new production survey for natural gas and we are going to discuss that, I know, in this agenda as well so we really seek your advice in that area. We hope to initiate a new natural gas production survey in the issues regarding frames and other statistical measures with respect to that that would be valuable to get your input on that. The other analytical issue which has emerged from the natural gas price run- up is what it has been and what will be the impact on in particular industrial demand for natural gas. This is another area where we believe more analytical work needs to be done improving our models, whether they be the short-term or the long-term, and certainly that's an area where again we think the committee could be of assistance to us. As I mentioned, we submitted the budget and have had budget hearings for FY 05. It's a request for $85 million, which is almost $4 million more than FY 04. It's a bit misleading in that last year in the current FY 04 budget although we have an $81 million budget this year we are actually operating on $84 million of expenditures because we are taking $3 million from de-obligated and uncosted funds of a previous year. So although this looks great, a 5- percent increase, its really a lot less than that and Howard and I are working very hard to try to get more resources, which we all know EIA spends very judiciously. So we hope that that would happen. 
The latest work from people who follow the Hill very closely is that there's a high likelihood that perhaps Defense and maybe one other major appropriation bill will actually pass this session and that all of the other budgetary items will go into an omnibus bill which is likely to go into a continuing resolution because of the elections coming up and the difficulty of trying to pass all of those budgets before Congress adjourns. So well keep you posted on that. Probably by the next meeting we'll know a lot more. With the increase in the money, as I mentioned, the production survey is one of the major new things we'll be doing in FY 05 if all goes well and we can get the approval from OMB and are ready to move on that as early as 1 January 05 if possible. That's our goal to at least start. We clearly need through Congressional mandate to update the voluntary reporting on greenhouse gas survey and database, the so-called 1605(b), which I know Jae Edmonds has been so close to. We are working on a number of things within the NEMS modeling, the long-term modeling system, one of which is the transportation sector. One that's not listed here but I know we'll be working on is this issue of natural gas and its use in the industrial and manufacturing sectors. What does $5 gas mean for that kind of demand as well as on the oil side the imposition of new fuel standards? Low sulfur gasoline and deisel fuel are important, and the regionalization of our short-term energy model, which will be an important topic on your agenda today. This next slides shows just a sample of the reports we have been asked to do by Congress. They're almost all related to the Energy Bill or the Omnibus or the comprehensive energy bill that was introduced two years ago and reintroduced last year. There have been a number of negotiations going on. Its uncertain whether there will be an energy bill but nonetheless EIA has been asked to analyze not only the whole comprehensive bill by Senators Sununu and Senator Dorgan in two separate versions of the bill but we have been asked to do specific analysis with respect to things like the Alaska Natural Wildlife Refuge, issues with respect to LNG, and global oil markets. So EIA I think in the last six months has been quoted and relied on extensively in Congress and within the administration and some of the work that this committee has assisted with has been extremely valuable in our analytical responses to Congress and the administration as well as the regular products that we produce. So finally this just sums up some of the things that we will be asking your advice and counsel on with respect to these new products that we are working on, improvement of existing systems in terms of surveys and frames and also the cognitive methods in establishment surveys. I'm sure Nancy will go into more detail on some of these things. And the other topic that no matter where I go, and I'm around speaking to a number of different meetings that the EIA website continues to be extremely heavily used and very popular and nowhere around the world or certainly within this country can you go without someone coming up and saying oh, I love your website. Occasionally they will have a little suggestion but basically it's our main vehicle for getting the information and analysis we do across to the public as well as to our other customers. 
So, once again, what you are doing in helping us to achieve our goals of better quality and more timely and accurate data and analysis is invaluable and I thank you all once again for your time. I appreciate your all being here and look forward to working with you over the next couple of days. Thank you, Jay. MR. BREIDT: Next Nancy Kirkendall will update the committee on progress since the Fall 2003 Meeting. DR. KIRKENDALL: Well, I'd like to welcome the committee before we get the slides started. The purpose of my talk is what weve done regarding some of the advice you gave us last time and how it ties in to some of the things were going to ask you about this time so it's a transition kind of talk. While Preston is looking for it one of the major things that EIA did, I think it was March we agreed to the strategic plan. We talked about the strategic plan with you last time, particularly the action items under goal one. Now the strategic plan has been finalized and agreed to by all EIA senior staff so we are now serious about doing all the things we said we were going to do. We will talk about the status of the strategic plan because there are lots of action items in the strategic plan that have a lot to do with statistics, how do we measure things and how do we report them, so that will be a topic of a number of break-out sessions today as it was at the last meeting. Guy mentioned that we have a number of initiatives. We've talked to you before about natural gas and we have a few of them on the agenda again this time. I'll give you an update on the Confidential Information Protection and Statistical Efficiency Act, CIPSEA, we call it, how EIA is implementing that, so this is just a brief progress report on that activity. We have some sessions on electricity, particularly on electricity transmission, and I'm not really going to say a whole lot about the STEO. That's aa new initiative that well be talking to you about. So I'm really just going to use that as an introduction to Howard Gruenspecht, who will tell you more about that. So these are the three goals from our strategic plan. Not all of you, Barbara is new, but many of you have seen them before. The first one has most statistical content because we are interested in measuring relevance, reliability, consistency, timeliness, accuracy, all the wonderful things you'd like to have be attributes of your data, so that's the area that we are going to concentrate on but you can see what the others are, too. Guy touched on this. We would like our resource base to be sufficient. The third one is we'd like to operate in an effective and an efficient way providing good leadership. So under relevance and reliability a number of these are measures that we have been collecting for a long time so these will just continue. The number of information products requested, these are briefings and things that are requested by high level officials, either senior people in the department or Congress. It includes the service reports that Guy talked about. Tom Broem manages that and is very careful to make it a nice, consistent series. We also have one that talks about the percent of frames that are deemed sufficient. We talked to you about this last time. We've done more work. There's a break-out session, I think, later today on our work on frames. We have decided that we would like to have percent of all EI frames that we think are sufficient so we are taking a broad look at all EIA frames. 
Weve got about 70 surveys and we are looking at a variety of methods for deciding whether they are sufficient. They are going to tell you what we have done and what we have looked at and ask your advice on how do we come up with measures of whether a frame is sufficient. There are a lot of different ways of going about it and we would like the least cost ways of getting the most appropriate measure. Outside expert evaluations, we talked to you about this last time and you totally rejected what our proposal was. So we went back to square one and redesigned our survey frame. This time we have interviewed people who have been attending our NEMS conference. These are pre-registrants. We have data to show you and we'll ask your advice on interpretation of results and how to move forward in the future. One of the challenges is trying to find a list of experts who can give you good evaluations of your analytical products. On quality the measure is the number of surveys meeting quality targets and we've talked to you about this a number of times. Tom Broem is going to talk about this. I think that's on Friday. Customer satisfied and very satisfied, this year we are going to be using the American Customer Satisfaction Index again. We did that three years ago. Basically our approach is we do something every year but there's no reason to do the American Customer Satisfaction every year. We have to pay to do that, for one thing. Colleen Blessing is our customer satisfaction guru and they frequently do web-based surveys in the intermediate times. Timeliness, senior staff has pretty much agreed to release targets for all of our products. I'm not sure we have actually implemented it yet but there's been an agreement that this is what we will do. I'm not sure that we've actually let people know what our targets are. Then the customer and stakeholder involvement is just a list of events. It includes things like this meeting, talking to the American Statistical Associations Energy Committee, our NEMS conference, some of our other outreach conferences, and other things. We thought that a list with discussion among senior staff was probably more useful than just counting things. So the challenges you are going to be hearing about I've actually already touched on most of these. We have a breakout session to talk about frames. We have another breakout session to talk about our interviews with outside experts and users of our annual energy outlook and our international energy outlook. We also have another session talking about surveys meeting performance targets. So just an update on some of the other things we talked about last time, you have heard a lot about how we estimate natural gas production in Texas. This one is a nice one because there's actually data available from the state. One of the things we talked about was an evaluation of methods and you gave us some advice. I think that what we'll do is to update that and bring it back to you because I think we'd like to use the approach more consistently in the work that we do in SMG to evaluate various methods of doing things. So it's nice to have a statement of how we think we should go about an evaluation. In Texas we talked last time and observed that there was a bias using the method that Randy Sitter's student, Crystal, came up with and that's because of changes in reporting in the State of Texas. So we haven't done any work on that yet although the Dallas field office has done some work and they would like to talk to you about that in the fall. 
They talked about your recommendations in the Gulf. Of course, Dallas is still using the methods they were talking about. They are trying to do more company-level time series analysis. I'm not sure exactly where they are on that. That's a difficult problem because the data aren't really rich. There's only so much you can do with flaky data. Guy talked about the new natural gas production survey. Inder Kundra is going to talk about that in a session tomorrow. We are going to be doing a sample from the frame for one of our other surveys, the EIA 23, which is an annual survey that goes to producers of oil and natural gas. The intention of the survey is a little different but it ought to be a pretty good frame so he's going to talk about sample selection and some of the difficulties and changes over time in the frame and whatever else comes up. We talked to you about we had a survey that collects commercial and residential prices of natural gas but only in five states. We talked about that with you last time. We're going to implement it in seven additional states at some time. We hope to start in January 2005. It depends on resources and other things but that's where the plans are. It will be seven additional states and the District of Columbia. Industrial prices, haven't done a whole lot of work on that yet. We're still considering going to the Census Bureau. Now we are talking to them about the possibility of an annual survey to get information about natural gas prices in the industrial sector. It would be volumes and prices in the industrial sector. That sounds like it might be promising. So for CIPSEA, the Confidential Information Protection and Statistical Efficiency Act, this act gives EIA for the first time the ability to actually protect data reported to us by filing respondents. In the past we were not able to protect it completely because we had to share it on request with other government agencies if they asked for it for official use, not statistical use but official use. So now we have 11 information collections that are under CIPSEA. We have modified all of our wording so that it clearly defines what kind of information and confidentiality protection we can give. We are developing a training program for staff because CIPSEA also carries with it penalties for release. People can go to jail for releasing information that's collected under CIPSEA so people need to take it seriously and we need to train everybody. And Jay Castlebury is participating in an inter-agency committee led by OMB that is going to put out guidance for all agencies on this. The good thing about that is that way we know it's going to be coming out and our views are reflected in the guidelines, too. Under electricity actually the committee was interesting. You suggested that the data needs depend on industry structure and, of course, its still changing. So you are going to have an update today. Since the last meeting we have done focus groups. We are going to give you a report on the focus groups. Since the last meeting they have redesigned our electricity forums. They have gone out with their Federal Register, so plans on implementing electricity 2005 are well under way. Bob Schnapp will talk to you about that. Doug talked to you about his transmission paper. That has been finalized. He's not going to talk to you about it but we do have time for you to ask him questions about it and he is here in the audience so I encourage you to talk to him if you have any comments about his paper. 
We talked to you about data edits for the 9:20 and we have not done very much with that at this point. We ran out of time so that's an item that you might actually hear about in another session. So the last item, the short-term energy outlook, this is a new effort that we are starting up. I think that the committee will find it interesting. This is the first time you will have heard about our STEO. Maybe it was discussed years and years ago but not in recent history anyway. So with that I'll introduce Howard Gruenspecht since he is actually going to introduce the STEO modeling effort. MR. GRUENSPECHT: Good morning, welcome. Thanks for coming to help us out. Obviously, as you can see from Nancy's presentation, it's quite helpful, the information exchange we have with you. It does affect what we do. I wanted to talk to you very briefly about the work on doing some regionalization on the short-term forecasting model. The short-term model has been something of great interest to me, really, from the moment I arrived at EIA about a year ago. From the start it was apparent to me that together with our weekly reports on gasoline prices the short-term outlook gets by far the most attention in the mainstream press, not the expert trade press that deals with energy matters but the mainstream press relative to other EIA products. It's pretty understandable. People want to read about the outlook for gasoline prices this coming summer or what their heating costs are going to be next winter and that's what the short-term forecasting model and the short-term energy outlook put out each month look at. So it does get a lot of attention and its important to do the best possible job in these areas of great importance to EIA. The STIFS model and the STEO publication derived from it each month really benefits, I think, from being run by a very strong team within EIA. You will be hearing from Dave Costello, who, I think, leads that work under the direction of Mark Roedecker. They really are very dedicated, very strong intellectually, very open to suggestions and new thinking, and, again, given the importance that the public places and the prominence that the STEO results receive, its really important. Some changes have been made already. I don't know if you look at the STEO but we have gone to a shorter STEO text. We have gone to a more careful approach explaining confidence intervals. I think there is some use of more sophisticated approaches in some aspects of estimating the model and the confidence intervals which are important use of some stochastic estimation. But the focus of what you are going to hear about today is on regionalization. I call it limited regionalization because really we are not going to do a full regionalization of the model. I think that the project to put some more regional content into the model is really a critical element to making the short-term energy outlook more useful and more relevant. The project aims to provide greater depth, consistency, relevance and credibility to the monthly short-term forecast by allowing for unique regional factors that affect energy demand supplies and prices in important ways that tend to get obscured if you are trying to estimate a national model. A good example would be, lets say, gasoline in California. That's a regionally very isolated market for gasoline. You can have very different things going on in that market and going on in the rest of the country. 
Heating oil effectively used significantly in New England and the northeast, not used much any place else. People don't care about it much any place else. You really want to know what's going on in heating oil markets in those areas, not on a national average basis. So there's clearly a value in developing the information. So the depth we'll get we'll have a better deconstruction of broad US energy developments. The relevance will be tailored better to regional audiences and their areas of interest. The consistency will be better because we'll eliminate some of the biases that creep in when you use highly aggregated data to develop these national average forecasts when you really have different things going on in different areas. And the credibility is that important regional energy market developments are not obscured or ignored. So there's a lot to be gained from this. There's also a need to balance what we are doing here, I think, because, remember, this model unlike some of the other things we do here on an annual basis is the crank we've got to turn every month. Already it can get very demanding to implement this model and estimate it on a monthly cycle. So we want to have this additional limited amount of regional detail but we don't want to develop a tool that in the end if you had a monthly cycle and it took you more than a month to run your model and be comfortable with it you'd have a big problem. So we have to worry about yes, we want regionalization to some degree but we also have to made sure that we're coming up with a workable tool. So you will hear about this from David and also from Phil Tseng. It's a pretty high priority for us. It's something that the Secretary, I think, asked us to do last June along with the natural gas production survey that Guy already talked about and some other things. We think there's a lot of potential there. It's still at a stage where input would be helpful. It's not cast in stone. We're working on it. The goal, I think, is to have the model ready to go at the beginning of the next calendar year. And that's really where we are. You will hear more about it from Dave and Phil, so I'll just get out of the way and let you hear from them. Thank you. MR. BERNSTEIN: Questions? MR. BREIDT: Thank you. Before David starts I wonder for instance we could have the people who just arrived introduce themselves. MR. COLE: My name is Stacey Cole. I'm from the Bureau of the Census. I work on the NEC survey. MR. HOGUE: Rick Hogue, also from the Census Bureau, NEC survey. MS. WAUGH: Shauna Waugh, Statistics and Methods Group, EIA. MR. BRADSHER-FREDERICK: Im Howard Bradsher-Frederick. I'm with the statistics and methods group of EIA. MR. HOLTBERG: Im Paul Holtberg. I'm the Director of the Demand and Integration Division in EIA. MR. SPERL: Tom Sperl, International Policy. MR. FREEM: Im Fred Freem, Coal, EIA. MR. BREIDT: Thank you. Now I'd like to turn it over the David Costello to talk about background on EIA short term forecasting. MR. COSTELLO: That's me, Dave Costello. Basically my life's work here has to do with turning this outlook out every month and all the things that go along with that. Today I just want to basically give you a little background on what it is in the first place, what we do. I'm going to give a little example of some analysis that we do with the model and I'm going to talk about essentially the plan for regionalizing it a little bit and some reasons why we're going to do it, what we hope to get out of it, and when we hope to do it. 
The first slide here just gives an indication of what the coverage is. It's a national model although we do look at world oil markets because obviously we have to have some background for determining our oil prices. It's a monthly model. The frequency is monthly. We get a lot of our own data from EIA but also from the other agencies and other sources outside of EIA. Generally speaking, the forecast goes from 12 to 24 months ahead. Every January we move the forecast up one more year just as a convention and, as Howard said, we update this every month and we post the information on the web, including a simulation version of the model, which runs in Windows, that you can download. But a lot of offices work on this model. It's not just this forecasting project. It's not just my team but also teams in the CNEAF and OIAF, just to throw out a couple of names that you may not know what the heck that means, but it's a multiple office job and there's a lot of people involved. Just a little more on the general structure of the thing, there are about 900 endogenous variables in there currently although there are a little less than 200 actual stochastic estimating equations so, of course, we have a lot of identities, some of which are very interesting, some of which aren't. The model is currently estimated and simulated in EVIEWS 4.1, which is just a PC-based econometric package, but we find it very handy, not to do a plug but it's an improvement on what we had before. So far as who looks at this, not to go into too much detail but so far in April I think we have been averaging about 1300 hits a day on the web at the main STEO page and we have a listserv of about 4,000 customers that get the notification of the report every month. Just to give you an idea of how we put this together, we read a lot of the maintained monthly data from EIA along with some weather data. We get weather forecasts from NOAA, by the way, and, of course, the history is also from NOAA. EIA internally runs a macro model, which is a quarterly model of the US economy from Global Insight, and they provide us with monthly inputs for our model. We pull that all together and create a historical database which is always available. That database goes into EVIEWS and the model is estimated and saved. When we are ready to do a forecast we just pull in the exogenous forecasts, weather. Some of the energy variables are exogenous to this model such as nuclear power, for example. We get that from CNEAF, and hydroelectric also, (inaudible) the model and put it out to files and make reports. Of course, this happens numerous times in the course of a month because you get data updates and also you have to look at the results and make sure they make some sense and hopefully they do. Usually at the last minute we determine that they finally do make sense. Anyway, I thought I would just do a little example of some analysis that is easy to do and I thought it was a little interesting. It starts with natural gas but I'm really going to talk about electricity because when Phil Tseng gets up he's going to get into a little bit more detail about what we are going to do in that sector for the regionalization. This is just a schematic. I'm not going to give too much detail on how we put together natural gas demands and the supply. Our main focus here is to get an equilibrium spot price for natural gas. It's a fairly well integrated market in the US and we are looking at producing area spot price average as the prime thing. 
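[Note: The price-clearing step Mr. Costello describes, having the model search for the spot price that balances natural gas demand against supply, can be illustrated with a minimal sketch. The actual STIFS model is estimated and solved in EViews; the Python below uses hypothetical constant-elasticity curves and made-up parameters purely to show the mechanics of such a search.]

```python
# Illustrative sketch only: the STIFS model itself is estimated and solved in
# EViews.  This shows, with hypothetical constant-elasticity demand and supply
# curves, the kind of price search that balances natural gas demand and supply
# at a single "producing area" spot price.
from scipy.optimize import brentq

# Hypothetical curve parameters (not EIA estimates).
DEMAND_AT_REFERENCE = 62.0   # Bcf/day demanded at the reference price
SUPPLY_AT_REFERENCE = 60.0   # Bcf/day supplied at the reference price
REFERENCE_PRICE = 5.50       # $/MMBtu
DEMAND_ELASTICITY = -0.15    # short-run own-price elasticity of demand
SUPPLY_ELASTICITY = 0.10     # short-run price elasticity of supply

def demand(price):
    return DEMAND_AT_REFERENCE * (price / REFERENCE_PRICE) ** DEMAND_ELASTICITY

def supply(price):
    return SUPPLY_AT_REFERENCE * (price / REFERENCE_PRICE) ** SUPPLY_ELASTICITY

def excess_demand(price):
    return demand(price) - supply(price)

# Search for the spot price at which demand equals supply.
equilibrium_price = brentq(excess_demand, 1.0, 30.0)
print(f"equilibrium spot price: ${equilibrium_price:.2f}/MMBtu")
print(f"quantity cleared: {demand(equilibrium_price):.1f} Bcf/day")
```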
This helps determine end use prices for the rest of the model, but basically having the model search for the price that equilibrates demand and supply is how this particular model works. Electricity demands, we put that together again. We estimate the model by sector nationally. This is the electricity demand side. And we come up with a total demand for electricity which, if you go to the next slide, basically helps us determine how much generation we are going to need. We have a little bit of imports but basically it determines the amount of electric load that we have. As I said before, we are taking nuclear and hydroelectric power as exogenous from other EIA models. Our main problem is, of course, there are some renewables and we make some estimates of those but our main problem is basically to figure out what the possible fuel components are of the electric power generation. We basically do that by a set of equations that use some cross-price elasticities and some other factors. But anyway, not to get too much into the schematic, if we go to the next page, here's the scenario. The reason I did this one is that it's timely. Today EIA projects that natural gas supply will grow domestically about a percent a year through the next two years. We have a very high drilling rate and this is the result that we get. However, there are a lot of analysts that feel that that's too optimistic. So we did a scenario where in fact the production decline is greater than what we are getting out of this model itself. In that case we wind up with about two BCF a day less by the end of the horizon than we have in our current base case. The right panel is the impact that that has on the price. Instead of $5.50 gas we are talking about $6.50 gas on average over the next year or two. So what does that do? Well, it feeds, of course, importantly, into the electricity model. Here's the impact on natural gas demand by sector. It's fairly significant. Eventually we lose about 1.6 BCF a day in total demand by sector in terms of the total. We also have a little bit less electricity demand. Electricity costs do go up and it does affect output a little bit but not very much. It's relatively small. The mix of generation does change. The higher natural gas prices naturally reduce natural gas demand. It pulls down a little but mainly that's because of the output effect on electricity demand. Really, the only other option is basically for oil-based generation to increase, and that's the result. Now, it seems to make sense but if you were to simulate this model over a period of time how would it do? So I figured we'd better do that. This is based on a simulation from January '99 through the latest historical period. This is the scatter on the predicted versus actual coal-based generation number from STIFS. The mean absolute percent error is a little under 2 percent, which sounds pretty good but actually it's a lot of electricity because coal is very big. By the way, this simulation assumes I know what the fuel costs are which is okay for this piece of analysis but obviously in the forecast we don't really know. The mean absolute percent error for natural gas is almost 8 percent and for oil it's about 14 percent. Well, all in all it's not too bad, and the bias that seems to show up from this sort of exercise is not tremendous, pretty much negligible. But the thing of it is that there are lots of things that would have had to go into determining this. 
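[Note: The error statistics just quoted, roughly 2 percent for coal, 8 percent for natural gas, and 14 percent for oil, are mean absolute percent errors from a historical simulation. A minimal sketch of how such a statistic, and the accompanying bias check, would be computed follows; the generation series shown are placeholders, not STIFS output or EIA data.]

```python
# Minimal sketch of the error statistic quoted above (mean absolute percent
# error of simulated versus actual generation).  The arrays here are
# placeholders, not actual STIFS output or EIA data.
import numpy as np

def mape(actual, predicted):
    """Mean absolute percent error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((predicted - actual) / actual))

def mean_percent_bias(actual, predicted):
    """Signed mean percent error; near zero indicates negligible bias."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean((predicted - actual) / actual)

# Hypothetical monthly coal-fired generation, billion kilowatthours.
actual_coal = [160.2, 148.7, 155.1, 145.9, 152.3, 165.8]
simulated_coal = [158.0, 151.2, 153.4, 149.1, 150.0, 168.9]

print(f"MAPE: {mape(actual_coal, simulated_coal):.1f}%")
print(f"bias: {mean_percent_bias(actual_coal, simulated_coal):+.1f}%")
```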
A lot of assumptions were basically made about how we figured out how much oil-based generation was going to replace natural gas. On a national level it works out okay except that it leaves open a lot of questions. Where exactly did it happen? Well, we know it's not going to happen everywhere because there's not a lot of oil generation everywhere. And does this approach tend to overstate or understate? It doesn't seem to do too much of either one way or the other but on the other hand we probably could do better on the percent error for any given month. In order to do that we really do have to get down to a little bit more detail, a lot more detail, really, on the regional composition of generation and in order to do that we have to know more about how demand patterns vary by region. So that's what we are going to do. Well, I think Howard talked about these factors. Adding this regional detail may affect our national level forecasts for some variables, and, by the way, I think Phil will talk about this a little more. Basically we are talking about looking at the demand for electricity and for natural gas by census region or census division. We are looking at determining electricity supply by NERC region or pretty much the same supply regions that NEMS currently has. It's a little bit of a variant on some of the standard NERC regions. So it may actually make the national level forecast more accurate but I don't think we can tell that for sure until we get into it but that certainly is the hope. Why now? Actually, regionalizing the short-term model has been an idea that's been around a long time. It's been in our stretch plan for the long term, if you will, for some years. But now we have additional resources that are being allocated to it and a lot of interest in EIA management to do it so we are fortunate to have that. As I said, a lot of the demand detail will be by census region or census division. We are going to look at market area price determinations for natural gas. We're not going to do necessarily a whole lot of demand and supply by region for natural gas. Our main focus is going to be on Henry Hub prices, looking at the integrated national demand-supply balance for gas. Then we are going to spend a lot of time focusing on how a benchmark price like Henry Hub could be translated into regional spot prices based on the behavior of the basis differentials and factors that affect those. So we're picking out some strategic spot prices for natural gas to do and those will generate our end use prices by region where we need them and hopefully provide key information about the particulars of those markets, whether it be factors that relate to pipeline capacity limits or some other particular factor. We're going to do a pretty detailed look at household activity and particularly heating costs by census division. We collected a lot of that data from our own RECS survey and from other sources of data and we're going to highlight those, too. I think Howard mentioned this but we are hoping to have the testing finished by late fall of this year. We have tasks out there that are going to begin to be used to get the estimation done. We have done a lot of work in getting the database together. We still have a few little things to do there. We hope to be doing the short-term forecast on a limited regional basis, this basis that I just mentioned, in early 2005. I don't know if that means January or February or March. 
Hopefully one of those three will do, but we'll continue to release it on a monthly basis. That's the goal. Any expanded regional coverage will be in our PC downloadable model, too, and also we'll feature it on the web in as much detail as we can reasonably do. I think that's it. MR. BREIDT: We have a quick turnaround between our breakout sessions and rather than cut into this next breakout session we can just go directly into it, take some time out at the break as necessary and get back in here at 10:40 for the ASA summaries of those discussions. So as far as breakouts does everybody know which breakouts they are in? So Mark Bernstein, Neha, Nagaraj, and Nick will stay here and the rest of us will be going down the hall. (Recess) MR. TSENG: My name is Phillip Tseng. This morning I will talk about some of the statistical and modeling issues facing the regional short-term energy outlook in model building. So I'll briefly discuss some of the reasons I think Howard and Dave Costello mentioned about the regional model. I'll just list a few reasons why we need a regional model. I'll briefly describe the regional representation of the model and some of the statistical issues in modeling the market and modeling the electricity demand and supply. Why do we need a regional model? I think Dave mentioned earlier that national data cannot adequately capture interactions of market demand and supply because we aggregate too many things, and sometimes the price variations, and the demand response to those differences in prices, also are not captured in the national model. So we figured we probably can get more insight when we do the regional model. Many relevant questions cannot be addressed by a national model. For example, when we talk about winter heating fuel demand and supply in the northeast, like Howard mentioned, and differences in regional natural gas prices, sometimes we have bottlenecks and we would like to capture that as well. Summer gasoline demand and supply in California, so if we understand the regional market we may be able to understand what happens in terms of products, in terms of crude oil supply, and interaction between different regions. So a regional model provides that kind of information. Also electricity generation and transmission issues, this is a tough one. We are doing the regional model basically to try to understand some of the basic elements and hopefully as our understanding of the transmission issues evolves, using this regional model we may be able to improve even further in the future. The regional representation of the model, for some of this representation we simply use available data and we may aggregate it. So for electricity demand we have four census regions and for electricity supply we have seven NERC-based regions, seven or eight. I think we are still debating. For natural gas demand we have four census regions. For natural gas supply we may use six producing regions, looking into the difficulties or implementation issues, and we have three storage regions, inventory regions. For petroleum products we have data for five PADD regions and we may focus on petroleum product demand and supply on maybe the east coast and west coast but in general we have information for all five PADD regions and we can probably answer a lot of questions there. This is a kind of simple way to look at it, the four census regions. We have ten regions here but really we are looking at northeast, midwest, west, and south. These are the four regions we try to model. 
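[Note: The demand side of the regional model is carried at the four census regions named above (Northeast, Midwest, South, West). A minimal sketch of the kind of aggregation involved, rolling state-level data up to census regions, is shown below; only a few states per region are listed, and the sales figures are placeholders, not EIA data.]

```python
# Sketch of the demand-side aggregation: state-level series rolled up to the
# four census regions used in the regional STEO.  Only a few states are shown
# per region; a full mapping would follow the Census Bureau definitions.  The
# consumption numbers are placeholders, not EIA data.
from collections import defaultdict

STATE_TO_CENSUS_REGION = {
    "NY": "Northeast", "PA": "Northeast", "MA": "Northeast",
    "IL": "Midwest",   "OH": "Midwest",   "MN": "Midwest",
    "TX": "South",     "FL": "South",     "GA": "South",
    "CA": "West",      "WA": "West",      "AZ": "West",
}

def aggregate_to_regions(state_values):
    """Sum a {state: value} mapping into census-region totals."""
    totals = defaultdict(float)
    for state, value in state_values.items():
        totals[STATE_TO_CENSUS_REGION[state]] += value
    return dict(totals)

# Hypothetical monthly retail electricity sales, million kilowatthours.
sales = {"NY": 11500, "PA": 11800, "MA": 4300,
         "IL": 11000, "OH": 12700, "MN": 5200,
         "TX": 28000, "FL": 17500, "GA": 10300,
         "CA": 20500, "WA": 7600,  "AZ": 5900}

for region, total in aggregate_to_regions(sales).items():
    print(f"{region:>9}: {total:,.0f} million kWh")
```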
For the electricity regions I mentioned about seven regions and it could be eight. It depends on how we gather the information and how we analyze the results. So the eight regions could be New York, Florida, Texas, California, the New England region, Midwest, east central, and mid-Atlantic and southwest as one, so we have eight regions. Actually Dave's people started collecting the information. We have developed a database so we can start doing some testing. DR. NEERCHAL: What I notice is that some of the region boundaries are going through the states like Virginia and Iowa. MR. TSENG: That's correct because we follow the NERC definition of the regions. What we may have to do is for some regions and for some technologies we may collect state data and then decide how to fix them. Hydro, actually, when we look at states looks pretty good, but when we look at the NERC regions we may have to split some of the generating capacity. So we are looking into the data management issue and making some decisions once we start doing the econometric analysis. So that's the general idea of the census demand and power generation regions. The next topic I'll talk about is statistical issues in modeling the market. Actually my discussion here is very short because when I was trying to understand the modeling structure a lot of issues were not really clear to me. That's why I think the ASA committee can help a lot in terms of clarifying some of the statistical issues, some econometric issues, and some of the modeling issues as well. In the integrated energy model I've got two key elements here. Fuel market representations, that's one level, and sectoral components of demand. That's another level. These elements need to be integrated in a way that optimizes the model in terms of the performance of the model in predicting energy consumption and prices and providing valuable insights in identifying the determinants of energy demand and supply. From the modeling point of view I think Dave's solution algorithm basically indicates you have macro or exogenous variables driving demand and then demand and supply integrate. Then we solve the model. But when we estimate the model there's a statistical issue and an econometric issue. When we look at national data, like market data, market demand and supply, and we look at the components, regional demand and supply, somehow there's a consistency issue in how we estimate the system in a way that we can capture some of the constraints within the system. For example, recently we looked at natural gas prices and we heard the term "industrial (inaudible) and natural gas demand destruction." Part of the reason is supply is not there and somehow the market system has to allocate available resources to different end users. So there's a market competing mechanism and we hope we can understand the mechanism. So when we estimate the equation systems we can capture some of the interactions so when we do the forecasts we can actually reflect some of the reality in the framework. So those are the issues we are facing as we are trying to understand the market as well as the meaning of the statistics. One example is when we look at a market we simply look at market demand, the fuel market. We look at natural gas, we look at electricity, or we look at petroleum products. So we have demand and we have supply and then we also know the capacity. The capacity information could be in the electricity sector. It could also be in the supply sector like natural gas supply. 
It has certain limits in terms of what's the boundary in the near term to push the production. Those limits determine the allocation of resources within the system in the near term. So the adopted approach to modeling a fuel market must capture the factors that determine the market demand and supply of a fuel, so when we look at a fuel market we must capture that aspect. Also, if we want to understand prices, the specification of the demand and supply equations must allow modelers to identify the equations econometrically. This is an identification issue. Sometimes we say we are trying to estimate equations using some kind of reduced form. Then the question is how do we interpret the meaning of the reduced form equation when we want to do forecasting, or do we need to really understand the implied parameters when we use the reduced form, and so we say we identify the equations and we meet certain conditions so the reduced form equation will provide the kind of predictability to serve our purposes. But that's the kind of thing I think will be helpful to know. On the demand side we look at the sector demand representation. There are many ways to estimate the demand. For example, if we look at residential demand it could have three different types of demand for residential fuel. For example, we can look at natural gas, electricity, and petroleum. Do they compete? I think some of the comments I heard is in the residential sector in the near term maybe efficiency improvement could be captured in the model specification; however, inter-fuel competition like fuel switching may not. But in the industrial sector or in the electricity sector we could see a lot of fuel switching. So that's another area we're interested in understanding and hopefully we can do some econometric testing and understand for each sector what happens there. Of course, for the residential sector we can also use micro data, and can do some analysis offline to understand what happens in terms of near-term market demand and fuel switching. So basically I covered some of the issues, why we do the short-term regional modeling and some of the regional representations, some of the statistical issues facing us looking at market demand and supply, and now I'll look at electricity market demand and supply. We want to understand the regional market. Part of the reason is the regional energy demand in the electricity sector can have an impact on natural gas which can actually affect the natural gas storage, which in turn affects winter heating fuel availability for households using natural gas. So that's why we are focusing a little bit more on the electricity and then we are trying to change the representation a little bit. The demand for electricity actually varies by the hour. Peak demand imposes more burden on generation and transmission. I think there's some data for some of the California independent system operators. They will release their system load, projected system load, and they would actually plot the load curve and we can see between 12:00 o'clock midnight and 5:00 o'clock in the morning demand for electricity is pretty flat. Then it ramps up and usually it peaks around 3:00 or 5:00 o'clock in the afternoon. So the burden on the system is different when we look at different times of the day. So a regional approach may allow modelers to assess potential bottlenecks in generation and transmission. Given the model structure and the regional representation this is the first attempt. 
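[Note: The hourly load pattern Mr. Tseng describes, flat overnight and peaking in the afternoon, is what the model has to construct from monthly sales data. A minimal sketch of that conversion is shown below, assuming a stylized 24-hour load shape and a hypothetical monthly sales total; it is not California ISO or EIA data.]

```python
# Stylized sketch of turning a monthly sales total into an hourly load curve:
# a normalized 24-hour shape (flat overnight, ramping to a late-afternoon
# peak, as described above) is scaled so that it averages to the hourly sales
# implied by the monthly total.  The shape and the sales figure are
# hypothetical, not California ISO or EIA data.
import numpy as np

# Relative load by hour 0-23: flat from midnight to 5 a.m., ramping up, a peak
# in mid-to-late afternoon, easing off in the evening.
shape = np.array([0.72, 0.70, 0.69, 0.68, 0.68, 0.70,
                  0.78, 0.88, 0.96, 1.02, 1.07, 1.11,
                  1.14, 1.17, 1.20, 1.22, 1.21, 1.18,
                  1.12, 1.08, 1.04, 0.98, 0.88, 0.79])
shape = shape / shape.mean()          # normalize so the shape averages to 1

monthly_sales_gwh = 18_000            # hypothetical monthly sales, GWh
hours_in_month = 31 * 24
average_hourly_gwh = monthly_sales_gwh / hours_in_month

hourly_load_gwh = average_hourly_gwh * shape
print(f"average hourly load: {average_hourly_gwh:.1f} GWh")
print(f"peak hourly load:    {hourly_load_gwh.max():.1f} GWh "
      f"(hour {int(hourly_load_gwh.argmax())})")
```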
We think once we have a better handle on the regional representation and the flow information we can probably improve the application of this methodology even further in the future. Generation capacity in each region and state can help the estimation of electricity costs. That's another element, I think. If we understand different types of technologies we can actually estimate the marginal cost of producing electricity. Load curves and state or regional electricity generating capacity will be used to determine dispatching. Fuel choices and marginal costs of producing electricity, that's the design of this new modeling structure we are trying to implement. I'll use California as an example to illustrate the importance of load curves on dispatching decisions. But since it's (inaudible) some of the numbers, I will just show a few examples here. This is California average hourly sales of electricity by month in 2002. So what I did here is I simply used EIA published data, monthly electricity sales, and converted that to an hourly number. For this number I think the important thing is we see the range of hourly sales of electricity derived from monthly data in California just centered around 20.75 gigawatt hours. The level is not very high. The highest hourly demand was in July, about 31 gigawatt hours, and the lowest one was like in November, around 24 gigawatt hours. I'll show you what I've got. This is basically downloaded data from the California ISO and this is in March. The previous chart, you can see, March and November are pretty low, but when you look at demand for electricity by the hour, even in March it actually exceeded 30 gigawatt hours. So that shows the importance of the hourly demand, that it actually can have an impact on dispatching. Another example here, this is also from California, actual system load in July. Now, the numbers actually exceeded 35 gigawatt hours. So there's a distribution, and at some points in July generation actually exceeded 40 gigawatt hours. I think during the California crisis demand for electricity was higher than that. Also unplanned outages, besides the planned outages, also reduced the supply capacity. As a result California experienced some difficulties. So looking at those load curves and looking at California generating capacity by primary energy source in 2002, you can see California has some nuclear, quite a lot of hydroelectric, and a lot of natural gas. There are renewables, dual-fired units, some petroleum and a little bit of coal. It's interesting. Coal doesn't play a significant role in California. Yes? MR. BERNSTEIN: I assume this does not include imports into California? MR. TSENG: This is generation capacity in California. For this one, when California reports the planned capacity, that includes generation capacity plus imports. MR. BERNSTEIN: Right, it particularly includes generation that's owned by California utilities in other states. It's dedicated transmission. So what your histogram is missing is capacity that is actually owned by California utilities that have actual dedicated transmission lines to them. MR. TSENG: That's correct. The information I collected, the total capacity, the number here, if you add up the numbers it's about 56 gigawatts of capacity. So that's the total capacity in California. But in any given month the available capacity, for example, when I visit the website I look at the numbers in July, especially this one. 
I actually could see the available generating capacity on March 8, 2004, was 42 gigawatts, and that includes imports. So some plants were on planned outage or out of service, and as a result, in terms of planning, the difference in load curves and the difference in seasonal available capacity can all play a role in terms of meeting the electricity demand. MR. BERNSTEIN: My point is just don't forget, when you are doing the regional stuff, that you've got generation in other places that is actually owned by in-state utilities. It shows up as imports, but it's capacity owned by utilities in the state, and that is different than importing electricity on a spot basis or a contract basis. MR. TSENG: That's why we have regions and we will try to capture those, but still, in terms of our definition, we will say that if those power plants are not in California, California will be getting it from another region or state. Thanks for pointing that out. It may not be an issue because our regions may be more aggregated, but for California I think this is a good point and we will definitely pay a little bit more attention to that area. DR. HENGARTNER: What this suggests, at the least, is that you need to have very careful modeling of the transmission. I mean, it could be -- as long as the transmission from Oregon to California is well modeled we can account for it. That's one of the difficulties here, the interplay between regions. MR. TSENG: The transmission part, the power flow information, is not available anywhere. So that's why we want to use the regional model as a way to understand the potential power flows. Again, when we do the regional aggregation we integrate some of the issues or assume away some of the transmission issues. So when you say we want to see where California will get electricity from different regions, say from Washington State or Oregon through the hydropower plants, we may be able to look into that. But there are some other issues which can be interesting, such as when California will get the imported electricity. The time of day affects the type of fuel used to generate electricity. We think in the bigger picture that's one area we'll be looking at as well. MR. BERNSTEIN: Let's let him finish and then he can answer. MR. TSENG: I'm almost done here and then I can answer questions. So when we do regional modeling a lot of times the first question is do you have the data. For the electricity part fortunately we do, and some of the consultants do, at least the load curve stuff. So data are available to create load curves. That's one thing we know; the ISOs, the independent system operators, and some of the RTOs, like PJM, have the information, and some electricity modeling consultants actually purchase that information, so we have it almost by state. So we plan to utilize that information. So the EIA -- historic monthly electricity sales are set up by region and will be used to create load curves. This is the process: we use typical macro drivers, the exogenous variables drive the demand, then we have the monthly demand, and then we convert the monthly demand to a load curve. Then we try to model the dispatching. That's the general approach. There are also data available to create electricity supply curves. I think EIA actually has very good information; we can get the O&M cost and we can get the fuel cost, so we can actually create marginal costs of electricity supply from different technologies.
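As a rough illustration of the load-curve and dispatching idea described here, the sketch below builds a hypothetical merit-order supply stack and dispatches a stylized 24-hour load against it. The fuels, capacities, and marginal costs are made-up numbers, not EIA data or the actual STIFS algorithm.

```python
import numpy as np

# Illustrative supply stack (capacity in GW, marginal cost in $/MWh).
# Numbers are invented for illustration only.
stack = [
    ("hydro",        8.0,  2.0),
    ("nuclear",      4.5,  8.0),
    ("coal",         2.0, 20.0),
    ("gas_combined", 20.0, 35.0),
    ("gas_peaker",   12.0, 70.0),
    ("oil",           3.0, 90.0),
]
stack.sort(key=lambda t: t[2])  # dispatch cheapest units first (merit order)

def dispatch(load_gw):
    """Allocate an hourly load across the stack; return generation by fuel
    and the marginal cost of the last unit dispatched."""
    remaining, gen, marginal = load_gw, {}, 0.0
    for fuel, cap, cost in stack:
        used = min(cap, remaining)
        if used > 0:
            gen[fuel] = used
            marginal = cost
            remaining -= used
    return gen, marginal

# A stylized 24-hour load curve: flat overnight, peaking in the afternoon.
hours = np.arange(24)
load = 22 + 10 * np.exp(-((hours - 16) ** 2) / 18.0)

for h in (3, 16):
    gen, mc = dispatch(load[h])
    gas = sum(v for k, v in gen.items() if k.startswith("gas"))
    print("hour %02d  load %.1f GW  marginal cost %5.1f $/MWh  gas share %.2f"
          % (h, load[h], mc, gas / load[h]))
```

The point of the toy example is the one made in the discussion: as the load moves up the stack, the marginal fuel and therefore the natural gas burn change hour by hour, which a flat monthly average would hide.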
Then the dispatching algorithm can be applied to that. So the state and regional demand and generation will help us check the flow information. That part, I think, is an attempt. We try to understand the flow information, but actually Dr. Hale, sitting in the audience, is an expert. He just completed a report and he gave the committee a briefing last time. There are a lot of issues in terms of checking the flows, in terms of collecting and processing the information. So what we are doing here is a first step. We're not saying this approach is perfect, but at least it will allow us to get a handle on the potential flows, looking at the relationship between load curves, generating capacity, and possible electricity flows. So a few questions for the committee. The regional model is designed for the short-term forecast, so should we use a system of equations approach in our estimation of model parameters? I mentioned earlier that we are looking at demand equations in each region, so there are questions in terms of how we estimate demand. Also there's an overarching market, so there's a statistical consistency issue: when we estimate the equation system, how do we estimate the system and how much do we lose if we simply use an easy approach and estimate the equations just on the demand side? The supply side we are also looking into; it may take some time and maybe we can raise some of those issues next time. So that's one issue. The second one is how do we handle the linkage between regional demand and supply. Dave's flow chart identifies an algorithm, but when we solve or estimate the equation system we want to ensure the linkage: that the top numbers, the market numbers, price, market clearing, demand and supply, are consistent with the regional demand and supply. Even though we are not solving the regional market, we still need to understand the consistency issue. In the electricity sector aggregation effectively flattens the regional load curves, so the more we aggregate regions the flatter the load curves will be. So we would like to get some guidance in terms of criteria for the selection of the number of regions. We are looking at seven or eight supply regions, and when I started talking to people, some said that when you aggregate regions you shouldn't include the Midwest with other regions because the Midwest has a lot of excess capacity. If you aggregate the Midwest with the Atlantic states then you don't see any problems in terms of supply and potential flow. So those are the issues we are looking into. Hopefully next time we will have some empirical data to report. Thanks, that concludes my presentation. MR. BERNSTEIN: Thank you. Phillip, did you have a question? MS. KHANNA: About transmission flows, I understood that the information is not available, but I wonder if it should be available from the ISOs at least. There must be some place within the grid where someone is tracking flows of electricity from source to end use. Maybe it's not easy to collect, but it should be there in the system somewhere. I think that if transmission is the issue then maybe that's what we need to do to try to figure out an approach. MR. TSENG: On the transmission flow, we actually had four focus group meetings, and Mr. Bradsher-Frederick will give a talk about some of the findings. Actually I was involved in that project. What we found out from the stakeholders is that it involves different levels of data collection and analysis.
From the stakeholders, people say they want to collect information for different power lines. For example, EIA collects 230 kV lines and some 115 kV lines, but some stakeholders, especially at the state level, are looking at 69 kV lines. In some states those are treated as transmission lines, but in New York they are distribution lines. So you are talking about a host of information which would be very difficult to process. Another thing I'd point to is Dr. Hale's model. He actually worked with a consortium of professors to develop a power world model basically looking at the flow information. The question is how do we integrate empirical data with that modeling framework. That's another layer. It's a lot more complicated. For the regional model I think one issue is that the turn-around time is one month, so we need to manage the information system and estimate the -- system in such a way that it will not overburden the team who will produce the report every month. DR. NEERCHAL: Looking at the two days of data you showed, the natural question that came to my mind was whether the availability is the same throughout. I see that the level of load is different from March to July, but is the availability the same? MR. TSENG: The availability is not the same. DR. NEERCHAL: Not the same without the changes or the -- MR. TSENG: Right. The interesting pattern, we actually have probably five or six years of historical data looking at hourly information, and I think the burden on the system becomes more severe if you have a heat wave, say five or six days in a row of very hot weather, because the first day or second day doesn't impose as much load, but when the buildings heat up the demand for cooling will increase. That can really drive up the demand for electricity in the afternoon. That imposes a burden on the power generators and the transmission. But what we are trying to do first is capture some of that element and see how we can improve our understanding of the system. DR. NEERCHAL: Now, for example, in the March data and July data, if you look at the largest load minus the smallest load over the 24-hour period it is about 4 to 5 gigawatts. MR. TSENG: I used that as an example to show the variation, but I actually downloaded another day, in July and other times in California, and the actual loads in the afternoon were close to 40 gigawatt hours. MR. BERNSTEIN: We had 42 gigawatt hours two weeks ago with some stage one emergency stuff, so it can vary a lot. MR. TSENG: Yes, it can vary a lot. DR. NEERCHAL: The other question or comment I have is on the last bullet in your presentation, about the number of regions. One thing I was thinking is, if there is -- until you have very good quantitative data about transmission, at least qualitatively, when you are defining the regions you should take the transmission flow into account. Are any of the region definitions based on the transmission flow? MR. TSENG: No, actually there's no flow information. Some people requested EIA use the FERC Form 1 information to create flow information; however, when we talked to some of the users of the data, some people ask how relevant that is, because some analysts look at congestion issues at control areas, which is very different from the market area or the state level or the region. So the congestion could be very local, and the flow information we see is probably just an indicator.
Really, there's no way the kind of modeling we are doing at the regional level would pinpoint the congestion point. But the interesting part is that a lot of people working on transmission, trying to understand the congestion and some of the implications for reliability and market power, don't have any idea what kind of data would be needed to do the analysis. So what we are doing here is at least one step forward, trying to identify the demand and supply and some potential flow information. MR. BERNSTEIN: I really need to take a step back here and ask a very fundamental question. What are you trying to get out of the short-term energy forecasts? In the presentation so far, are we trying to be able to look out at the next month's or the next few months' prices, or are we trying to get at reliability? What is the goal of the short-term energy forecasts and therefore the reason for the regional breakdown? Because if you are trying to say anything about whether there will be enough supply and the ability to supply, that's one thing. If you are strictly just saying here are going to be the trends in prices, that's another thing altogether. Do you understand the question? MR. TSENG: Yes, I understand the question. Part of the reason we want to understand the fuel use pattern in the power generation sector is that it's a very important element. I was trying to show you the generating capacity, the base load and the peak load. If we don't model the dispatching pattern then we don't know exactly the fuel usage. MR. BERNSTEIN: Going upstream from that, what are you trying to -- what do you really want to get out, and what is the information that you really need to have out of this for the short-term energy outlook? That's the fundamental question. If we are worried about whether there's going to be sufficient electricity supplied, that's one thing. You're not going to do that. That's one thing. Then we need to worry about transmission. But if we're not worried about that then -- what are you really trying to get at? What's your objective? MR. TSENG: The objective here is really to do a good accounting of demand and supply, especially in the natural gas market. That's why I asked whether there is consistency between the market and the regional demand. So I use electricity as one example. You say, well, we don't capture transmission issues, but we do capture dispatching decisions. If, after we analyze the historical data, hydropower is used as both base load and load shaving, that could have a lot of impact on demand for electricity. California has a lot of combined cycle capacity. So if we can shave the demand for natural gas even slightly by looking at the dispatching decision, you can change the demand and you can change the underground storage in the summer. That could have an impact in the winter in terms of heating fuel demand, because in the northeast natural gas and number two heating fuel will be used to provide heating needs. So what we want is to be able to check the flow of demand for fuels and supply of fuels, understand what happens in terms of demand, supply, and storage, and then look at those relationships in terms of prices and markets. So does that answer your question? MR. BERNSTEIN: Most of the way. MR.
CARUSO: One of the things that I think both Howard and I were struck with when we first got here is the number of questions we were getting from both the executive branch and the congressional side whenever there was a discontinuity or a potential discontinuity in supply or demand and price spikes. And the fact is we couldn't answer a lot of them because there was no regional detail in the short-term model. To put it as far upstream as you can go, basically the Secretary wants to know what's going to happen in California, first starting with the crisis in electricity, but in natural gas and now more recently gasoline prices. We have been asked to do analysis on both the west coast and the east coast on gasoline, so those are -- MR. BERNSTEIN: So the initial focus appears to be natural gas and gasoline? MR. TSENG: And also heating fuel. However, heating fuel competes with natural gas as well, even in the northeast. This is an empirical question and we will do some empirical analysis there. Part of the reason I talk about electricity, and you say, well, it doesn't answer the question about transmission bottlenecks, which is correct; however, the algorithm will give us a better understanding of how natural gas power plants would be used to serve the load, and that information can help us understand how natural gas will be used not only in California but also in other regions, because in recent years natural gas demand for power generation actually increased by almost 2 quads from gas combined cycle. A lot of the new gas combined cycle units just came on line in the past several years and that increased demand for natural gas, and so how natural gas power plants are used, especially for intermediate load and peak load, can have an impact on natural gas storage, because now the market is very tight. Sometimes people say, well, the forecasting error is maybe one percent or two percent. When we look at a 100-quad total energy system, one percent is one quad. If it's in the coal supply or coal demand it may not matter, but if it is natural gas then you're talking about going from $4 or $5 per MCF to $12 per MCF or even higher. MR. BERNSTEIN: So is the vision here to be able to say, this winter, if you really have this up and going, because of expectations of electricity use, natural gas use for electricity, and the state of storage, that there could be a shortage of natural gas for heating in the wintertime? MR. TSENG: Yes, that could be one outcome, because we do a much more rigorous accounting of the demand for natural gas. MR. BERNSTEIN: That definitely has to come from a region. That has to be a regional difference? MR. TSENG: Right, yes. MR. BERNSTEIN: Sorry to take up your time with that, but it really makes a difference on what it is you are trying to do. You have asked three very specific questions that maybe we should try to address. The first one is should we use a system of equations approach. What's the alternative? Is there an alternative? DR. HENGARTNER: I'm not an economist and that's where my failings are. But I understand this is a very detailed econometric kind of modeling that you have done. A statistician would simply look at the data and say I have a vector of prices and a vector of supplies, and the question is can we model it as a black box without going into -- supplies and demand curves? Simply, can I get some model that explains how these prices move, maybe in conjunction with the supplies and maybe heating degree days or other exogenous variables?
That would be an acceptable approach, but it doesn't have the same flavor, the understanding that you are gaining from the econometric model; however, for forecasting, if I need to predict what the prices are going to be a few months down the road, that might be a very acceptable solution. So Mark's question, what do you want to do, what is the customer asking you, is fundamental. I mean, that is indeed the first question one has to ask. MR. TSENG: Right. Some of the questions I heard, or you read in the press, could be: are petroleum refiners gouging the customers, why is the price $2.50 per gallon in some areas. Pure statistical analysis will not give the why, and that's the part we are looking at, market demand and supply. Sometimes we add additional information in terms of distribution system bottlenecks. For example, a few years ago I think there was really no natural gas shortage per se, but the Midwest and northeast had a cold spell. What happened was natural gas producers in the south said, well, I'm going to send as much natural gas as possible, and they bypassed certain processing steps. Normally natural gas coming out of a well will go through a natural gas plant; what comes out of the well is wet gas, and you strip the liquids and send the dry gas through the pipeline. But when the demand is very high, what happens is some of the gas producers simply send the natural gas through the pipeline, and when it reaches the north it will liquefy. So it actually jammed the pipeline and made the situation even worse. So that's the kind of thing, I think, especially for our administrator when he has to testify. People will say, what happened? Something is wrong and people are not sending natural gas to the north, and that's why prices are going through the roof. So the statistic is one thing, and also some of the meaning associated with the numbers is important. DR. HENGARTNER: The other question the statistician will ask is how many parameters you are estimating. You say you have 180 equations. One parameter, maybe two, per equation; are we talking about 600 parameters easily? MR. TSENG: But some of the general form or generic form equations are more or less the same. Demand equations could be very similar. Only the values of the parameters will be different. DR. HENGARTNER: Yes, so you need to estimate those values from the data. MR. TSENG: Yes, we do, right. DR. HENGARTNER: So if you have 600 parameters you need to estimate, how many independent observations do you have? Is it even feasible to do it? DR. NEERCHAL: I think econometrics -- identifiability -- DR. HENGARTNER: Well, identifiability is one issue, but even if it is identifiable do we have enough data? MS. KHANNA: Is it a robust estimate? That's what you're looking at. MR. TSENG: Dave, correct me. Dave knows a lot more about the data. I think from '97 we have monthly data, '97 through 2002 or 2003, right? MR. COSTELLO: Just to clarify this, currently we have data going back as far as 1975 by month. But on a consistent basis for the variables that we cover, which are mostly national level flows, stocks, and prices, I would say that most of the data begins monthly in about 1987. Now, some of this electricity data that we are talking about, right now we just finished building the first part of the database we are going to use here, and it doesn't get into the load curve stuff because that's another matter.
But just in terms of the flows, the generation numbers by region and by fuel source, we update them from January of 1989 through the latest month, which is December of 2003. We also have capacity by month, by fuel, by region, although, looking at that data, it needs to be cleaned up a little bit. Of course, we maintain everything on a per-day basis. We don't put any flows on anything other than a per-day basis because it's silly not to. But, anyway, there's a lot of data there, probably more than you usually see on EIA's website, because it's hard to get all that historical data and we are putting more into it. But to me one of the questions that this brings up, and I think it's pertinent to what Phil was getting at, is this; let me give you an example. For natural gas, in order for us to do a natural gas modeling effort here we've got to have natural gas demand from the different sectors. That means residential, commercial, and industrial. And the other big sector is electric power. So in order to know what the total amount of natural gas is we've obviously got to do that. So we do it. Now, there are two levels of this question of identification, I think, that are relevant here. One is the question of, well, how are we going to get to the so-called equilibrium price. How do we get a flavor for what that is? The approach that I am trying to sell here starts from the fact that the natural gas market in the United States is pretty well integrated. In general the pressure on prices at the wellhead, that is, in the producing region, is affected by just about everything else that happens: demand in the northeast, supplies from Alberta, Chicago market demand, all that stuff. It's not perfectly integrated because there are places where there are transmission bottlenecks and so forth. But what I'm trying to sell is that we take all these demands, we calculate them by region, and we add them all up. Now, the issue I think Phil was focusing on was more the question of how we get the fuel demands appropriately out of all these regions, and there are some transmission issues, too, and that includes the electric power sector. We add them all up. That gives us a total requirement for natural gas. We are also tracking inventories, so we have to worry about inventory behavior a little bit. But even if we just assume, as seems to be the case, that wherever inventories end up at the end of the winter there is a very strong propensity for them to get back close to normal. We notice, for example, that the variance at the beginning of the heating season is much lower than at the end; in other words, it's not one of these things where it's a really cold winter, you wind up with low stocks, and, oh, we are going to be stuck with low stocks for the year. That's not the way it works. Gas utilities have a lot of requirements and they are basically going to put the -- so even if we just assumed that, well, they are going to get back to normal pretty soon, that adds to the amount of flow of natural gas we have to worry about. Anyway, there's a total requirement for new supply that comes out of this. On the other hand, we have to ask how new production comes about. Well, we look at drilling and so on and so forth and what's available from Canada. At that aggregate level we could just simply say we need to have equilibrium.
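A minimal sketch of this add-up-and-equilibrate idea might look like the following: hypothetical regional demand curves are summed, an aggregate supply curve is posited, and a single benchmark price clears the market. The elasticities, levels, and region names are invented for illustration only, not EIA's equations.

```python
from scipy.optimize import brentq

# Hypothetical regional demand curves (Bcf/day) as functions of a single
# benchmark price (e.g., Henry Hub, $/MCF); elasticities are made up.
regions = {
    "northeast": lambda p: 18.0 * p ** -0.15,
    "midwest":   lambda p: 16.0 * p ** -0.10,
    "south":     lambda p: 22.0 * p ** -0.08,
    "west":      lambda p: 12.0 * p ** -0.12,
}

def total_demand(p):
    return sum(f(p) for f in regions.values())

def total_supply(p):
    # Domestic production plus Canadian imports, mildly price responsive.
    return 52.0 * p ** 0.05 + 9.0

# Market-clearing price: excess demand equals zero on a bracketing interval.
p_star = brentq(lambda p: total_demand(p) - total_supply(p), 1.0, 20.0)
print("clearing price %.2f  demand %.1f Bcf/d" % (p_star, total_demand(p_star)))

# The national clearing quantity can then be allocated back to the regions.
shares = {r: f(p_star) / total_demand(p_star) for r, f in regions.items()}
print(shares)
```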
What will happen is that the price will be the price that equilibrates this aggregate demand and supply. We are looking at the Henry Hub price as a benchmark price. What's an alternative to that? Right now we are just adding up all these things and forcing a demand-supply equilibration to give a price, which is essentially what we are doing now. One alternative is to say, okay, we can characterize the market model by basically estimating a supply function and also a demand function in the aggregate and worry about recovering the structural parameters for that system. That gives an answer. Depending on how complicated it is, it gives an answer for spot prices, it gives an answer for the actual level of demand, and it gives an answer for inventories, probably; however, that's an answer that may not be the same as what we just finished adding up at the regional level. It's nice because we can say we have these well-identified, specified structural parameters for demand and supply for natural gas in the United States. We could take that and allocate it back in some form to the regions, or we could just do the other thing. That's one question, I think, that we are talking about here. The other question has to do with the regional level again. If you are worrying about, let's say, and we do not do this right now and we may not do it, but it's another one of Phil's questions, I think, if we look at, say, residential energy demand, leaving out gasoline so we are really just talking about electricity and heating fuels, we could specify it as a system. First of all, we could do an overall sectoral energy demand function as part of a system that says the energy demands are one set of inputs in industry, or one set of goods consumers get as opposed to others, and there would be some substitution elasticities in there. Once you determine that, then you have to worry about substitutions between fuels, which we figure in the residential sector in the short run there's not a lot of, but there may be some. So there's the question of whether to estimate those functions in that way. The alternative is simply to have an equation for natural gas demand in the residential sector, and we know a lot of things that affect this. One of the things is the housing characteristics in the region, which we are going to track, in other words how many of these households have natural gas -- so on and so forth. We know what the weather is; that's obviously a big factor, and the prices will be important, too. So we put those in there. We ordinarily use two-stage least squares or some such method. Sometimes it doesn't look like it makes a whole lot of difference. But for me, and this is just because of my focus, whatever we do, we want to do the regions because we want to know more about how those things change in particular. How do you explain how you got this total gas on hand? I mean, you have an aggregate equation and that's fine, and maybe it seems to fit reasonably well, but you couldn't tell anybody where it came from. MR. BERNSTEIN: I don't think there's any issue with going -- the question is how complicated do you need to get. How detailed do you need to get? I think it still comes back to what your goals are and what you are really trying to get out of the model, and then asking yourself how you do it in a way that's going to be understandable but at the same time doable in the time frame that you need to deal with.
One of the questions is are you getting too complicated. The other question is, well, don't you have to. MS. KHANNA: I'm actually having a hard time trying to answer the first question, whether to use the system of equations -- versus two-stage least squares, because a priori, without knowing the data or your structural model, I can't answer that. It depends on those -- do you expect there to be an -- or simultaneity problem, and the second is a statistical question. If so, is it really a statistical issue? So I don't think you can answer those in a vacuum. I mean, for people like you who actually deal with the data and the model, if you give us a little more insight into why you are even asking this question from an econometrics point of view, maybe we can give you a more informed answer. MR. BERNSTEIN: I think there's not enough time at the moment to actually provide that. MR. TSENG: In terms of asking more specific questions, we will have more later because right now we are still collecting information. Dave has someone collecting the information. When we actually start estimating the equations we'll probably know more about the statistical issues and also the econometric issues, because we can start doing different kinds of statistical tests. MR. BERNSTEIN: I think perhaps we may need to wait to try to address some of those questions until the fall, after you have actually gotten some -- DR. HENGARTNER: I have a few more questions about the data. It's beautiful that you can do the electricity on an hourly basis, but a lot of the other variables you measure are not on an hourly basis. They are probably not even on a daily basis. I mean, you say some of them you have on a daily basis, but very few -- MR. COSTELLO: Oh, no, none of the flows that we have are actually day-by-day observations. When I say daily I mean divided by the number of days in the month. DR. HENGARTNER: That complicates your equations greatly. The only data that's available is what you have in a month, and you divide by the number of days in the month. This very detailed modeling might be extremely hard to do because you have this additional source of error. It's errors in variables, using proxies instead of the actual numbers. MR. COSTELLO: Well, I think what we're really talking about here, I mean, I'm not really sure ultimately how the form ought to come out. I think Phil has a little bit better idea than I do, but it's going to get derived at some point. What I think they are talking about is that if we have enough information over time on the typical load curve in a given month for all of the regions, then we can in a sense build an understanding of a typical day in a month and how it changes under different conditions. One of my goals is to make sure that when we do a forecast we don't have to intervene every time to retool for a different configuration. Do you see what I'm saying? DR. HENGARTNER: I see what you are saying. I think what the customer will want is exactly an atypical day. If a heat wave hits and prices go through the roof, I think that will need to be understood. MR. COSTELLO: Well, actually what I meant by that, I said that wrong. If we have enough actual observations of days within months across time, then the hope is that we can get an idea of how that shape does change under different conditions, particularly things like weather. That's the big one, but there are other factors, too. I mean, that's the idea.
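One hedged way to formalize "how the daily shape changes under different conditions" is to regress each hour's share of daily load on a weather variable such as cooling degree days. The sketch below assumes hypothetical hourly load data and a cdd series; the names and the specific regression are illustrative, not an approach EIA has committed to.

```python
import pandas as pd
import statsmodels.api as sm

def fit_hourly_shape(hourly: pd.DataFrame, cdd: pd.Series):
    """`hourly` is a hypothetical DataFrame with columns ['date', 'hour', 'load'];
    `cdd` maps each date to cooling degree days.  Both stand in for ISO load
    data and weather observations."""
    df = hourly.copy()
    df["daily"] = df.groupby("date")["load"].transform("sum")
    df["share"] = df["load"] / df["daily"]          # each hour's share of the day
    df["cdd"] = df["date"].map(cdd)
    models = {}
    for h, g in df.groupby("hour"):
        X = sm.add_constant(g[["cdd"]])
        models[h] = sm.OLS(g["share"], X).fit()     # share_h = a_h + b_h * CDD
    return models

# models[17].params would then show how the 5 p.m. share of daily load rises
# with cooling degree days, i.e., how the shape shifts on hot days.
```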
The only purpose for doing it, other than getting to the possibility of addressing the transmission question, is the hope that being able to look at it this way will provide better month-to-month estimates of changes in fuel shares for generation. That's the key thing. MR. BERNSTEIN: So that's an important, very good, focused goal that I think we need to help you with. If that's your goal you need to define your approach to meet your goal, and I'm not sure that that's being done. One of the other questions you asked is about criteria for selecting the number of regions. One of the things you may have noticed when you looked at the charts for California is that the peak is not in the middle of the afternoon. California's summertime and wintertime electricity peak now is between 6:00 and 7:00 o'clock in the evening. That is a relatively recent shift. If you go back in time and you have tracked load curves over the last 10 years, you can watch that peak move from 3:00 o'clock 10 years ago slowly, slowly across -- I saw a nice simulation of that, actually, at some point. Clearly, to make that determination you have to look at existing load curves in different regions and different states and not combine regions where the load curves are complementary. What you don't want to do is combine a summer-peaking and a winter-peaking region, or stick California in with a state that has a traditional 3:00 o'clock peak. What you would end up doing is losing the variation. So while there are no criteria to say specifically, you can think about looking at existing load curves and trying to group the regions that have similar patterns. I'm not sure if any other state has a similar pattern to California -- it hasn't changed decision making, but it ought to, and it certainly ought to change the forecast. MR. TSENG: Yes, actually part of the reason we are looking at California and New York, Florida, and Texas separately is because those are very distinct states or regions. So definitely we'll look at the data very carefully and decide how we want to aggregate. The session is over. Thank you. (Recess) MR. BERNSTEIN: Actually we had a very interesting session. I want to thank Phil and Dave for filling in things. I think we came to a couple of conclusions. One is that they are really early in the process and we need to revisit some of this in the fall as they actually get some of the analysis done. One of the particular issues that I had was defining the objectives for the short-term energy outlook. I think that's going to help determine the modeling approach and help it evolve. You really have to be clear about what you want to get out, and while sometimes you want to look out into the future and say three years from now we want to make sure we're getting such and such, I think you've got enough near-term issues that you need to deal with to address very specific near-term problems. There were some issues. I mean, you asked questions about whether the approach is appropriate, and I think we all concluded that you need to do some analysis and show us how it works so that we can give you a credible response in the fall, because it depends on a lot of different factors, starting with exactly the type of data you have, for us to be able to give you any really good information or evaluation. There were some thoughts about simplifying it and thinking about whether you could really use statistical analysis to simplify some of the approach.
I think again that comes back to what you're trying to get at, what you're really trying to answer. You can set yourself up for a really intense, complicated output that you are not going to be able to achieve if you are not careful, so I think you need to come back a little bit. Anything else? I think it's early in the process for us to be able to tell you a lot at this point. Nicolas? DR. HENGARTNER: It's a good summary. The proof is in the pudding. MR. BERNSTEIN: It's really important. The short-term energy outlook is going to be really important, and being able to break it down regionally is going to be really important and really necessary. I think we all agree on that. You need to get something out in January, so how we are going to be able to help you in the fall is hard to tell, because by that time you are going to be far enough along that it's going to be hard to change direction, but we'll try to help you tweak it at that point. I think that's about all we can say. MR. BREIDT: Thanks, Mark. Now we have our other Mark B., Mark Burton, to summarize the other discussion session. DR. BURTON: As an academic I'm constantly used to doing a lot of work that absolutely nobody pays any attention to, so it frightened me to death today to be in a session where what we were looking at was a direct EIA response to recommendations or complaints that we made last fall. If I had known you guys were really going to listen I would have stopped talking a long time ago. Essentially the goal has been to evaluate two product areas, the annual energy outlook and the international energy outlook. EIA came to us in the fall with a notion of how they might accomplish that, and we took exception to that. We saw some strengths in it, but we also saw some weaknesses that we thought required a different approach. What they did in response to our comments was initiate an entirely separate effort. Instead of looking at a select group of surveyed users who had registered for the NEMS conference, they did an e-mail survey. They did three waves. One of the issues that we talked about was the response rate. It turned out to be somewhere in the neighborhood of 35 percent, and I think there was a general consensus that that is admirable for this type of survey instrument. What we did in the session downstairs was just walk through the results: we talked about the survey method, what alternatives there might be, and the results, where they were useful and where they were not. For example, there was some concern that the results, while they provided a very nice quantitative measure of the quality of these two products, didn't really provide actionable information that could immediately lead to alternatives for improvement. We talked about whether there might be some problem in the nature of the respondents, whether the respondents might represent some sort of self-selection, and whether they were getting the information from the users they wanted the information from. Again, I think if there was a consensus it was basically that they were getting input from the people they needed to be receiving input from. There was a discussion about whether this represented a good group to go to on a regular basis. I think there's some fear that if they continue to send surveys to conference registrants, people won't register for the conference. So there was a desire, perhaps, to get at a larger set of people so there wouldn't be quite as much of a burden in doing this on an ongoing basis.
The bottom line, and gee, I'm going to do everything I can to get you back on schedule, buddy. I'm doing this for you, all right? The bottom line is that this is not perfect. There's information that they would like to get that they didn't get from this survey. There are still some questions about who they are hitting with it as opposed to who they want to hit. But I think the bottom line is that everybody feels pretty good about this. I mean, it's very rare that you do something perfectly the first time. As a first effort I think we are all of one mind that this represents very substantive progress. So that's it. MR. BREIDT: Thanks, Mark. We'll move back into breakout sessions now. The committee members will be in the sessions they were in last time with one exception, which is that Mark Burton will stay here and Nick will go downstairs. MS. SEREIKA: I am Susan Sereika. I'm from the University of Pittsburgh. MR. WIENIG: With the ASA Committee? MS. SEREIKA: Yes. MR. BUDZIK: I'm Phillip Budzik. I'm with EIA. MS. NORMAN: I'm Kara Norman with SMG of the EIA. MR. McDOWNEY: I'm Preston McDowney, Statistics and Methods Group, EIA. MR. WIENIG: Dave Costello, our speaker for this session, is on the phone. He'll be just a minute. This is what you get when you do short-term forecasting. MR. COSTELLO: What's the timeline? MR. WIENIG: On the order of 15 to 20 minutes. The next session is on issues in short-term energy modeling, adding regional components to the EIA short-term energy model. You have heard from Dave Costello already earlier. This is part of an ongoing interest in EIA's short-term modeling. We expect to be joined by Fred Joutz from George Washington University. He has had a delay, but we hope that he will be here shortly. In the meantime we weren't really asking Dave to do, what, three sessions in a row. Thanks for being here. MR. COSTELLO: Sure. This is totally different from the regional stuff, in case there's any confusion about that. What this has to do with is a question that came up repeatedly last summer having to do with natural gas, a question and a topic that was in the news quite a bit. Since 2000, actually, we have had a couple of pretty serious shocks in natural gas markets. We had some earlier ones in the mid-'90s which at the time probably seemed pretty significant, but by comparison to what's happened since 2000 they don't look like much. The question had to do with what this is doing to industrial activity in the United States. We know that a lot of industries use natural gas extensively, either for process heat or for feedstock, and the typical one that people worry about, although how much of the total amount of natural gas it actually uses is another question, is the fertilizer industry, which was hit very hard. Other industries for which natural gas constitutes an important part of the variable cost are also of some concern. Anyway, Alan Greenspan was asked a number of times, since he happened to be talking a lot about natural gas last summer, well, what's this doing to industrial output, and his answer was generally that we haven't seen anything we can point to yet other than some of these anecdotal stories about particular industries. Also there has been some effort to take the natural gas price shocks that we've seen and interpret them in the same way that oil price shocks have been interpreted in terms of their impact on the economy overall. I think Fred has done some work along those lines, and a couple of other people, one an economist from the St.
Louis Federal Reserve, I think. Anyway, a number of people have done something, but it's been an ad hoc, back-of-the-envelope thing. The motivation for this came from two things: one, wanting to be able to maybe say something about this question, and two, being concerned that perhaps we are in a regime now where what's going on with natural gas, in terms of its volatility and the levels to which it can run when we get into peak demand periods, may be causing effects on industrial output that are not usually captured by our macro models. What we usually do in the course of the monthly forecast is take last month's forecast and update it as much as we can, look at the prices that we are generating, supply those to the handlers of the macro model here in EIA, and ask them to give us a new forecast. We don't always update it every month, but generally it's updated fairly frequently. So you get some different answers, but, first of all, that's labor intensive; we have to iterate manually to do that. Secondly, I was wondering, if we were to tell it, for example, that we had a very high natural gas price or some particular scenario, even if we had the time to do that particular scenario, whether the answer it gave us back with regard to industrial output and US economic growth in general would be sensitive enough to incorporate what's going on in the gas-intensive industries in particular. Anyway, the purpose of this was to see if we could put together a little model that takes into account the interactions between natural gas and industrial production and other factors, and see if we could find a response function that was revealing in terms of how sensitive it was, and whether we could use it either to directly do some simulations or to go forward and make more models. So I've basically gone over the objective: to see if we could get some kind of useful model out of this exercise. Mostly what we wound up doing was putting together a VAR model, a couple of different ones, testing various specifications to see what made the most sense, and then looking at the impulse response functions those generated to see whether they were particularly revealing about this question that I've posed. Another thing touched on in the paper, and there's a paper that details a little bit more about this, I'm assuming you got a copy of it, is that we also started doing some analysis of co-integration between some of these variables, trying to look at long-run relationships as another approach. In general we're going to start with a general level of consideration of what these models will look like and then move down to more specific representations that seem to make sense. Ultimately we want to try to put these two strategies side by side and see what works best and whether we get anything that we feel we can really use. I'll just go through some of the data and then I think after I do that Fred will talk about some of the results. This is the industrial natural gas price; it's EIA's industrial natural gas price. We use this although there are some conceptual problems with these data, probably not serious enough to be of concern in this particular work. The red line is the real value in 1982 dollars. So clearly there were some spikes back in '96 and '97 that were of note, but they pale in comparison with the later shocks.
This is the net industrial gas price, which is the difference between the previous 12-month maximum and the current price. Fred may want to talk a little bit more about it. It's been used before to incorporate memories of shocks into the model. This is a measure of industrial output. It's a composite index for the six biggest natural gas consumers in the manufacturing sector. The index is averaged using value-added weights for those industries. All of this is at the national level. Again, the top line is the industrial production index, not seasonally adjusted. The line below is the pattern of natural gas demand by month. This is natural gas demand including combined heat and power. So you can see obviously the natural gas price shocks, the market shocks in general. It wasn't just natural gas prices, but there has been a big decline in the average level and really in the variance of demand for natural gas in the industrial sector since 2000. And then I'll let Fred pick up. MR. JOUTZ: First, I'm sorry I'm late. What I will continue with is a discussion of the first of the models that we have worked with fairly intensely. There's actually a whole suite of these models, but this summarizes what we have done. I'll present some of the results in terms of some diagnostics about the model and then go through some of the testing that we've done, looking at simulation exercises and so forth. So we start from a framework where we utilize the current variables that are in the STIFS model that influence industrial sector natural gas consumption and so forth. We distill this down into a four-equation vector autoregressive model where on the left-hand side we have the price of natural gas. This is in natural logarithms. We have this consumption measure that Dave was just showing you. Then we have this Big-6. That's the six major industrial production sectors, which really drive industrial energy consumption and account for, I think, about 30 to 33 percent of consumption of natural gas. This one that says PGIN, that's our net price series. This is a series that we adapted from the oil price literature on oil price effects, where one often thinks that the level and the change may be important, but what is also critical is the memory of past shocks. So we utilize that concept in constructing this variable that we call the net price measure. We also went back and tried lags going back three years, that is, 36 months. It ended up that it doesn't seem to make much of a difference compared with 12 months. So we have our four variables on the left-hand side, and then here we have lagged values of those going back at least 13 months. We've tried all variations to reduce this. Then we have some deterministic variables in here for monthly dummy variables. Then we have some outstanding events where there were weather-related effects or there were problems in getting natural gas supplies to the market. Then we have this other set of explanatory variables that we are treating as exogenous. They are actually, I think, for the most part really predetermined, and these include the spot price of natural gas. A very important variable that influences the price in the short run looks at inventories relative to consumption. So we have this gas variable, and we have some relative prices of natural gas to residual fuel oil, so we can possibly get ideas of switching into the model.
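A sketch of how the net price series and the four-variable VAR might be set up follows, using a Hamilton-style net price increase, the amount by which the current log price exceeds its previous 12-month maximum, floored at zero, as one plausible reading of the description above. The data frame and column names are placeholders, not the STIFS mnemonics, and the seasonal, event, and exogenous terms are only noted in comments.

```python
import pandas as pd
from statsmodels.tsa.api import VAR

def add_net_price(df: pd.DataFrame, col: str = "ln_price", window: int = 12):
    """Construct a Hamilton-style net price increase from a monthly log-price
    column: how far the current price exceeds the maximum over the preceding
    `window` months, zero otherwise."""
    prev_max = df[col].shift(1).rolling(window).max()
    df["net_price"] = (df[col] - prev_max).clip(lower=0.0)
    return df

def fit_var(df: pd.DataFrame, lags: int = 13):
    """Four-variable VAR in the spirit of the presentation: log industrial gas
    price, log industrial consumption, log Big-6 production index, net price.
    Seasonal dummies, event dummies, and exogenous series (spot price, storage
    relative to consumption, relative fuel prices) would enter via `exog`."""
    endog = df[["ln_price", "ln_consumption", "ln_big6", "net_price"]].dropna()
    return VAR(endog).fit(maxlags=lags)

# res = fit_var(add_net_price(df)); print(res.summary())
```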
Some of the testing that we did on the specifications of our alternative models: we looked at integration and co-integration. I won't talk about co-integration during this part of the session, but if there is interest later on we can probably fire up the other computer and demonstrate some of this. We are concerned about lag length. We don't want to have it too short so that we miss some of the dynamics of the model, but we go in the opposite direction a little bit with some of our tests. These are some of the standard statistics that we use: the Akaike information criterion, the Schwarz Bayesian criterion, Hannan-Quinn, autocorrelation coefficients. We look at normality. We look at Granger causality, or Granger precedence is probably the better way to think about it. We are going to show you impulse responses. We have done some work on variance decomposition, and then the simulation, which is the thing that is perhaps really interesting about what we've managed to get done so far. Where we want to go with co-integration is thinking about this in terms of error correction modeling and other forecasts. We started off with that vector autoregressive model, and that's what we are going to think of as our GUM, our general unrestricted model. So we try to put everything in at the beginning, and then our statistical evaluation allows us to distill that down to the most parsimonious model possible, rather than starting with a specific model and then veering off in different directions trying to find an appropriate model. We start at the outset with a very large one and then bring it down. The models range from one with no seasonal dummy variables, no event dummies, and no exogenous variables, so it's just a strict VAR model, to Model 5, which includes the seasonal variables, the event dummies, and the exogenous variables; Model 4 has no event dummies, and so forth. Here are some of the results from this, and here I'm presenting F-tests of these sets of restrictions. The results using the other criteria give us essentially the same result. So what we have here is Model 5, the most general model, against Model 4, Model 5 against 3, 5 against 2, and then 4 against 3, 4 against 2, and so on. What we find is that Model 4, which doesn't include those particular event dummies, does not reduce the explanatory power of the general model if we impose those zero restrictions; however, any of the other restrictions against Model 5 lead to rejection. If we go down to Model 4 we see that any of the further restrictions on Model 4 are rejected. You see down here at 3 there doesn't appear to be much difference in explanatory power between Models 3 and 2; however, the net result is that our procedure suggests that Model 4 is the appropriate model. Here are some of these Granger precedence tests. I don't want to harp on these too much, but they give us a clue as to some of the results that we may see later on. The way we did this is we started off with our Model 4, and for each equation we asked whether we can remove lagged values of the other variables in the system and see if this leads to a loss in explanatory power. If it does, then it suggests that that particular variable, in its lagged values, helps to explain the current value of our dependent variable. So in this case we have our industrial natural gas price, and we are presenting the P-values associated with our tests.
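For reference, the lag-length and Granger-precedence checks described here, which continue in the discussion below, can be run with standard VAR tooling. The sketch assumes the placeholder column names from the earlier sketch; the nested Model 5 versus Model 4 comparison would in practice be done by refitting with and without the event-dummy regressors, which is only noted in the docstring.

```python
from statsmodels.tsa.api import VAR

def lag_and_causality_checks(endog, lags=13):
    """Illustrative diagnostics, not the authors' code.
    (Comparing Model 5 with Model 4 would mean refitting with different
    exogenous dummy matrices and comparing the restricted fits.)"""
    # Lag-length criteria: AIC, BIC, HQIC over candidate lag orders.
    sel = VAR(endog).select_order(maxlags=lags)
    print(sel.summary())

    res = VAR(endog).fit(maxlags=lags)
    # Does the lagged net price help explain Big-6 industrial production?
    gc = res.test_causality("ln_big6", ["net_price"], kind="f")
    print(gc.summary())
    # Joint test: do price level, net price, and production together help
    # explain industrial consumption?
    joint = res.test_causality("ln_consumption",
                               ["ln_price", "net_price", "ln_big6"], kind="f")
    print(joint.summary())
    return res
```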
This is our net price series, and the P-value here is greater than 10 percent. So it suggests that this shock series isn't going to help us explain much about the price level itself over the full sample. However, we do see that consumption and industrial production do help to explain future movements in industrial gas prices. The net price variable, I don't know why this result came up; this is one of those bizarre things that happen. It appears for industrial consumption that none of the variables helps to explain future values of it; however, if we do a joint test of all three of the variables, the level of the price, the net price measure, and industrial production, we do find that as a group they help to explain it. We can discuss that further. For industrial production, what I want to emphasize, and I don't want to try to sell it too much, is that it does appear as though the net price, this past-shock measure, the memory of past natural gas price shocks, does seem to have a marginal impact on industrial production. Now, here I'm focusing on only four of the impulse responses from our model. What in effect we are doing with these impulse responses is asking the question: if there is a shock this month to a particular variable, let's say the price level, how does that effect trace itself out on the future values of our variables of interest? What we are doing here is taking the cumulative effect of the shocks. We're not looking at the marginal effects at month 7 or anything like that; we are looking at the cumulative effects. What I've done here, and I can show you more if there's time, is to focus on the industrial consumption and the industrial production measures, because that's the thing that Dave in STIFS is trying to get at with this particular experimental study. What we find here is that the memories of big natural gas price shocks don't seem to influence consumption. What I'm showing here, the blue line is that impulse response and the red line represents a confidence interval around it; however, what we do see is that a positive shock to the price does have a negative effect on future consumption. So that's consistent with at least some economic theory. Now, down here I'm looking at the impacts of price level shocks and this past memory shock on industrial production. The evidence is not overwhelming, but it certainly suggests that the memory of past big natural gas prices has an impact on industrial production. Similarly, for the price level we are seeing this negative impact. So we do seem to find some relationship between industrial natural gas prices in various forms, consumption of natural gas by the industrial sector, and industrial sector activity as a whole. Now, here we just did some fairly simple forecast simulations. We've done others, but I just want to try to give you the essence of it here. We're trying to see how well our model fits or tracks the turbulent period that Dave showed you, starting really in 2000 when the price of industrial natural gas went up to, I think, $8.70 per MCF. I think it was in January or so. We are going to start our simulation before then. We are going to fit the model from 1989 up through December of '97 and then we're going to have the model try to fit the rest of the period. The model is going to be dynamic for the endogenous variables. We are just going to use the actual values for the predetermined and exogenous variables. So this is going to bias our results somewhat.
Well, it certainly will bias our results, but I think it still gives us the essence of the story that we are trying to understand, and in particular I want to mention that among these variables we have this storage level relative to consumption, an important measure in terms of short-run dynamics. We also have lagged spot prices in here. Here is where we fit this model. I don't know how clear this is to you. The green line shows the actual movement of industrial natural gas prices and our other variables, and the blue line shows our forecasted values. When we first did this I don't know who fell out of their chair first, Dave or myself. We saw it seemed to track the natural gas prices up here in the northwest quadrant. We were amazed. Now, we think that's primarily being driven by a lot of the predetermined variables keeping us so close. Over here we have natural gas consumption. One of the nice things, at least, that I would like to point out here is that we had natural gas consumption up in this range and then it seems to track the decline. So that was a good feeling. For industrial production we see a slowdown but not as great as was actually observed. It maxes out at about 5 to 6 percent above what was actually observed in some months; in others it's only about one or two percent. I really wouldn't want to talk much about this graph. Here's where we use the fitted values through the full sample, so here we are giving ourselves the best possible world to deal with. Now I'm going to show you what happens when we only fit the model through December 1997. The results look worse, but they continue to give you, I think, this consistent story about natural gas consumption and industrial activity in the wake of price shocks and so forth. How clear is this? Do you have pictures where you can see some of this? Up here I have the industrial production index. The blue line shows what our forecast is. The green line shows what the actual level is. The dashed red lines represent the two-standard-error confidence interval around that. What we see is that the model seems to be predicting that industrial production is going to continue to rise. Now, this is fit through December of '97. The decline in industrial activity, we know, was not all due to increases in natural gas prices; there were a few other events that affected industrial production. However, it seems like it's okay and we stay within the bound up until the big industrial natural gas price shock, and then we begin to lose it. Over here is the simulation for the industrial natural gas price, and here is that spike in December 2001. The model does well over those first two and a half years, and after that it tries to bring the price back down, but it doesn't come down quite as far as it actually did, and then we had the second set of price shocks in 2003. So we are going to have a rough time there. Here's the other interesting part. Here's the industrial consumption. In this one the blue line is showing us what the simulated value is and the green line shows what the actual value is. We stay within that confidence interval certainly through 2000 and into 2001, but after about 2001, in 2002 and 2003, we're outside. We are below, under-predicting actual consumption. I think I can relate that to one thing in particular, and that is this part here where the natural gas price is being over-predicted.
If you recall from the impulse responses we saw that it's the level of the prices that seemed to have a big impact on industrial natural gas consumption. So in summary what we have done is we have just presented here some tests of an experimental model of this interaction between natural gas prices, natural gas consumption in the industrial sector, and economic activity. We have done this in a VAR framework. We have tried to use this net price or memory effect that seems to give us some oomph to the model much like it does in some of the oil price models. The impulse response functions suggest natural gas price shocks have negative impacts on the industrial sector. We have done the simulation exercises suggesting the VAR model can reasonably approximate the data-generating process. I'm not going to define "reasonable" because I don't know what that is at this point. We have done a lot more work and, again, if there's time we can show it to interested panel members. We have a set of questions for the committee and certainly chime in with your own. First off, do you think the VAR approach makes sense? Are there other variables that we should have included that you think we forgot? What do you think the strengths and the weaknesses of the approach are? How can we modify it? To the extent that you think this approach has merit, are there other sectors where you would like to see this done? Do you have other ideas for this net price measure? There are variations on this that people have tried and we've done this one. What do you think about using these constructed variables for the Big-6 in the industrial sector? When we constructed this Big-6 measure we took the industrial production measures from the Federal Reserve Board, and they publish tables of value added for 2000. So we used a constant weight in constructing our industrial production Big-6 measure based on those value added weights. Another way it might be done is using the industrial surveys from 1998 and using some kind of a consumption-weighted approach. I think I'll stop here. If there are any questions I'd appreciate hearing from you. MS. SEREIKA: I'm going to take a cue from Mark Bernstein from the last session to be able to do this more effectively. You have a lot of questions. To be able to give you a good answer there's one basic piece of information we need as panelists, which is: you picked a particular approach, the VAR approach. You talked about at least a couple of others. You mentioned a few others in the beginning. Our basic question is why did you as the author of the paper think that VAR was the best approach? That will be our platform to answer the question. The same thing goes for a lot of -- for example, the net price variables. What are the other kinds of net price variables being considered in the -- and why did you choose this particular one? I think if we could get some answers to that we could probably give you a better reaction. MR. JOUTZ: Let me start with the VAR idea here. I think there are perhaps two main reasons why this seemed attractive to us. There could be others but, first off, what we are trying to do is look at the possible simultaneous relationship between the variables of interest. By simultaneous I don't just mean contemporaneous. I'm also talking about the interaction and the dynamics of the variables.
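The exclusion tests reported earlier in the presentation, whether a variable helps to explain future values of another, are Granger-causality tests in a VAR framework; a minimal sketch under the same assumed data layout follows, with the file and variable names purely illustrative.

```python
# Hypothetical Granger-causality (exclusion) tests on a fitted VAR; the data file
# and column names are assumptions carried over from the earlier sketch.
import pandas as pd
from statsmodels.tsa.api import VAR

df = pd.read_csv("industrial_gas_monthly.csv", index_col="date", parse_dates=True)
res = VAR(df).fit(maxlags=12, ic="aic")

# Does the net price series help to explain future industrial gas prices?
print(res.test_causality("ind_gas_price", "net_price", kind="f").summary())

# Joint test: do the price level, the net price, and industrial production
# together help to explain future industrial consumption?
print(res.test_causality("ind_gas_consumption",
                         ["ind_gas_price", "net_price", "ind_production"],
                         kind="f").summary())
```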
So in that respect I think that begins to get at the difficulty that Dave faced with the STIFS, where he then has to go to essentially another room or another model where they run a macro model and then they walk back and give him a number and say okay, can you try this, and how many times can you do this sort of iteration, whereas here we are trying to actually do it all at the same time. So that gets us into the dynamics of it in a crude form. Another way to look at this that has appeal for other frameworks is that here we are really going at -- I don't want to necessarily call it truly atheoretical, but there's not a whole lot of theory or structure in the model. It turns out that we actually can reduce this VAR down some more. We can boil it down some more and look at the idea of co-integration, which suggests that there is a long-run relationship between the variables, and in fact we do find evidence of this, and we can look at the interaction between the short run and the long run simultaneously. This would be a way further down the pike toward bridging the gap between the STIFS and the NEMS models. Let's see. Is there another reason why we wanted to use the VAR? I think it's primarily to give Dave a tool so that he can work not only with the STIFS but also with real economic activity in the model. The second question dealt with the choice of our net price measure. This is the first time that I had seen someone use a net price measure for natural gas. There may be others but I'm not aware of them. In the oil price literature where this was initially tried -- well, you can always pick one person and someone will say well, there was also somebody in 1970 who did something like this. It's primarily driven by recent work by Hamilton in which he used these net oil price measures, and he tried various lags, going back as far as you would want to go. Other people have just said okay, what's the difference between the maximum we have ever experienced and the price today, whereas in this case we are saying well, we forget things after a certain point in time. You can argue and debate about that, or maybe you want to discount those longer-run values here. We just say that at point X it has depreciated to zero. MS. KHANNA: I'm curious. Have you tried the other measures? MR. JOUTZ: Yes, we actually went back 36 months and it doesn't seem to have much effect. I mean, I was surprised because when I looked at the 12-month ones, I think we plotted that out in the slides here, it looks really like it's almost a seasonal variable right here. It's not quite seasonal. The green line is the one that we actually use. That's our maximum values. Excuse me. The net is the one that we use. The green line is our maximum value. So in a period like this we are right at whatever the previous maximum value was and then we are going to newer heights. This looked fairly seasonal when I first saw it and so I said well, let's go back 36 months. It smooths this out some but not by much. Another measure that's been used for net oil prices looks not just at the first moment of the distribution but at the second moment. So one would think about this in terms of a measure of variance or conditional variance that fluctuates over time. So in periods where there's a tremendous amount of uncertainty you could have prices be fairly stable, but then you get a small shock and it's still enough to get people worried about what's going to happen, and people have tried that measure.
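A Hamilton-style net price of the kind just described reduces to a simple rolling calculation; in the sketch below the series name and the 12- and 36-month windows are assumptions.

```python
# Illustrative net price measure: the increase of the current price over its maximum
# during the preceding 12 (or 36) months, floored at zero.  Series name is hypothetical.
import pandas as pd

price = pd.read_csv("industrial_gas_monthly.csv", index_col="date",
                    parse_dates=True)["ind_gas_price"]

prior_max_12 = price.shift(1).rolling(window=12).max()    # max over the previous 12 months
net_price_12 = (price - prior_max_12).clip(lower=0.0)     # zero unless a new peak is reached

# The 36-month memory mentioned above only changes the window length:
net_price_36 = (price - price.shift(1).rolling(window=36).max()).clip(lower=0.0)
```

Beyond the chosen window the past is simply forgotten, which is the point-X device described above; a discounted or second-moment variant would replace the rolling maximum with a weighted or variance-based summary.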
Alternatively you could have periods of very high prices and you could have the same size shock but people say who cares. So that's one other measure that's been used. People have also tried some asymmetry sorts of tests looking at deviations from a maximum, deviations from the floor, and so forth. I didn't do the floor in this case. DR. NEERCHAL: One specific question is about the net price variable. It seems to me if you take the max over one year or two years against the current price, then every once in a while there's going to be a new maximum, and it's going to have the effect of an intervention type of thing. MR. JOUTZ: You could think of it like that. DR. NEERCHAL: When you are modeling you already had an intervention term, so I'm wondering if that might explain why the intervention is not significant but -- and I'm wondering if it could be interpreted that way. The other thing I wondered about, I'm curious, if you go back to, I don't have the slide in my hand-out here, the slide comparing actual versus predicted. MR. JOUTZ: Which of the simulations? DR. NEERCHAL: The next one. I'm looking at this with my bare eyes and I'm saying that you are not doing that well. But on the other hand you concluded, and your conclusions seemed to indicate you felt you were doing pretty well, and I was just wondering am I missing something there. MR. JOUTZ: No, I'm not trying to oversell it but I don't want to undersell it at the same time. The reason that I think it's important to show here is that really up until, say, 1999 the gas price market, and Dave can certainly talk more on this than I can, had been fairly stable through the '90s. There were a few bumps along the way but it had been fairly stable. Then the structure of the market seemed to change. Then we had these huge shocks. We had prices in the $2 range per Mcf and then suddenly they are up at 4 and 5 and then 8 and then back down to 3 and 4 and 5 and back up to 6 and 7. So any linear model is going to have an impossible time trying to capture that. What made me feel reasonably decent about the outcome from this set of graphs was that the directions of change in the variables were consistent with what we might call economic theory: we fit the model through a certain period where we have a set of economic fundamentals that are working in this market, and then we subject this market to some really huge shocks. I mean, I should have brought in a plot of the real price of natural gas. Those prices that we saw in 2000 and 2003 are unprecedented. So in that respect I think that the directions of change that we saw here, particularly in the consumption measure and to a certain extent here in this price measure, don't necessarily start in 2001 right when we think they should. Before then it does reasonably well. Then again, it's the direction of change that seems to work. DR. NEERCHAL: So your model reflects the theory well but the actual prices are different? MR. JOUTZ: Right, because, remember, there was no experience in this post-2000 period. This is the problem that the STIFS model and Dave and his associates have been facing: how do we deal with this kind of an issue. They are asking questions. Are there these relationships between the real sector and the energy sector and how can we begin to try to understand these? DR. HENGARTNER: So the first question then is: is an answer like this acceptable? I think maybe you should try to do something that will get the green line back into your -- maybe that's -- MR. JOUTZ: This is through December 1997.
Really, for the next two to almost three years the model is able to capture the movement in the series. But once you get these -- the price measure is going up on the order of four. MR. BERNSTEIN: But the real question is if one believes the future is going to look more like it has over the past couple of years then what the model shows from '97 to 2000 is irrelevant. I mean, if the future is going to look more like the recent past then you need to do something to capture that. That's the real issue. If it returns back to the way it was before and this is just a short-term blip, fine. Everything will work out nicely. DR. BURTON: I agree with you completely. I think in part it depends on whether users learn how to anticipate and deal with this much more volatile set of market conditions. MS. KHANNA: Well, actually as an economist I take a slightly different -- I mean, look at the data. I buy your point, actually, very well because if you look at the data through '97 you've never seen anything like what you see right here. MR. BERNSTEIN: I don't disagree with that at all. MS. KHANNA: So I don't think it's possible for any model to actually get a very good forecast for -- 2000 onward. I think the real test of the model is going to be from now through the next couple of years. If we see that kind of volatility over the next couple of years, will this model then be able to capture it? Because then we'll have the memory of the volatility and then we'll be able to capture it; right now in memory there is no real volatility at all. MR. BERNSTEIN: One of the things you really need to do is put in the data for 2003 and then go out and pretend you know the future and -- MS. KHANNA: Yes, that, I think, would be a really good -- MR. BERNSTEIN: I mean, from this point we are not here to tell you whether this thing is going to perform well. That wasn't what I was actually going to ask -- one of the things missing in the paper you gave us is the references. I would appreciate -- I actually read the paper but I was really looking for the references. MR. JOUTZ: That's my omission. MR. BERNSTEIN: It had seemed to me that you had learned from some of the oil price literature. MR. JOUTZ: Yes. MR. BERNSTEIN: If you think about it one of the things that might impact decision making and -- futures prices -- I know that's hard to try to capture but I wonder if you thought about it in the literature -- MR. JOUTZ: I'm going to push that one back on Dave. I think I've asked him about futures prices in the past. My recollection is they are not necessarily the greatest things in the world, either, although as an economist I would think that they should work reasonably well. MR. BERNSTEIN: Well, I think the Big-6 energy-intensive industries -- MR. JOUTZ: They are hedging, you bet. MR. BERNSTEIN: They don't just rely on spot prices for natural gas. MR. JOUTZ: That's right. MR. BERNSTEIN: Particularly the changes over the past few years may be due to their expectations of where the price is going -- also the relationship between where they think the price will be and then as you get there the difference between -- futures price is actually -- MR. JOUTZ: Another important point here that Dave has mentioned in my talks with him is that you are also getting substitution away from natural gas during these really high-priced periods, to the extent that where the firms or the factories can switch fuels they are going to do so.
I think both of you had a comment about where we go from here with the model or models that we're experimenting with. One of the things that I would say is that I have tested the model itself through the full sample period and through recursive estimation procedures, where I stop in 1994 and fit the model, add a month, re-estimate the model, and then refit it all the way through the full sample. If I do that kind of an exercise, what is very clear, and which has surprised me, is that for industrial consumption and industrial output this particular model is stable. If you want to think about this in terms of recursive Chow tests, where you are doing a whole sequence of these tests off into the future, the models are stable. The problem is the natural gas price. I mean, if you are going along and you have again this notion of a $2 to $3 per Mcf price and then all of a sudden it goes up to 8, no model is going to capture that. The question is how you can modify the model now that it has been observed, as you have suggested. That's the next stage of trying to work with these models. Again, this is limiting in some respects but it's suggesting that Dave at least has an experimental tool that shows him this interaction between natural gas prices and real activity. It seems like it's very little but I was surprised we got as far as we did with it. MR. BERNSTEIN: No, and I think the approach is actually quite interesting. You have some interesting stuff. The question is just how you use it now and for the future. The other question you asked is whether it's appropriate to look at the Big-6. MR. JOUTZ: Yes. MR. BERNSTEIN: I haven't looked at the data recently. What's the gap between the top six and everybody else? MR. JOUTZ: They have a little over 80 percent of total natural gas consumption in the industrial sector. DR. BURTON: I'm a little confused over what the Big-6 index is intended to reflect. MR. JOUTZ: Let me see if I can find it. I think I put on here somewhere who they are. DR. BURTON: I know which industries but, I mean, what is this index intended to capture? MR. JOUTZ: The idea here is that we have six industrial sectors which are the most natural gas-intensive consumers. So they are going to be focused most on movements in price, whether they are using this as a feedstock or whether they are using it to provide heat, what have you, or some kind of power. They are going to respond the most. If we were to put in other industries -- let's say their natural gas consumption was zero -- and we had their activity fluctuating, it could be fluctuating and it could be correlated with natural gas prices, but it would just be a spurious relationship. DR. BURTON: Right, so this index is the principal output measure? Is that it? MR. JOUTZ: It's the Federal Reserve Board's industrial production index. DR. BURTON: But you're using it as an aggregated measure of output? MR. JOUTZ: That's right. DR. BURTON: And when you aggregate it you have to measure output as measured in dollars, right? MR. JOUTZ: Well, I was the one who selected doing this in terms of value added. DR. BURTON: What I'm curious about, and I know the text addresses it and I read it and it was still unclear. MR. JOUTZ: It's not complete. DR. BURTON: That makes me feel better. MR. JOUTZ: There were lots of spreadsheets involved in that part of it. DR.
BURTON: When you construct an output measure that's based on a monetary aggregation and you observe a change in movement in that output measure, say an increase, how can you know that that increase is an actual increase in physical outputs, or a decrease in physical outputs, versus simply a reflection of the change in input price that's working through the final product price? Do you see? MR. JOUTZ: Let me try to think about this. DR. BURTON: That's a hard question even to ask. MR. JOUTZ: No, no, I mean, aggregation issues are always really thorny. I want to try to be careful with what I say. The dollar measure that we used is only for 2000 and it's a value added measure. This is what the Federal Reserve Board used and we hold that constant weight over the full sample. DR. BURTON: That makes it so that you are not going to have input-related price fluctuations in this measure? MR. JOUTZ: No, right, but there's another really good point here that I don't think I made clear in the write-up, which is that some of the industrial production measures that the Federal Reserve is able to produce actually are constructed from energy consumption itself, whereas it turned out fortunately for these six industries that we have put out here that the output actually is a physical quantity. I think in one case about 8 percent of it is the number of hours employed. DR. BURTON: And you're multiplying by the value added in order to perform the aggregation? MR. JOUTZ: That's correct. DR. BURTON: All right, I'm good to go. DR. HENGARTNER: Since we cannot really see the future I was wondering whether we can introduce an artificial shock early in your series and apply your model to see how that tracked, because I think your first goal was to really figure out how long the effect of the shock can last, for example. It seems to me that if you really want to answer it, one way is to apply your model with an artificial shock right in the beginning of the series, or time-reverse it, because VAR is essentially a stationary animal so it works -- because you have a shock right at the end of the period and see it coming -- for example, and I was wondering if that could be of any use. MR. JOUTZ: If we had seen that I wouldn't be here today. That's really clear. I'm not quite sure how to implement just introducing a shock in the middle of the sample. I don't know quite how to do that from a statistical framework. I do think we get partially at the question you have by looking at the impulse responses from the implied VAR model, because that's saying if there is a shock to the price level at time T you can then generate what the impact of that shock would be on future values of the level at T plus H or industrial production at T plus H and so on and so forth. What we did with the impulse responses was we accumulated them because we wanted to see how the path of the series worked, not just those individual components. Sometimes these impulse responses are just completely opaque and here they seem to be somewhat consistent with our results. I only showed you 4 out of the 12 possible diagrams. There were some other consistent impulse responses there, in particular like if you see shocks to industrial production do you see future increases in industrial consumption. So it grows on a steady path in that respect. I think there was some limited effect of a shock to industrial production having a short-run impact on natural gas price levels.
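The accumulated impulse responses being described can be produced directly from a fitted VAR; in the sketch below the data file, the variable names, and the 24-month horizon are assumptions.

```python
# Cumulative (accumulated) impulse responses from a fitted VAR, as a sketch;
# the data file, column names, and horizon are assumptions.
import pandas as pd
from statsmodels.tsa.api import VAR

df = pd.read_csv("industrial_gas_monthly.csv", index_col="date", parse_dates=True)
res = VAR(df).fit(maxlags=12, ic="aic")

irf = res.irf(periods=24)
cum = irf.cum_effects        # array of shape (periods + 1, n_vars, n_vars)

# Cumulative response of industrial production to a one-time shock in the price
# equation, traced month by month (row = responding variable, column = shocked one).
i = df.columns.get_loc("ind_production")
j = df.columns.get_loc("ind_gas_price")
print(cum[:, i, j])

irf.plot_cum_effects(orth=True)   # all accumulated responses, with error bands
```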
I wouldn't want to think that that would have a permanent effect, but it wouldn't surprise me necessarily that if there was a fairly large increase in industrial production it could have an impact for a month or two or even three on the price level. This is just one sector of natural gas consumption. MS. KHANNA: The way I feel, I just think that Mark had the same comment, which is about testing your model; about the model after 2003 you said introduce an artificial shock and then see how the model performs. Intuitively that's a very appealing suggestion to me. Would it be possible for you to generate artificially -- variation and then introduce that kind of giant shock? It lets me move the window back -- just to see the predictive power of your model, because if you are going to use this model to see what's going to happen in the future and if you do expect these higher volatilities -- having a model that -- well when the market is stable is not going to be very useful. You really want to actually subject your model to all kinds of maybe even bizarre-looking pricing just to see how it really behaves. If you find you get reasonable results even when you have artificial price shocks or things that you haven't -- that would be your starting point and that would be the real test of your model. Otherwise, looking at a stable market, if that's not what you expect, you could have a lot of models if you do that -- I'm saying you see the supply -- if you take that supply and just move it back to '86 artificially in your data and then let the price -- and maybe have another spike, and if the model is actually able -- you don't know what the industrial output is going to be because -- simulate something like that to see if your model would be able to capture that. It would be an artificial exercise but I think -- MR. JOUTZ: So I would just put a shock in there and then just see what happens to the model. My first thought would be I would get the exact same diagrams. The extra problem would be that my confidence intervals would blow up because the information that I had been able to get in estimating up to that point where the shock occurred would not be as much as I had previously. So I have a feeling that the model is going to do pretty much the same thing given that the model for consumption and industrial production is fairly stable. Let me try offering another kind of suggestion here. If we believe that, let's say, '99 and 2000 and beyond is this new regime for natural gas, there are perhaps two things that strike us. The first is the level shift component. That's one thing that I think we can try to do with these models fairly easily: incorporate a level shift impact for natural gas prices for that period and see how well it's able to do the prediction. The second component comes to this relative volatility measure, and here the VAR as it's currently specified isn't going to capture that. What would begin to capture it is perhaps looking at one of these models where we think about conditional volatility. You can do it from a stochastic framework and an estimation through a GARCH procedure, or you could do this through some sort of ad hoc approach where I'm going to just say some standard deviation of the past 12 months or 24 months or last 3 months, some weighted average of that. Then you go to the question of how well are we able to predict in this much more volatile period.
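The ad hoc version of the conditional-volatility idea, some standard deviation of the past 12 or 24 months, is itself a one-line rolling calculation; the sketch below assumes the same hypothetical series, and a GARCH specification (for example through the separate arch package) would be the stochastic alternative mentioned.

```python
# Ad hoc conditional-volatility measures: rolling standard deviations of past monthly
# price changes.  The series name and window lengths are assumptions.
import pandas as pd

price = pd.read_csv("industrial_gas_monthly.csv", index_col="date",
                    parse_dates=True)["ind_gas_price"]

changes = price.diff()
vol_12 = changes.rolling(window=12).std()   # volatility over the past 12 months
vol_24 = changes.rolling(window=24).std()   # volatility over the past 24 months

# Either series could enter the model as an extra regressor so that predictions can
# behave differently in the high-volatility regime discussed here.
```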
I would feel a little bit more comfortable perhaps trying something like that rather than going back to '94 or '96 and putting in the shock. DR. NEERCHAL: I just wanted to comment on that. Earlier on you were discussing why VAR. I think the way you can write it down like an equation and explain it is one of the attractive features of VAR: you can explain to people, okay, what relates to what from -- and so on. I think if you get more complicated with GARCH and stuff like that you are going to lose the transparency of some of these things. I somehow find introducing the shock and studying it interesting. That way you can actually say something about how long the shock lasts. I think that's the key question. Especially if it's happening only once in a while, should we just throw away all the data before the shock for your future forecasting purposes, or how much does it affect things? Those are the real questions, I think. I think that would be interesting to see -- MR. JOUTZ: Could I ask you for references in terms of introducing these artificial shocks earlier in the sample? DR. NEERCHAL: Sure. MR. JOUTZ: To see how people have actually tried to implement that? DR. NEERCHAL: Like a simulation expansion, sure, I can -- MR. JOUTZ: I'd like to know what my benchmark would be in terms of evaluating the performance of the model and so forth in that kind of an environment. One of the things that is difficult, I think, in an exercise like this is that in some ways we would like to be able to explain all of these events with our model. We're lucky because we have the full information. When it comes to Dave, the last week of every month he's got to say okay, I know what's going to happen in the last three months of 2005 and that's going to have this impact on prices and consumption. So I came in with limited expectations in trying to think about how we can have this model that gives us a little bit more in terms of the projection system but also gives us the link to the macro economy that he felt was missing in his current set of tools. He gets these numbers and then he says does this make sense, and he discusses with his staff does this make sense, and so forth. So then they are going to bring in their add factors where their expertise contributes to the model. So we would like the empirical part to actually capture as much of this as it can and then give us the direction, and then he's going to fine-tune it from there. Dave, do you want to add to this? MR. COSTELLO: Just that I think for a couple of reasons it's worth going a little further with it, because the benefit that it gives us is a method for, say, doing a lot of different scenarios or some stochastic simulations that essentially explore possibilities over sometimes extreme ranges, without having to be pretty much unrealistic about some of the underlying variables like industrial output. This was a test with gas because gas was so interesting with respect to that particular angle, but it has a broader importance for us. So then if we do stochastic runs as a regular sort of thing, which is what we want to do, we can have incorporated in them this feedback component from the macro side that helps make it complete. As it is, if we did a stochastic run essentially it would fix the macro inputs, which doesn't make sense. They have to be made stochastic somehow and this seemed to be the most sensible try. MR.
BERNSTEIN: I just wanted to add that there's something you can't deal with in this kind of stuff, and fundamentally you are going to have to try to answer, as you go forward in time, how this change in natural gas prices is going to change decision making among the gas users. You can't answer that in these types of models, but it may be something where you try to get an understanding of how companies are going to change the way they are going to make decisions, particularly the Big-6 guys. That may then help figure out how you have to incorporate that. MR. JOUTZ: Yes, and the model does give us those directions. We see the response to those price increases. We see natural gas consumption by that sector fall. The current STIFS doesn't actually enable you to see that. So in that sense I think that's one reason why Dave wanted to try to see what the model would buy him. MS. KHANNA: Anything else? MS. NORMAN: In regard to the possible simulation and the continuous updating for estimating and comparing against the actual data that you have, how far out are you trying to estimate, or did I just miss it somewhere in the context? MR. JOUTZ: The horizon that Dave was concerned with was typically 24 months, a maximum, I think, of 24 months. MS. NORMAN: I was just thinking about what you were talking about earlier, where if you were to update it and do a month-by-month snapshot and then do a comparison to see how well it can project out, at least just for a year, nothing much further than that. MR. JOUTZ: In the slides that we presented here, let's try to show you some possible extremes and see how it goes from there. MS. NORMAN: Just curious. MR. JOUTZ: That's certainly what we want to do. MR. BERNSTEIN: Thanks. MR. JOUTZ: Thank you. MR. BREIDT: So let's finish off the last bit of the morning here. We'll have two ASA discussants summarize the discussions in the two breakout sessions. Before we do that I want to introduce Susan Sereika, who is the new committee member who arrived mid-morning. I forgot to do that earlier. I apologize. First we'll have Neha report on natural gas prices and industrial sector responses. MS. KHANNA: Thank you. We had a very, very lively debate in our breakout session, I should say, one of the best I think I've seen in the couple of years that I have been around. The main question that the committee addressed was whether the particular model chosen for this paper was the right one. The discussion took the form of, well, the model performs really well in the first few years of the historical series because it was a particularly stable time and there were no major structural changes in the series, and so the model does a remarkably good job of predicting that. The problem arose when the nature of the series itself seemed to have changed. We have gotten not a new level of prices but increased volatility in those prices. What the committee members picked up on was that the actual series started to fall outside of even the confidence bands and there was some concern about that. So the earlier discussion centered around what do you do about that and is the model sufficient for this. I think the primary suggestion that came out was to try and either fit the model maybe not through 1997 and then predict, but maybe through 2003, so you start seeing some of that volatility right in your sample period when you are doing your out-of-sample forecasts, or to try to subject the series to some kind of artificial shock. There was some debate about how that could actually be done.
But I think the primary thing that the committee was saying was if you are going into a new natural gas regime then we need to test the model under the new regime conditions to see if it really has the kind of predictive power that we want. The good news was that the model did give the right direction of change, and that's typically your first measure of are we on the right track. It looks like the model is capturing that, but I think that the committee would like to see some more tests of the model against the kind of regime that we are moving into. I think that was the biggest recommendation that came out. There were other smaller recommendations, like some question about exchange of references. Both sides wanted some more references. Possibly using historical futures prices in the market to see if that had any kind of explanatory power. There was some question about the right net price variable, and the authors gave us a good response as to why they picked the particular one that they chose, but it may be nice to see how other ones perform, if there's really any difference or not. I think that was the primary recommendation that came out. If anyone else who was there from the committee wants to add to this I would like your help. Then I think that's all that we have to say. Thank you. MR. BREIDT: Next Moshe will summarize the results of the other breakout. MR. FEDER: I'm supposed to talk here about problems of coverage of the frame, so I must say that my summary is going to be a coverage problem, too. I'll cover what I've been able to take from that. The question was regarding defects in the frames used by EIA and their coverage, and how you can measure it. There are two main ways to do it. One way is to compare it to census frames, which are usually much more comprehensive, and the other is to look at some tabulated data such as the one I'm holding here, where you compare consumption with disposition and you see a discrepancy which is accounted for by a balancing item. There was some discussion about why there could be a discrepancy, and that could be for a variety of reasons, different definitions and so on. Frame defects are only one of them. With regard to matching to the census, it was mentioned that the EIA has a threshold of one megawatt of capacity below which a plant is not included in the frame. There are linkage issues. There are certain limitations because of disclosure issues that don't allow matching. There are different definitions of establishments between the two organizations, mainly because of the level of aggregation in state-level data. There was another important point raised by one of the committee members. There could be some establishments missing from both, in which case you will never know. Then we talked about the issue of how to compute the coverage rate. There it really depends on what your measure of coverage is. Is it by volume or is it by the number of missing establishments? There could be some missing establishments that are also very small, whose contribution to the national aggregate is not that high. But on the other hand there could be some interest, because of policy issues, in small establishments. So we really need to define the coverage issue. Some remedies to that were suggested. A committee member suggested an adaptive sampling approach, which was proposed in two ways. One way to do this was by asking known establishments, plants and so on, who your competitors are.
Tell us who else is out there in the market, and try to see if we actually have the information. Another idea proposed by this committee member was to ask them about their suppliers and customers and see if those are missing. So this, what we referred to as both adaptive sampling and network sampling, is to identify missing units not only through the census but also through those that we know about. Then there was another question raised. With the discrepancy we see in the tabulated data such as here, what is large and what is not large? How do you measure it? In addition, a question that was raised from the audience is why do you really care about bias? Maybe bias is not so bad if it's constant over time, because we are more interested in trend and seeing changes in the trend. That led to a suggestion that one should look not only at the size of the balancing item but at trends in it, because it fluctuates: sometimes it's positive, sometimes it's negative. Maybe it's just because the attribution to a given year is not the same across the different reporting levels and definitions. Finally there was a suggestion made by another committee member to treat the EIA frame as a sample from the bigger population, because we know some people are not there, and treat it as a nonresponse problem and calculate propensity scores for units being in the frame based on some census data, by post-stratification of the sample and using some known techniques for nonresponse. MR. BREIDT: I now need to invite public questions and comments. Hearing none, let's then head out to lunch. (Whereupon, at 12:26 p.m., a luncheon recess was taken.) A F T E R N O O N S E S S I O N (1:21 p.m.) MR. BREIDT: I think we are ready to get started. Our first presentation will be on the Electricity Transmission Data Needs, focus group results. MR. BRADSHER-FREDERICK: Thank you. I do want to thank my co-author, Phillip Tseng, for helping out on this project. Both he and I are with the Statistics and Methods Group of EIA. First, the outline of the presentation. I'm going to talk about the introduction and the need for this work, the participants, and the methodology. We'll talk about some of the issues that were identified, the data needs, and the potential data sources that were mentioned by the focus group participants, and finally the focus group recommendations and observations and then a couple of questions for the committee. As for the impetus for needing the input from stakeholders, the electric power division of EIA was interested in collecting transmission data, so this provided a good opportunity for getting input from stakeholders. We all know that the electricity industry is in a period of transition. The information collected through the focus groups is being used as part of the input for both short-term and long-term planning for future EIA data collections, and the presentation following mine will be by Bob Schnapp and he'll talk more about that input. This work was also recommended by Doug Hale in his research work. Incidentally, Doug Hale will be available at the end of this session, after Bob Schnapp's talk, to answer some questions about his work. We conducted four focus groups from November 2003 through January 2004. We began with the focus group of EIA staff. We like to do this both as a pretest of the protocol and because we always get good data from these first sessions, so we are able to use that data in addition to the other data that we get.
We had participation from six of the offices that are involved with electricity transmission one way or another. The second focus group was with the DOE staff. We had participation from three different offices, the Office of Electricity Transmission and Distribution, the Office of Energy Efficiency and Renewable Energy, and the Office of Policy and International Affairs. We had some high-level people for these groups. In fact we had three office directors. The third group was other federal organizations. We had representation here from the Congressional Budget Office, the Federal Energy Regulatory Commission, and the Department of Justice, with about three participants from each of those organizations. In the fourth focus group we had nonfederal organizations. In this we had a pretty wide range of people. The organizations were Resources for the Future, the Edison Electric Institute, the National Rural Electric Cooperative Association, the National Association of State Energy Officials, the American Public Power Association, the Electric Power Research Institute, the Electric Power Supply Association, and we also had participation from the Department of Agriculture, the Rural Utilities Service. They were a little bit of an anomaly here but sometimes scheduling can be a problem. You will notice that the first three sessions all had data users, and for the most part so did the fourth session, except we did have participation from the Edison Electric Institute. They are advocates for the data providers so we had a little bit different representation there. Also we conducted some telephone interviews, which I'll go into, but first I'll mention we did have 9 to 12 participants in each of the groups. This is a nice number of participants for focus groups, not too few, not too many people. Also the moderator has extensive experience with moderating focus groups on energy issues. As for the participants in the telephone interviews, we interviewed congressional staff. We had three staffers. Two were from the House and one was from the Senate. They were involved with some of the energy committees. We also had interviews with an energy consultant, and a public utility commission in the Midwest had three persons who provided written answers to us. One of the first questions we asked of the participants was about emerging issues in the electricity transmission area. This is both a good icebreaker and it gets them thinking about the future. Some of the issues that were mentioned, and in fact I think all of these were mentioned more than once, were determining the location of bottlenecks, the availability of the grid to generators, and disturbances on the grid. These were all going to be things that EIA should be concerned about in the future, along with reliability measures, the evolution of regional transmission organizations, and the probable need to develop models with dynamic data requirements. Also you are going to see a lot of changes in ownership and EIA should be aware of that. Some more issues involved the need for market transparency, the data collection problems due to joint ownership, owned versus controlled facilities and how to go about surveying those groups, the need for dynamic ratings as opposed to static ratings, and the need for line maintenance data. In terms of the data needs which were mentioned, we grouped these into four classifications. I'm not going to read through all of them. There's a very long list but it's on page 2 of the handout that I distributed.
I'll go into a few of them that were some of the most frequently mentioned and involved the most discussion in some of the groups. The first one is the economics of the grid: market power was one of the things frequently mentioned. The Department of Justice people in particular were interested in this. In terms of policy development, reliability was mentioned over and over again. People were concerned about reliability, how to measure it, how you go about trying to measure those variables, and so forth, and also access to transmission and the issues surrounding that. In terms of physical planning needs, the costs of investment in new lines were important. The transmission flows within the states and the implications of transmission on capacity expansion decisions and costs were considered important. Transmission line outages, including the frequency and duration of the outages, and the number of occurrences of transmission line denials were also of interest. In terms of the potential data sources, those listed up here were mentioned: the ISOs, the RTOs, the North American Electric Reliability Council, public utility commissions, and FERC Form-1, which is the annual report of major electric utilities, licensees, and others. A futuristic possible data source is the use of telemeters in order to have real-time data collected, along with generator interconnection reports and construction of transmission facilities reports. The focus group recommendations and observations for EIA: these are things which some people thought EIA should be involved with, either doing or thinking about. EIA needs to define its role and goals in providing transmission data. EIA needs to more fully cooperate with FERC. EIA should explain markets with narrative reports. We are sometimes accused of using schematics too often without providing narratives. And for geographical reporting purposes it might be useful to align the transmission data with RTOs, ISOs, or control areas. In terms of the recommendations or observations for others, participants thought it might be nice if the flow and capacity data from EIA, FERC, ISO, RTO, and RUS sources could be processed in some way, an integrated approach, so that the data might be more easily used by some of our customers. They thought that code names of reporting entities should be standardized in some ways for comparison purposes. They also felt there was a need for NERC and EIA data to be combined, again in some ways, so it would be more useful for our customers. But it was also mentioned that we should be concerned about maintaining time series with similar indicators throughout. We shouldn't break our time series if we can avoid it. Questions for the committee: I think there's a general question, and this is one that was discussed at pretty great length in some of the groups. That was what should EIA's role be with respect to electricity transmission data collections, and also any observations or recommendations regarding the focus group work. Phillip Tseng and I could answer any questions that you have, I hope. DR. HENGARTNER: So this focus group happened to be after last summer's incident where the whole grid went down in New York. I was trying to see whether, through the kind of data that people were recommending be collected, reliability, robustness, and the aging of the whole infrastructure, which is indeed what the grid is, or part of the grid, came up as one of the concerns. MR.
BRADSHER-FREDERICK: It could very well be the case that the August 14 incident was on the minds of a lot of people in several of the groups. Certainly, for the congressional staff it seemed to be right at the top of their list of interests, reliability and issues surrounding reliability. DR. HENGARTNER: So do you think your conclusions are biased because of this? MR. BRADSHER-FREDERICK: Well, I'm not sure exactly what conclusions we have. I mean, this is more a matter of collecting data about what our stakeholders have to say rather than really arriving at conclusions about what should be done. DR. HENGARTNER: Well, but the stakeholders will say we want this kind of data to be able to predict future failures. So, I mean, the kind of data that they are suggesting to collect might be in that direction. MR. BRADSHER-FREDERICK: Sure, we should be aware of that, be made aware of this. Are there other questions? MR. BREIDT: What are these futuristic telemeters? MR. BRADSHER-FREDERICK: Actually I'm not that familiar with them. Does Phillip want to say something about them? MR. TSENG: I think in the beginning Howard mentioned that the purpose of doing a focus group actually reflects market structure changes. Because of the market deregulation or restructuring, stakeholders actually need to understand what's happening. Policy makers want to understand reliability issues because when we talk to government officials, that's the key. Whether last year's incident played a role, actually we are not sure, because within DOE at least we heard about reliability issues quite frequently. In terms of market restructuring, moving from a regulated market to a deregulated or restructured market creates a lot of uncertainty in terms of investment and rate of return. Another area that I think makes this work more interesting is that people actually demand this information. So this is the background. It actually plays a role in terms of what drives the activity in collecting input from the focus groups and what EIA can do. I think another issue is resources. There is information available but EIA has to evaluate what's available in terms of our capability of handling things, and actually at the stakeholders meeting we heard people commenting that EIA has to define its role in providing the data, because with this massive amount of information, how to process it, how to collect it, and what kind of burden could be imposed on respondents all have to be taken into consideration. I think that's the general flavor. I don't know if it answers the question. MR. BREIDT: I can ask you offline. MR. BERNSTEIN: This is just a broad policy issue, more for EIA as a whole and for Doug when Doug comes up and gets questions. The problem is going to be EIA getting money to be able to collect the data and do the analysis on the transmission system, which is, I believe, desperately needed in the US. There's nobody else out there with the capability to do that. What you are going to need to do is to make a really strong case as to why this should be EIA's responsibility. Focus group information ought to be able to help you, not necessarily in the form that you have it here, but when you are making your case to Congress and others these are your stakeholders. These are the people who are going to use the data. I don't think this information is telling you anything new, and it's probably not telling Doug anything he didn't already write in his report.
But what it does is give you the ability to add to that report justification for going after the stuff, because the stakeholders believe it's an issue; the stakeholders believe EIA has a role in it, and that's what you need to use this for. It didn't tell you anything you probably didn't know already. The stakeholders probably didn't tell you anything you didn't know, but it gives you the justification for what you need to do. MR. BRADSHER-FREDERICK: Your point is well taken. MS. KHANNA: I'm going to say something that's going to let you know how little I know about the electricity sector, but it seems to me the one thing the EIA could do as far as electricity transmission data collection goes is help to sort out the definitional issues that seem to exist in the system. For example, I think Phillip was saying, you mentioned in one of our breakout sessions, that a 69 KV line in some situations may be a transmission line and in other situations it may be a distribution line. I see someone like the EIA as really being in a position to come up with a sensible way of deciding what it should really be seen as. That from an end user's point of view would be very useful, and I'm sure from a policy point of view as well when you're putting things into NEMS or into any other kind of policy framework. It's bound to be useful. MR. BREIDT: Any other comments? Okay, thanks, Howard. MR. BRADSHER-FREDERICK: Thank you. MR. BREIDT: Our next speaker will be Bob Schnapp from the office of CNEAF. MR. SCHNAPP: Good afternoon. I appreciate your time so that we could share with you what we have been doing with revising our electric power data collection forms. This is more of an informational briefing. This is a briefing that I've given now probably about 40 or 50 times to our stakeholders, people that use our data or supply data to us, so I've condensed that a bit. And then the last part is a little bit more information about just transmission data, what's available and items that we are proposing to add to our forms now. So let me just generally go over what our project is and how we are handling it, and then I'll get into a few more specifics. The first phase is data requirements, where we have to figure out what data we are going to collect. This project is a follow-on to a project that we finished a few years ago called Electricity 2002, where at that point we literally threw away all of our forms, all of our systems, and all of our publications and started from scratch. This project is changing things around the margin. We have already designed our new forms. We have our Internet data collections in place. We have our new dissemination products out there. So now we are figuring out what things we need to change. So with this we have come up with a list of key issues that we needed to address this time through, and we'll have a list to go over with you. We've done a lot of stakeholder coordination and one of the slides will show you who we've met with so far. Then we've evaluated our confidentiality policy to see what elements we should either add to or subtract from our data confidentiality policy. And then there's a new requirement on confidentiality and I'll address that as well. Then with all that we put together our proposal and issued it in a Federal Register notice at the beginning of this month. So any comments from the public are due by June 1st. If any of you are interested you can go out to our website. It's at the top of our electricity website. You can just click on it.
It will show you the Federal Register notice and it will also show you what the proposed forms will look like. In addition to that, another feature it will show you is a redlined version. In other words, it will have strike-throughs of all the words and elements that we are throwing away and it will have red highlighting for all the new elements. So you won't have to sit there and find the old forms and try to figure out what's going on. We tried to make it as convenient for you as possible. The second phase is the forms design. That was part of the data requirements analysis and how we would be presenting it to the respondents. That included any consolidations or moving of elements from one form to another, and the revisions that we would have there. Then our proposal is to go to the Office of Management and Budget by about September of this year so we could have it cleared by November. The third phase is, once we have decided on everything we are going to be changing in the forms, we have to figure out how to change it in the systems that collect and process the data, including our Internet data collection system. All that would have to be in place by January of next year so that we could let all of our respondents know what it is that they need to provide to us. Then in the fourth phase of dissemination, we currently have two reports. I'm not calling them publications any more because we don't publish them any more. We only release them on the Internet. That's the Electric Power Annual and the Electric Power Monthly. We have detailed historical spreadsheets behind them that go back on the monthlies month by month, state by state, fuel by fuel going back to 1990. And on the annual side it also goes back to '90 by sector and by state. In this phase we also want to figure out how to redesign the Internet products since we are not going to be publishing hard copy any more. We want to figure out how we should make it more usable and useful to people when they come to find our data. Then we have consistently put out all of the publicly releasable data in large database sets that are out there now. So all that would have to be available by September 2005 when we would be going out with our annual report. I'm not going to go over any of these. This is a list of some of the folks that we have already met with, some of our stakeholders. Just to go over a couple of the items that we have talked about: on our Internet data collection system, over a third of all of our forms were submitted over the Internet last year. That was the first full year of its operation. We sent out passwords to almost everybody and we got a fairly good response. This year we are trying to double that, to get at least two-thirds of all the respondents to file by the end of the year. As of right now, about 5,000 out of about 38,000 forms that have to be filed have already come in over the Internet. We have sent out all of our requests. Where we have an e-mail address we have sent out our requests by e-mail to the respondents asking them to file. We have also set up a help center in case there are any questions, and there have been many, on how to get into the system and other issues. We have established one phone number to call and one e-mail address to send questions to. So we have done a lot of work on this side so that we could eventually free up the staff. When the data comes in over the Internet it already has built-in edits so that we don't have to key the data in. We don't have to review the data.
We don't have to make phone calls to make sure that the data is correct. All that will be done within the Internet system. So the key issues that we have been looking at in Electricity 2005 are the role of distributed and dispersed generation, and I'll just quickly tell you the difference between those terms. Distributed technologies, in EIA speak anyway, are all the technologies connected to the grid that are located near the load. By located near the load we mean transmission-wise, so that they have 69 KV voltage or less. Dispersed generation are those facilities that are not connected to the grid. They could be connected to an office building or they could be out in the middle of Montana powering somebody's log cabin. We don't have any plans to go out to all the dispersed generation folks and ask them for the numbers, but we have added a couple of questions to our 861 form, which is on sales and revenue and goes to all the utilities, so they could tell us what is hooked up to them, and they should know everything that's hooked up to the grid in some shape, manner, or form. Then fuel switching: we have been getting more and more questions about that, particularly when the natural gas price goes high or the price of oil goes high. How much can be switched back and forth and what's the capability? So we have actually added quite a few questions on our 860 form, which goes to all the power generators. On that form we ask for, really, mostly static information, what's your capacity, your prime mover, what types of fuel do you use, so we have added a number of questions there. On the 417 we have been trying to figure out what else to add on system emergencies. We haven't quite gotten as much input as we wanted to. Homeland Security is very difficult to contact and meet with. The Office of Energy Assurance just lost their office director so it has been difficult to get anything there. So we are having very few changes on that form. That form actually won't be in this clearance cycle. That goes separately so we really only have another year on that one. As for transmission information, which I'll talk about at the end here, we have added a number of questions to the EIA-411, which goes to NERC, the North American Electric Reliability Council, added to the 412, which only goes to the municipals and federals, and to the 860 also. Power imports and exports are currently collected by the Office of Fossil Energy. They have indicated very strongly that they would like us to take over that responsibility. It's being moved into that new Office of Electricity Transmission and Distribution and there are lots of questions about their makeup yet. So they haven't made a decision yet on whether they can afford to have us do it and redesign the form. So that's again not going to be in the package. We have been trying to get that going but we are not there yet. Then the plant costs for unregulated entities on the 412: this data was collected for the first time beginning in 2003. Those facilities are having a hard time reporting. It's about 30 elements we ask them on financial information. It has been difficult for them. So we've culled out about ten questions to ask them, and we got some feedback from some of the industry groups that we're on the right target. So from this Federal Register notice we'll see if everybody else agrees with that. The last item is energy balances for unregulated entities, which for these past three years were on the 861 form. That gave us some problems.
We are planning on moving that over to our monthly form that we go to, the utilities, IPPs and combined heat and power plants. On confidential data this is a list of the data that we currently deem to be confidential, fuel costs and fuel stocks that we collect on a monthly basis, latitude and longitude. This gets some laughs from some people. You go out to Mapquest and you put the address in you will get the map coordinates. You will even get an aerial view. Terrorists will have lots of fun there. But our philosophy was that we just didn't want to make it easy for everybody. You could download one database and everyone would have all the locations, all the latitude and longitude, so we do share this information with other government agencies but our plan is not to put it out on the Internet. Maximum tested heat rate under full load conditions, people can calculate the operating heat rate but this is just a maximum one and this would affect people's ability to compete. Plant cost data, unregulated entities, which I was just talking about as an issue, all of that is confidential because it's financial information for facilities that are in competition. Then monthly retail sales, revenue, and number of customers, only for energy service providers on a monthly basis. But these same folks provide us information on an annual basis and that information is not confidential because we figure the data is old enough and it won't affect their ability to compete. Then there's a couple of items on the emergency form that are confidential. One is who the contact is and this way those people are in contact during an emergency and the other is there's a narrative that they have to write to explain after the incident what happened and if they felt that this information would be going out, would be published in the Washington Post tomorrow, they wouldn't give the department a full analysis, a full true analysis, of what happened, so we've asked for that to be confidential. So this is a list of all the elements that we currently hold confidential and these are the elements that we are changing in some shape, manner, or form. We'd like to add two elements as confidential. One is power flow cases, and these are transmission information, as well as transmission line maps that are handed to us. This information is already given to FERC. We really just get a copy. They hold it confidential. We are trying to get our policy in line with theirs. Then there's a number of items that I just listed on the previous slide that we hold confidential that we are proposing to release after six months of holding it so that people could see stocks and fuel costs because by that time we figure the data is going to be old enough and won't affect their ability to compete. The last item there is a new act which I believe this group has been briefed on before called the Confidential Information Protection and Statistical Efficiency Act. What that does is it allows EIA to not have to provide information because of the Freedom of Information Act. It absolutely covers us completely and we don't have to entertain those requests at all. But what it does require us to do is that if another government agency promises to use the information only for statistical purposes then we are allowed to lend that to them. 
There are a couple of other items that go along with that but of all the items that we are collecting and then the smaller set of those elements that we think should be held confidential, only the last one here, the plant cost information for unregulated entities, we thought should be covered by that. All the other information then we could share with other government agencies so that they could see the individual data. So, moving on to transmission here real quick, I wanted to let you know what data the federal government already does collect because it does collect a lot of information. On the FERC Form-1, which is filled out by investor-owned utilities, they provide information on existing transmission line characteristics, existing substation characteristics, and transmission lines added during the past year. We mirror that on the 411 with the municipals and federals. FERC further has a 715 which collects the power flow cases and existing transmission line maps. Then we collect from NERC again, as I explained before, transmission line maps for proposed lines and power flow cases for new lines. That's not collected by FERC. Then on the emergency form there's a couple of check boxes there that they would have to indicate whether transmission lines are involved at all. So given the work that Howard and Phillip had done and the report that Doug has put together in discussions with other groups around town we came up with this list. I would say that this is probably a short list of items that EIA should collect but the analysis that a number of people here mentioned should be done and the amount of money needed to do those things is going to take a longer time to get to. In fact one of the items, I think Recommendation 11 in the Joint US-Canadian Task Force report on the blackout, suggests that EIA get together with a whole bunch of government and industry groups and figure out what information should be collected on reliability and then, given resources, go and collect that data. So we are in the midst right now of preparing a budget submission for that. We'll see how important it is to the Congress as to whether they would fund that. That really is the bottom line. So our short list then of the new elements that we are proposing in our new data collection forms is on the 412 just for the municipals and federals we would ask for existing transmission line upgrades and costs, which are not collected now, and then the same thing for system upgrades and costs. Again, this is only for municipals and federals. We can't force FERC to collect the same information but we are hoping that they will follow suit with us. On the 411 that the North American Electric Reliability Council submits to us we are proposing to ask for scheduled and unscheduled transmission line outages and then we have cut-outs for AC and DC lines. We know that some of this information is already collected in a number of the regional councils so we are working with NERC headquarters to try to move the rest of them in that direction. NERC headquarters has been very supportive of us in this. Then on the 860 we would ask all the generators for any new generator interconnection data so it would be their location, ownership, and then cost and transmission rights of being hooked up to the grid. This seemed to be a fairly contentious issue from the new generator point of view that they couldn't connect up and so there were a lot of questions. Well, who is doing it?
What's the cost of those things and what are they allowed to do and not allowed to do? I think that was the last one. So that briefly is the Electricity 2005 Project and then a subset of that what it is that we are proposing to add in the transmission area so we can dovetail that with what Howard and Phillip were just talking about. So I would be glad to answer any questions that anybody might have now. MR. BREIDT: And questions? DR. HENGARTNER: On the side of reliability data of transmission lines you just showed that you are going to ask municipal and government lines to give you the data. And you cannot force individual companies to provide you the same information, correct? MR. SCHNAPP: Well, the federal government can only ask the question once. Because FERC collects transmission information we can't ask it again. So they are collecting a whole set of information on transmission on the FERC Form-1 and we are saying we are collecting the same thing on the 412. Now we are proposing to collect a few more data elements on the 412 and we are hoping to entice them to collect the same information on the Form-1. DR. HENGARTNER: That makes more sense. Thank you. MR. BREIDT: Other questions? Okay, thanks, Bob. Next we have some time on the schedule for Doug Hale on transmission data for public policy. Doug, are you actually speaking or are you just answering questions? MR. HALE: I was told to be here to answer questions and I'm happy to do that. But the more I listen I'd like to talk for a minute or two first. MR. BREIDT: You go ahead. MR. HALE: First I want to thank the people who over the last year have suffered along on this project with me. Many of you are here. I have had terrifically good comments in the last few months from the people working for Bob Schnapp and for Scott Sitzer. These people are under enormous stress in their day to day work and they have taken a lot of time to help me get this thing more right, certainly more right than it was. I've also had some real help from the ASA committee. At the last session we talked and it was clear that the draft report we had was not salvageable and I presented a different way of trying to sort through this morass of transmission. You all encouraged me to try it that way and either it worked or the reviewers were so beaten down they have decided they are not going to make any more comments that I've got to fix. So I certainly do appreciate very much the work of the people here. Of course, Herb and Renee and others in SMG took a lot of care going over our characterization of the existing data collections and for that stuff that was pure fantasy they certainly fixed that. It turns out it's a very broad analysis with a lot of factual and theoretical pieces. I keep getting reminded of what one of my Arab friends in graduate school told me. He said, Doug, only Allah is perfect so I think you should read the report in that spirit. We have tried very hard, but I'm sure there are things wrong. Transmission is important now because of industry restructuring. I want to spend one second on that. Until fairly recently the industry consisted of pretty much stand-alone utilities with their own service areas, with their own lines, with their own generators. In the northeast in particular there were interconnections between the utility service area but they were viewed as more of a way to economize on providing electricity because of differences in demand patterns or something that could be used just in emergencies. 
So these weren't very important commercially. It wasn't important from a regulatory point of view because it was only 6 to 15 or 20 percent of total cost. Almost all of the costs were in generation and in distribution so no one really paid that much attention to transmission per se. In the new world the transmission system is supposed to be like a common market, a highway where you are moving electricity around. That's fine except for a couple of things. One, people are no longer able almost by themselves to control their own reliability systems. So how reliable I am depends upon what Nancy does with her generators. We saw a dramatic example of that in August. It also means that I've got to know what Nancy is doing and she's got to know what I'm going to do and when I'm trying to do things that are going to impact her. Because I make money from that I'd be backed off when that's appropriate. That's a new thrill for the industry as well. FERC in its regulation realizes they have to encourage investment and modernization of the grid. The only way to do that is to have a financially viable transmission business. The FERC Form-1, which is a basic data collection form on the finances of these things, doesn't allow you to figure out what the economics of transmission are. Again, that made sense because transmission, whatever it was, wasn't much. Now it's very important. Then finally, as we've said a couple of times, markets are now very important and growth is very important. So what I tried to do in this report is say okay, we have been collecting data now on this industry for at least 50 years. We have updated our reports periodically but this has been a real sea change, if you will. What is it about our forms, and by "our" I mean the government's, FERC, EIA, whoever else? Are they still getting at the questions that policy wants answered? Is the system reliable? What are the economics of this? Are they supporting growth? So what you see in this report is a methodical, incredibly boring attempt to work through how the changes in the industry have affected the data that's available or needed to answer those questions and then to take a look at whether that information is being picked up on our forms currently and if not to make a suggestion of one potential way of collecting the data. That's what I'm trying to do. Everyone hasn't read this, I hope. I thought you'd like to have a little background. MR. BREIDT: Questions? Mark? MR. BERNSTEIN: I did read it, much to my chagrin. It almost put me to sleep on the airplane but that's okay. This is actually not a question directed just to you since Howard and Bob are related to this as well. By the way, the report is good. Its a little long and droll but what the hey. It's not exactly an interesting issue, except to those of us who worry about electricity reliability. How do all these three things come together? I mean, this is an issue of I read your report and I heard your presentation and I looked at your slides and I say well, we are all talking about the same basic issue. What I don't see is integrating lessons learned from the different pieces. How is what you are doing in Electricity 2005 being informed by Doug's report and by the stakeholders and vice versa? So I want to be able to see at some point how this is all being integrated and coming together into one package. MR. 
HALE: The way I'm looking at it is that what I'm trying to do is get a view of the federal, not just EIA, interest in collection and as part of that get some idea of where EIA fits in in specific type areas. It's driven entirely by the federal government's needs for data in areas of traditional responsibility. That is one of the first hurdles. A new data collection has to pass with OMB in order to be approved. I think other things when you get past that initial hurdle that they are going to consider very much are things like what do the other users think about the data. Is this responding to the public's other needs other than just purely the federal interest? Bob has to worry about can you really collect this stuff on the ground. What does it really cost? But Bob also has to worry a lot about what is FERC doing. So one of the things I was trying to do was lay out a picture of what the data situation is in such a way that FERC and EIA, Justice to a lesser extent, Agriculture to a far lesser extent, could look at it and say yeah, we all see the same picture. Now let's use that as a picture for guiding our developments of transmission data collection. That's how I see it. I know that Bob has enough problems. He needs another report like a hole in the head but he got it anyway. MR. SCHNAPP: Well, if I could just add to that, the project was originally set up so we could have focus groups, get information from them which would feed into Doug, who would write his report, which would then feed into us and we would use that information to figure out what should be collected by the federal government and then what EIA should do and try to get the other agencies that would have to be collecting the data also involved in it. I can't say that the whole project from beginning to end has worked out the way everybody wanted it to but it's very difficult to get the attention of other agencies if they are not under the gun for it. But with the recommendation coming out of the blackout task force report it may be an impetus for us to make sure that they will sit down with us. FERC is a regulatory agency. They have a different reason for collecting the data. They do different things with it than we do. So a lot of data collected by FERC is not made available in an easy to use manner. So there are lots of things that the federal government needs to do. Our objective is to work forward with them, NERC and the ISOs, the RTOs, everybody, so that we can get what it is that's needed. It's going to take quite a while. Focus group meetings, they wanted real time data. EIA is never going to collect real time data. I mean, that's a fact. That's just too expensive for us to process in any way. But something on a more regular basis is possible but everything has to be laid out, who is going to collect it and then what are they going to do with it and how are they going to make it available to everybody. So there really was some logic to how we were moving forward. Time doesn't always agree with you but we were trying to move forward that way. MR. HALE: It would have been a lot clearer if the whole issue was a lot easier. MR. BERNSTEIN: It's a really complicated issue. MR. HALE: Then I would have gotten things done earlier. MR. BERNSTEIN: It would have been nice to see in your presentation referrals both to these other projects and whether they actually had any impact on the decisions being made to move forward. MR. SCHNAPP: Actually the one on new generator interconnects comes directly from Doug's report. MR. 
BERNSTEIN: And the other thing that might be useful is some schematic somehow. I haven't figured out how to suggest you do that but something that visually shows the different data sources and needs and related to the types of report type outcomes you can get, reliability or congestion or something like that, something visual. If you are going to go up to the Hill and try to get money for this you want to relate it directly back to the August 14 blackout report and you want to be able to visually show well, if you are going to ask us whether the grid is reliable enough to support our economic growth in the future this is what we need. And be very explicit. This is the type of report to get out and be very up front on it. DR. BURTON: I can't decide whether I've got a question. I keep thinking of things and then you keep answering them. MR. SCHNAPP: Sorry. I won't let that happen again. DR. BURTON: No, I'm going to ask questions that are probably impossible to answer now and it's your fault. Does anybody have a sense of how long it would take to modify the collection processes to provide the data that's necessary to conduct the analysis that Mark or Doug talked about? Does anybody have a sense of how much money it would cost? If I said I'm king today, you can ask me and I'll give you the money, how much would we be talking about? MR. SCHNAPP: Nancy, how much do we need for EIA, $70 million? DR. KIRKENDALL: Eighty-five million would be nice. One of the things that everybody seemed to want was real-time data, especially Doug. Doug loves data and he wants it real-time. As Bob said, some of the data are available on some ISO websites. So there's some data like that out there. We are probably not the right people to collect it but if FERC or some other regulatory body would work with everybody and get them and get them to make data available then at least that would solve one problem. The data would be available. But we are probably not the right ones to collect it. Our traditional way of collecting data, and maybe we need to evolve in the future, too, is on a form that you can send to somebody and they can fill out. You can do that by Internet but it's still a form that somebody fills out and sends back. And we prefer not to have them enormous. DR. BURTON: What I'm trying to get a sense of is whether this is something that's tractable or whether we need to at least set it aside as an impossibility. MR. SCHNAPP: Well, I think additional data is going to have to be collected and a number of agencies are now being put on notice because of Recommendation 11 that they need to sit down with us. Once we do that and take all the input that we've gotten, and they may have other input as well, scope out what should be collected and by who, once you have that done you have to design the form over several months. You have to put out a Federal Register notice for two months, address those comments, go back to OMB. You are talking about a year of that particular process and probably a year or maybe two years to really evaluate what has to be done but maybe a year. So I think we are talking about two years or so to get off the ground. Certainly I would think, and I will go on the high side, a couple million dollars. I don't know if that's a year or the total but I think at least a million dollars a year. I've got to tell you that. DR. KIRKENDALL: Another recommendation that has been made is to have somebody clean up the FERC Form-1 data, which is another data set which is not usable. MR. 
BERNSTEIN: I seem to be stuck on a theme today which happens to me sometimes. Again I'm not sure I know what you are trying to get out at the end of this. Maybe what needs to happen is asking and designing what the report is going to look like. What report do you actually want to produce and what is the electric outlook going to look like if you had the information you wanted and then you work backwards and figure out how you are going to get that information. It's possible you have done that but I haven't seen that. It's hard for me to make judgments and, again, it's hard for you to make the argument to collect the data if who you are making the argument to doesn't understand what benefit they are getting out the other end. So if the benefit we can get out the other end is we can raise red flags on where we are going to have reliability problems out in the next couple of years, well, that's a really good outcome and everybody is going to want that and you are going to get a lot of support for that and then you are going to tell them well, this is what it is going to cost you to actually get that and if you can't pay for it you ain't going to get it. MR. SCHNAPP: That's why our budget request is being put together now. That will do that. MR. BERNSTEIN: But it's hard for me to comment because I haven't seen that process. DR. KIRKENDALL: Well, I think a lot of that is we really don't know what the report that we want looks like. That's one of the reasons we had the focus groups. We wanted to know what the other people need. Where are the data question people? We have some analysis capability, too, but if we knew, what we'd like somebody to tell us, is what report should we be producing. Now, Doug's data, he wants data input to models. It's not necessarily a report. MR. BERNSTEIN: But the models are going to be used for something. I mean, in the end it's the electric outlook. And you are going to add some stuff to it now or maybe you are going to create a new report. But I think you've got to work backwards -- this is what we think we need to be doing and then figure out the data you need to be doing that and then if it's not going to be funded then you are not going to do the report. You are not going to have that information. More and more you are being relied on to do and provide that analysis. That was obvious -- and clearly that's being the case. So perhaps you need to make the argument stronger if this is what you want from us this is what it takes to get it. Maybe you do it in other forums but -- DR. KIRKENDALL: I suspect you're pointing out something that we are not doing very well. MR. BERNSTEIN: And it's really not an issue for the ASA group per se. DR. KIRKENDALL: What we did, Bob looked at Doug's report and he heard what the focus group came up with. What he did this time was to make the changes to his current forms that would capture some of the data that is missing. So that's what you can do now in a very short time period. MR. BERNSTEIN: But if you were able to do that, what you proposed here, what do we get at the end? That's what I haven't seen. What do you get in the electric outlook that you don't have now if you were able to do all this? That's an important element. MR. SITTER: I think that there are two aspects to that. One is to continue to give you what you are getting now with the changing market and one is what do we need in addition to be able to handle issues like blackouts. Separating those two will help a lot as well.
I don't think it's just a matter of we need to do more so we can get more. It may be a matter of with the deregulated market we can't do what we used to do or what we are giving you is not really even measuring what you would like it to measure or may not as this continues to grow. There might be two aspects. MR. BREIDT: I think we'd probably better head into the breakout sessions. We do have a break following the breakout sessions. If we could get back in here about 3:30 I'll say 3:30 but I know it will go a little beyond that so let's shoot for 3:30. As far as the committee memberships at the breakout session, they will be the same as last time. So if you were up here stay up here. (Recess) DR. LU: Good afternoon. Thanks for coming to my session. This is an ongoing project now and what I'm trying to talk about is how to estimate a weekly petroleum inventory using both weekly and monthly data. I gave you the head note. Table 3, you can see that. This one is the stock of crude oil and the petroleum products. On this slide you can see the first component is crude oil. Then the rest are the petroleum products. So these are the major categories, total motor gasoline, distillate fuel oil, residual fuel oil, jet fuel, unfinished oil or PEM (?), and other oil. So that's the last component we are working out. You can see on the first block the other oil including the propane on the very top, first one, in 2003. In 2003, the very top one, you won't see the propane there but the second block you will see the propane because there is a new system coming in. So you can see the third block is the weekly, which is the other oil not available until April, so I'm going to talk about this. Our goal is to estimate other oil minus propane. So in 2003 propane was included in the other oil mainly for the weekly other oil stock for weekly publication. This is a weekly publication, Table 3. We want to find what is the best way to estimate that and if there are any other alternative methods we want to know how to compare the estimate to the monthly data. So what we have, we have two kinds of data or two sets, monthly stocks of oil components; the monthly collection covers all of those seven. Crude oil, motor gasoline, and then other oil, which is about 15 different components in the highlighted area you can see. That's including the aviation gasoline, kerosene, and all those minor components we put together as other oil. So monthly it's a census. So we will collect all the information of every minor component in those seven major things. The weekly we only collect the seven categories. So you can see there's a difference of the monthly data and the weekly data. Then we compare these two data series from January 2002 to June 2003. You can see here there is a discrepancy. This is about 20 percent. In this one it's about 13 percent difference. The unit is million barrels. That's quite a number. That's about 27 million barrels difference. That's a lot. So we want to avoid that estimating discrepancy happening again. April 2004, early this month, we started collecting the weekly propane data. The propane data was collected only in the winter months before but now we start collecting every month, every week, so that's a new thing. That's your advice coming at the right moment. My work is to employ a new methodology to estimate the weekly stock of other oils. I compare the correlations of other oil minus propane with the other major components, to see how they are correlated.
Hopefully if they are the same patterns of the time series, seasonal patterns, then you can use that information. Also we investigate a cross correlation associated with the transfer function model. We are looking at the -- modeling and the transfer function modeling. The correlation you can see here, this dependent variable, motor gasoline minus propane. The rest is the other major components. DR. KIRKENDALL: Does it start with other oils minus propane? DR. LU: Yes. DR. KIRKENDALL: You said motor gasoline. DR. LU: And I'm using the five years of data, January 1998 to December 2002. Let me give you the picture of these five years. This is the ten years. Everything here you can see all these components and this is the one, other oil minus propane, the yellow one. It's our interest. It's the other oil minus propane. You can see the other line is quite a similar pattern to the yellow one but it is not clear on the picture here. So we look at the transfer function model, look at the simple autocorrelation and the partial autocorrelation of the other oil minus the propane and of the other major categories of petroleum products. Then we need to apply the transformation of degree one and then the seasonal difference of 12 so that the series will be a stationary time series. After that then we check the cross correlation with the input variables, treating the others as input variables, the other six major categories. The output variable will be our interest to work out. I'm going to show you two examples of this time series. The stock of the distillate fuel and other oil minus propane, this time series of those two products, and the cross correlation is significant at a lag of --, which means once we have any significance at any lag the transfer function model may be applied here. If we look at the motor gasoline, start with the other oil minus propane, then you can see the picture here and the cross correlation. None of the lags is significant which means if I'm trying to apply the transfer function model motor gasoline would not be a candidate as an input variable. But distillate could be a candidate as an input variable. So my question to committee members is are there other alternative methods so we can estimate a weekly other oil minus the propane. Remember, now I don't have to say minus propane because propane will be automatically separated out from other oil as you can see here. The last one, April 9, the propane is out. So this is my talk. DR. NEERCHAL: I think we have plenty of time for questions. Just to start out and to make certain things clear, if you go to your slide 5 you are talking about the comparison of weekly estimates and monthly data? DR. LU: Yes. DR. NEERCHAL: Can you give us a little bit more detail? This is not a result of the transfer function model, is it? How do you get these estimates? DR. LU: This one, the other oil will include the propane because it's 2002 and 2003. Month 3, the blue one is a census so we collect everything. But the red one is an estimate. It's weekly and then converted to the monthly. DR. NEERCHAL: So add them up? DR. LU: No, it's not added up. It's just an estimate there. DR. KIRKENDALL: Can you tell us how it was estimated in the past? DR. LU: The weekly converted to the monthly is derived by computing the average daily rate of the stock change for the minor products of each month. So you have to figure out each month the minor products, then the stock change.
So you take an average of that based on the last six years, then using this daily rate and the minor stock from the previous month and adding it up. DR. NEERCHAL: On a moving average type of thing, I think. DR. LU: It's already implemented in the system so they just put the data in and then they will -- DR. NEERCHAL: And you are trying to make this one better, right? DR. LU: Yes, it's the current system. DR. KIRKENDALL: The discrepancy is what made people decide we needed to do something. DR. LU: This is the current system we have. DR. NEERCHAL: Again, the clarification thing, we have plenty of time. I looked at your cross-correlation of distillate with other minus propane and that's about .348 at zero. Shouldn't it be pretty close to your correlation in your slide 8 where you are doing essentially the same correlation? DR. LU: Do you mean here? DR. NEERCHAL: Yes, here you have this -- .348. Shouldn't they be the same or shouldn't they be pretty close? Essentially they are -- different data sets? DR. LU: No, this one is -- DR. NEERCHAL: This will help me understand what I'm dealing with as far as the data. DR. LU: This one I did not transform. So once you difference once and then difference seasonality, the correlation will change, because I just treat it as regression data -- check. That's the first thing you do. You say well, look, if I had two pairs of similar sets of data then I would look at what's the correlation. Then I would realize it's a time series and you have to filter out the seasonality. Then also you have to transform to a stationary time series in order to do any filtering or connections. MR. BERNSTEIN: So what you are proposing to do is estimate the others through the relationship to distillate and motor gasoline because you saw a correlation? DR. LU: Yes, I'm proposing to do it. This is my interest. With any object these are the major ones, motor gasoline, jet fuel, distillate, residual, unfinished crude oil, or even propane. MR. BERNSTEIN: I'm trying to figure out why stocks of these other fuels would be related to motor gasoline or distillate. You can show me all the correlations you want but I'm trying to figure out from the standpoint of how the industry works why there would be a relationship. I certainly wouldn't go and start estimating something unless I can explain -- DR. KIRKENDALL: Well, right now this is not an industry analysis. It's a data analysis. It is something that has a predictive capability based on that. MR. LIDDERDALE: I do the short-term energy outlook forecast, which includes other oils. I'd like to make a suggestion on your analysis and raise one or two questions that may address your question. First off, the other oils contain several products that don't have the same seasonality. For example, included are asphalt and road oil, which is a high summer demand product. Then you have butanes, ethane and butane, which come from natural gas liquids and which have seasonality very much like propane but also distillate. So the rationale behind relating other oils to distillate is the seasonality and the inventories, that distillate heating oil inventories are lowest during the winter or drop during the winter and the same thing happens with the LPGs that remain in the other oil category. They are also winter demand products. So the correlation there could be related to: you're going to have a large draw on distillate when it's very cold and you will also have a large draw on the LPGs.
But that's not at all consistent with the other components of the other oils category. What we do in the short-term energy outlook is we forecast those components separately, where we look at LPGs and produce a separate forecast for LPGs based on monthly stock changes and also a separate forecast for the other products that may have their own seasonality. So hopefully that clears up your question about the correlation. My suggestion was that you, number one, of course, when looking at other oils even econometrically, separate out the components into sub-aggregates, which would probably improve your look at the seasonality. Number two, you do the analysis without transforming and what I've found is you do a better job looking at first differences, in other words stock changes rather than the level of stocks. Since I made that change a year or so ago it's ended up with better forecasts. The third thing I suggest is when you are doing essentially a forecast that you can also use the forecast of the short-term energy outlook which comes out around the eighth of the month. I'm not sure why we are doing two separate forecasts. That's my only comment. Thanks. MR. BERNSTEIN: You are not talking about doing a forecast. Now you've got me confused. You are not talking about doing a forecast. You are just talking about putting out the data, estimating the data for -- DR. KIRKENDALL: Well, you can look at it as a forecast because we don't have data for other oil. DR. LU: We don't have the weekly, see. DR. KIRKENDALL: We are going to predict what's going to be reported for other oils that week. MR. BERNSTEIN: Right, but you're just estimating how much. You are not doing a -- DR. KIRKENDALL: You can write it as a forecast, though. MR. BERNSTEIN: So where do you get your data for the short-term energy outlook? MR. LIDDERDALE: For the short-term energy outlook we are using the exact same data but I do not use the estimates of other oils in the weekly petroleum status report. I use my own estimates. MR. BERNSTEIN: Why couldn't you use his own estimates? MR. LIDDERDALE: It's possible. There are some problems. Number one, it doesn't meet -- MR. BERNSTEIN: And shouldn't you be using the same thing anyway? MR. LIDDERDALE: Number one is our estimates are certainly not better than the estimates published on the web server. They are often different, not necessarily better. I have done analysis over the last couple of years. Number two, there's a fundamental difference in how the numbers are used. For example, when the petroleum supply monthly comes out both of us end up having to adjust our numbers. I can adjust my numbers easily because I simply change all of my history. The problem with the weekly petroleum status report is that the history between the petroleum supply monthly and when they are publishing doesn't change so it has significant ramifications on their estimates of other oil demand. I think if you start looking at it that's where you have a bigger problem. You have some problems in your stocks but I've got a feeling that your error's in the other oil demand because other oil demand is imputed from the stock change, that you actually have larger errors in your other oil demand. So the short-term energy outlook forecast can be used but we only come out once a month and they are doing their estimates weekly and there are inconsistencies in the structure of how each estimate affects everything else in the body of work.
So it could be used as an input but I'm not sure it can be used as a replacement and I'm not sure that it can be made consistent. MR. BERNSTEIN: It certainly seems to me to be a problem. Maybe I'm missing something but it seems to me to be a problem to have two separate parts of the organization using the same -- DR. LU: Well, we both use the same data. MR. BERNSTEIN: But you have two separate estimates, energy outlook included. DR. KIRKENDALL: You see, he doesn't really care about weekly data. He's doing monthly and he's doing forecasts up to two years out. His main purpose is forecasting. MR. BERNSTEIN: Right, but if his monthly data is different than your weekly data -- DR. KIRKENDALL: Well, our monthly data are different from our weekly data, too. We have a survey that gives us weekly estimates but then we get monthly data two months later that replaces them. So the weekly gives you preliminary estimates for what's going on and then it's replaced by better estimates later after the monthly surveys are in. MR. BERNSTEIN: But within that same time frame you have produced another short-term energy outlook, right? So between the time you have been estimating weekly data you have been doing a new short-term energy outlook potentially using different estimates of other oils. MR. LIDDERDALE: Well, the benefit of this work is hopefully I will be able to change that. That's going to be one of the objectives I'm particularly here for, that we'll be able to reconcile the two. DR. NEERCHAL: The purpose of the weekly estimate is just for about a month and a half. DR. KIRKENDALL: Yes, I mean, we do compare weekly to monthly and we probably make sure they represent one another pretty well. DR. NEERCHAL: As soon as the monthly figures come the weekly numbers are not important any more? DR. KIRKENDALL: Right. DR. LU: Because the weekly is timely so we want to release as early as possible so industry can use it. Is that right, Mike? MR. CONNER: That's correct. DR. LU: Well, he is the manager of the weekly so he will have more of an idea. You can see here the weekly is published to April 9 but the monthly only through January. So the monthly will have a couple of months lag behind. So what we are trying to do is use the monthly and try to estimate a weekly based on what we have. DR. NEERCHAL: It seems to me one other recommendation is for the two groups to talk and make sure you are using the same database and perhaps use the last monthly estimate from there and worry about it because it is really very short-term, like a month and a half. So think of it that way rather than a whole new model. DR. KIRKENDALL: It's a different purpose and we certainly need to look at it but I'm not sure his estimate is going to be a replacement for his product. MR. LIDDERDALE: It wouldn't be. MR. BERNSTEIN: But you certainly have to make sure there's a consistency between what you are estimating on a weekly basis and what they are using in the short term. It doesn't have to be the same number but it better not be very different. DR. KIRKENDALL: Do you use the weekly numbers in the STEO or do you just use the monthly data for the STEO? MR. LIDDERDALE: For all the other products or other oils? DR. KIRKENDALL: No, for all the other products. MR. LIDDERDALE: Yes, because they are survey numbers. MS. KHANNA: Well, I have a very basic question here which is how important in the scheme of things are these data on other oils? I mean, if it's a small fraction of everything we have -- cost-benefit analysis.
We could have a weekly survey going out and spend a lot of money and get really accurate data and not have to estimate anything. But if on the other hand in terms of let's say the STEO or any other model that you run, these other oils, it's really almost as good as the -- than just running a simple correlation of something that's probably just fine. MR. BERNSTEIN: I'm not complaining about the correlation. I want to know is there a reason for the correlation before you actually use it. I mean, there has to be a logical reason for it. That was the starting point of my question, which you answered partially, but I don't have a problem with doing the estimate. I was just trying to get a feel about why. MS. KHANNA: I completely agree with you but also, just thinking about it, I mean we went on this big circle of discussion but your point is absolutely correct. Before you run any correlation you want to know why should the two things be correlated in the first place because otherwise you can get something very spurious. My other question is yes, okay, we could spend a lot of human hours deciding all of this but if it's really just one little bit of a whole report that goes out and it's not a big part of that is it worth spending that much time or have you got something like a correlation? We don't even know why it's correlated, let's say, worst case scenario, but it seems to predict well on a very short-term basis. It is not a big fish in the pond. Maybe you don't want to know more. MR. BERNSTEIN: That's related to the other question I was asking. When you go back to the reason for doing this you showed us some amount of data that had only a couple of points that were out of whack. How many times has that occurred? I mean, up to that point it didn't look so bad. DR. LU: Yes, exactly. DR. KIRKENDALL: Mike is at the table now. He is the survey manager for the weekly and has worked a lot on those problems. MR. CONNER: When we started hearing this talk I said to Tammy up there quietly well, we define a big discrepancy as one that leads to John Cook being called into the administrator's office. DR. KIRKENDALL: We have an administrator who knows a lot about petroleum. MR. CONNER: This was enough to cause that to happen. MR. TSENG: I have one comment in response to Mark's comment about methodology to correlate things. This is about industry. If you look at the components of other oils we have some elements like hydrocarbons, oxygenates, aviation gasoline, blending components, naphtha. So some of the components are actually intermediate products. They are blending components. Some components like lube oils, waxes, coke, asphalt, those are end products. So if you don't look at the basic demand-supply refinery operation to capture how some of the intermediate products will be used to produce end use products, you are not capturing why they correlate or don't correlate. Seasonality may capture some but you really have to understand which components are related to end use products, which components actually are end use products. So if you see end use product demand, say demand for lubes and waxes, increase, then even though the refinery output of this product is more or less stable, the stock components, their elements, would change. So you are adding up several elements which may have opposing forces and which can change your estimate. So using motor gas or distillate is not the best way to capture that correlation. MR.
LIDDERDALE: I mean, all of what you said is correct that there are lots of different things involved there that respond to different forces, but ultimately a lot of this depends on how much crude oil is being run through refineries. To some extent, at least, the same forces, the same basic fact that you've got this much oil and you get this much product out of it, so if you run less oil you get less product and that affects all the products to varying degrees, granted, but it does affect all of them. So there are certain underlying reasons why all of those products move in the same direction and your point regarding the intermediate products, those also are coming from the same pool of crude oil. MR. TSENG: Yes, if you look at refinery operation you have this category called LRGs, liquefied refinery gases. That could be from refinery, that could be from imports, or that could be from a natural gas liquid plant. So the components may or may not be directly related to refinery operations. Crude oil is another stream so the process is different. I think the LRG category is more or less for blending components and maybe for petrochemical feed stocks. So I think we're getting in the weeds here but the thing is because of the characteristics of other oils it's not so simple to just run using motor gas or distillate as driving forces to estimate. DR. KIRKENDALL: Remember, what was done before was basically the assumption that you took the stocks of other oils and their seasonal patterns over six years. That didn't do badly for a long time. That's a pretty simple model. There is no underlying industry knowledge in there. It's just the seasonal pattern. So there was a change last year when natural gas prices went up and there was some change in the news coverage, but he has a point: do we care about something small? In fact the weekly has improved things a lot already because now we have propane data separately. So that other oil stock where you saw the big discrepancy included propane and now it's not in there so we have a smaller group of things. Our error hopefully will certainly be no larger. But that's a smaller set of products that we are trying to follow. MR. TSENG: And also petrochemical feedstock is in the other. So when natural gas price is high petrochemical feedstock is more valuable. That's what is driving the changes. I think when you look at the market, and I am just trying to follow Mark's approach, understand the market and understand why it happens and then find the best way to model it. MS. KHANNA: But then if you've got such a heterogeneous group of products in there, that would mean that either you model them each separately or at least lump them together with products that have similar markets, and that's exactly why I asked my question because this could become a monstrous project. DR. KIRKENDALL: I think I'd look at what Pyng's done to aggregate products. There are several things aggregated. That may be a good starting point, anyway. MR. BERNSTEIN: If there are components in here that have a rational reason to be related to motor gasoline and components that have a rational reason to be related to distillate then you split the two, estimate them, and put them back together. When the students come to me and they do these correlations I send them away and go tell me why first and then you can do the correlations. They get pissed off when they give me correlations without relations. DR.
KIRKENDALL: As statisticians we frequently do these sorts of things without knowing anything about the industry. We do a pretty good job sometimes -- MR. BERNSTEIN: I know you know these things. DR. KIRKENDALL: I think the relationship should show up in the correlations. I don't want to know that there are relationships before I look at correlations. MR. BERNSTEIN: But I can show you lots of correlations between things that are just by accident. DR. KIRKENDALL: I think over a six-year period of monthly data I would be surprised if there were spurious correlations. They are highly seasonal series. I think the high correlation you see is because the seasonality lines up pretty well, because the seasonality is a very strong characteristic of -- MR. CONNER: To some extent a lot of these intermediate products are byproducts of running crude oil for the purpose of making gasoline. So as you run more crude to make more gasoline you automatically get more of these other things as well which is one reason why we think they are probably connected. MR. BERNSTEIN: Do they correlate with refinery output? DR. KIRKENDALL: This is inventory -- MR. BERNSTEIN: I realize but the point is you get more inventory on this stuff when you get more production because there's -- demand for -- on this stuff. I don't think anybody strategically plans for many of these things to keep them in stocks. They are just there as byproducts and they sell them when they can sell them and they don't sell them when they can't and so they have stocks. Maybe I'm wrong on that. MR. CONNER: I think that's essentially what we are proposing here, that the stocks of the gasoline and the distillate and the other major products are sending you information because we have survey data on those. MR. BERNSTEIN: Well, then that's a good explanation and that's fine. That's what I needed to hear first. DR. KIRKENDALL: I was going to say what was done before, the one that was in the discrepancy, was only based on monthly data, the seasonal patterns for monthly data, took the last monthly observation in March and carried it forward. So our thought here was maybe we could learn something better by looking at some of the other series that we have weekly data for. So you are looking for relationships with the monthly data but then we might be able to translate to a model that would help us do a better job weekly. MR. CONNER: And that in fact I think is part of the reason you see some improvement later. I mean, part of the discrepancy came from a problem in Venezuela that led to less crude oil going to refineries. But we didn't pick up on it quickly enough and eventually we ended up in the administrator's office. I think there was another comment here where somebody said we've got a problem with the demand that, of course, is related to the stock changes. In fact that's the whole reason that we have this other oils component in here, so that we can do the total product inventory and use the difference there to calculate total demand and then the other oils fall out as the difference between the major product demand and the total demand. MS. KHANNA: I'm just looking at your slide there. The pale blue, that's propane and the blue in the middle right next to the yellow line, that's your others minus propane. If you look at those two series they track the yellow almost perfectly. So if you are going to be collecting weekly propane data your problem might actually dissolve in terms of getting the two things to be -- DR.
KIRKENDALL: Actually, your correlation with propane was .6 so you haven't run that cross correlation yet. DR. LU: No. I just ran two to show you that one category of the petroleum products may be cross-correlated with the other one, with the one we are interested in. There are six of them and I only ran two. One showed where there is a -- and the other one has nothing. Besides this, any other suggestions -- should I look differently at the transfer function model or -- DR. NEERCHAL: Yes, I think I was going to say I think in the discussion so far we have been worried about the model formulation. I think that does need some more thinking and getting with the monthly people to try to learn from some of the stuff they are doing. I think, given that you are going to model, there's another decision, I think. It seems like a good thing -- how much money or how much effort -- DR. KIRKENDALL: That should be a fairly simple model. It was pretty simple before. We are not looking for something complicated. We think we can do something so we can do a better job than we had before. We know it will be better because we have propane. DR. NEERCHAL: So can I ask the committee to think about the methodology question? Transfer function is a linear regression so this seems like -- the other series have information and a regression is a way to capture it, definitely. It is basically an interpolation using some extra series. That's what you are really doing. So it seems like a good candidate methodology. DR. KIRKENDALL: We might look at trying to group some of the minor products to see if they have some that group better in terms of seasonality for predictive purposes -- provide some guidance -- MR. LIDDERDALE: Well, separating them out and then going to estimating based on first differences of stock change instead of stock level. DR. KIRKENDALL: I think what Ruey-Pyng came up with in terms of looking for the relationship would be to use the first difference and the seasonal differences. MR. LIDDERDALE: Well, that's what I'm going to do when I get back to the office, try to figure those out. DR. NEERCHAL: And I think the other question -- how do you compare estimates of the others, and the other question about should you use absolute deviation or squared -- DR. LU: What happened here is we have the monthly but the monthly is not up to April. So if we are trying to say something about April 16 we need to forecast three months to here and then we will have the monthly data. DR. NEERCHAL: One of the things that I think you should do is after a month and a half you get another estimate. I think that is one check already, that you are fairly close with whatever methodology you come up with whenever there is a comparable estimate from another methodology, something that's independent, so that they look pretty close. I think that's a check you should run in checking the estimate. But while you are exploring different models, for example, I'm not sure it will make a whole lot of difference whether you use mean absolute error or squared error. I don't think it's going to be a particular issue, assuming -- transfer function model -- any other comments? DR. LU: Well, we will work with the STEO and compare the methods and then see. I believe Mike has proposed other improvements. DR. KIRKENDALL: Actually, Mike and Mike, there is something that they are using already. There's a number in here so they have a methodology that they are using. That will certainly be one that we end up comparing to in looking at alternatives.
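The following is an editorial sketch, not part of the meeting record, illustrating the methods discussed in this session. It assumes the monthly stock series are available as pandas objects; the file name, column names, and function names are all hypothetical. The first function is a simple stand-in for the daily-rate carry-forward estimate described as the current approach; the other two illustrate the stationarity transformation (a first difference plus a seasonal difference at lag 12) and the cross-correlation check used to screen candidate input series for a transfer-function or regression model.

    # Editorial sketch (hypothetical names throughout), assuming month-end
    # stocks are held in pandas Series/DataFrames indexed by date.
    import pandas as pd

    def carry_forward_estimate(monthly: pd.Series, week_ending: pd.Timestamp,
                               years_back: int = 6) -> float:
        """Stand-in for the current method: take the historical average daily
        rate of stock change for this calendar month (over the last several
        years of monthly data) and add it, times the days elapsed, to the
        last reported monthly stock."""
        changes = monthly.diff()
        same_month = changes[changes.index.month == week_ending.month].tail(years_back)
        daily_rate = (same_month / same_month.index.days_in_month.to_numpy()).mean()
        days_elapsed = (week_ending - monthly.index.max()).days
        return monthly.iloc[-1] + daily_rate * days_elapsed

    def make_stationary(series: pd.Series) -> pd.Series:
        """First difference, then a seasonal difference at lag 12."""
        return series.diff(1).diff(12).dropna()

    def cross_correlation(x: pd.Series, y: pd.Series, max_lag: int = 12) -> pd.Series:
        """Cross-correlation of a candidate input x with the target y at
        lags 0..max_lag (correlation of x shifted back by lag with y)."""
        x, y = x.align(y, join="inner")
        return pd.Series({lag: x.shift(lag).corr(y) for lag in range(max_lag + 1)})

    # Example screening run: lags where the cross-correlation is clearly
    # nonzero suggest the candidate could serve as an input variable.
    if __name__ == "__main__":
        stocks = pd.read_csv("monthly_stocks.csv", index_col="month", parse_dates=True)
        target = make_stationary(stocks["other_oils_minus_propane"])
        for name in ["distillate", "motor_gasoline", "residual", "jet_fuel"]:
            ccf = cross_correlation(make_stationary(stocks[name]), target)
            print(name, ccf.round(2).to_dict())

If, as suggested in the discussion, the correlations are driven mostly by shared seasonality, then the differenced series rather than the stock levels are the appropriate place to look, which is consistent with the suggestion above to work with first differences of stock changes.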
DR. NEERCHAL: Thank you very much. (Recess) MR. BREIDT: We are now ready to summarize the breakout sessions and we will begin with Nagaraj Neerchal. DR. NEERCHAL: We had a very good breakout session. We had lots of lively discussion, not only among the committee members but also with the audience involved and it was a very good thing because two people in the audience turned out to be very important people for the project. One of them was Tancred Lidderdale from Short-Term Energy Outlook and the other person is Mike Conner from Oil and Gas. The problem, very simply stated, here is the breakdown of all the petroleum products. You have crude, motor gasoline, distillate fuel oil, jet fuel, unfinished oils, propane, and other oils. Essentially you have weekly estimates available for everything here except for other oils. For other oils and everybody else you have monthly data. The challenge is to come up with weekly estimates for other oil to match with the others so that we can get a weekly estimate of the whole thing. So it's a simply stated problem but, like all the simply stated problems, it's more complicated than it looks. The current methodology uses historical daily averages and that's some kind of a ratio, adds things up to a month and comes up with a weekly estimate based on historical daily averages. The proposal is to use something like the transfer function model or any other suggestion that the committee can come up with and improve the current estimate. The first question the committee asked is how important it is to really improve this estimate. And it turns out that the administrator knows a lot about petroleum products. So a small difference here will be noticed. So it's a good time to put something forward and say hey, statistics makes a difference. So I think it is an important problem that we solve that one. So the discussion actually had a very good exchange of information among Mike and Tancred and Nancy. That was a very good outcome. I think it justifies everything else. I think the statistics part in this project is somewhat secondary. Hopefully something simple will come out based on the discussion among the three groups. But the most important thing is hopefully this meeting has given impetus for them to sit down together and come up with a combined approach. I will stop right there. MR. BREIDT: I'll be summarizing the other session, which was on testing EIA surveys. This was also a very interesting breakout session directed by Stan Freedman and Bob Rutchik. Shauna and Karen were also contributing. The committee was asked to consider two basic questions, one of which was whether EIA was currently using appropriate methods for testing their surveys and the second was were there other methods they should be considering. The kinds of methods currently in use include a pre-survey design visit, and this was something that had originally been suggested by Seymour Sudman, where site visit teams actually go out to the people who will be filling out these surveys down the road asking if they actually understand the questions and then finding out something about the structure of their data systems so that they know how easy it is to get those data out of the record systems at the top of the form. The second kind of method was cognitive interviews and these were cold interviews in which you just show up with the form and have somebody work their way through it while you are standing there, which wasn't thought to be useful by EIA.
Then there were respondent debriefings on completed survey forms. So they send them out, give them some time to look at them and fill them out, and then show up with a site visit to work their way through the forms and decide how well the respondents are understanding what's going on. Then finally there are the usability testing methods, and those are really separate from the respondent debriefings and the other methods. The usability is, like on a web-based survey, how well do you navigate through the fields and things like that. So the general impression from the committee was that these were appropriate methods for testing surveys. One of the questions that came up was the number of iterations that were used in doing this testing. Ideally you would have something along the lines of a pre-test and then a design and then a test and then a revision and then another test, because the first revision might introduce new problems. They don't always have time to do that, but if they did that would be ideal. The other major question was whether there were other methods. One of the things that came up in discussion was that when they go out with the site visit team at the point of the respondent debriefings they often have in the room a number of people, including someone at an administrator level and someone at a clerk or flunky level, and this can cause a lot of problems. It can cause at least two different kinds of undesirable effects. One is the impress-your-boss effect, where the data clerk is on their best behavior to understand everything perfectly, and that may be unrealistic and artificial compared with what actually goes on in filling out the survey, so you may not get good information in that direction. Another is that the flunky may be intimidated by having the boss there, so the natural suggestion was to consider separate debriefings for the two kinds of people, to split up this hierarchy so you get a little bit of different information from the two sources. Ideally you would have some sort of overlapping questions to address some of the same issues but addressed to the two different kinds of people. So those were some of the main points that came up. Were there any other things that the committee would want to add to that? So I think that we are two minutes ahead of schedule. I'm so happy. We will just turn this over to Inderjit Kundra, who will be talking about the natural gas production monthly survey. MR. KUNDRA: I'm Inderjit Kundra from SMG. Good afternoon. EIA is going to conduct this natural gas production monthly survey. The reason why we are conducting it is because the data which they are receiving is not timely. I think why we need the survey is clear from the slides: the data which is being received -- not timely. The other thing is that they probably doubt the accuracy of the data also. So there is a need for that, because they want to disseminate reliable data in a timely way, so we need to conduct this survey. The survey results are needed not only for the US but for these big regions also: Texas, Louisiana, New Mexico, Oklahoma, Wyoming, and others. So what we did, how we were going to accomplish it, I used the formula -- which is given in Cochran, Second Edition. That's called the presumed optimum allocation formula. In order to select that sample we needed a frame. The frame is the 2002 EIA-23, which has been prepared by the Dallas Field Office. The total frame size is 20,906 operators, if you would look at this.
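For reference, a sketch of the standard optimum-allocation formulas in Cochran's Sampling Techniques, which may be what the presentation is pointing to; the notation and the target-CV interpretation below are assumptions on my part, not taken from the slides.

    n \;=\; \frac{\left(\sum_h W_h S_h\right)^2}{V + \frac{1}{N}\sum_h W_h S_h^2},
    \qquad
    n_h \;=\; n \,\frac{N_h S_h}{\sum_k N_k S_k},

where $W_h = N_h/N$ are the stratum weights, $S_h$ the stratum standard deviations of the size measure, and $V$ the target variance of the estimated mean, for example $V = (0.01\,\bar{Y})^2$ for a 1 percent coefficient of variation or $V = (0.05\,\bar{Y})^2$ for 5 percent.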
Out of this, 5,527 are zeros, 8,094 are blanks, and the remaining are only 7,285. For the purpose of this exercise we used the 7,285 because we just -- from the frame. When we used that formula we came up with the sample size. We used the expected -- level as 1 percent and 5 percent. If you look at the first line it's giving you the sample size needed for the US, for the nation as a whole. The first column is giving you the number of companies which are selected with certainty. Under the footnote we say that certainty companies were selected as follows: any operator having a measure of size greater than or equal to the total size divided by 2M, where M is the number of sampling units to be selected, the sample size. So when we go down to the regions, the next lines are given for the six strata which we have here, Texas being number one. If you look at that, the sample size which is needed if we go to the regions is 358 companies, in order to provide an estimate at the level of 5 percent at the regional level. But we don't need so many companies, because 41 companies have been repeated; they are in multiple strata, multiple regions, and those 41 companies are repeated 102 times, so it comes to about 256. But then the problem with this was that they want to estimate the Federal Gulf region at a higher precision -- 1 percent. If we add that sample size we add only 7 more companies and the sample size comes to about 263. But because -- it is expected -- pretty sure that if we use the 5 percent criterion for these regions, the coefficient of variation at the national level will probably be less than 1 percent. We are testing. In order to make sure which method will work we are testing two sample selection schemes. One is a simple stratified random sample and the second is probability proportional to size. So we are in the process. We have selected a sample at least for the probability proportional to size and we are looking at it and we are trying to compare it, go back to years like 2001 and 2000 and see how much nonresponse is still there when we are looking at this and how much we are expecting. We don't know about that. Then the question of how we are treating the zeros and blanks: we selected a sample of 100 each from those companies and this has been turned over to the Dallas field office, to look at how accurate the classification of these zeros and blanks is. For the purpose of this survey zeros are defined: either they are not producing gas or they are zero because of rounding, because the gas production is very small, so when it is rounded up to millions or something like that it comes to zero; that's what the zero is. Blanks are said to be due to nonresponse. A lot of these are -- unknown factors. But that's -- blanks. They're not very well defined. We are trying to find out why there are so many zeros and blanks in the frame because it's hurting us. If they are misclassified then we have a problem. That really is a question for the committee. Then the next one is, I think, of all the states, Texas monthly estimates are developed using the Texas Railroad Commission and -- model. I think the function of the -- model is to provide the data for the -- estimates for the nonrespondents. As we say, these estimates are revised when the -- is received. We are going to test these, at least these certainty companies, for some time.
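A minimal sketch, with made-up measures of size, of the two pieces just described: the certainty cutoff (any operator whose measure of size is at least the total divided by 2M) and a probability-proportional-to-size draw from the remaining operators. The frame values, the choice of M = 263, and the systematic PPS routine are illustrative assumptions, not the actual frame or selection software used for the survey.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical measures of size (e.g., prior-year gas production) for the
    # 7,285 operators with usable frame data; here just random stand-in values.
    size = rng.lognormal(mean=2.0, sigma=1.5, size=7285)
    M = 263                                    # assumed target number of sampling units

    # Certainty rule: measure of size >= total size / (2M).
    cutoff = size.sum() / (2 * M)
    certainty = size >= cutoff
    print("certainty companies:", certainty.sum())

    # Systematic PPS draw from the non-certainty remainder.
    rest = np.where(~certainty)[0]
    m_rest = M - certainty.sum()               # slots left after the certainties
    p = size[rest] / size[rest].sum()
    cum = np.cumsum(p)
    step = 1.0 / m_rest
    picks = (np.arange(m_rest) + rng.uniform()) * step
    sampled = rest[np.searchsorted(cum, picks)]
    print("non-certainty sample size:", len(sampled))

The same frame and cutoff could be reused with a simple stratified random draw in place of the PPS step, which would mirror the two selection schemes being compared.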
If the survey is going to give us results which are more reliable than what we are getting from there, then we might even discard this. If the results which we are getting now are any good, are reliable, then probably we will not go to the sample. But we will probably see. We will compare these results for one or two years and see what happens, and when we have compared enough of that we will decide what we want to do. The next one is the question for the committee. The committee's comments are invited on all aspects of this -- methodology, which we don't have many of anyway. If some of the sample operators showing zeros and blanks in the frame have a large production, how would the committee treat those zeros and blanks? That's the estimating. That's where the problem comes. If one produces a very high estimate we will have a problem treating it. That's the main question. Thank you. MR. BREIDT: Questions? MS. KHANNA: I have a very basic question. MR. KUNDRA: Go ahead. MS. KHANNA: The data collected through surveys and -- MR. KUNDRA: Not right now. The data which is received right now is probably through the state agencies. MS. KHANNA: So how do you know what the source of the blank is? I'm curious because if it's nonresponse -- MR. KUNDRA: I think EIA-23 is the frame. They have got some companies there which they go to in the field. DR. KIRKENDALL: Actually, it's the frame for the EIA-23. MR. KUNDRA: Frame for EIA-23. DR. KIRKENDALL: Because the 23 itself is the frame? MR. KUNDRA: No, no, the frame is for the EIA-23 -- getting the data, that's the question. DR. KIRKENDALL: Yes. They've also been looking at state agency websites to update their frames, so they have looked at individual company level data from as many sources as they can. It's a hodge-podge of information. There's data from the 23. There's data from wherever else they could find it. They tried to get company level data from all the different state agencies to fill in. MR. SITTER: Were you able to get company level data from Texas? DR. KIRKENDALL: I don't know the details of which states they got them from. I think they think they are pretty complete in Texas. I don't think they get that on a regular monthly basis. DR. HENGARTNER: Knowing what happens in Texas, knowing that in Texas sometimes companies don't report or delay, would that appear as a blank in your data frame or as a zero? MR. KUNDRA: That may be one of the reasons. DR. KIRKENDALL: You're talking about monthly data. The 23 is annual. MR. KUNDRA: That is an annual survey. DR. KIRKENDALL: That's the frame for an annual survey. MR. KUNDRA: We don't have monthly -- no. DR. HENGARTNER: You don't have the same problems in Texas with reporting delays? DR. KIRKENDALL: Not yet. MR. SITTER: You're going to run into this problem, it appears. It seems like something that might be inherent in the kind of data you are asking for. DR. KIRKENDALL: It's true. We may have a problem. In fact the Federal Register notice for this survey has just gone out. It's today or tomorrow or Monday. So I'm sure that industry will probably come back and tell us they can't give us the data as soon as we want it. Stan went out on field tests to talk to respondents, so if you have questions about that he can talk about it, too. I think they said that while they might be able to do it in 45 days they weren't sure they could give it to us in 30 days, so the timing will be interesting. Remember, for our secondary data, the stuff that comes from Texas, there are two reporting problems.
The company has to get it to the state and the state has to get it to us. MR. SITTER: No, it may not be a serious defect. If they are already telling you they can't make the deadlines and you say okay, well, this is one you want me to report, do they have to make that deadline? If they come in a month late what happens to them? DR. KIRKENDALL: Actually nonresponse will be a problem, especially if it's some of the big ones. If it's a big certainty group, those are big companies, and we'll need to have most of them in or we won't be able to rely on the data. MR. SITTER: But you may want to at least consider the Texas experience; that is, it may be better to get them to give you preliminary results, depending on how they collect it. After all, for some of these companies, some of their delays may be real; that is, they've got to get information from a whole bunch of different wells or a whole bunch of different spaces. DR. KIRKENDALL: Stan, why don't you tell us what they told you when you talked to several of the operators? MR. FREEDMAN: In terms of timeliness of response, about half the companies thought that they could give us data within 30 days after the end of the report period, with this caveat. The caveat had to do with the accuracy of the information. Randy, you are shaking your head yes, so it's a common problem. There are lots of revisions that go on during the month. There are physical transactions that take place as the gas leaves the lease, which is the number that we want, and then there are reconciliations and financial transactions. They felt pretty comfortable about the physical transactions aggregated up to the state level. There was a lot of switching of volumes between wells and things like that. The companies tended to tell us that between 30 and 45 days they could report the data with about 90 percent accuracy. Now, this was a ballpark guess on their part because they had not done an analysis going back and looking at the historical information. There were a number of companies that said 60 days would be much more reasonable in order for them to reconcile their books and make sure the data were accurate to between 2 and 3 percent. The other problem with reporting in 30 days wasn't so much pulling the information off their computer systems, because they get that data in pretty quickly since it's important for financial reasons. But the process of going through the company and getting that data approved and signed off by the president of the company or some senior VP, especially the first few times the survey went out, is what would lengthen the process. One of the companies suggested, and this was in the Federal Register notice, that for the first few months of the survey we give them 45 days to respond and then we shorten it to 30, to give them time to get used to reporting to us. Now, maybe that's more information than you wanted? MR. SITTER: No, it's fine because it does answer the question. When you say 90 percent confidence you are talking about variability, not bias? You see, one of the things that was very clear in the Texas data is that one month after, it was too low consistently, and three months after, it was okay. So that means that the adjustments that occurred in revision over time were almost uniformly increases. MR. FREEDMAN: I don't think they had a clue as to what that bias would entail. MR. SITTER: No, in Texas they don't know. At the company level it may not exist.
That's really the question, because at the state level it could be the reporting of companies that's hidden underneath there. At the company level you are saying basically that the reporting of wells comes in instantaneously. MR. FREEDMAN: Certainly within 30 days. MR. SITTER: So the adjustments would probably be in both directions if it's based on transfers of stocks -- MR. FREEDMAN: Without analyzing the data, they tended to believe that those variations would wash out in most of the cases when they rolled the numbers up to the state level. They didn't think there would be a lot of moving of volumes from one state to another. DR. HENGARTNER: Another question, Phillip: these companies operate in more than one state or region. Do you account for that? MR. KUNDRA: Yes, that's why I said the sample size will not be 357. It will be 256, because we found that 42 of the -- companies were doing business in all the six regions; some of them were doing six, some of them were doing five, so they were repeated about 102 times. DR. KIRKENDALL: So a company will report in all of the regions it operates in, and if it's selected with a probability in one state they will still get to report for all their other states? MR. KUNDRA: Yes. The only difference will be they will be considered a certainty in the other states. MR. SITTER: But then you get them to separate out their data? MR. KUNDRA: Yes, we do that. MR. BREIDT: So you don't know much at all about the blanks and the zeros at this point? MR. KUNDRA: I don't. MR. BREIDT: Is there any hope of getting something? DR. KIRKENDALL: We just got something yesterday from Dallas. What they did was to match our sample of 100 companies with their list. We obviously haven't had time to look at it, but we do have at least the information that they have in their data set. Shauna has done a little bit of preliminary looking at it. MS. WAUGH: I looked at the Kansas Geological Survey information. So 21 operators were in the sample, of which I was able to find 14 on the website. Of those 14, this was a list of zeros; 12 of them were legitimate zeros and had been zero throughout a number of years, because the data set that they have goes back, I think, to the early '80s, maybe late '70s. The natural gas is zero. They report oil production. So they are oil wells and they don't have any natural gas production, so they are legitimate zeros for natural gas. The frame that they are using is for both oil wells and natural gas operators. There were two cases on the Kansas Geological Survey website where they had natural gas production, in one case for several years. So there was just something missing in EIA's frame for that particular unit. In another case the company started in 2001 and had no natural gas production. Then it went up and it had like 1,000 MCF. Then it went up in 2003. The estimated amount is 15,000. So I think it's a new well or a new company operating. One of the issues is how many of these companies are merging. Again, a lot of these, the 14,000 that were excluded from the sample, are potentially very small operators. MR. BREIDT: Yes. The worry, of course, is that some of them are very small. Those are no problem, but occasionally you get one that's huge, and if you are sampling these blanks and zeros with low probability they get a big weight, which means that you have an estimate with very high variability.
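A toy numeric illustration of the weight problem just described, using a Horvitz-Thompson-style estimator; the inclusion probabilities and production figures below are invented for illustration and are not from the EIA frame.

    import numpy as np

    # Invented example: ten sampled operators, their inclusion probabilities,
    # and their reported monthly production (arbitrary units).
    pi = np.array([0.9, 0.9, 0.8, 0.5, 0.5, 0.2, 0.2, 0.1, 0.01, 0.001])
    y  = np.array([500, 450, 300, 120, 100,  40,  35,  10,    5,     0])

    # Horvitz-Thompson estimate of the total: sum of y / pi over the sample.
    print("HT total, surprise unit truly zero:", round(np.sum(y / pi)))

    # Now suppose the last unit, sampled with probability 0.001 because it
    # looked like a "zero" on the frame, actually reports 200 units.
    y[-1] = 200
    print("HT total, surprise unit large:     ", round(np.sum(y / pi)))
    # Its single weight of 1/0.001 = 1000 adds 200,000 to the estimate,
    # swamping everything else and making the estimator extremely variable.

This is the motivation for handling the zeros and blanks in a separate stratum rather than leaving them in a low-probability PPS draw.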
So anything you can find out about the blanks and zeros would help, maybe stratifying on the basis of: are they an oil producer, when were they started? DR. KIRKENDALL: What we were going to do with the zeros and blanks is that these were not going to be part of the monthly survey. We thought we'd take a look and see what we can learn about them. It depends on what we find out, but if we find out that there's non-zero production do you have any ideas what we do? Do we add a constant and make an estimate for the missing piece? DR. BURTON: How many of these firms are there, particularly with the blanks and the zeros? MR. KUNDRA: More than half, a lot. MR. SITTER: You have data from the EIA-23. Is there any way to use that data to weed out those that are -- problems? You said some of these are oil. DR. KIRKENDALL: He said this was a sample from the same frame. MR. SITTER: Oh, I see. You don't have the whole frame? DR. KIRKENDALL: No, this is the frame for the 23. MR. SITTER: But you have data from that. DR. KIRKENDALL: Yes. MR. SITTER: Can some of that data identify, by auxiliary information, some of these zeros that are real zeros? They don't produce natural gas. DR. KIRKENDALL: Well, we've got information that will tell us that for at least some of it, but we haven't had a chance to look at it. I scanned down the list, and one of the things in the data set they sent us is the volumes that they have in their file for the last ten years. In some cases you can see that it wasn't very big and it trickled down to zero, so you can be reasonably sure that that's a zero and not worry about it. DR. HENGARTNER: These are wells, correct? DR. KIRKENDALL: These are operators. MR. KUNDRA: Operators. There might be more than one well then. DR. HENGARTNER: They might have more than one well and so therefore they might trickle down in -- and jump up again. DR. BURTON: One thing, if the number wasn't so large it would make this a little more tractable. One of the things that I would typically do in a setting like this is go back and look for employment data through state sources, or you can purchase that, although I'm not sure how. It may be available through census data. You can purchase firm-specific data. States can help you do searches on firms. If they don't have any employees then chances are production is zero. If they have 200 employees they are doing something. But that solution would be tractable if you had 1,500; with 15,000 I'm not sure that it's doable. MR. SITTER: Well, I think that the suggestion holds no matter what. Weed out what you can, and what you are left with you've got to put in a separate stratum. You can't put it in a PPS sample. You could get burned so badly. Then one of the things you may want to consider is what questions you might want to ask those people. Maybe you can learn something through your actual survey when you implement it, if you are going to sample some of these guys in a separate stratum. DR. KIRKENDALL: The trouble with sampling these guys is they haven't been responding to this, and you want a timely, fast-turnaround monthly survey. MR. SITTER: That's for the blanks. The zeros have been responding real well. DR. KIRKENDALL: At least they have been responding. MR. SITTER: Maybe when you start a monthly survey the first thing they come back and say is, well, we don't do natural gas. DR. BURTON: When was the frame created? DR. KIRKENDALL: In the spring. They have been running the 23 for years, so this is a frame they update. DR. BURTON: It was updated just recently?
DR. KIRKENDALL: It was updated just recently. In fact I think they did a fair amount of work trying to add information from state agencies. DR. NEERCHAL: What kind of information does the EIA-23 collect? DR. KIRKENDALL: It's reserves and production of oil and gas, and I don't remember what else. It's not a huge survey, so it's mostly reserves and production of both natural gas and crude oil. MR. KUNDRA: I think that if we believe that these are actual blanks and zeros that we have excluded from the frame, I have no problem with that. My main worry is that when we have selected a sample and have selected some companies with very small probability, and they turn out to be producing a large quantity of oil and gas, then we are going to be in trouble. That's my main worry right now. Because we already excluded them, because we were taking that for granted, and we have already selected a sample; what we are testing with the -- sampling is whether the zeros and blanks are actually zeros and blanks. That probably will give some verification and we can find out. But when we go to those 7,285 and we select a sample from there, and some of the companies which are very small are selected with probability proportional to size, with a very small probability, and they turn out to be -- that is the problem. That situation has been there. We have to look at that. DR. KIRKENDALL: One of the things we are going to do is we have a history of data from the 23. We have it for 2001 and 2002. Preston and Inder are collaborating on this. We are going to select samples from the frame and predict back to the 23 in 2001 and 2000 to see how good a job we'd do. So we might get an idea how many of these big changes happen with the small companies. Already, I think, Preston has found that you can't find everybody. It's a fairly dynamic industry. So it's going to be a little challenge. DR. BURTON: When they update the frame how do they eliminate firms that have exited the market? How do they know who has stopped producing and gone on to find new careers? DR. KIRKENDALL: I'm not sure. Bob, do you know anything about how they get rid of companies from the frame? MR. BRADSHER-FREDERICK: I don't. DR. KIRKENDALL: No, Bob King. MR. KING: I can see that as an awfully easy way to retain firms, operators, that don't belong. It's just hard to pick them up when they back up the truck and leave. DR. KIRKENDALL: I'm sure some of the blanks or zeros are companies like that, where they've just changed or gone out of business. You don't want to leave them off the frame just in case you don't have good information. MS. KHANNA: Do we know anything about the blanks, I mean, in terms of where they are located? I'm fishing here. If we know anything we could focus on a regional thing. MR. KUNDRA: You don't have the addresses? DR. KIRKENDALL: To be honest, I don't remember what's in the data set we got from Dallas. I don't remember an address, but there could be. History of production? MR. KUNDRA: That would tell us something. MS. KHANNA: And that would probably be the name of the parent company anyway. DR. KIRKENDALL: Or the reporting unit. MR. KING: On the 23 there is what is known as deaths, in other words where they have a confirmed fact that a company is not operating any more, be it that it combined with something else, in which case they will move it over, or be it some family firm where the person running it has himself died. You go back to them and you say okay, who's going to handle the property?
That's the sort of thing they try to chase down, to see who's handling the wells. If they send everything in and close everything down, there's a death. DR. KIRKENDALL: So presumably Dallas removed those from the list before they sent it to us. I'm sure they would have tried to do that where it was confirmed that it was gone. MR. BREIDT: What is the sampling fraction for the 23? DR. KIRKENDALL: That's a good question. They have a huge certainty group. There are 1,500 companies or something like that selected with certainty, so they get very high coverage, and then I think it's a probability proportional to size design after that. At least that's what John Woods said when I talked to him yesterday. When we looked at weights it wasn't clear. They weren't all equal, though. MR. BREIDT: But it is a probability sample from the entire frame? DR. KIRKENDALL: Yes. MR. BREIDT: So can you do this in two phases where you sub-sample the 23, or is that not practical? MR. KUNDRA: You can sub-sample it if you know how they selected the original one. MR. BREIDT: Don't you know that? MR. KUNDRA: No, I don't. It's not clear from what they've done. Nancy might know. I don't. DR. KIRKENDALL: John said it was probability proportional to size. MR. KUNDRA: But when you look at the weights -- MR. KING: When the 23 started out it was about one company for every eight in the random part, but they have changed it since then. MR. KUNDRA: Then that's not a probability -- this is one in eight -- MR. KING: When they started out the 23 they sampled one company out of every eight that were in the random sample. Several years ago they changed it and went to this probability. I wasn't involved in that so I don't know it that well. DR. KIRKENDALL: We haven't been looking at it as a sub-sample. It may be an operation that would be a good thing to do. For one thing they have the frame. They draw a sample for the 23 every year. You would think that having this be a sub-sample of the 23 would help those surveys because this will be a smaller sample. So that probably will be a good way to do it. DR. HENGARTNER: Is there a reason why you don't want to tag those questions directly onto the 23? DR. KIRKENDALL: It's only an annual survey. I mean, they report that data on the 23. They aren't exactly the same questions but -- DR. HENGARTNER: Essentially, if you already have a sample of companies in the 23, why not use that sample and go with it, or is that too onerous? DR. KIRKENDALL: It's too onerous. That's a sample of several thousand. It's too much for a monthly survey. MS. WAUGH: It's also for both oil and gas and we're only seeking gas. But a sub-sample could work. MR. BREIDT: Is the 23 a panel that's maintained year after year? I guess it's been adjusted a little bit. DR. KIRKENDALL: It's dominated by the certainty companies, and so they report every year. The big ones always report. I don't know how often they change the identity of the sample companies. I thought I remembered something about it changing every couple of years. Bob, do you know how often they change the probability part of the sample? MR. KING: Based on the deaths and stuff that occur, they draw a new sample each year, as far as I know, of those folks. MR. BREIDT: Is there any other discussion? So now is the time for public comment. Are there any comments? All right, hearing none, I'll go ahead and adjourn the meeting. Thank you. (Whereupon, at 4:20 p.m., the PROCEEDINGS were continued.) * * * * *