AMERICAN STATISTICAL ASSOCIATION (ASA)
ASA ENERGY STATISTICS COMMITTEE - ENERGY INFORMATION ADMINISTRATION (EIA) MEETING
Alexandria, Virginia
Friday, April 23, 2004

ASA COMMITTEE ON ENERGY STATISTICS:
JAY F. BREIDT, Chair, Colorado State University
NICOLAS HENGARTNER, Vice Chair, Los Alamos National Laboratory
MARK BERNSTEIN, RAND Corporation
MARK BURTON, Marshall University
MOSHE FEDER, Research Triangle Institute
BARBARA FORSYTH, Westat
NEHA KHANNA, Binghamton University
NAGARAJ K. NEERCHAL, University of Maryland
SUSAN M. SEREIKA, University of Pittsburgh
RANDY R. SITTER, Simon Fraser University

ALSO PRESENT:
COLLEEN BLESSING, EIA
ERIN BOEDECKER, EIA
HOWARD BRADSER-FREDERICK, EIA
TOM BROENE, EIA
PHILLIP BUDZIK, EIA
GUY CARUSO, EIA
STACEY COLE, Bureau of the Census
DAVE COSTELLO, EIA
STAN FREEDMAN, EIA
FRED FREEM, EIA
CAROL FRENCH, EIA
JANET GORDON, EIA
DOUG HALE, EIA
TAMMY HEPPNER, EIA
MELINDA HOBBS, EIA
RICK HOGUE, Bureau of the Census
PAUL HOLTBERG, EIA
SUSAN HOLTE, EIA
CRAWFORD HONEYCUTT, EIA
ALTHEA JENNINGS, EIA
DIANE KEARNEY, EIA
NANCY KIRKENDALL, EIA
TANCRED LIDDERDALE, EIA
EDWIN LU, EIA
RUEY-PYNG LU, EIA
KEN MARTIN, PricewaterhouseCoopers
PRESTON McDOWNEY, EIA
HERB MILLER, EIA
RENEE MILLER, EIA
KARA NORMAN, EIA
JOE SEDRANSK, Case Western Reserve University/EIA
TOM SPERL, EIA
YVONNE TAYLOR, EIA
PHILLIP TSENG, EIA
KEN VAGTS, EIA
SHAUNA WAUGH, EIA
BILL WIENIG, EIA
LORIE WIJNTJIS, PricewaterhouseCoopers
NATHAN WILSON, EIA

* * * * *

C O N T E N T S
AGENDA SESSION
Improving EIA's Website
Committee Comments and Questions
Revising Data Across EIA
Summary of Recommendations
Fall 2004 Meeting Suggestions

* * * * *

P R O C E E D I N G S
(8:36 a.m.)
MR. BREIDT: Good morning. I think we're ready to get started. First, I'd like to ask any committee member, guest, or EIA staff member who was not here yesterday to introduce themselves at one of the microphones. Is there anyone who falls into that category? Everyone looks pretty familiar.
MS. HOBBS: I'm Melinda Hobbs, Energy Information Administration.
MR. BREIDT: Also, if you didn't sign in yesterday, there's a sign-in sheet out on the desk. Lunch for the committee will be held at the conclusion of the session on suggestions for the Fall 2004 agenda. Lunch is where we had it yesterday. One other thing: the committee members have in front of them a notice from the Federal Register on the Monthly Natural Gas Production Report. Howard Gruenspecht wanted to get that to us, so it's just for your information. We won't be discussing it, but you can take a look at it. I'll now turn to the agenda. The first talk of the morning is "Improving EIA's Website: Creating a Vision for the Future," and that will be Colleen Blessing and Melinda Hobbs.
MS. BLESSING: Good morning. How are you? This morning we're going to talk for a few minutes about EIA's website. Our presentation is going to be in two parts. I'm going to talk briefly about the next steps we're going to take in web development, and then Melinda Hobbs is going to show you some recent improvements we've made. You've heard me talk before at ASA about EIA's website. You know that we're very proud of our site. We have 100,000 pages, and we have a lot of users. Actually, in March we hit a new record, 1.5 million unique user sessions, just for the month of March. We've been going 1, 1.2, 1.3, so we know we have a huge customer base out there. You've probably heard us talk about it. We've won awards. We have a lot of products that are very popular, but there's a but. We get a lot of customer feedback, and we know that there are still a lot of areas for improvement. The last time we redesigned our site was in 2000, which in the web world is like 1820 or something, so we figured we needed to take another look, consider some options, and see where we might go in the future. We've undertaken a project to take an objective look at our website, to step back and see where we are.
We have a contractor who's helping us, and we're doing a series of things. The contractor has actually completed interviews with internal staff. We interviewed a total of 28 people in EIA, and we're also interviewing some natural groups within EIA. Those could actually be separate people, too, such as a web team and a usability team. The third group interviewed is external customers: people on the Hill, the media, and trade associations. We didn't pick any academic people yet, because we're hoping the feedback we get from you will be considered the academic outlook on our website. So we're hoping that at the end you'll have some interesting comments about our site.
What we're going to do is create a matrix of the themes. That's what the contractor is doing right now: creating a matrix of the themes we've heard. The interviews were very open-ended. Sometimes you think of an interview as "on a scale of 1 to 5, how satisfied are you?" We didn't do any of that. We just said: what works, what doesn't, and what would you improve? Go. Some people went on for an hour and a half, showing screens and explaining things that work and don't work, so it was very open-ended and, I thought, very productive. A lot of people had a lot to say. Without tipping our hand (the cards will probably be tipped at the next ASA meeting), there's a lot of agreement about the good things and the areas where we need to improve.
The next step is a two-day workshop on May 4th and May 12th, so those dates are coming up. The contractor is also going to help us with the workshop. The people invited are, according to the deputy administrators, the key web leaders in EIA, the key players. This is not an educational workshop or a workshop for people to come, listen, and learn. I said to him yesterday that we're emphasizing "work" and not "shop." It's going to be a working session.
The contractor is going to present what the internal people said, what the external people said, the intersections and the differences, and then I think we're going to look at three basic areas. One is user interface navigation: whether it's easy or not so easy for people to find what they're looking for. Another is how the site is managed. Right now it's managed loosely by a committee, and there may be suggested changes to tighten that up. Also, the deputy was talking yesterday about maybe looking at low-hanging fruit, and there are some low-hanging-fruit items where I think we can make improvements. But there's also what I would call some high-hanging fruit, such as content management systems or databases, things that are more of a reach for us that we might want to look at, or at least put on the priority list or the options list.
We've agreed with the contractor that at this workshop we are not redesigning the page. We are not actually voting on anything. We're just coming up with options and, I think, prioritizing them. Then those will be presented to the powers that be to make decisions about where we're going to go in the future. There is a plan in place, and it's coming fairly quickly, to move ahead with some minor and major changes. It's not as if we haven't been doing anything over the last year or two. Melinda Hobbs and others have made a lot of improvements, and we're very proud of some of them, so we wanted to show them to you. After Melinda's done, there will be about 15 minutes for you to comment on our site, and we'll be furiously writing those comments down and sending them to the contractor as input from the ASA Committee. Thanks.
MS. HOBBS: Now I get to show you some of the good things we've been doing. Among the recent improvements we've made: we have standards.
We have a whole list of standards for what we want everybody to do when they put out a page. We want them to have a logo. We want them to have breadcrumbs, which let people move back and forth between where they are and where they've been. And, importantly, we want the dates of the data. A lot of the publications and tables we have out there didn't show the date the data were collected, the date they were published, or when the next release was coming out. We had a site audit in 2000, and we're going to have another one this year to see whether people are following the standards and which parts are not working.
We've solicited feedback from the staff. We sent out three questions EIA-wide: what do you think is working, what do you think is not working, and how can we improve it? We got a lot of good feedback, so we're really pleased with that. That was in addition to the interviews we've done. We've worked on transitioning from paper reports to web reports, and on writing for the web: we've had classes teaching many of the authors how to write better for the web, and we've worked to improve the user interface.
Here's one of the publications we had out there. This was last year's Annual Energy Outlook. While it was much better than the year before, people still couldn't really find the tables; this left sidebar is a jumble of information. You don't really know that this is part of a publication. The after version, this year's Annual Energy Outlook, which just went out, has more of a publication feel. You can tell that here's the publication and here's its table of contents. The tables are still right on top, so people can find them pretty easily. If we go to the site, we can see a little better what's going on. We broke out the table of contents for the publication, and then there are all the related links that used to be jumbled on that left sidebar.
Now they're all split out, so it's much easier to read. Another thing people really liked: if you go to one of the pages, say, the electricity page, all the graphs now have the figure data right here, so people don't have to call. We used to get a lot of calls asking what the data behind a figure were. That was a good change, and everybody had some good comments on it.
This is a before shot of the Country Analysis Briefs. I'm trying to give you an overview of a publication. This is like a subsite, one of the areas on the EIA site. The focal point of the page is the maps, and people really weren't seeing what was on the left and right sidebars, so we wanted to get rid of those. This is the after. A lot of people weren't really geography-oriented. This way you can see what's under each region a little more easily, and you can see which CABs, Country Analysis Briefs, are available for which regions. Also, on the previous version of the Country Analysis Briefs, there was another drop-down under here that contained all this information. One of the big things buried in that long list was the OPEC brief, which is often one of the most requested items on the site. So we wanted to bring it forward and organize it a little better.
This is an all-new layout, a data layout. They had all their natural gas data in different publications, so they wanted to bring it all together into a layout where you could see everything on natural gas. All of these items are natural gas related, and it's all the data they collect. Everything is right here. If I go to the site, you can see that in different places they have all these different things. It has more of a database feel than a publication feel. Another nice thing they have is the series: you can see how far back each series goes.
You can click on it and get a complete Excel version of everything.
DR. HENGARTNER: If I wanted several of the series, do I need to click on each one of those links?
MS. HOBBS: Yes.
DR. HENGARTNER: It would be nice to have a little check box: I want this one and that one and that one, check them, and then you download them in one file.
MS. HOBBS: Make a note of that. This is "data and information," another layout that we have. A lot of the publications, even the data publications, have a lot of information behind them. Let me go to this and then to renewables. If you came into geothermal heat pumps and clicked on this, you got this page. When you clicked on geothermal heat pumps, you came into a piece of a publication. I mean, you clicked on geothermal heat pumps and you landed on a page that said geothermal shipments. Is this really what I wanted? Is this what I was after? And on this side, chapter 1, chapter 2: is that really descriptive? The after version is much better. You click on geothermal heat pumps and you get a title that says geothermal heat pumps. When you click on something, you really want to get what you thought you were getting. If you scroll down, you get a good description of what geothermal heat pumps are, a nice little picture of one, then more information you may want about geothermal heat pumps, and here are the shipments from the Form EI. So it's much more descriptive, it's laid out much better, and you get the information you wanted. It's not a high-tech thing; it's just laid out well and organized, and visually it's exactly what you want to see. We want to get your comments.
DR. NEERCHAL: About what you were just mentioning, what the other chapters are: when you drag the mouse over the chapters, you could easily make the chapter heading pop up.
MS. HOBBS: You could, or you could just have the chapter heading there instead of "chapter 1."
DR. NEERCHAL: If you want to -- just show it only when the mouse is over it.
DR. FEDER: If I go into Google and type the words "energy" and "statistics," how likely am I to get to this?
MS. HOBBS: That's the site. You're going to have to try that. I don't know about "statistics." I know that with "energy information" you'll get us.
DR. SITTER: -- have to do with where you're coming from -- coming from here.
MS. HOBBS: Well, we're number one.
DR. SITTER: But you're hooking up to the web in Washington, D.C.
MS. HOBBS: It doesn't matter.
DR. SITTER: Oh, no, it does. It matters. If I go in from Canada it's not as likely --
MR. BERNSTEIN: Speaking of Google, I assume you have no choice in the matter. I assume you don't have a choice of search engine.
MS. HOBBS: That will be something we'll have to take into consideration if you want to make that suggestion.
MR. BERNSTEIN: The first search engine is absolutely -- no matter which agency you go to, the search engine is just bad. You look for something and it gives you totally irrelevant stuff, and it's not your fault.
MS. HOBBS: Well, it is our fault as well, because a lot of our pages don't have the correct keywords. We do have a lot of improvements to make, but we've gotten a lot of those comments: get rid of the search engine.
MR. BERNSTEIN: When you were going through the AEO, the graphs are now up with the text, as opposed to having to click on them every time you want to see the graph with the text.
MS. HOBBS: Can you go back to the AEO?
MR. BERNSTEIN: Sorry, I walked in late.
MS. HOBBS: That's fine.
MR. BERNSTEIN: It always bothered the hell out of me. Every time you're reading through it: go to figure 1, go to figure 2. Can't I just see it while I'm reading?
MS. HOBBS: See, the graphs are right with the text now, and you can actually get the data behind the figures.
DR. HENGARTNER: But I think what Mark is saying is the opposite: you go to a data table, click on it, and the graph comes up.
MR. BERNSTEIN: No, that's not what I was saying at all.
DR. HENGARTNER: But that's what I'm saying would be cool, too. If you go to a data table and you click on it and if --
MS. HOBBS: Here's a data table. You can get a printer-friendly version of it if you'd like.
DR. HENGARTNER: Some tables lend themselves to it; some don't.
MS. HOBBS: You want to click on this and be able to graph that table?
DR. HENGARTNER: I mean, if you have a time series, I want to click on it and have the option to either download it or see the time series plotted.
MS. HOBBS: We're working on something like that. It's not quite ready.
DR. HENGARTNER: And I'd even like to be able to overlay several time series on the same plot. I'm thinking of stock prices: I want to see the Dow Jones and I want to see Microsoft. That would be cool. I'm not sure how useful, but cool.
MS. KHANNA: Sometimes I actually want to just pick out the graph, because I want to make a transparency out of it, let's say, for my class. In this format that would be very hard to do. You'd have to go to a data table and then --
MS. HOBBS: You can click on it and get the graph.
MS. KHANNA: Excellent, thank you.
MS. HOBBS: Sure.
MS. KHANNA: The other question I had: as a user of publications, I'm often looking for the technical appendix, how something was defined, how it was calculated. That was not obvious on your site.
MS. HOBBS: It's right at the bottom. The appendices are right here.
MR. BREIDT: Somewhere up on the right it said "detailed assumptions."
MS. HOBBS: Yes, that's a supporting document, the supporting assumptions for the entire AEO. Those were the appendices for just that.
DR. BURTON: I have always enjoyed the EIA website. I've always found it to be one of the more productive federal sites I've used. You've taken something that was good and made it even better. But the problem is that when you do that, no good deed goes unpunished. Now I want more.
You showed us all of the natural gas information in one place. I don't use documents nearly as much as I use data. So being able to gather data from disparate sources and have them in one place, having someone combine them: when you do something like that, you save me a great deal of work, and that makes me happy. So the more you're able to pull disparate data sources together into a single place, the more useful the site is to me.
MS. HOBBS: Good, we'll try to do more of that. Any more?
MS. BLESSING: Anything else about any parts of the site, the navigation?
MS. KHANNA: I haven't looked at your site in the last month or so, so I'm a little out of date, but it used to be that you didn't have long time series on certain variables, for example, crude imports and exports. They went back only 10 years, whereas I know the data exist going back to the early '70s, and even earlier in some cases, for the US. With the international site I used to find that very frustrating, because I knew the data were there in print (I have the old EIA reports), but they're not on the web. I realize why: they were printed before the web existed.
MS. BLESSING: They may not be available.
MS. KHANNA: And so someone is probably going to have to key in those data. That would be really nice. No, grad students are not going to be willing to key in data for me. That's not what they do any more. And with scanning, the 2s become 3s. I mean, this is small print.
DR. BURTON: Scanning is fine with letters, but with numbers you have to go back and verify it.
MS. HOBBS: Thank you very much.
DR. FEDER: Can we ask one question? Is this a general trend, to have more and more on the web and less and less in print?
MS. HOBBS: Yes. We have only a couple of publications, like the Annual Energy Outlook, that are supposed to stay in print.
DR. FEDER: If people write an academic paper and need to reference this, what guarantee do they have that the same publication still exists and hasn't been superseded by a revision? Then you have a problem, because you're citing something and now you're proven wrong because it has been changed. There's a bit of a problem there. I don't use energy statistics much, but if I write "see such and such, downloaded on this date," and it's not there anymore... So I think there needs to be some kind of archive, so that people who cite these things will not be in error. I hope I'm saying this right.
MS. HOBBS: Most of our publications are archived, so you can go back.
DR. FEDER: I can get the reference?
MS. HOBBS: Yes, but the data are not archived.
DR. HENGARTNER: So you don't freeze the data every six months, like with software? Software evolves, they freeze it, and then that's it for the next six months. It stays the same.
MS. BLESSING: That feeds into the next breakout session, revising data. There's a question there that we're asking for your input on.
DR. FEDER: That raises an issue: if somebody makes an inference based on data that are no longer available, how can you validate -- someone might go back and say, wait a minute, Professor Khanna, you're wrong.
MS. KHANNA: Actually, that happens a lot, I know, with EIA data. They're always dynamically updating it. It's been a real issue when somebody uses a data set in a published work and it's not in the journal's data bank. They just say it's available on the web, just go there. Six months later the data are different, and you cannot replicate the results.
MR. BERNSTEIN: Data are always being updated, and that happens all the time in every field: you're using real-time data, you do something, and three months later they revise the statistics.
MS. KHANNA: No, there's a slight difference. If it were in print, I could see what the revision was.
MR. BERNSTEIN: But they're not printing them either.
DR. HENGARTNER: That's a problem there, too?
MR. BERNSTEIN: It's happening all over. The academic reference situation is going to change with it. I mean, basically it already has. Some of the academic journals have changed their bibliography requirements, and now you can cite websites. You just have to include the date, and it's okay if it changes.
DR. NEERCHAL: I think one of the best recommendations is to download a copy and keep it with you -- but I do have a specific question. When you discontinue printing a particular document, do you keep a PDF or something of what would have been printed?
MS. BLESSING: Yes.
DR. NEERCHAL: If I send an e-mail saying, can you please send me that particular version, so you have a unique document --
DR. BURTON: Another related issue: a lot of times, in the case of the data, not so much the publications or the figures within them, it's really hard to figure out exactly how to cite it. Say you have a data table of coal prices by region. How do I cite that? I'm sure it's part of something else and it's just coming out as a data table, but I need a specific way to cite it so that somebody else can come back to it. I can't just give them the web address. I need to give them a more tangible source -- and sometimes that's a problem.
MS. BLESSING: You were saying -- going away from the publications, you like the Natural Gas Navigator look, where it's more of a data presentation, but that leads into your problem of not having a publication to cite, because all you've got is a web page.
DR. BURTON: I don't necessarily need a publication. I just need something more substantial than the website, something that says these data represent data collected on EIA Form whatever, just something I can attribute these data to. It doesn't have to be a publication, but something I can --
MS. HOBBS: And that's another problem, though, because a lot of those data are collected from different sources.
They come in at different times. They're revised at different times. Each cell would have to have little notations in it.
MS. BLESSING: I know the Navigator people are working really hard on this, because they said there are different levels of data: some of it is state, some of it is national and regional, and some of it is quarterly, so when you pull it in it might be coming from different sources.
DR. HENGARTNER: Are you saying you have data on your web page that are not collected by EIA?
MS. BLESSING: No.
DR. BURTON: It's from a bunch of different sources.
MS. HOBBS: Different surveys.
MS. BLESSING: I'm sorry, different surveys, so there might be a weekly survey, a monthly survey, and an annual survey.
MS. FORSYTH: You were talking earlier about setting standards for individual pages. Maybe one of the standards you could include is a suggested citation, so you'd have a standard across EIA for how individual pages should be referred to, or at least have the areas developing the pages develop the citation as well.
MS. BLESSING: That's a good suggestion.
MS. HOBBS: Yes.
DR. SITTER: And those should be tied to your archiving; that is, when someone makes a reference to a page and comes to EIA and says "I want this one," it actually refers back to that archived page, not an updated version of that page.
DR. HENGARTNER: It goes back to freezing it once a month.
DR. SITTER: That's right.
MS. SEREIKA: Couldn't part of the name of the page just be the date it was created?
MS. HOBBS: No, because we like to keep the same page name. If somebody bookmarks a page and it's renamed when it's revised next week, they're lost.
DR. SITTER: Nick wants to say something.
DR. HENGARTNER: According to the agreement between -- we need a frozen version, say once a month, that's archived, maybe as a downloadable CD ISO image. I mean, really make a CD, why not? People can download the whole CD image, and there it is.
And you can navigate your CD offline.
DR. FEDER: I mean, it needs to be something that can be referenced. If it's a moving target that could change any time, any day -- I recognize it should be available in some dynamic form, too, but for the purposes of citation and referencing you need something that's more stable. So a compromise would be -- and the suggestion is to freeze it, or maybe have two versions, one of which can be the reference -- most publications require a source, and the source has to be --
DR. BURTON: We realize that we're setting the bar extraordinarily high, but we were invited to do that. As somebody who's slightly organizationally challenged, I wouldn't want to get anywhere near this.
DR. SITTER: But it's a real problem. Many of the statistics journals won't let you refer to a web page. They just won't, because they know how unstable web pages are. It's just not an option. So if the only place I can get at something is a web page, I can't use it. I can't refer to it. I know some agencies more or less put publications on their web pages, and everybody knows what they are, but the citation looks legitimate -- it's a problem if all you've got is the website.
MS. HOBBS: Do you have an example of one of those?
DR. BURTON: I'll find one and e-mail it to you.
MS. BLESSING: I was also wondering if maybe one of you could come up with an idea. You said maybe we should have a standard. What would that look like to you? What elements would it have?
DR. HENGARTNER: You freeze it; you have a CD.
MS. BLESSING: But that still wouldn't have the citations at the bottom.
DR. HENGARTNER: You can cite the CD. You cite the CD, then the table number and the date.
MR. BERNSTEIN: Perhaps you should just go ask the statistics journals what they would like you to do.
DR. BURTON: The SEC does a very good job with their data in this regard.
They're not any better in any other regard, but in terms of being citable, the SEC material tends to be the easiest to work with. I'll find some examples of that. In the SEC data there's something called state link that contains a huge number of data sets. I think if you look at those, it will give you some sense of a different way to make these things citable.
MS. BLESSING: You said state --
DR. BURTON: State link. And they hide it. It's almost impossible to find.
MS. BLESSING: Another challenge, so don't do what they do.
MS. KHANNA: When you integrate things into a database format, as you did for the natural gas page, why not give it a name? Call it the natural gas database of the EIA, or something like that, and then people can cite it as a database, along with the date they downloaded it.
DR. FEDER: You could also use a volume and issue number, so every month you have a new issue number -- an ISO image?
MS. BLESSING: It's not every month, because the Natural Gas Navigator material keeps coming in. I mean, their data tables are changing daily.
DR. HENGARTNER: But what I'm thinking is that you just freeze it every so often. Software gets updated continuously, but every so often it gets frozen, and you say this is the state of the art as of today, and if you need the reference in a month, come back in a month.
DR. BURTON: Even quarterly would probably -- you're still going to have monthly series. Locking it down once a quarter as opposed to once a month is a lot of work.
DR. SITTER: For most methodological publications you want the data sets. It's not important that they be today's. They can be from three months ago. That's fine. But the point is, if you say that when you do this to this data set, this is what you get, somebody else wants to go and check whether you're right, in case you made a mistake, and to play with it and try other things on the same data set. But if the data set has changed, they can't even repeat what you did.
How can they then compare what you did to something else?
MS. BLESSING: I like that. That's good -- usability is what makes it --
DR. FEDER: People who download -- I had a -- with JBS. When you submit a paper, they give you a password to the site so you can post the data set you used. Do you have any objection to people doing that? Do you give people permission to upload data they downloaded from you? Is there any copyright or intellectual property issue?
MS. HOBBS: I don't think so.
DR. FEDER: It has to be stated, because otherwise we would be in violation of your rights.
MS. BLESSING: No, we have no copyrights. I think what we do is ask that people cite us.
DR. FEDER: You have to do that anyway.
MS. BLESSING: But some people write to us and ask, can I use your stuff, and we say yes. Actually, that policy is hidden on our website, but yes, it does say you can use everything; you just attribute it to us.
MR. BREIDT: Were there any other comments?
DR. SITTER: Are we behind schedule now?
MR. BREIDT: So we can now move to the breakout sessions. Going downstairs to the session on survey quality assessments of EIA will be Nick, Barb, Mark Bernstein, Neha, and Susan. Everybody else, stay here.
(Recess)
MR. BREIDT: Are you ready?
MS. MILLER: Good morning. I'm Renee Miller, and Alethea Jennings is here with me today. We're both glad to be here to talk with you and to share our challenge of revising our data together across the agency. I was going to start by telling you what we mean by this and why it's important, but you've already expressed some ideas on this. From our point of view, one of the reasons it has become important is that we have more of what we call data dependencies. By this I mean that offices are using more data from other offices in their publications. I'm using the term "publications" rather loosely. These aren't necessarily hard-copy publications. They can be publications on the web.
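To make the idea of data dependencies concrete, here is a minimal Python sketch, assuming a single shared store of series values that every publication reads from when it is rendered. The class, the series identifier, the publication names, and the numbers are all hypothetical illustrations, not an EIA system or EIA data. Because each publication reads from one store instead of keeping its own copy, a revision entered once is what every dependent publication shows, and the store can report which publications need to be refreshed.

```python
from dataclasses import dataclass, field


@dataclass
class SeriesStore:
    """Hypothetical single source of truth for published data series."""
    values: dict = field(default_factory=dict)     # series id -> {period: value}
    consumers: dict = field(default_factory=dict)  # series id -> set of publication names

    def register(self, series_id: str, publication: str) -> None:
        # Record that a publication depends on (displays) this series.
        self.consumers.setdefault(series_id, set()).add(publication)

    def set_value(self, series_id: str, period: str, value: float) -> set:
        # Store a new or revised value and return every publication that
        # must be refreshed so they all show the same number.
        self.values.setdefault(series_id, {})[period] = value
        return set(self.consumers.get(series_id, set()))

    def render(self, publication: str, series_id: str, period: str) -> float:
        # Publications read through the store rather than holding copies.
        if publication not in self.consumers.get(series_id, set()):
            raise KeyError(f"{publication} does not publish {series_id}")
        return self.values[series_id][period]


# Made-up values: one series used by two hypothetical publications.
store = SeriesStore()
store.register("elec_gas_consumption", "electric power publication")
store.register("elec_gas_consumption", "natural gas publication")
store.set_value("elec_gas_consumption", "2004-01", 101.3)
affected = store.set_value("elec_gas_consumption", "2004-01", 101.4)  # last-digit revision
```

In this sketch a last-digit revision like the one described in the talk would reach both publications at once, since neither keeps a separate copy on its own processing cycle.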
Not too long ago Bob Schnapp and I, for example, told you about how we're now obtaining data for the electric power sector from electric power surveys. These data are being used in natural gas publications, coal publications, renewable publications, and our integrated statistics publications, as well as the electric power publications. So you can see how a change to electric power data could affect many different areas, and in fact that's what happens. We had a change to our data on natural gas consumed by the electric power sector, and it was picked up in an electric power publication but not in a natural gas publication, because the two had different processing cycles. Now, we're not talking about a big change here. We're talking about the last digit, but nevertheless it made us think that maybe we could do a better job of coordination. We were concerned because, with the web, users can more easily obtain data from different publications and notice if they don't match, and, as you've pointed out, this could cause some confusion. So by "revising together" we mean the coordination that would enable a revision made to a data series to appear wherever that series is published, to avoid this type of confusion.
I'm going to start by telling you about our current revision standard, which was initially developed before we had as many data dependencies as we do now. In fact, an earlier generation of this committee provided us with input in the 1980s that led to the development of the standard. I remember this pretty well because my very first presentation to the committee was on revisions. So I'll tell you a little about our current standard, and then I'll tell you about some of the discussions we've had on coordinating revisions. These are discussions our interoffice issues group has had. Some people from the group are here.
We have Roy Kass, for instance, sitting in Johnny Blair's seat, and Susan Holt is in the back. I'm going to tell you about an opportunity for interoffice coordination that we found concerning revisions to final data, and about our next steps. Then Alethea is going to talk with you about issues pertaining to revisions before the data become final. She'll show you an example of her revision trail, with actual data that will give you an idea of some of the challenges we face. An issue that comes up every time we talk about revisions is whether we can make better use of the web, and you've touched on some of these issues. The specific question is whether we can show the latest data someplace on the web and still keep our official statistics someplace else. Alethea will get into that, and then she'll end with our questions for you. So, to start with our current revision standard: it says that we should establish a schedule for our anticipated revisions, make that schedule available to users, and plan for no more than two revisions of the same survey data cell. But it does allow additional revisions to correct errors, and these could be based on what we call threshold criteria developed ahead of time. For example, an office could decide that if a revision would change the national-level data by 1 percent, it will go ahead and make that revision, because that's a big enough change. The standard also allows revisions at the discretion of the sponsoring office director. So you can see there is a fair amount of leeway. This standard basically addressed data before they were declared final, so it didn't really address modifications to final data. In our discussions about how we could coordinate better, we talked about both scheduling revisions and limiting the number of revisions.
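A threshold criterion like the 1 percent example above can be written down as a simple rule. This is a minimal sketch. It assumes the criterion is applied to the absolute percent change in the national-level value and is met at or above the threshold. The function names and the default threshold are illustrative, not EIA's actual rule.

```python
def percent_change(old: float, new: float) -> float:
    """Percent change from the previously published value to the revised value."""
    if old == 0:
        raise ValueError("percent change is undefined when the published value is zero")
    return 100.0 * (new - old) / old


def meets_threshold(old_national: float, new_national: float,
                    threshold_pct: float = 1.0) -> bool:
    """True if the revision moves the national value by at least threshold_pct
    percent in either direction (assumed rule; 1 percent mirrors the example)."""
    return abs(percent_change(old_national, new_national)) >= threshold_pct
```

Because the threshold is fixed before any data arrive, the decision to revise does not depend on whether a particular change happens to be convenient.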
What's happening now is that we're each focusing on different aspects of the standard. I think that's the best way of putting it. For example, for our monthly data, the petroleum supply and petroleum marketing areas schedule their revisions. Both publish a preliminary estimate based on the survey data. The estimate is revised the following month, and then they hold their revisions until the end of the reporting cycle. Both also publish estimates before publishing their survey data. In petroleum supply, those estimates are based on weekly data for selected categories, and Ruey-Pyng told you a little about that yesterday. In petroleum marketing, they publish a forecast before publishing their survey data. Publishing a forecast ahead of the survey data wasn't done when we initially developed our standard, so the standard didn't address it. In the natural gas area they rely mainly on threshold criteria to make their revisions, and electric power does something in between. Its policy is to publish a preliminary monthly estimate that is not revised until the end of the reporting year unless there are large errors. Back in the days when the electric utilities were our only respondents there weren't large errors, so we didn't have many revisions. Now that we have other players, and there are more complications due to the restructuring of the industry, we do have more revisions in that area. We also found out what other agencies are doing for their monthly establishment surveys. We contacted the Bureau of Labor Statistics and the Census Bureau. We found, for example, that for the Current Employment Statistics survey BLS schedules its revisions and publishes a preliminary estimate. The estimate is revised again the following month, and then revisions are held until the end of the reporting cycle. We found that Census does something similar. They have additional revisions, though, to benchmark to their annual survey and also to the economic census.
So both of these agencies schedule their revisions, but they have more revisions than our current standard allows. In our group discussions we couldn't reach consensus on scheduling revisions or on limiting the number of versions of a number that we should publish. We thought our situation was actually different from Census's, because we can publish facility-level data, and in some situations we do, whereas they can't. Our concern was this: if there were errors at the respondent level and a respondent resubmitted data, failing to show the corrections would be noticed and pointed out to us, and we didn't think we could justify not making them. An area where we did think more coordination would be worthwhile, though, was revising data that had been declared final. We thought that in this situation, anybody who wanted to reopen the data and make a revision really needed to make a case for it. We also thought we should all agree on whether to make the change because, after all, these are EIA's final data. So we came up with a scheme; I think you have a little flow chart of it. It's really not complicated. The basic philosophy is that revisions will no longer be made in isolation. If an office finds a change that must be made, the other offices must be notified and must agree on the pending change. If there is disagreement, we'll talk about it in our interoffice issues group. So our next step is going to be to issue the policy on revising "final data" and to update our revision standard to reflect that we're no longer making revisions in isolation. But now we have a whole other area to address, and that is the situation before the data become final. Now I'll turn it over to Alethea to show you an example of some of the challenges we have there and to ask for your help. Thank you.
MS. JENNINGS: To illustrate some of the concerns that we have with publishing revised data across EIA, we're going to look at an example of a revision trail. We selected a data series that we believe demonstrates our occasional need to revise data frequently: natural gas delivered to residential consumers. Within EIA we also often refer to this as natural gas consumption in the residential sector. The reporting period that we're going to look at very closely is January 2002. The data are taken from the Natural Gas Monthly publications from April 2002 through January 2004, which lets us follow the revision trail over a span of 22 months. What you see here is a portion of Table C-1, which was included in the paper posted on the website. It shows natural gas deliveries at the national level for each monthly reporting period in 2002. You'll note that for each reporting period the data were revised several times within the first few months of publication. We're going to take a closer look at our selected period, January 2002. Looking closely, you'll notice that the data were revised every month for the first six months following initial publication. Data for January are typically revised more often than data for other months, because January is when new respondents are added, and they have not yet been trained. It takes time to train new respondents and also to resolve any existing response issues. But it is important to note that the data for any reporting period could be revised for a number of reasons. For example, respondents could submit late. EIA could find errors and, as we normally do, want to correct them. Respondents also often resubmit data. In addition, the monthly data are benchmarked to the annual data when those become available, and this causes revisions as well.
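The revision trail for January 2002 can be summarized with two numbers: how many editions changed the value, and how long the value took to settle. This is a minimal sketch. It assumes a trail is stored as the sequence of values one cell took in successive monthly editions. The numbers in the check below are made up for illustration; they are not the Table C-1 values. "Settling" here means the first edition after which the value never changed again, which is one simple way to measure the steady-state time for a table.

```python
def count_revisions(trail: list[float]) -> int:
    """Number of editions whose value differs from the previous edition's."""
    return sum(1 for prev, cur in zip(trail, trail[1:]) if cur != prev)


def editions_to_settle(trail: list[float]) -> int:
    """0-based index of the first edition after which the value never changes again."""
    if not trail:
        raise ValueError("empty revision trail")
    settle = len(trail) - 1
    # Walk backward while the earlier edition already showed the final value.
    while settle > 0 and trail[settle - 1] == trail[settle]:
        settle -= 1
    return settle
```

For a hypothetical trail revised in each of the six editions after first publication and stable thereafter, both functions return 6.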
This is a portion of Table C-2, which was included with the paper on the website. This table demonstrates the impact of state-level data on national-level totals. It shows the publication date; the state whose data were revised; the value as published in the previous edition; the revised value; and a percent change column. That column expresses the change as a percentage of the total volume for that state. To specify some of the challenges we face with the state data: first, some revisions amount to 10 percent or more of the total volume for a state. We believe users of state data would consider this significant, so we would definitely want to publish it. As I mentioned earlier, revisions occur for a number of different reasons. In some instances, such as the State of Washington in the May 2002 publication, an NA is replaced by actual reported data. NA stands for not available. At other times respondents resubmit, and when we correct errors you will see revisions as well. Now, a significant change in the state-level data often does not affect the national total. This is sometimes because an NA is being replaced by actual reported data. For this particular data series, an NA is published when the data reported for a state fall below a predetermined percentage of that state's total volume. When EIA has received enough data to meet the publication requirement, the NAs are lifted and replaced by the actual reported data. Behind each NA is an imputed value that is not shown in the table but is included in the national total. So the total does not change when the imputed values closely match the data that are eventually reported.
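The NA mechanism just described can be sketched as follows. This sketch assumes that a state cell is published as NA when the share of its volume actually reported falls below a coverage threshold, and that its imputed estimate still enters the national total. The 0.8 threshold, the state labels, and the volumes are hypothetical; the text says only that the cutoff is a predetermined percentage. The sketch shows why the published components need not sum to the published total.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateCell:
    state: str
    estimate: float  # reported volume plus imputation for missing reports
    coverage: float  # share of the state's volume actually reported (0 to 1)


def publish(cells: list[StateCell], min_coverage: float = 0.8
            ) -> tuple[dict[str, Optional[float]], float]:
    """Return the published state values (None means NA) and the national total.

    The total always includes every state's estimate, whether or not the
    state's own value is suppressed (min_coverage is a hypothetical cutoff).
    """
    published = {
        c.state: (c.estimate if c.coverage >= min_coverage else None)
        for c in cells
    }
    total = sum(c.estimate for c in cells)
    return published, total
```

With hypothetical states A (500, fully reported), B (300, only 60 percent reported), and C (200), the published values are 500, NA, and 200, while the total is 1,000.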
But when a change at the state level does affect the national total, our current practice is to publish the revised total, and that's what we do. Since we're exploring the possibility of limiting revisions, we realize it may be necessary to reconsider this practice. I'm going to ask you to look at Table 15 in the handout you have before you. Table 15 is an example of what state data users will see when the state-level data change. This table shows natural gas deliveries to residential consumers at the state level. The first column shows data for January 2002. You'll notice two revisions noted there, one for the State of Idaho and the other for the State of Nevada. These revisions do affect the national total, so we have published the revised total at the bottom. However, because NAs remain in the table, the sum of the components does not equal the total. If we opt to limit revisions, we realize we could consider not revising the total until all of the NAs have been lifted. EIA is considering whether this is really an acceptable practice. So as you respond to our questions at the end of this session, we'd like your feedback on whether it is ever acceptable to publish data where the components do not sum to the total. Beyond the coordination of revisions and the discussion of our current revisions policy, it has been suggested that we make better use of the website by publishing the latest available data there; this was also discussed earlier this morning. The premise is that our users would benefit from having the latest data available to them at all times. There are, of course, advantages and disadvantages to adopting this practice.
The advantages are that errors would be corrected as soon as possible and that our users would have the latest data available to them, so EIA wouldn't have to scramble to produce those data when a request comes in. I also want to point out that a number of users believe that if an error is found, it should be corrected and published as soon as possible. But let's think about the disadvantages. First, maintaining the site could place an added burden on EIA staff. Second, there are different vintages of the data, which could confuse users. By that I mean that versions sometimes differ in their level of aggregation, so a user may see a different version from the one actually published in hard copy or on the web. And, of course, the latest may not actually be the greatest. We'll move to the next slide, and I'll explain what I mean by that. Is the latest really the greatest? The data you see here in Table D-1 are taken from the Monthly Energy Review. That is a multi-source publication, which, incidentally, speaks directly to what Colleen and Melinda were talking about earlier. The data are collected within various program offices across EIA and then submitted to the office that produces the Monthly Energy Review. Table D-1 shows natural gas prices in the electric power sector for each monthly reporting period in 2002. This is a very good example of a data series whose values tend to fluctuate and often revert to previously published values. Look at the month of January: the value published in April 2003 is the same as the value it was revised to in July 2003, and again in November 2003. This is a fairly new series and probably revises more often than most. Still, it demonstrates how we can revise frequently and then end up back where we started.
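The "back where we started" pattern can be checked mechanically. This is a minimal sketch: given a revision trail as (edition, value) pairs in publication order, it lists the earlier editions that already showed the latest value. The edition labels and prices in the check below are hypothetical, not the Table D-1 figures.

```python
def editions_matching_latest(trail: list[tuple[str, float]]) -> list[str]:
    """Given (edition, value) pairs in publication order, return the earlier
    editions whose value equals the latest one."""
    if not trail:
        return []
    latest = trail[-1][1]
    return [edition for edition, value in trail[:-1] if value == latest]
```

A non-empty result means that at least one intermediate revision was later undone. In that case, publishing every intermediate value gained the user nothing.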
Given the challenges we've presented with our current revision practices and policy, and our wish to provide the latest data to our users, we have a few questions that we'd like you to answer for us. First, EIA developed its revision standard back in the early '80s. At that time our main concern was our credibility, and we thought it was a good idea to limit revisions. We're wondering whether this is still a good assumption. There's been a lot of discussion about this, and we've heard different comments. We've heard that, given the widespread use of spreadsheets, many users really don't mind revisions, because they simply plug in a new number and recalculate; they don't find it a big issue. The other side of this is that these very same users often don't like to see breaks in the data series. Also, those who produce the publications have pointed out to us that revisions are not exactly costless. This is because data dependencies often create a ripple effect within the data. So when we revise the data, we must clearly understand what will happen to the other data series that could be affected by any change we make. So we are wondering whether it's still a good assumption that we should limit revisions. Second, regarding the natural gas data for residential customers, we'd like your suggestions for another, perhaps better, approach to presenting these revisions. Referring to our example in Table 15, would it ever be acceptable practice to publish data where the components do not sum to the total? And finally, we'd like your thoughts about showing the latest, and what we believe may be the greatest, data on the web.
DR. FEDER: You talked about how sometimes components don't add up to the total.
A somewhat related issue is that change is not always best measured by comparing the two best estimates -- for instance, so that's another issue. In terms of statistical methodology, the best estimate of change is sometimes not the difference between two estimates, because of correlations and other issues. I know that at Statistics Canada and the BLS, more in labor statistics, the practice is to use benchmarking to make the components add up to the total. That practice rests on theory showing that smaller estimates are improved by benchmarking to the total. Now, again, I don't want to comment on policy, but as a user I would like to see the best data there are, and revisions are okay. One final comment: we all know from election nights that sometimes the first estimate as soon as the polls close is -- and we see some fluctuation during the night, and in the end we come back to it. Still, we know that the standard error is proportional to one over the square root of the sample size. Put simply, the more data you have, the better your estimate. So my inclination is to say the latest is the best in terms of -- error. Now, sometimes you're proven wrong. Sometimes an earlier estimate based on less data turns out to be the most accurate by the luck of the draw, but you can't know that. So, again, as a user I would like to see the latest and most up-to-date information. How often can you revise it? You said it's not a costless effort. That's a policy issue. But you don't publish estimates of change, do you?
MS. JENNINGS: Absolutely. Okay, thank you.
DR. NEERCHAL: I would like to ask you a couple of questions. First of all, for each publication you might have some measure of how long it takes for that table to reach a steady state.
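Dr. Feder's point about benchmarking components to a total can be illustrated with its simplest form. Proportional (pro-rata) adjustment scales the component estimates so that they sum exactly to a more reliable total. This is a minimal sketch of that one technique only. It is not the specific method used at Statistics Canada or BLS, which can be considerably more elaborate. The function name and inputs are hypothetical.

```python
def prorate_to_total(components: dict[str, float], total: float) -> dict[str, float]:
    """Scale non-negative component estimates proportionally so they sum to `total`."""
    current = sum(components.values())
    if current <= 0:
        raise ValueError("components must have a positive sum to be prorated")
    factor = total / current
    return {name: value * factor for name, value in components.items()}
```

Every component is moved by the same factor. Pro-rata adjustment therefore preserves each component's share of the total, but it ignores the fact that some components are measured more precisely than others.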
I'm sure that differs from table to table, depending on the kind of survey you're dealing with, so it must be one of the crucial factors in how often you need to revise. It must also be a critical factor in deciding when to freeze, because you don't want to freeze if you're going to have to change it the next day, for example. That's number one. The other question is this: once the data change, would anybody want you to give them the earlier version of the data? Are there people who want the number as it stood on a given day? Does that happen? I think it can happen in the context of what Randy was saying earlier, with someone who is trying to develop a methodology. Those people obviously don't care about the number from two days ago. They are willing to work with your last year's data or whatever; all that matters to them is that it stays stable. For each survey you may want to ask what percentage of your users -- just want stable numbers, versus what percentage want current data. So it is really two issues. One is the steady-state time: how long it takes each publication to reach a steady state. Two, in terms of customers, what percentage are looking for the steady-state number versus today's number. I think those two issues should be taken into account when you decide on the number of revisions and so on.
MR. BREIDT: I have a question about this acceptable practice on components not summing to the total. Is that just in the case where you have a column with missing values?
MS. JENNINGS: Rene, do you know of any other examples where we actually have that?
MR. KASS: The problem comes up when we have the NAs. Behind each NA there is an estimated value, and that goes into the total. When we distribute the data electronically, that -- a zero. I get calls all the time: how come I don't add up to your --
MR. BREIDT: Oh, it shows as a zero rather than a missing value?
MR. KASS: EIA does not stand behind the number because of credibility issues. We have suppression rules for when we don't have a certain portion of responses in.
MR. BREIDT: Why don't you call it an NA and not a zero?
MR. KASS: It has to do with the way the database is set up. It doesn't accept non-numerics. An NA would be an alphabetic character, and a dot translates to a zero.
DR. FEDER: Could that -- like different colors?
MR. KASS: We can't assume that the people who get our numbers can recognize colors or fonts or anything.
DR. SITTER: So this has nothing to do with the revisions?
MR. KASS: No, whether it sums does have to do with revision, because the question is: if we were to revise at the state level, should we make a corresponding revision at the national level?
DR. SITTER: No, but it didn't add up before.
MR. KASS: Right.
DR. SITTER: So if it didn't add up before and it doesn't add up now, what does the revision have to do with that?
DR. BURTON: Well, you're taking one of the NAs that was not acceptable for release and replacing it with a number that is, so I don't see any harm in changing the total.
DR. SITTER: Well, the issue of whether it should sum to the total has nothing to do with your revision. It didn't sum to the total before.
MR. KASS: I misspoke. What summed to the total were the numbers behind the NAs. What we released is not the number behind the NA but rather --
DR. SITTER: I don't understand the question. Would it ever be acceptable practice to publish data where the components do not sum to the total? You're already doing that. It has nothing to do with revision. I don't understand the connection to revision. Revision doesn't change it. It doesn't make it any worse, doesn't make it any --
DR. FEDER: It's a separate issue.
DR. SITTER: Yes, it's a separate issue, is what I'm saying.
DR. FEDER: In one of the breakout sessions yesterday we saw a line called balancing item. You could have that line with a footnote saying it represents the sum of the states for which reliable data are not yet available, before they are attributed to -- whatever, so people will see that it's part of that item.
DR. BURTON: I have a question. When the data go out in tabular form, with NAs represented by zeros, is there a note so that the user knows those zeros actually reflect an NA? Because in my world there's a very big difference.
MR. KASS: They have to refer to the PDFs in order to know that.
DR. BURTON: But if I'm diligent I will see it.
MR. KASS: If you are diligent you will know.
DR. BURTON: Well, if I'm lazy and I mess up, that's my problem.
MR. BREIDT: It seems like anything would be better than a zero, negative 99 or something.
DR. SITTER: I have a real problem with an answer that says our database can't handle a non-numeric. As a user, my reaction is: well, change the database. I'm sorry if that sounds harsh, but that's just not acceptable.
MR. KASS: I'm not involved with that.
DR. SITTER: Well, even more than that, I would want the imputed value with a star telling me it's an imputed value. I don't quite understand this. If you've got an imputed value, you have some estimate that you're not willing to stand behind. I'm quite willing for you to tell me this point is something you don't stand behind, but if you've got it, I'd like to see it. Do you do that? Oh, you suppress it; you don't star it?
DR. FEDER: They suppress, yes, and the total will be published -- that's a common practice -- you suppress cells for which you cannot get enough precision, but you know the total is precise. And sometimes you get a very odd situation because only one cell is suppressed. They say, I have an estimate for Hawaii. I know; it's 49.
DR. SITTER: No, I understand the issue.
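The committee's distinction between a true zero, a suppressed cell, and an imputed value can be kept in a machine-readable file without relying on colors or fonts. One way is to pair each numeric column with a status flag. This is a minimal sketch using Python's standard csv module. The column names and flag codes are hypothetical, not an EIA format.

```python
import csv
import io
from typing import Optional

# Hypothetical status codes: R = reported, I = imputed, W = withheld (suppressed).


def write_table(rows: list[tuple[str, Optional[float], str]]) -> str:
    """Write (state, value, status) rows as CSV text.

    A missing value is written as an empty field, never as 0, so it cannot be
    confused with a reported zero; the status column records why it is missing.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["state", "value", "status"])
    for state, value, status in rows:
        writer.writerow([state, "" if value is None else value, status])
    return buf.getvalue()
```

An imputed value could also be released with status "I", which gives users the starred value Dr. Sitter asks for while still marking it as something the agency does not stand behind.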
There's a big difference between the imputed value being used for the State of Arkansas and the imputed value being used in the estimate of the total.
DR. BURTON: But I agree with Randy. If you warn me that that's the case and I choose, at my own peril, to go out and use it, I'd still rather see it than not. But if you're intent on saving me from myself, I should be grateful.
DR. FEDER: Mark, I've said what you're saying now so many times. I was told time and again that users think statistics are exact, so we have to suppress them to guard users from themselves. It's a losing battle. But you're right. What you're saying is what I've been saying all along.
DR. NEERCHAL: I mean, you say it is 1.372357 and then say it's only an estimate.
DR. BURTON: Honestly, I've learned that in a policy setting I round everything so that it looks like a rough lump, because if you extend it out, people will absolutely assume you have that much precision in your estimates.
MR. BREIDT: I've often advocated, with no success, that you just publish a range and not a point. And it's a big range.
DR. FEDER: Last time we talked a little about this in a breakout session. I mentioned a case from my experience where the suppression rules are not even consistent. It's common; many official statistical agencies do that. Whether it's wise, I don't know.
DR. SITTER: It seems more like a political decision than a data decision. I mean, I don't care. If a user is so unsophisticated that they're going to do something silly, they're going to do something silly whether you put an NA there or a zero there or -- that's my view.
DR. FEDER: I'll give you one example that I think is very paradoxical. In a certain survey they suppressed an estimate if no cases were observed, because that's the rule. You observe zero cases, you don't publish anything. But if you observe one and it rounds to zero -- you put the zero.
Now, if you don't observe anything you don't put the zero, but if you observe one you do put the zero? Is that consistent? We are trying to change that.
DR. SITTER: No, I'm okay with that. Now I know there's a difference between these two things. Whatever code you use, I want to know when a zero is a real zero. Even if your definition of zero is that the value fell below 500, at least I know there was a value and that value was low. I want to know the difference between that and an NA. An NA is a completely different animal. With an NA, I can assume, if I'm willing to, that the NAs are distributed in some ways like the rest of the data.
DR. BURTON: Well, that's exactly right. If I see an NA and I need a value, I'll do my own imputation -- that's a very different signal.
DR. FEDER: Mark and Randy, let me tell you one thing -- does that I think is good, at least in one instance I know of. If you call them and ask them to please give you the number, they'll give it to you over the phone, I guess. I'm guessing. They say you are sophisticated enough to handle this; you are 21 or older -- but we won't publish it. I don't know if this is still the practice. I haven't been -- for seven or eight years now, but that was the practice: to give it to you over the phone with some caution, and call me in the morning.
DR. BURTON: On the question of when and how often you should revise, this is just another example of what we were talking about earlier. You have such a diverse set of users with such diverse purposes that I don't know that you'll ever be able to reach any consensus. For example, Mark, because of his applications, tends to want very current data. He would not only accept frequent revisions but would probably advocate them. I, on the other hand, tend to work in an academic setting where, as somebody said earlier, it doesn't matter when it was; I just need to know when it was.
I'd probably rather see fewer revisions, and then only when they're going to make a really significant difference in the interpretation of the data. So you're between a rock and a hard place. You're never going to make everybody happy; I can pretty much promise you that.
MS. MILLER: But in this situation, where we have large changes to state data, we would want to make the changes for the state user, but then it doesn't affect the national total. So you wouldn't have a problem with small fluctuations in the national total as a result of that?
DR. BURTON: Like Randy said, the numbers don't add up to the total anyway, so who cares?
MS. MILLER: So you don't care if the total changes?
DR. BURTON: Whether it does or it doesn't, it doesn't --
MS. JENNINGS: Any more questions or comments?
DR. FEDER: It may seem ridiculous, but sometimes it is of great importance. I know that local governments sometimes need to know whether their value has changed. For them, the fluctuations we saw in those revisions for one of the states really matter, whether it was 819 or 823 or 829, because they compare against last year's data and want to know. We talked about the imprecision of estimates of change, but it matters a lot to them whether the value went up or down. So there's a trade-off here.
DR. SITTER: And I think to some extent they do have your series. I mean, you are publishing your revisions: revised, revised, revised?
MS. JENNINGS: Yes.
DR. SITTER: So you can see it. You can see how stable it is. Is it still changing? So there's more information than just the latest, greatest value. When you showed that table you said, see, we got back to where we started, but yes, we can see that. So we have more information than that last number. That's quite a bit different.
MS. JENNINGS: And what you're saying is that this means something to you as a user. That's what we're trying to determine: what's most important to you as a user. And the point has been made that we have a diverse group of users.
DR. SITTER: But one concern raised in the talk didn't appear in any of the questions, or maybe it's hidden in there. That is the impact revisions have on a whole bunch of other documents and how you're going to coordinate that kind of revision. You make those revisions. Are you updating every other publication that uses them?
MS. JENNINGS: That speaks directly to Rene's portion of the talk.
MS. MILLER: Yes, we're attempting to do that, but because of different publication schedules, sometimes we do and sometimes we don't.
MS. JENNINGS: That goes back to the scheduling issue of revising across EIA.
DR. BURTON: Is there something like a grand master table that cross-indexes all the different uses? Or do you just depend on people remembering, hey, that got used over here too?
MS. JENNINGS: Yes. Yes, definitely.
DR. FEDER: I have a question on the data. I don't know how far this concept goes. I don't know if Randy knows --
DR. SITTER: I think that's your biggest problem. Inside a data set like this, you're showing me what you've done. As long as I know what you've done, I can deal with it.
MS. MILLER: Where do we publish what we've done?
MS. BLESSING: You do have to go back to previous editions. The table that Alethea showed with all the revisions was created by you guys. So, Randy, if you -- publication you wouldn't --
MS. MILLER: That wouldn't make you happy.
DR. BURTON: Unless you have saved earlier versions that you accessed.
MS. JENNINGS: That's right.
MS. BLESSING: You'd have to go to the April and the May and the June and the July and the August editions, and you'd have to write it down yourself.
MS. MILLER: But all of these publications, I mean, we got them from the web.
MS. BLESSING: They're all there. It's just not as clean as what you showed.
DR. NEERCHAL: So could I ask my question again? Who wants to go back and track it? Does a typical user want to go back and track it? I know Randy might want to do it for research.
DR. SITTER: No, no.
I probably would use the most recent one, but if I could see the trail, I'd know what I was getting myself into. I look at it and say, well, I'll use the most recent one; that kind of fluctuation is not going to bother me.
DR. NEERCHAL: That's an atypical user, one who can look and judge how stable these numbers are. I think that may be an infrequent situation. The person who wants today's data probably doesn't really care what happened yesterday, because his use is so immediate.
DR. BURTON: Colleen, I have no trouble with the responsibility. If I see that R there, I know that the value I'm reviewing in the current publication is a revised value. Then I may say, okay, I need to know how stable this is, and go back. As long as the other material is accessible, I'm fine with going back being my responsibility, because in most settings I won't care. Whether it's stable or not won't affect the ultimate quality of what I do, so I'm just going to use what's there.
MS. BLESSING: I think the key point is that the archived versions must be there. For some data series they're easy to get to, and for others they're not so easy to get to.
MS. MILLER: I think for all the monthly publications --
MS. BLESSING: They're all easy?
MS. MILLER: They all seem to be available.
MR. KASS: You've got to define "easy." The PDFs are all there.
MS. MILLER: Yes.
MR. KASS: The electronic downloadable material contains only current values.
DR. SITTER: But this issue of propagation through other publications is absolutely critical. That's not because of what you do or don't do, but because you don't know what you do. I mean, this is far worse than anything you do or don't do, because you can't tell.
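As Mr. Kass notes, the electronic downloads carry only current values, and earlier vintages survive only in the PDFs. One alternative keeps every published vintage of each cell. With that store, a user can retrieve the value as it stood in any edition, or the full revision trail, instead of copying numbers by hand from each monthly issue. This is a minimal sketch, assuming a hypothetical in-memory store. The class name, the cell identifier, and the "YYYY-MM" edition keys are illustrative only. Keys in that format sort correctly as strings.

```python
from bisect import bisect_right
from collections import defaultdict


class VintageStore:
    """Keep every published vintage of each data cell instead of overwriting it."""

    def __init__(self) -> None:
        # cell id -> list of (edition, value), kept sorted by edition
        self._vintages: dict[str, list[tuple[str, float]]] = defaultdict(list)

    def publish(self, cell: str, edition: str, value: float) -> None:
        entries = self._vintages[cell]
        entries.append((edition, value))
        entries.sort()

    def as_of(self, cell: str, edition: str) -> float:
        """Value of `cell` as published in `edition`, or in the latest edition before it."""
        entries = self._vintages.get(cell, [])
        i = bisect_right([e for e, _ in entries], edition)
        if i == 0:
            raise KeyError(f"{cell} was not yet published as of {edition}")
        return entries[i - 1][1]

    def latest(self, cell: str) -> float:
        return self._vintages[cell][-1][1]

    def trail(self, cell: str) -> list[tuple[str, float]]:
        """Full revision trail for `cell`, in edition order."""
        return list(self._vintages.get(cell, []))
```

The trade-off is storage and maintenance against users' ability to see how stable a number is, which the committee members said they value.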
And somebody's going to take different aspects of your data and try to combine them together, assuming that they're internally consistent, and some of these are really not very -- I mean, some of these changes are big. You take an NA and you replace it with 11,000; this could be a huge thing. You do it in one and not in another, all of a sudden you see a significant difference somewhere and you think it's real, and it has nothing to do with that. One's revised and one has an unrevised thing built into it somewhere that you don't know about, and it's all hidden. That's really serious. MS. MILLER: Yes, we're getting a much better handle on this through our interoffice issues group because we have people from all the offices, so we've been learning a lot about each other's data. MS. BLESSING: And, Rene, I think you or Alethea pointed out that when we had paper publications people just got one or another paper publication. It wasn't as easy, maybe, to see these differences, but now that we're on the web and you can go click, click, click, and see three different numbers, there's an awareness, because of the ease of getting all the pieces of information, that we have to make some changes. MS. JENNINGS: That's true. DR. SITTER: Well, that's your viewpoint. I mean, now they can find out so now it's worse? MS. MILLER: It is better because it is forcing us to actually face the issues. It wasn't as glaring before because it was much harder to see. MR. KASS: It wasn't a problem because we couldn't see it. MS. MILLER: -- nobody tells us everything is fine. DR. BURTON: Do you know what sort of things are going on administration-wide? This is a bigger issue than this group. Do you know if there are plans to maybe try and address this systematically at some point?
Obviously you've put together a program that provides a path for making sure that revisions -- are disseminated based on people's own knowledge of what's in what document, but do you know if there's any thought toward automating this kind of process? MS. BLESSING: Is it an issue that ever comes up in the inter-agency heads? I think that's what he's asking. MS. MILLER: Are you talking about with other agencies or just within EIA? MS. BLESSING: Just within EIA -- that's what you're doing. MR. KASS: That essentially is what we're doing. The problem came up with final data because, unless told otherwise, once data are declared final they're shared throughout the organization and other groups within EIA don't bother looking at the source systems anymore. In the situation that Rene talked about, data that had been declared final were changed -- simply weren't told about it. DR. BURTON: So what you're telling me is that, given the program that you all developed, final data won't be a problem. Those changes will get made now or revised. In trying to get to that point where they're final there may still be issues but, given the work that you're doing, once the data are declared final then if there are any revisions that's taken care of. MS. MILLER: Yes. DR. FEDER: Years ago The Economist published some assessment, and maybe they still do, assessing national statistical agencies. I noticed in the tabulation that some had very few revisions but took their time to publish, and others published frequently and revised a lot. I mean, this was my area just as a reader, so take it for what it's worth. If you are quick to publish I think that's good, and you have to be prepared to revise what you publish. Now, how The Economist views that I don't remember, and I don't think it really matters. But if you publish timely data -- DR. SITTER: Let's look at the example from the natural gas in Texas. Here is a case where your strategy would be just awful.
I'm not saying that you do this but it would be awful. The pattern is very clear. One month after the date it's always 15 percent low, two months after the date it's 4 percent low, three months after the date it's very close to the final value, and after that it's pretty stable. But I can't see that. I just see your current value. If you actually showed me that I would see that immediately and say wow, okay, I've got a lot of information here. I've got information about the pattern in which the data arrives. These don't have any pattern. This just seems to be more fluctuation -- and that's more common. But I think this goes back to the point that it's going to depend on the data set and whether the latest greatest is best and when to revise and how often to revise. One decision about that would be don't publish it until three months after. That was the same thing, or in this case we're going to have to give you the previous three or four revised values so that you can see that pattern, or some decision like that. So I think the comment "it depends" is always good. It depends on the actual publication and the data. DR. BURTON: Well, Randy, didn't we actually spend some time last fall looking at a situation, either this situation or one like it, and saying okay, these reporting patterns are so dependable that we can take these and do estimated values in those first and second months and release those as estimated values, knowing that they're going to be far more accurate than what is actually reported? DR. SITTER: That's right, and that is a form of imputation, and we seemed okay with that in this particular case but not in general. MS. JENNINGS: Are there any more questions or any more comments? Well, certainly we appreciate your input. Thank you so much. DR. SITTER: One question on publication. You talk about the final data. Could you clarify the distinction, final and nonfinal? I just wanted to clarify the point about what is final data versus nonfinal data. MS.
MILLER: And I was saying that when we initially published our monthly data we published them with a "P" for preliminary. And then what oftentimes happens is that we get late respondents, and we have a lot of those, and respondents sometimes resubmit. And so then the following month we'll revise. Then we usually wait until the end of the reporting cycle to get all the revisions in, and then different offices have different procedures for when they declare their data to be final. But it takes into account the late respondents, the resubmissions, and any other errors that we might find. Like in Roy's situation, he benchmarks the monthly data to an annual number, so we also have that. So it's quite a series of steps. When we spoke to the people at Census and -- they seem to go through a similar iterative process. DR. SITTER: And as I understand it, most of what you have been working on is on the final data. The others, your questions were less firm. MS. MILLER: Right, where we were able to come to some consensus on what to do and how to coordinate was with the final data. Before the data became final, basically we couldn't come to agreement that we were all going to revise at a certain time or we were just going to publish a certain number of revisions. And basically we were showing you the natural gas data as an example of some of the challenges and why we couldn't come to a consensus. DR. SITTER: I just wanted to clarify. Thank you. MR. WEINIG: Thank you both. (Recess) MR. BREIDT: We're ready now for the summaries by the ASA Committee members on the breakout sessions. Randy, do you want to go first on the revising data? DR. SITTER: No, I'll go second. MR. BREIDT: Please? DR. SITTER: How much time do I have, Jay? MR. BREIDT: Take what you need. DR. SITTER: Take what I need, okay. First of all, what were we talking about: revising data across EIA, should they change their standards, what are their standards, and so forth. This was based on two points.
One example is EIA's expanded use of electric power data and using it in a lot more publications. This is just an example, I imagine, of other situations like this. So if you revise one it's going to have an impact on a lot of different publications. And it's easier to do revisions and to detect when you do revisions, so this has raised some questions about whether they should change their revision schedules, how do you coordinate such revisions across EIA, when should you do them, should you do them, et cetera. So, the specific questions asked of the committee: one was that EIA's revision standards were built in the '80s. There was a concern for credibility, and it was a basic assumption that you should limit the number of revisions; was that still a good assumption? Regarding natural gas specifically, there was a question about whether there's a better method for doing that. Three, is it ever acceptable to publish data that does not benchmark to the national total? And, four, what about the latest greatest: is it really the latest greatest or is it just the latest in terms of instantaneous updates? I would say that it was a heated discussion by a group of reasonably, admittedly uninformed nonexperts, and that what EIA was really doing was using us as an end-user group that they hadn't sampled yet, or at least that's the way we took our role, I believe, and there was really no attempt at directly answering the first two questions, though they're all, of course, interrelated. One committee member did say that they always want the latest; the latest is best. But you need to balance cost, of course, so these two things are an issue, most up-to-date versus the cost of revising. Other comments that were made were that it's going to be specific to the data set: how steady the different data are, how quickly they get to some form of steady state. So another balancing act is most current versus stability.
You have a diverse end-user group, and what some of your end users want is what is the number I'm going to use and what is it today, what's the best you've got; for those, the latest is the greatest. And then there are those such as, say, myself: I don't care if it's three years old as long as it's stable, so that if I refer to it it's going to still be referable. So there's the combination of those two things. I think there was one more comment on the latest greatest. Getting at previous versions will be an issue for some end users; that is, it's okay to put your latest greatest. It's okay for it to be my responsibility to go and see how that's been changing in a revised schedule when I see an R behind it. In other words, is it at a steady state? That's fine as long as I can get at it. EIA people said well, you can, but sometimes it's not as easy as other times and perhaps that's something we should look at. And then, is it ever acceptable? There was quite a debate about benchmarking to national totals. One committee member pointed out that this is a separate issue and has nothing to do with revisions, since the data already don't add up to the totals before you revise, since you've got those NAs in there and we don't get to see them. So from an end-user point of view, you may even put them as zeros in the published version, which we didn't like at all. There's a big difference between a zero and an NA. One of the committee members didn't really like the answer that the database wouldn't handle nonnumeric values as a reason for that. So there were some issues around that and some debate about whether you should actually give the numbers and tell us they're imputed or not. Even the committee wasn't in total agreement, I think, because we also are a diverse group of end users.
I think there was agreement that we would certainly like to know the difference between a number that you're not making available to us and one that is an actual near-zero value, because there is a difference analytically as to how you would treat those two things. One other issue discussed was the bigger issue around the propagation to other publications, even though that wasn't one of the specific questions. That's probably of most concern, but the EIA presentation didn't say that this is essentially what they're doing with their final data. So once they've declared the data final, they're coordinating across the different offices within EIA to schedule the revisions and try to take care of that propagation problem, at least with the final data. It's more difficult to do on the nonfinal data, but I don't think that we would be as concerned with the nonfinal data since it's stated as being in progress. I think that's all the committee had to say. Thank you. MR. BREIDT: And now, Nicolas, please summarize the survey quality assessments of EIA. DR. HENGARTNER: We had what I would call a frank discussion on the problem of assessing the quality of EIA products. That is very much in part a management problem, trying to assess how good each survey is and trying to also set targets to possibly improve the surveys in future years. The notion is that there are problems with doing that. Some of the things that came up were, for example, that although we are able to assess the survey if it is possible, nevertheless the landscape might be changing over time. And so not only should we try to track how the surveys are doing but also try to see what the landscape is, how the industry is changing at the same time as the survey, so as to match the two and not give a distorted image. The other thing that came up was the notion of data quality.
I would paraphrase it as answering efficiently the question asked, and although the right question to be asked is usually looked at in the design stage, it might be worthwhile to revisit on a periodic basis whether we still are asking the right questions, whether the users are finding it useful, whether there's some usefulness to it. So that is in part something that wasn't in the document that we were looking at. Then there came again notions of interviews and trying to see how do we extract or how do we get those measurements. As in some of the sessions yesterday, there were notions of well, we can interview the teams and talk to them and try to see through focus groups what they think. There were, in parallel to yesterday, notions that maybe it's a good idea to separate the management and the staff interviews for brainstorming sessions and other interviews, in the hope that one gets a more frank and realistic picture of what's going on. The other problem that arises then is, once you measure the quality of the survey and the data, how do you disseminate it? How do you give feedback without sounding critical, without having this sense of competition, acknowledging that each survey has different quality targets because they are either in very easy situations -- the response rate might not be an issue for one survey and might be a very big issue in another survey, and so trying to acknowledge this. One thing that came up, for example, is to try to bring in a kind of lessons learned, trying to get positive feedback and say these are the kinds of problems that we've encountered in our situation; this is how we solved them. Being able to do that would allow more positive feedback. The other thing that was suggested is that the measures that we get here are retrospective, in that they tell us what has been done.
It was suggested that one should also have the opportunity to look forward and say this is what's on the landscape, for example, deregulation, and this is possibly how it's going to affect the quality of the survey, and because of this we'll need to already be able to put in more resources. And in fact what came out at the end is that this whole exercise of being able to get quality surveys should also be tied in to possible resource allocations, to say well, if there's a true difficulty here then we need to do something about it. And the way to do something about it might not be simply to crack the whip but also to say well, we're going to allow you and provide you means to address these issues. Finally there was the question of whether people understand why they're collecting the data and how important it is. The general agreement is that everybody here at EIA knows how important it is. This is really something very important. But there was also the use of contractors, and bringing them into the loop and making them feel how important it is by, for example, sharing quotes or pictures that appear in the news that they produced would be a boost in trying to explain to them why it matters. That is my summary. As I said, it was a very interesting session. MR. BREIDT: Thanks, Nick. At this time we have a moment for public comments. Are there any comments? We'll be taking a break and following the break the committee will be discussing possible topics for the -- MR. BERNSTEIN: Can I recommend we don't take a break and just -- MR. BREIDT: Let's go ahead and do it, because that way the public has an opportunity to leave if they want to, because after the break there's nothing really on the agenda except for the committee members to discuss what's going on. So go ahead and take a short break. Take 10 minutes if you'd like. (Recess) MR. BREIDT: We have some time now to talk about possible topics for the fall meeting.
A few things have come up in discussions and maybe other people have come up with others as we were talking. We're talking now about possible topics for the fall meeting. A few things have come up during different breakout sessions and maybe we can just put a few things down on the list and then add to it as we go. These frame comparisons that the Census Bureau is doing are one possibility. That came up in the breakout session on frame adequacy. Another, I think, came up on regional components for the STIFS model. And then there was some mention of this Dallas paper. I don't know what it was about, but it was prepped for this meeting and didn't make the agenda. MS. KIRKENDALL: Yes, Dallas has another proposal for natural gas production in Texas. This is to correct the bias. DR. SITTER: That's perfect. That's going to force us to do what we were going to do. DR. HENGARTNER: Yes. DR. FEDER: Could I suggest a topic? MR. BREIDT: Sure. DR. FEDER: It came out in our breakout session. Should estimates be harmonized across surveys, and should estimates add up to the total, also known as benchmarking? We asked about revisions. We talked about revisions. There was a somewhat related issue that people other than myself were talking about: should -- of low quality be suppressed or published with caution? I think that these are all related. MR. BERNSTEIN: On the same theme as the issue from yesterday, where we were talking about the short-term energy outlook, the data that was being used versus the data collected on the weekly -- but a broader discussion about the information being used in one part of the agency versus the data collected in the other part of the agency. I mean, I don't know how to structure that into a session but it's an issue. DR. HENGARTNER: There's something else I'd like to bring up. As an ASA committee I think we have the option of having an ASA session at JSM. I think the last few times they had really cool stuff to talk about. DR.
SITTER: You joined at the right time. It was boring stuff before. DR. FEDER: You're talking about this coming JSM? DR. HENGARTNER: No, no, not in Toronto, the one after. And we need to think about this now if we want to do it. We don't need to pick the topics, but if we're going to do it maybe have a list of topics and invite the speakers -- MS. SEREIKA: If it's invited sessions, it'd be going on right now. DR. HENGARTNER: It's an invited session for next year. I don't know where it's going to be. MR. BREIDT: Minneapolis. DR. HENGARTNER: Oh, God. No, let's wait a year. DR. FEDER: Canada, maybe. DR. SITTER: You're the person that has to deal with this. You're on top of this already, right? MR. BREIDT: I haven't received anything yet on the invited sessions for '05, but that comes out in early summer, doesn't it? MS. SEREIKA: Yes, I think it's June. MR. BREIDT: June or July. DR. HENGARTNER: So we need to make a decision as a group, do we want to try something like that, and if yes, that we get it done. The other thing is replacements. We need to get a consensus of who's rotating out and have a list of names already ready. MS. SEREIKA: Trends, he's looking at trends. DR. HENGARTNER: So that's something that needs to be on the agenda. MR. BREIDT: It needs to be a homework assignment for us, which is to come next time with suggested names. MS. FORSYTH: Has it been six years already? MR. BREIDT: It has. DR. HENGARTNER: Are you leaving? MR. BREIDT: Yes, after next time. MS. KIRKENDALL: Bill Moss is off the committee for health reasons. There's a vacancy right now. DR. HENGARTNER: Is he going to rejoin or is he going to stay off? MS. KIRKENDALL: I think he retired. I think he resigned. MR. BREIDT: We just haven't made that real obvious to the ASA until we get our suggested replacement. DR. HENGARTNER: Well, if we need a suggestion right now, if we have five minutes, do you want to brainstorm? DR. SITTER: I'd need to know who's coming off. DR. HENGARTNER: Bill Moss. DR.
SITTER: Bill Moss and Jay. MR. BERNSTEIN: Bill is gone now. MS. FORSYTH: I don't know Bill. Is he a mathematical statistician? MR. BERNSTEIN: No, he's an energy economist. MR. BREIDT: Energy guy. MR. BERNSTEIN: So it would be nice to replace him with another energy person. DR. HENGARTNER: Would a macro person work? I don't know. Steve Berry, labor statistics at Yale. DR. BURTON: He's smart. DR. HENGARTNER: Well, he's my friend. MR. KASS: I'm going to have to reconsider my statement. DR. HENGARTNER: He's good. He would be good for our committee. MR. BERNSTEIN: There is a young faculty member at Berkeley, Energy Resources, Alex Farrell. He has done work for DOE in the past. I know he uses a lot of the data and information. He certainly may be a good person to transition because I'm only giving one more year after this. I know he's been a big user of EIA material. DR. BURTON: A brainstorm? This is more like a brain flurry. DR. SITTER: The only economists I know are on this committee. MR. KASS: I suppose one could go to the International Association for Energy Economics and ask for a suggestion. MS. KHANNA: I know one person who's more of an environmental economist but does do energy-related work. It's John Erickson at the University of Vermont. I think he's with the Department of Natural Resources. He is a state economist. MR. WEINIG: Periodically there are suggestions of getting one or more industry specialists into this area. And if we had a view of three to six years of one or more industry specialists, that might be a place to return, because this was a strategy that the committee used to have and steered away from, and this now may be a good time to rethink that. MS. KIRKENDALL: We had some statisticians from oil companies and from electric power companies and coal through the years. There aren't that many of them, so you go through them pretty fast. DR. HENGARTNER: You said business sector. Which sector would be most useful for EIA: coal, petroleum, gas? MS.
KIRKENDALL: Probably gas and electricity. Those are our big problem areas right now. MR. BERNSTEIN: How about somebody from the Gas Technology Institute? You can ask them, or I can give you a name of somebody there to ask for a recommendation. It's going to be hard to get somebody from a company because they'll oftentimes have a narrow viewpoint, but somebody from EPRI or somebody from GTI. I don't know. MS. KIRKENDALL: There's a list of things to consider. MR. BREIDT: How about other topics for the fall meeting? MS. KIRKENDALL: -- follow-ons for a number of the things you've heard about. DR. HENGARTNER: Susan has a suggestion. MS. SEREIKA: Could you invite one of these industry specialists to make an appearance, more as an invitation, not necessarily make them a member of the committee? MR. KASS: Right. MS. KIRKENDALL: Yes, we can do that. MS. SEREIKA: Do something like that, like have an invited guest, I guess, from different industries for each different meeting? MR. KASS: I had the same good idea; I just didn't say it. DR. SITTER: Me, too. MR. GRUENSPECHT: And that would be tied, obviously, to the agenda -- a lot on -- MS. KIRKENDALL: Right, a particular topic. MR. BERNSTEIN: I certainly wouldn't mind talking about the transmission issues again next time: what has happened, what has not, what are the objectives, if you've gotten around to deciding what you're actually going to do, how you're going to do it. MR. GRUENSPECHT: I missed the transmission issue discussion. Was it about the issue as a modeling issue? Was it about the issue as what data we should be trying to collect there? MR. BERNSTEIN: Well, the discussion became a hybrid of both a little bit, but I think it came back down to not having a clear goal in mind for what you want to do with something that you get. In other words, what do you want to do with transmission data and what reports do you want to have? What do you want to be able to do with it, and then design your data and analysis to that.
It was pretty clear that that was not -- two years from now how do you want the electric output to look and what do you want it to include on transmission sites. So that would be something to do. Guy brought up a couple of things as we went along. We're spending a lot of time responding to crisis issues, and that's not totally the reason, but you're getting more of that than you used to. How do you begin to structure the system to be able to handle that? MR. GRUENSPECHT: There's another one that I hope will not come up, and I don't know that it's a good one for the committee, but we will probably face at some point, in part because of some new work that we have to do, in part because of the budget situation, some real decisions about doing less of some things. So we're doing a whole lot of surveys. I don't think there's ever a discussion of laying out the scope of what the program is and then trying to figure out, well, suppose in fact we get stuck with the guidance numbers that we have for budgets going out to 2010, and we're doing our best not to get stuck with those. But were that to happen we would need to think about pulling in the perimeter a little bit, making the program smaller. I wonder if there's, among statisticians but also policy types, a notion of, if you had to contract the data program, how would you do it. It's like playing Jenga. Which block would you pull out so that the whole thing doesn't come tumbling down? Preserve as much of the value of the program as you have. I don't know if, in other statistical programs that people on this committee have dealt with, that's been an issue, how do you define what it is you're doing, because at the same time there are these requests in certain areas to do more but the resources are not necessarily there. The way I think of this is that a lot of the talks I hear are about, you decided you wanted to do something, and how do you do it the right way.
That's been a lot of the focus, whether it's estimating natural gas production in Texas or something else. We get input on that, but is there input on the level of what do you do, assuming that you can't do everything you want to do? DR. HENGARTNER: But isn't that the question you should ask end users instead of statisticians? MR. GRUENSPECHT: Obviously we do and we will ask end users, and we have, but I view end users as the demand side of this thing. And obviously, with each end user, a lot of that just ends up being like the political power of the different people. I mean, their immediate reaction is "not my survey." If I'm the American Public Power people, when you want to drop a survey of public power facilities, no, don't do it. After all, they're paying zero for it. The demand curve doesn't hit the axis. So given that they're paying zero and given that it's of some use to them, they want to continue it. And that's true of everything. I assume that most of our stuff probably doesn't have zero value, so there's someone who's for it. Then it becomes an issue of who's more powerful politically, and that's how some of these things get sorted out. So that's definitely a big part of what actually is going to happen. In some sense part of the program is set up to meet demand. Particular people want this and people want that. But part of it was set up to be how do we set up a good statistical program for the energy sector. And that's not necessarily driven by demand. I mean, that's driven by what people thought was included in a good statistical program for the energy sector. I mean, obviously this other side is going to matter a whole lot, but I'm just wondering, on that side, if you had to make a statistical program for the energy sector smaller, how would you do that, understanding that these other things are going to be very important and be there also and may be dispositive. I don't know if there is anything in the discipline of statistics on how you maximize the value of a program.
I don't know. It may be way out of line but that is an issue that we face. DR. BURTON: I think a number of us are here with two hats on. I mean, there are things like, for example, a number of us would say protect the data. We can live without the analysis, we'll do that ourselves, but make sure you protect the data to the extent possible. Protect the time series. Some of the cross-section stuff maybe can go, but if you lose elements in a time series you can't go back and get them later. So there are things to talk about. DR. FEDER: I was going to say exactly what Mark said. I think he's right. It's important to keep the continuity of the data for the sake of trend analysis and so on. But because I think the energy market is going to be so important and we are facing dramatic changes, I wouldn't change anything in terms of the programs. I might be more inclined to say reduce the effort on each by perhaps reducing sample size. I think sometimes economies actually can be had by improving analysis and reducing data collection, because if you do things right and you optimize your survey methods you can reduce errors -- DR. BURTON: I mean, is this a topic worthy of -- DR. FEDER: I think it is. One good analyst can save the work of 10 field collections. MR. BERNSTEIN: I think one thing this committee could do is discuss and help design characteristics for what a good program should be. I don't think we're going to be able to help you decide which things should go and come, but in terms of what are minimum characteristics you can begin to measure yourselves against, and see that here is the minimum that you should have in both statistical quality and quantity and stuff and analysis quality -- do something about that. DR. HENGARTNER: Nancy has this wonderful draft that she showed us about two or three meetings ago on where coal goes and where each survey's taken, and so you have the pathway and you can see along the pathway each survey and the information it brings.
You can ask yourself whether, by looking at this pathway, you can impute some of the answers. The only thing that goes through here is whatever comes from there, and then I measure it down there again, and that may be a place where you can cut corners. It's confirmation data. It's like being able to do accounting. Currently you're able to do accounting and double-check. Those two numbers should be the same. Do they match? If you want to save money you can get rid of that redundancy at the cost of not being able to do that matching. So statisticians can come up and look at this and say yes, we can publish and continue publishing the same numbers at the risk of not being able to double-check whether those numbers are good. So there is redundancy in the system. We can maybe help identify some of it. I'm not sure how wise it is to get rid of it, but if it's that or just shutting down then -- MR. GRUENSPECHT: We'll send you up to Congress to say you made a mistake. DR. BURTON: Well, that's what I was going to say. Why not turn this whole thing on its head? Do what we're talking about, but do it in an effort to justify the necessary budget. We can't as a group say we've examined these issues if we haven't looked at all the data we need. DR. FEDER: I think if you charged a little bit for the data you'd know which ones are in greater demand. MS. FORSYTH: That's true. DR. HENGARTNER: Talking like an economist. DR. FEDER: As a consumer, but I know this might anger some people. DR. SITTER: But this notion really ties in to your -- because what you were saying is we look at the end use, what are we going to do with it. Really, your point was that they really need an increase in funding for transmission. When you're doing the exercise you say this is what they need. What should we do to get it? What would it cost?
I pointed out at the time, too, that there are two aspects to that: what new do we need, and, with the change in deregulation, what do we give now that we're no longer going to be able to do for the same cost because of the changing market? So those relate back to this question. You do a hypothetical budget cut and say this is what we will do if that's the case and this is the best we can do. And there are things we can do, doing less frequent surveys here and a combination with some analysis, less frequency there, and so forth but this is going to be the impact. And then you also have data on the trend of deregulation, which is going to impact it even if you had the same resources. And you just build the whole thing and say this is where we're going if you don't give us more money. MR. GRUENSPECHT: We will do that in our own way. There's a political thing. Obviously we're not saying gee, thanks, this is our target; here's what we're going to get rid of. We're definitely going to make a case. But is there anything from the academic side -- DR. SITTER: No, I just meant that the topics relate quite well so I think it would be a nice -- MR. GRUENSPECHT: I don't know. I don't have them well thought out but I'm posing it to you. It's a little different than what I see as a meeting for this committee -- there's a possibility that there may be something there. MR. BERNSTEIN: I think it's probably worth us taking a shot at having a discussion about it and seeing what we can contribute to your decision-making process that you can utilize. I think it also comes back to the mission of EIA and how that's shifting at some level and defining what you want and deciding what it is you want and getting that filtered down. DR. HENGARTNER: I feel if that's the kind of topic you want to address the committee will need to do a little bit more homework. It's not just coming here, having the EIA give a presentation, and shooting the breeze.
I think it's a case where I would not feel very comfortable saying anything without having sat down and said okay, I've got to chop $5 million off this budget by doing this, and trying to argue why this would be a good thing and having other committee members argue against it. It seems that this might require more involvement on our part but if that's what we want I think we have to be honest. MR. WEINIG: You probably recall that on James Hammitt's exit at the last meeting last fall he suggested something that we haven't really responded to about creating a committee to create a paper. If that's where you're headed it's not without precedent in terms of the committee's suggestion. DR. HENGARTNER: Another reaction of the committee when James said that is everybody said -- works. MR. GRUENSPECHT: And a free bagel. DR. SITTER: As an aside, our department decided on committees of one because committees take a long time to do anything. And I think if you had a committee of 12 trying to do such a thing it would just be nightmarish. If it ever got done it would come down to two or three people -- not a committee of this size. It just can't be done. MS. KIRKENDALL: Well, the other thing is if we're going to do something like that, not only would some people have to look at things in advance; we'd have to prepare something for them to look at in advance. DR. HENGARTNER: Yes. MR. BERNSTEIN: I wasn't suggesting that all of us go through a process of figuring out exactly what to do and I'm not sure that's what Howard asked. But we can have discussion about what are the types of things they need to take into account in making the decisions from a statistical-validity and analytical-validity standpoint. This is a group of outsiders and we can maybe help focus the discussion that they may end up having to have. And so we could lay out characteristics of what we think are important to maintain and develop that they can then use in evaluating programs.
It's almost like us helping them set up a strategic evaluation plan and not actually do it. We're not actually going to say you should do this and this and this. I think what Howard's asking is whether we can actually help them come up with ways to evaluate, characterize, and trade off. MR. GRUENSPECHT: I mean, I don't know if in statistics this typically comes up where it's not like design a program but you have so much resource. There are probably some criteria you would use in building the program to do the best job with that resource. Some of those same criteria might be of value in dealing with the problem that we might face or maybe in reverse, unfortunately. But I'm wondering if there are manuals in the statistics literature on how to build programs of a certain size where you don't have unlimited resource but you need to build the best program. I don't know whether the literature addresses that topic. MS. FORSYTH: No. MR. GRUENSPECHT: Well, you're all still smart people anyway. But I think Mark's on the track of this notion, not actually if you were an administrator for a day or something what would you do but ideas on how to think about what you would do from the basis of the knowledge that you all have as statisticians and analysts. I think that might be not the actual doing it but about the criteria. Anyway it's just a thought. It's not like you've got to do it. DR. NEERCHAL: I just want to make a comment, basically, to supplement what has been said. I agree with Mark that I don't think this committee can really get into the nitty-gritty of how much -- MR. GRUENSPECHT: This survey, do this survey. DR. NEERCHAL: Right. On the other hand I think the session which I did not attend, one of the breakout sessions on survey quality assessment, in some sense should give us that information, saying this is the quality you're getting out of this survey and you know how much it is costing you.
I think putting those two pieces of information together should be part of the discussion. If the committee ends up thinking about it as an overall utility of the survey then this will be an input in more detail. This is the survey, these are the affected people, this is the affected program, and this is how much it's going to cost you. You don't need to tell us that; I don't think we can do much with it. But at least as a group of people we can say that sounds like an important thing. As Randy put it earlier, we are an admittedly uninformed group but a useful group to get feedback from. MR. GRUENSPECHT: The way I look at it there are probably a few people who are better informed than you and then there's you and then there are 280 million people who are much less informed than you. So no matter how much you want to pass yourself off as the uninformed group the fact of the matter is you're probably the most informed group there is outside of people who work here. DR. HENGARTNER: Well, God help us. DR. SITTER: I think if we're going to have that we would probably need some documents. MR. GRUENSPECHT: A document from us. DR. SITTER: Well, to package some of the things we've already seen that give a global view like this thought and how everything fits together, which surveys, the names, what they collect, which overlap, that kind of thing. And then I think some idea of breakdowns of costs. I don't know how you would do that but I would want to know at least some orders of magnitude about collection versus analysis, how much does a revision process cost. Like what we were talking about today, do we revise automatically or shouldn't we? And we're saying well, you have to balance cost. We have no idea what that means in terms of cost. MS. KIRKENDALL: Neither do we.
DR. SITTER: So I think whether you get money or you lose money you're going to need to know what things cost because the balance is going to come down to do we drop stuff or do we drop the quality of the stuff we have? Those are going to be two big issues. If we don't know the relationship between the cost of those two things it's going to be impossible. Without at least orders of magnitude, it's going to be hard for us to do too much. We can give sort of very broad-stroke advice. DR. BURTON: Is there information available about usage by energy source, by fuel type, by sector, industry versus government versus academic? Is there demand-side information -- do you know? MS. KIRKENDALL: I don't really know. There may be something on web usage. We track a lot of web usage and to the extent you can break it into government, industry, and academia they may have something like that. I'm not sure that we have much else. We have listservs with e-mail addresses so you could look at them the same way. There's some information. One of the things we did last year was we have a similar list. I don't know exactly which list Nick's talking about but we have a sampling list of these surveys. We've got about 70 surveys and one of the things we did last year was to document which surveys were most important, least important, and not used in our various internal analytical products. So which are used in NEMS, which are used in STEO, which are used in our integrated products. DR. BURTON: That probably is a pretty good indication of what the rest of the world does with it too. MS. KIRKENDALL: It might well be. DR. BURTON: We're all trying to go in the same direction. DR. SITTER: One thing that has always bothered me over the years that I've been here is that I was never really certain about the difference between EIA with a legislated role, and this is what EIA's here for, and EIA as a demand-driven, customer-run organization. I mean, you're sometimes measuring your quality by your users.
And you're saying we've got to go to our users, we've got to go to our users, we've got to go to our users. In some sense that means you're driven by a customer base. But at the same time why are you here? And that seems to be primarily to advise the government. I mean, that's why you're here. Those are completely different. That data gets at the users and the customer aspect of who you are. It's like there are these competing things when you're going to talk about whether we should get rid of this. I mean, it could be that this is absolutely your most popular survey but absolutely useless, never used by the people who actually created you. I don't know but those two things seem to be not really in conflict so much as we'll talk about one in one session, and in another session we'll look at it in a completely different way. I'm always a bit confused as to that. Where should you be looking? Should it matter at all who your users are? You seem to care a lot and I think some of us are users. I'm not a big user. So you do care about users but at the same time I'm not entirely sure that that's really your mandated role. I don't know. MS. FORSYTH: Well, you could elicit the utility functions. You could go that far. I mean, maybe that's not a big deal. If you know in each sector who the important users are, so who the important government users are and who the important nongovernment users are, I mean, that would be fun. DR. SITTER: But you can see even this discussion raises self-analysis issues which you're going to have to face if you actually face budget cuts. Some of the questions you're asking us are more like who is EIA. What Mark was saying, what is the end product you want to deliver. If you step even more above that it's like who are you. Why do you want to deliver this? Is it because the President of the United States says I want to deliver this? Not really.
You're allowing yourself to be driven by a market in some way and I'm not sure which part of the market is most important to you. MS. KHANNA: I feel that's a really good question because that goes back to the point about: this is my budget, this is what I can do. Take it or leave it. Who makes that decision? If it's a government agency and they're going to go from their perspective, then you're losing your other end users. DR. SITTER: I think that it comes down to a US lobby system. It's more complicated than that because if it were just who directly uses it there that would be one thing. But you know that if you take away somebody, a very powerful segment of the industry that wants this data, they will come at you through the people that can affect your budget. You know that better than I do. But I think that this notion of who is your market and who are the important ones is going to have to be a major decision. Then you can put some utility on it and categorize, put a score, if you like, on different aspects of the data and say to yourself okay, should we reduce quality here, should we eliminate this, and so forth. If you ask us specific points you're probably going to get the wrong answer because, I mean, this whole issue of greatest and latest versus citation gave you a very good example of that because you talked to everybody else. Nobody talked about the fact that we'll be happy if it's three months old as long as we can go back and cite it. So, clearly, we're not the same group you're dealing with out there. If you look at specifics we may not be able to give you the answer that's correct for the decisions you're making on that kind of specifics. DR. HENGARTNER: It seems that there's definitely material for a session on this next time.
MS. FORSYTH: This might be off target but it seems like part of what you need, and I don't know whether this group is the right place or not, but part of what you need is a strategy for mapping what's important in those two domains, and maybe there are more than two domains, and figuring out where they overlap because where they overlap is maybe the place where you can do the most good with limited budgets. I'm not a quantitative type but there are probably models for that kind of measurement problem. MR. BREIDT: Well, I think we'll go ahead and adjourn the meeting formally now. (Whereupon, at 11:30 a.m., the PROCEEDINGS were adjourned.) * * * * *