ICES - II
International Conference on
Establishment Surveys - II
Survey Methods for
Businesses, Farms, and Institutions
June 17-21, 2000, The Adam's Mark Hotel, Buffalo, New York
 
The Software Demonstration Program

MONDAY, June 19, 2000
8:30 a.m. - 1:00 p.m.
SESSION I: Electronic Data Collection
 1.  Computerized Self-Administered Questionnaires  U.S. Bureau of the Census
 2.  Electronic Forms  U.K. Tariff & Statistical Office
 3.  E-mail surveyer  ISTAT, Italy
 4.  Touchtone Data Entry and World Wide Web  U.S. Bureau of Labor Statistics
 5.  USDA-NASS Data Warehousing  U.S. Department of Agriculture
 6.  Common Collection and Processing System  U.S. Energy Information Administration
 
1:30 - 6:00 p.m.
SESSION  II:  Editing, Imputation, and Analysis
 7.  SLICE: A Framework for Editing and Imputation  Central Bureau of Statistics, Netherlands
 8.  AGGIES: Edit & Imputation  U.S. Department of Agriculture
 9.  SOLAS for missing data analysis   Statistical Solutions, United States / Ireland
10.  QCDAS (Quality Control & Data Analysis System)  Statistics Canada
11.  Dead Graphs, RIP - New Interactive EDA Graphic Techniques  U.S. Bureau of the Census
12.  Graphical Editing Analysis Query System  U.S. Energy Information Administration
13.  INSIGHT: A Visualization Framework for Survey Data  BT Research Labs, U.K.
 
TUESDAY, June 20, 2000
8:30 a.m. - 1:00 p.m.
SESSION  III:  Sampling and Estimation
14.  Random Digit Dialing  Marketing Systems Group, United States
15.  The Sample Planning Tool  Research Triangle Institute, United States
16.  GSAM (Generalized Sampling System)  Statistics Canada
17.  GES (Generalized Estimation System)  Statistics Canada
18.  BASCULA  Central Bureau of Statistics, The Netherlands
19.  Ag@ccess: A Geographical Estimation System  ABARE, Australia
20.  WesVar  Westat, United States
 
1:30 - 6:00 p.m.
SESSION  IV:  Integrated Systems and Utility Systems
21.  StEPS (Standard Economic Processing System)  U.S. Bureau of the Census
22.  Sprocet (Survey Processing System)  Statistics New Zealand
23.  NASS's Record Linkage System  U.S. Department of Agriculture
24.  ACS  (Automated Cell Suppression)   Sande & Associates Inc., United States
25.  X-12-ARIMA:  Time Series Modeling  U.S. Bureau of the Census
26.  TRAMO - SEATS: Time Series Modeling  Gomez-Maravall, Spain
27.  SEASABS: Seasonal Analysis  Australian Bureau of Statistics
 



 
SESSION  I:  Electronic Data Collection
 
1.   Computerized Self-Administered Questionnaires
U.S. Bureau of the Census
Demonstrators: Diane Harley, Kimberly Pressley
 
The Electronic Reporting Staff (ERS) was established in 1992 to coordinate electronic reporting for the 1992 Economic Census.  Since then, electronic reporting has expanded and now includes three annual surveys and one quarterly survey:  
The Computerized Self-Administered Questionnaire (CSAQ) provides companies with the capability of exporting and importing information and data files, facilitates linking with corporate databases, and provides interactive edits. Traditionally, we mail the company a diskette containing the CSAQ and prelisted data. The company installs the software, completes the survey, and then returns the data to the Census Bureau either by mailing the diskette back or by modem.

For the 2002 Economic Census, our goal is to offer electronic reporting via the Internet or CSAQ to all respondents. We will achieve this goal by designing all of the collection instruments from one source using the Generalized Instrument Design System (GIDS). The Economic Directorate has contracted for the development of GIDS. GIDS is a development tool for the creation, administration, and maintenance of surveys and other types of form-based data collection activities. It will allow analysts to create sophisticated electronic and paper surveys through a user-friendly graphical user interface.
 

2.   Electronic Forms
Tariff & Statistical Office, United Kingdom
Demonstrator: Jon Walmsley
 
An electronic form was implemented in December 1998 to capture data from enterprises that are required to submit data to the Intrastat survey, which is used to compile intra-EC trade statistics. The form is delivered over the Internet and performs extensive front-end validation of the data before submission. It also provides a file import facility so that traders can submit data electronically. Trader take-up of the service has been very good, resulting in more accurate and timely data and reduced resource requirements for both the administration and the enterprises. A new version of the form is currently under development, alongside other similar data capture services, with release planned for the first quarter of 2000.

The form has been presented and demonstrated, and well received, at several domestic and European meetings, most recently ETK ’99 (October, Prague). The service of which it is part demonstrates, in practice, techniques and benefits in the following areas:

 
The demonstration can be given either ‘live’ or using a personal web server.
 
3.   E-mail Surveyer
ISTAT, Italy
Demonstrator: Francesco Grasso
 
The software package (hereafter called “E-mail surveyer”) is a set of tools developed to automate the capture of business survey data via E-mail.
It includes:
  1. A software tool that automatically builds a list of respondents’ E-mail addresses.
  2. An electronic questionnaire implemented as an HTML page, with validation and mandatory-field controls written in JavaScript (for controlled data entry). In our solution, personalized questionnaires are disseminated by E-mail (each questionnaire shows the name and identifying data of the respective respondent). The respondents also send back their answers by E-mail.
  3. A tool for receiving the answers, decoding them and loading the decrypted data into the final archive for further processing.
  4. A log generator.
 
This package has been used within our institute in two business surveys:
  1. the Internet Providers survey;
  2. the inquiry on provisional value added.
 
Since this package is easy to customize, we are considering using it in other surveys soon. The “E-mail surveyer” was developed at Istat using some off-the-shelf commercial software, such as Eudora as the E-mail software and FrontPage for designing the questionnaire layout. It can also manage several databases as final archives (e.g., Oracle, Excel, Access).
 
4.   Touchtone Data Entry and World Wide Web
U.S. Bureau of Labor Statistics
Demonstrators: Richard Rosen, Chris Manning, Louis Harrell

The Current Employment Statistics (CES) survey, conducted by the US Bureau of Labor Statistics, is a monthly survey of about 380,000 business establishments.  CES collects, analyzes, and publishes data on employment, hours, and earnings at the national, state, and area levels.  CES data, widely viewed as a major economic indicator, are published monthly after only two and a half weeks of collection.

Traditionally, CES data were collected by mail.  However, in the mid-1980s, we began offering automated collection methods such as Touchtone Data Entry (TDE), and more recently, World Wide Web (WWW).   These methods now constitute the bulk of CES data collection.

Touchtone Data Entry is applicable in many types of surveys, and is useful for any information collection effort requesting numeric and yes/no answers.  The CES implementation of TDE allows respondents to dial a toll-free 800 number and report data using the numbers on their telephone keypad.  TDE uses broadcast FAX technology to send advance notice and nonresponse prompt messages to respondents.

World Wide Web data collection offers significant potential for collecting high-quality data at low cost for all types of surveys. The CES application offers data entry and basic online integrity edits. Data security is maintained through the use of a digital ID and the Secure Sockets Layer (SSL) protocol. Web data collection uses e-mail for advance notices and nonresponse prompts.

Both TDE and WWW have led to product and customer service improvements, such as more accurate microdata, more timely responses, simplified reporting, and improved customer access to our survey products.
 

5.   USDA-NASS Data Warehousing
U.S. Department of Agriculture
Demonstrator: Douglas Boline

NASS believes we can reduce respondent burden, improve the quality of data collected, and maintain high response rates by making maximum use of the information we already have from previous survey responses. Therefore, NASS has developed an integrated, easy-to-use, high-performance Data Warehouse System. NASS uses Redbrick ODBC as its data store software. This system already contains over one-half billion records of survey and census data from farm operators and is providing improvements to our survey process. NASS queries the Redbrick ODBC data store using Brio Technologies Explorer software. Improvements in data quality and survey management are expected as the Data Warehouse is fully integrated with NASS's sampling, survey management, data collection, data analysis, and estimation systems.

Three lessons learned from our implementation are: 1) select the dimensional, or star schema, design for the Data Warehouse; 2) choose a database that is optimized for very fast data loading and data access using the dimensional design; and 3) select a primary data access tool that lets users work by pointing and clicking and by dragging and dropping.
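To make the first lesson concrete, here is a minimal sketch of a dimensional (star schema) layout using Python's built-in sqlite3 module: one central fact table of reported values, keyed to small dimension tables that users filter and group by. All table names, column names, and rows are hypothetical illustrations, not NASS's actual warehouse design, and SQLite merely stands in for the production data store.

```python
import sqlite3

# Hypothetical star schema: a fact table of reported values surrounded by
# small dimension tables (survey, state, commodity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_survey    (survey_id INTEGER PRIMARY KEY, survey_name TEXT, year INTEGER);
CREATE TABLE dim_state     (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE dim_commodity (commodity_id INTEGER PRIMARY KEY, commodity TEXT);
CREATE TABLE fact_response (
    survey_id    INTEGER REFERENCES dim_survey(survey_id),
    state_id     INTEGER REFERENCES dim_state(state_id),
    commodity_id INTEGER REFERENCES dim_commodity(commodity_id),
    operation_id INTEGER,
    acres        REAL
);
""")
conn.executemany("INSERT INTO dim_survey VALUES (?, ?, ?)",
                 [(1, "Survey X", 1998), (2, "Survey X", 1999)])
conn.executemany("INSERT INTO dim_state VALUES (?, ?)", [(1, "State A"), (2, "State B")])
conn.executemany("INSERT INTO dim_commodity VALUES (?, ?)", [(1, "Corn"), (2, "Soybeans")])
conn.executemany("INSERT INTO fact_response VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 101, 250.0), (2, 1, 1, 101, 270.0),
    (2, 1, 2, 102, 120.0), (2, 2, 1, 201, 80.0),
])

# A typical ad hoc query: join the fact table to its dimensions and aggregate.
query = """
SELECT s.year, st.state_name, c.commodity, SUM(f.acres) AS total_acres
FROM fact_response f
JOIN dim_survey s    ON f.survey_id = s.survey_id
JOIN dim_state st    ON f.state_id = st.state_id
JOIN dim_commodity c ON f.commodity_id = c.commodity_id
GROUP BY s.year, st.state_name, c.commodity
ORDER BY s.year, st.state_name, c.commodity
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because every query follows the same "fact joined to dimensions" pattern, a point-and-click front end can generate such queries without users writing SQL.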

Our live demonstrations will illustrate these three critical success factors: 1) Our easy to understand dimensional data warehouse design, 2) the rapid ad-hoc query response times against our data warehouse, and 3) the ease of using the access system.
 

6.   Common Collection and Processing System
U.S. Energy Information Administration
Demonstrator: Patty MacNaught
 
The Energy Information Administration (EIA), the independent statistical agency within the United States Department of Energy, developed the Common Collection and Processing System (CCAPS) to provide a comprehensive online repository of energy information. Data operations at the Department of Energy are planned and conducted to monitor fuel lines, including coal, oil, nuclear, and natural gas. The previous data collection system used 83 survey instruments to question approximately 120,000 respondents. Collection and analysis were performed by four EIA program offices comprising seven divisions and thirteen branches. Over the years, an estimated 35 separate systems had evolved to collect and process the data.

In 1996, the first Joint Application Development (JAD) session was held, bringing together users and system developers in a series of structured workshops to determine how best to centralize, standardize, and streamline EIA data collection and processing. Many meetings and much coding later, the Common Collection and Processing System (CCAPS) was developed and released in late 1998.

The Common Collection and Processing System (CCAPS) is a repository of electronic working data maintained by the Energy Information Administration. Using CCAPS, EIA personnel collect data from EIA survey respondents and energy industry sources, integrate and analyze these data, and disseminate the resulting information in electronic and printed formats.

CCAPS was developed using Visual Basic and a SQL Server database engine. Currently 10 surveys are in production, and another 20 surveys will be incorporated within the next year.

A Master Universe Database (MUD) was also developed to track all potential EIA respondents and their affiliates.  This system provides users with one place to analyze all EIA contacts and works in conjunction with CCAPS.



 
SESSION  II:  Editing, Imputation, and Analysis
 
7.   SLICE: A Framework for Editing and Imputation
Central Bureau of Statistics, The Netherlands
Demonstrators: Wim Hacking, Hans Wings, Lon Hofman

Data for statistical agencies are becoming more and more digitized. Administrative data files are also increasingly being used, either directly for tabulation or for matching and mass imputation. In addition, budget constraints call for more efficient data processing methods. The SLICE project was started as part of the effort to automate data processing. SLICE is a general framework for modules (based on COM) that process data for statistical purposes. Its main activities are editing/checking and imputation. Other activities, such as weighting and variance estimation, are done in Bascula (see the paper by N.J. Nieuwenbroek et al.); in the future, however, Bascula will also become one or more modules in SLICE. The SLICE modules can be connected to define the processing route, and a graphical interface lets the user define these connections and configure the modules.
 
Currently, a few modules have been implemented and are being tested: an editing prototype based on an extended version of the Fellegi-Holt paradigm that automatically detects errors record by record; a module that aggregates data in order to detect and edit outliers graphically (see also the paper on MacroView by T. de Waal et al.); and a module based on the Kosinski algorithm that automatically detects outliers. Data input and output can use any OLE DB source or a Blaise database.
In the future the SLICE program will be embedded in the Blaise suite.
 

8.   AGGIES: Edit & Imputation
 U.S. Department of Agriculture
Demonstrator: Kara Perritt
 
The Research and Development Division of the National Agricultural Statistics Service (NASS) is developing and evaluating an automated edit and imputation system called the Agricultural Generalized Imputation and Edit System (AGGIES). Using methodology based on GEIS, Statistics Canada's Generalized Edit and Imputation System, AGGIES is programmed in SAS® with object-oriented features that make for a user-friendly environment. The system, designed to edit non-negative, continuous values, is structured in a modular fashion and offers NASS several potential advantages over current editing procedures. First, the editing and imputation functions are fully automated, eliminating the extensive manual data reviews currently done. Second, these functions are performed objectively, allowing more consistency throughout the editing process. Third, it is written in a language that is heavily used throughout the agency, which will simplify integration with tools currently in use. Finally, the system can easily be applied to any number of surveys and censuses, concentrating resources on the development and maintenance of a single system. Evaluations comparing AGGIES output with current processing output have been completed using data collected through NASS surveys and censuses.
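GEIS-style edit and imputation systems typically express edits as linear constraints on the numeric fields of a record. The following is a minimal sketch, assuming hypothetical field names and edits, of checking a record against such constraints. It only flags which edits fail; it does not perform the error localization (deciding which fields to change) or the imputation steps that AGGIES and GEIS carry out.

```python
# Hypothetical linear edits, each of the form  sum(coef * value) <= bound.
# Non-negativity and "parts cannot exceed total" are written in that form.
EDITS = [
    ("cropland_nonneg", {"cropland": -1.0}, 0.0),
    ("pasture_nonneg", {"pasture": -1.0}, 0.0),
    ("parts_le_total", {"cropland": 1.0, "pasture": 1.0, "total": -1.0}, 0.0),
    ("harvested_le_cropland", {"harvested": 1.0, "cropland": -1.0}, 0.0),
]

def failed_edits(record, edits, tol=1e-9):
    """Return the labels of all linear edits that the record violates."""
    failures = []
    for label, coefs, bound in edits:
        lhs = sum(coef * record[var] for var, coef in coefs.items())
        if lhs > bound + tol:
            failures.append(label)
    return failures

record = {"total": 500.0, "cropland": 400.0, "pasture": 150.0, "harvested": 380.0}
print(failed_edits(record, EDITS))  # ['parts_le_total']
```

An equality edit (for example, a balance that must sum exactly) can be written as two opposite inequalities in the same list.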
 
9.   SOLAS for missing data analysis
Statistical Solutions, United States / Ireland
Demonstrators: Neil Geary, Fiona O'Callaghan

SOLAS 2.0 for Missing Data Analysis is a Windows-based software tool for data imputation and exploratory analysis of missing data, offering a choice of both Multiple Imputation and Single Imputation methods.

The Single Imputation methods available in SOLAS include Hot Decking, Regression Imputation, Group Means, and Last Value Carried Forward.
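As an illustration of the first of these methods, here is a minimal sketch of random hot-deck imputation within imputation classes: each missing value is filled with the reported value of a randomly chosen donor from the same class. The field names and data are hypothetical, and this is a generic textbook version rather than SOLAS's own hot-deck procedure.

```python
import random

def hot_deck(records, value_key, class_key, rng):
    """Random hot-deck imputation within imputation classes.

    Each missing value (None) is replaced by the value of a randomly chosen
    donor, i.e. a respondent in the same class. Records whose class has no
    donors are left missing. The input records are not modified.
    """
    donors = {}
    for rec in records:
        if rec[value_key] is not None:
            donors.setdefault(rec[class_key], []).append(rec[value_key])

    imputed = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[value_key] is None and donors.get(rec[class_key]):
            new_rec[value_key] = rng.choice(donors[rec[class_key]])
        imputed.append(new_rec)
    return imputed

# Hypothetical business records, classed by size.
data = [
    {"id": 1, "size_class": "small", "sales": 10.0},
    {"id": 2, "size_class": "small", "sales": 12.0},
    {"id": 3, "size_class": "small", "sales": None},
    {"id": 4, "size_class": "large", "sales": 100.0},
    {"id": 5, "size_class": "large", "sales": None},
]
result = hot_deck(data, "sales", "size_class", random.Random(2000))
```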

Multiple Imputation was originally proposed by Rubin in the 1970s as a possible solution to the problem of survey nonresponse, addressing the shortcomings of standard analyses of incomplete datasets. The idea behind Multiple Imputation is that each missing value in a dataset is imputed several times (M values) instead of just once, to represent the uncertainty about which value to impute.
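The benefit of imputing M values shows up when the M completed-data analyses are pooled. Below is a minimal sketch of Rubin's standard combining rules: the pooled estimate is the average of the M estimates, and the total variance adds the between-imputation spread, inflated by (1 + 1/M), to the average within-imputation variance. The numbers are made up, and this is not SOLAS's own code.

```python
from statistics import mean, variance

def combine_imputations(estimates, variances):
    """Pool M completed-data analyses with Rubin's combining rules.

    estimates: the point estimate from each of the M imputed datasets
    variances: the estimated sampling variance from each dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)                 # pooled point estimate
    within = mean(variances)                # average within-imputation variance
    between = variance(estimates)           # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between  # total variance of the pooled estimate
    return q_bar, total

q, t = combine_imputations([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
print(q, t)
```

Here the between-imputation component adds about a third to the variance; a single imputation would have reported only the within-imputation part.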

SOLAS offers two Multiple Imputation approaches: a predictive model-based approach, in which the predictive information contained in a user-specified set of covariates is used to predict the missing values; and a propensity score-based approach, in which cases are grouped according to their probability of being missing (i.e., their propensity score) and an approximate Bayesian bootstrap is then applied to sample observed values to impute the missing values.
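Within one propensity-score group, the approximate Bayesian bootstrap step can be sketched as follows. This minimal sketch assumes the grouping has already been done (for example, by modelling the probability of being missing and cutting the scores into classes), and the data are hypothetical; it illustrates the general technique, not SOLAS's implementation.

```python
import random

def approximate_bayesian_bootstrap(observed, n_missing, rng):
    """Impute the missing cases of one group by the approximate Bayesian bootstrap.

    1. Resample the observed values with replacement (same size as observed).
    2. Draw the imputed values with replacement from that resample.
    The extra resampling step reflects uncertainty about the donor distribution,
    so repeated calls give properly varied multiple imputations.
    """
    resample = [rng.choice(observed) for _ in range(len(observed))]
    return [rng.choice(resample) for _ in range(n_missing)]

# Observed values in one hypothetical propensity-score group, two missing
# cases, and M = 5 imputations.
rng = random.Random(42)
group_observed = [3.1, 4.7, 5.2, 6.0, 7.4]
imputations = [approximate_bayesian_bootstrap(group_observed, 2, rng) for _ in range(5)]
```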

This demonstration will include several examples of how SOLAS can be used to perform multiple imputation on survey-type datasets containing both continuous and categorical data. SOLAS is currently licensed by many survey organisations including the National Opinion Research Center (US), AC Nielsen, Statistics Denmark, and Statistics Finland.
 

10.   QCDAS (Quality Control & Data Analysis System)
Statistics Canada
Demonstrators: Keith Davis, Walter Mudryk

The QCDAS is a generalized microcomputer-based data analysis system that was developed for the specific data analysis needs of the Quality Assurance Methods Section in Statistics Canada. The system’s generalized capabilities can easily be adapted for managing and analyzing continually updated application data sets on an ongoing basis. The system was developed using Microsoft Access 97 and makes full use of its architecture and functionality along with its Visual Basic programming language.

This generalized system is suitable for any organization that manages and analyzes data sets on a continuous time series basis.  The framework of the system provides tools that can be used to customize edit specifications for any type of input data, define file record layouts, design and develop algorithms and formulae used to tabulate, estimate and analyze data over time, as well as design and produce reports, graphs and charts as required.  This system can be used for applications such as longitudinal data analysis, cyclical tabulations, research analysis and other analytical studies.

The system was designed with flexibility, simplicity and user friendliness in mind.  The system utilizes generic templates and interactive screens to help develop the customized data analysis algorithms and specify the desired outputs.
 

11.   "Dead" Graphs, RIP - New Interactive EDA Graphic Techniques
U.S. Bureau of the Census
Demonstrator: David DesJardins
 
This workshop is designed to show users how the interactive nature of the new Exploratory Data Analysis (EDA) techniques has led to a true revolution in data analysis methodology.

Four key factors contribute to this "revolution" in data analysis and make the introduction of these EDA methods at this time a momentous opportunity. First, these graphics software packages let analysts generate hundreds of graphs in a matter of minutes, a feat that would have taken weeks or months just a few years ago. These graphs can yield many different insights into the data. Second, these packages often support sophisticated graphical methods for looking at the data and reviewing its subcomponents. If analysts believe the data are in error, they can easily correct them interactively using these point-and-click tools. Third, this software is available at very low cost: a very comprehensive student version of the SAS JMP-In PC software package is available for less than $60 (with a 500+ page statistical data analysis manual). Fourth, and most important, users of the software are not locked into fixed ways of looking at the data. Using these hardware and software tools, we (see particularly DesJardins 1998) have developed new graphical forms and special techniques that greatly increase the speed and efficiency of data editing and analysis. Analysts no longer need to waste their time (and valuable subject matter expertise) trying to edit their data with fixed methods and cumbersome, boring tabular printouts.

Graphs also have an extraordinary ability to communicate across a wide range of expertise, so they can make sophisticated statistical concepts clear to laypeople. Because of this, statisticians can now explain the fundamental concepts behind these new graphical data analysis techniques to subject matter specialists more quickly and effectively, and then focus most of their own effort on improving our methodology.

Accordingly, the U.S. Census Bureau is entering a whole new world of data analysis capability. Again, this is made possible by new, very fast hardware (e.g., Pentiums and Unix workstations) and powerful, easy-to-learn, interactive point-and-click software (JMP and INSIGHT from SAS Institute). Formerly, analysts had to learn the intricacies of programming or wait for systems development efforts to produce the custom software they needed for their data analysis tasks. Now, through a quick 40-hour EDA course taught by Mr. DesJardins, analysts are learning a variety of powerful EDA techniques using easy-to-learn, largely point-and-click software. The design of this courseware is revolutionary in two other ways as well. First, it stresses multivariate analysis of all of the variables on the survey form, allowing comparisons between variables in these data sets that had never before been compared, with the aim of gaining a real understanding of the data. Second, it is designed for subject matter specialists with only a moderate statistical background, to give them these key insights into their data.
 

12.   Graphical Editing Analysis Query System
U.S. Energy Information Administration
Demonstrator: Paula Weir
 
Seven years ago at the first ICES, the visualization of data as an efficient method of editing survey data was presented. Since that time, a number of graphical editing systems have been developed that exploit exploratory data analysis techniques. In this demonstration, the most recent version of the Graphical Editing Analysis Query System (GEAQS) will show how macro editing is performed on survey aggregates through anomaly maps, box-whisker plots, and time series graphs, with point-and-click drill-down from the highest- to the lowest-level aggregates. Micro data contributing to anomalous macro data will then be viewed on a split screen using scatter graphs and time series graphs, with data points mapped to the corresponding row of supplementary data or metadata within a spreadsheet, to identify and prioritize edit "failures." This visual exploratory approach enables the user to view the relationships among aggregates and among respondents over time and to edit data within this informative perspective, regardless of predetermined groupings, thresholds, boundaries, etc.
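Box-whisker plots conventionally mark points that fall beyond Tukey's fences, Q1 − k·IQR and Q3 + k·IQR. The following minimal sketch flags anomalous aggregates with that rule; the fence multiplier k = 1.5, the aggregate names, and the values are assumptions, and GEAQS's own anomaly criteria may differ. It uses `statistics.quantiles`, available from Python 3.8.

```python
from statistics import quantiles

def boxplot_outliers(aggregates, k=1.5):
    """Return the names of aggregates outside the box-plot fences."""
    q1, _, q3 = quantiles(list(aggregates.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(name for name, value in aggregates.items()
                  if value < low or value > high)

# Hypothetical aggregates (e.g., percent change by region) to screen for macro editing.
aggregates = {"A": 10.0, "B": 11.0, "C": 12.0, "D": 12.0,
              "E": 13.0, "F": 14.0, "G": 15.0, "H": 40.0}
flagged = boxplot_outliers(aggregates)
```

The flagged aggregates are the natural starting points for drilling down to the contributing micro data.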
 
13.   INSIGHT: A Visualisation Framework for Survey Data
BT Research Labs, U.K.
Demonstrator: David Yearling
 
Conducting employee satisfaction surveys within large corporations can be problematic in both planning and analysis. Although statistics provides a common language for convincing others of a particular state of nature from survey data, it can be confusing and difficult to use. What is required is an intuitive tool that paints pictures of the data using easy-to-understand descriptive and exploratory statistics. Following this general strategy, it is quite straightforward to develop fully functioning concept demonstration software using Rapid Application Development (RAD) tools. This software demonstration tackles these issues by giving survey data a visual structure, in this case abstract business models. It allows novice users to explore the data interactively and discover patterns and relationships, which can then be used to raise issues and provide information for further discussion and investigation. It is hoped that this approach will be adopted as the main intranet-based delivery mechanism for all company-wide employee attitude and satisfaction studies within BT. How this will be achieved is also discussed.
 


 
SESSION  III:  Sampling and Estimation
 
14.   Random Digit Dialing
Marketing Systems Group, United States
Demonstrator: Frank Markowitz
 
In 1987, the GENESYS Sampling Systems division of Marketing Systems Group was founded, in part, to provide a wide array of sampling options to the survey research community. One of these options, unique in the survey research industry, allows researchers to design and generate their own Random Digit Dialing (RDD) samples on a PC at their own location. These samples can be based on virtually any geographic area, from national down to the census tract level, or on demographic characteristics such as race/ethnicity and income. This in-house sampling system also provides a number of additional features useful for demographic analysis, viewing working number rates, calculating sample sizes, exploring the efficiency of alternative sampling plans, and related sampling considerations. In addition to regular updates of the databases used within the system and ongoing customer support, the company continues to provide custom sampling consultation to users. All of this will be illustrated as the system is described in detail and actual samples are generated during the demonstration sessions of the conference.
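One common form of RDD, list-assisted sampling, can be sketched as below: numbers are generated by appending two random digits to "100-banks" (area code, exchange, and the first two line digits) known to contain working numbers. This is a generic textbook illustration with hypothetical bank prefixes, not the GENESYS algorithm.

```python
import random

def list_assisted_rdd(banks, n, rng):
    """Draw n distinct telephone numbers by list-assisted RDD.

    banks: 8-digit strings (area code + exchange + first two line digits),
           each identifying a block of 100 possible numbers.
    Every number in the listed banks has the same chance of selection;
    duplicates are rejected, giving a simple random sample of numbers.
    """
    if n > 100 * len(banks):
        raise ValueError("sample size exceeds the number of possible numbers")
    numbers = set()
    while len(numbers) < n:
        bank = rng.choice(banks)
        numbers.add(f"{bank}{rng.randrange(100):02d}")
    return sorted(numbers)

banks = ["20255501", "20255502", "71655517"]  # hypothetical working banks
sample = list_assisted_rdd(banks, 10, random.Random(7))
```

Restricting generation to banks with listed numbers is what raises the working-number rate compared with dialing purely random ten-digit numbers.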
 
15.   The Sample Planning Tool
Research Triangle Institute, United States
Demonstrators: Jill D. Kavee, Robert E. Mason, Timothy W. Elig
 
Research Triangle Institute has developed software to enable researchers to optimize the size and allocation of the sample needed for a survey.  This software, known as the Sample Planning Tool, is written in Visual Basic and executed in Microsoft Access 97.

The Sample Planning Tool uses a non-linear optimization algorithm for computing the sample size and allocation.  This algorithm computes an allocation that minimizes the specified cost model while simultaneously meeting or exceeding the required precision constraints.  This is accomplished by providing a point-and-click interface to assist the user with the following steps: (1) specifying the sampling design, (2) stratifying the sampling frames, (3) constructing cost models, and (4) specifying precision requirements.
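The multi-constraint problem described above needs a numerical optimizer, but the single-constraint special case has a closed form that shows the shape of the answer: n_h proportional to N_h·S_h/√c_h. The following is a minimal sketch, assuming stratified simple random sampling, a linear cost model Σ c_h·n_h, and one target variance V for an estimated total; the stratum figures are hypothetical, and this is not the Sample Planning Tool's algorithm.

```python
import math

def optimal_allocation(N, S, c, V):
    """Cost-minimizing stratum sample sizes under a single variance constraint.

    Minimizes sum(c_h * n_h) subject to
        Var(Y_hat) = sum(N_h**2 * S_h**2 / n_h) - sum(N_h * S_h**2) = V,
    whose solution is n_h = k * N_h * S_h / sqrt(c_h).
    Returns non-integer n_h; rounding and capping at N_h are left out.
    """
    k = sum(Nh * Sh * math.sqrt(ch) for Nh, Sh, ch in zip(N, S, c)) / (
        V + sum(Nh * Sh ** 2 for Nh, Sh in zip(N, S)))
    return [k * Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N, S, c)]

def variance_of_total(N, S, n):
    """Variance of the stratified estimator of a total, with finite population correction."""
    return (sum(Nh ** 2 * Sh ** 2 / nh for Nh, Sh, nh in zip(N, S, n))
            - sum(Nh * Sh ** 2 for Nh, Sh in zip(N, S)))

# Hypothetical strata: population sizes, standard deviations, unit costs.
N = [1000, 500, 200]
S = [10.0, 30.0, 80.0]
c = [1.0, 1.0, 4.0]
n = optimal_allocation(N, S, c, V=2.0e7)
```

Costlier strata receive fewer units than their variability alone would suggest, which is the trade-off a cost model introduces.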

Examples will include the sample allocation developed for the 1994/1995 Status of the Armed Forces Surveys and an optimal sample allocation for commercial establishments.
 

16.   GSAM (Generalized Sampling System)
Statistics Canada
Demonstrators: Ron Carpenter, Lyne Guertin
 
The Generalized Sampling System was developed at Statistics Canada to meet the sample design requirements of many of its surveys. The interactive nature of GSAM lets users examine different sampling strategies for a survey period and choose the one that best meets their needs. The current version, 2.0, includes four major functions designed to support periodic as well as ad hoc surveys: frame maintenance, stratification, allocation, and sampling.
 
The frame maintenance function is used to keep track of units from period to period, to update the information on existing units, and to add units to and delete them from the survey frame. The stratification function offers the cumulative square root of frequency (cum √f) rule, a simple clustering algorithm, and user-defined rules for stratifying the sampling frame. The allocation module offers two optimal strategies, which are solved using a modified Bethel algorithm: you can either minimize survey costs under given CV constraints, or use a power allocation to minimize a weighted variance under given costs. The sampling function allows for simple random sampling, controlled sample rotation, or maximization of sample overlap. Sample selection is based on the collocated sampling technique.
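The cum √f rule (due to Dalenius and Hodges) is simple enough to sketch directly: group the stratification variable into classes, accumulate the square roots of the class frequencies, and cut where the running total reaches equal fractions. The version below is a minimal illustration assuming equal-width classes and a hypothetical `n_bins` parameter; GSAM's own class construction and boundary choice may differ.

```python
import math

def cum_sqrt_f_boundaries(values, n_strata, n_bins=20):
    """Stratum boundaries by the cumulative square root of frequency rule.

    1. Group the stratification variable into n_bins equal-width classes.
    2. Accumulate the square roots of the class frequencies.
    3. Cut at the class edge whose cumulative total is closest to each
       multiple of (total / n_strata).
    Returns the n_strata - 1 boundaries (upper class edges).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all values are equal; nothing to stratify")
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1

    cum, running = [], 0.0
    for f in counts:
        running += math.sqrt(f)
        cum.append(running)

    step = running / n_strata
    boundaries = []
    for j in range(1, n_strata):
        i = min(range(n_bins), key=lambda b: abs(cum[b] - j * step))
        boundaries.append(lo + (i + 1) * width)
    return boundaries

# Uniform values 1..100 split into two strata cut near the middle.
bounds = cum_sqrt_f_boundaries(list(range(1, 101)), n_strata=2, n_bins=10)
```

For skewed business-size variables, taking square roots of frequencies pulls boundaries toward the sparse upper tail, giving the large units strata of their own.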
 
GSAM V2.0 is a SAS®-based application. It runs under SAS 6.12 on all Microsoft Windows™ operating systems.
 
17.   GES (Generalized Estimation System)
Statistics Canada
Demonstrators: Victor Estevao, Lyne Guertin
 
The Generalized Estimation System (GES) was developed at Statistics Canada to meet the estimation requirements of many of its surveys. The use of auxiliary information in GES has led to a modern framework for domain estimation. The current version, 4.0, handles the estimation requirements of stratified one-stage cluster or element sample designs. GES provides four main functions: calculation of sample design weights, calculation of g-weights under a calibration approach, calculation of calibration estimates, and calculation of synthetic estimates. Domain estimates can be produced using calibration or synthetic estimation. Calibration is an extension of the theory of the generalized regression (GREG) estimator.
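For the simplest case, one auxiliary variable x with a known population total X and no intercept, the linear calibration g-weights have a closed form: g_k = 1 + x_k·(X − X̂_HT)/Σ d_k·x_k², where d_k are the design weights and X̂_HT = Σ d_k·x_k. The following minimal sketch uses hypothetical data; GES handles general auxiliary vectors, domains, and designs that this does not.

```python
def calibrate_single_aux(d, x, x_total):
    """Linear (GREG-type) calibration on one auxiliary variable, no intercept.

    d: design weights, x: auxiliary values for the sampled units,
    x_total: known population total of x.
    Returns g-weights such that sum(d_k * g_k * x_k) == x_total.
    """
    ht_total = sum(dk * xk for dk, xk in zip(d, x))  # Horvitz-Thompson estimate of X
    t = sum(dk * xk * xk for dk, xk in zip(d, x))
    lam = (x_total - ht_total) / t
    return [1.0 + lam * xk for xk in x]

# Hypothetical sample: design weights, auxiliary values, survey variable.
d = [10.0, 10.0, 20.0]
x = [2.0, 3.0, 5.0]
y = [5.0, 8.0, 11.0]
g = calibrate_single_aux(d, x, x_total=200.0)
w = [dk * gk for dk, gk in zip(d, g)]            # calibrated weights
y_hat = sum(wk * yk for wk, yk in zip(w, y))    # calibration estimate of the y total
```

The calibrated weights reproduce the known total of x exactly, and the resulting estimate of the y total equals the GREG form Ŷ_HT + B̂·(X − X̂_HT) with B̂ = Σ d·x·y / Σ d·x².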
 
GES can produce estimates for the entire survey population or any specified domain of interest within this population. It can estimate the number of units in each domain, the total or mean of any survey variable or the ratio of two variables.  The product can be used for social or business surveys, whether large or small. It has been designed to facilitate the estimation requirements of periodic surveys but it can also be used for ad hoc surveys. It is suitable for statistical agencies, market research companies, polling firms and other organizations requiring estimates from sample based surveys.
 
GES V4.0 is a SAS®-based application. It runs under SAS 6.12 on all Microsoft Windows™ operating systems.
 
18.   BASCULA
Central Bureau of Statistics, The Netherlands
Demonstrator: N.J. Nieuwenbroek
 
At Statistics Netherlands, the software package Bascula version 3.0 has been developed, combining the weighting of sample data using auxiliary information with variance estimation based on balanced repeated replication (BRR). Much attention has been paid to implementing various techniques in an easy and user-friendly way. It fits neatly into the general structure of Blaise in that it can use both the data and the metadata provided by the Blaise system. The package is already in use for various person and household surveys, and increasing use in business surveys is expected. Eventually Bascula will be part of a general processing and estimation environment integrating the whole process of outlier detection and handling, (micro and macro) editing, imputation (for unit and item nonresponse), and weighting of the clean records. We will report on our experiences with Bascula and on extensions that we have already planned.
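To show what BRR variance estimation involves, here is a minimal textbook sketch for a design with exactly two PSUs per stratum and a weighted mean as the estimator. It is not Bascula's code: the half-samples are taken from a Sylvester-type Hadamard matrix (an assumption; any suitable Hadamard matrix works), and the kept PSU's weights are simply doubled rather than passed through the full weighting step again, as a complete implementation would typically do. The data are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2), built by Sylvester doubling."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def weighted_mean(units):
    return sum(w * y for w, y in units) / sum(w for w, _ in units)

def brr_variance(strata):
    """Full-sample weighted mean and its BRR variance.

    strata: list of (psu_a, psu_b) pairs; each PSU is a list of (weight, y).
    """
    n_strata = len(strata)
    order = 1
    while order <= n_strata:  # need one column per stratum besides the all-ones column
        order *= 2
    h = sylvester_hadamard(order)

    full_sample = [unit for psu_a, psu_b in strata for unit in psu_a + psu_b]
    theta = weighted_mean(full_sample)

    replicates = []
    for r in range(order):
        half_sample = []
        for s, (psu_a, psu_b) in enumerate(strata):
            kept = psu_a if h[r][s + 1] == 1 else psu_b
            half_sample.extend((2 * w, y) for w, y in kept)  # double the kept PSU's weights
        replicates.append(weighted_mean(half_sample))

    var = sum((t - theta) ** 2 for t in replicates) / order
    return theta, var

# Three hypothetical strata, two PSUs each, units given as (weight, value).
strata = [
    ([(1.0, 2.0), (1.0, 4.0)], [(1.0, 6.0), (1.0, 8.0)]),
    ([(2.0, 5.0)], [(2.0, 7.0)]),
    ([(1.5, 3.0), (1.5, 3.5)], [(1.5, 4.0)]),
]
theta, var = brr_variance(strata)
```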
 
19.   Ag@ccess: A Geographical Estimation System
ABARE, Australia
Demonstrator: Terry Neeman
 
Ag@ccess is a software program designed to provide user-defined area estimates of averages of farm financial performance and farm physical characteristics for broadacre farms across Australia.  This software, together with a database of farms in the grains industry, Grains@access, is currently available for use by the farming community on the Internet at www.abare.gov.au/research/grainaccess.htm.  The user can circumscribe an area in an Australian agricultural region, and the program will display tables of estimated average financial performance and average farm physical characteristics for all farms and all top-performing farms in the defined area.  In addition, contour maps of individual variables can be generated, showing patterns of regional variability across Australia.

The data used to generate the tables and maps come from ABARE's annual farm survey of approximately 1,600 broadacre farms across Australia. Estimates of local averages are calculated using a kernel smoothing function adapted to account for survey weights. Areas with insufficient sample coverage are masked to ensure that the confidentiality of individual farm data is not breached.
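A survey-weighted kernel average of the kind described above can be sketched as follows: each farm's contribution is its survey weight times a kernel of its distance from the target location, and locations with too few nearby farms are masked. The Gaussian kernel, planar coordinates, bandwidth, and the masking rule (`min_farms` within one bandwidth) are all assumptions for illustration, not ABARE's actual settings.

```python
import math

def smoothed_average(x0, y0, farms, bandwidth, min_farms=5):
    """Survey-weighted Gaussian-kernel average of a farm variable at (x0, y0).

    farms: list of (x, y, survey_weight, value) tuples in planar coordinates.
    Returns None (masked) when fewer than min_farms lie within one bandwidth,
    so that no estimate is released for an area with thin sample coverage.
    """
    num = den = 0.0
    n_local = 0
    for x, y, weight, value in farms:
        dist2 = (x - x0) ** 2 + (y - y0) ** 2
        kernel = math.exp(-dist2 / (2.0 * bandwidth ** 2))
        num += weight * kernel * value
        den += weight * kernel
        if dist2 <= bandwidth ** 2:
            n_local += 1
    if n_local < min_farms or den == 0.0:
        return None
    return num / den

# Hypothetical farms: (x, y, survey weight, farm cash income).
farms = [(0.1 * i, 0.0, 1.0 + i, 100.0 + 10.0 * i) for i in range(6)]
estimate = smoothed_average(0.2, 0.0, farms, bandwidth=1.0)
masked = smoothed_average(50.0, 50.0, farms, bandwidth=1.0)
```

Evaluating the function over a grid of locations gives the gridded surface from which contour maps can be drawn.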

Also to be demonstrated is the in-house software "Smooth Operator".  This application processes farm data into Ag@ccess data files or text files consisting of smoothed gridded geographic information.
 

20.   WesVar
Westat, United States
Demonstrator: Richard Valliant
 
When complex survey designs are used to collect survey data, special techniques are needed to obtain meaningful and accurate analyses.  WesVar computes estimates and replicate variance estimates that reflect complex sampling and estimation procedures.  WesVar is flexible and can be used with a wide range of sample designs, including multi-stage, stratified, and unequal probability samples.  The replicate variance estimates can reflect many types of estimation schemes, including poststratification, raking, and ratio estimation.

Estimates and standard error estimates can be calculated for totals, means, and percentages in multi-way tables.  Standard error estimates can also be easily computed for complex functions of estimates, including ratios, differences of ratios, and log-odds ratios.  WesVar calculates standard errors, coefficients of variation, and confidence intervals for the survey estimates you specify and calculates chi-square tests of independence for two-way tables of weighted frequencies.  WesVar also computes estimated coefficients and their standard errors for linear and logistic regression models and tests the significance of subsets of linear combinations of parameters.
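
The replication idea behind these standard errors can be illustrated with the delete-one-PSU jackknife, applied to a ratio of weighted totals. This is the approach listed below as JK1, for unstratified designs. The sketch is simplified and assumes an unstratified design; the function names are hypothetical and are not WesVar's interface.

```python
from typing import Callable, List, Sequence

Weights = List[float]


def jk1_replicate_weights(w: Sequence[float], psu: Sequence[int]) -> List[Weights]:
    """Delete-one-PSU jackknife replicate weights for an unstratified design.

    Replicate r sets the weights of PSU r to zero and inflates all other
    weights by R / (R - 1), where R is the number of PSUs.
    """
    psus = sorted(set(psu))
    R = len(psus)
    return [
        [0.0 if p == dropped else wi * R / (R - 1) for wi, p in zip(w, psu)]
        for dropped in psus
    ]


def jk1_variance(
    estimator: Callable[[Sequence[float]], float],
    w: Sequence[float],
    reps: List[Weights],
) -> float:
    """JK1 variance: (R - 1) / R times the sum of (theta_r - theta)^2."""
    R = len(reps)
    theta = estimator(w)
    return (R - 1) / R * sum((estimator(rw) - theta) ** 2 for rw in reps)


def weighted_ratio(y: Sequence[float], x: Sequence[float]):
    """Return an estimator of sum(w*y) / sum(w*x) for fixed y and x."""
    def estimate(w: Sequence[float]) -> float:
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * xi for wi, xi in zip(w, x))
    return estimate


# Example: with x identically 1 the ratio is the sample mean.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0] * 5
w = [1.0] * 5
psu = [0, 1, 2, 3, 4]
v = jk1_variance(weighted_ratio(y, x), w, jk1_replicate_weights(w, psu))
```

When x is identically 1 and every unit is its own PSU, the JK1 variance of the mean reduces to s^2/n (0.5 in this example), which is a convenient sanity check. The same estimator-as-function design lets any smooth function of weighted totals, such as a difference of ratios, reuse the replicate weights unchanged.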

WesVar includes five replication options for estimating sampling errors: balanced repeated replication (BRR), three jackknife methods (JK1, JK2, and JKn), and Fay's BRR method (FAY).
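
For designs with two PSUs per stratum, BRR and Fay's method can be sketched as follows. Half-samples are defined by a Hadamard matrix. Sylvester's construction is used here as an assumption; any suitable Hadamard matrix works. In each replicate, the selected PSU's weights are multiplied by 2 - k and the other PSU's weights by k. The variance is then sum((theta_r - theta)^2) / (R (1 - k)^2), and setting k = 0 gives ordinary BRR. The function names are hypothetical.

```python
from typing import Callable, List, Sequence

Weights = List[float]


def sylvester_hadamard(order: int) -> List[List[int]]:
    """Hadamard matrix of power-of-two order via Sylvester's construction."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def fay_replicate_weights(
    w: Sequence[float], stratum: Sequence[int], psu: Sequence[int], k: float = 0.0
) -> List[Weights]:
    """Replicate weights for a design with exactly two PSUs (coded 0/1) per stratum."""
    strata = sorted(set(stratum))
    R = 1
    while R < len(strata) + 1:  # column 0 is all +1, so strata use columns 1..H
        R *= 2
    hmat = sylvester_hadamard(R)
    column = {s: i + 1 for i, s in enumerate(strata)}
    reps = []
    for r in range(R):
        rw = []
        for wi, s, p in zip(w, stratum, psu):
            # +1 in the Hadamard cell selects PSU 0 of the stratum, -1 selects PSU 1.
            selected = (hmat[r][column[s]] == 1) == (p == 0)
            rw.append(wi * ((2 - k) if selected else k))
        reps.append(rw)
    return reps


def fay_variance(
    estimator: Callable[[Sequence[float]], float],
    w: Sequence[float],
    reps: List[Weights],
    k: float = 0.0,
) -> float:
    """Fay's BRR variance; k = 0 is ordinary BRR."""
    theta = estimator(w)
    R = len(reps)
    return sum((estimator(rw) - theta) ** 2 for rw in reps) / (R * (1 - k) ** 2)


# Example: weighted total for three strata with two PSUs each.
y = [3.0, 5.0, 10.0, 4.0, 7.0, 7.0]
w = [2.0] * 6
stratum = [1, 1, 2, 2, 3, 3]
psu = [0, 1, 0, 1, 0, 1]


def total(weights: Sequence[float]) -> float:
    return sum(wi * yi for wi, yi in zip(weights, y))


v_brr = fay_variance(total, w, fay_replicate_weights(w, stratum, psu, k=0.0), k=0.0)
v_fay = fay_variance(total, w, fay_replicate_weights(w, stratum, psu, k=0.5), k=0.5)
```

For a linear estimator such as a weighted total, both BRR and Fay's method reproduce the two-PSU-per-stratum variance sum_h (t_h1 - t_h2)^2 for any k. Here t_h1 and t_h2 are the weighted PSU totals, and the result in this example is 160. The methods differ for nonlinear statistics. There, Fay's perturbation (k > 0) avoids giving zero weight to half the sample in each replicate.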

WesVar operates under Windows 95, Windows 98, or Windows NT 4.0.
 



 
SESSION  IV:  Statistical Survey Utility Systems
 
21.   StEPS (Standard Economic Processing System)
U.S. Bureau of the Census
Demonstrator: Doug Hallam, Aref Dajani
 
The Standard Economic Processing System, known as StEPS, is an integrated and generalized processing system developed for economic surveys at the U.S. Bureau of the Census.  StEPS is coded in SAS® software to run in a UNIX operating environment.  Fifty annual surveys were processed on StEPS during calendar year 1999, and another 30 surveys are slated to move to StEPS in 2000.

The StEPS software is designed to handle these post-collection activities: data editing, imputation, data review and correction, data query, estimation, variance estimation, disclosure, and analysis.  Additionally, StEPS has links that support collection technologies for mailout, check-in, and data capture.  All of these modules are encased in a graphical user interface (GUI) that walks users through the available functionality.

The demonstration of the StEPS software will show users the many features this system has to offer.  In addition to the post-collection functionality mentioned above, members of the StEPS team will demonstrate other interactive modules for administering surveys, entering parameters that tailor generalized code to a specific survey, and accessing tools, including those available through SAS such as SAS INSIGHT® and SAS ASSIST®.
 

22.   Sprocet (Survey Processing System)
Statistics New Zealand
Demonstrator: Ray Freeman, David Archer
 
Sprocet (SNZ's Survey Processing Template) is a reusable survey processing system built using Lotus Notes. It is SNZ's answer to the challenges and costs associated with developing and operating many different business survey systems.

Sprocet is a survey application template that can be copied and modified to build each survey-specific processing system. The aim has been to retain the best features of the template while adding the required survey-specific features at marginal cost.
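
Sprocet itself is built in Lotus Notes, so the following Python is only an analogy for the copy-and-customise idea, not Sprocet code. A hypothetical template class supplies the standard processing steps, and a survey-specific subclass overrides only the step that differs (the template method pattern). All class and field names are invented for illustration.

```python
from typing import Dict, List

Record = Dict[str, float]


class SurveyTemplate:
    """Hypothetical standard processing steps shared by every survey."""

    def run(self, records: List[Record]) -> float:
        # Fixed skeleton: edit, then estimate. Subclasses customise the steps.
        edited = [r for r in records if self.passes_edits(r)]
        return self.estimate(edited)

    def passes_edits(self, record: Record) -> bool:
        # Standard edit applied to every survey: reject negative values.
        return all(v >= 0 for v in record.values())

    def estimate(self, records: List[Record]) -> float:
        # Standard weighted total of the "value" field.
        return sum(r["weight"] * r["value"] for r in records)


class RetailTradeSurvey(SurveyTemplate):
    """Hypothetical survey-specific copy: adds one extra edit, keeps the rest."""

    def passes_edits(self, record: Record) -> bool:
        return super().passes_edits(record) and record["value"] < 1e6
```

Only the overridden method is survey-specific. Improvements to the shared skeleton reach every survey built from the template.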

It is a blend: whatever can be made standard is standardised, while customisation and flexibility are allowed for specific survey circumstances. This has several competitive advantages:
 

 
23.   NASS's Record Linkage System
U.S. Department of Agriculture
Demonstrators: Greg Chong, Kara Broadbent
 
The National Agricultural Statistics Service (NASS) uses record linkage to maintain its list sampling frame and to merge new list sources with the frame.  NASS began development of a new record linkage system in the early 1990s. A commercial software package, AutoMatch, was selected to be the core engine of the matching system. Front and back ends were built around the AutoMatch program using Sybase PowerBuilder development software. Data for the record linkage system are stored in a Sybase database.

The back ends were developed to facilitate online review of matches, possible matches, and nonmatches. They also allow users to update information on the list frame with data from the record linkage system. The demonstration will primarily focus on the back ends of the system. Front ends are currently under development to aid in file preparation and in developing matching parameters.  These front ends include default parameter sets for files that are matched against the frame on a routine basis. The additional functionality gained by integrating the front and back ends with the AutoMatch software may be helpful to other organizations working on record linkage projects.
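
The three review groups (matches, possible matches, nonmatches) can be illustrated with the Fellegi-Sunter model, the standard probabilistic framework for this kind of matching. The sketch assumes that model and is not AutoMatch's implementation. Each compared field contributes log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees. Here m is the probability of agreement among true matches and u the probability among true nonmatches. The m and u values and the two cutoffs are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical (m, u) probabilities for each compared field:
# m = P(field agrees | true match), u = P(field agrees | true nonmatch).
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "name": (0.95, 0.01),
    "address": (0.90, 0.05),
    "zip": (0.98, 0.10),
}


def match_weight(agreements: Dict[str, bool], probs=FIELD_PROBS) -> float:
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights over fields."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = probs[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 10.0) -> str:
    """Assign a record pair to one of three groups using two hypothetical cutoffs."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "possible match"
```

Pairs whose weight falls between the two cutoffs form the "possible match" group, which is the group that calls for the kind of online clerical review described above.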
 

24.   ACS (Automated Cell Suppression)
Sande & Associates Inc., United States
Demonstrator: Gordon Sande
 
The ACS Suite of software builds on the technology proven in the Statistics Canada CONFID system to provide a comprehensive solution for developing and auditing cell suppression patterns that protect the confidentiality of respondents to establishment-based surveys. The methods have been extended with additional look-ahead and reordering heuristics for determining complementary suppressions, and with extended analysis of groupings of sensitive cells. Other extensions include self-consistent table completions and analysis for price indices. The new implementation uses easily understood commands that end users can run themselves and can also keep as documentation of all their processing steps.

The self-contained system starts from a microdata (and metadata) file of establishment responses and returns a control file specifying the publication status of each cell in the publication. A tabulation component aggregates establishments to enterprises and identifies the sensitive cells. The working tables follow the structure used in the cell suppression analysis; production of publication-quality tables is left to the main survey automation. Utilities allow the survey's unique subject-matter requirements to be specified. The entire process is scripted and rerunnable, so the suppression pattern can be designed before the final data are available and then redone completely, as a rehearsed, low-delay activity, once the final data arrive.
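
The source does not say which sensitivity rule ACS applies. As an assumed example, the widely used p-percent rule can be sketched together with the establishment-to-enterprise aggregation step described above. A cell is flagged when the second-largest contributor could estimate the largest contributor's value to within p percent. The value p = 10 and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def enterprise_totals(contributions: Iterable[Tuple[str, float]]) -> List[float]:
    """Aggregate (enterprise_id, value) establishment records to enterprise totals,
    sorted from largest to smallest."""
    totals = defaultdict(float)
    for enterprise, value in contributions:
        totals[enterprise] += value
    return sorted(totals.values(), reverse=True)


def is_sensitive(contributions: Iterable[Tuple[str, float]], p: float = 10.0) -> bool:
    """p-percent rule: the cell is sensitive if the total remaining after the two
    largest enterprises is less than p% of the largest enterprise's value."""
    values = enterprise_totals(contributions)
    if not values:
        return False
    largest = values[0]
    second = values[1] if len(values) > 1 else 0.0
    remainder = sum(values) - largest - second
    return remainder < (p / 100.0) * largest
```

Aggregation has to precede the sensitivity test. Two establishments worth 50 and 40 that belong to the same enterprise make a cell sensitive that would pass the rule if they were treated as independent respondents.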
 
25.   X-12-ARIMA: Time Series Modeling
U.S. Bureau of the Census
Demonstrator: David F. Findley

X-12-ARIMA is the Census Bureau's new time series modeling and seasonal adjustment program.  It provides four types of enhancements to X-11-ARIMA: (1) Extensive robust time series modeling and model selection capabilities for linear regression models with ARIMA errors; (2) Alternative seasonal, trading day, and holiday effect adjustment options, including the estimation of effects described by user-defined regressors; (3) New diagnostics of the quality and stability of the adjustment achieved by any set of specified options; (4) A new user interface with features to facilitate the analysis and adjustment of large numbers of series. X-12-ARIMA has been adopted for the production of official adjustments by statistical offices in the U.S., Europe, and Asia. X-12-Graph is a companion graphics program that is written in SAS but does not require its users to know SAS. It offers many types of diagnostic graphs for time series modeling and seasonal adjustment.
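
The classical X-11 procedure, which X-12-ARIMA extends, begins with a centred 2x12 moving average as a first trend estimate. It then divides the series by that estimate to obtain seasonal-irregular (SI) ratios in multiplicative mode. The following is a minimal sketch of that first step only, assuming monthly data. It does not cover regARIMA modelling, the later X-11 iterations, or X-12-ARIMA's actual code.

```python
from typing import List, Optional, Sequence


def centred_ma_2x12(series: Sequence[float]) -> List[Optional[float]]:
    """2x12 centred moving average of a monthly series.

    Weights are 1/24 on the two end points (t-6, t+6) and 1/12 on the eleven
    points t-5..t+5. The first and last six values have no symmetric estimate
    and are returned as None.
    """
    n = len(series)
    trend: List[Optional[float]] = [None] * n
    for t in range(6, n - 6):
        window = series[t - 6 : t + 7]  # 13 observations centred on t
        trend[t] = (0.5 * window[0] + sum(window[1:12]) + 0.5 * window[12]) / 12
    return trend


def si_ratios(
    series: Sequence[float], trend: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Multiplicative seasonal-irregular ratios Y_t / T_t where a trend value exists."""
    return [y / m if m is not None else None for y, m in zip(series, trend)]
```

Consider a series with a linear trend and a fixed seasonal pattern that sums to zero over the year. The 2x12 average recovers that trend exactly, because every window covers each calendar month with total weight 1/12 and the symmetric weights sum to one.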
 

26.   TRAMO - SEATS: Time Series Modeling
Gomez-Maravall, Spain
Demonstrators: Agustin Maravall, Victor Gomez

Two programs will be demonstrated: Tramo ("Time Series Regression with ARIMA Noise, Missing Observations and Outliers") and Seats ("Signal Extraction in ARIMA Time Series").
 
Tramo is a program for estimation and forecasting of regression models with possibly nonstationary (ARIMA) errors and any sequence of missing values. The program interpolates these values, identifies and corrects for several types of outliers, and estimates special effects such as Trading Day and Easter and, in general, intervention variable effects. Fully automatic model identification and outlier correction procedures are available.
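
The model class Tramo handles can be written in the usual notation for regression models with ARIMA errors. This is the standard formulation, given as a sketch rather than Tramo's documented specification:

```latex
\begin{aligned}
y_t &= \mathbf{x}_t^{\top}\boldsymbol{\beta} + z_t, \\
\phi(B)\,\delta(B)\,z_t &= \theta(B)\,a_t, \qquad
\delta(B) = (1 - B)^{d}\,(1 - B^{s})^{D},
\end{aligned}
```

Here B is the backshift operator and s the number of observations per year. The vector x_t holds the regression variables (trading day, Easter, outlier, and intervention regressors), and beta holds their coefficients. The polynomials phi(B) and theta(B) are the stationary autoregressive and invertible moving-average polynomials, including their seasonal factors, and a_t is white noise.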
 
Seats is a program for estimating unobserved components in time series following the so-called ARIMA-model-based method. The trend, seasonal, irregular, and cyclical components are estimated and forecasted with signal extraction techniques applied to ARIMA models.  The standard errors of the estimates and forecasts are obtained, and the model-based structure is exploited to answer questions of interest in short-term analysis of the data.
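
The ARIMA-model-based decomposition can be summarised with the textbook minimum-mean-squared-error (Wiener-Kolmogorov) result for a doubly infinite series. This is a sketch of the general method only; how SEATS handles finite samples, using forecasts and backcasts, is not shown.

```latex
\begin{aligned}
x_t &= \sum_i x_{it}, \qquad \phi_i(B)\,x_{it} = \theta_i(B)\,a_{it}, \quad \operatorname{Var}(a_{it}) = V_i, \\
\phi(B)\,x_t &= \theta(B)\,a_t, \qquad \phi(B) = \prod_i \phi_i(B), \quad \operatorname{Var}(a_t) = V_a, \\
\hat{x}_{it} &= \nu_i(B, F)\,x_t, \qquad
\nu_i(B, F) = \frac{V_i}{V_a}\,
\frac{\theta_i(B)\,\theta_i(F)\,\phi_{-i}(B)\,\phi_{-i}(F)}{\theta(B)\,\theta(F)}, \qquad
\phi_{-i}(B) = \frac{\phi(B)}{\phi_i(B)},
\end{aligned}
```

Here x_it is the trend, seasonal, cyclical, or irregular component, and F = B^{-1} is the forward operator. The autoregressive polynomials of different components are assumed to share no roots.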
 
The two programs are structured so that they can be used together either for in-depth analysis of a few series (as presently done at the Bank of Spain) or for automatic routine application to a large number of series (as presently done at Eurostat). When used for seasonal adjustment, Tramo preadjusts the series, which is then adjusted by Seats.
 

27.   SEASABS: Seasonal Analysis
Australian Bureau of Statistics
Demonstrator: Craig McLaren
 
SEASABS (SEASonal analysis ABS) is a "knowledge-based" seasonal analysis and adjustment system used by the ABS (Australian Bureau of Statistics) for our time series data.  It allows both expert and non-expert use of the ABS-enhanced X11 method of seasonal adjustment.  SEASABS has an intelligent interface that guides the user through the seasonal analysis process, making appropriate choices of parameters and adjustment methods.  SEASABS keeps records of the previous analysis of a series, so it can compare X11 diagnostics over time and "knows" which parameters and prior factors led to an acceptable adjustment at the last analysis.

SEASABS creates seasonal adjustment factors for the series and gives the user an indication of the suitability of these factors.  It identifies and corrects trend and seasonal breaks as well as extreme values, inserts trading day factors if necessary, chooses appropriate moving averages for the computation of trends and seasonal factors, and allows for moving holiday corrections.  The history of changes to the parameters and prior factors can be viewed.

Graphs of the original, seasonally adjusted, trend, seasonal/irregular, and trading day/irregular series and of other X11 outputs are available.  Facilities such as decomposition, sensitivity analysis, and examination of the effects of variable Henderson filters are also provided.
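
X11-type programs smooth trends with Henderson moving averages of varying length, for example 9, 13, or 23 terms. The symmetric weights of a (2p+1)-term Henderson filter follow a closed-form expression with n = p + 2. The sketch below uses that standard formula; it is not SEASABS code, and the function name is hypothetical.

```python
from typing import List


def henderson_weights(length: int) -> List[float]:
    """Symmetric Henderson filter weights for an odd filter length 2p+1.

    With n = p + 2, the weight at lag j (j = -p..p) is
    315[(n-1)^2 - j^2][n^2 - j^2][(n+1)^2 - j^2][3n^2 - 16 - 11j^2]
      / (8n(n^2 - 1)(4n^2 - 1)(4n^2 - 9)(4n^2 - 25)).
    """
    if length < 3 or length % 2 == 0:
        raise ValueError("Henderson filter length must be an odd integer >= 3")
    p = (length - 1) // 2
    n = p + 2
    denom = 8 * n * (n * n - 1) * (4 * n * n - 1) * (4 * n * n - 9) * (4 * n * n - 25)
    weights = []
    for j in range(-p, p + 1):
        num = (
            315
            * ((n - 1) ** 2 - j * j)
            * (n * n - j * j)
            * ((n + 1) ** 2 - j * j)
            * (3 * n * n - 16 - 11 * j * j)
        )
        weights.append(num / denom)
    return weights
```

For example, the 5-term filter gives weights of about -0.073, 0.294, 0.559, 0.294, -0.073. Longer filters smooth more heavily, which is why the choice of length affects the estimated trend.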

SEASABS provides not only the ability to adjust time series but also tools to analyse them.