Effective Clinical Practice


The Healthcare Cost and Utilization Project: An Overview

Effective Clinical Practice, May/June 2002

Claudia Steiner, MD, MPH, Anne Elixhauser, PhD, Jenny Schnaier, MA, Center for Organization and Delivery Studies, Agency for Healthcare Research and Quality, Rockville, Md

Database. Healthcare Cost and Utilization Project (HCUP)—a family of databases including the State Inpatient Databases (SID), the Nationwide Inpatient Sample (NIS), the Kids' Inpatient Database (KID), and the outpatient databases State Ambulatory Surgery Data (SASD) and State Emergency Department Data (SEDD).

Description. Multistate, inpatient (SID, NIS, KID) and outpatient (SASD, SEDD) discharge records on insured and uninsured patients.

Source. Partnership between the Agency for Healthcare Research and Quality (AHRQ) and public and private statewide data organizations.

Available Data. Selected data elements from inpatient and outpatient discharge records, including patient demographic, clinical, disposition and diagnostic/procedural information; hospital identification (ID); facility charges; and other facility information.

Data Years Available. Varies by database: NIS 1988-2000; SID 1995-2000; KID 1997 and 2000; SASD 1995-2000; and SEDD in pilot phase. Future data years anticipated for all datasets and back years for SID and SASD.

Units Of Analysis. Patient (in states with encrypted patient identification), physician, market, and state.

Research Questions. Quality assessment, use and cost of hospital services, medical treatment variations, use of ambulatory surgery services, diffusion of medical technology, impact of health policy changes, access to care (inference), study of rare illness or procedures, small area variations, and care of special populations.

Strengths. Largest collection of all-payer, uniform, state-based inpatient and ambulatory surgery administrative data.

Limitations. Lacks clinical detail (e.g., stage of disease, vital statistics) and laboratory and pharmacy data. Ability to track patients across time and setting varies by state.

Access to Data. Access available to all users who sign and abide by the Data Use Agreement. Application kits are available. HCUPnet, an on-line interactive query tool, allows access to data without purchase ( htm).

The Healthcare Cost and Utilization Project (HCUP) was established by the Agency for Healthcare Research and Quality (AHRQ) to provide multistate, administrative, population-based data that include information on both insured and uninsured patients in a uniform format. The data are designed for health services research to improve health care delivery and to build tools to aid researchers and other users of administrative data. The HCUP databases are managed by the Center for Organization and Delivery Studies within AHRQ. This article provides details about HCUP databases and supplies information on how to obtain the data resources made available through HCUP.

The core of HCUP is the statewide inpatient data, which would not exist without the efforts of individual states that collect the data and make them available to the project and researchers. (1) Some of the partnering states also contribute statewide outpatient data to the project. Finally, several derivative databases are created using the core statewide data.

The HCUP project also includes several products and tools, including the Clinical Classifications Software, the AHRQ Quality Indicators, and the Comorbidity Index, which will not be discussed in this article. However, information about all HCUP resources is available at our Web site. (2)

Administrative Data

HCUP data are a collection of administrative patient records. It is important to understand how and why administrative discharge abstract data are collected. When a patient is discharged from the hospital or hospital-based facility (e.g., an emergency department or ambulatory surgery center), a summary of that hospital stay is created in the form of a bill to the insurer, a discharge abstract, or both. This summary contains basic administrative information about the patient, such as age and gender, the patient's conditions, the procedures the patient received, and other features about the hospital stay (e.g., length and cost of the stay, who will pay for it, from where the patient was admitted, and to what setting the patient was discharged). The information captured in the discharge abstract is collected primarily for payment purposes.

Hospitals in many states provide hospital discharge summaries to the state government, a hospital association, or some other health information organization designated to collect this information. These hospital information systems were established for a variety of purposes, including public health monitoring, rate setting, and certificate-of-need determination. These data can be submitted and collected as part of a statewide voluntary data effort or under a state legislative mandate. For the past several decades, researchers have recognized the tremendous value of these collections of data and have used them to study topics ranging from quality of health care, diffusion of technologies, and racial disparities to the effects of market forces on hospitals. (3, 4) At last count, 44 states maintained inpatient discharge data systems. (5)

Although many of these data collections are modeled on uniform discharge summary and billing systems, each state customizes this information. Hence, certain core data elements tend to be represented in most states' data, but each state has unique ways of handling specific data elements. Even a data element as seemingly straightforward as gender has as many as five variations on coding across the states—M-F, 1-2, and 0-1; or 0-1-2 and 1-2-3 (with the third option being for "other"). In addition, states vary in the elements included, such as patient race/ethnicity, physician specialty, detailed charges, and payer information. (6)

The HCUP Databases

HCUP is built through a partnership between the state data organizations and AHRQ. The state data organizations (private and public) provide their unique statewide databases to HCUP. First, the data are subjected to internal consistency and edit checks to the extent possible without referral to outside sources, such as medical records. Second, similar data elements are re-coded into a uniform coding scheme. Finally, data elements unique to individual states are retained if they are useful for research purposes. These uniformly formatted datasets are the core of the HCUP databases.
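The re-coding step can be illustrated with the gender example given earlier. The following is a minimal sketch, not the actual HCUP processing code: the state labels, the meaning assigned to each raw code, and the uniform output values are all assumptions made for illustration. It shows why the mapping must be defined per state, since the same raw value (e.g., 1) can mean different things in different source files.

```python
# Hypothetical per-state coding schemes for the gender element. The state
# labels and the meaning assigned to each raw code are assumptions.
STATE_GENDER_SCHEMES = {
    "state_a": {"M": "male", "F": "female"},
    "state_b": {"1": "male", "2": "female"},
    "state_c": {"0": "male", "1": "female"},
    "state_d": {"0": "male", "1": "female", "2": "other"},
    "state_e": {"1": "male", "2": "female", "3": "other"},
}


def recode_gender(state, raw_value):
    """Map a state-specific gender code to one uniform value.

    Returns "invalid" when the raw value is not part of the state's scheme,
    which is one simple way to flag a record that fails an edit check.
    """
    scheme = STATE_GENDER_SCHEMES[state]
    return scheme.get(str(raw_value).strip().upper(), "invalid")


# The raw value 1 means "male" in state_b but "female" in state_c.
records = [
    ("state_a", "F"),
    ("state_b", 1),
    ("state_c", 1),
    ("state_e", 3),
    ("state_b", 9),
]
uniform = [recode_gender(state, value) for state, value in records]
print(uniform)
```

Keeping the scheme in a lookup table rather than in conditional logic makes it straightforward to add a state or to document how each source value was interpreted.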

The State Inpatient Database

At a minimum, each partnering state contributes its statewide database to the project. Over time, the number of states contributing to HCUP has grown. Twenty-nine states provided statewide discharge abstract data to the project for the 2000 data year, capturing approximately 80% of all U.S. hospital discharges. HCUP anticipates recruiting 4 additional states, for a total of 33 states, for the 2001 data year. In general, the SID contains the universe of each participating state's community hospital inpatient discharge records. The definition of a community hospital follows that of the American Hospital Association (AHA): nonfederal (e.g., not military, Veterans Administration, or Indian Health Service), short-term general and other specialty hospitals, including obstetrics-gynecology, ear-nose-throat, short-term rehabilitation, orthopedic, and pediatric. Excluded are long-term care hospitals, psychiatric hospitals, alcoholism/chemical dependence treatment facilities, and hospital units within institutions (such as prisons).

While most state government data organizations provide information on all acute care hospitals in their respective states, private data organizations are sometimes restricted to member hospitals and may not provide information on all hospitals in their state. The SID technical documentation provides tabular and narrative details about which hospitals' data are included. (7)

The state data comprise annual, state-specific files that share a common structure and common data elements. The HCUP process re-codes most of the data elements into a consistent format across all states. In addition to the core set of uniform data elements, the SID includes state-specific data elements that are available only for a few states, some of which lend themselves to uniform coding (e.g., race/ethnicity) and some of which do not (e.g., detailed charges). Table 1 provides a summary of the key analysis variables. The SID technical documentation also provides tabular and narrative details about the coding and availability of specific data elements for each participating state. (7)

There are up to three hospital identifiers on the SID. The HCUP-specific hospital identifier (a data element created by AHRQ) is always included on the SID and is available for community hospitals. This identifier allows aggregating observations to an institution but does not identify the individual hospital. Some data organizations allow the AHA hospital identifier to be included on the SID. This data element enables the SID to be linked to the AHA Annual Survey of Hospitals. Finally, some data organizations allow the original hospital identifier to be included on the SID. If available on the SID, this identifier is coded for all hospitals and may distinguish different units within a hospital.

The Nationwide Inpatient Sample

The NIS is the major derivative database created using the SID data. The NIS provides a research database for conducting national and regional studies of inpatient care delivered in the United States. The NIS is a sample of hospitals designed to approximate a 20% sample of all U.S. community hospitals. Hospitals are selected on the basis of a sampling frame that uses five strata: rural/urban location, number of beds, region, teaching status, and ownership. All discharges are retained for each sampled hospital. The NIS is a yearly database and includes roughly 1000 hospitals with about 7 million discharge records that are weighted to national estimates.
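The sampling design described above can be sketched in a few lines. This is an illustration under stated assumptions, not the NIS sampling program: the stratum labels, the rounding rule, and the hospital-level weight (number of hospitals in the stratum divided by the number sampled) are all simplifications, and the text does not specify how the actual discharge weights are computed.

```python
import random
from collections import defaultdict


def sample_hospitals(hospitals, fraction=0.20, seed=0):
    """Draw a stratified sample of hospitals and attach a hospital weight.

    `hospitals` is a list of dicts with an "id" and a "stratum" key. Within
    each stratum, about `fraction` of hospitals are drawn (at least one), and
    each sampled hospital gets weight N_h / n_h, where N_h is the number of
    hospitals in the stratum and n_h is the number sampled.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for hospital in hospitals:
        by_stratum[hospital["stratum"]].append(hospital)

    sampled = []
    for stratum, members in sorted(by_stratum.items()):
        n_sampled = max(1, round(fraction * len(members)))
        for hospital in rng.sample(members, n_sampled):
            sampled.append({**hospital, "weight": len(members) / n_sampled})
    return sampled


# Hypothetical universe of 100 hospitals in four strata.
hospitals = [
    {
        "id": i,
        "stratum": ("rural" if i % 2 else "urban", "small" if i % 3 else "large"),
    }
    for i in range(100)
]
sample = sample_hospitals(hospitals)

# The weights of the sampled hospitals sum back to the size of the universe.
estimated_total = sum(h["weight"] for h in sample)
print(len(sample), round(estimated_total))
```

Because all discharges are retained for each sampled hospital, a weight attached at the hospital level carries over to every discharge record from that hospital in this simplified scheme.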

The NIS is built using the core data elements that are typically available across all the states, as described earlier. While the NIS provides the capacity for national and regional estimates, it loses the richness of many of the individual SIDs. However, the large size of the NIS makes it well suited to investigate care for rare diseases or special populations. The longitudinal nature of the NIS enables study of health policy changes and dissemination of new medical technology. Table 2 provides a descriptive overview of the NIS. The NIS technical documentation provides tabular and narrative details about the coding and availability of specific data elements for every year of the data. (8)

Linkages to other databases enhance the research capabilities of HCUP. The AHA annual survey includes a multitude of information on hospital characteristics, staffing, and resources. (9) In addition, since 1997, Medicare hospital identifiers are included in the survey. For states that allow release of the AHA identifier through HCUP, it is possible to link HCUP data to Medicare public release data, such as the Medicare cost reports. (10)
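A linkage of this kind amounts to a lookup on a shared hospital identifier. The sketch below is hypothetical: the field names (`aha_id`, `beds`, `teaching`, `los`) are invented for illustration and do not reflect the actual HCUP or AHA file layouts. It also separates records that cannot be linked, which matters because not every state releases the AHA identifier.

```python
def link_to_survey(discharges, survey_rows, key="aha_id"):
    """Attach hospital-level survey fields to each discharge record.

    Records whose identifier is missing or absent from the survey are
    returned separately so that unlinked records are not silently dropped.
    """
    survey_by_id = {row[key]: row for row in survey_rows}
    linked, unlinked = [], []
    for record in discharges:
        hospital = survey_by_id.get(record.get(key))
        if hospital is None:
            unlinked.append(record)
        else:
            # Discharge fields take precedence if a name appears in both.
            linked.append({**hospital, **record})
    return linked, unlinked


# Hypothetical hospital survey rows and discharge records.
survey = [
    {"aha_id": "H001", "beds": 250, "teaching": True},
    {"aha_id": "H002", "beds": 40, "teaching": False},
]
discharges = [
    {"aha_id": "H001", "los": 3},
    {"aha_id": "H002", "los": 1},
    {"aha_id": None, "los": 5},  # identifier withheld by the state
]
linked, unlinked = link_to_survey(discharges, survey)
print(len(linked), len(unlinked))
```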

Several states allow release of hospital and patient-level geographic information. For these states, the county of the hospital or patient allows for linkage to the Area Resource File, which is produced by the Bureau of Health Professions at the Health Resources and Services Administration. (11) This database contains more than 7000 variables at the county level, including information on health facilities, health professions, measures of revenue, health status, mortality and natality, economic activity, health training programs, and socioeconomic and environmental characteristics.

Some states allow release of the zip code for the hospital or patient. This information allows linkage to zip code-level files that are made available through third-party vendors on the basis of data from the Bureau of the Census. Zip code data include an array of socioeconomic data elements. (12)

The uniform format of the SID facilitates cross-state comparisons. In addition, the SID is well suited for research that requires complete enumeration of hospitals and discharges within market areas or states. Since the SID is a census of all inpatient discharges, no sampling or weighting is needed.

The Kids' Inpatient Database

The comprehensive hospital data in the SID provide the opportunity for other derivative databases beyond the NIS. The most recent database created through HCUP is the KID. This database is a sample of patients 18 years and younger discharged from hospitals from all participating states, including 10% of uncomplicated births and 80% of all other pediatric and adolescent hospital stays. Sampling strata developed for the NIS are also used for the KID, with the addition of a hospital-type stratifier that identifies pediatric hospitals. The definition of pediatric hospital was obtained from the National Association of Children's Hospitals and Related Institutions. The KID contains over 2500 hospitals and 1.9 million unweighted discharge records that can be weighted to generate national estimates of pediatric hospitalizations. The data elements are similar to those in the NIS, with the addition of several data elements designed specifically for pediatric research: age in months (for all children 10 years and younger), a designation of whether the stay involved an uncomplicated birth in the hospital, and a data element that defines the type of hospital (not a children's hospital, children's general hospital, children's specialty hospital, or children's unit in a general hospital). Table 3 provides a descriptive overview of the KID. The KID technical documentation provides tabular and narrative details about the coding and availability of specific data elements for the entire database. (13)
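The effect of the two sampling rates on weighting can be shown with simple arithmetic. The sketch below assumes that each sampled record is weighted by the inverse of its sampling rate; this is a simplification for illustration only, since the actual KID weights are described in the technical documentation and also account for the sampling strata. The sampled counts are invented.

```python
# Sampling rates stated in the text for the two types of discharge.
SAMPLING_RATES = {"uncomplicated_birth": 0.10, "other": 0.80}


def weighted_counts(sampled_counts, rates=SAMPLING_RATES):
    """Scale sampled discharge counts by the inverse of the sampling rate."""
    return {kind: count / rates[kind] for kind, count in sampled_counts.items()}


# Hypothetical sampled counts from one group of hospitals.
sampled = {"uncomplicated_birth": 1000, "other": 4000}
estimated = weighted_counts(sampled)
print(estimated)
```

Under this assumption each sampled uncomplicated birth stands for 10 discharges and each other stay for 1.25, which is why unweighted counts from the KID would understate the share of births.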

The State Outpatient Databases

In recent years, states have expanded beyond the inpatient setting and have established a variety of outpatient administrative databases, most notably covering ambulatory surgery and emergency departments. HCUP has followed this trend and has embarked on the collection and standardization of outpatient data, beginning with State Ambulatory Surgery Data (SASD) and State Emergency Department Data (SEDD).

All of the HCUP ambulatory surgery databases include records from hospital-affiliated ambulatory surgery sites. Some contain the universe of abstracts of ambulatory service encounters for that state, including records from both hospital-affiliated and freestanding surgery centers. Composition and completeness of data files vary from state to state. Some of the states include all procedures that occur in the ambulatory surgery setting. Other states limit submitted data to a predefined set of procedures of interest to the state. The data elements are very similar to those in the SID and include a core set of clinical and nonclinical information on all patients. In addition to the core set of uniform data elements common to all SASD, some states include other elements, such as type and duration of anesthesia. Table 1 provides a descriptive overview of the SASD databases. The SASD technical documentation provides tabular and narrative details about the coding and availability of specific data elements for each participating state. (14)

The SEDD is currently under way as a pilot activity. However, the partnering states have indicated that emergency department data are an important source of information for public health, community assessment and planning, nonfatal preventable conditions, and access to health care. Therefore, HCUP intends to complete the pilot phase of the SEDD development, including further exploration and analysis of emergency department data components.

Acquiring HCUP Data

The HCUP central distributor was created by AHRQ in the spring of 1999 with HCUP partner organizations participating on a voluntary basis. The purpose of the HCUP central distributor is to prepare and distribute restricted-access, public-release versions of the SID, SASD, and KID for research outside of AHRQ on behalf of participating data organizations. Beginning in the summer of 2002, all HCUP data, including the NIS, will be available through the central distributor.

The SID and SASD that are available through the central distributor differ from the original raw file of each organization in two major ways: Data elements are restricted to meet public-release confidentiality requirements, and all data elements are coded in the uniform HCUP format. State data organizations determine the price of their state's HCUP central distributor files, and payment is made to AHRQ's contractor and fully reimbursed to the data organization in the state.

The HCUP central distributor files and documentation are available for purchase only after requesters complete a data-use agreement and application for the data. The data-use agreement stipulates that the data not be used to identify individual patients, physicians, or institutions and may be reported only as aggregate statistics. AHRQ does not release specific dates (e.g., birth date, discharge date) or unencrypted person identifiers on the central distributor databases. Detailed documentation describing the available states and years of data, the file structure, the data elements, and pricing is available. (15) The central distributor may be contacted at HCUP Central Distributor, Social and Scientific Systems, Inc., 866-556-4287 (toll-free), or e-mail:

All of the HCUP partner states receive copies of the HCUP version of their data. Several partners that do not distribute their HCUP data through the central distributor release the data directly to researchers who comply with data organization requirements. The central distributor can provide information to requesters on how to contact those states.

The NIS databases are also available for purchase to users who sign a data-use agreement. Users must agree to use the database for research and statistical purposes only and to make no attempts to identify individual patients, physicians, or hospitals. Similar to the SID, the NIS excludes data elements that could directly or indirectly identify individuals.

NIS datasets may currently be purchased in sets of CD-ROMs with accompanying documentation from the National Technical Information Service, 800-553-6847. (16) By the summer of 2002, the NIS will be distributed through the central distributor.


A variety of health services research topics can be studied using the HCUP databases. These include, but are not limited to, use and cost of hospital services, quality assessment, medical treatment variations, diffusion of new medical technology, impact of health policy changes, access to care (inference), small-area variations, and care of special populations.

The HCUP Web site provides an updated list of publications that have used HCUP data. This published research increases the knowledge base about health care delivery in the United States and illustrates the breadth of possible topics to investigate. (17)

HCUPnet is a Web-based query tool that allows users to gain access to HCUP-derived information that was previously available only in limited print form or to researchers who could use HCUP data directly. HCUPnet allows inquiry into the HCUP NIS and KID data as well as SID data from a subset of states that agreed to include them. HCUPnet can be used by researchers to validate their own estimates or to determine, before purchasing the data, whether the NIS or SID will provide sufficient sample sizes. (18) HCUPnet presents information on the number of discharges, length of stay, total hospital charges, and in-hospital mortality for diagnoses and procedures by various patient and hospital characteristics.

Information is provided by individual International Classification of Diseases, ninth revision, clinical modification codes; Clinical Classifications Software categories; diagnosis-related groups; and major diagnostic categories (groupings of diagnosis-related groups). The Clinical Classifications Software is a tool developed and maintained at AHRQ for clustering patient diagnoses and procedures into a manageable number of clinically meaningful categories. (19)

HCUPnet can be used to examine such issues as the most expensive and the longest hospital stays, trends in specific diagnoses and procedures (from 1993 onward), and variations in the number of cases. It can also be used to analyze other characteristics of hospital stays by different age groups or by expected payer categories. HCUPnet can be accessed at htm.
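Researchers who want to validate their own estimates against HCUPnet would compute the same kinds of summary measures from discharge records. The sketch below is a hypothetical, unweighted illustration: the field names (`payer`, `los`, `charges`, `died`) and the records are invented, and estimates from the NIS or KID would additionally need the discharge weights supplied with those databases.

```python
from collections import defaultdict


def summarize(records, group_key):
    """Summarize discharges by group.

    For each value of `group_key`, report the number of discharges, mean
    length of stay, mean total charges, and in-hospital mortality rate.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    summary = {}
    for name, rows in groups.items():
        n = len(rows)
        summary[name] = {
            "discharges": n,
            "mean_los": sum(r["los"] for r in rows) / n,
            "mean_charges": sum(r["charges"] for r in rows) / n,
            "mortality_rate": sum(r["died"] for r in rows) / n,
        }
    return summary


# Hypothetical discharge records grouped by expected payer.
records = [
    {"payer": "Medicare", "los": 6, "charges": 12000, "died": 1},
    {"payer": "Medicare", "los": 4, "charges": 8000, "died": 0},
    {"payer": "Private", "los": 2, "charges": 5000, "died": 0},
    {"payer": "Uninsured", "los": 3, "charges": 4000, "died": 0},
]
by_payer = summarize(records, "payer")
print(by_payer["Medicare"])
```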


HCUP is a public-private partnership whose goal is to build a multistate health care data system. The growing family of HCUP databases includes SID, SASD, NIS, KID, and pilot testing of SEDD. AHRQ has taken the lead in making the databases available to researchers. It is important to note that the HCUP project also includes a companion set of software tools and other resources that enhance the value of the data. HCUP would not be possible without the voluntary participation of the state data organizations, hospital associations, and other private organizations that collect statewide data. Because the HCUP databases contain information on the vast majority of all hospital discharges in U.S. community hospitals, they are a unique and powerful resource for health care analysts and researchers.


The following state data organizations participate in HCUP: (1)

Arizona Department of Health Services; California Office of Statewide Health Planning & Development; Colorado Health & Hospital Association; CHIME, Inc. (Connecticut); Florida Agency for Health Care Administration; Georgia Hospital Association; Hawaii Health Information Corporation; Illinois Health Care Cost Containment Council; Iowa Hospital Association; Kansas Hospital Association; Kentucky Department for Public Health; Maine Health Data Organization; Maryland Health Services Cost Review Commission; Massachusetts Division of Health Care Finance and Policy; Michigan Health and Hospital Association; Missouri Hospital Industry Data Institute; New Jersey Department of Health and Senior Services; New York State Department of Health; North Carolina Department of Health and Human Services; Office for Oregon Health Plan Policy and Research; Oregon Association of Hospitals and Health Systems (for 1996 data and forward); Pennsylvania Health Care Cost Containment Council; South Carolina State Budget and Control Board; Tennessee Hospital Association; Utah Department of Health; Texas Health Care Information Council; Virginia Health Information; Washington State Department of Health; West Virginia Health Care Authority; Wisconsin Department of Health and Family Services


Claudia Steiner, MD, MPH, Center for Organization and Delivery Studies, Agency for Healthcare Research and Quality, 2101 E. Jefferson St, Suite 605, Rockville, MD 20852; telephone: 301-594-6821; fax: 301-594-2314; e-mail: