Effective Clinical Practice


Veterans Health Administration Inpatient and Outpatient Care Data: An Overview

Effective Clinical Practice, May/June 2002

Patricia A. Murphy, MS, VA Information Resource Center (VIReC), Health Services Research and Development Service, Department of Veterans Affairs, Edward Hines, Jr. Hospital, Hines, Ill; Diane C. Cowper, MA, Rehabilitation Outcomes Research Center, Health Services Research and Development Service, Rehabilitation Research and Development Service, North Florida/South Georgia Veterans Health System, Gainesville, Fla; VA Information Resource Center (VIReC), Health Services Research and Development Service, Department of Veterans Affairs, Edward Hines, Jr. Hospital, Hines, Ill; Gregg Seppala, MA, National Data Systems, Office of Information, Department of Veterans Affairs, Silver Spring, Md; Kevin T. Stroupe, PhD, VA Information Resource Center (VIReC), Health Services Research and Development Service, Midwest Center for Health Services and Policy Research, VA Cooperative Studies Program Coordinating Center, Department of Veterans Affairs, Edward Hines, Jr. Hospital, Hines, Ill; Department of Medicine, Northwestern University, Chicago, Ill; Denise M. Hynes, PhD, VA Information Resource Center (VIReC), Midwest Center for Health Services and Policy Research, Health Services Research and Development Service, VA Cooperative Studies Program Coordinating Center, Department of Veterans Affairs, Edward Hines, Jr. Hospital, Hines, Ill, Department of Medicine, Loyola University Chicago, Maywood, Ill.

Database. Annual inpatient and outpatient care files of the Veterans Health Administration (VHA).

Sponsor. VHA, United States Department of Veterans Affairs (VA).

Patients. All patients having a health care episode at a VA medical center, hospital, or clinic in a given fiscal year.

Data Available. Patient identification (masked) and demographic information, selected patient status information (e.g., exposure to Agent Orange), date and time of health service, type of provider, location of service, purpose of visit or reason for admission (diagnostic codes), and service provided (surgical and other procedure codes).

Years Available. Annual datasets beginning in 1970, with some changes made each year; International Classification of Diseases, ninth revision, clinical modification codes for diagnoses beginning in fiscal year 1980; and Current Procedural Terminology codes for all ambulatory encounters beginning in fiscal year 1997.

Units of Analysis. Patient-specific health care episodes. Data can be searched by any variable (e.g., patient, health care facility, year, clinic, hospital bed-section, diagnosis, procedure).

Possible Research Questions. Health care utilization and outcomes, geographic patterns, relationships among health factors/patient characteristics, facility comparisons.

Strengths. Data are fairly comprehensive (all patients, inpatient and outpatient care) and longitudinal; a unique patient identifier can be used to link data across years.

Limitations. Datasets do not include specific pharmacy products dispensed. Specific laboratory and radiology procedures performed and their results are also not reported. The quality of the data across variables is inconsistent, and not all years have identical variables. Users must understand some coding issues to properly interpret results of analyses.

Access to Data. Granted to VA employees with supervisor certification and institutional review board approval for any research use. Access is granted to non-VA employees through the Freedom of Information Act or to researchers collaborating with a VA employee, after receiving administrative and institutional review board approval.

The Department of Veterans Affairs (VA), through the Veterans Health Administration (VHA), operates the largest centrally directed health care system in the United States.(1) The VHA has a long history of computerized clinical information systems.(2) At present, all episodes of care provided under VA auspices in VA hospitals, nursing homes, domiciles, and outpatient clinics are reported by facility staff using nationally distributed software. Data are subsequently transmitted to a central repository at the VA Office of Information, Austin Automation Center, where specific clinical data elements are maintained in various SAS datasets (SAS Institute, Cary, NC). Among the most commonly used datasets are the VHA annual medical SAS datasets, which include information on the following: VA inpatient (four datasets) and outpatient (two datasets) care, VHA extended care (four datasets), VA inpatient short stay (<24 hours) observation care (four datasets), and health care provided for veterans outside the VA with VA funding (four datasets). This article focuses on the inpatient and outpatient care datasets, which are basic tools of VA health services researchers (Table 1). These datasets have historically been referred to as the patient treatment files and outpatient care files, respectively.


The VHA began collecting electronic outpatient health care data for the VHA annual medical SAS datasets in 1980.(3) Since then, local data have been combined at the national level, and new variables and datasets have been added.(4) The inpatient datasets were created later, but include annual data back to 1970. The number and type of variables included in the national datasets have changed through the years; therefore, while many data elements are consistent over time, some data elements are not available for every year. New datasets have been added as well, so not all datasets exist for every year.(5)

Comprehensive research user guides for the inpatient and outpatient care datasets were first developed in 1994 as part of a service-directed research project supported by the VA Health Services Research and Development (HSR&D) service.(6,7) The HSR&D service continued and expanded support for research user documentation and other data information products with the funding of the VA Information Resource Center (VIReC) in 1998. The VHA Office of Information, which maintains the SAS datasets, began to publish technical documentation via the World Wide Web and began the data tracking information page of its intranet Web site in 2001.


There are six datasets for VHA-provided inpatient and outpatient care: inpatient main, bed-section, procedure, surgery, outpatient visit, and event datasets. Each dataset contains data from 1 fiscal year (FY), which runs from October 1 through September 30 (e.g., FY 2002 data cover October 1, 2001, through September 30, 2002). As shown in Table 2, the six datasets have different years of inception and use different definitions of a single data record. However, in every dataset (as in the other medical SAS datasets), each patient has a unique identifier that is an encryption of the patient's Social Security number. This unique number can be used to identify a patient across fiscal years and datasets. Access to unencrypted identifiers is available only to authorized users whose research requires Social Security numbers.
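The fiscal-year convention described above can be expressed as a small helper. The following Python sketch is illustrative only; the function names are hypothetical and are not part of any VHA software.

```python
from datetime import date
from typing import Tuple


def fiscal_year(d: date) -> int:
    """Return the fiscal year containing d, where FY N runs
    from October 1 of year N-1 through September 30 of year N."""
    return d.year + 1 if d.month >= 10 else d.year


def fiscal_year_bounds(fy: int) -> Tuple[date, date]:
    """Return the first and last calendar dates of fiscal year fy."""
    return date(fy - 1, 10, 1), date(fy, 9, 30)
```

A helper of this kind may be useful when deciding which annual dataset should contain a record with a given date of service.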

Variables common to both inpatient and outpatient data files include demographics (age, sex, race, birth date, marital status, city, county, and state of residence); period of military service; and selected special characteristics, including the patient's spinal cord injury status, whether the patient was a prisoner of war, and whether the patient was exposed to radiation or Agent Orange. Also included are the patient's category of eligibility for VA medical care and the extent of a service-connected disability.

Inpatient Care Data

Inpatient variables describing each hospital admission include date, time, facility, and the primary diagnosis at admission. Discharge data include date, time, destination (e.g., home, hospice, community nursing home), type of discharge (e.g., regular, transfer to another hospital, death), and length of stay. Care provided during the admission is reported in terms of 1) diagnoses (International Classification of Diseases, ninth revision, clinical modification [ICD-9-CM] codes and diagnosis-related group [DRG] codes); 2) procedures, dialysis treatments, and surgeries performed (dates and times, specialty of care providers, and Current Procedural Terminology [CPT] codes); and 3) specialties of care providers (using a VHA code list of 80 categories). Patients who died during hospitalization are included in the dataset, and the date of in-hospital death is included in the patient record.

Outpatient Care Data

Each outpatient data record represents one date of service for one outpatient, and includes a facility identifier, date and time of visit, and the type of clinic location where care was provided. Visits on a single day to multiple clinics, laboratories, and treatment programs are captured. Outpatient care is reported in terms of diagnoses (ICD-9-CM codes) and procedures (CPT codes), including dates and times. The type of provider for each service is reported using a VHA list of more than 500 categories (these may be changed in the future to agree with provider-type codes used by the Centers for Medicare and Medicaid Services).(8)

Possible Units of Analysis

The inpatient and outpatient care datasets are patient-specific and can be searched by any variable. As noted, the care of a given patient can be analyzed across time by using the patient's unique identifier to link datasets from different years. Longitudinal analyses can also be done by facility (e.g., number of admissions or outpatient care visits). Data allow for studies of patient groups (demographic characteristics, geographic areas), disease groups and comorbid conditions (ICD-9-CM and DRG codes), use of procedures (CPT codes), and hospitalization outcomes (inpatient mortality, readmission) as well as studies of facilities (facility characteristics, geographic area) and specialty of the care provider.
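As a rough illustration of changing the unit of analysis, the Python sketch below counts the same set of admission records by facility and fiscal year, and then by patient. The records and field names (`patient_id`, `facility`, `fiscal_year`) are invented for the example and are not actual VHA variable names.

```python
from collections import Counter

# Hypothetical admission records; all values and field names are illustrative.
admissions = [
    {"patient_id": "P1", "facility": "F01", "fiscal_year": 2000},
    {"patient_id": "P1", "facility": "F01", "fiscal_year": 2001},
    {"patient_id": "P2", "facility": "F01", "fiscal_year": 2000},
    {"patient_id": "P3", "facility": "F02", "fiscal_year": 2001},
]


def count_by(records, *keys):
    """Count records grouped by the values of the given keys."""
    return Counter(tuple(r[k] for k in keys) for r in records)


# Facility-level, longitudinal view: admissions per facility per fiscal year.
per_facility_year = count_by(admissions, "facility", "fiscal_year")

# Patient-level view: admissions per patient across all years.
per_patient = count_by(admissions, "patient_id")
```

The same grouping function serves both views; only the choice of keys changes.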

Linked Databases

The unique identifier can be used to link a patient's data across the various medical SAS datasets (e.g., inpatient care, extended care, observation stays, outpatient care). If authorized, a researcher can use the Social Security number to link a patient's records in the annual medical SAS datasets with individual-level data in other databases. Researchers have successfully linked the following inpatient and outpatient data with other VA data: the Veterans Benefits Administration's Beneficiary Identification and Record Locator System(9); detailed data from VA-based, randomized clinical trials(10, 11); pharmacy, laboratory, and other data from local VA facilities(12, 13); and national VA surgical data from a special quality-care program.(14) Linkage of VA inpatient and outpatient care data with non-VA mortality information, such as the National Death Index, has also been used widely.(15, 16) Linkage of VA patient care data with non-VA health care use datasets, such as Medicare claims data, has also been shown in previous research.(17, 18)
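Linkage on a shared identifier can be sketched in a few lines of Python. The example below assumes each record carries a hypothetical `patient_id` field holding the common identifier; the dataset names and records are invented, and real linkage would be carried out on the actual datasets under the applicable authorization.

```python
from collections import defaultdict


def link_by_patient(*named_datasets):
    """Group records from several datasets under each patient identifier.

    Each argument is a (name, records) pair. The result maps
    patient_id -> dataset name -> list of that patient's records.
    """
    linked = defaultdict(lambda: defaultdict(list))
    for name, records in named_datasets:
        for rec in records:
            linked[rec["patient_id"]][name].append(rec)
    return linked


# Hypothetical records; field names are illustrative only.
inpatient = [{"patient_id": "P1", "admit_fy": 2000}]
outpatient = [
    {"patient_id": "P1", "visit_fy": 2000},
    {"patient_id": "P1", "visit_fy": 2001},
    {"patient_id": "P2", "visit_fy": 2001},
]

linked = link_by_patient(("inpatient", inpatient), ("outpatient", outpatient))
```

Patients who appear in only one dataset are retained with an empty list for the other, so the analyst can decide whether to keep or drop them.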

Previous Work

Overviews of VA databases,(19) inpatient and outpatient datasets,(4, 20-23) and studies of the reliability and validity of VA data have been published.(24-26) Research reports have appeared in numerous medical journals; two are highlighted here as examples. In an analysis of VA inpatient and outpatient services used from 1991 to 1995 by patients with one of eight chronic diseases (chronic obstructive pulmonary disease, pneumonia, congestive heart failure, angina, diabetes, chronic renal failure, bipolar disorder, and major depression), Ashton and colleagues(27) found substantial geographic variation in the use of inpatient hospital services for all eight patient cohorts and all years. Moos and Mertens(28) compared the diagnoses, comorbid psychiatric and medical conditions, and past and current treatment received by late-middle-aged and older patients with affective disorder treated in mental health and medical service settings to identify predictors of length of inpatient stay.

Strengths and Limitations

As a research tool, the VHA annual medical SAS datasets are a centralized repository for fairly comprehensive, patient-level demographic, clinical, and inpatient and outpatient data on health care utilization for all patients receiving care at the VA. Many years of data and use of unique patient and facility identifiers make longitudinal studies feasible. Data are provided as SAS datasets, a format widely used in research.

Datasets of annual inpatient and outpatient care have some limitations for research use. For longitudinal analyses, some datasets or variables of interest do not exist for every year. Not all variables in these datasets are of equal quality, particularly regarding the number of records with missing data. For example, for FY 2000, patient race was missing from 4.5% of records in the main inpatient dataset and 24.9% of records in the outpatient visit dataset.(29)
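Before a variable is used in an analysis, its completeness can be checked with a simple tabulation. The Python sketch below computes the percentage of records in which a field is missing; the field name `race`, the sample records, and the convention that missing values appear as `None` or an empty string are assumptions made for illustration.

```python
def percent_missing(records, field):
    """Return the percentage of records in which field is absent or empty."""
    if not records:
        raise ValueError("no records supplied")
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return 100.0 * missing / len(records)


# Hypothetical records; values are invented for the example.
sample = [
    {"patient_id": "P1", "race": "A"},
    {"patient_id": "P2", "race": ""},
    {"patient_id": "P3", "race": "B"},
    {"patient_id": "P4", "race": "A"},
]

rate = percent_missing(sample, "race")
```

Running the same check on each dataset and fiscal year of interest gives a quick picture of where a variable may be too incomplete to use.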

Researchers must also be aware of specific coding issues in order to define patient cohorts and interpret results properly. For example, some patients with spinal cord injury or geriatric patients with multiple health conditions are coded as inpatients with exceptionally long lengths of stay (1 year or more in some cases).(29) For certain analyses of hospital care, it may be more appropriate to exclude these patients and to consider them as nursing home or extended-care patients. For studies of disease prevalence, inpatient datasets may underestimate some rates because previous diagnoses may not be represented in the data of a patient's admission in the current year or outpatient care. A 1998 study also found that, in comparison with written medical records, the electronic dataset included more diagnoses per discharge.(26) In a more recent study, VA procedure-coding error rates at four sites were found to be similar to those at sites in the private sector.(30) Last, the inpatient and outpatient datasets do not include all information about veterans' health care that might be of interest to researchers. Although dates of service for pharmacy, radiology, and laboratory services are included, the datasets do not contain the specific pharmacy products dispensed or the specific laboratory and radiology tests performed. However, data on medications dispensed and medication costs are available in local clinical databases and in other national VA databases, which can be linked with inpatient and outpatient datasets by using the common unique patient identifier. Information about health conditions of VA patients diagnosed or treated outside the VA is not included in these datasets.
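The long-stay exclusion mentioned above can be sketched as a filter on length of stay. In the Python example below, the 365-day threshold, the field names `admit` and `discharge`, and the calculation of length of stay as a simple difference between dates are all assumptions made for illustration; an analysis should follow the definitions documented for the datasets.

```python
from datetime import date

LONG_STAY_DAYS = 365  # assumed cutoff for "1 year or more"


def length_of_stay(admit: date, discharge: date) -> int:
    """Days between admission and discharge (simple date difference)."""
    return (discharge - admit).days


def split_long_stays(stays, threshold=LONG_STAY_DAYS):
    """Separate stays into (shorter stays, long stays) at the threshold."""
    shorter, long_term = [], []
    for s in stays:
        los = length_of_stay(s["admit"], s["discharge"])
        (long_term if los >= threshold else shorter).append(s)
    return shorter, long_term


# Hypothetical stays; field names and dates are illustrative only.
stays = [
    {"patient_id": "P1", "admit": date(2000, 1, 10), "discharge": date(2000, 1, 15)},
    {"patient_id": "P2", "admit": date(1999, 3, 1), "discharge": date(2000, 6, 1)},
]

shorter, long_term = split_long_stays(stays)
```

Keeping the long stays in a separate list, rather than discarding them, lets them be analyzed alongside nursing home or extended-care patients if that is more appropriate.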

How To Access the Data

VA Researchers

VA researchers are granted access to VHA databases through a process requiring certification that the use of a specific dataset is consistent with requirements of their work. As a condition of VA employment, a researcher must sign a commitment to maintain the security and confidentiality of all VA information systems. Authorization to use the national datasets discussed in this paper requires submission of VA Form 9957, "Time Sharing Request Form," and approval by supervisors and data managers. Information on obtaining and completing this form is available to VA employees by calling the Austin Automation Center help desk (Table 3). The administrative approval process usually takes a few weeks. Research use of VHA data also requires review and approval by the researcher's local institutional review board (IRB). IRB submission requirements are available from the local facility IRB office. The length of time for the IRB approval process varies by facility.

Non-VA Researchers

Researchers not employed by the VA have two options for access to the patient care datasets. The first is to request data under provisions of the Freedom of Information Act. Requests are considered on a case-by-case basis, with review by the data steward and possibly the VHA Privacy Officer. Further information can be requested from the World Wide Web (

A second option for non-VA researchers is to establish research collaborations with VA researchers and, with appropriate administrative arrangements, to become employed on a "without-compensation" basis. In this capacity, a researcher would then be eligible to request access to the inpatient and outpatient medical SAS datasets as a VA employee.

How To Get Help Using the Data

Selected sources of assistance in using inpatient and outpatient data are shown with contact information in Table 3. The Austin Automation Center help desk provides assistance with remote connection to the mainframe computer and using SAS software to access data. Another source of assistance for researchers is VIReC, which is funded by the VHA HSR&D service. The mission of VIReC is to serve as a resource and referral center for researchers using VA data. In addition to detailed user guides for the inpatient and outpatient datasets,(31, 32) VIReC issues a periodic monograph series, VIReC Insights,(33-40) which covers data sources and topics published on VIReC's Web site.

The VHA HSR&D service also supports the Health Economics Resource Center (HERC), which provides information and advice for researchers engaged in economics research and cost-effectiveness studies. Researchers planning to use the inpatient and outpatient datasets to measure health care utilization for cost analyses are encouraged to consult with the HERC staff.

The VHA supports many other programs that assist investigators in various research applications using VA databases. Information about these programs can be found at the VA Research and Development Services Web site,


Grant Support

This research was supported in part by VA Health Services Research and Development Service grants SDR-9804 and ECI 20-016 and by the Veterans Health Administration-funded HCFA/VA Data Merge Initiative (XVA-69-001).


Denise M. Hynes, PhD, Edward Hines, Jr. VA Hospital, VA Information Resource Center (VIReC), Fifth and Roosevelt Roads, Building 1, Room B 305, PO Box 5000, (151-V), Hines, IL 60141; telephone: 708-202-2413; fax: 708-202-2415; e-mail: