Effective Clinical Practice


The National Health Interview Survey: An Overview

Effective Clinical Practice, May/June 2002

Jane F. Gentleman, PhD, John R. Pleis, MS, Division of Health Interview Statistics, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, Md

Database. National Health Interview Survey (NHIS).

Sponsor. National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention.

Subjects. A representative sample from the U.S. civilian noninstitutionalized population. The interviewed sample for the year 2000 consisted of about 39,000 households, yielding data for over 100,000 persons in all 50 states.

Data Available. Family identification, health status, limitation of activity, injuries and poisonings, health care access and utilization, health insurance, sociodemographic information, income and assets, health conditions, immunizations (for children), health behaviors and lifestyle, HIV/AIDS knowledge and attitudes, and special topics (supplements) each year.

Years Available. 1957 to present.

Units of Analysis. Household; family; and person within family (one sample adult per family, one sample child within family if children are present).

Strengths. In-person interview (longer and more personal than telephone survey), large sample size, nationally representative sample, high response rates, rich multivariate data, stable core variables plus annual special topics. NHIS data are linked to the National Death Index and the Medical Expenditure Panel Survey.

Limitations. Self-reported or proxy data (not objective measurements), less timely than telephone surveys, not designed for estimates below the regional geographic level, user access subject to stringent confidentiality restrictions, special software for complex survey data required for proper variance analysis, multiple files sometimes need to be merged.

Access to Data. Public-use microdata are available free from the NHIS Web site and on CD-ROM. Most in-house data are available under controlled conditions via NCHS's Research Data Center. Selected key indicators are released quarterly on the NHIS Web site through the Early Release Program.

The National Health Interview Survey (NHIS) is the principal source of information on the health of the civilian, noninstitutionalized population of the United States. It is a multipurpose, cross-sectional health survey conducted by the National Center for Health Statistics (NCHS), which is part of the U.S. Centers for Disease Control and Prevention (CDC). NHIS data are used to monitor the health status of the U.S. population, track progress toward achieving national health objectives, plan and evaluate health policies, and conduct public health and other research. The data are also useful to students, educators, journalists, and the public. The survey has been conducted continuously since it began in 1957. Recent surveys have obtained data from over 100,000 people annually in about 40,000 households in all 50 states. Each week's data are a representative sample, and data are released on an annual basis. In addition, selected health indicators based on less than a year's data are now released before the annual data via the new Early Release Program (see the NHIS Web site).(1) The U.S. Bureau of the Census is the data collection agent for the NHIS.

The NHIS questionnaire underwent major changes in 1997, and this paper focuses mainly on the NHIS since then. The redesigned questionnaire contains a core of questions that remains essentially unchanged from year to year, plus other questions added as supplements each year, as needed, to provide more in-depth information and/or information on new topics.

Major Domains

The major subjects covered by the NHIS core questions are health status, limitation of activity, injuries and poisonings, health care access and utilization, health insurance, sociodemographic information, income and assets, health conditions, immunizations (for children), health behaviors and lifestyle, and HIV/AIDS knowledge and attitudes.

In addition, since 1997, about 20 minutes of the interview (which typically lasts less than 1 hour) have been reserved on each year's NHIS for one or more supplements that address emerging health issues and/or provide additional detail on core topics. In 1998 and 1999, supplements collected data to track progress toward meeting some of the objectives of the Healthy People 2000 program.(2) Also in 1999, detailed questions about specific chronic health conditions supplemented the core survey. The 2000 NHIS contained a cancer control supplement sponsored by the National Cancer Institute of the National Institutes of Health (NIH) and by the National Center for Chronic Disease Prevention and Health Promotion of the CDC. The 2001 NHIS included two supplements, one with questions on children's mental health, sponsored by the National Institute of Mental Health of the NIH, and the other with questions used to track progress toward meeting some of the Healthy People 2010 program objectives.(3) The 2002 NHIS contains a supplement on alternative medicine, sponsored by NIH's National Center for Complementary and Alternative Medicine, plus more questions for tracking Healthy People 2010 objectives.

The Supplement on Aging (SOA) to the 1984 NHIS became a longitudinal survey: the Longitudinal Study of Aging (LSOA), conducted in 1986, 1988, and 1990, re-interviewed persons from the SOA who were 70 years of age or older in 1984. The LSOA was a collaborative effort of NCHS and the National Institute on Aging of the NIH. See Table 1 and the NCHS Web site (4) for further information about NHIS data.

The collection of information on health conditions has been a major objective of the NHIS since its inception. Before 1997, the NHIS covered 133 conditions using six condition lists (only one of which would be administered to members of any given household), and these were coded using International Classification of Diseases (ICD) codes in preparation for public release. With the redesign, the six lists have been reduced to one shorter list for adults and another shorter list for children, and the practice of ICD coding has been discontinued because it was felt that the use of these codes sometimes conferred an unrealistic degree of precision on the data.

Gathering the Data

NHIS interviews are conducted face-to-face. Households receive a letter in advance of the interview explaining the purpose of the survey, that participation is voluntary, and that confidentiality will be protected. The letter mentions that the median length of an interview is less than an hour. The first contact made by the interviewer is at the home, and the telephone is sometimes used if necessary to continue the interview at a later time.

To administer the NHIS questionnaire, the interviewer reads questions from the screen of a laptop computer and types in the responses. The three main components of the NHIS questionnaire are the Family Core, the Sample Adult Core, and the Sample Child Core. Once general information about the household and its occupants has been obtained, the NHIS Family Core questionnaire is the first component to be administered. A knowledgeable adult provides responses about the whole family, and other family members may be present and may assist in answering questions. The interview may be conducted in English or Spanish. Respondents are not paid to participate.

The Family Core questionnaire collects information on every family member residing in the household. Questions that can reasonably be answered by proxy are included in this section of the survey. Questions from the Sample Adult Core are administered to a randomly selected adult in each family, and a knowledgeable adult is asked questions from the Sample Child Core about a randomly selected child under 18.

The random selection of a single adult for the Sample Adult Core and a single child for the Sample Child Core was a new feature introduced in 1997. The use of this procedure has both advantages and drawbacks. Administering these questions to one adult shortens the total time spent responding to the survey (because only one person per family is asked these questions) and increases the accuracy of responses (because proxy responses are not accepted, except in very rare cases when caretakers may respond). However, if the computer program happens to select a Sample Adult who is not home at the time, the interviewer must return to the household a second time (or perhaps complete that part of the survey by telephone). This increases both the total time spent by the interviewer (and therefore increases the cost of the interview) and the chance that the interview will not be completed. (NHIS interviewers have strict "closeout" deadlines and must complete interviews within 2 weeks and 2 days of receiving the assignment.)
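The selection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (`name`, `age`), the seed, and the use of equal selection probabilities are hypothetical and are not taken from the NHIS interviewing software.

```python
import random

def select_sample_members(family, rng):
    """Pick one adult and, if any children are present, one child at random.

    Assumes equal selection probability within each group (an assumption
    made for this sketch, not a statement about the NHIS instrument).
    """
    adults = [member for member in family if member["age"] >= 18]
    children = [member for member in family if member["age"] < 18]
    sample_adult = rng.choice(adults) if adults else None
    sample_child = rng.choice(children) if children else None
    return sample_adult, sample_child

# Hypothetical family roster; names and ages are invented for illustration.
family = [
    {"name": "member_1", "age": 44},
    {"name": "member_2", "age": 41},
    {"name": "member_3", "age": 12},
    {"name": "member_4", "age": 7},
]

rng = random.Random(2000)  # fixed seed so the sketch is reproducible
sample_adult, sample_child = select_sample_members(family, rng)
print(sample_adult["name"], sample_child["name"])
```

If the selected adult is not at home, the interviewer in this scheme cannot substitute another adult, which is the source of the callback cost discussed above.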

The interviewed sample for 2000 consisted of 38,632 households, which yielded 100,618 persons in 39,264 families (Table 2). The final household response rate was 88.9%; of the 11.1% noninterview rate, 7.3% was the result of household respondent refusal, and 3.8% was primarily the result of failure to locate an eligible respondent at home after repeated calls or insufficiently complete interviews.

Response rates typically decrease as the interview proceeds further into the survey. For example, Sample Adult response rates are lower than Family response rates because Sample Adult questions occur later in the NHIS and because the randomly selected adult may be a different person from the Family respondent and may not be available at the time the interview is initiated.

Analytic Issues

Complex Survey Data

The NHIS's probability sample has a multistage, stratified, cluster design and uses an area frame. In brief, the country is divided into geographic units about the size of a county. A number of these so-called primary sampling units are chosen at random within specified strata. Next, areas within these units are selected. The sampling strategy was designed to oversample black and Hispanic subpopulations to increase the precision of estimates for those subpopulations. The sample for the NHIS is redesigned every 10 years to cover the changing U.S. population and to meet new survey objectives. The new design is implemented about 5 years after each decennial census. The current design was implemented in 1995 and will be used through 2004.

To calculate estimates given the sampling design, observations should be weighted by using the appropriate weight variable (Table 3). If the variables to be analyzed come from more than one section of the survey and thus correspond to more than one set of weights, the weights from the file that contains the predominant analytic variables should be used. For example, in fitting a multiple regression predicting whether adults had a flu shot in the past year, some variables from the Family Core may be used as well as variables from the Sample Adult Core. Because the focus of the analysis is on the sample adult's characteristics, the weights from the Sample Adult Core should be used.
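As a minimal sketch of what weighting does to a point estimate, the following compares an unweighted and a weighted proportion. The records, the outcome coding, and the weight values are hypothetical and are not NHIS data; the actual weight variable names are those listed in Table 3.

```python
# Hypothetical sample-adult records: (had_flu_shot, sample_adult_weight).
# Values are invented for illustration only.
records = [
    (1, 1500.0),
    (0, 3200.0),
    (1, 2100.0),
    (0, 2700.0),
    (0, 500.0),
]

def unweighted_proportion(rows):
    """Share of respondents with outcome 1, ignoring the sample design."""
    return sum(y for y, _ in rows) / len(rows)

def weighted_proportion(rows):
    """Share of the represented population with outcome 1."""
    total_weight = sum(w for _, w in rows)
    weighted_yes = sum(w for y, w in rows if y == 1)
    return weighted_yes / total_weight

unweighted = unweighted_proportion(records)
weighted = weighted_proportion(records)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

Here two of five respondents report a flu shot (0.400), but they account for 3,600 of the 10,000 persons represented, so the weighted estimate is 0.360. Weighting corrects the point estimate only; the variance still has to be computed with the design taken into account.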

Variances of estimates are higher, confidence intervals wider, and significance harder to achieve when analyzing complex survey data than when analyzing a simple random sample. In the former case, the variance increases as if the sample size were decreased by a factor equal to the design effect, which is estimate-specific. Users must correct for this by using the appropriate statistical software (e.g., SUDAAN, Research Triangle Park, NC). Two design variables, STRATUM and PSU, are provided in the public use file for this purpose. These variables identify the stratum and primary sampling unit, respectively, for the complex survey design software.
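To make the role of the stratum and primary sampling unit identifiers concrete, here is a small sketch of one common approach: a first-order Taylor linearization that treats PSUs as sampled with replacement within strata. It is offered as an illustration of the idea, not as a description of what SUDAAN or any other package does internally, and it is not a substitute for such software. The data, the tuple layout, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical records: (stratum, psu, weight, y). Invented for illustration.
rows = [
    ("A", 1, 1.0, 1), ("A", 1, 1.0, 1),
    ("A", 2, 1.0, 0), ("A", 2, 1.0, 0),
    ("B", 1, 1.0, 1), ("B", 1, 1.0, 0),
    ("B", 2, 1.0, 1), ("B", 2, 1.0, 0),
]

def weighted_mean(rows):
    total_w = sum(w for _, _, w, _ in rows)
    return sum(w * y for _, _, w, y in rows) / total_w

def linearized_variance(rows):
    """Variance of the weighted mean from between-PSU variation in each stratum."""
    total_w = sum(w for _, _, w, _ in rows)
    mean = weighted_mean(rows)
    # Sum the linearized values w * (y - mean) / total_w within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in rows:
        psu_totals[stratum][psu] += w * (y - mean) / total_w
    variance = 0.0
    for totals in psu_totals.values():
        n_psu = len(totals)
        if n_psu < 2:
            raise ValueError("each stratum needs at least two PSUs")
        centre = sum(totals.values()) / n_psu
        between = sum((t - centre) ** 2 for t in totals.values())
        variance += n_psu / (n_psu - 1) * between
    return variance

p = weighted_mean(rows)
var_design = linearized_variance(rows)
var_srs = p * (1 - p) / len(rows)  # simple-random-sample benchmark
deff = var_design / var_srs        # design effect for this estimate
n_effective = len(rows) / deff     # equivalent simple-random-sample size
print(p, sqrt(var_design), deff, n_effective)
```

In this toy data set the outcome is perfectly clustered within the PSUs of stratum A, so the design-based variance (0.0625) is twice the simple-random-sample value (0.03125). The design effect is therefore 2, and the eight observations carry about as much information as four independent ones.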

Condition Data

Asking about health conditions has traditionally been done in two ways on the NHIS: directly, by asking whether the respondent has been diagnosed with specific conditions; and indirectly, by asking about behaviors related to ill health (such as contacting a physician's office, staying in bed, or cutting down on normal activities) and then asking what conditions caused these behaviors. The latter approach yields lower rates for a condition, because the respondent will mention the condition only if it triggered contact with the health care system or reduced the individual's ability to function. The rates elicited by the two types of questioning each have valid uses, but analysts should be careful to use each rate for the appropriate purpose. Published examples of prevalence rates derived from direct questions about the presence of chronic conditions include reports by Collins and by Benson, Adams, and Hing and their colleagues.(5-8)


The NHIS's Family Core serves as a sampling frame for the Medical Expenditure Panel Survey (MEPS).(9) Half of the NHIS sample is reserved each year for follow-up by MEPS, which is a survey on financing and utilization of medical care. In the past, NHIS has also served as a sampling frame for NCHS's National Survey of Family Growth(10) (Table 1). Periodically, the NHIS is linked by NCHS staff to the National Death Index (NDI). Thus, data are eventually obtained for NHIS respondents indicating when they died and what the underlying and contributing causes of death were.

Data files that can be used to produce linked NHIS-MEPS data (for the 1995 NHIS linked to the 1996 MEPS) and linked NHIS-NDI data (for NHIS years 1986 to 1994 linked to National Death Index years through 1997) are presently available from the NCHS Web site.(4)

NCHS's National Health and Nutrition Examination Survey (NHANES) uses some of the same primary sampling units as the NHIS. Also, some of the NHANES household interview questions are the same as or similar to those of the NHIS. However, these surveys cannot be linked because it is almost certain that none of the same people would be in both surveys.

Selected Publications

In-house analysis of a statistics agency's survey data can be highly beneficial to the analysis itself, to the data, and to the statistics agency.(11) Numerous articles and reports have been authored by NCHS staff members featuring analyses of NHIS data, including summary reports on health status; on the incidence of acute and chronic health conditions, and the resulting health care utilization and disability, for children and adults according to a variety of socioeconomic indicators (5-8, 12-16); on alcohol use (17); and on poisonings and nonfatal injuries.(18) Other articles that nicely illustrate the use of NHIS data include studies of risk factors for repeated ear infections in children (19); the relationship between health insurance coverage and cancer screening, such as mammograms and Papanicolaou (Pap) tests (20); the prevalence of chronic conditions in the elderly (21); and characteristics of people who receive flu vaccinations.(22)


In-person interview surveys such as the NHIS have certain advantages over telephone surveys. In-person questionnaires can typically be longer, thus providing a rich array of multivariate data, and they can be more personal. Also, in-person surveys usually achieve significantly higher response rates than do telephone surveys. Other strengths of the NHIS are its large sample size, the fact that it is a nationally and regionally representative sample, its useful combination of stable core data and topical and/or in-depth annual supplements, and the ease of access to and cost-free availability of the data on the Internet. Also, NCHS's vigilance in adhering to its legal mandate to preserve confidentiality is reassuring to respondents, thus increasing the accuracy of responses. In addition, the centralized control by NCHS of the NHIS yields more consistent data than, say, a survey in which states collect and then pool their own data.


Self-reported or proxy data such as those from the NHIS are not objective measurements like those obtained from the physical examinations conducted as part of NHANES (i.e., participants in NHANES undergo an extensive physical examination in addition to being interviewed). Other limitations of the NHIS include the high costs, to NCHS and to supplement sponsors, of conducting a large, complex, in-person survey, and less timely release of data than with telephone surveys. Some of the strengths of the NHIS could also be considered limitations; for example, as a national survey, the NHIS is not designed to produce state or small-area estimates (although it can produce estimates for the four regions). Another example is the limitation on data release required by adherence to the legal mandate to preserve confidentiality. As noted, analysis of surveys with complex sample designs, such as the NHIS, requires the use of specialized software (such as SUDAAN) to produce valid variance estimates. A relatively minor limitation is that because NHIS data are complex in structure, they are released as an assortment of files that sometimes need to be merged, depending on the variables needed for the particular analysis.
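The merging step mentioned above might look like the following sketch, which attaches family-level variables to sample-adult records by matching on household and family identifiers. The identifier and variable names (`household_id`, `family_id`, `income_group`, `flu_shot`, `adult_weight`) are hypothetical placeholders; the actual names should be taken from the NHIS file documentation.

```python
# Hypothetical extracts from two files; all values are invented.
family_file = [
    {"household_id": "H1", "family_id": "1", "income_group": "low"},
    {"household_id": "H2", "family_id": "1", "income_group": "high"},
    {"household_id": "H2", "family_id": "2", "income_group": "middle"},
]
sample_adult_file = [
    {"household_id": "H1", "family_id": "1", "flu_shot": 1, "adult_weight": 1500.0},
    {"household_id": "H2", "family_id": "2", "flu_shot": 0, "adult_weight": 3200.0},
]

def merge_on_family(adult_rows, family_rows):
    """Attach family-level fields to each sample-adult record."""
    lookup = {(f["household_id"], f["family_id"]): f for f in family_rows}
    merged = []
    for adult in adult_rows:
        key = (adult["household_id"], adult["family_id"])
        if key not in lookup:
            raise KeyError(f"no family record for {key}")
        # Adult fields are written last so they win on any name collision.
        merged.append({**lookup[key], **adult})
    return merged

merged = merge_on_family(sample_adult_file, family_file)
for row in merged:
    print(row["household_id"], row["family_id"], row["income_group"], row["flu_shot"])
```

The merged records keep the sample-adult weight, which is the choice suggested earlier when the analysis focuses on the sample adult's characteristics.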

How To Obtain the Data

A contact point for all NCHS data requests is NCHS's Data Request Line at 301-458-4636. This telephone number may be used to obtain information about NCHS publications and the availability of NCHS data, including data tapes, CD-ROMs, and Internet files. The telephone number for more detailed requests specific to NCHS's Division of Health Interview Statistics (which conducts the NHIS as well as other surveys) is 301-458-4901.

The NCHS Web site (4) is the electronic gateway for obtaining information about NCHS, including the NHIS. Through this Web site, data users can access NHIS public-use data, estimates, and reports and obtain extensive NHIS documentation, free of charge. Data users can also join an electronic DHIS mailing list (by accessing the CDC's mailing list (23) and selecting "National Health Interview Survey (NHIS) researchers"), which is a convenient way to be notified about new data releases, receive announcements and updates, and ask for information.

NCHS has a legal requirement to ensure that no individual can be identified by using publicly released survey data. Thus, not all details in the survey data are publicly released. In response to the need for more detailed data than can be released, NCHS has created a research data center that provides analysts with controlled access to much of the in-house microdata.

NCHS provides two methods of special access via the research data center—on-site access and remote access. In both cases, potential users submit a detailed proposal to the Research Data Center, indicating the purpose of analysis to be undertaken, specifying the software, methods, and data to be used, and describing the output files. If the proposed analysis is generally sound and can be undertaken without breaching confidentiality, it is approved and the analyst arranges to visit the data center or to submit a computer program to be run at NCHS. There is a fee for use of this center. For more details, see the Research Data Center Web site.(24)


The authors thank Peter Meyer of the National Center for Health Statistics for help in revising the paper.


Jane F. Gentleman, Division of Health Interview Statistics, National Center for Health Statistics, 6525 Belcrest Road, Rm 850, Hyattsville, MD, 20782; telephone: 301-458-4233; fax: 301-458-4035; e-mail: