Data Reliability

Reliability Index

HAC’s Rural Data Central presents a reliability index to help users determine the statistical accuracy of estimates drawn from the Census Bureau’s American Community Survey (ACS). The reliability index uses a coefficient of variation (CV) calculated for each county- and congressional-level estimate presented from ACS data. The CV is the ratio of an estimate’s standard error to the estimate itself, expressed as a percentage.

[CV = SE/X̂ * 100]¹

The standard error (SE) of an estimate, as the Census Bureau notes, "provides a quantitative measure of the extent to which an estimate derived from the sample survey can be expected to deviate from the population value." Stated another way, the SE is a measure of a presented estimate’s precision. Generally, a smaller SE indicates a more precise estimate.

Dividing the standard error by the estimate generates a coefficient of variation (CV), which expresses an estimate’s error as a proportion of the estimate. Because the CV is relative to the size of the estimate, it allows the reliability of estimates of very different magnitudes to be compared. As with the standard error, the larger the CV, the less precise the estimate.
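As a rough illustration, the calculation can be sketched in Python. The sketch assumes the standard error is recovered from a published 90 percent margin of error by dividing by 1.645, the conversion the Census Bureau documents for ACS estimates. The function names and sample figures are hypothetical and are not taken from Rural Data Central.

```python
from typing import Optional

Z_90 = 1.645  # z-score behind the 90 percent margins of error published with ACS estimates


def standard_error(moe_90: float) -> float:
    """Approximate the standard error from a 90 percent margin of error."""
    return abs(moe_90) / Z_90


def coefficient_of_variation(estimate: float, se: float) -> Optional[float]:
    """Return the CV as a percentage, or None when the estimate is zero."""
    if estimate == 0:
        return None  # the ratio is undefined for a zero estimate
    return se / abs(estimate) * 100


# Hypothetical figures: an estimate of 2,000 with a margin of error of +/-329.
se = standard_error(329)
cv = coefficient_of_variation(2000, se)
print(round(se, 1), round(cv, 1))  # prints 200.0 10.0
```

Returning None for a zero estimate is one possible design choice; it leaves room for a "not available" category instead of raising a division error.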

HAC’s Rural Data Central reliability index is presented below:

Green =
High Reliability Estimate – Coefficient of Variation (CV) less than 15 percent

Yellow =
Reliable Estimate – Coefficient of Variation (CV) between 15 percent and 29.9 percent

Red =
Low Reliability Estimate – Coefficient of Variation (CV) 30 percent or higher. Use caution when referring to or presenting this estimate. Please consult the Census Bureau for more information and guidance.

Gray =
Reliability Estimate Not Available
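The index above can be expressed as a small lookup. This is a minimal sketch, not HAC's implementation: the function name is hypothetical, a missing CV is represented as None, and values between 29.9 and 30 percent are assumed to fall in the Yellow band.

```python
from typing import Optional


def reliability_color(cv: Optional[float]) -> str:
    """Map a CV (in percent) to the color used in the reliability index."""
    if cv is None:
        return "Gray"    # reliability estimate not available
    if cv < 15:
        return "Green"   # high reliability estimate
    if cv < 30:
        return "Yellow"  # reliable estimate
    return "Red"         # low reliability estimate; use with caution


for value in (4.2, 15.0, 29.9, 30.0, None):
    print(value, reliability_color(value))
```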

There is no standard, agreed-upon threshold for judging an estimate’s reliability, accuracy, or precision. HAC considered three general approaches in developing its reliability indicator, all of which use a coefficient of variation to evaluate ACS estimates. An ESRI white paper on the American Community Survey suggests CV thresholds of 0 to 12 percent for high reliability, 12 percent to less than 40 percent for medium reliability, and 40 percent or higher for low reliability.³ The Washington State Office of Financial Management produced an ACS User Guide that employed the following thresholds: 0 to less than 15 percent as good, 15 percent to less than 30 percent as fair, and 30 percent or more as caution.⁴ The National Research Council notes that a CV between 10 and 12 percent is “a reasonable estimate of precision.”⁵
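To show how much the choice of threshold matters, the sketch below classifies the same CV under the ESRI and Washington State schemes described above. The helper and constant names are hypothetical, and a CV of exactly 12 percent is assumed to fall in ESRI's medium band, since the source ranges share that boundary.

```python
def classify(cv, cutoffs):
    """Return the label of the first (upper_bound, label) pair whose bound exceeds cv."""
    for upper_bound, label in cutoffs:
        if cv < upper_bound:
            return label
    raise ValueError("cv must be a number")


# Upper bounds are exclusive; the last band is open-ended.
ESRI = [(12, "reliable"), (40, "medium reliability"), (float("inf"), "low reliability")]
WA_OFM = [(15, "good"), (30, "fair"), (float("inf"), "caution")]

for cv in (13, 35):
    print(cv, classify(cv, ESRI), "|", classify(cv, WA_OFM))
```

A CV of 35 percent, for example, is labeled medium reliability under the ESRI cutoffs but caution under the Washington State cutoffs.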

Derived Measures ⁶

Mean. This measure represents an arithmetic average of a set of values. It is derived by dividing the sum (or aggregate) of a group of numerical items by the total number of items in that group. For example, mean household earnings is obtained by dividing the aggregate of all earnings reported by individuals with earnings in households by the total number of households with earnings.

Median. This measure represents the middle value (if n is odd) or the average of the two middle values (if n is even) in an ordered list of n data values. The median divides the total frequency distribution into two equal parts: one-half of the cases falling below the median and one half above the median. The median is computed on the basis of the distribution as tabulated, which is sometimes more detailed than the distribution shown in specific census publications and other data products.
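The odd and even cases can be illustrated with a short sketch that works on a plain list of values. The function name is hypothetical, and it does not reproduce the Census Bureau's tabulated-distribution method.

```python
def median(values):
    """Middle value of an ordered list, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # n is odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # n is even: average the middle pair


print(median([3, 1, 2]))     # prints 2
print(median([4, 1, 3, 2]))  # prints 2.5
```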

Interpolation. Interpolation frequently is used in calculating medians or quartiles based on interval data and in approximating standard errors from tables. Linear interpolation is used to estimate values of a function between two known values.
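A minimal sketch of linear interpolation applied to a median from interval data follows. It assumes values are spread evenly within each interval; the function name and the bin counts are hypothetical, and the Census Bureau's exact procedure may differ.

```python
def interpolated_median(bins):
    """Estimate a median from (lower, upper, count) intervals listed in ascending order."""
    total = sum(count for _, _, count in bins)
    if total == 0:
        raise ValueError("no observations")
    half = total / 2
    cumulative = 0
    for lower, upper, count in bins:
        if count > 0 and cumulative + count >= half:
            # Fraction of this interval needed to reach the halfway point.
            fraction = (half - cumulative) / count
            return lower + fraction * (upper - lower)
        cumulative += count


# Hypothetical income distribution: 100 households across three intervals.
bins = [(0, 10_000, 20), (10_000, 20_000, 50), (20_000, 30_000, 30)]
print(interpolated_median(bins))  # prints 16000.0
```

Here the halfway point (the 50th household) falls 30 households into the second interval of 50, so the estimate sits 60 percent of the way between 10,000 and 20,000.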

Percentage. This measure is calculated by taking the number of items in a group possessing a characteristic of interest and dividing by the total number of items in that group and then multiplying by 100.

Rate. This is a measure of occurrences in a given period of time divided by the possible number of occurrences during that period. Rates are sometimes presented as percentages.

Data Limitations

Census 2020 Overcount/Undercount ⁷

The Census Bureau conducts a Post-Enumeration Survey (PES) to measure the accuracy of the census by independently surveying a sample of the population. The PES’s sample size means that estimates cannot be made below the state level; the Census Bureau calculates them for states and state equivalents and for the four Census regions. The 2020 PES considered the accuracy of the census’s counts of both people and housing units.

People
The PES found that the populations of 37 states (or state equivalents) did not have estimated statistically significant undercounts or overcounts. In 15 states (or state equivalents) the population was either undercounted or overcounted. Undercounts were identified in Arkansas, Florida, Illinois, Mississippi, Tennessee, and Texas. Overcounts were estimated to have occurred in Delaware, Hawaii, Massachusetts, Minnesota, New York, Ohio, Puerto Rico, Rhode Island, and Utah.

Two regions had statistically significant differences. There was an estimated undercount of 1.85 percent in the South region and an estimated overcount of 1.71 percent in the Northeast region.

Historically, the decennial census has undercounted some population groups, and the 2020 count was no exception. The PES determined that the 2020 census undercounted the Black or African American population (3.30 percent), the American Indian or Alaska Native population living on a reservation (5.64 percent), the Hispanic or Latino population (4.99 percent), and people who reported being of some other race. It overcounted the Non-Hispanic White population (1.64 percent) and the Asian population (2.62 percent). It also undercounted children 0 to 17 years old (0.84 percent), particularly young children 0 to 4 years old (2.79 percent). Young children are persistently undercounted in the decennial census.

Housing Units
The PES estimated there was no net coverage error in the 2020 census’s count of U.S. housing units. It did estimate that 3.1 percent of homes, or 4.4 million units, were enumerated erroneously – that is, these units were either overcounted or undercounted – but the overcounts and undercounts balanced each other out, resulting in no net error.

Coverage errors varied by region and by state. The PES estimated a statistically significant overcount of housing units in the Northeast Region. Census counts of housing units in the Midwest, South and West did not have an estimated statistically significant undercount or overcount. Two states had a statistically significant undercount of housing units: South Carolina and Vermont. Seven states had a statistically significant overcount: Alabama, Massachusetts, New Jersey, New York, Ohio, Rhode Island, and Utah.

A statistically significant undercount was also identified in hard-to-count geographic places, which are often rural. The PES reported an undercount of 4.2 percent of homes in two types of enumeration areas: the Update Leave type and the Update Enumerate type.⁸ Update Leave areas are those where the majority of households may not receive mail at their home’s physical location, such as small towns where mail is only delivered to post office boxes or areas recently affected by natural disasters. About 6.8 million households in the U.S. and Puerto Rico live in these places. In the Update Enumerate locations, census takers visited approximately 6,500 households and conducted the census in person. These places are primarily in remote parts of northern Maine and southeast Alaska, where it is difficult to deliver mail and the internet is not readily available.

Errors varied by tenure as well. The estimated net coverage error rate of owner-occupied housing units was not statistically significant. Rented homes had a statistically significant overcount of 0.85 percent. For vacant housing units, there was a statistically significant undercount of 2.6 percent.

Statistically significant errors occurred in measuring two types of housing structures commonly found in rural places. Small multi-unit buildings (those with two to nine units) had a statistically significant overcount of 5.1 percent. “Mobile homes and other units” had a statistically significant undercount of 4.3 percent.

The PES found that coverage error rates varied by the race or Hispanic origin of the householder. Housing units with a householder who was Black or African American (0.87 percent), Asian (1.37 percent), Native Hawaiian or Other Pacific Islander (2.64 percent), or Some Other Race (0.58 percent) had statistically significant overcounts. The estimated net coverage error rates of housing units with a householder who was White, American Indian or Alaska Native, or Hispanic or Latino were not statistically different from zero.

Margin of Error in the American Community Survey ⁹

Data from the American Community Survey (ACS) are based on a sample and are subject to sampling variability. Sampling error is the uncertainty associated with an estimate that is based on data gathered from a sample of the population rather than the full population. The ACS provides users with measures of sampling error along with each published estimate: all published ACS estimates are accompanied either by 90 percent margins of error or by confidence intervals, both based on ACS direct variance estimates.

The margin of error is most often shown as a plus-or-minus sign followed by a number. This value defines the range within which the population value can be expected to fall at a given level of confidence. The margin of error adds nuance to a best-guess point estimate by showing the range of values consistent with the sample data. Adding the margin of error to the point estimate and subtracting it from the point estimate gives the upper and lower bounds of that range, known as the confidence interval.
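The arithmetic can be sketched as follows. The function names and figures are hypothetical; the rescaling step assumes the published margin of error is at the 90 percent level (z-score 1.645) and uses 1.960 for the 95 percent level, following the conversion the Census Bureau describes for ACS users.

```python
def confidence_interval(estimate, moe):
    """Lower and upper bounds formed by subtracting and adding the margin of error."""
    return estimate - abs(moe), estimate + abs(moe)


def rescale_moe(moe_90, z_target=1.960):
    """Convert a 90 percent margin of error to another confidence level."""
    return abs(moe_90) * z_target / 1.645


# Hypothetical estimate of 2,000 with a published margin of error of +/-329.
print(confidence_interval(2000, 329))  # prints (1671, 2329)
print(round(rescale_moe(329), 1))      # prints 392.0
```

A higher confidence level widens the interval: the same estimate has a 95 percent interval of roughly 1,608 to 2,392.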

Point estimates use statistical techniques, such as regression models, to infer from sample data what the actual value of the characteristic is in the population. These point estimates can be thought of as a best guess of the population characteristic value, given the available sample survey data. As with any guess or prediction, estimates are only as reliable as the information they are based on. Estimates such as those presented in the ACS can vary in precision, especially in relation to the overall sample size. A smaller number of sample observations generally leads to less precise estimates, while a larger number of sample observations generally provides more precise estimates.

For more information on the accuracy of data from the American Community Survey, please consult the Census Bureau publication ACS Design and Methodology: http://www.census.gov/acs/www/methodology/methodology_main/.

NOTES

¹ In the formula the standard error notation is SE, and the estimate is X̂.

² Champaign County Regional Commission, “Understanding and Using the U.S. Census Bureau’s American Community Survey,” accessed August 26, 2022, https://ccrpc.org/wp-content/uploads/2015/02/american-community-survey-guide.pdf.

³ “The American Community Survey” (Redlands, California: ESRI, 2018), http://www.esri.com/library/whitepapers/pdfs/the-american-community-survey.pdf.

⁴ Erica Gardner, Thomas Kimpel, and Yi Zhao, “American Community Survey User Guide: ACS Publication No. 1” (rev. 2015), https://ofm.wa.gov/sites/default/files/public/legacy/pop/acs/ofm_acs_user_guide.pdf.

⁵ “Using the American Community Survey: Benefits and Challenges” (Washington, DC: National Academies Press, 2007), https://nap.nationalacademies.org/catalog/11901/using-the-american-community-survey-benefits-and-challenges.

⁶ “American Community Survey and Puerto Rico Community Survey: 2020 Subject Definitions,” U.S. Census Bureau, accessed August 3, 2022, https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2020_ACSSubjectDefinitions.pdf.

⁷ “Post-Enumeration Surveys,” U.S. Census Bureau, August 17, 2022, https://www.census.gov/programs-surveys/decennial-census/about/coverage-measurement/pes.html.

⁸ “2020 Census: Update Leave and Update Enumerate,” U.S. Census Bureau, June 11, 2020, https://www.census.gov/newsroom/press-kits/2020/update-leave.html.

⁹ “American Community Survey Design and Methodology,” U.S. Census Bureau, 2014, https://www.census.gov/programs-surveys/acs/methodology/design-and-methodology.html.
