Enhanced Demographic Data

Data Team

December 07, 2023 16:57

INTRODUCTION

IMPLAN’s expanded demographic data come from the United States Census Bureau, as described in the next section. They consist of estimates that split population data by factors such as age, sex, race, ethnicity, language spoken at home, and educational attainment. They also include estimates of Housing Occupancy and Vacancy, and of Labor Force Participation Rates and Unemployment Rates by Age and Race. Note that this expanded demographic data is currently available only for the 2020-2022 IMPLAN Data Years.

DATA SOURCES

The source of IMPLAN’s first Demographic data expansion is the Census Bureau’s American Community Survey (ACS) 5-Year estimates. Data are available at the National, State, County, and Zip Code levels. Source data can be found at data.census.gov. Data is current relative to the IMPLAN data year, beginning with the year 2020.

CATEGORIES OF DEMOGRAPHIC EXPANSION DATA

The following categories of data with examples represent the totality of our expanded demographics data release. This is in addition to the standard offering for demographic data, which includes, for each region and year, the number of households by 9 income categories, land area, population, population density, and Shannon-Weaver Index (S-W Index).

1. Age & Sex:

36 Categories
Examples:
- Males Ages 0-4
- Females Ages 0-4
- Males Ages 5-9
- Females Ages 5-9
- Males Ages 85 and over
- Females Ages 85 and over

2. Race & Ethnicity:

16 Categories
Examples:
- White alone, Hispanic or Latino in Origin
- Black or African American alone, not Hispanic or Latino in Origin

3. Language Spoken at Home:

9 Categories
Examples:
- English Only
- Spanish, Speaks English Less than Very Well

4. Population Ages 18-24, Achieved Education:

4 Categories
Examples:
- Less than High School Graduate
- High School Graduate (Includes equivalency)
- Some College or associate’s degree
- Bachelor’s degree or higher

5. Population 25 years and over, Achieved Education:

7 Categories
Examples:
- Less than 9th grade
- 9th to 12th grade, no diploma
- Some College, no degree
- Graduate or professional degree

6. Housing – Occupancy and Vacancy Status

8 Categories
Examples:
- Occupied
- Vacant
- Vacant, Rented, not occupied
- Vacant, For sale only

7. Labor Force Participation Rate & Unemployment Rate – Age

10 Categories
Examples:
- 16 to 19 years old
- 20 to 24 years old
- 25 to 29 years old

8. Labor Force Participation Rate & Unemployment Rate – Race

7 Categories
Examples:
- White Alone
- Black or African American Alone
- American Indian and Alaska Native Alone

DATA PRODUCTION

AGE & SEX

The data behind IMPLAN’s Age and Sex breakouts for population come from the American Community Survey (ACS) 5-Year estimates table S0101. While data is available in a percentage format, IMPLAN uses direct counts to recalculate its own percentages. These percentages are then applied to IMPLAN’s existing population data within each geography in order to calculate an estimate of the raw number of individuals belonging to each Age and Sex category.
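The share calculation described above can be sketched as follows. This is a minimal illustration, not IMPLAN's production code: the counts, the population total, and the function names are hypothetical, and only four of the 36 categories are shown.

```python
# Hypothetical ACS S0101-style direct counts for one geography (illustrative only).
acs_counts = {
    "Males Ages 0-4": 300,
    "Females Ages 0-4": 280,
    "Males Ages 5-9": 320,
    "Females Ages 5-9": 300,
}

# Hypothetical IMPLAN population total for the same geography.
implan_population = 1500.0


def shares_from_counts(counts):
    """Recalculate category shares from direct counts rather than published percentages."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: value / total for category, value in counts.items()}


def apply_shares(shares, population):
    """Apply category shares to an existing population total."""
    return {category: share * population for category, share in shares.items()}


shares = shares_from_counts(acs_counts)
estimates = apply_shares(shares, implan_population)
```

Because the shares are recalculated from counts, they sum to one, so the category estimates add up to the IMPLAN population total for the geography.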

Finally, we control our estimates and employ the RAS method to ensure that the final data is geographically balanced.
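For readers unfamiliar with the RAS method, the sketch below shows the general technique (alternately scaling rows and columns until both sets of totals match). It is an assumption that rows are child geographies and columns are demographic categories; the function name, the numbers, and the stopping rule are hypothetical and are not taken from IMPLAN's implementation.

```python
def ras_balance(matrix, row_targets, col_targets, max_iter=500, tol=1e-9):
    """Alternately scale rows and columns until row and column totals match the targets."""
    balanced = [list(row) for row in matrix]
    for _ in range(max_iter):
        # Row step: scale each child geography to its own population total.
        for i, target in enumerate(row_targets):
            row_sum = sum(balanced[i])
            if row_sum > 0:
                balanced[i] = [value * target / row_sum for value in balanced[i]]
        # Column step: scale each category to the parent geography's category total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in balanced)
            if col_sum > 0:
                factor = target / col_sum
                for row in balanced:
                    row[j] *= factor
        # Stop once the row totals still hold after the column step.
        if all(abs(sum(balanced[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return balanced


# Hypothetical example: two counties (rows) by three categories (columns).
initial = [[40.0, 35.0, 30.0], [60.0, 70.0, 50.0]]
county_totals = [100.0, 200.0]        # assumed county population controls
state_totals = [110.0, 100.0, 90.0]   # assumed parent-level category controls
balanced = ras_balance(initial, county_totals, state_totals)
```

The row and column targets must sum to the same grand total (300 here) for the procedure to converge.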

RACE & ETHNICITY

The data behind IMPLAN’s Race and Ethnicity breakouts for population come from the American Community Survey 5-Year estimates table B03002. The process for production of the data on Race and Ethnicity is the same as for producing the data on Age and Sex.

LANGUAGE SPOKEN AT HOME

The data behind IMPLAN’s Language Spoken at Home breakouts for population come from the American Community Survey 5-Year estimates table S1601. The process for production starts by using the already-created Age and Sex data to create estimates of the population in each geography for those aged 5 and up. This is because the ACS Language Spoken at Home is only applicable to those aged 5 and up. While we recognize that language develops earlier than the age of 5, this is an ACS-based data limitation that we must abide by. A secondary limitation of this data is that it does not include considerations for those who speak more than English and one other language. From there, the data production process is analogous to the Age and Sex production process.
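As a rough illustration of the first step, the population aged 5 and up can be derived by removing the two youngest Age and Sex categories from the total. The numbers and variable names below are hypothetical, and the language categories are collapsed to three for brevity.

```python
# Hypothetical Age and Sex estimates already produced for one geography.
total_population = 1000.0
age_sex_estimates = {"Males Ages 0-4": 60.0, "Females Ages 0-4": 55.0}

# Language Spoken at Home applies only to the population aged 5 and up.
population_5_plus = total_population - sum(age_sex_estimates.values())

# Hypothetical ACS S1601-style counts; the third key stands in for the remaining categories.
language_counts = {
    "English Only": 600,
    "Spanish, Speaks English Less than Very Well": 80,
    "All other categories (combined here)": 120,
}

count_total = sum(language_counts.values())
language_estimates = {
    category: population_5_plus * count / count_total
    for category, count in language_counts.items()
}
```

The resulting estimates sum to the population aged 5 and up (885 in this example), not to the total population.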

EDUCATIONAL ATTAINMENT

The data behind IMPLAN’s Educational Attainment breakouts for population come from the American Community Survey 5-Year estimates table S1501. Data is produced for two population groups: Achieved Education for those aged 18-24 and for those aged 25 and over. Like the Language Spoken at Home data, Educational Attainment splits can only be applied to certain brackets of the population by age. Using the Age data that has already been produced, we can create estimates for the population that falls under the specified age ranges. Even though the Age data comes in five-year increments, the raw data for Educational Attainment provide balanced population totals that give us the ability to estimate the portion of the population that falls in the 18-24 category. From there, the data production process is analogous to the Age and Sex production process.
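One way the 18-24 population could be estimated from five-year age brackets is sketched below. The exact calculation is not specified in this article, so the ratio approach, the function name, and all numbers are assumptions made for illustration only.

```python
def population_18_to_24(pop_15_to_19, pop_20_to_24, acs_18_to_24, acs_15_to_24):
    """Assumed approach: scale the 15-24 population by the ACS share of 15-24 year olds aged 18-24."""
    if acs_15_to_24 == 0:
        return 0.0
    return (pop_15_to_19 + pop_20_to_24) * acs_18_to_24 / acs_15_to_24


# Hypothetical inputs: produced Age data (both sexes) and ACS population totals.
pop_18_24 = population_18_to_24(
    pop_15_to_19=500.0, pop_20_to_24=600.0, acs_18_to_24=700.0, acs_15_to_24=1000.0
)

# Hypothetical ACS S1501-style counts for the 18-24 group.
education_counts = {
    "Less than High School Graduate": 100,
    "High School Graduate (Includes equivalency)": 250,
    "Some College or associate's degree": 280,
    "Bachelor's degree or higher": 70,
}
count_total = sum(education_counts.values())
education_estimates = {
    category: pop_18_24 * count / count_total
    for category, count in education_counts.items()
}
```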

HOUSING DATA – OCCUPANCY AND VACANCY

The data behind IMPLAN’s Occupancy and Vacancy breakouts for housing units come from the American Community Survey 5-Year estimates tables B25002 and B25004. Estimates for Occupancy and Vacancy are only displayed in percentages due to the rapidly changing number of housing units that exist within a given geography within a year. This figure represents an annualized version of housing unit status and is not dynamic within a given year. When creating combined regions, IMPLAN household counts are treated as occupied housing units (not total housing units) for the purposes of recalculating percentages. The data production process is analogous to the Age and Sex production process, with the substitution of estimated housing units for population.
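The combined-region recalculation might look something like the sketch below. It assumes that total housing units can be inferred by dividing a region's household count by its Occupied share; that inference, the function name, and the numbers are hypothetical and are not documented IMPLAN behavior.

```python
def combine_housing_shares(regions):
    """Recalculate status shares for a combined region, treating households as occupied units."""
    combined_units = {}
    combined_total = 0.0
    for region in regions:
        occupied_share = region["shares"]["Occupied"]
        if occupied_share <= 0:
            continue  # total units cannot be inferred without an occupied share
        # Households stand in for occupied units, so total units = households / occupied share.
        total_units = region["households"] / occupied_share
        combined_total += total_units
        for status, share in region["shares"].items():
            combined_units[status] = combined_units.get(status, 0.0) + share * total_units
    if combined_total == 0:
        return {}
    return {status: units / combined_total for status, units in combined_units.items()}


# Hypothetical regions, each with an IMPLAN household count and percentage shares.
regions = [
    {"households": 900.0, "shares": {"Occupied": 0.90, "Vacant": 0.10}},
    {"households": 400.0, "shares": {"Occupied": 0.80, "Vacant": 0.20}},
]
combined = combine_housing_shares(regions)
```

Dividing by the inferred total units, rather than by the sum of all status values, keeps the result valid when nested categories such as "Vacant, For sale only" are included alongside "Vacant".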

LABOR FORCE PARTICIPATION RATE & UNEMPLOYMENT RATE – BY AGE AND BY RACE

The data behind IMPLAN’s Labor Force Participation Rate and Unemployment Rate estimates come from the American Community Survey 5-Year estimates table S2301. The major steps to produce this data are as follows:

1. Determine a population count for each age group category. This is mostly aided by the Age & Sex data, but there are a few hurdles:
   - The age by labor force participation rate computation is made less straightforward by the fact that the age categories from S2301 – Employment Status do not match the age categories found in the Age and Sex data. The first major difference is that the labor force data has an age category of 16 to 19 rather than 15 to 19 because in most places, you must be 16 or older to work. Thus, we need an estimate of 15 year olds in each region so that we can exclude them from our calculations that use population data. To do this, we take the portion of 16-19 year olds out of the total 15-19 year olds in both data sources.
   - The second major difference is that the ACS employment status tables stop after “75 years and over”, while our Age & Sex data have categories for 75-79, 80-84, and 85+. Employment status tables also only have 45-54 and 65-74 age categories, while Age & Sex data have 45-49, 50-54, 65-69, and 70-74 categories. So, we need to sum these as appropriate from the Age & Sex data in order to get a total population count for each age group that matches the age groups in the Employment Status table.
2. Geographically control our estimates of the 16 to 19 year old population group.
3. Use the percentage figures from the raw data multiplied by the new population counts in each age category to produce a count of individuals that participate in the labor force. According to the ACS, “The labor force participation rate represents the proportion of the population that is in the labor force. For example, if there are 100 people in the population 16 years and over, and 64 of them are in the labor force, then the labor force participation rate for the population 16 years and over would be 64 percent.”
4. Use the ratio of employees to population supplied in the raw data to back calculate the Civilian Labor Force count. Then, use the unemployment rate multiplied by the Civilian Labor Force count in each geography to calculate a count of unemployed persons. The unemployment rate represents the count of unemployed persons out of the Civilian Labor Force. According to the ACS, the definition of “Unemployed” is: “All civilians 16 years old and over are classified as unemployed if they (1) were neither ‘at work’ nor ‘with a job but not at work’ during the reference week, and (2) were actively looking for work during the last four weeks, and (3) were available to start a job. Also included as unemployed are civilians who did not work at all during the reference week, were waiting to be called back to a job from which they had been laid off, and were available for work except for temporary illness.”
5. Geographically control and RAS the counts for labor force participants, the Civilian Labor Force, and the count of unemployed persons.
6. Repeat the above steps for all categories of race, using the RAS method with the previously created population greater than age 16 to ensure that the total population 16 and up matches when all race groups are summed prior to step #3.
7. Re-divide totals to produce percentages.
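The population and count steps above can be sketched as follows. This is an illustration under stated assumptions rather than IMPLAN's code: the function names and numbers are hypothetical, age bins are shown for both sexes combined, and the back-calculation assumes Civilian Labor Force = employed / (1 - unemployment rate), which follows from the unemployment rate being unemployed persons divided by the Civilian Labor Force.

```python
def population_16_to_19(pop_15_to_19, acs_16_to_19, acs_15_to_19):
    """Exclude 15 year olds by applying the 16-19 share of the 15-19 group."""
    if acs_15_to_19 == 0:
        return 0.0
    return pop_15_to_19 * acs_16_to_19 / acs_15_to_19


# Employment Status age groups that span several Age & Sex bins.
AGE_GROUP_MAP = {
    "45 to 54": ["45-49", "50-54"],
    "65 to 74": ["65-69", "70-74"],
    "75 and over": ["75-79", "80-84", "85+"],
}


def aggregate_age_groups(age_population, group_map):
    """Sum Age & Sex bins so they match the Employment Status age groups."""
    return {
        group: sum(age_population[bin_name] for bin_name in bins)
        for group, bins in group_map.items()
    }


def labor_force_counts(population, participation_rate, emp_pop_ratio, unemployment_rate):
    """Return (participants, civilian labor force, unemployed) for one age group."""
    participants = population * participation_rate
    employed = population * emp_pop_ratio
    # Assumed identity: employed = CLF * (1 - unemployment rate).
    civilian_labor_force = employed / (1.0 - unemployment_rate) if unemployment_rate < 1 else 0.0
    unemployed = civilian_labor_force * unemployment_rate
    return participants, civilian_labor_force, unemployed


# Hypothetical inputs.
pop_16_19 = population_16_to_19(pop_15_to_19=500.0, acs_16_to_19=320.0, acs_15_to_19=400.0)
grouped = aggregate_age_groups(
    {"45-49": 70.0, "50-54": 80.0, "65-69": 50.0, "70-74": 40.0,
     "75-79": 30.0, "80-84": 20.0, "85+": 10.0},
    AGE_GROUP_MAP,
)
participants, clf, unemployed = labor_force_counts(1000.0, 0.64, 0.60, 0.0625)
```

In this example, a 60 percent employment-to-population ratio and a 6.25 percent unemployment rate imply a Civilian Labor Force of 640 and 40 unemployed persons for a population of 1,000. Geographic controlling and the RAS step would follow before rates are re-divided.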

When looking at the definitions of the Labor Force Participation Rate and the Unemployment Rate from the ACS, one may notice that IMPLAN already produces data points that constitute what is needed. For example, IMPLAN already produces its own estimates for population and employment, with a breakout for military and civilian employment. It is conceivable that IMPLAN-based Labor Force Participation Rates and Unemployment rates could be calculated, but these are not what is found in this data set. Due to coverage differences in employment source data, commuting, and differing population estimates, an IMPLAN-based set of these rates would not match what is found in the ACS. The featured rates are simply an integration of ACS data such that they can be more conveniently found and referenced if needed by IMPLAN analysts.

IMPLAN also recognizes that some geographies allow for workers to join the labor force prior to the age of 16. These cases are not accounted for, as the ACS does not account for them.

ZIP CODE LEVEL DATA PRODUCTION ADDENDUMS

The list of zip codes used by the ACS is not the same as the list of zip codes that IMPLAN creates models for. For more information on how we select which zip codes to assign data to, please see our article on Estimating Zip Code Data.

As a result, there are discrepancies that can arise when comparing the ACS zip-code level data to IMPLAN’s data. These are all due to one of three specific problems that appear during the data production cycle:

1. There can be zip codes with population in IMPLAN that have no demographic data according to the ACS.
2. There can be instances in which zip codes have demographic data in a certain category, but the parent county shows no values for that same category.
3. There can be cases in which all of the zip codes that constitute a county have no representation of a specific data element, while the county does.

In cases where we are missing zip code level data, category-level splits are assigned using the next geography up in our hierarchical scheme. For instance, if we have population in a zip code but no data to distribute out Age, Sex, Race, etc., we distribute using the ratios of the county that the zip code belongs to.
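The county fallback can be illustrated with a short sketch. The function name, category labels, and numbers are hypothetical; only the idea of borrowing the parent county's ratios comes from the text above.

```python
def zip_category_estimates(zip_population, zip_acs_counts, county_shares):
    """Split a zip code's population by its own ACS counts, or by county ratios if none exist."""
    total = sum(zip_acs_counts.values()) if zip_acs_counts else 0
    if total > 0:
        shares = {category: count / total for category, count in zip_acs_counts.items()}
    else:
        # No usable ACS data for this zip code: use the parent county's ratios.
        shares = county_shares
    return {category: zip_population * share for category, share in shares.items()}


# Hypothetical county ratios and a zip code with population but no ACS data.
county_shares = {"Category A": 0.6, "Category B": 0.4}
fallback = zip_category_estimates(250.0, {}, county_shares)

# A zip code with its own ACS counts uses those instead.
own_data = zip_category_estimates(250.0, {"Category A": 10, "Category B": 30}, county_shares)
```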

In cases where zip code data has a demographic split but the parent county does not allow for such splitting, the geographic controlling of the data overwrites these values with zeros. That does not mean that certain demographic groups are underrepresented, though; rather, those groups are represented in other zip codes such that the counties and states still all add up.

In cases where no zip code within a county has a value for a specific category while the county itself does, data is distributed to zip codes using several methods. The most common (and the only method used from the 2021 data forward) is an even split across all zip codes in a county.
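A minimal sketch of the even split, using a hypothetical function name, placeholder zip codes, and an illustrative county value:

```python
def even_split(county_value, zip_codes):
    """Distribute a county-level category value equally across the county's zip codes."""
    if not zip_codes:
        return {}
    per_zip = county_value / len(zip_codes)
    return {zip_code: per_zip for zip_code in zip_codes}


# Hypothetical county with 90 people in a category and three zip codes lacking that category.
split = even_split(90.0, ["00001", "00002", "00003"])
```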

For most data elements found in this expansion of our demographic data, zip codes with the above problems account for less than 1% of total zip codes in the United States. These problems are more widespread in the Educational Attainment data and the Labor Force and Unemployment data.

CONSIDERATIONS WHEN USING THE IMPLAN DEMOGRAPHIC EXPANSION DATA

As of now, none of this data is applied to impact results that come from an IMPLAN model. It is strictly available as a set of study area data that is not to be applied to results. If the analyst does choose to convert raw data into ratios that can be applied to a set of IMPLAN results, they should be aware that this is not endorsed by IMPLAN Group LLC. It is incumbent upon the analyst to understand the IMPLAN data and input-output modeling framework, including its limitations, and to employ best practices when using the IMPLAN system.

Enhanced Demographic Webinar

Demographic Data in Data Library