In this report, we analyzed the performance of the Tampa Bay region using real-time and traditional indicators. We assessed the region along three dimensions: personal prosperity, income inequality, and economic mobility. To understand the region's performance, we considered job availability, people's sentiment about each metropolitan statistical area (MSA), outcome variables, and supplemental variables.
Exploratory Data Analysis
For this report we gathered:
- Search index data from Google Trends, in which each search unit is scored by its popularity relative to its paired unit.
- Real-time signals on job opportunities from platforms such as LinkedIn and Indeed.
- Text data from tweets, gathered through Twitter's search API from Python with the help of official Twitter handles.
- Numerical data from official government sources such as the U.S. Census Bureau and the U.S. Bureau of Labor Statistics. This includes attributes such as unemployment rate, GRP per capita, net migration rate, poverty rate, income inequality, economic mobility, mean commute time, and transit availability.
Before proceeding with the analysis, we cleaned the data and converted it into a format suitable for analysis.
For text data, we:
- Converted the text to lowercase
- Removed punctuation
- Tokenized the tweets and chose lemmatization over stemming, because lemmatization ensures that the mapped root word is a valid word in the English dictionary
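The text-cleaning steps above can be sketched as follows. This is a minimal illustration, not the pipeline used for the report: `clean_tweet`, `toy_lemmatize`, and the `TOY_LEMMAS` lookup are hypothetical names, and the tiny lookup table only stands in for a dictionary-backed lemmatizer (for example NLTK's `WordNetLemmatizer`), which could be passed in through the `lemmatize` argument.

```python
import string

# Toy lemma lookup, for illustration only (hypothetical). A real run would
# pass in a dictionary-backed lemmatizer instead.
TOY_LEMMAS = {"schools": "school", "universities": "university"}


def toy_lemmatize(token):
    """Map a token to its lemma if known, otherwise return it unchanged."""
    return TOY_LEMMAS.get(token, token)


def clean_tweet(text, lemmatize=toy_lemmatize):
    """Lowercase, strip ASCII punctuation, tokenize on whitespace, lemmatize."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [lemmatize(tok) for tok in tokens]


tokens = clean_tweet("Great Schools and Universities in Tampa!")
print(tokens)
```

Unlike a stemmer, which might truncate "universities" to a non-word such as "univers", the lemmatizer maps it to the dictionary form "university".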
For numerical data, we:
- Checked data quality by validating randomly selected data points
- Removed duplicates and created a master panel data table
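As a rough sketch of the last step, the snippet below collapses long-format records into one row per (MSA, year) and drops exact duplicates along the way. The record layout, the `build_panel` name, the indicator names, and all values are assumptions made up for illustration; they are not the report's data.

```python
def build_panel(records):
    """Build a panel keyed by (msa, year) from (msa, year, indicator, value) records.

    Exact duplicates are dropped silently; conflicting duplicates raise an error
    so they can be checked by hand.
    """
    panel = {}
    for msa, year, indicator, value in records:
        row = panel.setdefault((msa, year), {})
        if indicator in row and row[indicator] != value:
            raise ValueError(f"conflicting values for {msa} {year} {indicator}")
        row[indicator] = value
    # Sort by entity, then time, which is the usual panel ordering.
    return dict(sorted(panel.items()))


# Made-up records for illustration only.
records = [
    ("MSA A", 2015, "unemployment_rate", 5.0),
    ("MSA A", 2015, "poverty_rate", 14.0),
    ("MSA A", 2015, "unemployment_rate", 5.0),  # exact duplicate
    ("MSA B", 2015, "unemployment_rate", 4.0),
]
panel = build_panel(records)
for key, row in panel.items():
    print(key, row)
```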
Text data – Sentiment analysis
To understand people's emotions and feelings about each MSA, we performed sentiment analysis using TextBlob, a Python library built on top of NLTK. We then compared and visualized the sentiment scores of the most positive MSA and the Tampa Bay region.
We observed that the positive Twitter sentiment for the Tampa Bay region is driven by conversation about downtown, universities, and schools.
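The per-MSA comparison can be sketched as below. The scorer here is a toy word-list function (hypothetical, as are the word lists, MSA labels, and tweets) so that the example runs without extra dependencies. With TextBlob installed, a function returning `TextBlob(text).sentiment.polarity`, which also lies in [-1, 1], could be passed as `polarity` instead.

```python
from statistics import mean

# Toy word lists, for illustration only (hypothetical).
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "terrible", "hate"}


def toy_polarity(text):
    """Score in [-1, 1]: (positive hits - negative hits) / all hits."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def mean_sentiment_by_msa(tweets, polarity=toy_polarity):
    """Average the polarity of (msa, text) pairs within each MSA."""
    scores = {}
    for msa, text in tweets:
        scores.setdefault(msa, []).append(polarity(text))
    return {msa: mean(vals) for msa, vals in scores.items()}


# Made-up tweets for illustration only.
tweets = [
    ("MSA A", "love the downtown"),
    ("MSA A", "traffic is bad"),
    ("MSA B", "great schools and good universities"),
]
scores = mean_sentiment_by_msa(tweets)
best = max(scores, key=scores.get)
gap = scores[best] - scores["MSA A"]  # distance from the most positive MSA
print(scores, best, gap)
```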
Numerical data – Panel data analysis
Panel data is a dataset in which the behavior of entities is observed across time. To decide between a fixed effects model and a random effects model for this panel data, we performed a Hausman test. The null hypothesis is that the random effects model is preferred (i.e., the unique errors are not correlated with the regressors); the alternative is that the fixed effects model is preferred. Because the p-value was significant, we rejected the null hypothesis and used the fixed effects model.
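To make the test concrete: in general the Hausman statistic is H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^(-1) (b_FE - b_RE), which under the null follows a chi-square distribution with as many degrees of freedom as compared coefficients. The sketch below covers only the single-coefficient case, where the chi-square(1) tail probability has a closed form. The function name and all input numbers are made up for illustration and are not estimates from this report.

```python
import math


def hausman_single(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value when one coefficient is compared.

    b_fe, b_re: fixed- and random-effects estimates of the same coefficient.
    var_fe, var_re: their estimated variances.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / diff_var
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Made-up inputs for illustration only.
h, p = hausman_single(b_fe=-0.50, b_re=-0.30, var_fe=0.010, var_re=0.006)
print(f"H = {h:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject the null: prefer fixed effects")
```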
With this analysis, we identified a primary, actionable driver for each of the outcome indicators.
We observed that higher public transit availability leads to a lower poverty rate, lower income inequality, and higher economic mobility.
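For intuition about how a fixed effects model isolates such a relationship, the sketch below computes the within estimator for a single regressor: each variable is demeaned by entity, which removes any time-invariant MSA-specific level, and the slope is fit on the demeaned values. The function name and the data are synthetic and chosen so that the true within-MSA slope is exactly -0.5. The report's actual model includes more variables than this.

```python
from collections import defaultdict


def within_slope(rows):
    """Fixed-effects (within) slope of y on x from (entity, x, y) rows."""
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            # Demeaning by entity removes the entity's fixed effect.
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0:
        raise ValueError("x does not vary within any entity")
    return sxy / sxx


# Synthetic (msa, transit availability, poverty rate) rows, illustration only.
rows = [
    ("MSA A", 10.0, 15.0), ("MSA A", 12.0, 14.0), ("MSA A", 14.0, 13.0),
    ("MSA B", 2.0, 29.0), ("MSA B", 4.0, 28.0), ("MSA B", 6.0, 27.0),
]
slope = within_slope(rows)
print(slope)
```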
The policy experiments were performed using sensitivity analysis, which apportions changes in a system's output to different sources of uncertainty in its inputs. Using this analysis, we forecasted the competitive positions of the MSAs with respect to economic growth for the next three years and visualized them in Tableau.
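One simple form of sensitivity analysis is a one-at-a-time experiment: bump each input by a fixed percentage while holding the others at baseline, and record the change in the output. The sketch below assumes that form, which the report does not specify. The model, its coefficients, the input names, and the baseline values are all hypothetical.

```python
def one_at_a_time(model, baseline, bump=0.10):
    """Change in model output when each input is raised by `bump` (10% by default)."""
    base_out = model(baseline)
    effects = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)  # copy so other inputs stay at baseline
        perturbed[name] = value * (1 + bump)
        effects[name] = model(perturbed) - base_out
    return effects


def toy_poverty_model(inputs):
    """Hypothetical linear model with made-up coefficients."""
    return (20.0
            - 0.5 * inputs["transit_availability"]
            + 0.8 * inputs["unemployment_rate"])


# Made-up baseline for illustration only.
baseline = {"transit_availability": 10.0, "unemployment_rate": 5.0}
effects = one_at_a_time(toy_poverty_model, baseline)
for name, delta in effects.items():
    print(f"{name}: {delta:+.2f}")
```

Ranking inputs by the size of these changes gives a first view of which policy lever moves the outcome most under the assumed model.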
The primary conclusion of this analysis is that public transit availability and STEM education are the key drivers of inclusive economic growth in a region.