Assignment 5 covered correlation and spatial autocorrelation and consisted
of two parts. Part one focused on census tracts and population in Milwaukee,
WI. In part one, SPSS was used to create a correlation matrix, which can be
found below.
Figure 1
Part 1: Correlation
The correlation matrix shows several notable patterns. For example, the white
population and the number of manufacturing employees have a strong positive
correlation of 0.735. Because this value is close to 1, the relationship
between these two variables is more nearly linear than most others in the
matrix, such as the correlation between the black and white populations, which
is -0.582. That value is moderate in strength, and its negative sign means the
two variables move in opposite directions: tracts with larger white populations
tend to have smaller black populations. The Hispanic population’s correlation
with the number of manufacturing employees is 0.303, which is weak to moderate
but still positive, and it stands out next to -0.221, the value for the black
population and the number of manufacturing employees. This sheds light on who
holds manufacturing jobs in Milwaukee and suggests that manufacturing
employment is less associated with the black population than with the Hispanic
and white populations. Another fairly strong correlation is between the white
population and median household income, at 0.585. This value is more striking
when compared with the correlation between the black population and median
household income, which is -0.417. The difference is stark: tracts with larger
white populations tend to have higher median household incomes, while tracts
with larger black populations tend to have lower ones. A similar contrast
appears for the number of retail employees. The white population has a
correlation of 0.722 with retail employees, while the black population has a
correlation of -0.152. The first is a strong positive correlation and the
second is weak and negative, which suggests that retail workers are far more
common in tracts with larger white populations than in tracts with larger
black populations. Overall, the matrix suggests that the white population in
Milwaukee tends to work in manufacturing, retail, and finance, and tends to
have a higher median household income. The black population tends to have a
significantly lower median household income, and relatively few work in
finance, retail, or manufacturing. The Hispanic population does work in
manufacturing and retail jobs, but fewer work in finance. Median household
income was also slightly higher for the Hispanic population than for the black
population.
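The values in the matrix are Pearson correlation coefficients. The matrix in this assignment was produced in SPSS, so the following is only a minimal Python sketch of the same calculation. The function names and the tract values are hypothetical and were invented purely for illustration; they are not the Milwaukee data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict mapping variable name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical tract-level values, invented for illustration only.
tracts = {
    "white_pop": [1200, 800, 300, 150, 2000],
    "black_pop": [100, 400, 900, 1100, 50],
    "manufacturing_emp": [90, 60, 25, 20, 140],
}

matrix = correlation_matrix(tracts)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

A value near 1 or -1 means the points fall close to a straight line, and the sign gives the direction of the relationship. Each row of the printed matrix corresponds to one row of an SPSS-style correlation table.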
Part 2: Spatial Autocorrelation
The second part of assignment 5 focused on spatial autocorrelation, and GeoDa
and SPSS were used to produce the data to be analyzed. For this question, data
was provided by the Texas Election Commission (TEC) for the 1980 and 2016
Presidential Elections. The data consisted of the percent of Democratic votes
and the voter turnout for each election. The U.S. Census Bureau's American
FactFinder website was used to download the percent Hispanic population for
2015, as well as a shapefile of Texas and all of its counties. With the
provided and downloaded data, the goal was to analyze the patterns of each
election and determine whether there is clustering of voting patterns and of
voter turnout in the state. The TEC will provide the resulting data to the
governor in order to detect any changes in election patterns over the past 36
years.
Several tools were used to answer this question, including SPSS, GeoDa, ArcMap,
and the American FactFinder website. To answer the TEC's question, it must be
determined whether there is spatial autocorrelation in the voting results and
the voter turnout for each election. The first step was to use the American
FactFinder website to download the necessary data and the Texas county
shapefile. Once all of the necessary data was gathered, the Hispanic population
data and the voting data were joined in ArcMap. The joined table was then
exported as a new shapefile, because the data had to be in shapefile format to
be used in GeoDa. In GeoDa, a spatial weights file was created in order to test
for spatial autocorrelation in both elections, both voter turnouts, and the
Hispanic population. Rook contiguity, which treats two counties as neighbors
only when they share a border segment, was used for the weights, and GeoDa was
then used to produce the Moran’s I charts and LISA cluster maps for this data
set. Three variables were available when calculating the Moran’s I and LISA
maps: POLY_ID, SHAPE_AREA, and SHAPE_LENGTH. The three Moran’s I charts that
were created are shown below, along with three LISA cluster maps based on the
same three variables. Univariate Moran’s I and univariate LISA were both run in
GeoDa, and the same weights were used for all calculations. A discussion of the
results can be found below the figures.
Figure 2: POLY_ID Moran's I
Figure 3: SHAPE_AREA Moran's I
Figure 4: SHAPE_LENGTH Moran's I
Figure 5: POLY_ID LISA Cluster Map
Figure 6: SHAPE_AREA LISA Cluster Map
Figure 7: SHAPE_LENGTH LISA Cluster Map
The POLY_ID Moran’s I chart shows a null
relationship, meaning that there is essentially no relationship between each
county's POLY_ID value and the values of its neighbors. This can be seen in the
very low Moran’s I of 0.0271803, which is close to 0 and indicates little to no
spatial autocorrelation.
The SHAPE_AREA Moran’s I chart shows a positive
relationship and a strong association. Most of the data points are clustered
in one area of the chart, and although some outliers can be seen, the overall
pattern is clear. The Moran’s I value for this chart is 0.554015, which is much
higher than the Moran’s I calculated for POLY_ID and indicates strong positive
spatial autocorrelation.
The SHAPE_LENGTH Moran’s I chart appears to be
quite similar to the SHAPE_AREA chart, and also shows a positive relationship.
This chart shows a strong association, but not as strong as that of SHAPE_AREA,
because more outliers are present and the points are not as tightly clustered.
The Moran’s I calculated for SHAPE_LENGTH is 0.49795, which is very close to
the value found for SHAPE_AREA, showing the similarity between these two
variables. Both SHAPE_LENGTH and SHAPE_AREA showed high spatial
autocorrelation.
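To make the Moran’s I values above more concrete, the statistic can be written as I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), where z_i is a unit's deviation from the mean, w_ij is the spatial weight between units i and j, and S0 is the sum of all weights. The following is a minimal Python sketch of that formula, not GeoDa's implementation. It assumes row-standardized rook weights and, in place of the Texas counties, a small hypothetical grid of cells. All function names and values are invented for illustration.

```python
def rook_neighbors(rows, cols):
    """Rook contiguity on a grid: cells are neighbors if they share an edge."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc)
            neighbors[r * cols + c] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights (each row sums to 1)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denominator = sum(d * d for d in z)
    if denominator == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    numerator = 0.0
    s0 = 0  # sum of all weights; each unit with neighbors contributes 1
    for i, adjacent in neighbors.items():
        if not adjacent:
            continue
        # Spatial lag: the average deviation of the unit's neighbors.
        lag = sum(z[j] for j in adjacent) / len(adjacent)
        numerator += z[i] * lag
        s0 += 1
    return (n / s0) * numerator / denominator


grid = rook_neighbors(4, 4)

# Hypothetical pattern 1: high values on the left half, low on the right.
clustered = [1, 1, 0, 0] * 4
# Hypothetical pattern 2: a checkerboard, where every neighbor differs.
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print("clustered:", morans_i(clustered, grid))
print("checkerboard:", morans_i(checkerboard, grid))
```

The clustered pattern gives a positive Moran’s I and the checkerboard gives a negative one, while values near 0, like the POLY_ID result, indicate no clear spatial pattern.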
The POLY_ID LISA cluster map did not provide
helpful results for the question at hand. The map is quite scattered and does
not show any clear pattern of clustering or any indication of change in
Democratic voters. It does show more high (red) values toward the east of the
map, possibly indicating some type of movement.
The SHAPE_AREA LISA cluster map showed a
pattern similar to the one seen in the SHAPE_LENGTH map. The southwest corner
of Texas appears to be the point of interest when it comes to Democratic votes.
The blue areas indicating low values are much more scattered in this map than
in the SHAPE_LENGTH map. There is a high concentration of Democratic voters in
the counties in the southwest corner of Texas, indicating that this should be
an area of focus for the governor.
The SHAPE_LENGTH LISA cluster map shows
clustering in the southwest corner of Texas. This indicates that a large number
of people living in the red highlighted counties voted Democratic in both
elections, which provides an idea of where the Democratic voter population in
Texas is located. The blue highlighted areas show where Democratic voter
turnout was low, and these counties appear to be more in the northeast corner
of Texas. It can be noted that the southwest corner of Texas will be important
for future Democratic votes.
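The red and blue categories in a LISA cluster map come from a local version of Moran’s I, which compares each unit's deviation from the mean with the average deviation of its neighbors. The following is a minimal Python sketch of that classification, not GeoDa's implementation. It assumes row-standardized weights and leaves out the permutation-based significance test that GeoDa applies before coloring a county. The function name, the neighbor list, and the values are all hypothetical.

```python
def local_moran(values, neighbors):
    """Return a (local Moran's I, quadrant label) pair for each unit."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance of the values
    if m2 == 0:
        raise ValueError("local Moran's I is undefined when all values are equal")
    results = []
    for i in range(n):
        adjacent = neighbors.get(i, [])
        if not adjacent:
            results.append((None, "No neighbors"))
            continue
        # Spatial lag: the average deviation of the unit's neighbors.
        lag = sum(z[j] for j in adjacent) / len(adjacent)
        local_i = z[i] * lag / m2
        if z[i] >= 0 and lag >= 0:
            label = "High-High"  # drawn in red on a LISA cluster map
        elif z[i] < 0 and lag < 0:
            label = "Low-Low"    # drawn in blue on a LISA cluster map
        elif z[i] >= 0:
            label = "High-Low"   # high value surrounded by low values
        else:
            label = "Low-High"   # low value surrounded by high values
        results.append((local_i, label))
    return results


# Hypothetical example: four units in a row, each touching the next one.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
percent_dem = [10, 9, 2, 1]  # invented values, not the Texas data

lisa = local_moran(percent_dem, chain)
for unit, (local_i, label) in enumerate(lisa):
    print(unit, round(local_i, 3), label)
```

In this example the first two units are classified as High-High and the last two as Low-Low, which corresponds to a red cluster and a blue cluster on a LISA map.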
In this part of the assignment, data was provided by the Texas
Election Commission for the 1980 and 2016 Presidential Elections. The data
consisted of the percent of Democratic votes and the voter turnout for both
elections. Hispanic population data from 2015 was also downloaded from the
American FactFinder website. The data was analyzed using SPSS and GeoDa, which
produced the Moran’s I charts and LISA cluster maps shown above. These charts
and maps were then examined in order to determine patterns of voting in Texas
counties. The SHAPE_LENGTH and SHAPE_AREA LISA cluster maps showed a similar
trend of clustering, and the Moran’s I values and charts of these two variables
were also very similar. The similar trend in the cluster maps indicated that
there is a high number of Democratic voters located in the counties in the
southwest corner of Texas (the dark red highlighted counties in the maps
above). The POLY_ID data did not provide helpful results: the Moran’s I value
was quite low and the chart indicated a null relationship. The LISA cluster map
for POLY_ID showed scattered data throughout the Texas counties. Overall, the
governor should focus on the counties located in the southwest corner of Texas.