The purpose of this assignment was to become familiar with
various statistical methods and computer programs. It required a calculator and
paper to work out standard deviations by hand, and MS Excel to run calculations
on a given data set. The assignment had two parts: part one was hand
calculations on the data, and part two was calculating mean centers and
weighted mean centers. The topic of the data in part one was cycling. The story
behind the assignment is as follows: you are looking to invest a large sum of
money in a cycling team. During the last race of the Tour de Geographia, the
overall individual winner won $300,000, with 25% going to the team owner. The
team that won gained $400,000 in a variety of ways, and 35% went to the team
owner. With the skills and knowledge learned in quantitative methods, the
cycling team data will be analyzed in order to decide which team to invest in.
Typically, team Astana has produced a race winner; however, team Tobler has
recently been making waves as an up-and-coming team on the cycling circuit.
Using the race times of each team’s members, the range, mean, median, mode,
kurtosis, skewness, and standard deviation will be calculated for each team.
Each of these terms is defined below.
Range: Range can
be defined as the lowest value in a data set subtracted from the highest value
in the data set.
Mean: Mean is
another term for average (also known as the sample mean). It is found by adding
up all of the values in a data set to get the sum, and then dividing that sum
by the total number of values in the data set. Some examples of common uses for
the mean include batting averages and even the number of beers that you drink
in a week.
Median: The
median is the middle observation in your data set once the values are sorted.
When “n” is an odd number, you take the middle value in the data set; when it
is an even number, you take the average of the two middle values.
Mode: The mode of
a data set is the most frequently occurring value within the set.
Kurtosis: Kurtosis
refers to how peaked or flat the distribution of your data set is when compared
with the normal distribution curve. If the kurtosis for a data set is negative,
the distribution is called platykurtic, which means that it is flatter than the
normal curve and spread out from the mean (as a rule of thumb, any value less
than -1 is platykurtic). If the kurtosis is positive, the distribution is
called leptokurtic, which means that it has a high peak and heavier tails than
the normal curve (as a rule of thumb, any value greater than 1 is
leptokurtic). Finding the kurtosis for a data set helps you figure out what the
outliers are saying about the data set.
Skewness: Skewness
measures the asymmetry of the distribution of data in a set. If a data set has
a skewness of 0, the distribution is symmetric and no skew is present. Skewness
helps you to better understand your data set because it shows which way the
distribution curve is being pulled by the outliers. Skewness can be either
positive (the tail is pulled to the right) or negative (the tail is pulled to
the left).
Standard Deviation:
The standard deviation tells you how tightly the values in a data set are
clustered around the mean. A small standard deviation means that most values
sit close to the mean, and a large one means that they are spread out. For data
that follow a normal distribution, about 68% of the values fall within one
standard deviation of the mean and about 95% fall within two. Standard
deviations also help us understand outliers and how they can influence a
distribution curve by pulling the mean (or average) of a data set to either the
left or the right.
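
The definitions above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet used for the assignment: the race times are made up, the function name `describe` is hypothetical, and the skewness and kurtosis formulas are the bias-adjusted sample versions that Excel's SKEW and KURT functions use (an assumption about which Excel functions were involved).

```python
import statistics

def describe(times):
    """Return the summary statistics defined above for a list of values."""
    n = len(times)
    mean = statistics.mean(times)
    sd = statistics.stdev(times)  # sample standard deviation (divides by n - 1)
    z = [(t - mean) / sd for t in times]  # standardized values

    # Bias-adjusted sample skewness (needs at least 3 values).
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

    # Bias-adjusted sample excess kurtosis (needs at least 4 values);
    # roughly 0 for normally distributed data.
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    return {
        "range": max(times) - min(times),      # highest minus lowest
        "mean": mean,
        "median": statistics.median(times),    # averages the two middle values if n is even
        "mode": statistics.multimode(times),   # list, since there can be ties
        "stdev": sd,
        "skewness": skew,
        "kurtosis": kurt,
    }

# Hypothetical race times in minutes, NOT the assignment's data.
sample_times = [2270, 2275, 2275, 2280, 2290, 2300]
summary = describe(sample_times)
for name, value in summary.items():
    print(name, value)
```

Running `describe` once per team would give the values to compare side by side; a smaller `stdev` and `range` indicate more consistent race times.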
Results:
After looking over the standard deviations that were calculated for both
teams, I will be investing in team Tobler because its standard deviation is
smaller. A smaller standard deviation means that most of team Tobler’s race
times fell closer to the team’s mean than team Astana’s did. While Tobler may
be new to the cycling circuit, its members’ race times are more tightly
centered on the team’s average race time. In other words, Tobler’s race times
are more consistent as a team, and this makes Tobler the better investment
choice when deciding which team will make you more money in the future.
The standard deviations of both teams’ race times were the most helpful tool
that I used when making my investment decision. While the average race time for
team Astana was slightly quicker (2,276.67 minutes, compared with 2,285.47
minutes for Tobler), the race times within Astana were not as close to the
team’s average as they were for team Tobler. This can be seen by comparing
Astana’s rather high standard deviation of 16.63 with the much lower standard
deviation of 7.62 that was calculated for team Tobler’s race times.
The range of each team’s race times was another statistic used to decide which
team would be best to invest in. Astana’s range was more than twice Tobler’s:
70 for Astana compared with 31 for Tobler. The smaller range shows that
Tobler’s members all have a similar skill level, with race times that are
closer together and closer to the team’s average than those of team Astana.
The skewness and kurtosis of each team’s race times were also used to help make
the decision. Both teams had a negative skewness (Astana: -0.003, Tobler:
-1.56), but Astana’s value is very close to zero, so its distribution is nearly
symmetric, while Tobler’s is clearly skewed to the left. Both data sets had a
positive kurtosis value as well (Astana: 1.17, Tobler: 2.93), which indicates
that both have a high (leptokurtic) peak on their distribution curves, with
Tobler’s peak being the higher of the two. Skewness and kurtosis were not the
main factors in deciding which team to invest in, but seeing these values for
each team helped in visualizing the race times.
Although team Astana appears to be the best choice at first glance, team Tobler
ends up being the better team to invest in because of its lower standard
deviation and smaller range. The standard deviation calculations (along with
the other calculations defined above) that were handwritten for each team are
shown in the photograph below.
The goal of part two of this assignment was to calculate mean centers and
weighted mean centers. Population data for Wisconsin counties from 2000 and
2015 were used to make the calculations. The geographic mean center of the
Wisconsin counties was calculated, along with the weighted mean centers for
2000 and 2015, each weighted by county population. The completed map that shows
the three mean center points is shown below, along with definitions for
geographic mean center and weighted mean center.
Geographic Mean
Center: Mean center is a measure of central tendency that is also spatial. A
measure of central tendency is a measure that indicates the middle or center of
a distribution (examples include the mean, median, and mode). Mean center is
calculated on a Cartesian plane, which uses X and Y coordinates, like those of
longitude and latitude. It is constructed by averaging the X values and
averaging the Y values of the points in a data set. Mean center answers the
question: where is the center of the data?
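Weighted Mean Center:
The weighted mean center takes into account a weight (frequency) attached to
each point in a data set. Each point’s X and Y values are multiplied by its
weight, and the sums are divided by the total weight, so points with larger
weights pull the center toward them. In this assignment the weight is county
population.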
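
As a rough sketch of the two definitions above, the Python below computes both centers for a handful of points. The coordinates and populations are invented for illustration and are not the Wisconsin county data, and the function names are hypothetical.

```python
def mean_center(points):
    """Unweighted mean center: average X and average Y of all points."""
    n = len(points)
    return (sum(x for x, _ in points) / n,
            sum(y for _, y in points) / n)

def weighted_mean_center(points, weights):
    """Mean center where each point counts in proportion to its weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)

# Hypothetical county centroids (x, y) and populations, NOT real data.
centroids = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0)]
population = [100, 100, 200]

center = mean_center(centroids)
weighted = weighted_mean_center(centroids, population)
print("mean center:", center)
print("weighted mean center:", weighted)
```

In this toy example the weighted center sits closer to the third point than the plain mean center does, because that point carries the largest population.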
As seen in the completed map, the weighted mean center for
Wisconsin county populations shifted to the right from 2000 to 2015, and both
of the weighted mean centers are slightly above the mean center of this data
set. This shift to the right may have resulted from a shift in where the
majority of people in Wisconsin are located, which may be in a more urban and
crowded area. It could also have resulted from some type of economic shift, or
some kind of shift in the housing market that caused a large number of people
to move into a particular county. I found it interesting that the weighted mean
centers of population for 2000 and 2015 are both located in one county, Wood
County. It will be interesting to see how this weighted mean center shifts in
years to come, and I wonder whether it will still be located within Wood
County. Overall, the weighted mean center points are both located above the
mean center point (green), and the population change from 2000 to 2015 can be
seen as a shift to the right and a little bit down (from the purple point to
the red).