Monday, February 20, 2017

Quantitative Methods in Geography: Assignment 2-Descriptive Statistics and Mean Centers

The purpose of this assignment was to become familiar with several statistical methods and computer programs. It required a calculator and paper to work out standard deviations by hand, and MS Excel to compute statistics from a given data set. There were two parts to this assignment: part one was hand calculations of the data, and part two was calculating mean centers and weighted mean centers for the data given. The topic of the data studied in this assignment was cycling. The story behind this assignment is as follows: you are looking to invest a large sum of money in a cycling team. During the last race of the Tour de Geographia, the overall individual winner won $300,000, with 25% going to the team owner. The team that won gained $400,000 in a variety of ways, and 35% went to the team owner. With the skills and knowledge learned in quantitative methods, this cycling team data will be analyzed in order to decide which team to invest in. Typically, team Astana has produced a race winner; however, team Tobler has recently been making waves as an up-and-coming team on the cycling circuit. Using the race time of each team member, the range, mean, median, mode, kurtosis, skewness, and standard deviation will be calculated for each team. Each of these terms is defined below.

Range: The range is the lowest value in a data set subtracted from the highest value in the data set.
Mean: The mean is another term for the average (also known as the sample mean). It is found by adding up all of the values in a data set and then dividing that sum by the total number of values in the data set. Common uses of the mean include batting averages and even the number of beers that you drink in a week.
Median: The median is the middle observation in your data set once the values are sorted. When “n” is an odd number, you take the middle value in the data set; when it is an even number, you take the average of the two middle values.
Mode: The mode of a data set is the most frequently occurring value within the set.
Kurtosis: Kurtosis refers to the relative peakedness or flatness of the distribution of your data set compared to the normal distribution curve. If the kurtosis for a data set is negative, the distribution is called platykurtic, which means that it is flatter than the normal curve (any value less than -1 is platykurtic). If the kurtosis is positive, the distribution is called leptokurtic, which means that it has a high peak and heavier tails than the normal curve (any value greater than 1 is leptokurtic). Finding the kurtosis for a data set helps you figure out what the outliers are trying to say about the data set.
Skewness: Skewness describes the symmetry of the distribution of data in a set, that is, how far the distribution departs from being balanced around its mean. If a data set has a skewness of 0, the distribution is symmetric and no skew is present. Skewness can be either positive (the longer tail lies to the right) or negative (the longer tail lies to the left). It shows which way the distribution curve for your data set is being pulled by the outliers.
Standard Deviation: The standard deviation is a measure of spread that is often used alongside the normal distribution, which describes data that clusters around the mean. The standard deviation tells you just how tightly the values in the data set are clustered around the mean. Standard deviations help us understand outliers and how they can influence a normal distribution curve by pulling the mean (or average) of a data set to either the left or the right.
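The definitions above can be sketched in Python. This is only an illustration: the race times in the example are made up (they are not the assignment data), and the function names are my own. The skewness and kurtosis formulas are the sample-adjusted versions used by Excel's SKEW and KURT functions, which I am assuming match what was calculated for the assignment. The standard deviation is the sample version (dividing by n - 1), which is also an assumption.

```python
import statistics


def sample_skewness(values):
    """Sample-adjusted skewness (the formula used by Excel's SKEW). Needs n >= 3."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)


def sample_excess_kurtosis(values):
    """Sample-adjusted excess kurtosis (the formula used by Excel's KURT). Needs n >= 4."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    fourth = sum(((v - mean) / s) ** 4 for v in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def describe(values):
    """Return the descriptive statistics defined above for one data set."""
    return {
        "range": max(values) - min(values),      # highest minus lowest
        "mean": statistics.mean(values),
        "median": statistics.median(values),     # averages the two middle values if n is even
        "mode": statistics.mode(values),         # most frequently occurring value
        "std_dev": statistics.stdev(values),
        "skewness": sample_skewness(values),
        "kurtosis": sample_excess_kurtosis(values),
    }


# Hypothetical race times in minutes (NOT the assignment data).
example_times = [2250, 2262, 2270, 2270, 2281, 2290]
summary = describe(example_times)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Running the same function on each team's race times would give one summary per team that can be compared side by side.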

Results:
After looking over the standard deviations that were calculated for both teams, I will be investing in team Tobler because this team has the smaller standard deviation. This smaller value means that most of the observations in team Tobler’s data set fell closer to the team’s mean than the race times for team Astana did. I decided to pick team Tobler because, while they may be new to the cycling circuit, their team members’ race times are more centered around the team’s average race time. In other words, Tobler’s race times are more consistent as a team, and this makes them the better investment choice when trying to decide which team will make you more money in the future. The standard deviations of both teams’ race times were the most helpful tool that I used when making my investment decision. While the average race time for team Astana was slightly quicker (2,276.67 minutes, compared with 2,285.47 minutes for Tobler), the team members’ race times within Astana were not as close to the team’s average time as they were for team Tobler. This can be seen by comparing Astana’s rather high standard deviation of 16.63 with the much lower standard deviation of 7.62 that was calculated for team Tobler’s race times. The range of each team’s race times was another statistic used to decide which team would be best to invest in. The range for team Astana was significantly higher than Tobler’s: 70 for Astana compared with 31 for Tobler. The smaller range shows that team Tobler’s members all have a similar skill level, with race times that are much closer together and closer to the team’s average time than those of team Astana. Other statistics used to help decide which team to back were the skewness and kurtosis of each team’s race times.
While both teams had a negative skewness (Astana: -0.003, Tobler: -1.56), Astana’s data set is nearly symmetric, whereas Tobler’s is clearly left skewed. Both data sets also had a positive kurtosis value (Astana: 1.17, Tobler: 2.93), which indicates that both data sets have a high (leptokurtic) peak on their distribution curves, with the peak on Tobler’s curve being higher. Skewness and kurtosis were not the main factors used for deciding which team to invest in, but seeing these values for each team helped in visualizing the race times and deciding which team is best to put money on. Although team Astana appears to be the best choice at first glance at both teams’ race times, team Tobler ends up being the best team to invest in because of its lower standard deviation and the smaller range of its data set. The standard deviation calculations (along with the other calculations that were defined above) that were handwritten and completed for each team are shown in the photograph below.
[Photograph: IMG_6994.JPG, handwritten calculations for each team]
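The comparison in the results can be re-checked from the summary values quoted in this post. The sketch below only restates that arithmetic in Python. The dictionary layout and variable names are my own, and the numbers are the ones reported above, not recomputed from the raw race times.

```python
# Summary values as reported in the Results section (race times in minutes).
teams = {
    "Astana": {"mean": 2276.67, "std_dev": 16.63, "range": 70},
    "Tobler": {"mean": 2285.47, "std_dev": 7.62, "range": 31},
}

# How much quicker Astana's average race time is, in minutes.
mean_gap = teams["Tobler"]["mean"] - teams["Astana"]["mean"]

# How many times larger Astana's spread is than Tobler's.
std_ratio = teams["Astana"]["std_dev"] / teams["Tobler"]["std_dev"]
range_ratio = teams["Astana"]["range"] / teams["Tobler"]["range"]

# The more consistent team is the one with the smaller standard deviation.
most_consistent = min(teams, key=lambda name: teams[name]["std_dev"])

print(f"Mean gap: {mean_gap:.2f} minutes")
print(f"Standard deviation ratio (Astana / Tobler): {std_ratio:.2f}")
print(f"Range ratio (Astana / Tobler): {range_ratio:.2f}")
print(f"Most consistent team: {most_consistent}")
```

Astana's average is about 8.8 minutes quicker, but both its standard deviation and its range are a little more than twice Tobler's, which is the trade-off behind the investment decision.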

The goal of part two of this assignment was to calculate mean centers and weighted mean centers. Population data for Wisconsin counties for 2000 and 2015 were used to make the calculations. The geographic mean center at the county level was calculated for Wisconsin, along with the weighted mean centers for 2000 and 2015, each weighted by county population. The completed map that shows the three mean center points is shown below, along with definitions for geographic mean center and weighted mean center.
Geographic Mean Center: The mean center is a measure of central tendency that is also spatial. A measure of central tendency indicates the middle or center of a distribution (examples include the mean, median, and mode). The mean center is defined on a Cartesian plane with X and Y coordinates, like those of longitude and latitude. It is constructed from the average of the X values and the average of the Y values in a data set. The mean center answers the question: where is the center of the data?
Weighted Mean Center: The weighted mean center takes into account the frequencies of grouped data in a data set. Each point is weighted by its frequency (here, each county’s population), so points with larger weights pull the center toward themselves.
As seen in the completed map, the weighted mean center for Wisconsin county populations shifted to the right from 2000 to 2015, and both of the weighted mean centers are slightly above the mean center of this data set. This shift to the right may have resulted from a change in where the majority of people in Wisconsin are located, possibly toward more urban and crowded areas. It could reflect some type of economic shift, or a change in the housing market that caused a large number of people to move into a specific county. I found it interesting that the weighted mean centers of population for 2000 and 2015 are both located in one county, Wood County. It will be interesting to see how the weighted mean center shifts in years to come, and whether it will still be located within Wood County. Overall, the weighted mean center points are both located above the mean center point (green), and the population change from 2000 to 2015 can be seen as a shift to the right and a little bit down (from the purple point to the red).
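The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used for the map: the three points and their populations are invented (they are not Wisconsin county data), and the function names are my own.

```python
def mean_center(points):
    """Average of the X values and average of the Y values."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)


def weighted_mean_center(points, weights):
    """Like mean_center, but each point counts in proportion to its weight."""
    total = sum(weights)
    mean_x = sum(w * x for (x, _), w in zip(points, weights)) / total
    mean_y = sum(w * y for (_, y), w in zip(points, weights)) / total
    return (mean_x, mean_y)


# Hypothetical county centroids as (X, Y) pairs, with made-up populations.
centroids = [(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)]
population_2000 = [100, 100, 300]
population_2015 = [100, 200, 300]

center = mean_center(centroids)
center_2000 = weighted_mean_center(centroids, population_2000)
center_2015 = weighted_mean_center(centroids, population_2015)

print("Mean center:", center)
print("Weighted mean center 2000:", center_2000)
print("Weighted mean center 2015:", center_2015)
```

In this made-up example the second point gains population between the two years, so the weighted mean center moves toward it (to the right and slightly down), while the unweighted mean center stays where it is because the locations themselves do not change.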
