Statistical Analysis of Results
1. Review the laws of probability
2. Analyze self-generated experimental results using the Chi-Square test and the t-test of significance
Basic Statistics
Mean - the mean of a set of numbers is the average value; it is obtained by adding all of the numbers together and dividing by the number of values you have.
What is the mean for 5 test scores ... 80, 94, 76, 88, and 84?
Sum = 80+94+76+88+84 = 422
n = 5
Mean = (Sum)/n = 422/5 = 84.4
Mode - the mode is the most common number appearing in a set of values.
Given this set of values [7, 4, 5, 5, 9, 4, 5, 8, 5, 10], what is the mode?
The mode is 5, since it appears the most (4 times).
Median - the median is the number that appears exactly in the middle of a set of values once the values are sorted in order.
Given this set of values [22, 9, 14, 12, 20, 17, 8], what is the median?
Sorted, the values are [8, 9, 12, 14, 17, 20, 22]. The median is 14, since there are the same number of values before and after it.
Range - the range is the difference between the highest value and the lowest value in a set of values.
Given this set of values [5, 25, 4, 17, 46, 19], what is the range?
The range is 42 (46-4).
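The four examples above can be checked with a short Python sketch. It uses only the standard statistics module; the variable names are chosen here for illustration and are not part of the original handout.

```python
import statistics

# Mean: add the values and divide by how many there are.
scores = [80, 94, 76, 88, 84]
mean_score = statistics.mean(scores)

# Mode: the value that appears most often.
mode_values = [7, 4, 5, 5, 9, 4, 5, 8, 5, 10]
mode_value = statistics.mode(mode_values)

# Median: the middle value after sorting (statistics.median sorts for us).
median_values = [22, 9, 14, 12, 20, 17, 8]
median_value = statistics.median(median_values)

# Range: highest value minus lowest value.
range_values = [5, 25, 4, 17, 46, 19]
value_range = max(range_values) - min(range_values)

print(mean_score, mode_value, median_value, value_range)
```

Note that the median is taken from the sorted list, not from the order in which the values happen to be written.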
Standard deviation - standard deviation is a statistic that shows how 'spread out' your data is. The larger the value of the standard deviation, the more spread out the data is. If you have a small standard deviation, then your data is more clustered around the mean.
For data that is roughly normally distributed, plus and minus one standard deviation from the mean should encompass about 68% of the data, while two standard deviations should cover about 95% and three standard deviations should cover about 99.7%.
Standard deviation is calculated by taking the difference between each value and the mean and squaring it. Then sum those squared values and divide by (n-1) ... where n is the number of data values. Finally, take the square root to get the standard deviation (usually represented by the Greek letter sigma, or by s when it is calculated from a sample).
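The calculation steps described above can be sketched in Python as follows. The function name sample_std_dev is made up for this illustration, and the data reuses the five test scores from the mean example; the result is compared with the standard library's statistics.stdev, which also divides by (n-1).

```python
import math
import statistics

def sample_std_dev(values):
    """Sample standard deviation, following the steps in the text."""
    n = len(values)
    mean = sum(values) / n
    # Difference between each value and the mean, squared.
    squared_diffs = [(x - mean) ** 2 for x in values]
    # Sum the squared differences and divide by (n - 1).
    variance = sum(squared_diffs) / (n - 1)
    # The square root of the variance is the standard deviation.
    return math.sqrt(variance)

scores = [80, 94, 76, 88, 84]
sd = sample_std_dev(scores)
print(round(sd, 2))                          # about 6.99
print(round(statistics.stdev(scores), 2))    # same value from the library
```

For these scores the squared differences sum to 195.2, so the variance is 195.2/4 = 48.8 and the standard deviation is its square root.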
Laws of Probability
Probability is the chance of getting a desired outcome [the number of desired outcome(s) divided by the total number of possible outcomes].
For example, the probability of getting heads when flipping a coin is 1 out of 2 (since there are two equally likely outcomes), or 50%. The probabilities of all of the possible outcomes should add up to one. To get the probability of multiple independent outcomes in a row, you multiply their probabilities (though if the order is irrelevant, this may change, since several different orders can give the same result).
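As a small illustration of these rules, here is a Python sketch using exact fractions. The helper name probability and the three-flip scenario are assumptions chosen for this example, not part of the original handout.

```python
import math
from fractions import Fraction

def probability(desired_outcomes, total_outcomes):
    """Desired outcomes divided by total possible outcomes."""
    return Fraction(desired_outcomes, total_outcomes)

p_heads = probability(1, 2)
p_tails = probability(1, 2)

# The probabilities of all possible outcomes add up to one.
total = p_heads + p_tails

# Three heads in a row: multiply the individual probabilities.
p_three_heads = p_heads * p_heads * p_heads

# Exactly two heads in three flips, in any order: there are 3 possible
# orders (HHT, HTH, THH), so the single-order probability is multiplied by 3.
orders = math.comb(3, 2)
p_two_heads_any_order = orders * p_heads ** 2 * p_tails

print(total, p_three_heads, p_two_heads_any_order)
```

The last line shows why order matters: one specific sequence such as heads, heads, tails has probability 1/8, but two heads in any order has probability 3/8.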
Chi-Square test of significance
The Chi-Square test is used to compare a set of observed frequencies to the expected frequencies to decide whether the null hypothesis should be rejected. The closer they are, the smaller the Chi-Square value, and the more consistent the data is with the null hypothesis.
In words, the equation is as follows:
Chi-Square Value is equal to the sum of the square of the difference of the observed and expected amounts divided by the expected amount.
Chi-Square = sum of (o - e)^2 / e, taken over every category
The letter o is the observed number and the letter e is the expected number. For example, if you are rolling a die (one of a pair of dice), the e would be 1/6 of the total number of trials. If you roll the die 600 times, you should theoretically get each number 100 times. So e=100. The o is the number of times that particular number was actually rolled.
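The die example can be sketched in Python as shown here. The function name chi_square is chosen for this illustration, and the observed counts are hypothetical numbers invented for the example (they simply add up to 600 rolls); they are not real experimental results.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

total_rolls = 600
# Each of the 6 faces is expected 1/6 of the time, so e = 100.
expected = [total_rolls / 6] * 6

# HYPOTHETICAL observed counts for faces 1 through 6 (sum is 600).
observed = [95, 108, 92, 110, 99, 96]

die_chi_square = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1   # number of categories minus 1

print(die_chi_square, degrees_of_freedom)
```

With six categories there are 5 degrees of freedom, so the value would be compared against the table entry for 5 degrees of freedom rather than the one used for a coin.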
Simply put, if the Chi-Square value that you get is too high, then the difference between the observed and expected data is statistically significant. For a simple problem with 2 outcomes, there would be 1 degree of freedom (degrees of freedom = number of categories minus 1). The number we'll use for comparison is 3.84, the critical value for 1 degree of freedom at the 0.05 significance level (it changes depending on the situation). [NOTE: check this table for other values.] If the Chi-Square value is higher than that, then the observed data differs from the expected data by a statistically significant amount. If it is less than that value, then the observed data falls within what is to be expected. Note: if the expected value for any category is less than 5, the Chi-Square test should not be used.
Example: Flipping a Coin
If you flip a coin, the expected results would be to get heads 50% of the time and tails 50% of the time. Let's say you flip a coin 100 times. You actually get 42 heads and 58 tails. You would expect (null hypothesis) to get 50 heads and 50 tails. You want to know if this data is acceptable and within expected limits. The math goes like this:
Chi-Square = (42-50)^2/50 + (58-50)^2/50 = 64/50 + 64/50 = 1.28 + 1.28 = 2.56
Since the value (2.56) is less than 3.84, the difference between the observed and expected data is not statistically significant.
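The coin example can be reproduced with a few lines of Python. This is only a sketch: the variable names are chosen for illustration, and the comparison value is the standard tabulated Chi-Square critical value for 1 degree of freedom at the 0.05 level (approximately 3.841), hard-coded here rather than looked up.

```python
observed = {"heads": 42, "tails": 58}
expected = {"heads": 50, "tails": 50}

# Sum of (o - e)^2 / e over both outcomes.
chi_square_value = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)

degrees_of_freedom = len(observed) - 1   # 2 categories minus 1

# Assumed comparison value: df = 1, significance level 0.05.
CRITICAL_VALUE_DF1_P05 = 3.841

is_significant = chi_square_value > CRITICAL_VALUE_DF1_P05

print(round(chi_square_value, 2), degrees_of_freedom, is_significant)
```

Each outcome contributes 64/50 = 1.28, giving 2.56 in total, which is below the comparison value, so the result is not significant.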
T-test of significance
The t-test is often used in calculating the significance of observed differences between the means of two samples. The null hypothesis is that there is no significant difference between the means. The t-test is usually used with scalar (numerical measurement) variables.
To calculate your t-value, you need to first calculate the mean and the sample variance (s^2) of each of your samples. You will be performing this test on your calculators, but the explanation of the calculation is as follows.
Calculate the variance of the difference between the two means. Then take the square root. The t value is the difference between the two means divided by that square root. If the calculated t value (ignoring its sign) exceeds the tabulated value, then the means are significantly different.
[NOTE: check this table for t-test comparison values.]
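The calculation described above can be sketched in Python. This sketch rests on assumptions: it takes the variance of the difference between the means to be s1^2/n1 + s2^2/n2 (the unpooled form), which may differ from the exact equation used on your calculator, and the two samples are hypothetical numbers invented for illustration. The function name t_value is also made up for this example.

```python
import math
import statistics

def t_value(sample_a, sample_b):
    """t = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b) (assumed form)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Variance of the difference between the two means.
    variance_of_difference = var_a / len(sample_a) + var_b / len(sample_b)
    # Its square root is the standard error of the difference.
    standard_error = math.sqrt(variance_of_difference)
    return (mean_a - mean_b) / standard_error

# HYPOTHETICAL measurements for two groups.
group_a = [12.1, 11.8, 12.6, 12.3, 11.9]
group_b = [11.2, 11.5, 10.9, 11.4, 11.0]

t = t_value(group_a, group_b)
print(round(t, 2))
```

The printed t value would then be compared, ignoring its sign, with the tabulated value for the appropriate degrees of freedom.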