More on Statistics: P-Value, ANOVA

Notes from the Net for quick reference on P-value and ANOVA.

https://onlinecourses.science.psu.edu/statprogram/node/138

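The P-value approach involves determining whether it is "likely" or "unlikely" that, assuming the null hypothesis is true, we would observe a test statistic more extreme in the direction of the alternative hypothesis than the one observed. The P-value is the probability of such an observation.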

If the P-value is less than (or equal to) α, then the null hypothesis is rejected in favor of the alternative hypothesis. 


Specifically, the four steps involved in using the P-value approach to conducting any hypothesis test are:
  1. Specify the null and alternative hypotheses.
  2. Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. 
  3. Using the known distribution of the test statistic, calculate the P-value.
  4. Set the significance level, α, the probability of making a Type I error, to be small (0.01, 0.05, or 0.10). Compare the P-value to α. If the P-value is less than (or equal to) α, reject the null hypothesis in favor of the alternative hypothesis. If the P-value is greater than α, do not reject the null hypothesis.
For example, a P-value of 0.0127 is less than α = 0.05, so we reject the null hypothesis H0 : μ = 3 in favor of the alternative hypothesis HA : μ > 3.
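
The comparison in step 4 can be sketched in a few lines of Python. This is only an illustration: the function names are made up for these notes, and the tail-probability helper assumes a standard normal (z) test statistic in a right-tailed test, which is not necessarily the test behind the 0.0127 above.

```python
from statistics import NormalDist


def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal statistic (right-tailed test).

    Assumption: the test statistic is a z-score; other tests need
    their own distribution.
    """
    return 1.0 - NormalDist().cdf(z)


def decide(p_value, alpha=0.05):
    """Step 4: reject H0 when the P-value is less than or equal to alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# P-value taken from the example in the notes.
print(decide(0.0127, alpha=0.05))  # reject H0
print(decide(0.0127, alpha=0.01))  # do not reject H0
```

The same P-value leads to different decisions at α = 0.05 and α = 0.01, which is why α should be fixed before looking at the data.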


Anova (Ref: CP Kothari)

Analysis of variance (abbreviated as ANOVA) is an extremely useful technique for research in economics, biology, education, psychology, sociology, business/industry and several other disciplines. This technique is used when multiple sample cases are involved. The significance of the difference between the means of two samples can be judged through either the z-test or the t-test, but the difficulty arises when we need to examine the significance of the difference amongst more than two sample means at the same time.

The ANOVA technique enables us to perform this simultaneous test and as such is considered to be an important tool of analysis in the hands of a researcher. Using this technique, one can draw inferences about whether
the samples have been drawn from populations having the same mean.
The ANOVA technique is important in the context of all those situations where we want to
compare more than two populations such as in comparing the yield of crop from several varieties of
seeds, the gasoline mileage of four automobiles, the smoking habits of five groups of university
students and so on.

"The essence of ANOVA is that the total amount of variation in a set of data is broken down into two types, that amount which can be attributed to chance and that amount which can be attributed to specified causes."
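
Written out for one-way ANOVA, the breakdown in the quote is the usual sum-of-squares identity. The notation is an assumption of these notes, not the book's: x_ij is observation i in group j, \bar{x}_j is the mean of group j, \bar{x} is the mean of all observations, k is the number of groups and n_j the size of group j.

```latex
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^{2}}_{\text{total}}
=
\underbrace{\sum_{j=1}^{k} n_j\left(\bar{x}_j-\bar{x}\right)^{2}}_{\text{between groups (specified causes)}}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^{2}}_{\text{within groups (chance)}}
```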


The test statistic is the F-ratio: the variance estimate between samples divided by the variance estimate within samples. If the worked out value of F is less than the table value of F, the difference is taken as insignificant, i.e., due to chance, and the null hypothesis of no difference between sample means stands.

In case the calculated value of F happens to be equal to or more than its table value, the difference is considered significant (which means the samples could not have come from the same universe) and the conclusion may be drawn accordingly. The higher the calculated value of F is above the table value, the more definite and sure one can be about the conclusions.

Per acre production data

Plot of land    Variety of wheat
                A     B     C
1               6     5     5
2               7     5     4
3               3     3     3
4               8     7     4
Mean            6     5     4

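Deviations from each group mean and their squares

Variety A (mean 6)
x     x-MeanX    square
6      0         0
7      1         1
3     -3         9
8      2         4
Sum of squares: 14

Variety B (mean 5)
x     x-MeanX    square
5      0         0
5      0         0
3     -2         4
7      2         4
Sum of squares: 8

Variety C (mean 4)
x     x-MeanX    square
5      1         1
4      0         0
3     -1         1
4      0         0
Sum of squares: 2

Sum of squares within groups (SSW) = 14 + 8 + 2 = 24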
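
As a cross-check of the hand calculation, a minimal Python sketch of the within-group sum of squares, using the wheat data from the table above. The variable and function names are illustrative only.

```python
# Per acre production for each variety of wheat (from the notes).
yields = {"A": [6, 7, 3, 8], "B": [5, 5, 3, 7], "C": [5, 4, 3, 4]}


def within_ss(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


per_group = {name: within_ss(values) for name, values in yields.items()}
ssw = sum(per_group.values())

print(per_group)  # {'A': 14.0, 'B': 8.0, 'C': 2.0}
print(ssw)        # 24.0
```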




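Deviations from the total mean and their squares (total mean = 5)

Variety    x     x-MeanX    Square
A          6      1         1
A          7      2         4
A          3     -2         4
A          8      3         9
B          5      0         0
B          5      0         0
B          3     -2         4
B          7      2         4
C          5      0         0
C          4     -1         1
C          3     -2         4
C          4     -1         1

Total mean: 5
Total sum of squares: 32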

Sum of squares between groups

                                   A     B     C
Group mean                         6     5     4
(group mean - total mean)^2        1     0     1

Sum of the squared deviations = 1 + 0 + 1 = 2
Sample size × sum of squared deviations = 4 × 2 = 8
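
The between-group figure can be checked the same way. The sketch below (names are illustrative) weights each squared deviation by its own group size, which reduces to "sample size × sum of squares" here because every variety has 4 plots. It also computes the total sum of squares around the total mean.

```python
# Per acre production for each variety of wheat (from the notes).
yields = {"A": [6, 7, 3, 8], "B": [5, 5, 3, 7], "C": [5, 4, 3, 4]}

all_values = [x for values in yields.values() for x in values]
grand_mean = sum(all_values) / len(all_values)  # 60 / 12 = 5.0

# Between groups: group size times squared distance of the group mean
# from the total mean, summed over groups.
ssb = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in yields.values()
)

# Total: squared distance of every observation from the total mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

print(ssb, sst)  # 8.0 32.0
```

The gap between the two, 32 - 8 = 24, is the within-group sum of squares.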

Total sum of squares = Sum of squares between groups + Sum of squares within groups
32 = 8 + 24

Mean square between groups = sum of squares between groups / df = 8 / 2 = 4
(df = # of groups - 1 = 3 - 1 = 2)

Mean square within groups = sum of squares within groups / df = 24 / 9 = 2.666667
(df = # of obs - # of groups = 12 - 3 = 9)

F = (sum of squares between groups / df) / (sum of squares within groups / df) = 4 / 2.666667 = 1.5
F(2,9) = 1.5

From the table, the F(2,9) critical value at the 5% level = 4.26
The calculated F is below the critical value, so H0 is not rejected. There is no significant difference between the varieties, and whatever difference is observed is likely due to chance.
The above calculation shows that the calculated value of F is 1.5, which is less than the table value of 4.26 at the 5% level with d.f. being v1 = 2 and v2 = 9, and hence could have arisen due to chance. This analysis supports the null hypothesis of no difference in sample means. We may, therefore, conclude that the difference in wheat output due to varieties is insignificant and is just a matter of chance.
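
Putting the pieces together, a small sketch of the whole one-way ANOVA for the wheat data. The function name is made up for these notes; the critical value 4.26 is copied from the F table above, not computed. The last lines tie back to the P-value approach: when the numerator df is exactly 2, the upper tail of the F distribution has the closed form (1 + 2F/v2)^(-v2/2), so the P-value can be obtained without a statistics library. That shortcut is an addition to the notes and does not apply for other numerator df.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ssb / df_between) / (ssw / df_within)
    return f_ratio, df_between, df_within


wheat = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]  # varieties A, B, C
f_ratio, df_between, df_within = one_way_anova(wheat)

F_CRITICAL = 4.26  # table value for F(2, 9) at the 5% level (from the notes)

print(round(f_ratio, 4), df_between, df_within)  # 1.5 2 9
print("reject H0" if f_ratio >= F_CRITICAL else "do not reject H0")

# Closed-form upper-tail probability, valid only when df_between == 2.
p_value = (1 + 2 * f_ratio / df_within) ** (-df_within / 2)
print(round(p_value, 3))  # 0.274
```

A P-value of about 0.27 is well above α = 0.05, so both approaches give the same answer: do not reject H0.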

