More on Statistics: P-Value, ANOVA

Notes from the net for quick reference on the P-value and ANOVA.

https://onlinecourses.science.psu.edu/statprogram/node/138

The P-value approach involves determining whether it is "likely" or "unlikely" that we would observe a test statistic as extreme as, or more extreme than, the one actually observed, in the direction of the alternative hypothesis.

If the P-value is less than (or equal to) α, then the null hypothesis is rejected in favor of the alternative hypothesis. 


Specifically, the four steps involved in using the P-value approach to conducting any hypothesis test are:
  1. Specify the null and alternative hypotheses.
  2. Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. 
  3. Using the known distribution of the test statistic, calculate the P-value.
  4. Set the significance level, α, the probability of making a Type I error to be small — 0.01, 0.05, or 0.10. Compare the P-value to α. If the P-value is less than (or equal to) α, reject the null hypothesis in favor of the alternative hypothesis. If the P-value is greater than α, do not reject the null hypothesis.
For example, if we obtain a P-value of 0.0127, which is less than α = 0.05, we reject the null hypothesis H0 : μ = 3 in favor of the alternative hypothesis HA : μ > 3.
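The four steps above can be sketched in plain Python. This is a minimal illustration using a one-sided z-test (upper tail) with a known population standard deviation; the sample figures below (mean 3.4, σ = 1.0, n = 36) are made up purely for demonstration.

```python
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """P-value for H0: mu = mu0 versus HA: mu > mu0 (sigma known)."""
    # Step 2: compute the test statistic under H0
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # Step 3: P-value = P(Z >= z) under the standard normal null distribution
    return 1 - NormalDist().cdf(z)

# Step 1: H0: mu = 3 versus HA: mu > 3
alpha = 0.05  # Step 4: significance level
p = z_test_p_value(sample_mean=3.4, mu0=3.0, sigma=1.0, n=36)
decision = "reject H0" if p <= alpha else "do not reject H0"
```

Here z = 2.4, so the P-value is small and H0 is rejected at the 5% level.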


ANOVA (Ref: CP Kothari)

Analysis of variance (abbreviated as ANOVA) is an extremely useful technique for research in the fields of economics, biology, education, psychology, sociology, business/industry, and several other disciplines. This technique is used when multiple sample cases are involved. As stated earlier, the significance of the difference between the means of two samples can be judged through either the z-test or the t-test, but the difficulty arises when we have to examine the significance of the difference amongst more than two sample means at the same time.

The ANOVA technique enables us to perform this simultaneous test and as such is considered to be an important tool of analysis in the hands of a researcher. Using this technique, one can draw inferences about whether
the samples have been drawn from populations having the same mean.
The ANOVA technique is important in the context of all those situations where we want to
compare more than two populations such as in comparing the yield of crop from several varieties of
seeds, the gasoline mileage of four automobiles, the smoking habits of five groups of university
students and so on.

"The essence of ANOVA is that the total amount of variation in a set of data is broken down into two types: that amount which can be attributed to chance and that amount which can be attributed to specified causes."


The F statistic is the ratio of the mean square between groups to the mean square within groups. If the worked-out value of F is less than the table value of F, the difference is taken as insignificant, i.e., due to chance, and the null hypothesis of no difference between sample means stands.

In case the calculated value of F happens to be equal to or more than its table value, the difference is considered significant (which means the samples could not have come from the same universe), and the conclusion may be drawn accordingly. The higher the calculated value of F is above the table value, the more definite and sure one can be about one's conclusions.
Per acre production data

Plot of land    Variety A    Variety B    Variety C
1                   6            5            5
2                   7            5            4
3                   3            3            3
4                   8            7            4
Mean                6            5            4

Deviations of each observation from its own group mean, and their squares:

Variety A:    x    x - mean    square
              6       0           0
              7       1           1
              3      -3           9
              8       2           4
                           sum = 14

Variety B:    x    x - mean    square
              5       0           0
              5       0           0
              3      -2           4
              7       2           4
                           sum =  8

Variety C:    x    x - mean    square
              5       1           1
              4       0           0
              3      -1           1
              4       0           0
                           sum =  2

Sum of squares within groups (SSW) = 14 + 8 + 2 = 24




Deviations of each observation from the total (grand) mean of 5, and their squares:

x    x - grand mean    square
6          1              1
7          2              4
3         -2              4
8          3              9
5          0              0
5          0              0
3         -2              4
7          2              4
5          0              0
4         -1              1
3         -2              4
4         -1              1

Total mean = 5; total sum of squares (SST) = 32

Sum of squares between groups:

Variety                          A    B    C
Group mean                       6    5    4
(group mean - grand mean)^2      1    0    1

Sum of squared deviations of the group means = 1 + 0 + 1 = 2
Sum of squares between groups (SSB) = sample size per group × sum = 4 × 2 = 8

Total sum of squares = sum of squares between groups + sum of squares within groups: 32 = 8 + 24

Mean square between groups = SSB / df = 8 / 2 = 4           (df = number of groups - 1 = 3 - 1 = 2)
Mean square within groups = SSW / df = 24 / 9 = 2.666667    (df = number of observations - number of groups = 12 - 3 = 9)

F = (mean square between groups) / (mean square within groups) = 4 / 2.666667 = 1.5, with degrees of freedom (2, 9)

From the F table, the critical value of F(2, 9) at the 5% level is 4.26. The calculated value of F is 1.5, which is less than the table value with d.f. v1 = 2 and v2 = 9, so H0 is accepted: the difference could have arisen due to chance. This analysis supports the null hypothesis of no difference in sample means. We may, therefore, conclude that the difference in wheat output due to varieties is insignificant and is just a matter of chance.
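The whole worked example can be reproduced in a few lines of plain Python (no external libraries), using the same per-acre yield data for the three wheat varieties. This is a sketch of the hand calculation above, not a general-purpose ANOVA routine.

```python
# Per-acre yields for the three wheat varieties from the table above
data = {
    "A": [6, 7, 3, 8],
    "B": [5, 5, 3, 7],
    "C": [5, 4, 3, 4],
}

def mean(xs):
    return sum(xs) / len(xs)

all_obs = [x for xs in data.values() for x in xs]
grand_mean = mean(all_obs)  # total mean = 5

# SSW: squared deviations of each observation from its own group mean
ssw = sum(sum((x - mean(xs)) ** 2 for x in xs) for xs in data.values())  # 24

# SSB: group size times squared deviation of each group mean from the grand mean
ssb = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in data.values())  # 8

df_between = len(data) - 1            # 3 - 1 = 2
df_within = len(all_obs) - len(data)  # 12 - 3 = 9
F = (ssb / df_between) / (ssw / df_within)  # ≈ 1.5
```

Since F ≈ 1.5 is below the tabled critical value of 4.26 for F(2, 9) at the 5% level, the code reaches the same conclusion: do not reject H0.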

