Non-Parametric Test ~ Bed Prasad Dhakal

Introduction

There are four types of data

Nominal Data
A measurement scale that consists of names only is called Nominal Data. It is the most basic type of data, in which numbers act only as labels, with no inherent order or numerical value. For example:
1. Gender: male, female
2. Marital status: unmarried, married, divorced
3. Blood group: A, B, AB, O
Ordinal Data
A measurement scale that consists of order only is called Ordinal Data; it indicates a specific order or rank. Examples of such ordered data are given below.
1. Education level: high school, bachelor's, master's, doctorate
2. Satisfaction rating: very dissatisfied, dissatisfied, neutral, satisfied, very satisfied
3. Ranking of products: 1st, 2nd, 3rd
Continuous Data
A measurement scale that expresses a numerical quantity is called Continuous Data; it can take any value within a given range, including fractional values. For example:
1. Temperature
2. Time
Ratio Data
Ratio data is a type of continuous data that has an absolute zero. For example:
1. Height
2. Weight
3. Number of children

Introduction

A statistical method for testing hypotheses in which the data are often nominal or ordinal is called a non-parametric test. Such tests rely on fewer and weaker assumptions than parametric tests, but they are also generally less powerful when the parametric assumptions hold. For this reason, we use parametric tests whenever possible.

A non-parametric test is generally used in the following situations:

when the data are on a nominal or ordinal scale
when the distribution of the population is not specified, or is not normal

Parametric Tests	Non-parametric Tests
Population is normal, assumed normal, or approximately normal	Population is not normal, but continuous
Use the population parameters	Do not use the population parameters
Data are on an interval or ratio scale	Data can be on a nominal or ordinal scale
Shape of the distribution is required	Shape of the distribution is not required
Populations have equal variance	Populations may not have equal variance
Mean is the index of central location	Median is the index of central location

Types of Non-Parametric test

The common non-parametric tests are

Sign test
Wilcoxon's test
Mann-Whitney U test
Kruskal-Wallis test
Friedman's test
Run test

Sign Test


The sign test is one of the simplest non-parametric tests. It is used for

one sample
two repeated (or correlated) samples

The measurement in this test is assumed to be at least ordinal.
The usual null hypothesis for this test is that there is no difference between the two treatments. If this is so, then the number of + signs (or - signs) should follow a binomial distribution with \( p=0.5 \), \( q=0.5 \) and number of trials \( n \), the number of subjects.
Therefore the sign test proceeds according to the sample size:

Small sample - binomial distribution
Large sample - normal approximation with \( z= \frac{x-n p}{\sqrt{npq}} \)

Scoring procedure

To carry out the sign test, we use the following procedure:

subtract the hypothesized mean from each score, \( (x- \mu) \)
write down the sign of the difference \( (x- \mu) \)
write “-” if the difference is negative, and “+” if the difference is positive
if the difference is zero, discard the observation; it receives no sign
alternatively, tied scores can be kept by making one a “+” sign and another a “-” sign
in that case, if there is an even number of subjects with tied scores, make half of them “+” signs and half “-” signs. For an odd number, drop one randomly selected subject, and then proceed as for an even number.

Test Procedure

Count the number of plus signs = \( x \)
Count the total number of signed observations = \( n \)
Use the binomial statistic if \( n \leq 20 \)
Use the Z statistic if \( n > 20 \)
For a right-tailed test, calculate \( P (X \geq x) \)
For a left-tailed test, calculate \( P (X \leq x) \)
For a two-tailed test, calculate \( 2 \min \{ P (X \leq x), P (X \geq x) \} \)
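
The test procedure above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not a reference implementation: the function name `sign_test` and its `tail` argument are assumptions made for this example, and the two-sided p-value is taken as twice the smaller tail probability, which is one common convention.

```python
from math import comb, erfc, sqrt

def sign_test(x, n, tail):
    """p-value of the sign test (hypothetical helper for illustration).

    x    : number of plus signs
    n    : number of signed observations (zeros already discarded)
    tail : "left", "right" or "two"
    """
    if n <= 20:
        # Exact binomial tails with p = q = 0.5
        left = sum(comb(n, k) for k in range(0, x + 1)) / 2 ** n
        right = sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n
    else:
        # Normal approximation z = (x - np) / sqrt(npq)
        z = (x - n * 0.5) / sqrt(n * 0.5 * 0.5)
        left = 0.5 * erfc(-z / sqrt(2))   # P(Z <= z)
        right = 0.5 * erfc(z / sqrt(2))   # P(Z >= z)
    if tail == "left":
        return left
    if tail == "right":
        return right
    # Assumption: two-sided p-value = twice the smaller tail, capped at 1
    return min(1.0, 2 * min(left, right))

print(round(sign_test(15, 19, "right"), 4))  # 0.0096
```

For example, with 15 plus signs among 19 signed observations, the right-tail probability is 5036/524288, which is about 0.0096.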

The following are measurements of the heights (in cm) of college students. Use the sign test to test the null hypothesis \( \mu= 160 \) against the alternative hypothesis \( \mu < 160 \) at the 0.05 level of significance.
163, 165, 160, 189, 161, 171, 158, 151, 169, 162, 163
139, 172, 165, 148, 166, 172, 163, 187, 173
Solution
Given that \( \mu= 160 \), we assign a “+” sign to each data value \( > 160 \) and a “-” sign to each data value \( < 160 \), and we discard any data value \( =160 \).
Then we get
163+, 165+, 160 (Discard), 189+, 161+, 171+, 158-, 151-, 169+, 162+, 163+
139-, 172+, 165+, 148-, 166+, 172+, 163+, 187+, 173+

Reading the signs, we get
number of positive signs x = 15
number of signs n = 19
Now,
1. \( H_0:\mu \ge 160 \)
  \( H_1:\mu < 160 \) [Left-tailed]
2. \( \alpha =0.05\)
3. Since the sample size is small, we use the binomial distribution
  Thus, we reject \(H_0\) if the p-value is less than or equal to 0.05
4. According to the binomial distribution table
  \( P(X \leq 15; n=19,p=0.5) \)
  = \(1- P(X \geq 16) \)
  = \( 0.9978\)
5. Since the p-value is 0.9978, which is greater than \( \alpha =0.05 \)
  we cannot reject \(H_0\).
  Interpretation: the data do not provide sufficient evidence that \(\mu < 160 \).

The following are measurements of the heights (in cm) of college students. Use the sign test to test the null hypothesis \( \mu= 160 \) against the alternative hypothesis \( \mu > 160 \) at the 0.05 level of significance.
163, 165, 160, 189, 161, 171, 158, 151, 169, 162, 163, 139
172, 165, 148, 166, 172, 163, 187, 173
Solution
Given that \( \mu= 160 \), we assign a “+” sign to each data value \( > 160 \) and a “-” sign to each data value \( < 160 \), and we discard any data value \( =160 \).
Then we get
163+, 165+, 160 (Discard), 189+, 161+, 171+, 158-, 151-, 169+, 162+
163+, 139-, 172+, 165+, 148-, 166+, 172+, 163+, 187+, 173+

Reading the signs, we get
number of positive signs x = 15
number of signs n = 19
Now,
1. \( H_0:\mu \le 160 \)
  \( H_1:\mu > 160 \) [Right-tailed]
2. \( \alpha =0.05\)
3. Since the sample size is small, we use the binomial distribution
  Thus, we reject \(H_0\) if the p-value is less than or equal to 0.05
4. According to the binomial distribution table
  \( P(X \geq 15; n=19,p=0.5)\)
  = \( P(X=15)+P(X=16)+P(X=17)+P(X=18)+P(X=19)\)
  = \( 0.0096\)
5. Since the p-value is 0.0096, which is less than \( \alpha =0.05 \)
  we reject \(H_0\).
  Interpretation: the data support \( \mu > 160 \).
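
For readers without a binomial table at hand, the right-tail probability used in step 4 can be worked out directly from the binomial coefficients with \( n=19 \) and \( p=0.5 \):

\[ P(X \ge 15) = \sum_{k=15}^{19} \binom{19}{k} \left( \frac{1}{2} \right)^{19} = \frac{3876+969+171+19+1}{524288} = \frac{5036}{524288} \approx 0.0096 \]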

The following are measurements of the heights (in cm) of college students. Use the sign test to test the null hypothesis \( \mu =160 \) against the alternative hypothesis \( \mu \ne 160 \) at the 0.05 level of significance.
163, 165, 160, 189, 161, 171, 158, 151, 169, 162, 163, 139
172, 165, 148, 166, 172, 163, 187, 173
Solution
Given that \( \mu= 160 \), we assign a “+” sign to each data value \( > 160 \) and a “-” sign to each data value \( < 160 \), and we discard any data value \( =160 \).
Then we get
163+, 165+, 160 (Discard), 189+, 161+, 171+, 158-, 151-, 169+, 162+, 163+, 139-
172+, 165+, 148-, 166+, 172+, 163+, 187+, 173+

Reading the signs, we get
number of positive signs x = 15
number of signs n = 19
Now,
1. \( H_0:\mu= 160 \)
  \( H_1:\mu \ne 160 \) [Two-tailed]
2. \( \alpha =0.05\)
3. Since the sample size is small, we use the binomial distribution
  Thus, we reject \(H_0\) if the p-value is less than or equal to 0.05
4. According to the binomial distribution table
  \( 2 P(X \geq 15; n=19,p=0.5) = 2 \times 0.0096 = 0.0192 \)
5. Since the p-value is 0.0192, which is less than \( \alpha =0.05 \)
  we reject \(H_0\).
  Interpretation: the data support \(\mu \neq 160 \).

The following data are the amounts of sulfur oxides emitted by a large industrial plant. Use the sign test to test \( \mu=21.5 \) against \( \mu \neq 21.5 \) at the 0.01 level.
17, 15, 20, 29, 19, 18, 22, 25
27, 9, 24, 20, 17, 6, 24, 14
15, 23, 24, 26, 19, 23, 28, 19
16, 22, 24, 17, 20, 13, 19, 10
23, 18, 31, 13, 20, 17, 24, 14
Solution
Given that \( \mu=21.5 \), we assign a “+” sign to each data value \( > 21.5 \) and a “-” sign to each data value \( < 21.5 \).
We discard any data value \( =21.5 \).
Then we get
17-, 15-, 20-, 29+, 19-, 18-, 22+, 25+
27+, 9-, 24+, 20-, 17-, 6-, 24+, 14-
15-, 23+, 24+, 26+, 19-, 23+, 28+, 19-
16-, 22+, 24+, 17-, 20-, 13-, 19-, 10-
23+, 18-, 31+, 13-, 20-, 17-, 24+, 14-
Reading the signs, we get
number of positive signs x = 16
number of signs n = 40
Now
1. \( H_0:\mu= 21.5 \)
  \( H_1:\mu \neq 21.5 \) [Two-tailed]
2. Since the sample size is large, we use the z statistic
  Thus,
  \( Z_{\frac {\alpha}{2}}=Z_{\frac{0.01}{2}}=Z_{0.005}=2.58 \)
  Upper critical value 2.58 (we reject \( H_0 \) if the Z-value is greater than or equal to 2.58)
  Lower critical value -2.58 (we reject \( H_0 \) if the Z-value is less than or equal to -2.58)
3. According to the formula
  \( Z=\frac{x-np}{\sqrt{npq}}=\frac{16-40 \times 0.5}{\sqrt{40 \times 0.5 \times 0.5}}=\frac{-4}{\sqrt{10}}=-1.26 \)
4. Since Z = -1.26 does not lie in the critical region
  we cannot reject \( H_0 \).
  Interpretation: the data do not provide sufficient evidence that \( \mu \) differs from 21.5.
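
The large-sample calculation in step 3 can be checked with a short Python sketch. It uses the counts x = 16 and n = 40 from the solution; the function name `sign_test_z` is an assumption made for this illustration, and the two-sided p-value is obtained from the standard normal distribution via `math.erfc`.

```python
from math import erfc, sqrt

def sign_test_z(x, n, p=0.5):
    """Normal-approximation statistic z = (x - np) / sqrt(npq)."""
    q = 1 - p
    return (x - n * p) / sqrt(n * p * q)

z = sign_test_z(16, 40)
# Two-sided p-value: P(|Z| >= |z|) for a standard normal Z
p_two_sided = erfc(abs(z) / sqrt(2))

print(round(z, 2))  # -1.26
```

Since the two-sided p-value is far above the usual significance levels, the sketch agrees with the decision not to reject the null hypothesis.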

To determine the effectiveness of a new traffic control system, the number of accidents that occurred at 12 dangerous intersections during four weeks before and four weeks after the installation of the new system were observed and the following data were obtained.

Number of accidents before	3	5	2	3	3	3	0	4	1	6	4	1
Number of accidents after	1	2	0	2	2	0	2	3	3	4	1	0

Use sign test to test null hypothesis that new traffic control system is only as effective as the old system. Use 0.05 level.

Solution
The data are paired, so we work with the differences

difference = before - after

We assign a + sign to each positive difference and a - sign to each negative difference.
Then we get

difference= before-after	2	3	2	1	1	3	-2	1	-2	2	3	1
sign	+	+	+	+	+	+	-	+	-	+	+	+

Reading the signs, we get
number of positive signs x = 10
number of signs n = 12
Now

\( H_0:\mu_1= \mu_2 \)
\( H_1:\mu_1 > \mu_2 \) [Right-tailed]
\( \alpha =0.05 \)
Since the sample size is small, we use the binomial distribution
Thus, we reject \( H_0 \) if the p-value is less than or equal to 0.05
According to the binomial distribution table
\( P(X \geq 10; n=12,p=0.5) =P(X=10)+P(X=11)+P(X=12)=0.0193 \)
Thus, the p-value is 0.0193
Since the p-value 0.0193 is less than \( \alpha = 0.05 \), we reject \( H_0 \).
Interpretation:
The data provide sufficient evidence to indicate that the new traffic control system is more effective than the old system.
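
The paired version of the sign test can be reproduced with a few lines of Python. This is only a sketch using the accident counts from the table above; the variable names are chosen for this illustration.

```python
from math import comb

before = [3, 5, 2, 3, 3, 3, 0, 4, 1, 6, 4, 1]
after = [1, 2, 0, 2, 2, 0, 2, 3, 3, 4, 1, 0]

# Paired differences (before - after); zero differences would be discarded
diffs = [b - a for b, a in zip(before, after)]
signed = [d for d in diffs if d != 0]

x = sum(1 for d in signed if d > 0)  # number of plus signs
n = len(signed)                      # number of signed observations

# Right-tailed exact binomial p-value P(X >= x) with p = 0.5
p_value = sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n

print(x, n, round(p_value, 4))  # 10 12 0.0193
```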

Wilcoxon Signed-rank Test


The Wilcoxon signed-rank test is a non-parametric test. It can be used for

one sample
a paired sample

where a numerical scale is inappropriate but ranking is possible.

In the sign test, we consider only the direction of each difference, not its magnitude. In the signed-rank test, we also take the magnitude of the difference into account (to some degree).

To carry out the Wilcoxon signed-rank test, we sort the differences by absolute value (i.e., discarding the sign). Then we assign ranks to the differences, ignoring the signs (i.e., rank 1 to the smallest absolute difference, rank 2 to the next, etc.). Tied differences each receive the mean of the ranks they would have had if they were not tied. If the null hypothesis is true, the sum of the positive ranks and the sum of the negative ranks are expected to be roughly equal. If the null hypothesis is false, we expect one of the sums to be quite small and the other quite large.

Critical Values

Alternative hypothesis	Reject \( H_0 \) if	Test
\( \ne \)	\( T \le T_\alpha \)	Two-tailed
\( < \)	\( T^+ \le T_{2\alpha }\)	Left-tailed
\( > \)	\( T^- \le T_{2\alpha }\)	Right-tailed

Summary Steps

Calculate the differences \( X - \mu \)
Ignore any zero difference and reduce \( n \) accordingly
Rank the absolute differences (ignoring their signs)
Calculate the sum of the ranks of the positive differences, \( T^+ \)
Calculate the sum of the ranks of the negative differences, \( T^-\)
Calculate \( T=min\{ T^+, T^- \} \)
If \( n \le 15 \), use the table of critical values of the Wilcoxon signed-rank test
The rule is that if T is equal to or less than \( T_{critical} \), we reject the null hypothesis.
Otherwise, use the Z test with
\( Z=\frac{T -\mu_T}{\sigma_T} \)
where \( \mu_T=\frac{n(n+1)}{4}, \sigma_T^2=\frac{n(n+1)(2n+1)}{24} \)
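
The large-sample step can be sketched in Python as follows. The function name `wilcoxon_z` is an assumption for this illustration, and the inputs T = 52 and n = 20 are hypothetical values chosen only to show the arithmetic; they do not come from any data set in this post.

```python
from math import sqrt

def wilcoxon_z(T, n):
    """Normal approximation for the signed-rank statistic T.

    Under the null hypothesis, T has mean n(n+1)/4 and
    variance n(n+1)(2n+1)/24.
    """
    mean_T = n * (n + 1) / 4
    var_T = n * (n + 1) * (2 * n + 1) / 24
    return (T - mean_T) / sqrt(var_T)

# Hypothetical example: n = 20 signed differences, T = 52
z = wilcoxon_z(52, 20)
print(round(z, 2))
```

For n = 20 the mean of T is 105 and its variance is 717.5, so a value of T well below 105 yields a clearly negative z.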

The following are 15 measurements of weights in kg:
97.5, 95.2, 97.3, 96.0, 96.8, 100.3, 97.4, 95.3
93.2, 99.1, 96.1, 97.6, 98.2, 98.5, 94.9

Use the signed-rank test at the 0.05 level of significance to test whether the mean weight is 98.5.

Solution
Given that \( \mu=98.5\), we calculate the differences \( d=x-\mu\) and discard the data value 98.5, whose difference is zero.

X	97.5	95.2	97.3	96	96.8	100.3	97.4	95.3	93.2	99.1	96.1	97.6	98.2	94.9
d	-1	-3.3	-1.2	-2.5	-1.7	1.8	-1.1	-3.2	-5.3	0.6	-2.4	-0.9	-0.3	-3.6
\|d\|	1	3.3	1.2	2.5	1.7	1.8	1.1	3.2	5.3	0.6	2.4	0.9	0.3	3.6
R	4	12	6	10	7	8	5	11	14	2	9	3	1	13

From the sample data, \( n=14\)

\( T^+ \) sum of the ranks of the positive differences = 2+8 = 10
\( T^- \) sum of the ranks of the negative differences = 95
\( T=min\{ T^+,T^-\} = 10 \)

Now,

\( H_0: \mu =98.5 \)
\( H_1: \mu \ne 98.5 \)
\( \alpha = 0.05 \)
Since the test concerns signed ranks, we use the \( T\) statistic
Thus, \( T_{\alpha}=T_{0.05}=21 \) for n=14
Based on the sample data, the value of the test statistic is T=10
Here, \( T=10 \) is less than 21, so \( H_0 \) is rejected.
Interpretation: The mean weight differs from 98.5 kg.
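
The ranking and rank sums in this example can be reproduced with the following Python sketch. The helper `average_ranks` is a hypothetical function written for this illustration; it gives tied values the mean of the ranks they would otherwise occupy, as described earlier. Differences are rounded to one decimal place to avoid floating-point noise.

```python
def average_ranks(values):
    """Rank values from smallest to largest; ties get the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

weights = [97.5, 95.2, 97.3, 96.0, 96.8, 100.3, 97.4, 95.3,
           93.2, 99.1, 96.1, 97.6, 98.2, 98.5, 94.9]
mu0 = 98.5

# Differences d = x - mu0, with zero differences discarded
diffs = [round(w - mu0, 1) for w in weights]
diffs = [d for d in diffs if d != 0]

ranks = average_ranks([abs(d) for d in diffs])
t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
T = min(t_plus, t_minus)

print(len(diffs), t_plus, t_minus, T)  # 14 10.0 95.0 10.0
```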

Kruskal- Wallis H-Test


The Kruskal-Wallis (H-test) is also called Kruskal-Wallis one-way analysis of variance by ranks. It is for use with k independent samples, where k is equal to or greater than 3, and measurement is at least ordinal. (When k = 2, we use the Mann-Whitney U-test instead). Since, samples are independent, they can be of different sizes.

To operate Kruskal-Wallis H-test, we combine the data. Then assign rank to the whole (i.e. assign rank 1 to the smallest, rank 2 to the next etc.) If there are tied ranks, give mean of the ranks they would have if they were not tied.

Summary Steps

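Rank the data as a whole.
Calculate the sum of the ranks of each sample (\( R_i \)).
Compute the statistic
\( H=\left [ \frac{12}{n(n+1)} \displaystyle \sum_{i=1}^k \frac{R_i^2}{n_i}\right ] -3(n+1) \)
where k = the number of samples, \( n_i \) = the size of the i-th sample and \( n \) = the total number of observations
Reject the null hypothesis if H is greater than or equal to the critical value (from the H table for small samples, or from the chi-square distribution with \( k-1 \) degrees of freedom otherwise)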
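
The summary steps can be sketched in Python as follows. The function name `kruskal_wallis_h` is an assumption for this illustration, the three samples are hypothetical values chosen so the arithmetic is easy to follow, and no correction for ties is applied beyond averaging the tied ranks, in line with the formula above.

```python
def kruskal_wallis_h(samples):
    """Kruskal-Wallis H statistic for a list of independent samples."""
    pooled = sorted(v for s in samples for v in s)
    n = len(pooled)

    # Mean rank of each distinct value (ties share the mean of their positions)
    positions = {}
    for pos, v in enumerate(pooled, start=1):
        positions.setdefault(v, []).append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}

    # Sum over samples of R_i^2 / n_i, where R_i is the rank sum of sample i
    total = 0.0
    for s in samples:
        r_i = sum(mean_rank[v] for v in s)
        total += r_i ** 2 / len(s)

    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical data: three samples of size 3
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2))  # 7.2
```

Here the rank sums are 6, 15 and 24 with n = 9, so H = (12/90)(12 + 75 + 192) - 30 = 7.2.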

The final grades of samples from three groups of students who were taught the same mathematics course by three different methods are as follows

1st Method	94	88	91	74
2nd Method	82	82	79	
3rd Method	98	67	72	76

Use H-test to test the null hypothesis that three methods are equally effective at 0.05 level of significance.

Solution

Method			Rank
1st	2nd	3rd	1st	2nd	3rd
74	79	67	3	5	1
88	82	72	8	6.5	2
91	82	76	9	6.5	4
94		98	10		11
			Sum = 30	Sum = 18	Sum = 18

From the sample data

\( R_1= \) sum of ranks occupied by the 1st sample = 30
\( n_1= \) the number of cases in the 1st sample = 4
\( R_2= \) sum of ranks occupied by the 2nd sample = 18
\( n_2= \) the number of cases in the 2nd sample = 3
\( R_3= \) sum of ranks occupied by the 3rd sample = 18
\( n_3= \) the number of cases in the 3rd sample = 4
\( n= \) the total number of cases in all samples = 11

Now,

\( H_0: \) All means are equal
\( H_1: \) Not all means are equal
\( \alpha = 0.05 \)
Since the test concerns 3 samples, we use the \( H\) statistic
Thus,
\( H_{\alpha}=H_{0.05}=5.57 \) for \(n_1=4,n_2=3,n_3=4\)
Based on the sample data, the value of the test statistic is
\( H=\left [ \frac{12}{n(n+1)} \displaystyle \sum_{i=1}^k \frac{R_i^2}{n_i}\right ] -3(n+1) =\frac{12}{11 \times 12} \left ( \frac{30^2}{4}+\frac{18^2}{3}+\frac{18^2}{4} \right ) -3 \times 12=\frac{414}{11}-36=1.64\)
Here, \( H=1.64 \) is less than 5.57, so \( H_0 \) is not rejected.
Interpretation: The data do not provide sufficient evidence that the three methods differ in effectiveness.

Run Test: Test for Randomness
