Introduction
- Nominal Data
The measurement scale that only names categories is called nominal data. It is the most basic type of data: numbers, if used, are only labels, with no inherent order or numerical value. Examples:
- Gender: male, female
- Marital status: unmarried, married, divorced
- Blood group: A, B, AB, O
- Ordinal Data
The measurement scale that only expresses order is called ordinal data; it indicates a specific order or rank. Examples of such ordered data are given below.
- Education level: high school, bachelor's, master's, doctorate
- Satisfaction rating: very dissatisfied, dissatisfied, neutral, satisfied, very satisfied
- Product ranking: 1st, 2nd, 3rd
- Continuous Data
The measurement scale that expresses a numerical quantity is called continuous data; it can take any value within a given range, including fractional values. Examples:
- Temperature
- Time
- Ratio Data
Ratio data is numerical data measured on a scale with a true (absolute) zero, so that ratios of values are meaningful. Examples:
- Height
- Weight
- Number of children
Introduction
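A non-parametric test is a statistical method for testing hypotheses that is typically used when the data are nominal or ordinal. Non-parametric tests rely on fewer and weaker assumptions than parametric tests, but they are generally less powerful when the parametric assumptions do hold. For this reason, parametric tests are usually preferred if/when their assumptions are met.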
A non-parametric test is also a statistical method for testing hypotheses. It is generally used in the following situations:
- when the data are on a nominal or ordinal scale
- when the population distribution is not specified, or the population is not normally distributed
Parametric Tests | Non-parametric Tests |
Population is normal, assumed normal, or approximately normal | Population need not be normal, but is assumed continuous |
Uses population parameters | Does not use population parameters |
Data are on an interval or ratio scale | Data can be on a nominal or ordinal scale |
Shape of the distribution must be specified | Shape of the distribution need not be specified |
Populations are assumed to have equal variance | Populations may not have equal variance |
Mean is the measure of central location | Median is the measure of central location |
Types of Non-Parametric Tests
The common non-parametric tests are:
- Sign test
- Wilcoxon's test
- Mann-Whitney U test
- Kruskal-Wallis test
- Friedman's test
- Run test
Sign Test
The sign test is one of the simplest non-parametric tests. It is used for:
- one sample
- two repeated (or correlated) samples
The usual null hypothesis for this test is that there is no difference between the two treatments. If this is so, the number of + signs (or - signs) follows a binomial distribution with \( p = q = 0.5 \) and \( n \) equal to the number of subjects with a non-zero difference.
Therefore, the sign test proceeds as follows:
- Small sample - binomial distribution
- Large sample - normal approximation with \( z= \frac{x-n p}{\sqrt{npq}} \)
Scoring procedure
To carry out the sign test, we follow this procedure:
- Subtract the hypothesized mean from each score: \( (x - \mu) \).
- Write down the sign of each difference \( (x - \mu) \).
- Write “-” if the difference is negative and “+” if it is positive.
- If the difference is zero, discard that observation; it receives no sign.
- Alternatively, tied scores (zero differences) can be split instead of discarded: make one a “+” and another a “-” sign. If there is an even number of subjects with tied scores, make half of them “+” signs and half “-” signs. For an odd number, drop one randomly selected subject, and then proceed as for an even number.
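The scoring steps can be sketched in a few lines of Python. This is a minimal illustration, assuming zero differences are discarded (not split) and using a hypothetical helper name `sign_counts`; it returns the number of plus signs \( x \), the number of signed observations \( n \), and the list of signs.

```python
def sign_counts(data, mu):
    """Return (x, n, signs) for the sign test.

    x = number of '+' signs, n = number of non-zero differences.
    Zero differences are discarded (this sketch does not split ties).
    """
    signs = []
    for value in data:
        diff = value - mu
        if diff > 0:
            signs.append("+")
        elif diff < 0:
            signs.append("-")
        # diff == 0: observation discarded, no sign recorded
    x = signs.count("+")
    n = len(signs)
    return x, n, signs


heights = [163, 165, 160, 189, 161, 171, 158, 151, 169, 162,
           163, 139, 172, 165, 148, 166, 172, 163, 187, 173]
x, n, signs = sign_counts(heights, 160)
print(x, n)
```

For the height data used in the examples with \( \mu = 160 \), this gives \( x = 15 \) and \( n = 19 \).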
Test Procedure
- Count the number of plus signs: \( x \)
- Count the total number of signed (non-zero) observations: \( n \)
- Use the binomial distribution if \( n \leq 20 \)
- Use the Z statistic (normal approximation) if \( n > 20 \)
- For a right-tailed test, calculate \( P(X \geq x) \)
- For a left-tailed test, calculate \( P(X \leq x) \)
- For a two-tailed test, calculate \( 2 \times \min\{P(X \leq x),\ P(X \geq x)\} \)
- Reject \( H_0 \) if this p-value is less than or equal to \( \alpha \)
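The test procedure above can be sketched as a small Python function. It is an illustration under stated assumptions, not a reference implementation: the name `sign_test_pvalue` is hypothetical, exact binomial tail probabilities stand in for the printed table, and the normal approximation is used without a continuity correction, matching \( z = \frac{x - np}{\sqrt{npq}} \).

```python
import math
from statistics import NormalDist


def sign_test_pvalue(x, n, alternative="two-sided"):
    """Sign-test p-value for x plus signs among n signed observations.

    n <= 20: exact binomial probabilities with p = q = 0.5.
    n > 20:  normal approximation z = (x - np) / sqrt(npq), no continuity correction.
    alternative: "greater" (right-tailed), "less" (left-tailed) or "two-sided".
    """
    if n <= 20:
        pmf = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
        p_le = sum(pmf[: x + 1])   # P(X <= x)
        p_ge = sum(pmf[x:])        # P(X >= x)
    else:
        z = (x - n * 0.5) / math.sqrt(n * 0.5 * 0.5)
        p_le = NormalDist().cdf(z)  # P(Z <= z)
        p_ge = 1 - p_le             # P(Z >= z)
    if alternative == "greater":
        return p_ge
    if alternative == "less":
        return p_le
    # two-tailed: double the smaller tail, capped at 1
    return min(1.0, 2 * min(p_le, p_ge))


print(sign_test_pvalue(15, 19, "less"), sign_test_pvalue(15, 19, "greater"))
```

For the height data (\( x = 15 \), \( n = 19 \)) this gives roughly 0.9978 (left-tailed), 0.0096 (right-tailed) and 0.0192 (two-tailed).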
- The following are measurements of height (in cm) of college students. Use the sign test to test the null hypothesis \( \mu = 160 \) against the alternative hypothesis \( \mu < 160 \) at the 0.05 level of significance.
163, 165, 160, 189, 161, 171, 158, 151, 169, 162,
163, 139, 172, 165, 148, 166, 172, 163, 187, 173
Solution
Given that \( \mu = 160 \), we assign a “+” sign to each data value \( > 160 \) and a “-” sign to each data value \( < 160 \), and we discard the data value \( = 160 \).
Then we get
163+, 165+, 160 (Discard), 189+, 161+, 171+, 158-, 151-, 169+, 162+, 163+
139-, 172+, 165+, 148-, 166+, 172+, 163+, 187+, 173+
Counting the signs, we get
number of positive signs: x = 15
number of signs: n = 19
Now,
- \( H_0:\mu \ge 160 \)
\( H_1:\mu < 160 \) [Left-tailed]
- \( \alpha = 0.05 \)
- Since the sample size is small (\( n = 19 \le 20 \)), we use the binomial distribution. Thus, we reject \(H_0\) if the p-value is less than or equal to 0.05.
- According to the binomial distribution table,
\( P(X \leq 15 \mid n=19, p=0.5) \)
\( = 1 - [P(X = 16) + P(X = 17) + P(X = 18) + P(X = 19)] \)
\( = 1 - \frac{969 + 171 + 19 + 1}{2^{19}} \approx 0.998 \)
- Since the p-value \( \approx 0.998 \) is greater than \( \alpha = 0.05 \), we cannot reject \(H_0\).
Interpretation: the data do not provide sufficient evidence that \( \mu < 160 \).
- The following are measurements of height (in cm) of college students. Use the sign test to test the null hypothesis \( \mu = 160 \) against the alternative hypothesis \( \mu > 160 \) at the 0.05 level of significance.
163, 165, 160, 189, 161, 171, 158, 151, 169, 162,
163, 139, 172, 165, 148, 166, 172, 163, 187, 173
Solution
Given that \( \mu = 160 \), we assign a “+” sign to each data value \( > 160 \) and a “-” sign to each data value \( < 160 \), and we discard the data value \( = 160 \).
Then we get
163+, 165+, 160 (Discard), 189+, 161+, 171+, 158-, 151-, 169+, 162+,
163+, 139-, 172+, 165+, 148-, 166+, 172+, 163+, 187+, 173+
Counting the signs, we get
number of positive signs: x = 15
number of signs: n = 19
Now,
- \( H_0:\mu \le 160 \)
\( H_1:\mu > 160 \) [Right-tailed]
- \( \alpha = 0.05 \)
- Since the sample size is small (\( n = 19 \le 20 \)), we use the binomial distribution. Thus, we reject \(H_0\) if the p-value is less than or equal to 0.05.
- According to the binomial distribution table,
\( P(X \geq 15 \mid n=19, p=0.5) \)
\( = P(X = 15) + P(X = 16) + P(X = 17) + P(X = 18) + P(X = 19) \)
\( = \frac{3876 + 969 + 171 + 19 + 1}{2^{19}} \approx 0.0096 \)
- Since the p-value 0.0096 is less than \( \alpha = 0.05 \), we reject \(H_0\).
Interpretation: the data provide sufficient evidence that \( \mu > 160 \).
- The following are measurements of height (in cm) of college students. Use the sign test to test the null hypothesis \( \mu = 160 \) against the alternative hypothesis \( \mu \ne 160 \) at the 0.05 level of significance.
163, 165, 160, 189, 161, 171, 158, 151, 169, 162,
163, 139, 172, 165, 148, 166, 172, 163, 187, 173
Solution
Given that \( \mu = 160 \), we assign a “+” sign to each data value \( > 160 \) and a “-” sign to each data value \( < 160 \), and we discard the data value \( = 160 \).
Then we get
163+, 165+, 160 (Discard), 189+, 161+, 171+, 158-, 151-, 169+, 162+,
163+, 139-, 172+, 165+, 148-, 166+, 172+, 163+, 187+, 173+
Counting the signs, we get
number of positive signs: x = 15
number of signs: n = 19
Now,
- \( H_0:\mu = 160 \)
\( H_1:\mu \ne 160 \) [Two-tailed]
- \( \alpha = 0.05 \)
- Since the sample size is small (\( n = 19 \le 20 \)), we use the binomial distribution. Thus, we reject \(H_0\) if the p-value is less than or equal to 0.05.
- According to the binomial distribution table, \( P(X \geq 15) \approx 0.0096 \) and \( P(X \leq 15) \approx 0.998 \), so the two-tailed p-value is
\( 2 \times \min\{P(X \leq 15),\ P(X \geq 15)\} = 2 \times 0.0096 \approx 0.0192 \)
- Since the p-value 0.0192 is less than \( \alpha = 0.05 \), we reject \(H_0\).
Interpretation: the data provide sufficient evidence that \( \mu \neq 160 \).
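The binomial table values used in the three height examples can be reproduced exactly by counting sign patterns: under \( H_0 \), each of the \( 2^{19} \) patterns of + and - signs is equally likely. This sketch uses Python's `fractions.Fraction` so that the probabilities stay exact until they are printed.

```python
from fractions import Fraction
from math import comb

n, x = 19, 15                 # signed observations and plus signs
total = 2 ** n                # equally likely sign patterns under H0 (p = 0.5)

ways_ge_15 = sum(comb(n, k) for k in range(x, n + 1))      # patterns with X >= 15
ways_ge_16 = sum(comb(n, k) for k in range(x + 1, n + 1))  # patterns with X >= 16

p_right = Fraction(ways_ge_15, total)        # P(X >= 15): right-tailed p-value
p_left = 1 - Fraction(ways_ge_16, total)     # P(X <= 15): left-tailed p-value
p_two = 2 * min(p_left, p_right)             # two-tailed p-value

print(ways_ge_16, ways_ge_15, total)         # counts behind the table values
print(float(p_left), float(p_right), float(p_two))
```

The counts are 1160 and 5036 out of 524288, giving approximately 0.9978, 0.0096 and 0.0192 for the left-tailed, right-tailed and two-tailed p-values.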
- The following data are the amounts of sulfur oxides emitted by a large industrial plant. Use the sign test to test \( \mu = 21.5 \) against \( \mu \neq 21.5 \) at the 0.01 level of significance.
17, 15, 20, 29, 19, 18, 22, 25
27, 9, 24, 20, 17, 6, 24, 14
15, 23, 24, 26, 19, 23, 28, 19
16, 22, 24, 17, 20, 13, 19, 10
23, 18, 31, 13, 20, 17, 24, 14
Solution
Given that \( \mu = 21.5 \), we assign a “+” sign to each data value \( > 21.5 \) and a “-” sign to each data value \( < 21.5 \).
Data values equal to 21.5 would be discarded (there are none here).
Then we get
17-, 15-, 20-, 29+, 19-, 18-, 22+, 25+
27+, 9-, 24+, 20-, 17-, 6-, 24+, 14-
15-, 23+, 24+, 26+, 19-, 23+, 28+, 19-
16-, 22+, 24+, 17-, 20-, 13-, 19-, 10-
23+, 18-, 31+, 13-, 20-, 17-, 24+, 14-
Counting the signs, we get
number of positive signs: x = 16
number of signs: n = 40
Now,
- \( H_0:\mu = 21.5 \)
\( H_1:\mu \neq 21.5 \) [Two-tailed]
- \( \alpha = 0.01 \)
- Since the sample size is large (\( n = 40 > 20 \)), we use the Z statistic. Thus,
\( Z_{\frac{\alpha}{2}} = Z_{\frac{0.01}{2}} = Z_{0.005} = 2.576 \)
Upper critical value 2.576 (we reject \( H_0 \) if the Z-value is greater than or equal to 2.576)
Lower critical value -2.576 (we reject \( H_0 \) if the Z-value is less than or equal to -2.576)
- According to the formula
\( Z=\frac{x-np}{\sqrt{npq}}=\frac{16-40 \times 0.5}{\sqrt{40 \times 0.5 \times 0.5}}=\frac{-4}{\sqrt{10}} \approx -1.26 \)
- Since \( Z = -1.26 \) does not lie in the critical region, we cannot reject \( H_0 \).
Interpretation: the data do not provide sufficient evidence that \( \mu \) differs from 21.5.
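The large-sample calculation can be checked with the Python standard library. This sketch assumes the 40 values whose signs are listed above (including the fifth row 23, 18, 31, 13, 20, 17, 24, 14) and takes the two-tailed critical value from `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

emissions = [17, 15, 20, 29, 19, 18, 22, 25,
             27, 9, 24, 20, 17, 6, 24, 14,
             15, 23, 24, 26, 19, 23, 28, 19,
             16, 22, 24, 17, 20, 13, 19, 10,
             23, 18, 31, 13, 20, 17, 24, 14]
mu0 = 21.5
x = sum(1 for v in emissions if v > mu0)      # plus signs
n = sum(1 for v in emissions if v != mu0)     # non-zero differences

z = (x - n * 0.5) / math.sqrt(n * 0.5 * 0.5)  # normal approximation, no continuity correction
alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject = abs(z) >= z_crit
print(x, n, round(z, 2), round(z_crit, 3), reject)
```

It prints `16 40 -1.26 2.576 False`: the statistic lies well inside the acceptance region.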
- To determine the effectiveness of a new traffic control system, the number of accidents that occurred at 12 dangerous intersections during four weeks before and four weeks after the installation of the new system was observed, and the following data were obtained.
Number of accidents before | 3 | 5 | 2 | 3 | 3 | 3 | 0 | 4 | 1 | 6 | 4 | 1 |
Number of accidents after | 1 | 2 | 0 | 2 | 2 | 0 | 2 | 3 | 3 | 4 | 1 | 0 |
Use the sign test to test the null hypothesis that the new traffic control system is only as effective as the old system. Use the 0.05 level of significance.
Solution
The data are paired, so we take the differences (before - after). We assign a “+” sign to each positive difference and a “-” sign to each negative difference. Then we get
Difference (before - after) | 2 | 3 | 2 | 1 | 1 | 3 | -2 | 1 | -2 | 2 | 3 | 1 |
Sign | + | + | + | + | + | + | - | + | - | + | + | + |
Counting the signs, we get
number of positive signs: x = 10
number of signs: n = 12
Now,
- \( H_0:\mu_1 = \mu_2 \)
\( H_1:\mu_1 > \mu_2 \) [Right-tailed], where \( \mu_1 \) and \( \mu_2 \) are the mean numbers of accidents before and after the installation
- \( \alpha = 0.05 \)
- Since the sample size is small, we use the binomial distribution. Thus, we reject \( H_0 \) if the p-value is less than or equal to 0.05.
- According to the binomial distribution table,
\( P(X \geq 10 \mid n=12, p=0.5) = P(X=10) + P(X=11) + P(X=12) = \frac{66 + 12 + 1}{2^{12}} \approx 0.0193 \)
- Since the p-value 0.0193 is less than \( \alpha = 0.05 \), we reject \( H_0 \).
Interpretation:
The data provide sufficient evidence that the new traffic control system is more effective than the old system (it reduces the number of accidents).
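The paired sign test on the accident data can be sketched as follows. This is a minimal illustration in which the right-tailed p-value is computed exactly from binomial coefficients instead of being read from a table.

```python
from math import comb

before = [3, 5, 2, 3, 3, 3, 0, 4, 1, 6, 4, 1]
after = [1, 2, 0, 2, 2, 0, 2, 3, 3, 4, 1, 0]

diffs = [b - a for b, a in zip(before, after)]  # before - after
x = sum(d > 0 for d in diffs)   # '+' signs: fewer accidents after installation
n = sum(d != 0 for d in diffs)  # zero differences would be discarded

# Right-tailed exact binomial p-value P(X >= x) with p = 0.5
p_value = sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n
print(x, n, round(p_value, 4))
```

It prints `10 12 0.0193`, matching the table-based calculation (79 of the 4096 equally likely sign patterns have at least 10 plus signs).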
Wilcoxon Signed-rank Test
The Wilcoxon signed-rank test (signed-rank test) is a non-parametric test. It can be used for:
- one-sample
- a paired sample
In the sign test, we consider only the direction of each difference, not its magnitude. In the signed-rank test, we also take the magnitude of the differences into account (through their ranks).
To carry out the Wilcoxon signed-rank test, we sort the differences by absolute value (i.e., ignoring the sign) and rank them: rank 1 goes to the smallest absolute difference, rank 2 to the next, and so on. Tied absolute differences receive the mean of the ranks they would have received if they were not tied. If the null hypothesis is true, the sum of the positive ranks and the sum of the negative ranks should be roughly equal; if it is false, one of the sums should be noticeably small and the other large.
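The ranking procedure described above (absolute differences, average ranks for ties, separate sums for positive and negative differences) can be sketched in Python. The helper names `average_ranks` and `signed_rank_sums` are hypothetical, and rounding the differences is an added assumption to stop floating-point noise from breaking genuine ties.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the
    mean of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def signed_rank_sums(data, mu):
    """Return (T_plus, T_minus, n) for the Wilcoxon signed-rank test."""
    # drop zero differences; rounding guards against floating-point noise
    diffs = [round(x - mu, 10) for x in data if x != mu]
    ranks = average_ranks([abs(d) for d in diffs])  # rank |d|, ignoring signs
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, len(diffs)


weights = [97.5, 95.2, 97.3, 96.0, 96.8, 100.3, 97.4, 95.3,
           93.2, 99.1, 96.1, 97.6, 98.2, 98.5, 94.9]
print(signed_rank_sums(weights, 98.5))
```

Applied to the 15 weights in the worked example with \( \mu = 98.5 \), it returns \( T^+ = 10 \), \( T^- = 95 \) and \( n = 14 \).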
Critical Values
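\( H_1 \) | Reject \( H_0 \) if | Test |
\( \ne \) | \( T \le T_\alpha \) | Two tailed |
\( < \) | \( T^+ \le T_{2\alpha} \) | Left tailed |
\( > \) | \( T^- \le T_{2\alpha} \) | Right tailed |
Here \( T_\alpha \) is the tabulated critical value for a two-tailed test at level \( \alpha \); a one-tailed test at level \( \alpha \) uses the two-tailed table value at level \( 2\alpha \). For a left-tailed alternative, the positive differences should be few and small, so a small \( T^+ \) is evidence against \( H_0 \); for a right-tailed alternative, a small \( T^- \) is.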
Summary Steps
- Calculate difference \( X - \mu \)
- Ignore if difference is zero, reduce n accordingly
- Rank the differences ignoring their sign
- Calculate sum of ranks of positive differences \( T^+ \)
- Calculate sum of ranks of negative differences \( T^-\)
- Calculate \( T=min\{ T^+, T^- \} \)
- If \( n \le 15 \), use the Wilcoxon signed-rank table: reject the null hypothesis if \( T \) is less than or equal to \( T_{critical} \).
- Otherwise, use the Z test with
\( Z=\frac{T -\mu_T}{\sigma_T} \)
where \( \mu_T=\frac{n(n+1)}{4}, \quad \sigma_T^2=\frac{n(n+1)(2n+1)}{24} \)
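The normal approximation can be written as a short function. This is a sketch with a hypothetical name `wilcoxon_z`; it applies the formulas for the mean and standard deviation of \( T \) given above, without any tie or continuity correction.

```python
import math


def wilcoxon_z(t, n):
    """Normal approximation for the Wilcoxon signed-rank statistic T."""
    mean_t = n * (n + 1) / 4                          # mean of T under H0
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # standard deviation of T under H0
    return (t - mean_t) / sd_t


print(round(wilcoxon_z(10, 14), 2))
```

For illustration only (with \( n \le 15 \) the rule above uses the table instead), \( T = 10 \) with \( n = 14 \) gives \( Z \approx -2.67 \), which is beyond -1.96.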
- The following are 15 measurements of weight in kg:
97.5, 95.2, 97.3, 96.0, 96.8, 100.3, 97.4, 95.3,
93.2, 99.1, 96.1, 97.6, 98.2, 98.5, 94.9
Use the signed-rank test at the 0.05 level of significance to test whether the mean weight is 98.5 kg.
Solution
Given that \( \mu = 98.5 \), we calculate the differences \( d = x - \mu \) and ignore the observation 98.5 (its difference is zero).
\( x \) | \( d = x - \mu \) | \( \lvert d \rvert \) | Rank |
97.5 | -1.0 | 1.0 | 4 |
95.2 | -3.3 | 3.3 | 12 |
97.3 | -1.2 | 1.2 | 6 |
96.0 | -2.5 | 2.5 | 10 |
96.8 | -1.7 | 1.7 | 7 |
100.3 | 1.8 | 1.8 | 8 |
97.4 | -1.1 | 1.1 | 5 |
95.3 | -3.2 | 3.2 | 11 |
93.2 | -5.3 | 5.3 | 14 |
99.1 | 0.6 | 0.6 | 2 |
96.1 | -2.4 | 2.4 | 9 |
97.6 | -0.9 | 0.9 | 3 |
98.2 | -0.3 | 0.3 | 1 |
94.9 | -3.6 | 3.6 | 13 |
From the sample data, \( n = 14 \).
- \( T^+ \) = sum of the ranks of the positive differences = 2 + 8 = 10
- \( T^- \) = sum of the ranks of the negative differences = 95
- \( T = \min\{T^+, T^-\} = 10 \)
Now,
- \( H_0: \mu = 98.5 \)
\( H_1: \mu \ne 98.5 \) [Two-tailed]
- \( \alpha = 0.05 \)
- Since \( n = 14 \le 15 \), we use the \( T \) statistic with the Wilcoxon signed-rank table. Thus, \( T_{0.05} = 21 \) for \( n = 14 \).
- Based on the sample data, the value of the test statistic is \( T = 10 \).
- Here, \( T = 10 \) is less than 21, so \( H_0 \) is rejected.
Interpretation: the data provide sufficient evidence that the mean weight differs from 98.5 kg.
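The tabulated critical value \( T_{0.05} = 21 \) for \( n = 14 \) can be reproduced from the exact null distribution of \( T^+ \): under \( H_0 \), each of the \( 2^n \) ways of attaching signs to the ranks \( 1, \dots, n \) is equally likely. The sketch below counts them with a subset-sum recurrence; the function names are hypothetical and ties among the absolute differences are assumed absent.

```python
def signed_rank_null_counts(n):
    """counts[t] = number of the 2**n equally likely sign assignments to
    ranks 1..n whose positive-rank sum T+ equals t (no ties assumed)."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum     # before any rank is placed: only T+ = 0
    for r in range(1, n + 1):
        # rank r is either positive (adds r to T+) or negative (adds 0);
        # iterate downward so each rank is used at most once
        for t in range(max_sum, r - 1, -1):
            counts[t] += counts[t - r]
    return counts


def critical_value(n, alpha=0.05):
    """Largest c with P(T+ <= c) <= alpha / 2, i.e. the two-tailed
    critical value: reject H0 if T <= c."""
    counts = signed_rank_null_counts(n)
    cumulative, c = 0, -1
    for t, k in enumerate(counts):
        cumulative += k
        if cumulative / 2 ** n > alpha / 2:
            break
        c = t
    return c


counts = signed_rank_null_counts(14)
exact_p = 2 * sum(counts[:11]) / 2 ** 14    # two-tailed p-value for T = 10
print(critical_value(14), round(exact_p, 4))
```

`critical_value(14)` returns 21, and the exact two-tailed p-value for \( T = 10 \) is \( 2 \times 43 / 2^{14} \approx 0.0052 \), consistent with rejecting \( H_0 \).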
Kruskal-Wallis H-Test
The Kruskal-Wallis test (H-test) is also called the Kruskal-Wallis one-way analysis of variance by ranks. It is used with k independent samples, where k is 3 or more, and the measurements are at least ordinal. (When k = 2, the Mann-Whitney U-test is used instead.) Since the samples are independent, they can be of different sizes.
To carry out the Kruskal-Wallis H-test, we combine the data from all samples and rank them as a whole: rank 1 goes to the smallest value, rank 2 to the next, and so on. Tied values receive the mean of the ranks they would have received if they were not tied.
Summary Steps
- Rank the data as whole.
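- Calculate the sum of the ranks of each sample (\( R_i \)).
- Compute the statistic
\( H=\left [ \frac{12}{n(n+1)} \displaystyle \sum_{i=1}^k \frac{R_i^2}{n_i}\right ] -3(n+1) \)
where \( k \) is the number of samples, \( n_i \) is the size of sample \( i \), and \( n = \sum n_i \) is the total number of observations.
- Reject the null hypothesis if \( H \) is greater than or equal to the critical value: from the Kruskal-Wallis table for small samples, otherwise from the chi-square distribution with \( k - 1 \) degrees of freedom.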
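The summary steps above can be sketched in Python. The names are hypothetical, ties get average ranks as described, and the statistic is computed exactly as in the formula above (no correction for ties).

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the
    mean of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(*samples):
    """Return (H, rank_sums) with H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)."""
    pooled = [v for sample in samples for v in sample]
    ranks = average_ranks(pooled)        # rank all the data as a whole
    n = len(pooled)
    rank_sums, start = [], 0
    for sample in samples:               # R_i: ranks belonging to sample i
        rank_sums.append(sum(ranks[start:start + len(sample)]))
        start += len(sample)
    total = sum(r ** 2 / len(s) for r, s in zip(rank_sums, samples))
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, rank_sums


h, sums = kruskal_wallis_h([94, 88, 91, 74], [82, 82, 79], [98, 67, 72, 76])
print(sums, round(h, 2))
```

For the three groups of grades in the example that follows, it returns rank sums 30, 18 and 18 and \( H = 18/11 \approx 1.64 \).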
- The final grades of samples from three groups of students who were taught the same mathematics course by three different methods are as follows.
1st method | 94 | 88 | 91 | 74 |
2nd method | 82 | 82 | 79 | |
3rd method | 98 | 67 | 72 | 76 |
Use the H-test to test the null hypothesis that the three methods are equally effective at the 0.05 level of significance.
Solution
Ranking all 11 grades together (the two 82s share ranks 6 and 7, so each gets 6.5):
Grade (1st) | Grade (2nd) | Grade (3rd) | Rank (1st) | Rank (2nd) | Rank (3rd) |
74 | 79 | 67 | 3 | 5 | 1 |
88 | 82 | 72 | 8 | 6.5 | 2 |
91 | 82 | 76 | 9 | 6.5 | 4 |
94 | | 98 | 10 | | 11 |
Rank sum | | | 30 | 18 | 18 |
From the sample data
- \( R_1 = \) sum of ranks occupied by the 1st sample = 30
- \( n_1 = \) number of cases in the 1st sample = 4
- \( R_2 = \) sum of ranks occupied by the 2nd sample = 18
- \( n_2 = \) number of cases in the 2nd sample = 3
- \( R_3 = \) sum of ranks occupied by the 3rd sample = 18
- \( n_3 = \) number of cases in the 3rd sample = 4
- \( n = \) total number of cases in all samples = 11
Now,
- \( H_0: \) the three methods are equally effective
\( H_1: \) the three methods are not all equally effective
- \( \alpha = 0.05 \)
- Since the test concerns 3 independent samples, we use the \( H \) statistic. Thus, from the Kruskal-Wallis table for sample sizes 4, 4 and 3, \( H_{0.05} = 5.57 \).
- Based on the sample data, the value of the test statistic is
\( H=\left [ \frac{12}{11 \times 12} \left( \frac{30^2}{4} + \frac{18^2}{3} + \frac{18^2}{4} \right) \right ] -3(11+1) = \frac{414}{11} - 36 \approx 1.64 \)
- Here, \( H = 1.64 \) is less than 5.57, so \( H_0 \) cannot be rejected.
Interpretation: the data do not provide sufficient evidence that the three methods differ in effectiveness.
Run Test: Test for Randomness
The run test is a statistical procedure to examine whether a sequence of data occurs randomly. It is a non-parametric test of randomness for a two-valued data sequence.
A run is a maximal sequence of identical (or related) symbols, bounded by different symbols or by the ends of the sequence; that is, a sequence of symbols of one kind surrounded by symbols of another kind. For instance, suppose a sample of 22 responses is
MMMMFFFMFFFFMFFFFMMMFF
where M = male and F = female.
Counting the runs from MMMM to the final FF (MMMM, FFF, M, FFFF, M, FFFF, MMM, FF), there are 8 runs.
In numerical data, a run is defined as a series of increasing values or a series of decreasing values. The number of increasing (or decreasing) values is the length of the run.
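Counting runs is mechanical, so a short sketch may help. It uses `itertools.groupby`, which groups consecutive equal items, so each group is exactly one run. The helper names are hypothetical, and skipping equal neighbours when converting numerical data to increase/decrease signs is an assumption of this sketch.

```python
from itertools import groupby


def count_runs(sequence):
    """Return (number of runs, count of each symbol) for a sequence."""
    runs = [symbol for symbol, _group in groupby(sequence)]  # one entry per run
    counts = {}
    for symbol in sequence:
        counts[symbol] = counts.get(symbol, 0) + 1
    return len(runs), counts


def up_down_signs(data):
    """Turn numerical data into '+' (increase) / '-' (decrease) signs, so that
    runs of '+' are increasing runs and runs of '-' are decreasing runs.
    Equal neighbours are skipped (an assumption of this sketch)."""
    return ["+" if b > a else "-" for a, b in zip(data, data[1:]) if b != a]


print(count_runs("MMMMFFFMFFFFMFFFFMMMFF"))
```

For the M/F sequence above it reports 8 runs, with 9 M and 13 F.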
Scoring Procedure
- Count the number of runs, \( R \).
- Count \( n_1 \) and \( n_2 \), the numbers of symbols of each kind.
- If \( n_1 \) and \( n_2 \) are both less than 15, use the \( R \) statistic with the table of critical values \( u_\alpha \) and \( u_\alpha' \): reject the null hypothesis if \( R \le u_\alpha \) or \( R \ge u_\alpha' \).
- Otherwise, use the Z test with
\( Z=\frac{R-\mu_R}{\sigma_R} \)
where \( \mu_R=\frac{2n_1n_2}{n_1+n_2}+1, \quad \sigma_R^2=\frac{2n_1n_2(2n_1n_2-n_1-n_2)}{(n_1+n_2)^2(n_1+n_2-1)} \)
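The large-sample Z statistic can be sketched directly from the formulas for the mean and variance of \( R \) above. The helper `runs_z` is hypothetical and applies no continuity correction.

```python
import math


def runs_z(r, n1, n2):
    """Normal approximation for the number of runs R."""
    mean_r = 2 * n1 * n2 / (n1 + n2) + 1
    var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1)
    )
    return (r - mean_r) / math.sqrt(var_r)


print(round(runs_z(6, 13, 9), 2))
```

For the coin example below (\( R = 6 \), \( n_1 = 13 \), \( n_2 = 9 \)) it gives \( Z \approx -2.55 \). This is for comparison only, since both counts are below 15 and the table applies there; \( \lvert Z \rvert > 1.96 \) agrees with rejecting \( H_0 \).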
- In 22 tosses of a coin, the following sequence of heads (H) and tails (T) is obtained:
HHHHTTTHHHHHHHTTHHTTTT
Test at the 0.05 significance level whether the sequence is random.
Solution
- \( H_0: \) the sequence is random
\( H_1: \) the sequence is not random
- \( \alpha = 0.05 \)
- Since the test concerns runs and \( n_1 = 13 \) (heads) and \( n_2 = 9 \) (tails) are both less than 15, we use the \( R \) statistic. Thus,
\( u_{0.025} = 6 \) and \( u_{0.025}' = 17 \) for \( n_1 = 13, n_2 = 9 \)
- Based on the sample data (HHHH, TTT, HHHHHHH, TT, HH, TTTT), the value of the test statistic is \( R = 6 \).
- Here, \( R = 6 \le 6 \) lies in the critical region, so \( H_0 \) is rejected.
Interpretation: the sequence of coin tosses is not random.