Chap11_Chie Square & Non Parametrics

  • 8/12/2019 Chap11_Chie Square & Non Parametrics

    Statistics for Managers Using Microsoft Excel, 4e 2004 Prentice-Hall, Inc. Chap 11-2

    Chapter Goals

    After completing this chapter, you should be able to:

    Perform a $\chi^2$ test for the difference between two proportions

    Use a $\chi^2$ test for differences in more than two proportions

    Perform a $\chi^2$ test of independence

    Apply and interpret the Wilcoxon rank sum test for the difference between two medians

    Perform nonparametric analysis of variance using the Kruskal-Wallis rank test for one-way ANOVA

    Contingency Tables

    Useful in situations involving multiple population proportions

    Used to classify sample observations according to two or more characteristics

    Also called a cross-classification table

    Contingency Table Example

    Left-Handed vs. Gender

    Dominant Hand: Left vs. Right

    Gender: Male vs. Female

    2 categories for each variable, so this is called a 2 x 2 table

    Suppose we examine a sample of size 300

    Contingency Table Example (continued)

    Sample results organized in a contingency table (sample size n = 300):

                 Hand Preference
    Gender       Left     Right    Total
    Female         12       108      120
    Male           24       156      180
    Total          36       264      300

    Of the 120 females, 12 were left-handed

    Of the 180 males, 24 were left-handed

    $\chi^2$ Test for the Difference Between Two Proportions

    $H_0: p_1 = p_2$ (the proportion of females who are left-handed is equal to the proportion of males who are left-handed)

    $H_1: p_1 \neq p_2$ (the two proportions are not the same; hand preference is not independent of gender)

    If $H_0$ is true, then the proportion of left-handed females should be the same as the proportion of left-handed males

    Both proportions should then also be the same as the proportion of left-handed people overall

    The Chi-Square Test Statistic

    The chi-square test statistic is:

    \[ \chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} \]

    where:

    $f_o$ = observed frequency in a particular cell

    $f_e$ = expected frequency in a particular cell if $H_0$ is true

    $\chi^2$ for the 2 x 2 case has 1 degree of freedom

    (Assumed: each cell in the contingency table has an expected frequency of at least 5)

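    Decision Rule

    The $\chi^2$ test statistic approximately follows a chi-squared distribution with one degree of freedom

    Decision rule: if $\chi^2 > \chi^2_U$, reject $H_0$; otherwise, do not reject $H_0$

    [Figure: chi-squared density starting at 0; "do not reject $H_0$" region to the left of $\chi^2_U$, "reject $H_0$" region to the right]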

    Computing the Average Proportion

    The average proportion is:

    \[ \bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n} \]

    Here: of 120 females, 12 were left-handed; of 180 males, 24 were left-handed

    \[ \bar{p} = \frac{12 + 24}{120 + 180} = \frac{36}{300} = 0.12 \]

    i.e., the proportion of left-handers overall is 12%

    Finding Expected Frequencies

    To obtain the expected frequency for left-handed females, multiply the average proportion left-handed ($\bar{p}$) by the total number of females

    To obtain the expected frequency for left-handed males, multiply the average proportion left-handed ($\bar{p}$) by the total number of males

    If the two proportions are equal, then

    P(Left Handed | Female) = P(Left Handed | Male) = .12

    i.e., we would expect (.12)(120) = 14.4 females to be left-handed and (.12)(180) = 21.6 males to be left-handed
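The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the counts from the hand-preference example; the variable names are assumptions made for the sketch and are not part of the slides.

```python
# Observed counts from the hand-preference example in the text.
left_handed = {"Female": 12, "Male": 24}
group_size = {"Female": 120, "Male": 180}

# Pooled (average) proportion: all left-handers over the total sample size.
p_bar = sum(left_handed.values()) / sum(group_size.values())

# Expected number of left-handers per group if H0 (equal proportions) holds.
expected_left = {group: p_bar * n for group, n in group_size.items()}

print(round(p_bar, 2))
print({group: round(e, 1) for group, e in expected_left.items()})
```

The expected right-handed counts follow by subtraction from each group size (120 - 14.4 and 180 - 21.6).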

    Observed vs. Expected Frequencies

                 Hand Preference
    Gender       Left                Right               Total
    Female       Observed = 12       Observed = 108      120
                 Expected = 14.4     Expected = 105.6
    Male         Observed = 24       Observed = 156      180
                 Expected = 21.6     Expected = 158.4
    Total        36                  264                 300

    The Chi-Square Test Statistic

    Using the observed and expected frequencies from the table above, the test statistic is:

    \[ \chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576 \]
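As a check on the sum over all cells, here is a minimal sketch that evaluates the four terms directly. The function name `chi_square_statistic` is a hypothetical helper introduced for this illustration; the observed and expected values are the ones tabulated for the hand-preference example. Evaluated this way, the four terms add up to about 0.7576.

```python
def chi_square_statistic(observed, expected):
    """Sum (f_o - f_e)^2 / f_e over all cells of two equally shaped tables."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

# Rows: Female, Male. Columns: Left, Right.
observed = [[12, 108], [24, 156]]
expected = [[14.4, 105.6], [21.6, 158.4]]

chi2_stat = chi_square_statistic(observed, expected)
print(round(chi2_stat, 4))
```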

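    Decision Rule

    The test statistic is $\chi^2 = 0.7576$; with 1 d.f., $\chi^2_U = 3.841$

    Decision rule: if $\chi^2 > 3.841$, reject $H_0$; otherwise, do not reject $H_0$

    Here, $\chi^2 = 0.7576 < \chi^2_U = 3.841$, so we do not reject $H_0$ and conclude that there is not sufficient evidence that the two proportions are different at $\alpha = .05$

    [Figure: chi-squared density starting at 0; "do not reject $H_0$" region to the left of $\chi^2_U = 3.841$, "reject $H_0$" region to the right]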
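The decision rule can be sketched without statistical tables for the 1 d.f. case, because a chi-squared variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability is `erfc(sqrt(x / 2))`. The helper names below are assumptions made for this sketch, not part of the slides.

```python
import math

def chi2_upper_tail_1df(x):
    """P(chi-squared with 1 d.f. > x), using chi2_1 = Z^2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

def reject_h0(chi2_stat, critical_value):
    """Decision rule from the text: reject H0 only if the statistic exceeds the critical value."""
    return chi2_stat > critical_value

critical_value = 3.841  # upper 5% point for 1 d.f., as quoted in the text

# Recompute the statistic for the hand-preference table (observed, expected pairs).
cells = [(12, 14.4), (108, 105.6), (24, 21.6), (156, 158.4)]
chi2_stat = sum((fo - fe) ** 2 / fe for fo, fe in cells)

print(round(chi2_upper_tail_1df(critical_value), 3))  # tail area at the critical value
print(reject_h0(chi2_stat, critical_value))
```

Comparing the statistic with the critical value and comparing its tail probability with the significance level are equivalent ways of stating the same rule.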

    $\chi^2$ Test for the Differences in More Than Two Proportions

    Extend the $\chi^2$ test to the case with more than two independent populations:

    $H_0: p_1 = p_2 = \cdots = p_c$

    $H_1$: Not all of the $p_j$ are equal ($j = 1, 2, \ldots, c$)

    The Chi-Square Test Statistic

    The chi-square test statistic is:

    \[ \chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} \]

    where:

    $f_o$ = observed frequency in a particular cell of the 2 x c table

    $f_e$ = expected frequency in a particular cell if $H_0$ is true

    $\chi^2$ for the 2 x c case has (2 - 1)(c - 1) = c - 1 degrees of freedom

    (Assumed: each cell in the contingency table has an expected frequency of at least 1)

    Computing the Overall Proportion

    The overall proportion is:

    \[ \bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n} \]

    Expected cell frequencies for the c categories are calculated as in the 2 x 2 case, and the decision rule is the same:

    Decision rule: if $\chi^2 > \chi^2_U$, reject $H_0$; otherwise, do not reject $H_0$

    where $\chi^2_U$ is from the chi-squared distribution with c - 1 degrees of freedom

    $\chi^2$ Test of Independence

    Similar to the $\chi^2$ test for equality of more than two proportions, but extends the concept to contingency tables with r rows and c columns

    $H_0$: The two categorical variables are independent (i.e., there is no relationship between them)

    $H_1$: The two categorical variables are dependent (i.e., there is a relationship between them)

    $\chi^2$ Test of Independence (continued)

    The chi-square test statistic is:

    \[ \chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} \]

    where:

    $f_o$ = observed frequency in a particular cell of the r x c table

    $f_e$ = expected frequency in a particular cell if $H_0$ is true

    $\chi^2$ for the r x c case has (r - 1)(c - 1) degrees of freedom

    (Assumed: each cell in the contingency table has an expected frequency of at least 1)

    Expected Cell Frequencies

    Expected cell frequencies:

    \[ f_e = \frac{\text{row total} \times \text{column total}}{n} \]

    where:

    row total = sum of all frequencies in the row

    column total = sum of all frequencies in the column

    n = overall sample size
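The expected-frequency rule (row total times column total, divided by n) can be sketched for a table of any size as follows. The function name `expected_frequencies` is a hypothetical helper chosen for this illustration; the 2 x 2 hand-preference counts are reused as input.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Female, Male. Columns: Left, Right.
table = [[12, 108], [24, 156]]
expected = expected_frequencies(table)
print([[round(e, 1) for e in row] for row in expected])
```

For a 2 x 2 table this gives the same values as multiplying the average proportion by each group size, since row total times column total over n equals the group size times the overall proportion.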

    Decision Rule

    The decision rule is: if $\chi^2 > \chi^2_U$, reject $H_0$; otherwise, do not reject $H_0$

    where $\chi^2_U$ is from the chi-squared distribution with (r - 1)(c - 1) degrees of freedom

    Example

    The meal plan selected by 200 students is shown below:

                 Number of meals per week
    Class
    Standing     20/week    10/week    none    Total
    Fresh.          24         32       14       70
    Soph.           22         26       12       60
    Junior          10         14        6       30
    Senior          14         16       10       40
    Total           70         88       42      200

    Example (continued)

    The hypothesis to be tested is:

    H0: Meal plan and class standing are independent

    (i.e., there is no relationship between them)

    H1: Meal plan and class standing are dependent

    (i.e., there is a relationship between them)

    Example: Expected Cell Frequencies (continued)

    Observed:

                 Number of meals per week
    Class
    Standing     20/wk    10/wk    none    Total
    Fresh.         24       32      14       70
    Soph.          22       26      12       60
    Junior         10       14       6       30
    Senior         14       16      10       40
    Total          70       88      42      200

    Expected cell frequencies if $H_0$ is true:

                 Number of meals per week
    Class
    Standing     20/wk    10/wk    none    Total
    Fresh.        24.5     30.8    14.7      70
    Soph.         21.0     26.4    12.6      60
    Junior        10.5     13.2     6.3      30
    Senior        14.0     17.6     8.4      40
    Total          70       88      42      200

    Example for one cell (Junior, 20/wk):

    \[ f_e = \frac{\text{row total} \times \text{column total}}{n} = \frac{30 \times 70}{200} = 10.5 \]

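    Example: The Test Statistic (continued)

    The test statistic value is:

    \[ \chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(24 - 24.5)^2}{24.5} + \frac{(32 - 30.8)^2}{30.8} + \cdots + \frac{(10 - 8.4)^2}{8.4} = 0.709 \]

    $\chi^2_U = 12.592$ for $\alpha = .05$ from the chi-squared distribution with (4 - 1)(3 - 1) = 6 degrees of freedom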
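The whole meal-plan calculation can be reproduced with a short sketch. It assumes nothing beyond the observed table in the text; the helper `chi2_upper_tail_even_df` is a hypothetical name for the closed-form upper-tail probability that exists when the degrees of freedom are even, which avoids any dependency on a statistics library.

```python
import math

# Observed meal-plan counts. Rows: Fresh., Soph., Junior, Senior.
# Columns: 20/week, 10/week, none.
observed = [
    [24, 32, 14],
    [22, 26, 12],
    [10, 14, 6],
    [14, 16, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_stat = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi2_upper_tail_even_df(x, df):
    """P(chi-squared > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

p_value = chi2_upper_tail_even_df(chi2_stat, df)
print(round(chi2_stat, 3), df, round(p_value, 3))
```

A p-value far above .05 agrees with a statistic far below the critical value 12.592.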

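    Example: Decision and Interpretation (continued)

    The test statistic is $\chi^2 = 0.709$; with 6 d.f., $\chi^2_U = 12.592$

    Decision rule: if $\chi^2 > 12.592$, reject $H_0$; otherwise, do not reject $H_0$

    Here, $\chi^2 = 0.709 < \chi^2_U = 12.592$, so do not reject $H_0$

    Conclusion: there is not sufficient evidence that meal plan and class standing are related at $\alpha = .05$

    [Figure: chi-squared density starting at 0; "do not reject $H_0$" region to the left of $\chi^2_U = 12.592$, "reject $H_0$" region to the right]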

    Wilcoxon Rank-Sum Test for Differences in 2 Medians

    Test two independent population medians

    Populations need not be normally distributed

    Distribution-free procedure

    Used when only rank data are available

    Must use the normal approximation if either of the sample sizes is larger than 10

    Wilcoxon Rank-Sum Test: Small Samples

    Can use when both $n_1, n_2 \le 10$

    Assign ranks to the combined $n_1 + n_2$ sample observations

    If the sample sizes are unequal, let $n_1$ refer to the smaller-sized sample

    Smallest value rank = 1, largest value rank = $n_1 + n_2$

    Assign the average rank for ties

    Sum the ranks for each sample: $T_1$ and $T_2$

    Obtain the test statistic, $T_1$ (from the smaller sample)

    Checking the Rankings

    The sum of the rankings must satisfy the formula below

    Can use this to verify the sums $T_1$ and $T_2$

    \[ T_1 + T_2 = \frac{n(n+1)}{2} \]

    where $n = n_1 + n_2$

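    Wilcoxon Rank-Sum Test: Hypothesis and Decision Rule

    Two-tail test: $H_0: M_1 = M_2$ vs. $H_1: M_1 \neq M_2$; reject $H_0$ if $T_1 < T_{1L}$ or if $T_1 > T_{1U}$

    Upper-tail test: $H_0: M_1 \le M_2$ vs. $H_1: M_1 > M_2$; reject $H_0$ if $T_1 > T_{1U}$

    Lower-tail test: $H_0: M_1 \ge M_2$ vs. $H_1: M_1 < M_2$; reject $H_0$ if $T_1 < T_{1L}$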

    Wilcoxon Rank-Sum Test: Small Sample Example

    Sample data are collected on the capacity rates (% of capacity) for two factories.

    Are the median operating rates for the two factories the same?

    For factory A, the rates are 71, 82, 77, 94, 88

    For factory B, the rates are 85, 82, 92, 97

    Test for equality of the population medians at the 0.05 significance level

    Wilcoxon Rank-Sum Test: Small Sample Example (continued)

    Ranked capacity values:

    Capacity                     Rank
    Factory A    Factory B       Factory A    Factory B
    71                           1
    77                           2
    82                           3.5
                 82                           3.5
                 85                           5
    88                           6
                 92                           7
    94                           8
                 97                           9
    Rank sums:                   20.5         24.5

    Tie in 3rd and 4th places: both values of 82 receive the average rank 3.5

    Wilcoxon Rank-Sum Test: Small Sample Example (continued)

    Factory B has the smaller sample size, so the test statistic is the sum of the Factory B ranks:

    $T_1 = 24.5$

    The sample sizes are:

    $n_1 = 4$ (factory B)

    $n_2 = 5$ (factory A)

    The level of significance is $\alpha = .05$
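The ranking procedure (combine the samples, rank from smallest to largest, average the ranks of ties, then sum per sample) can be sketched as below. The function names `average_ranks` and `rank_sums` are hypothetical helpers written for this illustration; the data are the factory capacity rates from the example, with the smaller sample (factory B) passed first so that its rank sum plays the role of $T_1$.

```python
def average_ranks(values):
    """Ranks with 1 = smallest; tied values all receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; their mean is (i + j) / 2 + 1.
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(sample1, sample2):
    """Return (T1, T2): rank sums of each sample within the combined ranking."""
    ranks = average_ranks(list(sample1) + list(sample2))
    return sum(ranks[:len(sample1)]), sum(ranks[len(sample1):])

factory_b = [85, 82, 92, 97]      # smaller sample, n1 = 4
factory_a = [71, 82, 77, 94, 88]  # larger sample, n2 = 5

t1, t2 = rank_sums(factory_b, factory_a)
n = len(factory_a) + len(factory_b)
print(t1, t2, t1 + t2 == n * (n + 1) / 2)
```

The final comparison is the check $T_1 + T_2 = n(n+1)/2$, which catches ranking mistakes before the test statistic is used.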

    Wilcoxon Rank-Sum Test: Small Sample Example (continued)

    Lower and upper critical values for $T_1$ from Appendix table E.8 (excerpt for $n_2 = 5$):

    Level of significance        n1 = 4      n1 = 5
    One-tailed   Two-tailed
    .05          .10             12, 28      19, 36
    .025         .05             11, 29      17, 38
    .01          .02             10, 30      16, 39
    .005         .01             --, --      15, 40

    For $n_1 = 4$, $n_2 = 5$ and a two-tailed test at $\alpha = .05$: $T_{1L} = 11$ and $T_{1U} = 29$

    Wilcoxon Rank-Sum Test: Small Sample Solution (continued)

    $H_0: M_1 = M_2$

    $H_1: M_1 \neq M_2$ (two-tail test)

    $\alpha = .05$, $n_1 = 4$, $n_2 = 5$

    Reject $H_0$ if $T_1 < T_{1L} = 11$ or if $T_1 > T_{1U} = 29$

    Test statistic (sum of ranks from the smaller sample): $T_1 = 24.5$

    Decision: since $11 \le 24.5 \le 29$, do not reject $H_0$ at $\alpha = 0.05$

    Conclusion: there is not enough evidence to conclude that the medians are not equal

    Wilcoxon Rank-Sum Test (Large Sample)

    For large samples, the test statistic $T_1$ is approximately normal with mean $\mu_{T_1}$ and standard deviation $\sigma_{T_1}$:

    \[ \mu_{T_1} = \frac{n_1(n+1)}{2} \]

    \[ \sigma_{T_1} = \sqrt{\frac{n_1 n_2 (n+1)}{12}} \]

    Must use the normal approximation if either $n_1$ or $n_2$ > 10

    Assign $n_1$ to be the smaller of the two sample sizes

    Can use the normal approximation for small samples

    Wilcoxon Rank-Sum Test (Large Sample) (continued)

    The Z test statistic is

    \[ Z = \frac{T_1 - \mu_{T_1}}{\sigma_{T_1}} \]

    where Z approximately follows a standardized normal distribution

    Wilcoxon Rank-Sum Test: Normal Approximation Example

    Use the setting of the prior example:

    The sample sizes were:

    $n_1 = 4$ (factory B)

    $n_2 = 5$ (factory A)

    The level of significance was $\alpha = .05$

    The test statistic was $T_1 = 24.5$

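    Wilcoxon Rank-Sum Test: Normal Approximation Example (continued)

    \[ \mu_{T_1} = \frac{n_1(n+1)}{2} = \frac{4(9+1)}{2} = 20 \]

    \[ \sigma_{T_1} = \sqrt{\frac{n_1 n_2 (n+1)}{12}} = \sqrt{\frac{4(5)(9+1)}{12}} = 4.082 \]

    The test statistic is

    \[ Z = \frac{T_1 - \mu_{T_1}}{\sigma_{T_1}} = \frac{24.5 - 20}{4.082} = 1.10 \]

    Z = 1.10 is not greater than the critical Z value of 1.96 (for $\alpha = .05$), so we do not reject $H_0$; there is not sufficient evidence that the medians are not equal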
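The normal approximation can be checked numerically with a small sketch. The function name `wilcoxon_normal_approx` is a hypothetical helper; it implements only the mean and standard deviation formulas given for $T_1$ and applies no correction for ties. With $n_1 = 4$, $n_2 = 5$ and $T_1 = 24.5$, direct evaluation gives a mean of 20, a standard deviation of about 4.082 and Z of about 1.10.

```python
import math

def wilcoxon_normal_approx(t1, n1, n2):
    """Mean, standard deviation and Z for the rank sum T1 of the smaller sample."""
    n = n1 + n2
    mu = n1 * (n + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n + 1) / 12.0)
    return mu, sigma, (t1 - mu) / sigma

mu, sigma, z = wilcoxon_normal_approx(24.5, 4, 5)
print(mu, round(sigma, 3), round(z, 2))

# Two-tail decision at alpha = .05 uses the critical value 1.96 quoted in the text.
reject = abs(z) > 1.96
print(reject)
```

The decision matches the small-sample table lookup: $H_0$ is not rejected.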

    Kruskal-Wallis Rank Test

    Tests the equality of more than 2 population medians

    Use when the normality assumption for one-way ANOVA is violated

    Assumptions:

    The samples are random and independent

    Variables have a continuous distribution

    The data can be ranked

    Populations have the same variability

    Populations have the same shape

    Kruskal-Wallis Test Procedure

    Obtain relative rankings for each value

    In the event of a tie, each of the tied values gets the average rank

    Sum the rankings for data from each of the c groups

    Compute the H test statistic

    Kruskal-Wallis Test Procedure (continued)

    The Kruskal-Wallis H test statistic (with c - 1 degrees of freedom):

    \[ H = \left[ \frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j} \right] - 3(n+1) \]

    where:

    n = sum of sample sizes in all samples

    c = number of samples

    $T_j$ = sum of ranks in the jth sample

    $n_j$ = size of the jth sample

    Kruskal-Wallis Test Procedure (continued)

    Complete the test by comparing the calculated H value to a critical $\chi^2$ value from the chi-square distribution with c - 1 degrees of freedom

    Decision rule: reject $H_0$ if test statistic $H > \chi^2_U$; otherwise do not reject $H_0$

    [Figure: chi-squared density starting at 0; "do not reject $H_0$" region to the left of $\chi^2_U$, "reject $H_0$" region to the right]

    Kruskal-Wallis Example

    Do different departments have different class sizes?

    Class size     Class size      Class size
    (Math, M)      (English, E)    (Biology, B)
    23             55              30
    41             60              40
    54             72              18
    78             45              34
    66             70              44

    Kruskal-Wallis Example (continued)

    Do different departments have different class sizes?

    Class size                Class size                 Class size
    (Math, M)     Ranking     (English, E)    Ranking    (Biology, B)    Ranking
    23             2          55              10         30               3
    41             6          60              11         40               5
    54             9          72              14         18               1
    78            15          45               8         34               4
    66            12          70              13         44               7
    Rank sum:     44                          56                         20

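    Kruskal-Wallis Example (continued)

    $H_0: \text{Median}_M = \text{Median}_E = \text{Median}_B$

    $H_A$: Not all population medians are equal

    The H statistic is

    \[ H = \left[ \frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j} \right] - 3(n+1) = \left[ \frac{12}{15(15+1)} \left( \frac{44^2}{5} + \frac{56^2}{5} + \frac{20^2}{5} \right) \right] - 3(15+1) = 6.72 \]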
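The H computation can be sketched from the rank sums and sample sizes alone. The function name `kruskal_wallis_h` is a hypothetical helper, and the formula is the uncorrected one from the text (no adjustment for ties, which is harmless here because the ranks 1 to 15 are all distinct). The p-value line uses the fact that a chi-squared variable with 2 degrees of freedom has upper-tail probability exp(-x/2); the significance level of .05 used for the comparison is an assumption of this sketch.

```python
import math

def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (n (n + 1)) * sum(T_j^2 / n_j) - 3 (n + 1)."""
    n = sum(sizes)
    total = sum(t ** 2 / nj for t, nj in zip(rank_sums, sizes))
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Rank sums for Math, English, Biology, each with 5 classes.
h = kruskal_wallis_h([44, 56, 20], [5, 5, 5])

# c - 1 = 2 degrees of freedom: P(chi-squared > h) = exp(-h / 2).
p_value = math.exp(-h / 2.0)

alpha = 0.05  # assumed significance level for this illustration
print(round(h, 2), round(p_value, 4), p_value < alpha)
```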

    Since H = 6.72

    Chapter Summary

    Developed and applied the $\chi^2$ test for the difference between two proportions

    Developed and applied the $\chi^2$ test for differences in more than two proportions

    Examined the $\chi^2$ test for independence

    Used the Wilcoxon rank sum test for two population medians (small samples; large-sample Z approximation)

    Applied the Kruskal-Wallis H-test for multiple population medians