Progetto Statistics ENG

  • 8/12/2019 Progetto Statistics ENG

    Statistical analysis of the performances of LeBron James, Dwyane Wade and Chris Bosh, the Miami Heat's "Big Three", during the 2012-13 NBA season

    Summary

    Introduction to the statistical analysis done

    Dataset presentation

    Descriptive analysis of data

    Verification of normality of variables

    Computation of confidence intervals for the mean points scored by the Big Three

    Hypothesis testing on the difference between the points scored by the Big Three in wins and losses

    Linear regression models:

    o Linear relation between LeBron James's performance and the points scored by Dwyane Wade and Chris Bosh

    o Linear relation between the sum of the points scored by the Big Three and Miami's winning margin

    o Linear relation between the points scored individually by LeBron James, Dwyane Wade and Chris Bosh and Miami's winning margin

    o Linear relation between the points scored by the whole Miami team and the sum of the points scored by the Big Three

    Introduction to the statistical analysis: how does the Big Three influence Miami's wins?

    In the summer of 2010, LeBron James joined Chris Bosh and Dwyane Wade at the Miami Heat, forming a core of players with outstanding athleticism and basketball skills. All three are members of the USA national basketball team, and their decision to play together on one team drew a lot of criticism: putting three players of this level together seemed to ruin the game. The Miami Heat's goal was clear: win the championship and, with it, the trophy. In the first year they failed. They then won in 2012 and 2013 and, at the time of writing (mid-May), they are in the playoffs. In these two years LeBron was honored with two MVP (Most Valuable Player) awards. He was seen as Miami's leader and, thanks to his exceptional performances, as the main reason for Miami's wins, whereas Wade and Bosh were relegated to a secondary role.

    The main aim of our statistical analysis is to determine whether there is statistical evidence that "the King" is essential for Miami's wins, or at least whether a linear relationship exists between his performances and the final result of a game. We also took Wade and Bosh into account, studying whether there is evidence that LeBron's performances influence theirs. Finally, we tried to understand whether the Big Three is necessary for Miami's wins or whether other factors lead the team to win.

    DATASET

    To perform proper statistical tests, we considered all the games in which all three members of the Big Three played during the 2012-13 regular season and playoffs, since at the time of writing the 2013-14 season is still in progress and complete data are not yet available.

    Game Date Result Margin AssistsLJ TurnoverLJ ReboundsLJ PointsLJ PointsDW PointsCB PointsMIA

    1 30/10/2012 W 13 3 0 10 26 29 19 120

    2 02/11/2012 L -20 5 5 7 23 15 12 104

    3 03/11/2012 W 3 11 0 9 20 14 40 119

    4 05/11/2012 W 25 1 3 11 23 22 18 124

    5 07/11/2012 W 30 8 2 12 20 22 8 103

    6 09/11/2012 W 6 9 2 11 21 9 24 95

    7 11/11/2012 L -18 6 2 10 20 8 22 104

    8 12/11/2012 W 3 6 0 10 38 19 24 113

    9 14/11/2012 L -7 7 4 5 30 6 11 107

    10 15/11/2012 L -5 12 4 7 27 16 14 98

    11 17/11/2012 W 9 3 1 7 21 13 24 97

    12 21/11/2012 W 7 8 1 10 28 28 24 113

    13 24/11/2012 W 2 5 7 6 30 18 23 110

    14 29/11/2012 W 5 7 4 9 23 19 18 105

    15 01/12/2012 W 13 6 2 9 21 34 8 102

    16 04/12/2012 L -4 11 3 13 26 24 20 105

    17 06/12/2012 L -20 9 2 10 31 13 12 112

    18 08/12/2012 W 16 7 4 5 24 26 13 106

    19 10/12/2012 W 9 6 3 7 27 26 14 101

    20 12/12/2012 L -2 5 4 3 31 14 21 97

    21 15/12/2012 W 30 5 2 10 23 13 12 102

    22 18/12/2012 W 11 11 0 6 22 24 15 103

    23 20/12/2012 W 15 5 5 9 24 19 17 110

    24 22/12/2012 W 16 7 4 9 30 21 9 105

    25 25/12/2012 W 6 9 3 8 29 21 16 103

    26 26/12/2012 W 13 8 3 12 27 29 14 105

    27 28/12/2012 L -10 5 3 6 35 3 28 109

    28 29/12/2012 L -19 7 6 6 26 24 12 104

    29 31/12/2012 W 2 11 3 8 36 21 22 112

    30 02/01/2013 W 10 9 4 12 32 27 17 119

    31 04/01/2013 L -7 2 2 6 30 18 14 96

    32 06/01/2013 W 28 7 1 2 24 14 17 99

    33 08/01/2013 L -10 4 7 10 22 30 14 87

    34 10/01/2013 L -2 9 4 10 15 18 29 92

    35 12/01/2013 W 29 7 1 5 20 11 16 128

    36 14/01/2013 L -7 6 3 4 32 11 16 104

    37 16/01/2013 W 17 10 1 7 25 15 11 92

    38 17/01/2013 W 9 8 2 7 39 27 7 99

    39 23/01/2013 W 7 11 3 10 31 35 12 123

    40 25/01/2013 W 22 7 2 7 23 29 14 110

    41 27/01/2013 L -2 7 3 16 34 20 16 100

    42 30/01/2013 W 20 7 4 9 24 21 16 105

    43 01/02/2013 L -13 3 3 6 28 17 13 102

    44 03/02/2013 W 15 7 3 8 30 23 28 100

    45 04/02/2013 W 5 8 5 8 31 20 23 99

    46 06/02/2013 W 6 5 4 6 32 31 12 114

    47 08/02/2013 W 22 6 5 5 30 20 9 111

    48 10/02/2013 W 10 4 4 7 32 30 12 107

    49 12/02/2013 W 13 9 1 6 30 24 32 117

    50 14/02/2013 W 10 7 4 12 39 13 20 110

    51 20/02/2013 W 13 11 2 6 24 20 6 103

    52 21/02/2013 W 19 7 4 12 26 17 12 86

    53 23/02/2013 W 24 11 2 10 16 33 13 114

    54 24/02/2013 W 4 8 2 3 28 24 7 109

    55 26/02/2013 W 12 16 2 8 40 39 15 141

    56 01/03/2013 W 7 10 1 8 18 22 13 98

    57 03/03/2013 W 6 7 2 11 29 20 16 99

    58 04/03/2013 W 16 4 7 10 20 32 11 97

    59 06/03/2013 L 1 2 2 3 26 24 17 97

    60 08/03/2013 W 9 5 3 10 25 22 16 102

    61 10/03/2013 W 14 7 4 6 13 23 24 105

    62 12/03/2013 W 17 7 4 7 15 23 14 98

    63 13/03/2013 W 4 8 4 7 27 21 10 98

    64 15/03/2013 W 13 7 2 10 28 20 28 107

    65 17/03/2013 W 17 8 3 12 22 24 18 108

    66 18/03/2013 W 2 12 5 7 37 16 13 105

    67 20/03/2013 W 3 10 4 12 25 11 11 98

    68 22/03/2013 W 14 8 3 8 29 19 5 103

    69 24/03/2013 W 32 10 4 8 32 22 15 109

    70 25/03/2013 W 14 11 3 9 24 13 12 108

    71 27/03/2013 L -4 3 4 7 32 18 21 101

    72 29/03/2013 W 19 6 3 4 36 17 10 108

    73 02/04/2013 L 19 5 1 6 23 20 23 102

    74 05/04/2013 W 10 3 4 8 18 20 18 89

    75 06/04/2013 W 19 7 2 4 27 11 16 106

    76 09/04/2013 W 11 7 3 7 28 22 12 94

    77 12/04/2013 W 8 9 1 6 20 21 17 109

    78 14/04/2013 W 12 6 3 7 24 15 12 105

    79 21/04/2013 W 23 8 5 10 27 16 15 110

    80 23/04/2013 L 12 6 4 8 19 21 10 98

    81 25/04/2013 W 13 6 5 5 22 4 16 104

    82 28/04/2013 W 11 7 5 8 30 16 10 88

    83 06/05/2013 L -7 7 2 8 24 14 9 93

    84 08/05/2013 W 37 9 3 5 19 15 13 115

    85 10/05/2013 W 10 7 1 8 25 10 20 104

    86 13/05/2013 W 23 8 5 7 27 6 14 88

    87 15/05/2013 L -3 8 2 7 23 18 12 94

    88 22/05/2013 W 1 10 4 10 30 19 17 103

    89 24/05/2013 L -4 3 5 8 36 14 17 97

    90 26/05/2013 W 18 3 0 4 22 18 15 114

    91 28/05/2013 L -7 5 2 6 24 16 7 99

    92 30/05/2013 W 11 6 3 8 30 10 7 90

    93 01/06/2013 L -14 6 4 7 29 10 5 91

    94 03/06/2013 W 23 4 2 8 32 21 9 99

    95 06/06/2013 L -4 10 2 18 18 17 13 92

    96 09/06/2013 W 19 7 3 8 17 10 12 103

    97 11/06/2013 L -37 6 2 11 15 16 12 113

    98 13/06/2013 W 16 4 2 11 33 32 20 109

    99 16/06/2013 L -10 8 3 6 25 25 16 114

    100 18/06/2013 W 3 11 6 10 32 14 10 103

    101 20/06/2013 W 7 4 2 12 37 23 10 95

    Analysis of ReboundsLJ variable

    Sample mean 8.089109

    Variance 7.34198

    Standard deviation 2.709609

    Median 8

    Min 2

    Max 18

    Range 16

    Quantiles

    0% 25% 50% 75% 100%

    1 5 7 9 16

    Analysis of TurnoversLJ variable

    Sample mean 3.009901

    Variance 2.389901

    Standard deviation 1.54593

    Median 3

    Min 0

    Max 7

    Range 7

    Quantiles

    0% 25% 50% 75% 100%

    0 2 3 4 7

    Analysis of PointsLJ, PointsDW and PointsCB variables

    POINTS SCORED BY LEBRON JAMES

    Sample mean 26.46535

    Variance 34.81129

    Standard deviation 5.900109

    Median 26

    Min 13

    Max 40

    Range 27

    POINTS SCORED BY DWYANE WADE

    Sample mean 19.38614

    Variance 48.51941

    Standard deviation 6.965587

    Median 20

    Min 3

    Max 39

    Range 36

    POINTS SCORED BY CHRIS BOSH

    Sample mean 15.40594

    Variance 37.12356

    Standard deviation 6.092911

    Median 14

    Min 5

    Max 40

    Range 35
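The summary statistics above can be reproduced with base R functions. The following is a minimal sketch, not the authors' script: `describe` is a hypothetical helper name, and the vector holds only the PointsLJ values of the first ten games in the dataset table, so its output differs from the figures reported for all 101 games.

```r
# PointsLJ for the first ten games of the dataset table (illustrative subset;
# the figures in the report are computed on all 101 games).
points_lj <- c(26, 23, 20, 23, 20, 21, 20, 38, 30, 27)

# Hypothetical helper collecting the descriptive statistics used in the report.
describe <- function(x) {
  list(
    mean      = mean(x),
    variance  = var(x),            # sample variance (divides by n - 1)
    sd        = sd(x),
    median    = median(x),
    min       = min(x),
    max       = max(x),
    range     = max(x) - min(x),
    quantiles = quantile(x)        # 0%, 25%, 50%, 75%, 100%
  )
}

stats_lj <- describe(points_lj)
print(stats_lj)
```

Applying the same helper to the full PointsLJ, PointsDW and PointsCB columns should give the values listed above, assuming the report used the default sample variance.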

    Normality check of considered variables

    Computation of confidence intervals for the mean points scored by the Big Three

    The histogram and the Q-Q plot look acceptable, so normality of the data can be assumed. The Q-Q plot also shows that the variable is discrete; however, because the sample is large, we can use asymptotic confidence intervals without assuming normality of the data.

    Confidence level for all variables: 0.9

    CI for the mean of the points scored by LeBron James
    > meanLJ
    [1] 26.46535
    > sdLJ
    [1] 5.900109
    > CI.alphaLJ
    [1] 25.49968 27.43101

    CI for the mean of the points scored by Dwyane Wade
    > meanDW
    [1] 19.38614
    > sdDW
    [1] 6.965587
    > CI.alphaDW
    [1] 18.24609 20.52619

    CI for the mean of the points scored by Chris Bosh
    > meanCB
    [1] 15.40594
    > sdCB
    [1] 6.092911
    > CI.alphaCB
    [1] 14.40872 16.40316
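As a hedged illustration of how such asymptotic intervals can be computed, the sketch below rebuilds them from the reported means and standard deviations, using n = 101 games and the standard normal quantile. The function name `asymptotic_ci` is an assumption for this example; the original commands are not shown in the report.

```r
# Asymptotic confidence interval for a mean from summary values:
# mean +/- z_(1 - alpha/2) * sd / sqrt(n)
asymptotic_ci <- function(mean_x, sd_x, n, level = 0.90) {
  alpha <- 1 - level
  z <- qnorm(1 - alpha / 2)              # standard normal quantile
  half_width <- z * sd_x / sqrt(n)
  c(lower = mean_x - half_width, upper = mean_x + half_width)
}

n_games <- 101                           # games in the dataset

ci_lj <- asymptotic_ci(26.46535, 5.900109, n_games)
ci_dw <- asymptotic_ci(19.38614, 6.965587, n_games)
ci_cb <- asymptotic_ci(15.40594, 6.092911, n_games)

print(rbind(LeBron = ci_lj, Wade = ci_dw, Bosh = ci_cb))
```

With these inputs the bounds should agree with the CI.alphaLJ, CI.alphaDW and CI.alphaCB values above to about four decimal places, which suggests that a normal quantile rather than a Student's t quantile was used.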

    Hypothesis testing on the difference between the points scored by the Big Three in Miami's wins and losses

    We introduce a new random variable, PointsSum, the sum of the points scored by the Big Three in one game. Is the mean of this variable higher in wins than in losses?

    We first estimate the mean of PointsSum in wins (mean.v) and in losses (mean.p), and then the difference between the two means.

    > mean.v
    [1] 62.39189
    > mean.p
    [1] 58.14815
    > mean.diff
    [1] 4.243744

    Before going on, we check the normality of the data separately for each group.

    The data look approximately normal in both groups, so we proceed to compute a 0.95 confidence interval for the difference between the means.

    > CI.alpha
    [1] -0.5159391 9.0034266

    The confidence interval includes 0, so there might be no significant difference between the two means; we verify this with a test.
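A confidence interval of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `pooled_diff_ci` is a hypothetical helper, the pooled-variance form is chosen because it matches the n.losses + n.wins - 2 degrees of freedom mentioned in the text, and the two vectors contain PointsSum only for the first ten games of the dataset table.

```r
# Two-sided confidence interval for the difference of two means,
# assuming equal variances in the two groups (pooled variance).
pooled_diff_ci <- function(x, y, level = 0.95) {
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df   # pooled variance
  se <- sqrt(sp2 * (1 / nx + 1 / ny))                   # std. error of the difference
  t_q <- qt(1 - (1 - level) / 2, df)
  d <- mean(x) - mean(y)
  c(lower = d - t_q * se, upper = d + t_q * se)
}

# PointsSum (PointsLJ + PointsDW + PointsCB) for games 1-10 of the table,
# split by Result. Illustrative subset only.
wins   <- c(74, 74, 63, 50, 54, 81)
losses <- c(50, 50, 47, 57)

ci <- pooled_diff_ci(wins, losses)
print(ci)

# Cross-check against the built-in equal-variance t interval.
print(t.test(wins, losses, var.equal = TRUE)$conf.int)
```

The hand-written interval and the one returned by `t.test` should coincide; on the full dataset the same call would be expected to reproduce CI.alpha if the authors pooled the variances.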

    H0: MeanPointsSumWins = MeanPointsSumLosses
    H1: MeanPointsSumWins > MeanPointsSumLosses

    We compute the test statistic and the 0.95 quantile of a Student's t distribution with (n.losses + n.wins - 2) degrees of freedom; this gives the critical region of the one-sided two-sample test at level 0.05.

    > t.alpha
    [1] 1.660391
    > T.0
    [1] 1.769132

    The test statistic falls in the critical region, so at level 0.05 we have evidence to reject H0 and state that the mean in wins is higher than the mean in losses. This does not contradict the confidence interval computed earlier, which is two-sided: the one-sided test places the whole 0.05 in a single tail. To be more precise, we compute the p-value of the one-sided test with unknown variance.

    > p
    [1] 0.03997566

    The p-value is below 0.05, so we have evidence to reject H0 and state that the mean of the sum of points is higher in wins than in losses.
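The critical value and the p-value can be checked directly from the reported test statistic. The sketch below assumes 101 games in total, hence 99 degrees of freedom; the variable names are chosen for this example only.

```r
n_games <- 101                  # n.wins + n.losses
df      <- n_games - 2          # degrees of freedom of the pooled t statistic
t_obs   <- 1.769132             # T.0 as reported above

t_crit  <- qt(0.95, df)                        # one-sided critical value at level 0.05
p_value <- pt(t_obs, df, lower.tail = FALSE)   # P(T > t_obs) under H0

reject_h0 <- t_obs > t_crit

print(c(t_crit = t_crit, p_value = p_value))
print(reject_h0)
```

Given the raw data, the same test could be run in one call with `t.test(x, y, alternative = "greater", var.equal = TRUE)`, where `x` and `y` would be the PointsSum values in wins and in losses.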

    Linear regression models:

    1) Linear relation between LeBron James's performance and the points scored by Dwyane Wade and Chris Bosh

    Predictors: points, assists, rebounds and turnovers by LeBron James
    Response: sum of the points scored by Dwyane Wade and Chris Bosh

    Call:
    lm(formula = PointsW.B ~ PointsLJ + ReboundsLJ + AssistsLJ + TurnoversLJ)

    Residuals:
        Min      1Q  Median      3Q     Max
    -18.324  -6.340  -1.758   6.128  20.203

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 30.66267    5.36985   5.710 1.26e-07 ***
    PointsLJ     0.07160    0.15047   0.476   0.6353
    ReboundsLJ   0.60188    0.33054   1.821   0.0717 .
    AssistsLJ    0.03768    0.34481   0.109   0.9132
    TurnoversLJ -0.96367    0.57594  -1.673   0.0975 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 8.797 on 96 degrees of freedom
    Multiple R-squared:  0.06077,    Adjusted R-squared:  0.02163
    F-statistic: 1.553 on 4 and 96 DF,  p-value: 0.1932

    R-squared is very low, the p-value of the F-test is high and no predictor is significant at the 0.05 level, so we find no evidence of a linear relationship between LeBron James's performance and the points scored by Wade and Bosh.
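For readers who want to reproduce this kind of output, here is a minimal sketch of fitting a model with the same formula. It is not the authors' script: the data frame name `games` is an assumption, it contains only the first ten games of the dataset table, and the turnovers column is named TurnoversLJ as in the formula. The coefficients it prints are therefore not comparable with those above.

```r
# First ten games of the dataset table (illustrative subset only).
games <- data.frame(
  PointsLJ    = c(26, 23, 20, 23, 20, 21, 20, 38, 30, 27),
  ReboundsLJ  = c(10, 7, 9, 11, 12, 11, 10, 10, 5, 7),
  AssistsLJ   = c(3, 5, 11, 1, 8, 9, 6, 6, 7, 12),
  TurnoversLJ = c(0, 5, 0, 3, 2, 2, 2, 0, 4, 4),
  PointsDW    = c(29, 15, 14, 22, 22, 9, 8, 19, 6, 16),
  PointsCB    = c(19, 12, 40, 18, 8, 24, 22, 24, 11, 14)
)

# Response: points scored by Wade and Bosh together.
games$PointsW.B <- games$PointsDW + games$PointsCB

fit <- lm(PointsW.B ~ PointsLJ + ReboundsLJ + AssistsLJ + TurnoversLJ,
          data = games)
fit_summary <- summary(fit)
print(fit_summary)

# Overall F-test p-value, recomputed from the stored F statistic.
f <- fit_summary$fstatistic
f_p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
print(f_p_value)
```

`summary(fit)` prints the same blocks shown above (residuals, coefficient table, R-squared and F-statistic), so the full 101-game data frame can be dropped in without changing the rest of the code.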

    2) Linear relation between the sum of the points scored by the Big Three and Miami's winning margin

    Predictor: sum of the points scored by the Big Three
    Response: Miami's winning margin

    Call:
    lm(formula = Margin ~ SumBigThree)

    Residuals:
        Min      1Q  Median      3Q     Max
    -44.173  -6.953   1.708   8.149  29.692

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  5.71383    7.38270   0.774    0.441
    SumBigThree  0.03393    0.11871   0.286    0.776

    Residual standard error: 12.8 on 99 degrees of freedom
    Multiple R-squared:  0.0008243,    Adjusted R-squared:  -0.009268
    F-statistic: 0.08167 on 1 and 99 DF,  p-value: 0.7756

    R-squared is very low and the p-value is high, so we find no evidence of a linear relationship between the sum of the points scored by the Big Three and Miami's winning margin. This result is not surprising: the fact that these players score a lot of points in a single game tells us nothing about the winning margin, since high-scoring games are often decided by a small margin even when the Big Three perform exceptionally.
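The low R-squared and the high p-value are two views of the same quantity: for a model with k predictors, the overall F statistic is F = (R² / k) / ((1 - R²) / df), where df is the residual degrees of freedom. The sketch below checks this against the values printed for models 1 and 2; `f_from_r2` is a hypothetical helper written for this example.

```r
# Overall F statistic of a linear model recovered from its R-squared.
f_from_r2 <- function(r2, k, df_resid) {
  (r2 / k) / ((1 - r2) / df_resid)
}

# Model 1: four predictors, 96 residual degrees of freedom.
f_model1 <- f_from_r2(0.06077, k = 4, df_resid = 96)
p_model1 <- pf(f_model1, 4, 96, lower.tail = FALSE)

# Model 2: one predictor, 99 residual degrees of freedom.
f_model2 <- f_from_r2(0.0008243, k = 1, df_resid = 99)
p_model2 <- pf(f_model2, 1, 99, lower.tail = FALSE)

print(round(c(F1 = f_model1, p1 = p_model1, F2 = f_model2, p2 = p_model2), 4))
```

Up to rounding of the reported R-squared values, the results should match the F-statistics and p-values in the two summaries, so an R-squared close to zero necessarily comes with a non-significant overall test at this sample size.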

    3) Linear relation between the points scored by James, Wade and Bosh separately and Miami's winning margin

    Predictors: points scored by James, Wade and Bosh separately
    Response: Miami's winning margin

    Call:
    lm(formula = Margin ~ PointsLJ + PointsDW + PointsCB)

    Residuals:
        Min      1Q  Median      3Q     Max
    -46.869  -5.634   0.513   7.411  28.562

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  9.38538    7.55921   1.242   0.2174
    PointsLJ    -0.25410    0.21462  -1.184   0.2393
    PointsDW     0.33083    0.18221   1.816   0.0725 .
    PointsCB    -0.08321    0.20807  -0.400   0.6901
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 12.63 on 97 degrees of freedom
    Multiple R-squared:  0.04633,    Adjusted R-squared:  0.01684
    F-statistic: 1.571 on 3 and 97 DF,  p-value: 0.2014

    Only PointsDW comes close to significance (p = 0.0725, significant at the 0.1 level but not at 0.05), so we remove the other predictors from the regression model.

    Call:
    lm(formula = Margin ~ PointsDW)

    Residuals:
        Min      1Q  Median      3Q     Max
    -43.697  -6.667   0.363   7.686  30.626

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   1.5226     3.7251   0.409   0.6836
    PointsDW      0.3234     0.1809   1.787   0.0769 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 12.6 on 99 degrees of freedom
    Multiple R-squared:  0.03126,    Adjusted R-squared:  0.02148
    F-statistic: 3.195 on 1 and 99 DF,  p-value: 0.07694

    This result is not surprising either: James is a very consistent player whose performances are always very good, while Wade is rather irregular because of physical problems. Miami benefits when Wade scores a lot of points, so the winning margin seems to be connected to the points he scores.

    At this point we use a scatter plot to assess how good the model is, and we check the normality and the homoscedasticity of the residuals.
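A residual check of this kind could look like the sketch below. It rests on several assumptions: the data frame `games` holds only the first ten games of the dataset table, and the Shapiro-Wilk test is added here as a numerical complement to the plots, since the report mentions only graphical checks.

```r
# Margin and PointsDW for the first ten games (illustrative subset only).
games <- data.frame(
  Margin   = c(13, -20, 3, 25, 30, 6, -18, 3, -7, -5),
  PointsDW = c(29, 15, 14, 22, 22, 9, 8, 19, 6, 16)
)

fit <- lm(Margin ~ PointsDW, data = games)
res <- resid(fit)

# Numerical normality check on the residuals (not mentioned in the report).
sw <- shapiro.test(res)
print(sw)

# Graphical checks, drawn only in an interactive session.
if (interactive()) {
  plot(games$PointsDW, games$Margin)   # scatter plot with fitted line
  abline(fit)
  plot(fitted(fit), res)               # residuals vs fitted: look for a funnel shape
  abline(h = 0, lty = 2)
  qqnorm(res)                          # normal Q-Q plot of the residuals
  qqline(res)
}
```

A roughly constant vertical spread in the residuals-versus-fitted plot supports homoscedasticity, and points close to the line in the Q-Q plot support normality.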

    Among the models considered, this one works best, but it has some problems. The residuals seem normal and homoscedastic, but R-squared is low (about 0.03), so even the points scored by Wade explain very little of Miami's winning margin.

    4) Linear relation between Miami points and Big Three points

    Predictor: Big Three points in a single game
    Response: Miami points in a single game

    Call:
    lm(formula = PointsMIA ~ SumBigThree)

    Residuals:
         Min       1Q   Median       3Q      Max
    -18.7926  -6.1266  -0.1266   4.2074  28.8734

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 82.63700    4.80426  17.201  < 2e-16 ***
    SumBigThree  0.35084    0.07725   4.542 1.58e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 8.329 on 99 degrees of freedom
    Multiple R-squared:  0.1724,    Adjusted R-squared:  0.1641
    F-statistic: 20.63 on 1 and 99 DF,  p-value: 1.576e-05

    In this case too, although the predictor is significant (low p-value) and the residuals seem homoscedastic and normal, R-squared is low: the sum of the Big Three's points explains only about 17% of the variability of Miami's points, and most of the variability remains in the scatter around the regression line, as the scatter plot shows. The linear relationship between the sum of the Big Three's points and Miami's points is therefore weak. The explanation is logical: although these three champions are a big advantage for the Miami Heat, teamwork is a key aspect of basketball. All these negative results lead us to state that there is no statistical evidence that these three players are fundamental for the Miami Heat. Wins cannot be explained only by the performances of LeBron James, Wade and Bosh: even a battleship like the Miami Heat has to rely on the whole team.

    Luca Bazzucchi, Filippo Campolmi