Metastatistical Extreme Value distributions

XXXV Convegno Nazionale di Idraulica e Costruzioni Idrauliche, Bologna, 14-16 September 2016. The Metastatistical Extreme Value Distribution. Statistical Methods for Hydrological Applications. Enrico Zorzetto (1), Gianluca Botter (2), Marco Marani (1,2,*). (1) Earth and Ocean Science Division, Duke University; (2) DICEA, Università di Padova. * [email protected]

Transcript of Metastatistical Extreme Value distributions

  • XXXV Convegno Nazionale di Idraulica e Costruzioni Idrauliche

    Bologna, 14-16 September 2016

    The Metastatistical Extreme Value Distribution

    Statistical Methods for Hydrological Applications

    Enrico Zorzetto (1), Gianluca Botter (2), Marco Marani (1,2,*)

    (1) Earth and Ocean Science Division, Duke University

    (2) DICEA, Università di Padova

    * [email protected]

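  • Classical Extreme Value Theory (EVT) [Fisher-Tippett-Gnedenko, 1928-1943]

    Block maxima: $Y_n = \max\{x_1, \dots, x_n\}$, the maximum of each n-event block

    For i.i.d. $x_i$ with cdf $F(x)$: $H_n(x) = P(Y_n \le x) = F(x)^n$

    Three-Type Theorem:

    - As $n \to \infty$, after renormalization, there are 3 possible asymptotic distributions, summarized by the GEV (e.g. Von Mises, 1936):

    $H(x) = \exp\left\{-\left[1 + \frac{\xi}{\psi}(x - \mu)\right]^{-1/\xi}\right\}$ (location $\mu$, scale $\psi$, shape $\xi$)

    [Figure: daily rainfall h [mm], 1936-1942, with the maximum of each n-event block marked]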
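
    The block-maxima relation $H_n(x) = F(x)^n$ and the GEV limit can be checked numerically. The following is a minimal sketch, not the authors' code: the Weibull parent, the parameter values (n = 100, C = 8, w = 0.8), and the function names are all illustrative assumptions.

```python
import numpy as np

def weibull_cdf(x, C, w):
    """Weibull cdf F(x) = 1 - exp(-(x/C)^w), used here as an example parent."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.exp(-(x / C) ** w)

def block_max_cdf(x, n, C, w):
    """Exact cdf of the maximum of n i.i.d. events: H_n(x) = F(x)^n."""
    return weibull_cdf(x, C, w) ** n

def gev_cdf(x, mu, psi, xi):
    """GEV cdf exp{-[1 + xi (x - mu) / psi]^(-1/xi)}, for xi != 0."""
    x = np.asarray(x, dtype=float)
    t = 1.0 + xi * (x - mu) / psi
    # Outside the support the cdf is 0 (xi > 0) or 1 (xi < 0).
    outside = 0.0 if xi > 0 else 1.0
    safe_t = np.where(t > 0, t, 1.0)
    return np.where(t > 0, np.exp(-safe_t ** (-1.0 / xi)), outside)

# Monte Carlo check of H_n(x) = F(x)^n with assumed, illustrative values.
rng = np.random.default_rng(0)
n, C, w = 100, 8.0, 0.8
maxima = (C * rng.weibull(w, size=(20000, n))).max(axis=1)
x0 = 60.0
empirical = float(np.mean(maxima <= x0))
exact = float(block_max_cdf(x0, n, C, w))
```

    The simulated frequency `empirical` should agree with `exact` up to sampling noise; fitting `gev_cdf` to the simulated maxima is left out of the sketch.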

  • Marani and Ignaccolo, AWR, 2015

    Weibull-distributed, synthetic, daily rainfall data

    Number of events/year and Weibull parameters from Padova (Italy)

    GEV fitted on 30-year windows

  • Considerations on the validity of the classical EVT

    - Incomplete convergence to the limiting distribution: the number of events per block n is finite

  • A Metastatistical Extreme Value distribution (MEV)

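    The block-maxima distribution, for i.i.d. $x_i$:

    $H_n(x) = F(x; \theta)^n$

    $F(x; \theta)$ = cdf of ordinary events; $\theta$ = parameters of the ordinary-event distribution

    Expected block-maxima distribution, compounding stochastic $n$ and $\theta$:

    $\zeta(x) = \sum_{n} \int_{\Omega_\theta} F(x; \theta)^n \, g(n, \theta) \, d\theta$

    Marani and Ignaccolo, AWR, 2015; Zorzetto et al., GRL, 2016

    $g(n, \theta)$ = joint probability distribution of the parameters.

    The expectation is approximated with sample averages.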

  • A Metastatistical Extreme Value distribution (MEV)

    Marani and Ignaccolo, AWR 2015; Zorzetto et al., GRL 2016

    Approximating expectations with sample averages gives the MEV:

    $\zeta(x) \simeq \frac{1}{T} \sum_{j=1}^{T} F(x; \theta_j)^{n_j}$

    T = number of years over which $n$ and $\theta$ are estimated
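
    The sample-average form of the MEV, $\zeta(x) \simeq \frac{1}{T}\sum_{j} F(x; \theta_j)^{n_j}$, translates almost directly into code. This is a minimal sketch for any parent cdf; the function names, the exponential parent, and the yearly values are hypothetical and chosen only for illustration.

```python
import numpy as np

def mev_cdf(x, cdf, thetas, ns):
    """MEV cdf as a sample average: (1/T) * sum_j cdf(x, *theta_j) ** n_j."""
    x = np.asarray(x, dtype=float)
    terms = [cdf(x, *theta) ** n for theta, n in zip(thetas, ns)]
    return np.mean(terms, axis=0)

def exponential_cdf(x, scale):
    """Example parent distribution (an assumption, not the slides' choice)."""
    return 1.0 - np.exp(-np.asarray(x, dtype=float) / scale)

# Three hypothetical years: one parameter tuple and one event count per year.
thetas = [(9.0,), (11.0,), (10.0,)]
ns = [90, 110, 100]
zeta = mev_cdf(np.array([40.0, 80.0]), exponential_cdf, thetas, ns)
```

    With a single year (T = 1) the function reduces to the ordinary block-maxima cdf $F(x)^n$, which is a convenient sanity check.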

  • MEV distribution conceptual interpretation

    Zorzetto et al., GRL 2016

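  • A choice for F(x): the distribution of daily ordinary rainfall

    $h = k \, q \, m$

    $F(h) = 1 - \exp\left[-\left(\frac{h}{C}\right)^{w}\right]$ (Weibull parent distribution, scale $C$ and shape $w$)

    $k$ = precipitation efficiency

    $q$ = specific humidity

    $m$ = advected mass

    [Wilson and Toumi, 2005]

    - Simple two-layer atmospheric model

    - Temporal average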

  • MEV-Weibull distribution

    Marani and Ignaccolo, AWR, 2015; Zorzetto et al., GRL, 2016

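    The MEV expression:

    $\zeta(x) \simeq \frac{1}{T} \sum_{j=1}^{T} F(x; \theta_j)^{n_j}$

    T = number of sub-periods over which $n$ and $\theta$ are estimated

    In the Weibull case this becomes:

    $\zeta(x) \simeq \frac{1}{T} \sum_{j=1}^{T} \left[1 - \exp\left(-\left(\frac{x}{C_j}\right)^{w_j}\right)\right]^{n_j}$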
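
    For the Weibull case, the MEV cdf has no closed-form inverse when T > 1, so a quantile for return time Tr has to be found numerically. The sketch below uses plain bisection on $\zeta(x) = 1 - 1/T_r$; the function names, the search bracket, and the yearly values of C, w, and n are assumptions made for illustration.

```python
import numpy as np

def mev_weibull_cdf(x, C, w, n):
    """zeta(x) = (1/T) sum_j [1 - exp(-(x/C_j)^w_j)]^n_j, for a scalar x."""
    C, w, n = (np.asarray(a, dtype=float) for a in (C, w, n))
    return float(np.mean((1.0 - np.exp(-(x / C) ** w)) ** n))

def mev_weibull_quantile(Tr, C, w, n, lo=0.0, hi=1e4, tol=1e-8):
    """Solve zeta(x) = 1 - 1/Tr by bisection (zeta is increasing in x)."""
    target = 1.0 - 1.0 / Tr
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mev_weibull_cdf(mid, C, w, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical yearly parameters (scale C in mm, shape w, events per year n).
C = [7.5, 8.0, 8.5]
w = [0.75, 0.8, 0.85]
n = [95, 100, 105]
x100 = mev_weibull_quantile(100.0, C, w, n)
```

    The bracket `[0, 1e4]` is assumed wide enough for daily rainfall in mm; a different variable or unit would need a different upper bound.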

  • Marani and Ignaccolo, AWR, 2015

    Weibull-distributed synthetic data

    GEV and MEV fitted on 30-year windows

    Panels: n random, C and w constant; n, C, and w constant; n constant, C and w random

  • How about reality?

    36 daily rainfall time series, 106-275 years of daily observations (mean length = 135 yrs), less than 5% missing data

    [Map of station locations. Labels: Oxford, Sheffield, Hoofddorp, Putten, Zurich, Heerde, S. Bernard, Melbourne, Milano, Padova, Bologna, Cape Town, San Francisco, Roosevelt, Asheville, Philadelphia, Kingston, Albany, Dublin, Zagreb, Worcester, Sydney]

  • Method of analysis

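    Bootstrap: reshuffling of daily data, preserving (1) the yearly number of events and (2) the observed values (i.e. the pdfs)

    This eliminates correlation and non-stationarity while preserving the true (unknown) distribution of the parameters and of the number of wet days.

    Fit on a sample of size s; test on the remaining data. Non-dimensional root mean square error:

    $\varepsilon = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left(\frac{\hat{x}_j - x_{\mathrm{obs},j}}{x_{\mathrm{obs},j}}\right)^2}$

    which is studied as a function of the sample size s ($\hat{x}_j$ = estimated quantile, $x_{\mathrm{obs},j}$ = observed quantile, $N$ = number of estimates compared).

    [Figure: original time series and randomly reshuffled time series over T years; axes h [mm] versus t [days]]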
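
    The reshuffling step and the non-dimensional RMSE can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: it assumes the record is stored as one array of wet-day amounts per year, and that both the pooled values and the set of yearly counts are permuted.

```python
import numpy as np

def reshuffle(years, rng):
    """Reshuffle daily values across years.

    The pooled observed values and the set of yearly event counts are both
    preserved; only their arrangement in time changes.
    """
    counts = rng.permutation([len(y) for y in years])
    pooled = rng.permutation(np.concatenate(years))
    return np.split(pooled, np.cumsum(counts)[:-1])

def nd_rmse(estimated, observed):
    """Non-dimensional RMSE: sqrt(mean(((est - obs) / obs)^2))."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean(((est - obs) / obs) ** 2)))

# Hypothetical three-year record with 90, 100 and 110 wet days.
rng = np.random.default_rng(1)
years = [8.0 * rng.weibull(0.8, size=k) for k in (90, 100, 110)]
shuffled = reshuffle(years, rng)
```

    A calibration sample of size s would then be the first s reshuffled years, with the remaining years kept as the independent test sample.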

  • Ratio of MEV to GEV estimation errors (using LMOM; use of ML or POT gives the same results)

    NOAA-NCDC worldwide dataset

    Zorzetto et al., GRL, 2016

  • Estimation error as a function of Tr/(sample size): MEV vs. GEV (LMOM)

    Zorzetto et al., GRL, 2016

    Return time/sample size

    MEV error is about 50% of GEV error

  • Conclusions

    MEV outperforms classical EV distributions:

    - Reliable assessment for high quantiles and small samples (50% improvement over GEV)

    - Better use of the available daily data

    - Removal of the asymptotic hypothesis

    Future:

    1. MEV is a general approach (floods, wind, storm surges ...)

    2. MEV is arguably suited to tackle non-stationarity

  • Thank you for your attention

  • Some thoughts on non-stationarity

    Bologna (Italy), original 180-year time series

    Sliding and overlapping windows analysis

    GEV and POT estimated quantiles show higher variance

    MEV shows a positive trend in estimated quantiles, due to trends in the parameters of the Weibull distribution

    [Figure: estimated quantile for Tr = 100 years versus i-th temporal window]

  • An interesting observation: GEV performs better if calibration data=testing data

  • Daily rainfall for Tr = 100 years from TRMM observations (17 yrs)

  • Estimation error as a function of Tr/(sample size)

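  • Distribution of the error computed over 1000 random reshufflings, for all the analyzed datasets. Quantiles (Tr = 100 yrs) estimated by GEV, POT, and MEV calibrated over 30-year samples

    Error distribution

    $\varepsilon = \frac{\hat{x} - x_{\mathrm{obs}}}{x_{\mathrm{obs}}}$

    $x_{\mathrm{obs}} = F_{\mathrm{obs}}^{-1}\left(1 - \frac{1}{T_r}\right)$, from the observational (independent) sample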
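
    Computing the error against an independent sample requires an empirical return time for each observed annual maximum. The sketch below assumes the Weibull plotting position, $T_{r,i} = (N + 1)/(N + 1 - i)$ for the i-th smallest of N maxima; the slides do not state which plotting position was used, so this choice and the function names are assumptions.

```python
import numpy as np

def empirical_return_periods(annual_maxima):
    """Sort annual maxima and attach Weibull plotting-position return times."""
    x = np.sort(np.asarray(annual_maxima, dtype=float))
    N = x.size
    ranks = np.arange(1, N + 1)          # rank 1 = smallest maximum
    Tr = (N + 1) / (N + 1 - ranks)       # non-exceedance frequency i / (N + 1)
    return x, Tr

def relative_error(x_est, x_obs):
    """Signed relative error of an estimated quantile."""
    return (x_est - x_obs) / x_obs

# Hypothetical test sample of nine annual maxima [mm].
x_sorted, Tr = empirical_return_periods([5.0, 3.0, 9.0, 1.0, 7.0, 2.0, 8.0, 4.0, 6.0])
```

    Each model quantile at return time `Tr[i]` can then be compared with `x_sorted[i]` through `relative_error`.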

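  • Distribution of the error computed over 1000 random generations, for all the analyzed datasets. Theoretical quantiles (Tr = 100 yrs) estimated by GEV, POT, and MEV calibrated over 30-year samples

    Error distribution

    $\varepsilon = \frac{\hat{x} - x_{\mathrm{obs}}}{x_{\mathrm{obs}}}$

    $x_{\mathrm{obs}} = F_{\mathrm{obs}}^{-1}\left(1 - \frac{1}{T_r}\right)$, from the observational (independent) sample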

  • Global QQ-Plots

    Sample size = 45 years, 100 random reshufflings

  • Global QQ plots

    GEV/ POT are a good fit for the calibration sample but they fail in describing the stochastic process from which the sample has been generated

    MEV allows a better description of the underlying process; less variance in high quantile estimation


  • The MEV domain

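  • The MEV distribution

    [Figure: daily rainfall h [mm] versus t, 1982-1986, with the number of events per year $n_1 = 97$, $n_2 = 105$, $n_3 = 89$, $n_4 = 94$, $n_5 = 114$ and the yearly parameter pairs $(C_1, w_1)$, $(C_2, w_2)$, $(C_3, w_3)$, $(C_4, w_4)$, $(C_5, w_5)$]

    1. Sampling n from the distribution p(n|C,w)

    2. Fit Weibull to the single years: $C_j$, $w_j$

    $F(x; C, w) = 1 - \exp\left[-\left(\frac{x}{C}\right)^{w}\right]$

    Assuming Weibull as the distribution of daily rainfall. Fit performed using Probability Weighted Moments (Greenwood et al., 1979)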
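
    A Probability Weighted Moments fit of the Weibull parameters to one year of wet-day amounts can be sketched as below. It relies on two standard Weibull identities, $M_0 = E[X] = C\,\Gamma(1 + 1/w)$ and $M_1 = E[X(1 - F(X))] = M_0 / 2^{1 + 1/w}$, which give $w = \ln 2 / \ln(M_0 / (2 M_1))$. The function name and the synthetic sample are assumptions; this is not the authors' code.

```python
import math
import numpy as np

def weibull_pwm_fit(sample):
    """Estimate Weibull scale C and shape w from probability weighted moments."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    m0 = x.mean()
    # Unbiased estimate of M1 = E[X (1 - F(X))], ranks i = 1..n ascending.
    i = np.arange(1, n + 1)
    m1 = np.sum(x * (n - i) / (n - 1)) / n
    w = math.log(2.0) / math.log(m0 / (2.0 * m1))
    C = m0 / math.gamma(1.0 + 1.0 / w)
    return C, w

# Synthetic check with assumed true values C = 8, w = 0.8.
rng = np.random.default_rng(2)
sample = 8.0 * rng.weibull(0.8, size=200000)
C_hat, w_hat = weibull_pwm_fit(sample)
```

    In the MEV setting the fit would be repeated on each year (or sub-period) separately, yielding one pair $(C_j, w_j)$ per year alongside the event count $n_j$.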

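  • The Metastatistical Extreme Value Distribution

    $\zeta(x) = \frac{1}{T} \sum_{j=1}^{T} F(x; C_j, w_j)^{n_j}$

    The Weibull parameters $C$ and $w$, and the number of events $n$, are random variables themselves

    The cdf of the annual maximum is the mean over all their possible realizations

    [Figure: frequency densities of $n$, $C$, and $w$]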

  • Non-stationary analysis

    Bologna (Italy), randomly reshuffled time series

    Sliding and overlapping windows analysis

    GEV and POT estimated quantiles show oscillations with the same amplitude, due to the variance in the parameter estimates

    In the case of MEV the variance of estimated quantiles is much smaller: stationary behaviour

    [Figure: estimated quantile for Tr = 100 years versus i-th window]

  • Daily rainfall observations in Padova 1725-2015

    The Padova observatory

  • (Marani and Zanetti, 2015)

    The Padova daily precipitation time series 1725-2006

  • (Marani and Ignaccolo, 2015)

    Padova series:

    Wide fluctuations in pdf parameters and in number of events.

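  • Peak Over Threshold Method (POT) [Balkema, De Haan & Pickands, 1975; Davison and Smith, 1990]

    For a fixed threshold $q$, exceedances: $y_i = x_i - q$ for $x_i > q$

    Exceedance arrivals: Poisson

    Distribution of excesses: Generalized Pareto

    $P(Y < x) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \left[1 - \left(1 + \frac{\xi (x - q)}{\sigma}\right)^{-1/\xi}\right]^k = \exp\left\{-\lambda \left[1 + \frac{\xi (x - q)}{\sigma}\right]^{-1/\xi}\right\}$

    ($Y$ = block maximum, $\lambda$ = mean number of exceedances per block, $\sigma$ and $\xi$ = Generalized Pareto scale and shape)

    Advantages:

    1. Better description of the tail

    2. Consistent with GEV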
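
    The consistency of POT with the GEV follows from compounding a Poisson number of exceedances with Generalized Pareto excesses: the series $\sum_k P(N = k)\,G(x - q)^k$ collapses to a GEV-type exponential. The sketch below checks this numerically; the parameter values and function names are illustrative assumptions, and the sum is taken from k = 0 so that blocks with no exceedance are included.

```python
import math

def gpd_cdf(y, sigma, xi):
    """Generalized Pareto cdf of an excess y >= 0, for xi != 0."""
    return 1.0 - (1.0 + xi * y / sigma) ** (-1.0 / xi)

def pot_max_cdf_series(x, q, lam, sigma, xi, kmax=200):
    """Sum over k of Poisson(k; lam) * G(x - q)^k, truncated at kmax."""
    g = gpd_cdf(x - q, sigma, xi)
    term = math.exp(-lam)        # k = 0 term: no exceedance in the block
    total = 0.0
    for k in range(kmax + 1):
        total += term
        term *= lam * g / (k + 1)
    return total

def pot_max_cdf_closed(x, q, lam, sigma, xi):
    """Closed form exp{-lam [1 + xi (x - q) / sigma]^(-1/xi)}."""
    return math.exp(-lam * (1.0 + xi * (x - q) / sigma) ** (-1.0 / xi))

# Assumed values: threshold 20 mm, 5 exceedances per year on average.
series = pot_max_cdf_series(80.0, 20.0, 5.0, 10.0, 0.1)
closed = pot_max_cdf_closed(80.0, 20.0, 5.0, 10.0, 0.1)
```

    The two values agree to floating-point precision, which is the sense in which the POT model reproduces a GEV for the block maximum above the threshold.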

  • Performance when testing sample = calibration sample

  • Ratio of MEV estimation error to GEV-POT error