
Sample size calculation for a clinical trial

A simulation-based approach

Lecture II, 19 April 2016

Vincenzo Bagnardi

Dipartimento di Statistica e Metodi Quantitativi

Università degli Studi di Milano-Bicocca

Superiority trial

Objective of a superiority trial: to demonstrate that one treatment is superior to another.

p2: proportion of responses observed in the experimental group
p1: proportion of responses observed in the control group

Group          Responses   P
Experimental   78/100      78%
Control        62/100      62%

Sample size: 200

Observed difference: 78% - 62% = 16%


[Figure: axis of the difference between proportions, δ = p2 - p1, with ticks from -20% to 20% and δ = 0 marked. D = 16% is the difference observed in the study (point estimate).]


p-value and CI

•  If the 95% CI of the estimated effect includes the value under the null hypothesis (usually no effect), the p-value will be greater than 0.05.

•  If the 95% CI of the estimated effect does not include the value under the null hypothesis (usually no effect), the p-value will be less than 0.05.

•  The width of the CI also depends on the sample size. For a given estimated effect, the larger the sample size of the study, the narrower the 95% CI.

Null effect (H0)


[Figure: the same axis (δ = 0 marked, ticks from -20% to 20%) with the 95% confidence interval, from 3% to 28% (interval estimate).]



Group          Responses   P
Experimental   39/50       78%
Control        31/50       62%

Sample size: 100

Observed difference: 78% - 62% = 16%




[Figure: the same axis (δ = 0 marked, ticks from -20% to 20%) with the 95% confidence interval, from -2% to 33% (interval estimate).]



When comparing proportions, the width of the confidence interval is a function of the sample size. The larger the sample, the narrower the interval and the greater the probability of rejecting H0 when it is false.
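The two interval estimates quoted on the preceding slides can be approximated with a short Python sketch. It assumes a Wald (normal-approximation) 95% interval for the difference between two proportions; the slides do not say which interval method was used, and the function name `wald_ci_diff` is illustrative.

```python
from math import sqrt

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_ci_diff(x2, n2, x1, n1, z=Z_975):
    """Wald CI for p2 - p1, given responders x and group sizes n."""
    p2, p1 = x2 / n2, x1 / n1
    se = sqrt(p2 * (1 - p2) / n2 + p1 * (1 - p1) / n1)
    diff = p2 - p1
    return diff - z * se, diff + z * se


# Same observed proportions (78% vs 62%), two different sample sizes
ci_200 = wald_ci_diff(78, 100, 62, 100)  # total N = 200
ci_100 = wald_ci_diff(39, 50, 31, 50)    # total N = 100

for label, (lo, hi) in (("N=200", ci_200), ("N=100", ci_100)):
    print(f"{label}: {100 * lo:.1f}% to {100 * hi:.1f}%")
```

Under this assumption the intervals come out close to those reported on the slides: with N=200 the interval excludes 0, while with N=100 the same observed difference gives an interval that includes 0.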

Does accepting H0 imply that the treatments are equivalent?

The questions "is there a difference between the two treatments?" and "do the two treatments have the same effect?" mean the same thing in everyday language. They are simply phrased differently. A negative answer to one of them implies an affirmative answer to the other.

In statistical language, however, this is never true.

Failing to demonstrate a significant difference in outcomes between two groups (accepting H0) does not imply that the two groups being compared are equal (H0 is true).

Equivalence trials

It makes sense to conduct an equivalence or non-inferiority trial if the new therapy is simpler to administer, is associated with fewer side effects, or is less expensive, even if it is not expected to have a better clinical effect than the reference therapy.

The aim of an equivalence trial is to establish that the therapies being compared have identical effects.

Complete equivalence therefore corresponds to a value of δ equal to 0.

Sample size

[Figure: relationship between the sample size N and δ (%), with δ from 0 to 25.]


Demonstrating complete equivalence (δ = 0) would require an infinite sample size.

For this reason, equivalence trials assess whether the difference between the effects of two treatments lies within a specified interval from -δ to +δ.


Equivalence

[Figure: equivalence range, from -δ to +δ, around δ = 0.]

Suitable for studies that aim to demonstrate that a generic drug has pharmacokinetics similar to those of the reference brand-name drug (bioequivalence studies). Less suitable for studies comparing the efficacy of two therapies.

Non-inferiority trials

Non-inferiority

[Figure: non-inferiority range, bounded below by -δ, with δ = 0 marked.]

Suitable for studies that aim to demonstrate that an experimental treatment is not worse (from a clinical point of view) than a reference treatment.


It therefore becomes crucial, in this case too, to specify δ.

To limit as far as possible the introduction of therapies that are inferior to the reference therapy, δ should be smaller than the smallest difference considered clinically meaningful.

As a general rule, δ should be less than half the value that would be used in a superiority trial.

It has also been suggested to use (where the figure is available in the literature) half of the difference observed between the reference drug and placebo.


For this reason, the sample size of a non-inferiority trial is much larger (four to five times) than that of a corresponding superiority trial.
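A rough Python sketch of why halving δ inflates the sample size so much. It assumes the usual normal-approximation formula for comparing two proportions, n per group = (z_{α/2} + z_β)² [p1(1 - p1) + p2(1 - p2)] / δ², which is not given on the slides; the response proportions (60% and 80%) are hypothetical, and the same two-sided α is used for both designs for simplicity.

```python
from statistics import NormalDist


def n_per_group(delta, p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for two proportions."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * variance / delta ** 2


# Hypothetical scenario, not taken from the slides
# Superiority: control 60%, experimental 80%, so delta = 20%
n_sup = n_per_group(delta=0.20, p1=0.60, p2=0.80)
# Non-inferiority: both arms truly at 60%, margin delta = 10%
n_ni = n_per_group(delta=0.10, p1=0.60, p2=0.60)

print(round(n_sup), round(n_ni), round(n_ni / n_sup, 1))
```

Because δ enters the formula squared, halving it multiplies the sample size by four; the remaining increase comes from the variance term. In this hypothetical scenario the ratio is 4.8.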


[Figure: relationship between the sample size N and δ (%), with δ from 0 to 25. With δ = 20%, as assumed in a superiority trial, N = 100.]



[Figure: relationship between the sample size N and δ (%), with δ from 0 to 25. With δ = 10%, as assumed in a non-inferiority trial, N = 500.]


[Figure: Example of design and interpretation of noninferiority trials. Antman E M, Circulation 2001;103:e101-e104. Copyright © American Heart Association.]

Interpreting a noninferiority trial as a superiority trial

If the 95% confidence interval for the treatment effect not only lies entirely below δ but also below zero then there is evidence of superiority in terms of statistical significance at the 5% level (P < 0.05).

There is no multiplicity argument that affects this interpretation because, in statistical terms, it corresponds to a simple closed test procedure.

Usually this demonstration of a benefit is sufficient on its own, provided the safety profiles of the new agent and the comparator are similar. When there is an increase in adverse events, however, it is important to estimate the size of the effect to evaluate whether it is sufficient in clinical terms to outweigh the adverse effects.

Exercise: simulation for the sample size of a non-inferiority trial

http://www.stat.ubc.ca/~rollin/stats/ssize/
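One possible way to approach the exercise in Python is sketched below. Everything in the scenario is an assumption: a response proportion of 35% in both arms and a margin of 10% (borrowed from the nomogram calculation on the next slide), a higher response being better, and non-inferiority declared when the lower limit of the Wald 95% CI for the difference (new minus reference) lies above -δ. The function name `simulate_ni_power` is illustrative.

```python
import random
from math import sqrt


def simulate_ni_power(n, p_ref, p_new, margin, n_sim=2000, z=1.959964, seed=1):
    """Share of simulated trials in which non-inferiority is declared."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_sim):
        # Simulate the number of responders in each arm
        x_ref = sum(rng.random() < p_ref for _ in range(n))
        x_new = sum(rng.random() < p_new for _ in range(n))
        ph_ref, ph_new = x_ref / n, x_new / n
        se = sqrt(ph_ref * (1 - ph_ref) / n + ph_new * (1 - ph_new) / n)
        lower = (ph_new - ph_ref) - z * se
        if lower > -margin:      # whole CI lies above -margin
            successes += 1
    return successes / n_sim


ni_power = simulate_ni_power(n=357, p_ref=0.35, p_new=0.35, margin=0.10)
print(f"Estimated power with 357 patients per group: {ni_power:.2f}")
```

Repeating the call for several values of `n` and keeping the smallest one whose estimated power reaches the target (for example 80%) gives a simulation-based sample size.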

Altman's nomogram

0.1/sqrt(0.35*0.65) ≈ 0.21 (standardized difference)

Sample size for survival analysis

Endpoint

A key question is what type of variable (endpoint) the clinician/researcher has in mind.

In a prospective randomized clinical trial in oncology, the endpoint is usually the time elapsed from the beginning of observation (usually the time of randomization) to a specified event (usually death, relapse, or progression of the disease).

Statistical analysis in survival studies

Data are analyzed using survival analysis techniques.

The survival curves are calculated using the Kaplan-Meier

estimator.

The difference between treatments (usually two: standard

treatment and new treatment) is evaluated using the log-rank

test or the Hazard Ratio from the Cox proportional hazard

model.

SAS PROC LIFETEST

Estimator of the survival function (Kaplan-Meier method)

To obtain the survival curves, together with the tables needed to compute them, call the SAS procedure LIFETEST, whose general syntax is:

PROC LIFETEST DATA=dataset-name PLOTS=(S);
  TIME time-variable*status-variable(x);
  STRATA group-variable;
RUN;

PLOTS=(S): requests the plots of the survival curves (S).

x: value of the status variable that marks a censored observation (the status variable is usually coded as 0 -> censored, 1 -> event).

Example: patients with leukemia, time in remission (weeks)

Group 1 (treatment), n=21: 6, 6, 6, 7, 10, 13, 16, 22, 23, 6+, 9+, 10+, 11+, 17+, 19+, 20+, 25+, 32+, 32+, 35+, 35+

Group 2 (placebo), n=21: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

+ denotes a censored observation

PROC LIFETEST DATA=REMISSION PLOTS=(S);
  TIME TIME*STATUS(0);
  STRATA GROUP;
RUN;
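To show what the Kaplan-Meier estimator computes, here is a minimal Python sketch applied to the treatment group of the leukemia example. The function `kaplan_meier` is illustrative and is not what PROC LIFETEST runs internally; observations censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event=1, censored=0)."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - deaths / at_risk   # product-limit update
        curve.append((t, surv))
    return curve


# Group 1 (treatment): remission times in weeks, as listed on the slide
trt_events = [6, 6, 6, 7, 10, 13, 16, 22, 23]
trt_censored = [6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 35, 35]
times = trt_events + trt_censored
events = [1] * len(trt_events) + [0] * len(trt_censored)

km_trt = kaplan_meier(times, events)
for t, s in km_trt:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

The estimate only changes at event times: at week 6, for example, 3 of the 21 patients at risk relapse, so S(6) = 18/21.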

Log-rank test for comparing two survival curves

How can the null hypothesis of equality between the two curves be assessed? By comparing the observed failures with those expected under the null hypothesis of equal survival in the groups (chi-square test).

Test of Equality over Strata

Test        Chi-Square   DF   Pr > Chi-Square
Log-Rank    16.7929      1    <.0001
Wilcoxon    13.4579      1    0.0002
-2Log(LR)   16.5459      1    <.0001

Rank Statistics

GROUP   Log-Rank   Wilcoxon
PBO     10.251     271.00
TRT     -10.251    -271.00
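The observed-versus-expected logic of the log-rank test can be sketched in Python as follows. This is an illustrative implementation (the name `logrank_chi2` is an assumption), using the hypergeometric variance at each event time, applied to the leukemia data listed earlier.

```python
def logrank_chi2(times1, events1, times2, events2):
    """Two-group log-rank chi-square (1 df) and O - E for group 1."""
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group 1
        n2 = sum(1 for u, _, g in data if u >= t and g == 2)  # at risk, group 2
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        d = sum(1 for u, e, _ in data if u == t and e == 1)
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n          # observed minus expected events
        if n > 1:
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var, o_minus_e


pbo_t = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
pbo_e = [1] * 21
trt_t = [6, 6, 6, 7, 10, 13, 16, 22, 23, 6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 35, 35]
trt_e = [1] * 9 + [0] * 12

chi2, o_minus_e = logrank_chi2(pbo_t, pbo_e, trt_t, trt_e)
print(f"chi-square = {chi2:.4f}, O - E (placebo) = {o_minus_e:.3f}")
```

With placebo as the first group, the statistic and the O - E value should agree with the Log-Rank row and the PBO rank statistic in the SAS output above.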

COX model

In its simplest formulation, the model assumes that the hazard function for an individual i with covariate vector Xi = (xi1, xi2, …, xiK) is

$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_K x_{iK})$

The hazard for individual i at time t is the product of two factors:

- an unspecified function λ0(t)

- a linear function of a set of K fixed covariates, which is then exponentiated.

λ0(t) is the hazard function for an individual whose covariates are all equal to 0. It is called the baseline hazard.

COX model: proportional hazard (PH)

The model is called the "proportional hazards" model because the hazard of an individual j is a fixed proportion of the hazard of an individual i (in the ratio of the two hazards, λ0(t) cancels from the numerator and the denominator):

Hazard Ratio (HR)

$HR = \frac{\lambda_j(t)}{\lambda_i(t)} = \exp(\beta_1 (x_{j1} - x_{i1}) + \dots + \beta_K (x_{jK} - x_{iK}))$

The relative effect measure (HR) requires only the estimation of the β coefficients, not of λ0(t).

COX model: SAS PROC PHREG

T = time in remission
X1 = treatment

proc phreg data=remission;
  model time*status(0)=group / risklimits;
run;

(0): the code assigned to censored observations

Example

A clinical trial is to be designed to compare a new form of chemotherapy with a standard for the treatment of patients with metastatic ovarian carcinoma.

The time from randomization to death is the response variable of interest.

As a first step, information is obtained on the survival times, in years, of patients who have received the standard treatment.

This information is usually based on historical data and/or previous studies.

The survival proportion expected at 1 year is approximately 0.25.

The median survival time is 6 months.

Alternative hypothesis (expected benefit)

Based on the results of previous phase II studies, the new treatment is expected to increase the 1-yr survival proportion from 0.25 to 0.40 (H1).

Under the proportional hazard assumption, this information can be used to calculate the value of the hazard ratio (HR) under H1:

$S_N(t) = [S_S(t)]^{HR}$

$HR = \frac{\log S_N(t)}{\log S_S(t)} = \frac{\log S_N(1)}{\log S_S(1)} = \frac{\log(0.40)}{\log(0.25)} = 0.66$
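The hazard ratio under H1 can be checked with a few lines of Python. The variable names are illustrative; the two survival proportions are those stated for this example.

```python
from math import log

s_standard_1yr = 0.25   # expected 1-year survival, standard treatment
s_new_1yr = 0.40        # expected 1-year survival under H1, new treatment

# Under proportional hazards S_N(t) = S_S(t) ** HR, so HR = log S_N / log S_S
hr_h1 = log(s_new_1yr) / log(s_standard_1yr)
print(f"HR under H1 = {hr_h1:.2f}")

# Round trip: raising S_S to the power HR recovers S_N
print(f"S_S ** HR = {s_standard_1yr ** hr_h1:.2f}")
```

Any time point could be used in place of 1 year: under proportional hazards the ratio of the log survival proportions is the same at every t.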

Statistical test

A two-sided p-value less than 0.05 associated with the hazard ratio comparing new vs standard treatment will be used to declare statistical significance.

Sample size calculation

Formulas have been developed for the sample size required to compare survival curves (in terms of either the hazard ratio, assuming proportional hazards, or the log-rank test).

When the trial has an unconventional design and/or when no formula has been developed for the method chosen for the analysis, the sample size calculation can be based on a simulation study.

Simulation

How to define the threshold beyond which the null hypothesis is rejected?

Suppose we do not know the formula for calculating the sample size in the case of comparison of two survival curves.

We could try to define the threshold based on the results of a simulation study.

Simulation: N(per group)=10

•  N=10 patients per group

•  the median survival time for the standard treatment is 6 months

•  the hazard of event in the standard group is constant (i.e. exponential distribution of event times)

•  all patients are followed until death (no censoring)

Simulation of survival data

The survival proportion expected at 1 year is approximately 0.25.

The median survival time is 6 months (0.5 years).

An estimate of the hazard (hS) can be obtained from the median survival time, using the result that the median tm of an exponential distribution with mean 1/h satisfies h = log(2)/tm. Here, hazard = log(2)/0.5 years = 1.39 per year.
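The link between median, hazard and 1-year survival can be verified with a short Python sketch, which also draws exponential survival times much as the SAS data step in these slides does. The seed and the number of draws are arbitrary choices made for illustration.

```python
import random
from math import exp, log
from statistics import median

median_s = 0.5                   # median survival, standard arm (years)
hazard_s = log(2) / median_s     # constant hazard of the exponential model
surv_1yr = exp(-hazard_s * 1.0)  # S(1) = exp(-h * 1)

rng = random.Random(2016)        # fixed seed, chosen arbitrarily
sim_times = [rng.expovariate(hazard_s) for _ in range(10_000)]
sim_median = median(sim_times)

print(f"hazard = {hazard_s:.2f}, S(1 year) = {surv_1yr:.2f}")
print(f"median of simulated times = {sim_median:.2f} years")
```

A median of 0.5 years therefore implies a 1-year survival of exactly 0.25 under the exponential model, consistent with the two figures given for the standard treatment.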


%let median_S=0.5;

data simSurv;
  do i = 1 to 100;
    /* exponential survival time with hazard log(2)/median */
    t=rand("Exponential")/(log(2)/&median_S);
    stato=1;   /* 1 = event: no censoring */
    output;
  end;
run;

proc lifetest data=simSurv plots=s;
  time t*stato(0);
run;

•  the hazard of event in the standard group is constant (i.e. exponential distribution of event times)

•  H0 is true (the effect of the new treatment is similar to the standard: HR=1.0)

•  the hazard in the new treatment group is proportional to the hazard in the standard group (i.e. proportional hazard assumption -> Cox model)


%let N=10;
%let median_S=0.5;
%let HR=1;

data simSurv;
  do treat=1 to 2;
    do i = 1 to &N;
      /* treat=1: standard arm; treat=2: new arm, hazard multiplied by HR */
      if treat=1 then t=rand("Exponential")/(log(2)/&median_S);
      else if treat=2 then t=rand("Exponential")/((log(2)/&median_S)*&HR);
      stato=1;   /* 1 = event: no censoring */
      output;
    end;
  end;
run;

proc lifetest data=simSurv plots=s;
  time t*stato(0);
  strata treat;
run;


Based on these parameters, 500 simulated studies were conducted.

[Figure: survival curves resulting from a simulated study.]

HR NEW vs STANDARD = 1.1 (we know that the two treatments are equal; the observed difference is due to chance)

[Figure: survival curves resulting from another simulated study.]

HR NEW vs STANDARD = 5.6 (we know that the two treatments are equal; the observed difference is due to chance)

H0 is true

[Figure: distribution of the HRs estimated in the 500 simulated studies.]

The threshold to reject the null hypothesis, tolerating a 0.05 probability of a false positive result, is an observed HR below 0.35 (or above 2.85).
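The threshold-finding step under H0 can be sketched in Python as below. One substitution should be noted: the slides estimate the HR with a Cox model, whereas this sketch uses the exponential-model estimate (events divided by total follow-up time in each arm), so its thresholds are only expected to be in the same region as 0.35 and 2.85, not identical. The helper name `simulate_hr` and the seed are illustrative.

```python
import random
from math import log


def simulate_hr(n, median_s, hr, rng):
    """One simulated trial without censoring; returns the estimated HR."""
    h_s = log(2) / median_s
    t_std = [rng.expovariate(h_s) for _ in range(n)]
    t_new = [rng.expovariate(h_s * hr) for _ in range(n)]
    # Exponential estimate of each hazard is events / total time; with n
    # events per arm the ratio new/standard is sum(t_std) / sum(t_new).
    return sum(t_std) / sum(t_new)


rng = random.Random(500)   # arbitrary seed
hrs_h0 = sorted(simulate_hr(10, 0.5, 1.0, rng) for _ in range(500))

# Empirical 2.5th and 97.5th percentiles of the HR distribution under H0
lower_thr = hrs_h0[int(0.025 * len(hrs_h0))]
upper_thr = hrs_h0[int(0.975 * len(hrs_h0))]
print(f"reject H0 if HR < {lower_thr:.2f} or HR > {upper_thr:.2f}")
```

The thresholds are simply the tails of the simulated null distribution: 2.5% of the studies simulated under H0 fall below the lower value and 2.5% above the upper one.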


What happens when H1 is true, that is when the new treatment improves the chance to be alive at one year from 25% to 40% (HR=0.66)?


%let N=10;
%let median_S=0.5;
%let HR=0.66;

data simSurv;
  do treat=1 to 2;
    do i = 1 to &N;
      /* treat=1: standard arm; treat=2: new arm, hazard multiplied by HR */
      if treat=1 then t=rand("Exponential")/(log(2)/&median_S);
      else if treat=2 then t=rand("Exponential")/((log(2)/&median_S)*&HR);
      stato=1;   /* 1 = event: no censoring */
      output;
    end;
  end;
run;

proc lifetest data=simSurv plots=s;
  time t*stato(0);
  strata treat;
run;


[Figure: survival curves resulting from a simulated study.]

HR NEW vs STANDARD = 1.1 (in fact we know that the new treatment is superior to the standard)

[Figure: survival curves resulting from another simulated study.]

HR NEW vs STANDARD = 0.2 (we know that the real HR is 0.66; the observed difference is due to chance)

H1 is true

In only 65/500 (≈13%) of the simulated studies was the HR below the previously defined threshold of 0.35. The study is underpowered: the type II error is high (87%).

H1 is true

Note that the threshold (0.35) is well below the real hazard ratio (0.66).

H0 is true

Now, with N=50, the threshold to reject the null hypothesis, tolerating a 0.05 probability of a false positive result, is 0.66 (similar to the effect hypothesized under H1).

H1 is true

Power is still too low (≈ 50%). With such a sample size, there is a 1 in 2 chance that the study would fail to detect the survival improvement.

H0 is true

Finally, with N=100, the threshold to reject the null hypothesis, tolerating a 0.05 probability of a false positive result, is 0.73 (now it is above the effect hypothesized under H1).

H1 is true

Power is now adequate (≈ 80%). With such a sample size, there is only a 1 in 5 chance that the study would fail to detect the survival improvement.
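The whole two-step procedure (threshold under H0, then power under H1) can be sketched in Python for the three sample sizes discussed. Several things here are assumptions: N is read as patients per group (the slides state this only for N=10), the HR is estimated with the exponential model rather than a Cox model, more simulated studies are run than the 500 used in the slides to reduce Monte Carlo noise, and only the lower rejection threshold is counted. Results should therefore be close to, but not the same as, the figures quoted above.

```python
import random
from math import log


def hr_estimates(n, hr, n_sim, rng, median_s=0.5):
    """Sorted HR estimates (exponential model, no censoring) from n_sim trials."""
    h_s = log(2) / median_s
    out = []
    for _ in range(n_sim):
        total_std = sum(rng.expovariate(h_s) for _ in range(n))
        total_new = sum(rng.expovariate(h_s * hr) for _ in range(n))
        out.append(total_std / total_new)   # hazard_new / hazard_std
    return sorted(out)


def simulated_power(n, hr_h1=0.66, n_sim=2000, seed=66):
    """Return (rejection threshold under H0, power under H1)."""
    rng = random.Random(seed)
    under_h0 = hr_estimates(n, 1.0, n_sim, rng)
    threshold = under_h0[int(0.025 * n_sim)]       # lower 2.5% point under H0
    under_h1 = hr_estimates(n, hr_h1, n_sim, rng)
    power = sum(h < threshold for h in under_h1) / n_sim
    return threshold, power


power_by_n = {n: simulated_power(n) for n in (10, 50, 100)}
for n, (thr, pw) in power_by_n.items():
    print(f"N per group = {n:>3}: threshold = {thr:.2f}, power = {pw:.2f}")
```

As N grows, the null distribution of the estimated HR tightens, the threshold moves towards 1, and a larger share of the studies simulated under H1 falls below it.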

Power curve

[Figure: power plotted against N total.]

Relationship between N (total number of patients) and power of the study, assuming H0: HR=1, H1: HR=0.66, type I error=5%.

Minimal detectable effect size curve

[Figure: hazard ratio plotted against N total.]

Relationship between N (total number of patients) and the minimal detectable HR, assuming H0: HR=1, type I error=5% and type II error=20%.
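Both curves can be approximated without simulation by inverting the formula for the required number of deaths, d = 4 (z_{α/2} + z_β)² / (log HR)², which appears later in these slides. The Python sketch below assumes no censoring, so that the number of deaths equals N total; the function names are illustrative, and the curves on the slides may have been produced differently.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z = NormalDist()


def power_for_n(n_total, hr=0.66, alpha=0.05):
    """Approximate power when all n_total patients are followed to death."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(sqrt(n_total) * abs(log(hr)) / 2 - z_a)


def minimal_detectable_hr(n_total, alpha=0.05, beta=0.20):
    """Smallest benefit (HR < 1) detectable with power 1 - beta."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(1 - beta)
    return exp(-2 * (z_a + z_b) / sqrt(n_total))


for n_total in (20, 100, 200, 400):
    print(n_total, round(power_for_n(n_total), 2),
          round(minimal_detectable_hr(n_total), 2))
```

Power increases with N total, while the minimal detectable HR moves towards 1: larger studies can detect smaller benefits.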

Sample size formula for survival analysis

In a survival study, the occurrence of censoring means that it is not usually possible to measure the actual survival times for all individuals in the study.

However, it is the number of actual deaths that is important in the analysis, rather than the total number of subjects (when the follow-up time is long and/or the hazard of event is very high in the compared groups, the number of events and the number of subjects are nearly the same).

The first step in determining the required number of individuals in a study is to calculate the number of deaths that must be observed (assuming no censoring).


The formulas for sample size calculation are usually based on the assumption of proportional hazards (h).

In the previous example:

$h_N(t) = \psi h_S(t)$

Hazard ratio: $\psi = 0.66$

$\theta = \log(\psi)$

Effect size: $\theta = \log(0.66) = -0.42$


To calculate the number of deaths required in a study comparing the two treatments, we take α = 0.05 and β = 0.20. With these values of α and β, $z_{\alpha/2} = 1.96$ and $z_\beta = 0.84$. The number of deaths required to have an 80% chance of detecting a hazard ratio of 0.66 as significant at the 5% level is then given by:

$d = \frac{4 (z_{\alpha/2} + z_\beta)^2}{\theta^2} = \frac{4 (1.96 + 0.84)^2}{[\log(0.66)]^2} \approx 181$
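The calculation above can be reproduced in Python as a quick check. The function name `required_deaths` is illustrative, and the z values are taken from the normal distribution rather than rounded to two decimals.

```python
from math import log
from statistics import NormalDist


def required_deaths(hr, alpha=0.05, power=0.80):
    """d = 4 (z_{alpha/2} + z_beta)^2 / (log HR)^2, equal allocation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return 4 * (z_a + z_b) ** 2 / log(hr) ** 2


d_exact = required_deaths(0.66)
print(f"required deaths: {d_exact:.1f}")
```

With unrounded z values the result is slightly above the 181 obtained with 1.96 and 0.84; rounding up to the next integer would be the conservative choice.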

Censoring

Calculations such as those used in this example are only going to be of direct use when a study is to be continued until all patients entered into the study have died. In most trials, the analysis will take place before everyone has experienced the endpoint, so that some observations will be censored.


To calculate the actual number of individuals that are required in a survival study, we need to consider the probability of death over the duration of a study.

[Figure: study timeline from time 0 to a+f, with an accrual period (length a) followed by a follow-up period (length f). Patients whose time of death is unknown at the end of the study are censored.]

The probability of death can be taken as

$\Pr(\text{death}) = 1 - \frac{1}{6} [\bar{S}(f) + 4 \bar{S}(0.5a + f) + \bar{S}(a + f)]$

where

$\bar{S}(t) = \frac{S_S(t) + S_N(t)}{2}$

and $S_S(t)$ and $S_N(t)$ are the estimated values of the survivor functions for individuals on the standard and new treatment, respectively, at time t.

Once the probability of an individual dying in the study has been evaluated, the required total number of individuals will be found from

$n = \frac{d}{\Pr(\text{death})}$

Assuming that survival times are exponentially distributed, $\bar{S}(t)$ can be easily calculated knowing only the (constant) hazard of event in the standard and new arm:

$\bar{S}(t) = \frac{e^{-h_S t} + e^{-h_N t}}{2}$

Estimates of $h_S$ and $h_N$ can be obtained from the corresponding median survival times for each treatment group, using the result that the median $t_m$ of an exponential distribution with mean $h^{-1}$ is such that

$h = \log(2) / t_m$

Example

The median survival time in the standard arm is approximately 6 months (0.5 years):

hS = log(2)/0.5 = 1.39
hN = 1.39*0.66 = 0.92 (0.66 is the HR under H1)

Hazard standard   Hazard new
1.39              0.92

Time   S(t) Standard   S(t) New   S(t) (mean)
0      1.00            1.00       1.00
0.5    0.50            0.63       0.57
1      0.25            0.40       0.32
1.5    0.12            0.25       0.19
2      0.06            0.16       0.11
2.5    0.03            0.10       0.07

accrual period (a): 1 year
additional follow-up (f): 1 year
total duration of the study: 2 years

$\Pr(\text{death}) = 1 - \frac{1}{6} [\bar{S}(f) + 4 \bar{S}(0.5a + f) + \bar{S}(a + f)] = 1 - \frac{1}{6} [0.32 + 4(0.19) + 0.11] = 0.8$

$n = \frac{181}{0.8} = 226$

The required number of individuals to give 181 deaths is 226 (113 per group).
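The censoring adjustment in this example can be reproduced with a short Python sketch. It assumes exponential survival in both arms, as in the slides, and takes the 181 required deaths as given; the function names are illustrative.

```python
from math import ceil, exp, log


def mean_survival(t, h_s, h_n):
    """Average of the two exponential survivor functions at time t."""
    return (exp(-h_s * t) + exp(-h_n * t)) / 2


def prob_death(a, f, h_s, h_n):
    """Simpson's-rule approximation over accrual a and follow-up f."""
    s_f = mean_survival(f, h_s, h_n)
    s_mid = mean_survival(0.5 * a + f, h_s, h_n)
    s_end = mean_survival(a + f, h_s, h_n)
    return 1 - (s_f + 4 * s_mid + s_end) / 6


h_s = log(2) / 0.5        # standard arm: median survival 0.5 years
h_n = h_s * 0.66          # new arm under H1 (HR = 0.66)
p_death = prob_death(a=1, f=1, h_s=h_s, h_n=h_n)
n_total = ceil(181 / p_death)   # 181 deaths, as on the slide

print(f"Pr(death) = {p_death:.3f}, n = {n_total}")
```

Changing `a` and `f` shows how the trial duration drives the total: a shorter follow-up lowers the probability of death and therefore raises the number of patients needed to observe the same number of deaths.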

Software: PASS

Software: SAS PROC POWER

proc power;
  twosamplesurvival test=logrank
    curve("Control") = (0.5):(0.5)
    refsurvival = "Control"
    hazardratio = 0.66
    accrualtime = 1
    followuptime = 1
    groupweights = (1 1)
    ntotal = .
    power = 0.8;
run;

curve("Control") = (time t):(survival at time t)

Software: STATA stpower

stpower exponential 1.39 0.92, onesided power(0.8) aperiod(1) fperiod(1) alpha(0.025)


Conclusions

⇒ At the design stage, it is important to ensure that the proposed number of statistical units (patients, animals) to be recruited allows the main objective of the study to be addressed.

⇒ A small study may fail to detect important effects of interest, or may estimate them with poor precision, no matter how carefully systematic errors have been controlled.

⇒ A study larger than necessary may waste valuable resources that could be allocated to other studies.

⇒ A study larger than necessary may identify as statistically significant effects that are not clinically relevant.

Other considerations

⇒ sample size estimates are indicative, since they rest on predictions and approximations of the results the study will provide;

⇒ there are limits imposed by the availability of resources;

⇒ several scenarios need to be considered;

⇒ possible differences in size between the groups being compared need to be taken into account (e.g. exposed and unexposed to a risk factor in an epidemiological cohort study).
