In medicine, a stepped-wedge trial (or SWT) is a type of randomised controlled trial (RCT). An RCT is a scientific experiment that is designed to reduce bias when testing a new medical treatment, a social intervention, or another testable hypothesis.
In a traditional RCT, the researcher randomly divides the experiment participants into two groups at the same time:
* One group receives the treatment (the "treatment group")
* The other group does not get the treatment (the "control group").
In an SWT, a logistical constraint typically prevents the simultaneous treatment of some participants, and instead, all or most participants receive the treatment in waves or "steps".
For instance, suppose a researcher wants to measure whether teaching college students how to make several meals increases their propensity to cook at home instead of eating out.
* In a traditional RCT, a sample of students would be selected and some would be trained on how to cook these meals, whereas the others would not. Both groups would be monitored to see how frequently they ate out. In the end, the number of times the treatment group ate out would be compared to the number of times the control group ate out, most likely with a t-test or some variant.
* If, however, the researcher could only train a limited number of students each week, then the researcher could employ an SWT, randomly assigning students to the week in which they would be trained.
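The week-by-week assignment in the cooking example can be sketched as follows. This is only an illustration under stated assumptions: the function name `assign_training_weeks`, the weekly capacity, and the student labels are hypothetical and are not part of any published procedure.

```python
import random

def assign_training_weeks(students, capacity_per_week, seed=None):
    """Randomly assign each student to the week in which they are trained.

    Hypothetical helper: shuffles the students, then fills weeks 1, 2, ...
    with at most `capacity_per_week` students each.
    """
    if capacity_per_week < 1:
        raise ValueError("capacity_per_week must be at least 1")
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    # The student at position p in the shuffled order is trained in
    # week p // capacity_per_week + 1.
    return {s: p // capacity_per_week + 1 for p, s in enumerate(shuffled)}

# Ten students, at most four trained per week (illustrative numbers).
schedule = assign_training_weeks(
    [f"student_{i}" for i in range(10)], capacity_per_week=4, seed=1
)
```

Until a student's assigned week arrives, that student contributes control observations; afterwards, the same student contributes treatment observations.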
The term "stepped wedge" was coined by The Gambia Hepatitis Intervention Study due to the stepped-wedge shape that is apparent from a schematic illustration of the design.
The crossover is in one direction, typically from control to intervention, with the intervention not removed once implemented. The stepped-wedge design can be used for individually randomized trials, i.e., trials where each individual is treated sequentially, but is more commonly used as a cluster randomized trial (CRT).
Experiment design
The stepped-wedge design involves the collection of observations during a baseline period in which no clusters are exposed to the intervention. Following this, at regular intervals, or steps, a cluster (or group of clusters) is randomized to receive the intervention, and all participants are once again measured. This process continues until all clusters have received the intervention. Finally, one more measurement is made after all clusters have received the intervention.
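The rollout described above can be written down as a cluster-by-period exposure table. The sketch below is one possible construction, not a standard routine: the function name `stepped_wedge_schedule` is hypothetical, and the layout (one baseline period followed by one period per step, with clusters split as evenly as possible across steps) is an assumption made for illustration.

```python
import random

def stepped_wedge_schedule(n_clusters, n_steps, seed=None):
    """Return a dict mapping cluster -> list of 0/1 exposure indicators.

    Period 0 is the baseline (no cluster exposed); periods 1..n_steps are
    the steps. Clusters are randomly ordered and split as evenly as possible
    across the steps. Once exposed, a cluster stays exposed.
    """
    if not 1 <= n_steps <= n_clusters:
        raise ValueError("need 1 <= n_steps <= n_clusters")
    rng = random.Random(seed)
    order = list(range(n_clusters))
    rng.shuffle(order)
    schedule = {}
    for position, cluster in enumerate(order):
        # Step (1-based) at which this cluster crosses over to the intervention.
        step = position * n_steps // n_clusters + 1
        schedule[cluster] = [1 if period >= step else 0
                             for period in range(n_steps + 1)]
    return schedule

design = stepped_wedge_schedule(n_clusters=6, n_steps=3, seed=0)
# Print clusters from earliest to latest crossover; the block of 1s forms the "wedge".
for cluster in sorted(design, key=lambda c: sum(design[c]), reverse=True):
    print(cluster, design[cluster])
```

Sorting the rows by crossover time makes the stepped shape visible, which is where the design gets its name.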
Appropriateness
Hargreaves and colleagues offer a series of five questions that researchers should answer to decide whether SWT is indeed the optimal design, and how to proceed in every step of the study. Specifically, researchers should be able to identify:
;The reasons SWT is the preferred design
:If measuring a treatment effect is the primary goal of research, an SWT may not be the optimal design. SWTs are appropriate when the research focus is on the effectiveness of the treatment rather than on its mere existence. Overall, if the study is pragmatic (i.e. seeks primarily to implement a certain policy), logistical and other practical concerns are considered to be the best reasons to turn to a stepped-wedge design. Also, if the treatment is expected to be beneficial, and it would not be ethical to deny it to some participants, then an SWT allows all participants to have the treatment while still allowing a comparison with a control group. By the end of the study, all participants will have had the opportunity to try the treatment. Note that there may still be ethical issues raised by delaying access to the treatment for some participants.
;Which SWT design is more suitable
:SWTs can feature three main designs: a closed cohort, an open cohort, and continuous recruitment with short exposure.
:In the closed cohort design, all subjects participate in the experiment from beginning to end. All the outcomes are measured repeatedly at fixed time points, which may or may not be related to each step.
:In the open cohort design, outcomes are measured similarly to the former design, but new subjects can enter the study, and some participants from an early stage can leave before completion. Only a part of the subjects are exposed from the start, and more are gradually exposed in subsequent steps. Thus, the time of exposure varies for each subject.
:In the continuous recruitment design with short exposure, very few or no subjects participate at the beginning of the experiment, but more become eligible and are gradually exposed to a short intervention. In this design, each subject is assigned to either the treatment or the control condition, so the risk of carry-over effects, which may be a challenge for closed and open cohort designs, is minimal.
;Which analysis strategy is appropriate
:Linear Mixed Models (LMM), Generalized Linear Mixed Models (GLMM), and Generalized Estimating Equations (GEE) are the principal estimators recommended for analyzing the results. While LMM offers higher power than GLMM and GEE, it can be inefficient if the sizes of the clusters vary or the response is not continuous and normally distributed. If any of those assumptions are violated, GLMM and GEE are preferred.
;How big the sample should be
:Power analysis and sample size calculation are available. Generally, SWTs require a smaller sample size to detect effects, since they leverage both between-cluster and within-cluster comparisons.
;Best practices for reporting the design and results of the trial
:Reporting the design, sample profile, and results can be challenging, since no Consolidated Standards of Reporting Trials (CONSORT) guidelines have been designated for SWTs. However, some studies have provided both formalizations and flow charts that help in reporting results and in sustaining a balanced sample across the waves.
Model
While there are several other potential methods for modeling outcomes in an SWT, the work of Hussey and Hughes "first described methods to determine statistical power available when using a stepped wedge design." What follows is their design.
Suppose there are <math>N</math> samples divided into <math>C</math> clusters. At each time point <math>t = 1, \ldots, T</math>, preferably equally spaced in actual time, some number of clusters are treated. Let <math>X_{ct}</math> be <math>1</math> if cluster <math>c</math> has been treated at time <math>t</math> and <math>0</math> otherwise. In particular, note that if <math>X_{ct} = 1</math> then <math>X_{c,t+1} = 1</math>.
For each participant <math>i</math> in cluster <math>c</math>, measure the outcome to be studied <math>y_{ict}</math> at time <math>t</math>. Note that the notation allows for clustering by including <math>c</math> in the subscript of <math>y_{ict}</math>, <math>X_{ct}</math>, <math>\alpha_c</math>, and <math>\epsilon_{ict}</math>. We model these outcomes as:
:<math>y_{ict} = \mu + \alpha_c + \beta_t + \theta X_{ct} + \epsilon_{ict}</math>
where:
* <math>\mu</math> is a grand mean,
* <math>\alpha_c \sim N(0, \tau^2)</math> is a random, cluster-level effect on the outcome,
* <math>\beta_t</math> is a time point-specific fixed effect,
* <math>\theta</math> is the measured effect of the treatment, and
* <math>\epsilon_{ict} \sim N(0, \sigma^2)</math> is the residual noise.
This model can be viewed as a hierarchical linear model where at the lowest level <math>y_{ict} \sim N(\mu_{ct}, \sigma^2)</math>, where <math>\mu_{ct}</math> is the mean of a given cluster at a given time, and at the cluster level, each cluster mean <math>\mu_{ct} \sim N(\mu + \beta_t + \theta X_{ct}, \tau^2)</math>.
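To make the structure of the model concrete, the sketch below simulates outcomes as a grand mean plus a random cluster-level effect, a period-specific fixed effect, a treatment effect applied only in exposed periods, and residual noise. The function name `simulate_outcomes`, the parameter names, and all numeric values are assumptions chosen for illustration. The effects are drawn from normal distributions, and no participant-level random effect is included.

```python
import random

def simulate_outcomes(exposure, grand_mean, period_effects, treatment_effect,
                      cluster_sd, residual_sd, n_per_cluster, seed=None):
    """Simulate outcomes as grand mean + cluster effect + period effect
    + treatment effect * exposure + residual noise.

    `exposure[c][t]` is 1 if cluster c is treated in period t, else 0.
    Returns a list of (cluster, period, participant, outcome) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for c, row in enumerate(exposure):
        # One random effect per cluster, shared by all its observations.
        cluster_effect = rng.gauss(0.0, cluster_sd)
        for t, treated in enumerate(row):
            mean = (grand_mean + cluster_effect + period_effects[t]
                    + treatment_effect * treated)
            for i in range(n_per_cluster):
                # Independent residual noise for every measurement.
                rows.append((c, t, i, mean + rng.gauss(0.0, residual_sd)))
    return rows

# Three clusters, one baseline period and three steps (hypothetical values).
exposure = [[0, 1, 1, 1],
            [0, 0, 1, 1],
            [0, 0, 0, 1]]
data = simulate_outcomes(exposure, grand_mean=10.0,
                         period_effects=[0.0, 0.2, 0.4, 0.6],
                         treatment_effect=-2.0, cluster_sd=1.0,
                         residual_sd=0.5, n_per_cluster=20, seed=42)
```

Simulated data of this kind could then be passed to a mixed-model or GEE routine to check whether the treatment effect is recovered. That fitting step is deliberately left out here.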
Estimate of variance
The design effect (estimate of unit variance) of a stepped wedge design is given by the formula:
:<math>DE_{sw} = \frac{1 + \rho(ktn + bn - 1)}{1 + \rho\left(\frac{1}{2}ktn + bn - 1\right)} \cdot \frac{3(1 - \rho)}{2t\left(k - \frac{1}{k}\right)}</math>
where:
* ''ρ'' is the intra-cluster correlation (ICC),
* ''n'' is the number of subjects within a cluster (which is assumed to be constant),
* ''k'' is the number of steps,
* ''t'' is the number of measurements after each step, and
* ''b'' is the number of baseline measurements.
To calculate the required sample size, the following simple formula is applied:
:<math>N_{sw} = N_u \cdot DE_{sw}</math>
where:
* ''N<sub>sw</sub>'' is the required sample size for the SWT, and
* ''N<sub>u</sub>'' is the total unadjusted sample size that would be required for a traditional RCT.
Note that increasing either ''k'', ''t'', or ''b'' will result in a decrease in the required sample size for an SWT.
Further, the required number of clusters ''c'' is given by:
:<math>c = \frac{N_{sw}}{n}</math>
To calculate how many clusters ''c<sub>s</sub>'' need to switch from the control to the treatment condition at each step, the following formula is available:
:<math>c_s = \frac{c}{k}</math>
If ''c'' and ''c<sub>s</sub>'' are not integers, they need to be rounded up to the next larger integer and distributed as evenly as possible among the ''k'' steps.
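The calculation can be traced with a short script. It assumes the design effect has the form written out in the docstring below, and that the cluster counts are rounded up in the order shown. The function names and the input values are hypothetical and chosen only to illustrate the arithmetic.

```python
import math

def design_effect_sw(rho, n, k, t, b):
    """Stepped-wedge design effect, assuming the form

        (1 + rho*(k*t*n + b*n - 1)) / (1 + rho*(k*t*n/2 + b*n - 1))
            * 3*(1 - rho) / (2*t*(k - 1/k))

    rho: intra-cluster correlation, n: subjects per cluster, k: steps,
    t: measurements after each step, b: baseline measurements.
    """
    if k < 2 or t < 1 or n < 1:
        raise ValueError("need k >= 2, t >= 1 and n >= 1")
    correlation_term = ((1 + rho * (k * t * n + b * n - 1))
                        / (1 + rho * (0.5 * k * t * n + b * n - 1)))
    step_term = 3 * (1 - rho) / (2 * t * (k - 1 / k))
    return correlation_term * step_term

def stepped_wedge_sample_size(n_unadjusted, rho, n, k, t, b):
    """Scale an unadjusted RCT sample size and derive cluster counts."""
    de = design_effect_sw(rho, n, k, t, b)
    n_sw = n_unadjusted * de
    clusters = math.ceil(n_sw / n)           # round up to whole clusters
    clusters_per_step = math.ceil(clusters / k)  # upper bound per step
    return {"design_effect": de, "n_sw": n_sw,
            "clusters": clusters, "clusters_per_step": clusters_per_step}

# Illustrative inputs: 200 subjects unadjusted, ICC 0.05, 10 per cluster,
# 4 steps, 1 measurement after each step, 1 baseline measurement.
result = stepped_wedge_sample_size(n_unadjusted=200, rho=0.05, n=10, k=4, t=1, b=1)
print(result)
```

Because the per-step count is rounded up, it can exceed what the total number of clusters allows in the last step. In that case the clusters are simply spread as evenly as possible across the steps.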
Advantages
The stepped-wedge design has several advantages over traditional randomized controlled trials (RCTs).
* First, SWTs are most appropriate both ethically and practically when the intervention is expected to produce a positive outcome. Since all subjects will eventually receive the benefits of the intervention, ethical concerns can be allayed, and the recruitment of participants may become easier.
* Secondly, SWTs "can reconcile the need for robust evaluations with political or logistical constraints." Specifically, the design can be used to measure the effects of treatment when resources for performing an intervention are scarce.
* Thirdly, since each cluster receives both the control and the treatment condition by the end of the trial, both between-cluster and within-cluster comparisons are possible. This increases statistical power while keeping the sample significantly smaller than would be needed in a traditional RCT.
* Fourth, a design effect (used to inflate the sample size of an individually randomized trial to that required in a cluster trial) has been established, which has shown that the stepped wedge CRT could reduce the number of patients required in the trial compared to other designs.
* Finally, because each cluster switches randomly from the control to the treatment condition at different time points, it is possible to examine time effects. For example, it is possible to study how repeated or long-term exposure to experimental stimuli affects the efficiency of the treatment. Repeated measurements at regular intervals can average the noise out, which in turn increases the precision of estimates. This advantage becomes most apparent when measurement is noisy and outcome autocorrelation is low.
Disadvantages
SWTs may suffer from certain drawbacks.
* First, since in SWTs the study period lasts longer and all the subjects eventually receive the treatment, costs may increase significantly. Because the design can be expensive, SWTs may not be the optimal solution when measurement precision and outcome autocorrelation are high. Moreover, since everyone is eventually treated, SWTs do not facilitate downstream analysis.
* Secondly, in an SWT, more clusters are exposed to the intervention at later than at earlier time periods. As such, it is possible that an underlying temporal trend may confound the intervention effect, and so the confounding effect of time must be accounted for in both pre-trial power calculations and post-trial analysis. Specifically, in post-trial analysis, the use of generalized linear mixed models or generalized estimating equations is recommended.
* Finally, the design and analysis of stepped-wedge trials is more complex than for other types of randomized trials. Previous systematic reviews highlighted the poor reporting of sample size calculations and a lack of consistency in the analysis of such trials. Hussey and Hughes were the first authors to suggest a structure and formula for estimating power in stepped-wedge studies in which data was collected at each and every step. This has now been expanded to designs in which observations are not made at each step, as well as to designs with multiple layers of clustering.
Ongoing work
The number of studies using the design has been increasing. In 2015, a thematic series was published in the journal ''Trials''. In 2016, the first international conference dedicated to the topic was held at the University of York.