--- title: "Simulate Fixed Designs with Ease via sim_fixed_n" author: "Yujie Zhao and Keaven Anderson" output: rmarkdown::html_vignette bibliography: simtrial.bib vignette: > %\VignetteIndexEntry{Simulate Fixed Designs with Ease via sim_fixed_n} %\VignetteEngine{knitr::rmarkdown} --- ```{r, message=FALSE, warning=FALSE} library(gsDesign2) library(simtrial) library(dplyr) library(gt) set.seed(2027) ``` The `sim_fixed_n()` function simulates a two-arm trial with a single endpoint, accounting for time-varying enrollment, hazard ratios, and failure and dropout rates. While there are limitations, there are advantages of calling `sim_fixed_n()` directly: - It is simple, which allows for a single function call to perform an arbitrary number of simulations. - It automatically implements a parallel computating backend to reduce running time. - It offers up to 5 options for determining the data cutoff for analysis through the `timing_type` parameter. If people are interested in more complicated simulations, please refer to the vignette [Custom Fixed Design Simulations: A Tutorial on Writing Code from the Ground Up](https://merck.github.io/simtrial/articles/sim_fixed_design_custom.html). The process for simulating via `sim_fixed_n()` is outlined in Steps 1 to 3 below. # Step 1: Define design parameters To run simulations for a fixed design, several design characteristics may be used. Depending on the data cutoff for analysis option, different inputs may be required. The following lines of code specify an unstratified 2-arm trial with equal randomization. The simulation is repeated 2 times. Enrollment is targeted to last for 12 months at a constant enrollment rate. The median for the control arm is 10 months, with a delayed effect during the first 3 months followed by a hazard ratio of 0.7 thereafter. There is an exponential dropout rate of 0.001 over time. ```{r} n_sim <- 100 total_duration <- 36 stratum <- data.frame(stratum = "All", p = 1) block <- rep(c("experimental", "control"), 2) enroll_rate <- data.frame(stratum = "All", rate = 12, duration = 500 / 12) fail_rate <- data.frame(stratum = "All", duration = c(3, Inf), fail_rate = log(2) / 10, hr = c(1, 0.6), dropout_rate = 0.001) ``` We specify sample size and targeted event count based on the `fixed_design_ahr()` function and the above options. The following design computes the sample size and targeted event counts for 85\% power. In this approach, users can obtain the sample size and targeted events from the output of `x`, specifically by using `sample_size <- x$analysis$n` and `target_event <- x$analysis$event`. ```{r} x <- fixed_design_ahr(enroll_rate = enroll_rate, fail_rate = fail_rate, alpha = 0.025, power = 0.85, ratio = 1, study_duration = total_duration) |> to_integer() x |> summary() |> gt() |> tab_header(title = "Sample Size and Targeted Events Based on AHR Method", subtitle = "Fixed Design with 85% Power, One-sided 2.5% Type I error") |> fmt_number(columns = c(4, 5, 7), decimals = 2) ``` Now we set the derived targeted sample size, enrollment rate, and event count from the above. ```{r} sample_size <- x$analysis$n target_event <- x$analysis$event enroll_rate <- x$enroll_rate ``` # Step 2: Run `sim_fixed_n()` Now that we have set up the design characteristics in Step 1, we can proceed to run `sim_fix_n()` for our simulations. This function automatically utilizes a parallel computing backend, which helps reduce the running time. 
The `timing_type` argument specifies one or more of the following cutoffs for setting the analysis time:

+ `timing_type = 1`: the planned study duration.
+ `timing_type = 2`: the time until the targeted event count has been observed.
+ `timing_type = 3`: the planned minimum follow-up period after enrollment completion.
+ `timing_type = 4`: the maximum of the planned study duration and the time to observe the targeted event count (i.e., `timing_type = 1` and `timing_type = 2` combined).
+ `timing_type = 5`: the maximum of the time to observe the targeted event count and the minimum follow-up after enrollment completion (i.e., `timing_type = 2` and `timing_type = 3` combined).

The `rho_gamma` argument is a data frame containing the variables `rho` and `gamma`, both of which should be greater than or equal to zero, with one Fleming-Harrington weighted log-rank test specified per row. For instance, setting `rho = 0` and `gamma = 0` yields the standard unweighted log-rank test, while `rho = 0` and `gamma = 0.5` gives the Fleming-Harrington (0, 0.5) weighted log-rank test. If you are interested in tests other than the Fleming-Harrington weighted log-rank test, please refer to the vignette [Custom Fixed Design Simulations: A Tutorial on Writing Code from the Ground Up](https://merck.github.io/simtrial/articles/sim_fixed_design_custom.html).

```{r, message=FALSE}
sim_res <- sim_fixed_n(
  n_sim = 2, # only use 2 simulations for the initial run
  sample_size = sample_size,
  block = block,
  stratum = stratum,
  target_event = target_event,
  total_duration = total_duration,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  timing_type = 1:5,
  rho_gamma = data.frame(rho = 0, gamma = 0))
```

The output of `sim_fixed_n()` is a data frame with one row per simulated dataset, per cutoff specified in `timing_type`, per test statistic specified in `rho_gamma`. Here we have run just 2 simulated trials and can see how the different cutoffs vary between the 2 trial instances.

```{r}
sim_res |>
  gt() |>
  tab_header("Tests for Each Simulation Result",
             subtitle = "Logrank Test for Different Analysis Cutoffs") |>
  fmt_number(columns = c(4, 5, 7), decimals = 2)
```

# Step 3: Summarize simulations

Now we run `r n_sim` simulated trials and summarize the results by how the data are cut off for analysis.

```{r, message=FALSE}
sim_res <- sim_fixed_n(
  n_sim = n_sim,
  sample_size = sample_size,
  block = block,
  stratum = stratum,
  target_event = target_event,
  total_duration = total_duration,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  timing_type = 1:5,
  rho_gamma = data.frame(rho = 0, gamma = 0))
```

With the `r n_sim` simulations provided, users can summarize the simulated power and compare it to the targeted 85% power. All cutoff methods approximate the targeted power well and have similar average durations and mean event counts.

```{r}
sim_res |>
  group_by(cut) |>
  summarize(`Simulated Power` = mean(z > qnorm(1 - 0.025)),
            `Mean events` = mean(event),
            `Mean duration` = mean(duration)) |>
  mutate(`Sample size` = sample_size,
         `Targeted events` = target_event) |>
  gt() |>
  tab_header(title = "Summary of 100 simulations by 5 different analysis cutoff methods",
             subtitle = "Tested by logrank") |>
  fmt_number(columns = c(2:4), decimals = 2)
```
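Because these power estimates are based on a finite number of simulated trials, they carry Monte Carlo error. The sketch below (an optional check, not part of the core workflow) reuses the `cut` and `z` columns of `sim_res` to attach an approximate binomial standard error and 95% confidence interval to each estimate.

```{r}
# Approximate Monte Carlo precision of the simulated power for each cutoff.
sim_res |>
  group_by(cut) |>
  summarize(power = mean(z > qnorm(1 - 0.025)),
            n_trials = n()) |>
  mutate(
    mc_se = sqrt(power * (1 - power) / n_trials), # binomial standard error
    lower_95 = power - 1.96 * mc_se,              # approximate 95% CI
    upper_95 = power + 1.96 * mc_se
  )
```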
We can also summarize the distribution of event counts at the planned study duration. The event count varies a fair amount across simulations.

```{r, fig.width = 6}
hist(sim_res$event[sim_res$cut == "Planned duration"],
     breaks = 10,
     main = "Distribution of Event Counts at Planned Study Duration",
     xlab = "Event Count at Targeted Trial Duration")
```

We can also evaluate the distribution of the trial duration when the analysis is performed at the time the targeted event count is observed.

```{r, fig.width = 6}
plot(density(sim_res$duration[sim_res$cut == "Targeted events"]),
     main = "Trial Duration Smoothed Density",
     xlab = "Trial Duration when Targeted Event Count is Observed")
```
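Finally, recall from Step 2 that `rho_gamma` may contain multiple rows, one Fleming-Harrington weighted log-rank test per row. A brief sketch (reusing the design parameters above; not evaluated here) requests both the unweighted log-rank test and the Fleming-Harrington (0, 0.5) test in a single run.

```{r, eval=FALSE}
# Sketch: one row of rho_gamma per requested test; the output then contains
# one row per simulation, per cutoff, per test.
sim_fixed_n(
  n_sim = 2, # small number of simulations for illustration only
  sample_size = sample_size,
  block = block,
  stratum = stratum,
  target_event = target_event,
  total_duration = total_duration,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  timing_type = 2, # cut when the targeted event count is observed
  rho_gamma = data.frame(rho = c(0, 0), gamma = c(0, 0.5)))
```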