Efficient simulation of individual-based population models

The R package IBMPopSim

Creative Commons BY License ISSN 2824-7795

This document provides a full description of the Stochastic Individual-Based Models (IBMs) that can be implemented in the IBMPopSim package. A unified mathematical and simulation framework is given, with a detailed description of the simulation algorithm. Examples of applications for the package are also provided, showing the performance and flexibility of IBMPopSim.

Published

December 1, 2024

Modified

January 31, 2025

Keywords

Individual-based models, stochastic simulation, population dynamics, Poisson measures, thinning method, actuarial science, insurance portfolio simulation

Abstract

The R package IBMPopSim facilitates the simulation of the random evolution of heterogeneous populations using stochastic Individual-Based Models (IBMs). The package enables users to simulate population evolution, in which individuals are characterized by their age and some characteristics, and the population is modified by different types of events, including births/arrivals, death/exit events, or changes of characteristics. The frequency at which an event can occur to an individual can depend on their age and characteristics, but also on the characteristics of other individuals (interactions). Such models have a wide range of applications in fields including actuarial science, biology, ecology and epidemiology. IBMPopSim overcomes the limitations of time-consuming IBM simulations by implementing new efficient algorithms based on thinning methods, which are compiled using the Rcpp package, while providing a user-friendly interface.

1 Introduction

In various fields, advances in probability have contributed to the development of a new mathematical framework for so-called individual-based stochastic population dynamics, also called stochastic Individual-Based Models (IBMs). Stochastic IBMs allow the modeling in continuous time of population dynamics structured by age and/or characteristics. In the field of mathematical biology and ecology, a large community has used this formalism for the study of the evolution of structured populations (see e.g. Ferrière and Tran (2009); Collet, Méléard, and Metz (2013); Bansaye and Méléard (2015); Costa et al. (2016); Billiard et al. (2016); Lavallée et al. (2019); Méléard, Rera, and Roget (2019); Calvez et al. (2020)), after the pioneering works (Fournier and Méléard 2004; Tran 2008; Méléard and Tran 2009). IBMs are also useful in demography and actuarial sciences, for the modeling of human population dynamics (see e.g. Bensusan (2010); Boumezoued (2016); El Karoui, Hadji, and Kaakai (2021)).

Indeed, they allow the modeling of heterogeneous and complex population dynamics, which can be used to compute demographic indicators or simulate the evolution of insurance portfolios in order to study the basis risk, compute cash flows for annuity products or pension schemes, or for a fine assessment of mortality models (Barrieu et al. 2012). There are other domains in which stochastic IBMs can be used, for example in epidemiology with stochastic compartmental models, neurosciences, cyber risk, or Agent-Based Models (ABMs) in economics and social sciences, which can be seen as IBMs. Many mathematical results have been obtained in the literature cited above, for quantifying the limit behaviors of IBMs over long time scales or in large populations. In particular, pathwise representations of IBMs have been introduced in Fournier and Méléard (2004) (and extended to age-structured populations in Tran (2008); Méléard and Tran (2009)), as measure-valued pure jump Markov processes, solutions of SDEs driven by Poisson measures. These pathwise representations are based on the thinning and projection of Poisson random measures defined on extended spaces. However, the simulation of large and interacting populations is often computationally expensive.

The aim of the R package IBMPopSim is to meet the needs of the various communities for efficient tools in order to simulate the evolution of stochastic IBMs. IBMPopSim provides a general framework for the simulation of a wide class of IBMs, where individuals are characterized by their age and/or a set of characteristics. Different types of events can be included in the modeling by users, depending on their needs: births, deaths, entries into or exits from the population, and changes of characteristics (swap events). Furthermore, the various events that can happen to individuals in the population can occur at a non-stationary frequency, depending on the individuals’ characteristics and time, and also including potential interactions between individuals.

We introduce a unified mathematical and simulation framework for this class of IBMs, generalizing the pathwise representation of IBMs by thinning of Poisson measures, as well as the associated population simulation algorithm, based on an acceptance/rejection procedure. In particular, we provide general sufficient conditions on the event intensities under which the simulation of a particular model is possible.

We opted to implement the algorithms of the IBMPopSim package using the Rcpp package, a tool facilitating the seamless integration of high-performance C++ code into easily callable R functions (Eddelbuettel and Francois 2011). IBMPopSim offers user-friendly R functions for defining and simulating IBMs. Once events and their associated intensities are specified, an automated procedure creates the model. This involves integrating the user’s source code into the primary C++ code using a template mechanism. Subsequently, Rcpp is invoked to compile the model, so that it is integrated into the R session and callable with varying parameters, enabling the generation of diverse population evolution scenarios. Combined with the design of the simulation algorithms, the package structure yields very competitive simulation runtimes for IBMs, while staying user-friendly for R users. Several output functions are also implemented in IBMPopSim. For instance, the package allows the construction and visualization of age pyramids, as well as the construction of death and exposure tables from the censored individual data, compatible with R packages concerned with mortality modelling, such as Hyndman et al. (2023) or A. Villegas, Millossovich, and Kaishev Hyndman (2018). Several examples are provided in the form of R vignettes on the website, and in recent works of El Karoui, Hadji, and Kaakai (2021) and Roget et al. (2024).

To the best of our knowledge, there are no other R packages currently available addressing the efficient simulation of stochastic IBMs. Another approach for simulating populations is continuous-time microsimulation in social sciences, which is implemented in the R package MicSim (Zinn 2014). In this framework, individual life courses are specified by sequences of state transitions (events) and the time spans between these transitions. The state space is usually discrete and finite, which is not necessarily the case in IBMPopSim, where individuals can have continuous characteristics. But most importantly, microsimulation does not allow for interactions between individuals. Indeed, microsimulation produces the life courses of all individuals in the population separately, based on the computation of the distribution functions of the waiting times in the distinct states of the state space, for each individual (Zinn 2014). This can be slow in comparison to the simulation by thinning of event times occurring in the population, which is based on selecting event times among some competing proposed event times. Finally, MicSim is a simplified R version of the Mic-Core microsimulation tool implemented in Java (Zinn et al. 2009). However, the implementation of simulation algorithms in R yields longer simulation run times than when using Rcpp.

In Section 2, we give a short description of Stochastic Individual-Based Models (IBMs) and a quick example of model implementation with the IBMPopSim package. In Section 3, we introduce the mathematical framework that characterizes the class of IBMs that can be implemented in the IBMPopSim package. In particular, a general pathwise representation of IBMs is presented. The population dynamics is obtained as the solution of an SDE driven by Poisson measures, for which we obtain existence and uniqueness results in Theorem 1. In Section 4, the two main algorithms for simulating the population evolution of an IBM across the interval [0, T] are detailed. In Section 5, we present the main functions of the IBMPopSim package, which allow for the definition of events and their intensities, the creation of a model, and the simulation of scenarios. Two examples are detailed in Section 6 and Section 7, featuring applications involving a heterogeneous insurance portfolio characterized by entry and exit events, and an age- and size-structured population with intricate interactions.

2 Brief overview of IBMPopSim

Stochastic Individual-Based Models (IBMs) represent a broad class of random population dynamics models, allowing the description of population evolution on an individual scale. Informally, an IBM can be summarized by the description of the individuals constituting the population, the various types of events that can occur to these individuals, along with their respective frequencies. In IBMPopSim, individuals can be characterized by their age and/or a collection of discrete or continuous characteristics. Moreover, the package enables users to efficiently simulate populations in which one or more of the following event types may occur:

  • Birth event: addition of an individual of age 0 to the population.
  • Death event: removal of an individual from the population.
  • Entry event: arrival of an individual in the population.
  • Exit (emigration) event: exit from the population (other than death).
  • Swap event: an individual changes characteristics.

Each event type is linked to an associated event kernel, describing how the population is modified following the occurrence of the event. For some event types, the event kernel requires explicit specification. This is the case for entry events when a new individual joins the population, which requires specifying the age and characteristics of this new individual. For instance, the characteristics of a new individual in the population can be chosen uniformly in the space of all characteristics, or can depend on the characteristics of its parent or on those of the other individuals composing the population.

The last component of an IBM are the event intensities. Informally, an event intensity is a function \lambda^e_t(I, Z) describing the frequency at which an event e can occur to an individual I in a population Z at a time t. Given a history of the population (\mathcal{F}_t), the probability of event e occurring to individual I during a small interval of time (t,t+dt] is proportional to \lambda^e_t(I, Z): \mathbb{P}(\text{event } e \text{ occurring to $I$ during } (t,t+dt] | \mathcal{F}_t) \simeq \lambda^e_t(I, Z)dt.

The intensity function \lambda^e can include various dependencies:

  • individual intensity: \lambda^e depends only on the age and characteristics of the individual I, and on time t,
  • interaction intensity: in addition \lambda^e depends on the population composition Z.

Prior to providing a detailed description of an Individual-Based Model (IBM), we present a simple model of birth and death in an age-structured human population. We assume no interactions between individuals, and individuals are characterized by their gender, in addition to their age. In this simple model, all individuals, regardless of gender, can give birth when their age falls between 15 and 40 years, with a constant birth rate of 0.05. The death intensity is assumed to follow a Gompertz-type intensity depending on age. The birth and death intensities are then given by
\begin{equation*} \lambda^b(t, I) = 0.05 \times \mathbf{1}_{[15,40]}(a(I,t)), \quad \lambda^d(t, I) = \alpha\exp(\beta a(I,t)), \end{equation*} with a(I,t) the age of individual I at time t. Birth events are also characterized by a kernel determining the gender of the newborn, who is male with probability p_{male}.
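These two intensity functions are simple enough to be sketched numerically. The following Python snippet is an illustration only, not part of the package; the parameter values \alpha = 0.008, \beta = 0.02 and the constant birth rate 0.05 on [15, 40] are those of this example.

```python
import math

# Illustrative parameter values, taken from the example in the text.
ALPHA, BETA = 0.008, 0.02
BIRTH_RATE = 0.05  # constant birth rate on the age band [15, 40]

def birth_intensity(age):
    """lambda^b(t, I) = 0.05 * 1_{[15,40]}(a(I,t))."""
    return BIRTH_RATE if 15 <= age <= 40 else 0.0

def death_intensity(age):
    """lambda^d(t, I) = alpha * exp(beta * a(I,t)): a Gompertz-type intensity."""
    return ALPHA * math.exp(BETA * age)

# Since the death intensity is increasing in age, it is bounded on [0, a_max]
# by its value at the maximal age; this is how event bounds are set later.
a_max = 115
death_bound = death_intensity(a_max)
```

This numerical view anticipates the intensity bounds of Section 3: both functions are uniformly bounded once a maximal age is assumed.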

2.1 Model creation

All models in IBMPopSim are created with a call to the mk_model function, which takes the list of events as an argument. In this example, the events are created with the mk_event_individual function, involving a few lines of C++ code defining the intensity and, if applicable, the kernel of the event. For a more in-depth description of the event creation step and its parameters, we refer to Section 5.2.

The events of this simple model are for example defined through the following calls.

birth_event <- mk_event_individual(
  type = "birth", 
  intensity_code = "result = birth_rate(I.age(t));",
  kernel_code = "newI.male = CUnif(0,1) < p_male;")

death_event <- mk_event_individual(
  type = "death",
  intensity_code = "result = alpha * exp(beta * I.age(t));")

In the C++ code, the names birth_rate, p_male, alpha and beta refer to the model parameters defined in the following list.

params <- list(
  "alpha" = 0.008, "beta" = 0.02, 
  "p_male" = 0.51,
  "birth_rate" = stepfun(c(15, 40), c(0, 0.05, 0)))

In a second step, the model is created by calling the function mk_model. A cpp source code is automatically created through a template mechanism based on the events and parameters, subsequently compiled using the sourceCpp function from the Rcpp package.

birth_death_model <- mk_model(
  characteristics = c("male" = "bool"),
  events = list(death_event, birth_event),
  parameters = params)

2.2 Simulation

Once the model is created and compiled, the popsim function is called to simulate the evolution of a population according to this model. To achieve this, an initial population must be defined. In this example, we extract a population from a dataset included in the package (a sample of 100\,000 individuals based on the population of England and Wales in 2014). It is also necessary to set bounds for the event intensities. In this example, they are obtained by assuming that the maximum age for an individual is 115 years.

a_max <- 115
events_bounds = c(
  "death" = params$alpha * exp(params$beta * a_max),
  "birth" = max(params$birth_rate))

The function popsim can now be called to simulate the population starting from the initial population population(EW_pop_14$sample) up to time T = 30.

sim_out <- popsim(
  birth_death_model, 
  population(EW_pop_14$sample), 
  events_bounds, 
  parameters = params, age_max = a_max, 
  time = 30)

The data frame sim_out$population contains the information (birth, death, gender) on individuals who lived in the population over the period [0,30]. Functions of the package also provide aggregated information on the population.

3 Mathematical framework

In this section, we define rigorously the class of IBMs that can be simulated in IBMPopSim, along with the assumptions that are required for the simulation of the population to be possible. The representation of age-structured IBMs based on measure-valued processes, as introduced in Tran (2008), is generalized to a wider class of abstract population dynamics. The modeling differs slightly here, since individuals are kept in the population after their death (or exit), by including the death/exit date as an individual trait.

In the remainder of the paper, the filtered probability space is denoted by (\Omega, \mathcal{F}, \{\mathcal{F}_t \},{\mathbb{P}}), satisfying the usual assumptions. All processes are assumed to be càdlàg and adapted to the filtration \{\mathcal{F}_t \} (for instance the history of the population) on a time interval [0,T]. For a càdlàg process X, we denote X_{t^-} := \lim_{\genfrac{}{}{0pt}{2}{s\to t}{s<t}} X_s.

3.1 Population

As mentioned in Section 2 a population is a collection of individuals whose evolution defines the population process.

3.1.1 Individuals

An individual is represented by a triplet I = (\tau^b, \tau^d, x) \in \mathcal{I}= {\mathbb{R}}\times \bar {\mathbb{R}}\times {\mathcal{X}} with:

  • \tau^b \in {\mathbb{R}} the date of birth,
  • \tau^d \in \bar {\mathbb{R}} the death date, with \tau^d = \infty if the individual is still alive,
  • a collection x \in {\mathcal{X}} of characteristics where {\mathcal{X}} is the space of characteristics.

Note that in IBMs, individuals are usually characterized by their age a(t) =t-\tau^b instead of their date of birth \tau^b. However, using the latter is actually easier for the simulation, as it remains constant over time.

3.1.2 Population process

The population at a given time t is a random set Z_t=\{ I_k \in \mathcal{I}; \; k= 1,\dots, N_t\}, composed of all individuals (alive or dead) who have lived in the population before time t. As a random set, Z_t can be represented by a random counting measure on \mathcal{I} , that is an integer-valued measure Z: \Omega \times \mathcal{I}\to \bar {\mathbb{N}} where, for a measurable subset A \subset \mathcal{I}, Z(A) is the (random) number of individuals I in the subset A. With this representation: \begin{aligned} Z_t (\mathrm{d}\tau^b, \mathrm{d}\tau^d , \mathrm{d}x) &= \sum_{k=1}^{N_t} \delta_{I_k} (\tau^b, \tau^d,x), \\ \text{ with } \int_{\mathcal{I}} f(\tau^b, \tau^d, x) Z_t (\mathrm{d}\tau^b, \mathrm{d}\tau^d , \mathrm{d}x) &= \sum_{k=1}^{N_t} f(I_k). \end{aligned} \tag{1}

The number of individuals present in the population before time t is obtained by taking f\equiv 1: N_t = \int_{\mathcal{I}} Z_t(\mathrm{d}\tau^b, \mathrm{d}\tau^d, \mathrm{d}x) = \sum_{k=1}^{N_t} \mathbf{1}_{\mathcal{I}} (I_k). Note that (N_t)_{t\geq 0} is an increasing process since dead/exited individuals are kept in the population Z. The number of alive individuals in the population at time t is: N_t^a = \int_{\mathcal{I}} \mathbf{1}_{\{\tau^d > t \} }Z_t(\mathrm{d}\tau^b, \mathrm{d}\tau^d, \mathrm{d}x) = \sum_{k=1}^{N_t} \mathbf{1}_{\{\tau^d_k > t \} }. \tag{2} Another example is the number of alive individuals of age at least a: N_t([a,+\infty)) := \int_{\mathcal{I}} \mathbf{1}_{[a,+\infty)}(t-\tau^b)\mathbf{1}_{]t,\infty]}(\tau^d) Z_t(\mathrm{d}\tau^b, \mathrm{d}\tau^d, \mathrm{d}x) = \sum_{k=1}^{N_t} \mathbf{1}_{\{ t -\tau_k^b \geq a \}}\mathbf{1}_{\{\tau^d_k > t \} }.
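As a sanity check of these definitions, a population can be sketched as a list of triplets (\tau^b, \tau^d, x). The following Python snippet is an illustration of Equation 2 only, not package code; the toy population and its characteristics are hypothetical.

```python
import math

# A toy population: triplets (tau_b, tau_d, x), with tau_d = inf if alive.
population = [
    (-30.0, math.inf, "F"),  # born 30 years before t = 0, still alive
    (-50.0, 12.5,     "M"),  # died at time 12.5
    (  5.0, math.inf, "F"),  # born at time 5
]

def n_alive(pop, t):
    """N_t^a: number of individuals with tau_d > t (Equation 2)."""
    return sum(1 for (tb, td, x) in pop if td > t)

def n_alive_older_than(pop, t, a):
    """Alive individuals of age at least a at time t: t - tau_b >= a and tau_d > t."""
    return sum(1 for (tb, td, x) in pop if t - tb >= a and td > t)
```

Note that len(population) plays the role of the increasing process N_t, while n_alive gives the smaller count N_t^a of the living.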

3.2 Events

The population composition changes at random dates following different types of events. IBMPopSim allows the simulation of IBMs with the following events types:

  • A birth event at time t is the addition of a new individual I'=(t,\infty, X) of age 0 to the population. Their date of birth is \tau^b =t, and their characteristics X is a random variable with distribution defined by the birth kernel k^b(t,I,\mathrm{d}x) on {\mathcal{X}}, depending on t and the parent I. The population size becomes N_t = N_{t^-} + 1, and the population composition after the event is Z_t = Z_{t^-} + \delta_{(t,\infty, X)}.

  • An entry event at time t is also the addition of an individual I' in the population. However, this individual is not of age 0. The date of birth and characteristics of the new individual I'= (\tau^b, \infty, X) are random variables of probability distribution defined by the entry kernel k^{en}(t, \mathrm{d}s, \mathrm{d}x) on {\mathbb{R}}\times {\mathcal{X}}. The population size becomes N_t = N_{t^-} + 1, and the population composition after the event is: Z_t = Z_{t^-} + \delta_{(\tau^b, \infty, X)}.

  • A death or exit event of an individual I= (\tau^b,\infty, x)\in Z_{t^-} at time t is the modification of its death date \tau^d from +\infty to t. This event results in the simultaneous addition of the individual (\tau^b,t,x) and removal of the individual I from the population. The population size is not modified, and the population composition after the event is Z_t = Z_{t^-} +\delta_{(\tau^b,t,x)}- \delta_{I}.

  • A swap event (change of characteristics) results in the simultaneous addition and removal of an individual. If an individual I= (\tau^b,\infty, x) \in Z_{t^-} changes characteristics at time t, then it is removed from the population and replaced by I' = (\tau^b,\infty, X). The new characteristics X is a random variable with distribution k^s(t, I,\mathrm{d}x) on {\mathcal{X}}, depending on time, the individual’s age and previous characteristics x. In this case, the population size is not modified and the population becomes: Z_t = Z_{t^-} + \delta_{(\tau^b, \infty, X)} - \delta_{(\tau^b, \infty, x)}.

To summarize, the space of event types is E = \{ b, en, d, s \}, and the jump \Delta Z_t = Z_t - Z_{t^-} (change in the population composition) generated by an event of type e \in \{ b, en, d, s \} is denoted by \phi^e(t, I). We thus have the following rules summarized in Table 1.

Table 1: Action in the population for a given event type

| Event | Type e | \phi^e(t, I) | New individual |
|---|---|---|---|
| Birth | b | \delta_{(t, \infty, X)} | \tau^b =t, \; X \sim k^b(t,I,\mathrm{d}x) |
| Entry | en | \delta_{(\tau^b, \infty, X)} | (\tau^b, X) \sim k^{en}(t,\mathrm{d}s, \mathrm{d}x) |
| Death/Exit | d | \delta_{(\tau^b, t,x)} - \delta_{(\tau^b, \infty, x)} | \tau^d = t |
| Swap | s | \delta_{(\tau^b, \infty , X)} - \delta_{(\tau^b, \infty, x)} | X \sim k^s(t,I,\mathrm{d}x) |
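The actions of Table 1 can be mimicked on the list-of-triplets representation used above. The Python sketch below is illustrative only (the package applies these actions internally on its own data structures); kernels are replaced by explicitly supplied values.

```python
import math

def apply_birth(pop, t, x):
    """Birth at time t: add (t, inf, x); an entry would use another birth date."""
    return pop + [(t, math.inf, x)]

def apply_death(pop, k, t):
    """Death/exit of individual k at time t: its death date goes from inf to t."""
    tb, td, x = pop[k]
    assert td == math.inf, "events can only affect a living individual"
    return pop[:k] + [(tb, t, x)] + pop[k + 1:]

def apply_swap(pop, k, new_x):
    """Swap: individual k keeps (tau_b, tau_d) but changes characteristics."""
    tb, td, x = pop[k]
    return pop[:k] + [(tb, td, new_x)] + pop[k + 1:]
```

As in the measure-valued description, births and entries increase the population size, while deaths and swaps replace a triplet without changing the size.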

Remark 1 (Composition of the population).

  • At time T, the population Z_T contains all individuals who lived in the population before T, including dead/exited individuals. If there are no swap events, or entries, the population state Z_t for any time t\leq T can be obtained from Z_T. Indeed, if Z_T = \sum_{k=1}^{N_T} \delta_{I_k}, then the population at time t\leq T is simply composed of the individuals born before t: Z_t = \sum_{k=1}^{N_T} \mathbf{1}_{\{\tau^b_k \leq t \}} \delta_{I_k}.
  • In the presence of entries (open population), a characteristic x can track the individuals’ entry dates. Then, the previous equation can be easily modified in order to obtain the population Z_t at time t\leq T from Z_T.

3.3 Events intensity

Once the different event types have been defined in the population model, the frequency at which each event e occurs in the population has to be specified. Informally, the intensity \Lambda^e_t(Z_t) at which an event e can occur is defined by \mathbb P\big( \text{event } e \text { occurs in the population during } (t,t+\mathrm{d}t] | \mathcal{F}_t \big) \simeq \Lambda^e_t (Z_t)\mathrm{d}t. For a more formal definition of stochastic intensities, we refer to Brémaud (1981) or Kaakai and El Karoui (2023). The form of the intensity function (\Lambda^e_t (Z_t)) determines the population simulation algorithm in IBMPopSim:

  • When the event intensity does not depend on the population state, \big(\Lambda^e_t (Z_t)\big)_{t\in [0,T]} = \big(\mu^e(t)\big)_{t \in [0,T]}, \tag{3} with \mu^e a deterministic function, the events of type e occur at the jump times of an inhomogeneous Poisson process of intensity function (\mu^e(t))_{t \in [0,T]}. This is particularly useful when entry events occur with intensities influenced by environmental processes and/or exhibit seasonal variations. When such an event occurs, the individual to whom the event happens is drawn uniformly from the living individuals in the population. In a given model, the set of events e\in E with Poisson intensities will be denoted by \mathcal{P}.
  • Otherwise, we assume that the global intensity \Lambda^e_t(Z_t) at which the events of type e occur in the population can be written as the sum of individual intensities \lambda^e_t(I,Z_t): \begin{aligned} &\Lambda^e_t (Z_t) = \sum_{k=1}^{N_t} \lambda^e_t ( I_k,Z_t), \\ & \text{with } \mathbb P\big( \text{event } e \text { occurs to an individual } I \text{ during } (t,t+\mathrm{d}t] | \mathcal{F}_t \big) \simeq \lambda^e_t (I,Z_t)\mathrm{d}t. \end{aligned} \tag{4}
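For events with a deterministic Poisson intensity \mu^e(t) bounded by \bar\mu^e, the jump times can be obtained by the classical thinning of a homogeneous Poisson process, the building block of the acceptance/rejection algorithms of Section 4. The following Python sketch is illustrative only (it is not the package's C++ implementation, and the seasonal intensity is hypothetical).

```python
import math
import random

def thinning_times(mu, mu_bar, T, rng):
    """Jump times on [0, T] of an inhomogeneous Poisson process of rate mu(t),
    assuming mu(t) <= mu_bar on [0, T]: propose candidate times at the constant
    rate mu_bar, then accept each candidate t with probability mu(t) / mu_bar."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(mu_bar)       # exponential inter-arrival of proposals
        if t > T:
            return times
        if rng.random() * mu_bar < mu(t):  # accept with probability mu(t) / mu_bar
            times.append(t)

rng = random.Random(0)
# Hypothetical seasonal entry intensity, bounded by mu_bar = 2.0.
mu = lambda t: 1.0 + math.sin(2 * math.pi * t)
jumps = thinning_times(mu, 2.0, 10.0, rng)
```

The acceptance step is exactly where the intensity bounds of the next paragraphs come into play: without a finite bound \bar\mu^e, no proposal rate can dominate \mu^e.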

Obviously, nothing can happen to dead or exited individuals, i.e. individuals I= (\tau^b, \tau^d, x) with \tau^d \leq t. Thus, individual event intensities are assumed to be null for dead/exited individuals: \lambda^e_t ( I,Z_t) = 0, \text{ if }\tau^d \leq t, \text{ so that } \Lambda^e_t (Z_t) = \sum_{k=1}^{N_t^a} \lambda^e_t ( I_k,Z_t), with N^a_t the number of alive individuals at time t.

The event’s individual intensity \lambda^e_t (I,Z_t) can depend on time (for instance when there is a mortality reduction over time), on the individual’s age t-\tau^b and characteristics, but also on the population composition Z_t. The dependence of \lambda^e on the population Z models interactions between individuals in the population. Hence, two types of individual intensity functions can be implemented in IBMPopSim:

  1. No interactions: The intensity function \lambda^e does not depend on the population composition. The intensity at which an event of type e occurs to an individual I only depends on its date of birth and characteristics: \lambda^e_t (I,Z_t) = \lambda^e(t, I), \tag{5} where \lambda^e: \mathbb{R}_+ \times \mathcal{I}\to {\mathbb{R}}^+ is a deterministic function. In a given model, we denote by \mathcal{E} the set of event types with individual intensity Equation 5.

  2. “Quadratic” interactions: The intensity at which an event of type e occurs to an individual I depends on I and on the population composition, through an interaction function W^e. The quantity W^e(t, I,J) describes the intensity of interactions between two alive individuals I and J at time t, for instance in the presence of competition or cooperation. In this case, we have \lambda^e_t(I,Z_t)=\sum_{j=1}^{N_t} W^e(t, I, I_j) = \int_{\mathcal{I}} W^e(t, I, (\tau^b,\tau^d,x)) Z_t (\mathrm{d}\tau^b,\mathrm{d}\tau^d, \mathrm{d}x), \tag{6} where W^e(t, I, (\tau^b,\tau^d,x)) = 0 if the individual J =(\tau^b,\tau^d,x) is dead, i.e. \tau^d \leq t. In a given model, we denote by \mathcal{E}_W the set of event types with individual intensity Equation 6.

To summarize, an individual intensity in IBMPopSim can be written as: \lambda^e_t(I,Z_t) = \lambda^e(t, I) \mathbf{1}_{\{e \in \mathcal{E}\}} + \biggl( \sum_{j=1}^{N_t} W^e(t, I, I_j) \biggr) \mathbf{1}_{\{e \in \mathcal{E}_W\}}. \tag{7}

Example 1  

  1. An example of death intensity without interaction for an individual I=(\tau^b, \tau^d, x) alive at time t, t < \tau^d, is: \lambda^d(t,I) = \alpha_x \exp(\beta_x a(I,t)), \text{ where } a(I,t) = t-\tau^b is the age of the individual I at time t. In this standard case, the death rate of an individual I is an exponential (Gompertz) function of the individual’s age, with coefficients depending on the individual’s characteristics x.

  2. In the presence of competition between individuals, the death intensity of an individual I also depends on the other individuals J in the population. For example, if I=(\tau^b,\tau^d, x), with x its size, then we have: W^d(t,I,J) = (x_J - x)^+ \mathbf{1}_{\{\tau^d_J > t\}}, \quad \forall \; J=(\tau^b_J,\tau^d_J , x_J). \tag{8} This can be interpreted as follows: if the individual I randomly meets an individual J, alive at time t and of bigger size x_J > x, then I can die at rate x_J-x. If J is smaller than I, then it cannot kill I. The larger the size x of I, the lower its death intensity \lambda^d_t(I,Z_t), defined by \lambda^d_t(I,Z_t) = \sum_{\genfrac{}{}{0pt}{2}{J\in Z_t,}{x_J > x}} (x_J -x)\mathbf{1}_{\{\tau^d_J > t\}}.

  3. IBMPopSim can simulate IBMs that include intensities expressed as a sum of Poisson intensities and individual intensities, of the form \Lambda^e_t(Z_t) =\mu^e(t) + \sum_{k=1}^{N_t} \lambda^e_t(I_k, Z_t). Other examples are provided in Section 6 and Section 7.
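The competition kernel of Equation 8 translates directly into code. The Python sketch below is illustrative only (sizes are used as the characteristic x, and the toy population is hypothetical); it computes the resulting death intensity \lambda^d_t(I, Z_t) of Example 1.2 by summing the kernel over the population, as in Equation 6.

```python
import math

def W_death(t, size_I, J):
    """Competition kernel of Equation 8: (x_J - x_I)^+ if J is alive at t, else 0."""
    tb_J, td_J, size_J = J
    if td_J <= t:
        return 0.0
    return max(size_J - size_I, 0.0)

def death_intensity_interaction(t, I, pop):
    """lambda^d_t(I, Z_t) = sum over J in the population of W^d(t, I, J)."""
    _, _, size_I = I
    return sum(W_death(t, size_I, J) for J in pop)

# A focal individual of size 2 in a population of sizes 1, 3 and 5
# (the size-5 individual dies at time 4).
I = (0.0, math.inf, 2.0)
pop = [(0.0, math.inf, 1.0), (0.0, math.inf, 3.0), (0.0, 4.0, 5.0)]
```

The loop over the whole population for each individual is what makes interaction intensities quadratic in the population size, as discussed below.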

Finally, the global intensity at which an event can occur in the population is defined by: \Lambda_t(Z_t) = \sum_{e\in \mathcal{P}} \mu^e(t) + \sum_{e \in \mathcal E} \Big(\sum_{k=1}^{N_t} \lambda^e(t, I_k)\Big) + \sum_{e \in \mathcal E_W} \Big(\sum_{k=1}^{N_t}\sum_{j=1}^{N_t} W^e(t, I_k, I_j)\Big). \tag{9} An important point is that for events e \in \mathcal E without interactions, the global event intensity \Lambda^e_t(Z_t) = \sum_{k=1}^{N_t} \lambda^e(t, I_k) is of order N_t^a defined in Equation 2 (number of alive individuals at time t). On the other hand, for events e \in \mathcal{E}_W with interactions, \Lambda^e_t(Z_t) = \sum_{k=1}^{N_t}\sum_{j=1}^{N_t} W^e(t, I_k, I_j) is of order (N_t^a)^2. Informally, this means that when the population size increases, events with interaction are more costly to simulate. Furthermore, the numerous computations of the interaction kernel W^e can also be computationally costly. The randomized Algorithm 3, detailed in Section 4.3, allows us to overcome these limitations.

Event intensity bounds

The simulation algorithms implemented in IBMPopSim are based on an acceptance/rejection procedure, which requires the user to specify bounds for the various event intensities \Lambda^e_t(Z_t). These bounds are defined differently depending on the expression of the intensity.

Assumption 1 For all events e \in \mathcal{P} with Poisson intensity (Equation 3), the intensity is assumed to be bounded on [0,T]: \forall t \in [0,T], \quad \Lambda^e_t(Z_t) = \mu^e(t) \leq \bar \mu^e.

When e \in \mathcal{E} \cup \mathcal{E}_W, \Lambda^e_t(Z_t) =\sum_{k=1}^{N_t} \lambda^e_t(I_k,Z_t), assuming that \Lambda^e_t(Z_t) is uniformly bounded is too restrictive since the event intensity depends on the population size. In this case, the assumption is made on the individual intensity \lambda^e or on the interaction function W^e, depending on the situation.

Assumption 2 For all event types e \in \mathcal{E}, the associated individual event intensity \lambda^e with no interactions (Equation 5) is assumed to be uniformly bounded: \lambda^e(t, I) \leq \bar \lambda^e, \quad \forall \; t\in [0, T], \; I \in \mathcal{I}. In particular, \forall t \in [0,T], \quad \Lambda^e_t (Z_t) = \sum_{k=1}^{N_t} \lambda^e(t, I_k) \leq \bar \lambda^e N_t . \tag{10}

Assumption 3 For all event types e \in \mathcal{E}_W, the associated interaction function W^e is assumed to be uniformly bounded: W^e(t, I, J) \leq \bar W^e, \quad \forall \; t\in [0,T], \; I, J \in \mathcal{I}. In particular, \forall t \in [0,T], \lambda^e_t (I,Z_t) = \sum_{j=1}^{N_t} W^e(t, I, I_j) \leq \bar W^e N_t, \quad \text{and} \quad \Lambda^e_t (Z_t) \leq \bar W^e (N_t)^2.

Assumption 1, Assumption 2 and Assumption 3 yield that events in the population occur with the global event intensity \Lambda_t(Z_t), given in Equation 9, which is dominated by a polynomial function in the population size: \Lambda_t(Z_t) \leq \bar \Lambda(N_t), \quad \text{with } \bar \Lambda (n) = \sum_{e \in \mathcal{P}} \bar \mu^e + \sum_{e\in \mathcal{E}}\bar \lambda^e n + \sum_{e \in \mathcal E_W} \bar W^e n^2. \tag{11} This bound is linear in the population size if there are no interactions, and quadratic if there is at least one event including interactions. These assumptions are the key to the algorithms implemented in IBMPopSim. Before presenting the simulation algorithm, we close this section with a rigorous definition of an IBM, based on the pathwise representation of its dynamics as a Stochastic Differential Equation (SDE) driven by Poisson random measures.
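Equation 11 can be sketched as a small helper. The following Python snippet is illustrative only (the per-event bound values are hypothetical); it evaluates the polynomial bound \bar\Lambda(n) and exhibits the linear versus quadratic growth that drives the simulation cost.

```python
def global_bound(n, mu_bars=(), lambda_bars=(), W_bars=()):
    """bar_Lambda(n) = sum(mu_bar) + sum(lambda_bar) * n + sum(W_bar) * n^2 (Equation 11)."""
    return sum(mu_bars) + sum(lambda_bars) * n + sum(W_bars) * n * n

# Without interactions, the bound grows linearly in the population size n...
linear = [global_bound(n, lambda_bars=(0.1,)) for n in (10, 100, 1000)]
# ...while a single interaction kernel makes it grow quadratically.
quadratic = [global_bound(n, lambda_bars=(0.1,), W_bars=(0.01,)) for n in (10, 100, 1000)]
```

In an acceptance/rejection scheme, \bar\Lambda(N_t) is the rate at which candidate event times are proposed, so a quadratic bound directly translates into more proposals, and hence more rejections, as the population grows.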

3.4 Pathwise representation of stochastic IBM

Since the seminal paper of Fournier and Méléard (2004), it has been shown that a stochastic IBM dynamics can be defined rigorously as the unique solution of an SDE driven by Poisson measures, under reasonable non-explosion conditions. This representation has actually been proved only in particular cases, for various models. We provide here a general mathematical framework.
Background on Poisson random measures is recalled in Appendix Section 8.1, and for more details on these representations in particular examples, we refer to the abundant literature on the subject (see (Brémaud 1981; Çinlar 2011) and the references therein).

In the following, we consider an individual-based stochastic population (Z_t)_{t\in [0,T]}, keeping the notations introduced in Section 3.2 and Section 3.3 for the events and their intensities. In particular, the set of event types that define the population evolution is denoted by \mathcal{P} \cup \mathcal{E} \cup \mathcal{E}_W \subset E, with \mathcal{P}\, the set of event types with Poisson intensity verifying Assumption 1, \mathcal{E} the set of event types with individual intensity and no interaction, verifying Assumption 2, and finally \mathcal{E}_W the set of event types with interactions, verifying Assumption 3.

Non-explosion criterion

First, one has to ensure that the number of events occurring in the population does not explode in finite time, which would lead to an infinite simulation time. Assumption 2 and Assumption 3 are not sufficient to guarantee the non-explosion of the number of events, due to the potential explosion of the population size in the presence of interactions. An example is the case when only birth events occur, with an intensity \Lambda^b_t(Z_t) = C_b (N_t^a)^2 (i.e. when W^b(t, I,J) =C_b). Then, the number of alive individuals (N_t^a)_{t\geq 0} is a well-known pure birth process of intensity function g(n) = C_b n^2 (intensity of moving from state n to n+1). This process explodes in finite time, since g does not verify the necessary and sufficient non-explosion criterion for pure birth Markov processes: \sum_{n=1}^\infty \frac{1}{g(n)} = \infty (see e.g. Theorem 2.2 in (Bansaye and Méléard 2015)). The number of birth events thus explodes in finite time.
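This explosion can be observed numerically. The following Python sketch (illustrative only, not part of the IBMPopSim code) simulates a pure birth process with quadratic rate g(n) = C_b n^2; since \sum_n 1/g(n) < \infty, the expected time to reach any population size stays bounded (by \pi^2/6 when C_b = 1), so the process explodes in finite time:

```python
import random

def pure_birth_explosion_time(C=1.0, n0=1, n_max=5000, seed=None):
    """Time for a pure birth process with rate g(n) = C * n**2 to grow
    from n0 to n_max individuals.  Since sum(1/g(n)) < infinity, this
    time stays bounded as n_max grows: the process explodes in finite time."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    while n < n_max:
        t += rng.expovariate(C * n * n)  # waiting time before next birth ~ Exp(g(n))
        n += 1
    return t
```

Averaging over many runs with C_b = 1 gives a mean hitting time close to \sum_{n\geq 1} 1/n^2 = \pi^2/6 \approx 1.645, however large n_max is chosen.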

This example shows that the key point for non-explosion is to control the population size. We give below a general sufficient condition on birth and entry event intensities, ensuring that the population size stays finite in finite time. This in turn guarantees that the number of events does not explode in finite time. Informally, the idea is to dominate the intensities by a pure birth intensity function verifying the non-explosion criterion.

Assumption 4 Let e\in\{b, en\} be a birth or entry event type. If the intensity at which events of type e occur in the population is not Poissonian, i.e. e \in \mathcal{E} \cup \mathcal{E}_W, then there exists a function f^e : {\mathbb{N}}\to (0, +\infty), such that \sum_{n=1}^{\infty} \frac{1}{nf^e(n)} = \infty, and for all individuals I \in \mathcal{I} and population measures Z = \sum_{k=1}^{n} \delta_{I_k} of size n, \lambda^e_t (I, Z) \leq f^e(n), \; \forall \; 0\leq t \leq T.

If e \in \mathcal{E}, then \lambda_t^e(I,Z) = \lambda^e(t,I) \leq \bar{\lambda}^e by the domination Assumption 2, so that Assumption 4 is always verified with f^e(n) = \bar{\lambda}^e.

Assumption 4 yields that the global intensity \Lambda_t^e(\cdot) of event e is bounded by a function g^e only depending on the population size: \Lambda_t^e (Z) \leq g^e(n) := nf^e(n), \quad \text{with }\sum_{n=1}^{\infty} \frac{1}{g^e(n)} = \infty. If e\in \mathcal{P}\, has a Poisson intensity, then \Lambda_t^e(Z) =\mu^e_t always verifies the previous equation with g^e(n) = \bar \mu^e.

Before introducing the IBM SDE, let us give an idea of the equation construction. Between two successive events, the population composition Z_t stays constant, since the population process (Z_t)_{t \geq 0} is a pure jump process. Furthermore, since each event type is characterized by an intensity function, the jumps occurring in the population can be represented by restriction and projection of a Poisson measure defined on a larger state space. More precisely, we introduce a random Poisson measure Q on \mathbb R^+ \times \mathcal{J}\times \mathbb{R}^+, with \mathcal{J}= \mathbb N \times(\mathcal E \cup \mathcal{E}_W). Q is composed of random quadruplets (\tau, k , e, \theta), where \tau represents a potential event time for an individual I_k and event type e. The last variable \theta is used to accept/reject this proposed event, depending on the event intensity. Hence, the Poisson measure is restricted to a certain random set and then projected on the space of interest {\mathbb{R}}^+ \times \mathcal{J}. If the event is accepted, then a jump \phi^e(\tau,I_k) occurs.

Theorem 1 (Pathwise representation) Let T\in {\mathbb{R}}^+ and \mathcal{J}= \mathbb N \times(\mathcal E \cup \mathcal{E}_W). Let Q be a random Poisson measure on \mathbb R^+ \times \mathcal{J}\times \mathbb{R}^+, of intensity \mathrm{d}t \delta_{\mathcal{J}}(\mathrm{d}k,\mathrm{d}e) \mathrm{d}\theta, with \delta_{\mathcal{J}} the counting measure on \mathcal{J}. Finally, let Q^{\mathcal P} be a random Poisson measure on \mathbb R^+ \times \mathcal{P} \times \mathbb{R}^+, of intensity \mathrm{d}t \delta_{\cal P}(\mathrm{d}e)\mathrm{d}\theta, and Z_0= \sum_{k=1}^{N_0} \delta_{I_k} an initial population. Then, under Assumption 4, there exists a unique measure-valued population process Z, strong solution of the following SDE driven by the Poisson measures Q and Q^{\mathcal{P}}: \begin{aligned} Z_t = Z_0 &+ \int_0^t \int_{ \mathcal{J}\times \mathbb R^+ }\phi^e (s , I_k) \mathbf{1}_{\{k \leq N_{s^-}\} }\mathbf{1}_{\{\theta \leq \lambda_s^e(I_k, Z_{s^-})\}} Q (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ) \\ &+ \int_0^t \int_{\mathcal{P} \times \mathbb R^+} \phi^e(s, I_{s^-}) \mathbf{1}_{\{\theta \leq \mu^e(s) \}} Q^{\mathcal{P}} (\mathrm{d}s ,\mathrm{d}e, \mathrm{d}\theta), \qquad \forall 0 \leq t \leq T, \end{aligned} \tag{12} where I_{s^-} is an individual chosen uniformly among the alive individuals in the population Z_{s^-}.

The proof of Theorem 1 is detailed in the Appendix, Section 8.2.1. Note that Equation 12 is an SDE describing the evolution of the IBM, since the intensities of the events on the right-hand side of the equation depend on the population process Z itself. The main idea of the proof of Theorem 1 is to use the non-explosion property of Lemma 1, and to write the r.h.s. of Equation 12 as a sum of simple equations between two successive events, solved by induction.

Lemma 1 Let Z be a solution of Equation 12 on {\mathbb{R}}^+, with (T_n)_{n\geq 0} its jump times, T_0 = 0. If Assumption 4 is satisfied, then \lim_{n \to \infty} T_n = \infty, \quad {\mathbb{P}}\text{-a.s.}

The proof of Lemma 1, detailed in Appendix Section 8.2.2, is more technical and relies on a pathwise comparison result, generalizing those obtained in (Kaakai and El Karoui 2023). An alternative pathwise representation of the population process, inspired by the randomized Algorithm 3, is given as well in Proposition 3.

4 Population simulation

We now present the main algorithm for simulating the evolution of an IBM over [0,T]. The algorithm implemented in IBMPopSim allows the exact simulation of Equation 12, based on an acceptance/reject algorithm for simulating random times called thinning. The exact simulation of event times with this acceptance/reject procedure is closely related to the simulations of inhomogeneous Poisson processes by the so-called thinning algorithm, often attributed to Lewis and Shedler (1979). The simulation methods for inhomogeneous Poisson processes can be adapted to IBMs, and we introduce in this section a general algorithm extending those by Fournier and Méléard (2004) (see also Ferrière and Tran (2009), Bensusan (2010)).

It can be noted that, under an appropriate rescaling and when the population size goes to infinity, an IBM can be approximated by a nonlinear transport PDE, structured by age and trait. A central limit theorem can also be obtained under appropriate assumptions (Tran 2008). In the presence of interactions, as in Section 7 for instance, the IBM goes almost surely to extinction in finite time, which is not the case for the limit PDE. In this case, simulating the microscopic process can be quite useful for approximating the distribution of the extinction time. Other applications of IBM simulations include the simulation of multiscale population evolutions, strongly heterogeneous populations, or small populations with strong interactions.

The algorithm is based on exponential “candidate” event times, chosen with a (constant) intensity which must be greater than the global event intensity \Lambda_t(Z_t) (Equation 9). Starting from time t, once a candidate event time t + \bar T_\ell has been proposed, a candidate event type e (birth, death,...) is chosen with a probability p^e depending on the event intensity bounds \bar \mu^e, \bar \lambda^e and \bar W^e, as defined in Assumption 2 and Assumption 3. An individual I is then drawn from the population. Finally, it remains to accept or reject the candidate event with a probability q^e(t,I,Z_t) depending on the true event intensity. If the candidate event is accepted, then the event e occurs at time t + \bar T_\ell to the individual I. The main idea of the implemented algorithm can be summarized as follows:

  1. Draw a candidate time t + \bar T_\ell and a candidate event type e.
  2. Draw a uniform variable \theta \sim \mathcal{U}([0, 1]) and an individual I.
  3. If \theta \leq q^e(t,I,Z_t), then the event e occurs to the individual I; otherwise, nothing happens, and the procedure starts again from t + \bar T_\ell.

Before introducing the main algorithms in more details, we recall briefly the thinning procedure for simulating inhomogeneous Poisson processes, as well as the links with pathwise representations. Some recalls on Poisson random measures are presented in Section 8.1. For a more general presentation of thinning of a Poisson random measure, see (Devroye 1986; Çinlar 2011; Kallenberg 2017).

4.1 Thinning of Poisson measure

Let us start with the simulation and pathwise representation of an inhomogeneous Poisson process on [0,T] with intensity (\Lambda(t))_{t\in [0,T]}. The thinning procedure is based on the fundamental assumption that \Lambda(t) \leq \bar \Lambda is bounded on [0,T]. In this case, the inhomogeneous Poisson process can be obtained from a homogeneous Poisson process of intensity \bar \Lambda, which can be simulated easily.

First, the Poisson process can be extended to a Marked Poisson measure \bar Q:= \sum_{\ell \ge 1} \delta_{(\bar T_\ell, \bar \Theta_\ell)} on (\mathbb{R}^+)^2, defined as follows:

  • The jump times (\bar T_\ell)_{\ell \ge 1} of \bar Q are the jump times of a Poisson process of intensity \bar \Lambda.

  • The marks (\bar \Theta_\ell)_{\ell \ge 1} are i.i.d. random variables, uniformly distributed on [0, \bar \Lambda].

By Proposition 6, \bar{Q} is a Poisson random measure with mean measure \bar \mu(\mathrm{d}t, \mathrm{d}\theta): = \bar \Lambda \mathrm{d}t \frac{\mathbf{1}_{[0, \bar \Lambda]}(\theta)}{\bar \Lambda} \mathrm{d}\theta= \mathrm{d}t \mathbf{1}_{[0, \bar \Lambda]}(\theta) \mathrm{d}\theta. In particular, the average number of atoms (\bar T_\ell, \bar \Theta_\ell) in [0,t]\times [0,h] is \mathbb{E}[\bar Q([0,t]\times [0,h])]={\mathbb{E}}[\sum_{\ell} \mathbf{1}_{[0,t]\times [0,h]} (\bar T_\ell, \bar \Theta_{\ell})] = \int_0^t \int_0^h \bar \mu(\mathrm{d}s, \mathrm{d}\theta) = t (\bar \Lambda \wedge h). The thinning is based on the restriction property for Poisson measures: for a measurable set \Delta\subset {\mathbb{R}}^+\times {\mathbb{R}}^+, the restriction Q^\Delta:= \mathbf{1}_{\Delta}\bar Q of \bar Q to \Delta (obtained by keeping only the atoms in \Delta) is also a Poisson random measure, of mean measure \mu^{\Delta}(\mathrm{d}t, \mathrm{d}\theta) = \mathbf{1}_{\Delta}(t,\theta) \bar \mu(\mathrm{d}t, \mathrm{d}\theta).

In order to obtain an inhomogeneous Poisson measure of intensity (\Lambda(t)), the “good” choice of \Delta is the hypograph of \Lambda: \Delta =\{ (t,\theta) \in [0,T]\times [0,\bar \Lambda] ; \; \theta \leq \Lambda(t)\} (see Figure 1). Then, Q^\Delta = \sum_{\ell \ge 1} \mathbf{1}_{\left\{\bar \Theta_\ell \le \Lambda(\bar T_\ell)\right\}} \delta_{(\bar T_\ell, \bar \Theta_\ell)}, and since \Lambda(t) \leq \bar \Lambda, on [0,T]: \begin{aligned} \mu^{\Delta}(\mathrm{d}t, \mathrm{d}\theta) & = \mathbf{1}_{\{ \theta \leq \Lambda(t)\}} \mathrm{d}t \mathbf{1}_{[0, \bar \Lambda]}(\theta)\mathrm{d}\theta = \mathbf{1}_{\{\theta \leq \Lambda(t)\}} \mathrm{d}t \mathrm{d}\theta. \end{aligned}

Figure 1: Realization of a Marked Poisson measure \bar Q on [0,T] with mean measure \bar \mu(\mathrm{d} t, \mathrm{d} \theta) = \mathrm{d} t \mathbf{1}_{[0, \bar \Lambda]}(\theta) \mathrm{d} \theta (red crosses), and realization of the restriction Q^\Delta where \Delta = \{ (t,\theta)\in [0,T]\times[0, \bar \Lambda], \theta \leq \Lambda(t) \} (blue circles). The projection of Q^\Delta on the first component is an inhomogeneous Poisson process on [0,T] of intensity (\Lambda(t)) and jump times (T_k)_{k \ge 1}.

Finally, the inhomogeneous Poisson process is obtained by the projection Proposition 5, which states that the jump times of Q^\Delta are the jump times of an inhomogeneous Poisson process of intensity (\Lambda(t)).

Proposition 1 The counting process N^\Lambda, projection of Q^{\Delta} on the time component and defined by N^{\Lambda}_t := Q^{\Delta}( [0,t] \times {\mathbb{R}}^+) = \int_0^t \int_{{\mathbb{R}}^+} \mathbf{1}_{\{\theta \leq \Lambda(s)\}} \bar Q(\mathrm{d}s, \mathrm{d}\theta) = \sum_{\ell \geq 1} \mathbf{1}_{\{ \bar T_\ell \leq t \}} \mathbf{1}_{\{\bar \Theta_\ell \leq \Lambda(\bar T_\ell) \}}, \quad \forall t \in [0,T], \tag{13} is an inhomogeneous Poisson process on [0,T] of intensity function (\Lambda(t))_{t\in [0,T]}. The thinning Equation 13 is a pathwise representation of N^\Lambda by restriction and projection of the Poisson measure \bar Q on [0,T].

The previous proposition yields a straightforward thinning algorithm to simulate the jump times (T_k)_{k \ge 1} of an inhomogeneous Poisson process of intensity \Lambda(t), by selecting jump times \bar T_\ell such that \bar \Theta_\ell \le \Lambda(\bar T_\ell).
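As an illustration, this thinning procedure can be sketched in a few lines of Python (an illustrative sketch, not the IBMPopSim implementation, which is written in C++):

```python
import random

def thinning_poisson(lam, lam_bar, T, seed=None):
    """Jump times on [0, T] of an inhomogeneous Poisson process with
    intensity lam(t) <= lam_bar, obtained by thinning a homogeneous
    Poisson process of rate lam_bar."""
    rng = random.Random(seed)
    t, jumps = 0.0, []
    while True:
        t += rng.expovariate(lam_bar)      # candidate jump time \bar T_l
        if t > T:
            return jumps
        theta = rng.uniform(0.0, lam_bar)  # mark \bar Theta_l ~ U([0, lam_bar])
        if theta <= lam(t):                # keep atoms below the graph of lam
            jumps.append(t)
```

With a constant intensity lam(t) = 1 and bound lam_bar = 2, about half of the candidate times are accepted, and the output is a standard Poisson process of rate 1.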

4.1.1 Multivariate Poisson process

This can be extended to the simulation of multivariate inhomogeneous Poisson processes, which is an important example before tackling the simulation of an IBM.

Let (N^j)_{j \in \mathcal{J}} be an (inhomogeneous) multivariate Poisson process indexed by a finite set \mathcal{J}, such that \forall j \in \mathcal{J}, the intensity (\lambda_j(t))_{t\in [0,T]} of N^j is bounded on [0,T]: \sup_{t\in[0,T]} \lambda_j(t) \le \bar \lambda_j, \text{ and let } \bar \Lambda = \sum_{j \in \mathcal{J}} \bar \lambda_j. Recall that such a multivariate counting process can be rewritten as a Poisson random measure N= \sum_{k\geq 1} \delta_{(T_k,J_k)} on {\mathbb{R}}^+\times \mathcal{J} (see e.g. Sec. 2 of Chapter 6 in (Çinlar 2011)), where T_k is the kth jump time of \sum_{j\in \mathcal{J}} N^j and J_k corresponds to the component of the vector which jumps. In particular, N^j_t = N([0,t]\times \{j\}).

Once again, the simulation of such a process can be obtained from the simulation of a (homogeneous) multivariate Poisson process of intensity vector (\bar{\lambda}_j)_{j \in \mathcal{J}}, extended into a Poisson measure by adding marks on {\mathbb{R}}^+. Thus, we introduce the Marked Poisson measure \bar Q = \sum_{\ell \geq 1} \delta_{(\bar T_\ell, \bar J_\ell, \bar \Theta_\ell)} on {\mathbb{R}}^+ \times \mathcal{J}\times {\mathbb{R}}^+, such that:

  • The jump times (\bar T_\ell) of \bar Q are the jump times of a Poisson process of intensity \bar \Lambda.
  • The variables (\bar J_\ell) are i.i.d. random variables on \mathcal{J}, with \displaystyle p_j= {\mathbb{P}}(\bar J_1 = j)= \bar \lambda_j/\bar \Lambda and representing the component of the vector which jumps.
  • The marks (\bar \Theta_\ell) are independent variables with \bar \Theta_\ell a uniform random variable on [0,\bar \lambda_{{\bar J}_\ell}], \forall \ell \geq 1.

By Proposition 6 and Proposition 5, each measure \bar Q_j (\mathrm{d}t, \mathrm{d}\theta) = \bar Q(\mathrm{d}t, \{j\}, \mathrm{d}\theta) = \sum_{\ell \geq 1} \mathbf{1}_{\{\bar J_\ell=j \}} \delta_{(\bar T_\ell, \bar \Theta_\ell)} is a marked Poisson measure of intensity \bar \mu_j ( \mathrm{d}t ,\mathrm{d}\theta) = \bar{\Lambda}p_j \mathrm{d}t \frac{\mathbf{1}_{[0, \bar \lambda_{j}]}(\theta)}{\bar \lambda_{j}} \mathrm{d}\theta = \mathrm{d}t \mathbf{1}_{[0, \bar \lambda_{j}]}(\theta) \mathrm{d}\theta. As a direct application of Proposition 1, the inhomogeneous multivariate Poisson process is obtained by restriction of each measure \bar Q_j to \Delta_j = \{ (t, \theta) \in [0,T] \times [0,\bar \lambda_j] ;\; \theta \leq \lambda_j(t) \} and projection.

Proposition 2 The multivariate counting process (N^j)_{j \in \mathcal{J}}, defined for all j \in \mathcal{J} and t \in [0,T] by thinning and projection of \bar Q: N^j_t := \int_0^t \int_{ {\mathbb{R}}^+} \mathbf{1}_{\{\theta \leq \lambda_j(s)\}} \bar{Q}(\mathrm{d}s, \{j\}, \mathrm{d}\theta) = \sum_{\ell \geq 1} \mathbf{1}_{\{ \bar T_\ell \leq t \}}\mathbf{1}_{\{\bar J_\ell = j\}} \mathbf{1}_{\{\bar \Theta_\ell \leq \lambda_j (\bar T_\ell ) \}}, is an inhomogeneous Poisson process of intensity vector (\lambda_j(t))_{j\in\mathcal{J}} on [0,T].

Proposition 2 yields the following simulation Algorithm 1 for multivariate Poisson processes.

\begin{algorithm} \caption{Thinning algorithm for multivariate inhomogeneous Poisson processes.} \begin{algorithmic} \State \textbf{Input:} Functions and bounds $(\lambda_j, \bar \lambda_j)$, $\lambda_j: [0,T] \to [0,\bar \lambda_j]$ and $\bar \Lambda = \sum_{j \in \mathcal{J}} \bar \lambda_j$ \State \textbf{Output:} Points $(T_k, J_k)$ of Poisson measure $N$ on $[0,T] \times \mathcal{J}$ \State Initialization $T_0 \longleftarrow 0$, $\bar T_0 \longleftarrow 0$, $k \longleftarrow 0$, $\ell \longleftarrow 0$ \While{$T_k < T$} \Repeat \State increment iterative variable $\ell \longleftarrow \ell+1$ \State compute next proposed time $\bar T_{\ell} \longleftarrow \bar T_{\ell-1} + S_\ell$ with $S_\ell \sim \mathcal{E}(\bar \Lambda)$ \State draw $\bar J_\ell$ such that $\mathbb{P}(\bar J_\ell = j) = \bar \lambda_j / \bar \Lambda$ \State draw $\bar \Theta_\ell \sim \mathcal{U}([0,\bar \lambda_{\bar J_\ell}])$ \Until{accepted event $\bar \Theta_\ell \le \lambda_{\bar J_\ell}(\bar T_{\ell})$} \State increment iterative variable $k \longleftarrow k+1$ \State record $(T_k, J_k) \longleftarrow (\bar T_{\ell}, \bar J_\ell)$ as accepted point \EndWhile \end{algorithmic} \end{algorithm}

Remark 2. The acceptance/rejection Algorithm 1 can be efficient when the functions \lambda_j are of different orders of magnitude, and thus bounded by different \bar \lambda_j. However, it is important to note that the simulation of the discrete random variables (\bar J_\ell) can be costly (compared to a uniform draw) when \mathcal{J} is large, for instance when an individual is drawn from a large population. In this case, an alternative is to choose the same bound \bar \lambda_j= \bar \lambda for all j \in \mathcal{J}. Then the marks (\bar J_\ell, \bar \Theta_\ell) are i.i.d. uniform variables on \mathcal{J}\times [0,\bar \lambda], which are faster to simulate.
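Algorithm 1 can be sketched as follows in Python (illustrative code; the dictionary arguments lams and lam_bars, mapping each component j to its intensity function and bound, are a hypothetical interface chosen for readability):

```python
import random

def thinning_multivariate(lams, lam_bars, T, seed=None):
    """Points (T_k, J_k) of a multivariate inhomogeneous Poisson process
    on [0, T], obtained by thinning as in Algorithm 1.
    lams: dict j -> intensity function, lam_bars: dict j -> bound."""
    rng = random.Random(seed)
    comps = list(lams)
    Lam_bar = sum(lam_bars[j] for j in comps)       # global rate \bar Lambda
    weights = [lam_bars[j] for j in comps]          # P(J = j) = bar lambda_j / bar Lambda
    t, points = 0.0, []
    while True:
        t += rng.expovariate(Lam_bar)               # proposed time \bar T_l
        if t > T:
            return points
        j = rng.choices(comps, weights=weights)[0]  # proposed component \bar J_l
        theta = rng.uniform(0.0, lam_bars[j])       # mark on [0, bar lambda_j]
        if theta <= lams[j](t):                     # acceptance test
            points.append((t, j))
```

For two components of constant intensities 1 and 2, the two projected counting processes are Poisson processes of rates 1 and 2 respectively.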

4.2 Simulation algorithm

Let us now come back to the simulation of the IBM introduced in Section 2. For ease of notation, we assume that there are no events with Poisson intensity (\mathcal{P} =\emptyset), so that all events that occur are of type e \in \mathcal{E} \cup \mathcal{E}_W, with individual intensity \lambda_t^e(I,Z_t) depending on the population composition Z_t (e \in \mathcal{E}_W) or not (e \in \mathcal{E}), as defined in Equation 7 and verifying either Assumption 2 or Assumption 3. The global intensity Equation 9 at time t \in [0,T] is thus \Lambda_t(Z_t) = \sum_{e \in \mathcal{E}} \Big( \sum_{k=1}^{N_t} \lambda^e(t, I_k) \Big) + \sum_{e \in \mathcal{E}_W} \Big( \sum_{k=1}^{N_t} \sum_{j=1}^{N_t} W^e(t, I_k, I_j) \Big) \leq \bar \Lambda(N_t), \tag{14} with \bar \Lambda(n) = \big(\sum_{e \in \mathcal{E}} \bar \lambda^e \big) n + \big( \sum_{e \in \mathcal{E}_W} \bar W^e \big) n^2.

One of the main difficulties is that the intensity of events is not deterministic, as it is in the case of inhomogeneous Poisson processes, but a function \Lambda_t(Z_t) of the population state, bounded by a quantity which itself depends on the population size. However, Algorithm 1 can be adapted to simulate the IBM. The construction is done by induction, by conditioning on the state of the population Z_{T_k} at the kth event time T_k (T_0 = 0).

We first present the construction of the first event at time T_1.

First event simulation

Before the first event time, on \{ t < T_1 \}, the population composition is constant: Z_t = Z_0 = \{ I_1, \dots, I_{N_0}\}. For each type of event e and individual I_k, k \in \{1,\dots, N_0\}, we denote by N^{k,e} the counting process of intensity \lambda_t^e (I_k,Z_t), counting the occurrences of events of type e happening to the individual I_k. Then, the first event time T_1 is the first jump time of the multivariate counting process (N^{(k,e)})_{ (k,e) \in \mathcal{J}_0}, with \mathcal{J}_0 = \{1,\dots , N_0\}\times \big(\mathcal{E} \cup \mathcal{E}_W \big).

Since the population composition is constant before the first event time, each counting process N^{j} with j=(k,e) \in \mathcal{J}_0 coincides on [0,T_1[ with an inhomogeneous Poisson process of intensity \lambda_t^e (I_k,Z_0). Thus, conditionally on Z_0, T_1 is also the first jump time of an inhomogeneous multivariate Poisson process N^0 = (N^{0,j})_{j \in \mathcal{J}_0} of intensity function (\lambda_j)_{j\in \mathcal J_0}, defined for all j = (k,e) \in \mathcal{J}_0 by: \lambda_j(t) = \lambda^e_t(I_k,Z_0) \le \bar \lambda^e_0 \quad \text{with} \quad \bar \lambda^e_0 = \bar \lambda^e \mathbf{1}_{e \in \mathcal{E}} + \bar W^e N_0 \mathbf{1}_{e \in \mathcal{E}_W}, by Assumption 2 and Assumption 3. In particular, the jump times of N^0 occur at the intensity \Lambda(t) =\sum_{j \in \mathcal{J}_0} \lambda_j(t) =\sum_{e \in \mathcal{E} \cup \mathcal{E}_W} \sum_{k=1}^{N_0} \lambda^e_t(I_k,Z_0) \leq \bar \Lambda(N_0)=N_0 \sum_{e \in \mathcal{E} \cup \mathcal{E}_W} \bar \lambda^e_0. By Proposition 2, N^0 can be obtained by thinning of the marked Poisson measure \bar Q^0 = \sum_{\ell \geq 1} \delta_{(\bar T_\ell , (\bar{K}_\ell, \bar E_\ell), \bar \Theta_\ell)} on {\mathbb{R}}^+\times\mathcal{J}_0 \times {\mathbb{R}}^+, with:

  • (\bar T_\ell)_{\ell \in {\mathbb{N}}^*} the jump times of a Poisson process of rate \bar \Lambda(N_0).
  • (\bar{K}_\ell, \bar E_\ell)_{\ell \in {\mathbb{N}}^*} discrete i.i.d. random variables on \mathcal{J}_0 = \{1,\dots , N_0\}\times \big(\mathcal{E} \cup \mathcal{E}_W \big), with \bar K_\ell representing the index of the chosen individual and \bar E_\ell the event type for the proposed event, such that: \mathbb{P}( \bar K_1 = k, \bar E_1 = e) = \frac{\bar \lambda^e_0 }{\bar \Lambda(N_0)} = \frac{1}{N_0} \frac{\bar \lambda^e_0 N_0}{\bar \Lambda(N_0)}, i.e. \bar K_1 and \bar E_1 are independent random variables, with \bar K_1 \sim \mathcal U(\{1,\dots, N_0\}) and \bar E_1 such that p_e := \mathbb{P}( \bar E_1 = e) = \frac{\bar \lambda^e_0 N_0}{\bar \Lambda(N_0)}.
  • (\bar \Theta_\ell)_{\ell \in {\mathbb{N}}^*} are independent uniform random variables, with \bar \Theta_\ell \sim \mathcal{U}([0,\bar \lambda^{\bar E_\ell}_0]).

Since the first event is the first jump of N^0, by Proposition 2 and Algorithm 1, the first event time T_1 is the first jump time \bar T_\ell of \bar Q^0 such that \bar \Theta_\ell \leq \lambda^{\bar E_\ell}_{\bar T_\ell}(I_{\bar K_\ell}, Z_0).

At T_1 =\bar T_{\ell}, the event \bar E_\ell occurs to the individual I_{\bar K_\ell} = (\tau^b, \infty, x). For instance, if \bar E_\ell =d, a death/exit event occurs, so that Z_{T_1} = Z_{0} + \delta_{(\tau^b, T_1, x)} - \delta_{I_{\bar K_\ell}} and N_{T_1} = N_{0}. If \bar E_\ell =b or en, a birth or entry event occurs, so that N_{T_1} = N_{0} + 1, and a new individual I_{N_0+1} is added to the population, chosen as described in Table 1. Finally, if \bar E_\ell=s, a swap event occurs, the population size stays constant and I_{\bar K_\ell} is replaced by an individual I_{\bar K_\ell}', chosen as described in Table 1.

The steps for simulating the first event in the population can be iterated in order to simulate the population. At the kth step, the same procedure is repeated to simulate the kth event, starting from a population Z_{T_{k-1}} of size N_{T_{k-1}}. The algorithm is summarized in Algorithm 2.

\begin{algorithm} \caption{IBM simulation algorithm (without events of Poissonian intensity)} \begin{algorithmic} \State \textbf{Input:} Initial population $Z_0$, horizon $T > 0$, and events described by: \State - Intensity functions and bounds $(\lambda^e, \bar \lambda^e)$ for $e \in \mathcal{E}$ and $(W^e, \bar W^e)$ for $e \in \mathcal{E}_W$ \State - Event action functions $\phi^e(t, I)$ for $e \in \mathcal{E} \cup \mathcal{E}_W$ \State \textbf{Output:} Population $Z_T$ \State Initialization $T_0 \longleftarrow 0$, $\bar T_0 \longleftarrow 0$, $k \longleftarrow 0$, $\ell \longleftarrow 0$ \While{$T_k < T$} \Repeat \State increment iterative variable $\ell \longleftarrow \ell+1$ \State compute next proposed time $\bar T_{\ell} \longleftarrow \bar T_{\ell-1} + S_\ell$ with $S_\ell \sim \mathcal{E}\big(\bar \Lambda(N_{T_k}) \big)$ \State draw a proposed event $\bar E_\ell$ with probability $p_e = \frac{\bar \lambda^e \mathbf{1}_{e \in \mathcal{E}} + \bar W^e N_{T_k} \mathbf{1}_{e \in \mathcal{E}_W}}{\sum_{e \in \mathcal{E}} \bar \lambda^e + \sum_{e \in \mathcal{E}_W} \bar W^e N_{T_k}}$ \State draw an individual index $\bar K_\ell \sim \mathcal{U}(\{1,\dots,N_{T_k}\})$ \State draw $\bar \Theta_\ell \sim \mathcal{U}([0, \bar \lambda^{\bar E_\ell}])$ if $\bar E_\ell \in \mathcal{E}$ or $\bar \Theta_\ell \sim \mathcal{U}([0, \bar W^{\bar E_\ell} N_{T_k}])$ if $\bar E_\ell \in \mathcal{E}_W$ \Until{accepted event $\bar \Theta_\ell \le \lambda^{\bar E_\ell}_{\bar T_\ell}(I_{\bar K_\ell},Z_{T_{k}})$} \State increment iterative variable $k \longleftarrow k+1$ \State record $(T_k, E_k, I_k) \longleftarrow (\bar T_{\ell}, \bar E_\ell, I_{\bar K_\ell})$ as accepted time, event and individual \State update the population $Z_{T_k} = Z_{T_{k-1}} + \phi^{E_k}(T_k, I_k)$ \EndWhile \end{algorithmic} \end{algorithm}
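The structure of Algorithm 2 can be illustrated by the following Python sketch, restricted to events with individual intensities and no interactions. The population representation (a list of [birth, death] pairs) and the events argument are deliberate simplifications chosen for illustration, not the IBMPopSim data structures. Note that dead individuals stay in the list and proposals on them are automatically rejected:

```python
import random

def simulate_ibm(pop0, events, T, seed=None):
    """Minimal sketch of the thinning IBM algorithm for events with
    individual intensities and no interactions (toy model only).

    pop0:   list of [birth_date, death_date_or_None] individuals
    events: dict name -> (lam_bar, lam(t, ind), action(t, pop, k))
    """
    rng = random.Random(seed)
    pop = [list(ind) for ind in pop0]
    names = list(events)
    t = 0.0
    while True:
        n = len(pop)                               # includes dead individuals
        bounds = [events[e][0] for e in names]
        Lam_bar = n * sum(bounds)                  # global bound \bar Lambda(N)
        t += rng.expovariate(Lam_bar)              # candidate event time
        if t > T:
            return pop
        e = rng.choices(names, weights=bounds)[0]  # candidate event type
        k = rng.randrange(n)                       # uniform candidate individual
        lam_bar, lam, action = events[e]
        theta = rng.uniform(0.0, lam_bar)
        # a dead individual has zero intensity: the proposal is auto-rejected
        if pop[k][1] is None and theta <= lam(t, pop[k]):
            action(t, pop, k)                      # apply the event action phi^e
```

For instance, with a single death event of constant rate 0.5, the proportion of survivors at time T = 1 concentrates around e^{-0.5} \approx 0.61.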

Theorem 2 A population process (Z_t)_{t \in [0,T]} simulated by the Algorithm 2 is an exact solution of the SDE Equation 12.

The proof of Theorem 2 is detailed in the Appendix Section 8.3.

Remark 3. The population Z_{T_k} includes individuals who died or exited before the event time T_k. Thus, N_{T_k} can be greater than the number N^a_{T_k} of alive individuals at time T_k. When a dead individual I_{\bar K_\ell} is drawn from the population during the acceptance/rejection phase of the algorithm, the proposed event (\bar T_{\ell}, \bar E_\ell, I_{\bar K_\ell}) is automatically rejected, since the event intensity is \lambda^{\bar E_\ell}_{\bar T_\ell}(I_{\bar K_\ell}, Z_{T_k}) = 0 (nothing can happen to a dead individual). This can slow down the algorithm, especially when the proportion of dead/exited individuals in the population increases. However, the computational cost of keeping dead/exited individuals in the population is much lower than the cost of removing an individual from the population at each death/exit event, which is linear in the population size.

Actually, dead/exited individuals are regularly removed from the population in the IBMPopSim algorithm, in order to optimize the trade-off between keeping too many dead individuals and removing dead individuals from the population too often. The frequency at which dead individuals are removed from the population can be chosen by the user, as an optional argument of the main function popsim (see details in Section 5).

Remark 4. In practice, the bounds \bar \lambda^e and \bar W^e should be chosen as sharp as possible. It is easy to see that conditionally on \{\bar E_\ell = e, \bar T_\ell = t, \bar K_\ell = l \}, the probability of accepting the event is, depending on whether there are interactions, \mathbb{P}\big(\bar \Theta_\ell \le \lambda^e_t(I_l,Z_{T_k})| \mathcal{F}_{T_k}\big) = \frac{\lambda^e(t, I_l)}{\bar \lambda^e} \mathbf{1}_{e \in \mathcal{E}} + \frac{\sum_{j=1}^{N_{T_k}} W^e(t, I_l, I_j)}{\bar W^e N_{T_k}} \mathbf{1}_{e \in \mathcal{E}_W}. The sharper the bounds \bar \lambda^e and \bar W^e, the higher the acceptance rate. For even sharper bounds, an alternative is to define bounds \bar \lambda^e(I_l) and \bar W^e(I_l) depending on the individuals’ characteristics. However, the algorithm must then be modified, and the individual I_l is no longer chosen uniformly in the population. Due to the population size, this is significantly more costly than choosing uniform bounds, as explained in Remark 2.

4.3 Simulation algorithm with randomization

Let e \in \cal E_W be an event with interactions. In order to evaluate the individual intensity \lambda^e_t (I,Z_t) = \sum_{j=1}^{N_t} W^e(t, I,I_j), one must compute W^e(t, I, I_j) for all individuals I_j in the population. This step can be computationally costly, especially for large populations. One way to avoid this summation is to use randomization (see also Fournier and Méléard (2004) in a model without age). The randomization consists in replacing the summation by an evaluation of the interaction function W^e at an individual J drawn uniformly from the population.

More precisely, if J \sim \mathcal{U}(\{1, \dots, N_{T_k}\}) is independent of \bar \Theta_\ell, we have \mathbb{P}\Big(\bar \Theta_\ell \le \sum_{j=1}^{N_{T_k}} W^e(t, I_l, I_j) | \mathcal{F}_{T_k} \Big) = \mathbb{P}\big(\bar \Theta_\ell \le N_{T_k} W^e(t, I_l, I_J) | \mathcal{F}_{T_k}\big). \tag{15} Equivalently, we can write this probability as \mathbb{P}\big(\tilde \Theta_\ell \le W^e(t, I_l, I_J) \big) where \tilde \Theta_\ell = \frac{\bar \Theta_\ell}{N_{T_k}}\sim \mathcal{U}([0, \bar W^e]) is independent of J \sim \mathcal{U}(\{1, \dots, N_{T_k}\}).

The efficiency of the randomization procedure increases with the homogeneity of the population. If the interaction function W^e varies little across the individuals in the population, the randomization approach is very efficient in practice, especially when the population is large.
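The identity in Equation 15 can be checked by simulation. The two functions below sketch the acceptance test with full summation and its randomized version; the interaction function W used in the usage example is hypothetical. Their Monte Carlo acceptance frequencies agree:

```python
import random

def accept_full_sum(I, pop, W, W_bar, rng):
    """Full-summation acceptance: Theta ~ U([0, W_bar * N]) is compared
    to the complete interaction sum sum_j W(I, I_j)."""
    theta = rng.uniform(0.0, W_bar * len(pop))
    return theta <= sum(W(I, J) for J in pop)

def accept_randomized(I, pop, W, W_bar, rng):
    """Randomized acceptance: one partner J ~ U(pop) is drawn, and
    Theta ~ U([0, W_bar]) is compared to the single value W(I, J)."""
    J = rng.choice(pop)
    theta = rng.uniform(0.0, W_bar)
    return theta <= W(I, J)
```

Both tests accept with probability \big(\sum_{j} W(I, I_j)\big)/(\bar W N), but the randomized version evaluates W only once per proposal.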

We now present the main Algorithm 3 implemented in the popsim function of the IBMPopSim package in the case where events arrive with individual intensities, but also with interactions (using randomization) and Poisson intensities. In this general case, \bar \Lambda(n) is defined by Equation 11.

\begin{algorithm} \caption{Randomized IBM simulation algorithm} \begin{algorithmic} \State \textbf{Input:} Initial population $Z_0$, horizon $T > 0$, and events described by \State Intensity functions and bounds $(\lambda^e, \bar \lambda^e)$ for $e \in \mathcal{E}$, $(W^e, \bar W^e)$ for $e \in \mathcal{E}_W$ and $(\mu^e, \bar \mu^e)$ for $e \in \mathcal{P}$ \State Event action functions $\phi^e(t, I)$ for $e \in \mathcal{E} \cup \mathcal{E}_W \cup \mathcal{P}$ \State \textbf{Output:} Population $Z_T$ \State Initialization $T_0 \longleftarrow 0$, $\bar T_0 \longleftarrow 0$, $k \longleftarrow 0$, $\ell \longleftarrow 0$ \While{$T_k < T$} \Repeat \State increment iterative variable $\ell \longleftarrow \ell+1$ \State compute next proposed time $\bar T_{\ell} \longleftarrow \bar T_{\ell-1} + S_\ell$ with $S_\ell \sim \mathcal{E}\big(\bar \Lambda(N_{T_k}) \big)$ \State draw an individual index $\bar K_\ell \sim \mathcal{U}(\{1,\dots,N_{T_k}\})$ \State draw a proposed event $\bar E_\ell$ with probability $p_e = \frac{\bar \mu^e \mathbf{1}_{e \in \mathcal{P}} + \bar \lambda^e N_{T_k}\mathbf{1}_{e \in \mathcal{E}} + \bar W^e (N_{T_k})^2 \mathbf{1}_{e \in \mathcal{E}_W}}{\bar \Lambda(N_{T_k})}$ \If{$\bar E_\ell \in \mathcal{E}$ (without interaction)} \State draw $\bar \Theta_\ell \sim \mathcal{U}\big([0, \bar \lambda^{\bar E_\ell}] \big)$ \State \textit{accepted} $\longleftarrow \bar \Theta_\ell \le \lambda^{\bar E_\ell}(\bar T_\ell, I_{\bar K_\ell})$ \EndIf \If{$\bar E_\ell \in \mathcal{E}_W$ (with interaction)} \State draw $(\bar \Theta_\ell, J_\ell) \sim \mathcal{U}\big([0, \bar W^{\bar E_\ell}] \times \{1, \dots, N_{T_k}\} \big)$ \State \textit{accepted} $\longleftarrow \bar \Theta_\ell \le W^{\bar E_\ell}(\bar T_\ell, I_{\bar K_\ell}, I_{J_\ell})$ \EndIf \If{$\bar E_\ell \in \mathcal{P}$ (Poissonian intensity)} \State draw $\bar \Theta_\ell \sim \mathcal{U}\big([0, \bar \mu^{\bar E_\ell}] \big)$ \State \textit{accepted} $\longleftarrow \bar \Theta_\ell \le \mu^{\bar E_\ell}(\bar T_\ell)$ \EndIf \Until{accepted} \State increment iterative variable $k \longleftarrow k+1$ \State record $(T_k, E_k, I_k) \longleftarrow (\bar T_{\ell}, \bar E_\ell, I_{\bar K_\ell})$ as accepted time, event and individual \State update the population $Z_{T_k} = Z_{T_{k-1}} + \phi^{E_k}(T_k, I_k)$ \EndWhile \end{algorithmic} \end{algorithm}

Proposition 3 The population processes (Z_t)_{t\in [0,T]} simulated by the Algorithm 2 and Algorithm 3 have the same law.

Proof. The only difference between Algorithm 2 and Algorithm 3 lies in the acceptance/rejection step of proposed events, in the presence of interactions. In Algorithm 3, a proposed event (\bar T_\ell, \bar E_\ell, \bar K_\ell), with \bar E_\ell \in \mathcal{E}_W (an event with interaction), is accepted as a true event in the population if \bar \Theta_\ell \le W^{\bar E_\ell}(\bar T_\ell, I_{\bar K_\ell}, I_{\bar J_\ell}), \text{ with } (\bar \Theta_\ell, \bar J_\ell) \sim \mathcal{U}\big([0, \bar W^{\bar E_\ell}] \times \{1, \dots, N_{T_k}\} \big). By Equation 15, the probability of accepting this event is the same as in Algorithm 2, which completes the proof.
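To sketch the computation behind this equality, write N = N_{T_k} and e = \bar E_\ell, and condition on the drawn partner \bar J_\ell; using the definition \lambda^e_t(I, Z_t) = \sum_{J \in Z_t} W^e(t, I, J), the acceptance probability of a proposed event with interaction in Algorithm 3 is

\mathbb{P}\big(\bar \Theta_\ell \le W^{e}(\bar T_\ell, I_{\bar K_\ell}, I_{\bar J_\ell})\big) = \frac{1}{N}\sum_{j=1}^{N} \frac{W^{e}(\bar T_\ell, I_{\bar K_\ell}, I_j)}{\bar W^{e}} = \frac{\lambda^{e}_{\bar T_\ell}(I_{\bar K_\ell}, Z_{T_k})}{\bar W^{e} N},

i.e. exactly the thinning ratio between the true interaction intensity of the candidate individual and its upper bound \bar W^{e} N.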

5 Model creation and simulation with IBMPopSim

The use of the IBMPopSim package mainly involves two steps: the creation of a model, followed by the simulation of the population evolution. The creation of a model itself consists of two steps: the description of the population Z_t, as introduced in Section 3.1, and the description of the event types, along with their associated intensities, as detailed in Section 3.2 and Section 3.3. A model is compiled by calling the mk_model function, which internally uses a template mechanism to automatically generate the C++ source code describing the model, which is then compiled using the Rcpp package to produce the object code.

After the compilation of the model, simulations are launched by calling the popsim function. This function takes the previously compiled model as input and simulates a random trajectory of the population evolution, based on an initial population and on parameter values, which can change from one call to another.

In this section, we take a closer look at each component of a model in IBMPopSim. We also refer to the IBMPopSim website and to the vignettes of the package for more details on the package and various examples of model creation.

5.1 Population

A population Z is represented by an object of class population containing a data frame where each row corresponds to an individual I=(\tau^b, \tau^d, x), and which has at least two columns, birth and death, corresponding to the birth date \tau^b and death/exit date \tau^d (\tau^d is set to NA for alive individuals). The data frame can contain more than two columns if individuals are described by additional characteristics x= (x_1,\dots x_n).

If entry events can occur in the population, the population must contain a characteristic named entry. This can be done by setting the flag entry=TRUE in the population function, or by calling the add_characteristic function on an existing population. During the simulation, the date at which an individual enters the population is automatically recorded in the variable I.entry. If exit events can occur, the population must contain a characteristic named out. This can be done by setting the flag out=TRUE in the population function, or by calling the add_characteristic function. When an individual I exits the population during the simulation, I.out is set to TRUE and its exit time is recorded as a “death” date.

In the example below, individuals are described by their birth and death dates, as well as a Boolean characteristic called male, and the entry characteristic. For instance, the first individual is a female whose age at t_0=0 is 107 and who was originally in the population.

Hide/Show the code
pop_init <- population(EW_pop_14$sample, entry=TRUE)
str(pop_init)
Classes 'population' and 'data.frame':  100000 obs. of  4 variables:
 $ birth: num  -107 -107 -105 -104 -104 ...
 $ death: num  NA NA NA NA NA NA NA NA NA NA ...
 $ male : logi  FALSE FALSE TRUE FALSE FALSE FALSE ...
 $ entry: logi  NA NA NA NA NA NA ...

Individual

In the C++ model which is automatically generated and compiled, an individual I is an object of an internal class containing some attributes (birth_date, death_date and the characteristics), and some methods including:

  • I.age(t): a const method returning the age of an individual I at time t,
  • I.set_age(a, t): a method to set the age a at time t of an individual I (set birth_date at t-a),
  • I.is_dead(t): a const method returning true if the individual I is dead at time t.

Remark 5. A characteristic x_i must be of atomic type: logical, integer, double or character. The function get_characteristic makes it easy to retrieve the characteristics' names and types from a population data frame. Note that some characteristics names are forbidden or reserved for specific uses: this is the case for birth, death, entry, out and id.

5.2 Events

The most important step of the model creation is the creation of events. The call to the function creating an event is of the form

Hide/Show the code
mk_event_CLASS(type="TYPE", name="NAME", ...)

where CLASS is replaced by the class of the event intensity, described in Section 3.3, and type corresponds to the event type, described in Section 3.2. Table 2 (a) and Table 2 (b) summarize the different possible choices for intensity classes and types of event. The optional argument name gives a name to the event. If not specified, the name of the event is its type, for instance death. However, a name must be specified if the model is composed of several events with the same type (for instance when there are multiple death events corresponding to different causes of death). The other arguments depend on the intensity class and on the event type.

Table 2: Choices of CLASS and TYPE arguments for an event creation.
(a) Intensity Classes
Intensity class Set CLASS
Individual \mathcal{E} individual
Interaction \mathcal{E}_W interaction
Poisson \mathcal{P} poisson
Inhomogeneous Poisson \mathcal{P} inhomogeneous_poisson
(b) Event Types
Event type TYPE
Birth birth
Death death
Entry entry
Exit exit
Swap swap

The intensity function and the kernel of an event are defined through arguments of the function mk_event_CLASS. These arguments are character strings composed of a few lines of code. Since the model is compiled using Rcpp, the code must be written in C++. However, thanks to the functions and variables provided by the package, even users with little C++ experience can define a model quite easily. To facilitate the implementation, the user can also define a list of model parameters, which can be used in the event and intensity definitions. These parameters are stored in a named list and can be of various types: atomic types, numeric vectors or matrices, predefined functions of one variable (stepfun, linfun, gompertz, weibull, piecewise_x), or piecewise functions of two variables (piecewise_xy). We refer to vignette(IBMPopSim_cpp) for more details on parameter types and basic C++ tools. Another advantage of model parameters is that their values can be modified from one simulation to another without recompiling the model.

5.2.1 Intensities

In IBMPopSim, the intensity of an event can belong to one of three classes (see Section 3.3): individual intensities without interaction between individuals, corresponding to events e\in\mathcal{E}; individual intensities with interaction, corresponding to events e\in\mathcal{E}_W; and Poisson intensities (homogeneous and inhomogeneous), corresponding to events e\in\mathcal{P}.

Event creation with individual intensity

An event e\in \mathcal{E} (see Equation 5) has an intensity of the form \lambda^e(t, I) which depends only on the individual I and time. Events with such intensity are created using the function

Hide/Show the code
mk_event_individual(type = "TYPE", 
                    name = "NAME",
                    intensity_code = "INTENSITY", ...)

The intensity_code argument is a character string containing a few lines of C++ code describing the intensity function \lambda^e(t, I). The intensity value has to be stored in a variable called result, and the variables available for the intensity code are given in Table 3.

Table 3: C++ variables available for intensity code
Variable Description
I Current individual
J Another individual in the population (only for interaction)
t Current time
Model parameters Depends on the model

For instance, the intensity code below corresponds to an individual death intensity \lambda^d(t,I) equal to d_1(a(I,t)) = \alpha_1 \exp(\beta_1 a(I,t)) for males and d_2(a(I,t)) = \alpha_2 \exp(\beta_2 a(I,t)) for females, where a(I,t)=t-\tau^b is the age of the individual I=(\tau^b, \tau^d,x) at time t. In this case, the intensity function depends on the individuals’ age, gender, and on the model parameters \alpha = (\alpha_1, \alpha_2) and \beta = (\beta_1, \beta_2).

Hide/Show the code
death_intensity <- "
    if (I.male) result = alpha_1 * exp(beta_1 * I.age(t));
    else result = alpha_2 * exp(beta_2 * I.age(t));
"

Event creation with interaction intensity

An event e\in \mathcal{E}_W is an event which occurs to an individual at a frequency which is the result of interactions with other members of the population (see Equation 6), and which can be written as \lambda^e_t(I, Z_t)=\sum_{J\in Z_t} W^e(t, I, J) where W^e(t, I, J) is the intensity of the interaction between individual I and individual J.

An event e\in \mathcal{E}_W with such intensity is created by calling the function

Hide/Show the code
mk_event_interaction(type = "TYPE",
                     name = "NAME",
                     interaction_code = "INTERACTION_CODE",
                     interaction_type = "random", ...)

The interaction_code argument contains a few lines of C++ code describing the interaction function W^e(t, I, J). The value of the interaction function has to be stored in a variable called result, and the variables available for the interaction code are given in Table 3. For example, if we set

Hide/Show the code
death_interaction_code <- "result = max(J.size - I.size, 0.);"

the death intensity of an individual I is the result of the competition between individuals, depending on a characteristic named size, as defined in Equation 8.

The argument interaction_type, set by default to random, selects the algorithm used to simulate the model. When interaction_type=full, the simulation follows Algorithm 2; when interaction_type=random, it follows Algorithm 3. In most cases, the random algorithm is much faster than the full algorithm. For instance, in the example of Section 7, the random algorithm is on average 40 times faster than the full algorithm on a set of standard parameters. This allows in particular the exploration of larger parameter sets and population sizes, while avoiding the explosion of computation time.

Events creation with Poisson and Inhomogeneous Poisson intensity

For events e\in \mathcal{P} with an intensity \mu^e(t) which does not depend on the population, the event intensity is of class inhomogeneous_poisson or poisson, depending on whether or not the intensity depends on time (in the second case, the intensity is constant).

For Poisson (constant) intensities the events are created with the function

Hide/Show the code
mk_event_poisson(type = "TYPE",
                 name = "NAME",
                 intensity = "CONSTANT", ...)

The following example creates a death event, where individuals die at a constant intensity lambda (which has to be in the list of model parameters):

Hide/Show the code
mk_event_poisson(type = "death",
                 intensity = "lambda")

When the intensity \mu^e(t) depends on time, the event can be created similarly using the function

Hide/Show the code
mk_event_inhomogeneous_poisson(type = "TYPE",
                               name = "NAME",
                               intensity = "INTENSITY", ...)

5.2.2 Event kernel code

When an event occurs, the event kernel k^e specifies how the event modifies the population. Event kernels are defined in the kernel_code argument of the mk_event_CLASS(type = "TYPE", name ="NAME", ...) function. The kernel_code is NULL by default, and doesn't have to be specified for death, exit and birth events, but it is mandatory for entry and swap events. Recall that the kernel_code argument is a character string composed of a few lines of C++ code, characterizing the individual's characteristics following the event. Table 4 summarizes the variables available in the kernel_code.

  • Death/Exit event If the user defines a death event, the death date of the current individual I is automatically set to the current time t. Similarly, when an individual I exits the population, I.out is automatically set to TRUE and its exit time is recorded as a death date. For these event types, the kernel_code doesn't have to be specified by the user.

  • Birth event The default event kernel is that an individual I gives birth to a new individual newI of age 0 at the current time t, with the same characteristics as the parent I. If no kernel is specified, the default generated C++ code for a birth event is:

Hide/Show the code
individual newI = I;
newI.birth_date = t;
pop.add(newI);

The user can modify the birth kernel, by specifying the argument kernel_code of mk_event_CLASS. In this case, the generated code is

Hide/Show the code
individual newI = I;
newI.birth_date = t;
_KERNEL_CODE_
pop.add(newI);

where _KERNEL_CODE_ is replaced by the content of the kernel_code argument.

  • Entry event When an individual I enters the population, I.entry is automatically set to the date at which the individual enters the population. An individual entering the population is generally not of age 0. In this case, the user must specify the kernel_code argument, indicating how the age and characteristics of the new individual are chosen. For instance, the code below creates an event of type entry, named ev_example, in which individuals enter the population at a constant Poisson rate. When an individual newI enters the population at time t, its age is chosen as a normally distributed random variable with mean 20 and variance 4.
Hide/Show the code
mk_event_poisson(
    type = "entry", 
    name = "ev_example", 
    intensity = "lambda", 
    kernel_code = "
        double a_I = max(CNorm(20, 2), 0.); 
        newI.set_age(a_I, t);
    ")
  • Swap event The user must specify the kernel_code argument indicating how the characteristics of an individual are modified following a swap.
Table 4: C++ variables available for events kernel code
Variable Description
I Current individual
J Another individual in the population (only for interaction)
t Current time
pop Current population (vector)
newI New individual, available only for birth and entry events.
Model parameters Depends on the model

When there are several events of the same type, the user can identify which event generated a particular change by adding a characteristic to the population that records the event name/id when it occurs. See e.g. vignette(IBMPopSim_human_pop) for an example with different causes of death.

5.3 Model creation

Once the population, the events, and model parameters are defined, the IBM model is created using the function mk_model.

Hide/Show the code
model <- mk_model(characteristics = get_characteristics(pop_init),
                  event = events_list,
                  parameters = model_params)

During this step, which can take a few seconds, the model is created and compiled using the Rcpp package. A key feature of IBMPopSim is that a model depends only on the names and types of the population characteristics and parameters, not on their values. This means that once the model has been created, various simulations can be run with different initial populations and different parameter values.

Example 2 Here is an example of a model with a population structured by age and gender, with birth and death events. The death intensity of an individual of age a is d(a) = \alpha \exp(\beta a), and females between 15 and 40 can give birth with intensity b(a) = \bar \lambda^b \mathbf{1}_{[15,40]}(a). The newborn is a male with probability p_{male}.

Hide/Show the code
params <- list("p_male"= 0.51,
               "birth_rate" = stepfun(c(15,40),c(0,0.05,0)),
               "death_rate" = gompertz(0.008,0.02))


death_event <- mk_event_individual(type = "death", name= "my_death_event",
                  intensity_code = "result = death_rate(age(I,t));")

birth_event <- mk_event_individual( type = "birth",
                  intensity_code = "if (I.male)
                                        result = 0;
                                    else
                                        result=birth_rate(age(I,t));",
                  kernel_code = "newI.male = CUnif(0, 1) < p_male;")
pop <- population(EW_pop_14$sample)

model <- mk_model(characteristics = get_characteristics(pop),
                  events = list(death_event,birth_event),
                  parameters = params)

5.4 Simulation

The simulation of the IBM is based on the algorithms presented in Section 4.2 and Section 4.3. The user first has to specify bounds for the intensity or interaction functions of each event type. The random evolution of the population can then be simulated over a period [0,T] by calling the popsim function.

Events bounds

Since the IBM simulation algorithm is based on an acceptance-rejection method for simulating random times, the user has to specify bounds for the intensity (or interaction) functions of each event (see Assumption 2 and Assumption 3). These bounds should be stored in a named vector, where for event e, the name corresponding to the event bound \bar{\mu}^e, \bar{\lambda}^e or \bar{W}^e is the event name defined during the event creation step.

In Example 2 from the previous section, the intensity bound for birth events is \bar\lambda^b. Since the death intensity function is not bounded, the user has to specify a maximum age a_{\max} in popsim (all individuals above a_{\max} die automatically). The bound for death events is then \bar \lambda^d = \alpha\exp(\beta a_{\max}). In the example, the death event has been named my_death_event. No name has been specified for the birth event, which thus has the default name birth. Then,

Hide/Show the code
a_max <- 120 # maximum age
events_bounds <- c("my_death_event" = params$death_rate(a_max),
                   "birth" = max(params$birth_rate))

Once the model and events bounds have been defined, a random trajectory of the population can be simulated by calling

Hide/Show the code
sim_out <- popsim(model, pop, events_bounds, params,
                  age_max = a_max, time = 30)

Optional parameters

If there are no events with intensity of class interaction, then the simulation can be parallelized easily by setting the optional parameter multithreading (FALSE by default) to TRUE. By default, the number of threads is the number of concurrent threads supported by the available hardware implementation. The number of threads can be set manually with the optional argument num_threads. By default, when the proportion of dead individuals in the population exceeds 10\%, dead individuals are removed from the current population used in the algorithm (see Remark 3). The user can modify this ratio using the optional argument clean_ratio, or by removing dead individuals from the population with a certain frequency, given by the clean_step argument. Finally, the user can also define the seed of the random number generator stored in the argument seed.

Outputs and treatment of swap events

The output of the popsim function is a list containing three elements: a data frame population containing the output population Z_T (or a list of populations (Z_{t_1}, \dots Z_{t_n}) if time is a vector of times), a numeric vector logs of variables related to the simulation algorithm (including the simulation time and number of proposed/accepted events), and the list arguments of the simulation inputs, including the initial population, parameters and event bounds used for the simulation.

When there are no swap events (i.e. individuals' characteristics never change), the evolution of the population over the period [0,T] is recorded in a single data frame sim_out$population, where each row contains the information of an individual who lived in the population during the period [0,T] (see Remark 1).

When there are swap events (individuals can change characteristics), recording the dates of swap events and the changes of characteristics following each swap event for each individual in the population is memory intensive and computationally costly. To maintain efficient simulations in the presence of swap events, the argument time of popsim can be set to a vector of dates (t_0,\dots, t_n). In this case, popsim returns in the object population a list of n populations, representing the population at times t_1,\dots, t_n, simulated from the initial time t_0. For i=1,\dots, n, the ith data frame describes the population Z_{t_i}, i.e. the individuals who lived in the population during the period [t_0,t_i], with their characteristics at time t_i.

It is also possible to isolate individuals' life courses by adding an id column to the population, which can be done by setting id=TRUE in the population constructor, or by calling the add_characteristic function on an existing population, in order to identify each individual with a unique integer.

Base functions to study the simulation outputs are provided in the package. For instance, the population age pyramid can be computed at a given time, as well as death and exposure tables. Several illustrations of the output functions are given in the examples of Section 6 and Section 7.

6 Insurance portfolio

This section provides an example of how to use the IBMPopSim package to simulate a heterogeneous life insurance portfolio (see also vignette(IBMPopSim_insurance_portfolio)).

We consider an insurance portfolio consisting of male policyholders, of age greater than 65. These policyholders are characterized by their age, assumed to be less than a_{\max} = 110, and risk class x \in \mathcal X = \{1, 2\}.

Entries in the portfolio

New policyholders enter the population at a constant Poisson rate \mu^{en}=\lambda, which means that on average, \lambda individuals enter the portfolio each year. A new individual enters the population at an age a that is uniformly distributed between 65 and 70, and is in risk class 1 with probability p.

Death events

A baseline age and time specific death rate is first calibrated on “England and Wales (EW)” males mortality historic data (source: Human Mortality Database https://www.mortality.org/), and projected for 30 years using the Lee-Carter model with the package StMoMo (see A. M. Villegas, Kaishev, and Millossovich (2018)). The forecasted baseline death intensity is denoted by d(t,a), defined by: d(t,a) = \sum_{k=0}^{29}\mathbf{1}_{\{k\leq t < k+1\}} d_k(a), \quad \forall \; t\in [0,30] \text{ and } a \in [65, a_{\max}], \tag{16} with d_k(a) the point estimate of the forecasted mortality rate for age a and year k.
Individuals in risk class 1 are assumed to have mortality rates that are 20% higher than the baseline mortality (for instance, the risk class could refer to smokers), while individuals in risk class 2 are assumed to have mortality rates that are 20% lower than the baseline (non smokers). The death intensity of an individual I= (\tau_b, \infty, x), of age a(I,t) = t - \tau_b at time t and in risk class x \in \{1, 2\} is thus the function \lambda^d(t,I) = \alpha_x d(t,a(I,t)), \quad \alpha_1 = 1.2, \quad \alpha_2 = 0.8. In particular, the death intensity verifies Assumption 3 since: \lambda^d(t,I) \leq \bar d : = \alpha_1 \sup_{t \in [0,30]} d(t,a_{\max}). \tag{17}

Exits from the portfolio

Individuals exit the portfolio at a constant individual rate \lambda^{ex}(t,I) = \mu^{i}, depending only on their risk class i\in \{1,2\}.

6.1 Population

We start with an initial population of 30\,000 males of age 65, split evenly between the two risk classes. The population data frame thus has the two mandatory columns birth (here the initial time is t_0=0) and death (NA if alive), and an additional column risk_cls corresponding to the policyholder's risk class. Since entry and exit events can occur, the entry and out flags of the population constructor are set to TRUE.

Hide/Show the code
N <- 30000
pop_df <- data.frame("birth" = rep(-65,N), "death" = rep(NA,N),
                     "risk_cls" = rep(1:2,each=N/2))
pop_init <- population(pop_df, entry=TRUE, out=TRUE)

6.2 Events

Entry events

The age of the new individual is determined by the kernel_code argument in the mk_event_poisson function.

Hide/Show the code
entry_params <- list("lambda" = 30000, "p" = 0.5)
entry_event <- mk_event_poisson(
    type = "entry",
    intensity = "lambda",
    kernel_code = "if (CUnif() < p) newI.risk_cls =1;
                   else newI.risk_cls= 2;
                   double a = CUnif(65, 70);
                   newI.set_age(a, t);")

Note that the variables newI and t, as well as the function CUnif(), are implicitly defined and usable in the kernel_code. The field risk_cls comes from the names of characteristics of individuals in the population. The names lambda and p are parameter names that will be specified in the R named list params.

Here we use a constant \lambda as the event intensity, but we could also use a rate \lambda(t) that depends on time, using the function mk_event_inhomogeneous_poisson.

Death and exit events

The baseline death intensity defined in Equation 16 and obtained with the package StMoMo is stored in the variable death_male.

Hide/Show the code
# StMoMo death rates
library('StMoMo')
library('reshape2')
EWStMoMoMale <- StMoMoData(EWdata_hmd, series = "male")
LC <- lc()
ages.fit <- 65:100
years.fit <- 1950:2016
LCfitMale <- fit(LC, data = EWStMoMoMale, ages.fit = ages.fit, years.fit = years.fit)
t <- 30
LCforecastMale <- forecast(LCfitMale, h = t)
d_k <- apply(LCforecastMale$rates, 2, function(x) stepfun(66:100, x))
breaks <- 1:29
death_male <- piecewise_xy(breaks,d_k)

The death and exit intensities are of class individual (see Table 2 (a)). Hence, the death and exit events are created with the mk_event_individual function.

Hide/Show the code
death_params <- list("death_male" = death_male, "alpha" = c(1.2, 0.8))
death_event <- mk_event_individual(
    type = "death",
    intensity_code = "result = alpha[I.risk_cls-1] * death_male(t, I.age(t));")
Hide/Show the code
exit_params = list("mu" = c(0.001, 0.06))
exit_event <- mk_event_individual(
    type = "exit",
    intensity_code = "result = mu[I.risk_cls-1]; ")

6.3 Model creation and simulation

The model is created from all the previously defined building blocks by calling the mk_model function.

Hide/Show the code
model <- mk_model(
    characteristics = get_characteristics(pop_init),
    events = list(entry_event, death_event, exit_event),
    parameters = c(entry_params, death_params, exit_params))

Once the model is compiled, it can be used with different parameter values to run simulations under various scenarios. Similarly, the initial population (here pop_init) can be modified without rerunning the mk_model function. The bound for entry events is simply the intensity \lambda. For death events, the bound is given by \bar{d} defined in Equation 17, which is stored in the death_max variable.

Hide/Show the code
death_max <- death_params$alpha[1] * max(sapply(d_k, function(x) { max(x) }))
bounds <- c("entry" = entry_params$lambda,
            "death" = death_max,
            "exit" = max(exit_params$mu))

sim_out <- popsim(
    model = model,
    initial_population = pop_init,
    events_bounds = bounds,
    parameters = c(entry_params, death_params, exit_params),
    time = 30,
    age_max = 110,
    multithreading = TRUE)

6.4 Outputs

The data frame sim_out$population contains all individuals present in the portfolio during the period [0, 30], including the individuals in the initial population and those who entered the portfolio. Each row represents an individual, with their date of birth, date of death (NA if still alive at the end of the simulation), risk class, and the characteristics entry and out. Recall that if an individual enters the population at time t, its entry characteristic is automatically set to t. The characteristic out is set to TRUE for individuals who left the portfolio due to an exit event.

In this example, the simulation over 30 years, starting from an initial population of 30 000 individuals, is very fast (see below), with an acceptance rate of proposed events of approximately 25%. At the end of the simulation, the number of alive individuals is approximately 430 000.

[1] "Number of alive individuals in the population at final time T=30 : 427317"
[1] "Execution time : 0.35s"
[1] "Proportion of effective events and proposed events : 0.25"

Initially (at t=0), the portfolio contains the same number of 65-year-old policyholders in each risk class. However, policyholders in risk class 2, who have lower mortality rates, leave the portfolio at a higher rate than policyholders in risk class 1: \mu^2 > \mu^1. Therefore, the composition of this heterogeneous portfolio changes over time, including more and more individuals in risk class 1 with higher mortality rates, but with variations across age classes. To illustrate the composition of the total population at the end of the simulation (t=30), Figure 2 presents the age pyramid of the final composition of the portfolio, obtained with the age_pyramid and plot functions of the pyramid class.

Figure 2: Portfolio age pyramid at t = 30 for individuals in risk class 1 (blue) and 2 (red).

IBMPopSim also allows the fast computation of exact life tables from truncated and censored individual data (due to entry and exit events), using the functions death_table and exposure_table. These functions are particularly efficient, since the computations are implemented in C++ via Rcpp.

Hide/Show the code
age_grp <- 65:95
Dx_pop <- death_table(sim_out$population, ages = age_grp, period = 0:30)
Ex_pop <- exposure_table(sim_out$population, ages = age_grp, period = 0:30)
mx_pop <- Dx_pop/Ex_pop

In Figure 3, we illustrate the central death rates in the simulated portfolio at final time. Due to the mortality differential between risk class 1 and 2, one would expect to observe more individuals in risk class 2 at higher ages. However, due to exit events, a higher proportion of individuals in risk class 1 exit the portfolio over time, resulting in a greater proportion of individuals in risk class 1 at higher ages than what would be expected in the absence of exit events. Consequently, the mortality rates in the portfolio are more aligned with those of risk class 1 at higher ages. This is a simple example of how composition changes in the portfolio can impact aggregated mortality rates and potentially compensate or reduce an overall mortality reduction (see also (Kaakaï et al. 2019)).

Figure 3: Portfolio central death rates at t = 30 (black).

7 Population with genetically variable traits

This section provides an example of how to use the IBMPopSim package to simulate an age-structured population with interactions, based on the model proposed in Example 1 of Ferrière and Tran (2009) (see also Méléard and Tran (2009)).

In this model, individuals are characterized by their body size at birth x_0 \in [0,4] and by their physical age a \in [0,2]. The body size of an individual I=(\tau^b,\infty, x_0) at time t is a linear function of its age a(I,t) = t-\tau^b: x(t)= x_0 + ga(I,t), where g is a constant growth rate assumed to be identical for all individuals.

Birth events

The birth intensity of each individual I=(\tau^b, \infty, x_0) depends on a parameter \alpha > 0 and on its initial size, as given by the equation \lambda^b(t,I) = \alpha (4 - x_0) \leq \bar \lambda^b = 4\alpha. \tag{18} Thus, smaller individuals have a higher birth intensity. When a birth occurs, the new individual inherits the birth size x_0 of its parent with probability 1-p, or a mutation occurs with probability p, resulting in a birth size given by x_0' = \min(\max(0, x_0 + G), 4), \tag{19} where G is a Gaussian random variable with mean 0 and variance \sigma^2.

Death events Due to competition between individuals, the death intensity of an individual depends on the sizes of the other individuals in the population. Bigger individuals have a better chance of survival. If an individual I= (\tau^b, \infty, x_0) of size x(t)= x_0 +ga(I,t) encounters an individual J= (\tau^{b}_J, \infty, x_0') of size x'(t) = x_0'+ ga(J,t), then it can die with intensity W(t, I,J) = U(x(t),x'(t)), where the interaction function U is defined by U(x,y) = \beta \left(1- \frac{1}{1+ c\exp(-4(x-y))}\right) \leq \bar W = \beta. \tag{20} The death intensity of an individual I at time t in a population Z results from interactions with all individuals in the population, including itself, and is given by \lambda^d_t(I,Z) = \sum_{J = (\tau^b,\infty, x_0') \in Z} U (x_0 + g a(I,t), x_0' + g a(J,t)).
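As a standalone illustration of this competition mechanism (a Python sketch with our own helper names; in the package, this sum is evaluated internally in compiled code):

```python
import math

def interaction(x, y, beta=2.0/300.0, c=1.2):
    """Competition kernel U(x, y) of Equation 20; always below beta."""
    return beta * (1.0 - 1.0 / (1.0 + c * math.exp(-4.0 * (x - y))))

def death_intensity(i, sizes):
    """lambda^d for individual i: sum of U(x_i, x_j) over the whole
    population, itself included."""
    return sum(interaction(sizes[i], y) for y in sizes)
```

Note that the kernel is decreasing in x - y, so an individual much larger than its competitor faces an interaction rate close to 0, while a much smaller one faces a rate close to the bound beta.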

7.1 Population

We use an initial population of 900 living individuals, all sharing the same birth size, with ages uniformly distributed between 0 and 2 years.

Hide/Show the code
N <- 900 
x0 <- 1.06
agemin <- 0.
agemax <- 2.
Hide/Show the code
pop_df <- data.frame(
  "birth" = -runif(N, agemin, agemax), # Uniform age in [0,2]
  "death" = as.double(NA), # All individuals are alive
  "birth_size" = x0) # All individuals have the same initial birth size x0
pop_init <- population(pop_df)

7.2 Events

7.2.1 Birth events

The parameters involved in a birth event are the mutation probability p, the variance \sigma^2 of the Gaussian random variable G, and the coefficient \alpha of the intensity.

Hide/Show the code
params_birth <- list("p" = 0.03, "sigma" = sqrt(0.01), "alpha" = 1)

The birth intensity of Equation 18 is of class individual. Hence, the event is created by calling the mk_event_individual function. The birth size of the new individual is given in the kernel, following Equation 19.

Hide/Show the code
birth_event <- mk_event_individual( 
  type = "birth",
  intensity_code = "result = alpha*(4 - I.birth_size);",
  kernel_code = "if (CUnif() < p)
                   newI.birth_size = min(max(0.,CNorm(I.birth_size,sigma)),4.);
                 else
                   newI.birth_size = I.birth_size;")

7.2.2 Death events

The death intensity of Equation 20 is of class interaction. Hence, the event is created by calling the mk_event_interaction function. The parameters used for this event are the growth rate g, the amplitude \beta of the interaction function, and the competition strength c.

Hide/Show the code
params_death <- list("g" = 1, "beta" = 2./300., "c" = 1.2)
death_event <- mk_event_interaction(
  type = "death",
  interaction_code = "double x_I = I.birth_size + g * age(I,t);
                      double x_J = J.birth_size + g * age(J,t);
                      result = beta*(1.-1./(1.+c*exp(-4.*(x_I-x_J))));")

7.3 Model creation and simulation

The model is created using the mk_model function.

Hide/Show the code
model <- mk_model(
    characteristics = get_characteristics(pop_init),
    events = list(birth_event, death_event),
    parameters = c(params_birth, params_death))

The simulation of one scenario can then be launched by calling the popsim function, after computing the event bounds \bar \lambda^b=4 \alpha and \bar W= \beta.

Hide/Show the code
sim_out <- popsim(model = model,
    initial_population = pop_init,
    events_bounds = c("birth" = 4 * params_birth$alpha, 
                      "death" = params_death$beta),
    parameters = c(params_birth, params_death),
    age_max = 2,
    time = 500)
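As a sanity check of these bounds, one can verify numerically that \lambda^b is maximal at x_0 = 0 and that U never exceeds \beta, since the logistic term lies strictly in (0,1). A standalone Python sketch (our helper names, not package code):

```python
import math

alpha, beta, c = 1.0, 2.0/300.0, 1.2     # parameter values of this section

def birth_intensity(x0):
    """lambda^b(x0) = alpha * (4 - x0), maximal at x0 = 0 (Equation 18)."""
    return alpha * (4.0 - x0)

def interaction(x, y):
    """U(x, y) of Equation 20; the logistic term lies in (0, 1), so U < beta."""
    return beta * (1.0 - 1.0 / (1.0 + c * math.exp(-4.0 * (x - y))))

grid = [i * 0.01 for i in range(401)]    # birth sizes covering [0, 4]
sup_birth = max(birth_intensity(x) for x in grid)
sup_inter = max(interaction(x, y) for x in grid for y in grid)
```

Over the grid, sup_birth equals 4\alpha exactly and sup_inter stays strictly below \beta, matching the events_bounds passed to popsim.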

Based on the results of a simulation, we can reproduce the numerical results of Ferrière and Tran (2009). In Figure 4, we draw a line for each individual in the population, representing its birth size during its lifetime.

Figure 4: Evolution of birth size

In this example, the randomized Algorithm 3 allows for much faster computation times than the model implemented below with Algorithm 2 (“full” algorithm):

Hide/Show the code
death_event_full <- mk_event_interaction(type = "death",
    interaction_type= "full",
    interaction_code = "double x_I = I.birth_size + g * age(I,t);
                        double x_J = J.birth_size + g * age(J,t);
                        result = beta * ( 1.- 1./(1. + c * exp(-4. * (x_I-x_J))));"
)

model_full <- mk_model(characteristics = get_characteristics(pop_init),
    events = list(birth_event, death_event_full),
    parameters = c(params_birth, params_death))

sim_out_full <- popsim(model = model_full,
    initial_population = pop_init,
    events_bounds =c("birth" = 4 * params_birth$alpha, "death" = params_death$beta),
    parameters = c(params_birth, params_death),
    age_max = 2,
    time = 500)
[1] "The full algorithm is 18 times slower than the randomized version"

In Figure 5, the two algorithms are compared for different population sizes. We progressively decrease the value of the mortality rate parameter \beta and increase the birth rate parameter \alpha. Starting with the values provided in Ferrière and Tran (2009), \alpha=1 and \beta=2/300, resulting in a stationary population size of approximately N=360 individuals for a sample of 50 simulations, we can easily increase the stationary population size to approximately N=2600 individuals with \alpha=2 and \beta=1/300. In the log-scaled figure, we can observe the trend of computation time as a function of the population size N, which is linear for the randomized algorithm and quadratic for the full one (Algorithm 2). We can also see that the randomized version of the algorithm is between 17 and 100 times faster than the full one in this example, taking only 2 seconds on average versus 211 seconds for Algorithm 2 for the largest population size (N=2600) and T=500.

Figure 5: Full vs random algorithm computation time
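The reason both algorithms sample from the same law can be illustrated outside the package: the full algorithm evaluates the exact intensity \lambda^d(I) = \sum_J U(x_I, x_J) at cost O(N) per proposal, while the randomized algorithm proposes events at the bound N \bar W = N\beta and accepts a uniformly drawn partner J with probability U(x_I, x_J)/\beta, leaving the effective rate unchanged. A standalone Python sketch of this identity (our helper names):

```python
import math

def interaction(x, y, beta=2.0/300.0, c=1.2):
    """Competition kernel U of Equation 20, bounded by beta."""
    return beta * (1.0 - 1.0 / (1.0 + c * math.exp(-4.0 * (x - y))))

def full_rate(x, sizes):
    """Exact death intensity: an O(N) sum, recomputed at each proposal."""
    return sum(interaction(x, y) for y in sizes)

def randomized_effective_rate(x, sizes, beta=2.0/300.0):
    """Randomized thinning: events are proposed at the bound N * beta and a
    uniformly drawn partner J is accepted with probability U(x, x_J) / beta.
    Proposal rate times mean acceptance probability recovers the exact rate."""
    n = len(sizes)
    mean_accept = sum(interaction(x, y) / beta for y in sizes) / n
    return n * beta * mean_accept
```

The two quantities coincide algebraically, which is why the randomized algorithm trades a deterministic O(N) sum for a single O(1) draw without changing the simulated distribution.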

8 Appendix

8.1 Recall on Poisson random measures

We recall below some useful properties of Poisson random measures, mainly following Chapter 6 of (Çinlar 2011). We also refer to (Kallenberg 2017) for a more comprehensive presentation of random counting measures.

Definition 1 (Poisson Random Measures) Let \mu be a \sigma-finite diffuse measure on a Borel subspace (E,\mathcal{E}) of ({\mathbb{R}}^d, \mathcal{B}({\mathbb{R}}^d)). A random counting measure Q= \sum_{k\geq 1} \delta_{X_k} is a Poisson (counting) random measure of mean measure \mu if

  1. \forall A \in \mathcal{E}, Q(A) is a Poisson random variable with {\mathbb{E}}[Q(A)]= \mu(A).
  2. For all disjoint subsets A_1, \dots, A_n \in \mathcal{E}, Q(A_1), \dots, Q(A_n) are independent Poisson random variables.

Let us briefly recall here some simple but useful operations on Poisson measures. In the following, Q is a Poisson measure of mean measure \mu, unless stated otherwise.

Proposition 4 (Restricted Poisson measure) If B \in \mathcal{E}, then, the restriction of Q to B defined by Q^B = \mathbf{1}_B Q = \sum_{k \ge 1} \mathbf{1}_{B}(X_k) \delta_{X_k} is also a Poisson random measure, of mean measure \mu^B = \mu(\cdot \cap B).

Proposition 5 (Projection of Poisson measure) If E = F_1 \times F_2 is a product space, then the projection Q_1(\mathrm{d}x) = \int_{F_2} Q(\mathrm{d}x , \mathrm{d}y) is a Poisson random measure of mean measure \mu_1 (\mathrm{d}x ) = \int_{F_2} \mu(\mathrm{d}x, \mathrm{d}y).

8.1.2 Marked Poisson measures on E = {\mathbb{R}}^+ \times F

We are interested in the particular case when E is the product space {\mathbb{R}}^+ \times F, with (F,\mathcal{F}) a Borel subspace of {\mathbb{R}}^d. Then, a random counting measure is defined by a random set S =\{ (T_k, \Theta_k ), k \geq 1\}. The random variables T_k\geq 0 can be considered as time variables, and constitute the jump times of the random measure, while the variables \Theta_k \in F represent space variables.

We recall in this special case the Theorem VI.3.2 in (Çinlar 2011).

Proposition 6 (Marked Poisson measure) Let m be a \sigma–finite diffuse measure on {\mathbb{R}}^+, and K a transition probability kernel from ({\mathbb{R}}^+,\mathcal{B}({\mathbb{R}}^+)) into (F, \mathcal{F}). Assume that the collection (T_k)_{k \ge 1} forms a Poisson process (N_t) =(\sum_{k\geq 1} \mathbf{1}_{\{T_k \leq t\}}) with mean m(\mathrm{d}t) =\Lambda(t) \mathrm{d}t, and that given (T_k)_{k \ge 1}, the variables \Theta_k are conditionally independent and have the respective distributions K(T_k, \cdot).

  1. Then, \{ (T_k, \Theta_k) ;\; k \ge 1\} forms a Poisson random measure Q = \sum_{k\ge 1} \delta_{(T_k, \Theta_k)} on ({\mathbb{R}}^+ \times F, \mathcal{B}({\mathbb{R}}^+) \otimes \mathcal{F}), called a Marked point process, with mean \mu defined by \mu(\mathrm{d}t, \mathrm{d}y) = \Lambda(t) \mathrm{d}t K(t, \mathrm{d}y).

  2. Reciprocally let Q be a Poisson random measure of mean measure \mu(\mathrm{d}t, \mathrm{d}y), admitting the following disintegration with respect to the first coordinate: \mu(\mathrm{d}t , \mathrm{d}y) =\tilde \Lambda(t) \mathrm{d}t \nu(t, \mathrm{d}y), with \nu(t, F)<\infty. Let K(t, \mathrm{d}y) = \dfrac{\nu(t,\mathrm{d}y) }{\nu(t, F) } and \Lambda(t) = \nu(t, F)\tilde \Lambda(t). Then, Q = \sum_{k\ge 1} \delta_{(T_k, \Theta_k)} is a marked Poisson measure with (T_k,\Theta_k)_{k\in {\mathbb{N}}^*} defined as above. In particular, the projection N= (N_t)_{t\geq0} of the Poisson measure on the first coordinate, N_t = Q([0,t] \times F) = \sum_{k\geq 1} \mathbf{1}_{[0,t] \times F} (T_k, \Theta_k) = \sum_{k\geq 1} \mathbf{1}_{\{T_k \leq t\}}, \quad \forall \; t \geq 0, is an inhomogeneous Poisson process of rate \Lambda(t)= \nu(t, F)\tilde \Lambda(t).

When the transition probability kernel K does not depend on the time: K(t, A) = \nu(A) for some probability measure \nu, then the marks (\Theta_k)_{k \ge 1} form an i.i.d. sequence with distribution \nu, independent of (T_k)_{k \ge 1}.

The preceding proposition thus yields a straightforward iterative simulation procedure for a Marked Poisson process on [0,T]\times F with mean measure \mu(\mathrm{d}t, \mathrm{d}y) = c \mathrm{d}t K(t, \mathrm{d}y) and c>0. The procedure is described in Algorithm 4.

\begin{algorithm} \caption{Simulation of Marked Poisson measure} \begin{algorithmic} \State \textbf{Input:} Constant $c$, simulatable kernel $K$ and final time $T$ \State \textbf{Output:} Times $(T_1,\dots,T_n)$ and Marks $(Y_1, \dots, Y_n)$ of the Marked Poisson measure of mean $\mu(\mathrm{d} t, \mathrm{d} y) = c \mathrm{d} t K(t, \mathrm{d} y)$ in $[0,T]\times F$. \State Initialization: set $k \leftarrow 1$, draw $T_1 \sim \mathcal{E}(c)$ and $Y_1 \sim K(T_1, \mathrm{d} y)$ \While{$T_k \leq T$} \State increment iterative variable $k \longleftarrow k+1$ \State compute next jump time $T_k \longleftarrow T_{k-1} + \mathcal{E}(c)$ \State draw a conditional mark $Y_k \sim K(T_k, \mathrm{d} y)$ \EndWhile \end{algorithmic} \end{algorithm}
Figure 6: Example of Marked Poisson measure on [0,T] with m(\mathrm{d} t) = L \mathrm{d} t (jump times occur at Poisson arrival times of rate L) and with \nu(\mathrm{d} y) = \frac{1}{L} \mathbf{1}_{[0, L]}(y) \mathrm{d} y (marks are drawn uniformly on [0,L]). The mean measure is then \mu(\mathrm{d} t, \mathrm{d} y) = \mathrm{d} t \mathbf{1}_{[0,L]}(y) \mathrm{d} y.
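Algorithm 4 takes only a few lines in practice. The following standalone Python sketch (our function names, with a constant rate c as in the algorithm, specialized below to the setting of Figure 6) simulates the jump times and marks:

```python
import random

def marked_poisson(c, T, draw_mark, rng=random):
    """Algorithm 4: jump times are Poisson arrivals of constant rate c on
    [0, T]; each time T_k receives a mark Y_k drawn from the kernel K(T_k, .)."""
    times, marks, t = [], [], 0.0
    while True:
        t += rng.expovariate(c)      # next inter-arrival time ~ E(c)
        if t > T:
            break
        times.append(t)
        marks.append(draw_mark(t))   # conditional mark Y_k ~ K(T_k, dy)
    return times, marks

# Setting of Figure 6: rate L, marks drawn uniformly on [0, L].
L = 5.0
times, marks = marked_poisson(L, 10.0, lambda t: random.uniform(0.0, L))
```

Passing a time-dependent draw_mark callback covers the general kernel K(t, dy); with a constant kernel, the marks are i.i.d. and independent of the jump times, as noted above.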

8.2 Pathwise representation of IBMs

Notation reminder The population’s evolution is described by the measure-valued process (Z_t)_{t\geq 0}. Several types of events e can occur to individuals, denoted by I. If an event of type e occurs to the individual I at time t, then the population state Z_{t^-} is modified by \phi^e(t,I). If e\in \mathcal{E} \cup \mathcal{E}_W, then events of type e occur with intensity \sum_{k=1}^{N_t} \lambda_t^e(I_k,Z_t), with \lambda_t^e(I,Z_t) defined by Equation 7. If e \in \mathcal{P}, then events of type e occur in the population at a Poisson intensity (\mu^e_t).

8.2.1 Proof of Theorem 1

Proof. For ease of notation, we prove the case when \mathcal{P} =\emptyset (there are no events with Poisson intensity).

  • Step 1. The existence of a solution to Equation 12 is obtained by induction. Let Z^1 be the unique solution of the thinning equation: Z_t^1 = Z_0 + \int_0^t \int_{ \mathcal{J}\times \mathbb R^+ }\phi^e (s , I_k) \mathbf{1}_{\{k \leq N_{0}\} }\mathbf{1}_{\{\theta \leq \lambda_s^e(I_k, Z_{0})\}} Q (\mathrm{d}s,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ), \quad \forall 0 \leq t \leq T. Let T_1 be the first jump time of Z^1. Since Z_{s^-}^1 = Z_{0} and N_{s^-}=N_{0} on [0, T_1], Z^1 is a solution of Equation 12 on [0,T_1].

Let us now assume that Equation 12 admits a solution Z^n on [0,T_n], with T_n the n–th event time in the population. Let Z^{n+1} be the unique solution of the thinning equation: Z^{n+1}_t = Z_{t\wedge T_n}^n + \int_{t\wedge T_n}^{t} \int_{ \mathcal{J}\times \mathbb R^+ }\phi^e (s , I_k) \mathbf{1}_{\{\theta \leq \lambda_s^e(I_k, Z_{T_n}^n)\}} \mathbf{1}_{\{k \leq N_{T_n}^n \} }Q (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ). First, observe that Z^{n+1} coincides with Z^n on [0,T_n]. Let T_{n+1} be the (n+1)–th jump of Z^{n+1}. Since Z_{s^-}^{n+1} = Z_{T_n}^n and N_{s^-}^{n+1}=N_{T_n}^{n} on [T_n, T_{n+1}] (nothing happens between two successive event times), Z^{n+1} satisfies for all t\leq T_{n+1}: \begin{aligned} Z^{n+1}_t = Z_{t\wedge T_n}^n +\int_{t\wedge T_n}^{t} \int_{ \mathcal{J}\times \mathbb R^+ }\phi^e (s , I_k) \mathbf{1}_{\{\theta \leq \lambda_s^e(I_k, Z_{s^-}^{n+1} )\}} \mathbf{1}_{\{k \leq N_{s^-}^{n+1} \} }Q (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ). \end{aligned} Since Z^n is a solution of Equation 12 on [0,T_n] coinciding with Z^{n+1}, this proves that Z^{n+1} is a solution of Equation 12 on [0,T_{n+1}]. Finally, let Z =\lim_{n\to \infty } Z^n. For all n\geq 1, T_n is the n–th event time of Z, and Z is a solution of Equation 12 on all time intervals [0,T_n\wedge T] by construction.

By Lemma 1, T_n \underset{n\to \infty}{\longrightarrow} \infty. Thus, letting n\to \infty, we conclude that Z is a solution of Equation 12 on [0,T].

  • Step 2. Let \tilde Z be a solution of Equation 12. Using the same arguments as in Step 1, it is straightforward to show that \tilde Z coincides with Z^n on [0,T_n], for all n \geq 1. Thus, \tilde{Z} = Z, which proves uniqueness.

8.2.2 Proof of Lemma 1

The proof relies on pathwise comparison results, generalizing those obtained in (Kaakai and El Karoui 2023).

Proof. Let Z be a solution of Equation 12. For all e \in \mathcal{P} \cup \mathcal{E} \cup \mathcal{E}_W, let N^e be the process counting the occurrence of events of type e in the population. N^e is a counting process of \{\mathcal{F}_t\}-intensity (\Lambda_t^e(Z_{t^-})), solution of \begin{aligned} & N_t^e = \int_0^t \int_{{\mathbb{N}}\times {\mathbb{R}}^+} \mathbf{1}_{\{k\leq N_{s^-}\}} \mathbf{1}_{\{\theta \leq \lambda_s^e(I_k, Z_{s^-})\}} Q(\mathrm{d}s, \mathrm{d}k, \{e\}, \mathrm{d}\theta), & \quad \textit{ if } e \in \mathcal{E}\cup \mathcal{E}_W, \\ & N_t^e = \int_0^t \int_{{\mathbb{R}}^+} \mathbf{1}_{\{\theta \leq \mu^e_s \}} Q^{\mathcal{P}}(\mathrm{d}s, \{e\}, \mathrm{d}\theta), & \quad \textit{ if } e \in \cal P. \\ \end{aligned} \tag{22} By definition, the jump times of the multivariate counting process (N^e)_{e \in \mathcal{P} \cup \mathcal{E}\cup \mathcal{E}_W} are the population event times (T_n)_{n\geq 0}. The idea of the proof is to show that (N^e)_{e \in \mathcal{P} \cup \mathcal{E}\cup \mathcal{E}_W} does not explode in finite time, by pathwise domination with a simpler multivariate counting process. The first steps are to control the population size N_t = N_0 + N^b_t + N^{en}_t.

Step 1 Let (\bar N^b, \bar N^{en}) be the 2-dimensional counting process defined as follows: for e \in \{b,en\}, \bar N^e_0 = 0 and \begin{aligned} & \bar N_t^e = \int_0^t \int_{{\mathbb{N}}\times {\mathbb{R}}^+} \mathbf{1}_{\{k\leq N_0 + \bar N_{s^-} \}} \mathbf{1}_{\{\theta \leq f^e(N_0 + \bar N_{s^-})\}} Q(\mathrm{d}s, \mathrm{d}k, \{e\}, \mathrm{d}\theta), \quad \textit{ if } e \in \mathcal{E}\cup \mathcal{E}_W, \\ & \bar N_t^e = \int_0^t \int_{{\mathbb{R}}^+} \mathbf{1}_{\{\theta \leq \bar \mu^e\}} Q^{\mathcal{P}}(\mathrm{d}s, \{e\}, \mathrm{d}\theta) \quad \textit{ if } e \in \cal P, \end{aligned} \tag{23} with \bar N := \bar N^b + \bar N^{en} and f^e the function introduced in Assumption 4.
- If b,en \in \cal P, then \bar N is an inhomogeneous Poisson process.
- If b,en \in \mathcal{E}\cup \mathcal{E}_W, then it is straightforward to show that, conditionally on N_0, \bar N is a pure birth Markov process with birth intensity function g(n) = n\big(f^b(N_0+n) + f^{en}(N_0+n)\big). In particular, by Assumption 4, g verifies the standard Feller condition for pure birth Markov processes (see e.g. (Bansaye and Méléard 2015)): \sum_{n=1}^\infty \frac{1}{g(n)} = \infty.
- Finally, if b \in \mathcal{E} and en \in \cal P (or equivalently if b \in \cal P and en \in \mathcal{E}), then one can easily show that \bar N is a pure birth Markov process with immigration, of birth intensity function g(n)= \bar \mu^{en} + n f^b(N_0 + n) (resp. g(n)= \bar \mu^b+ n f^{en}(N_0 + n)), also verifying the Feller condition. Therefore, there exists a non-exploding solution of Equation 23, by Proposition 3.3 in (Kaakai and El Karoui 2023).
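To make the role of the Feller condition concrete, consider the simple case (our illustration) where f^b and f^{en} are bounded by a constant C:

```latex
% Feller criterion: a pure birth process of intensity g explodes in finite
% time with positive probability iff \sum_n 1/g(n) < \infty.
g(n) \le 2Cn
\quad\Longrightarrow\quad
\sum_{n=1}^{\infty} \frac{1}{g(n)} \;\ge\; \sum_{n=1}^{\infty} \frac{1}{2Cn} = \infty
\quad\text{(no explosion)}.
```

In contrast, a superlinear intensity such as g(n) = n^2, for which \sum_{n\geq 1} n^{-2} = \pi^2/6 < \infty, would allow explosion in finite time.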

Step 2 The second step consists in showing that (N^b, N^{en}) is strongly dominated by (\bar N^b, \bar N^{en}), i.e. that all jumps of (N^b, N^{en}) are jumps of (\bar N^b, \bar N^{en}). Without loss of generality, we can assume that f^e:\mathbb{N} \to (0,+\infty) is increasing, since f^e(n) can be replaced by \sup_{\{m\leq n \} } f^e(m).
Let e\in \{b, en\}. If e \in \mathcal{P}, then for all s\in [0,T] \{\theta \leq \mu_s^e\} \subset \{ \theta \leq \bar \mu^e\}, which yields that all jumps of N^e are jumps of \bar N^e.
If e \in \mathcal{E}\cup \mathcal{E}_W, the proof by induction is analogous to the proof of Proposition 2.1 in (Kaakai and El Karoui 2023). Let T_1^e be the first jump time of N^e, associated with the marks (K_1^e,\Theta_1^e) of Q (or Q^{\mathcal{P}}). Then, by definition of Equation 22, K_1^e \leq N_0 and \Theta_1^e \leq \lambda_{T_1^e}^e (I_{K_1^e}, Z_0).
By Assumption 4, we have also \Theta_1^e \leq \lambda_{T_1^e}^e (I_{K_1^e}, Z_0) \leq f^e(N_0) \leq f^e(N_0 +\bar N_{T_1^{e,-}}), \quad K_1^e \leq N_0 + \bar N_{T_1^{e,-}}. Thus, T_1^e is also a jump time of \bar N^e. By iterating this argument, we obtain that all jump times of N^e are jump times of \bar N^e.
Thus, (N^b, N^{en}) does not explode in finite time.

Step 3 It remains to show that for e \notin \{b, en\}, N^e does not explode.
Let e \neq b, en. If e\in \mathcal P, the proof is the same as in Step 2. Otherwise, let: h^e_t(n) = \sup_{I \in \mathcal{I}, m \leq n } \lambda^e_t \biggl(I, \sum_{k=1}^{m} \delta_{I_k}\biggr), \quad \forall \; t \in [0,T], \; n \in {\mathbb{N}}^*. By Assumption 2 and Assumption 3, h^e_t(n) <\infty, and we can introduce the non-exploding counting process \bar{N}^e, defined by the thinning equation: \bar N_t^e = \int_0^t \int_{{\mathbb{N}}\times {\mathbb{R}}^+} \mathbf{1}_{\{k\leq N_0 + \bar N_{s^-} \}} \mathbf{1}_{\{\theta \leq h^e_s (N_0 + \bar N_{s^-})\}} Q(\mathrm{d}s, \mathrm{d}k, \{e\}, \mathrm{d}\theta), with \bar N_s = \bar N^b_s + \bar N^{en}_s.
Finally, by Step 2, for s\in [0,T] the population size N_s = N_0 + N^b_s+ N^{en}_s is bounded a.s. by N_0 + \bar N_s, since all jumps of (N^b,N^{en}) are jumps of (\bar N^b, \bar N^{en}). Thus, for all s\in [0,T], \{k \leq N_{s^-} \}\subset \{k\leq N_0 + \bar N_{s^-} \}, \text{ and } \{\theta \leq \lambda_s^e(I_k, Z_{s^-})\} \subset \{\theta \leq h_s^e(N_0 + \bar N_{s^-})\}. This proves that all jumps of N^e are jumps of \bar N^e, and thus N^e does not explode in finite time.

8.2.3 Alternative pathwise representation

Theorem 3 Let \mathcal{J}_{\mathcal{E}} = \mathbb N \times \mathcal{E} and \mathcal{J}_W = \mathbb N \times \mathcal{E}_W.
Let Q^{\mathcal{E}} be a random Poisson measure on \mathbb R^+ \times \mathcal{J}_{\mathcal{E}}\times \mathbb{R}^+, of intensity \mathrm{d}t \delta_{\mathcal{J}_{\mathcal{E}}}(\mathrm{d}k, \mathrm{d}e) \mathbf{1}_{[0,\bar \lambda^e]} (\theta) \mathrm{d}\theta, and Q^{W} a random Poisson measure on \mathbb R^+ \times \mathcal{J}_W \times \mathbb{N} \times \mathbb{R}^+, of intensity \mathrm{d}t \delta_{\mathcal{J}_W}(\mathrm{d}k,\mathrm{d}e) \delta_{\mathbb{N}} (\mathrm{d}j) \mathbf{1}_{[0,\bar W^e]} (\theta)\mathrm{d}\theta. Finally, let Q^{\mathcal P} be a random Poisson measure on \mathbb R^+ \times \mathcal{P} \times \mathbb{R}^+, of intensity \mathrm{d}t \delta_{\cal P}(\mathrm{d}e) \mathbf{1}_{[0,\bar \mu^e]} (\theta)\mathrm{d}\theta.
There exists a unique measure-valued process Z, strong solution of the following SDE driven by Poisson measures: \begin{aligned} Z_t = Z_0 & + \int_0^t \int_{\mathcal{J}_{\mathcal{E}}\times \mathbb R^+ }\phi^e (s , I_k) \mathbf{1}_{\{k \leq N_{s^-}\} }\mathbf{1}_{\{\theta \leq \lambda_s^e(I_k, Z_{s^-})\}} Q^{\mathcal{E}}(\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ) \\ & + \int_0^t \int_{\mathcal{J}_W \times {\mathbb{N}}\times \mathbb R^+ }\phi^e (s , I_k) \mathbf{1}_{\{k \leq N_{s^-}\} } \mathbf{1}_{\{j \leq N_{s^-}\} }\mathbf{1}_{\{\theta \leq W^e(s, I_k , I_j) \}} Q^W (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}j ,\mathrm{d}\theta ),\\ & + \int_0^t \int_{\mathcal{P} \times \mathbb R^+} \phi^e(s, I_{s^-}) \mathbf{1}_{\{\theta \leq \mu_s^e \}} Q^{\mathcal{P}} (\mathrm{d}s ,\mathrm{d}e , \mathrm{d}\theta), \end{aligned} \tag{24} with I_{s^-} an individual taken uniformly in Z_{s^-}.
Furthermore, the solution of Equation 24 has the same law as the solution of Equation 12.

The proof of Theorem 3 follows the same steps as the proof of Theorem 1.

8.3 Proof of Theorem 2

For ease of notation, we prove the case when \mathcal{P} =\emptyset (there are no events with Poisson intensity).
Let Z be the population process obtained by Algorithm 2, and (T_n)_{n\geq 0} the sequence of its jump times (T_0=0).
Step 1 Let T_1 be the first event time in the population, with its associated marks defining the type E_1 of the event and the individual I_1 to which this event occurs. By construction, (T_1, E_1, I_1) is characterized by the first jump of: Q^0(\mathrm{d}t, \mathrm{d}k , \mathrm{d}e) = \int_{\mathbb R^+} \mathbf{1}_{\{\theta \leq \lambda_{t}^e(I_k,Z_0)\}}\bar Q^0 (\mathrm{d}t ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ), with \bar Q^0 the Poisson measure introduced in the first step of the algorithm described in Section 4.2.

Since T_1 is the first event time, the population composition stays constant, Z_t=Z_0, on \{t<T_1\}. In addition, recalling that the first event has the action \phi^{E_1}(T_1, I_1) (see Table 1) on the population Z, we obtain that: \begin{aligned} Z_{t\wedge T_1} & = Z_0 + \mathbf{1}_{\{t\geq T_1\}} \phi^{E_1} (T_1 , I_1) \\ & = Z_0 + \int_0^{t\wedge T_1} \int_{\mathcal{J}_0} \phi^e (s , I_k) Q^0 (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e ) \\ & = Z_0 + \int_0^{t\wedge T_1} \int_{\mathcal{J}_0} \int_{\mathbb R^+} \phi^e (s , I_k) \mathbf{1}_{\{\theta \leq \lambda_s^e(I_k,Z_0)\}}\bar Q^0 (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ). \end{aligned} Since Z_{s^-} = Z_0 on \{s \leq T_1\}, the last equation can be rewritten as Z_{t\wedge T_1} = Z_0 + \int_0^{t\wedge T_1} \int_{\mathcal{J}_0} \int_{\mathbb R^+} \phi^e (s , I_k) \mathbf{1}_{\{\theta \leq \lambda_s^e(I_k,Z_{s^-})\}}\bar Q^0 (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ).

Step 2 The population size at the n–th event time T_n is N_{T_n}. The (n+1)–th event type and the individual to which this event occurs are thus chosen in the set \mathcal{J}_n := \{ 1,\dots, N_{T_n}\} \times (\mathcal{E} \cup \mathcal{E}_W). Conditionally on \mathcal{F}_{T_n}, let us first introduce the marked Poisson measure \bar Q^n on [T_n, \infty) \times \mathcal J_n \times \mathbb R^+, of intensity: \begin{aligned} \bar \mu^n(\mathrm{d}t, \mathrm{d}k, \mathrm{d}e , \mathrm{d}\theta ) & := \mathbf{1}_{\{t > T_n \}}\bar \Lambda (N_{T_n})\mathrm{d}t \frac{\bar \lambda^e_n }{\bar \Lambda(N_{T_n})} \delta_{\mathcal J_n}(\mathrm{d}k, \mathrm{d}e) \frac{1}{\bar \lambda^e_n} \mathbf{1}_{[0,\bar \lambda^e_n]} (\theta)\mathrm{d}\theta,\\ & = \mathbf{1}_{\{t > T_n \}}\mathrm{d}t \delta_{\mathcal J_n}(\mathrm{d}k, \mathrm{d}e) \mathbf{1}_{[0,\bar \lambda^e_n]}(\theta)\mathrm{d}\theta, \end{aligned} \tag{25} with \bar \lambda^e_n = \bar \lambda^e \mathbf{1}_{e \in \mathcal{E}} + \bar W^e N_{T_n} \mathbf{1}_{e \in \mathcal{E}_W}.

By definition, \bar Q^n has no jump before T_n.
As for the first event, the triplet (T_{n+1}, E_{n+1}, I_{n+1}) is determined by the first jump of the measure Q^n (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e) := \int_{\mathbb R^+} \mathbf{1}_{\{\theta \leq \lambda_s^e(I_k, Z_{T_n})\}}\bar Q^n (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta), obtained by thinning of \bar Q^n. Finally, since the population composition is constant on [T_n, T_{n+1}[, Z_t = Z_{T_n}, the population on [0,T_{n+1}] is defined by: \begin{aligned} Z_{t\wedge T_{n+1}} & = Z_{t \wedge T_n } + \mathbf{1}_{\{t\geq T_{n+1}\}}\phi^{E_{n+1}}(T_{n+1}, I_{n+1}), \\ & = Z_{t \wedge T_n } + \int_{t \wedge T_n}^{t \wedge T_{n+1}} \int_{\mathcal{J}_n \times \mathbb R^+} \phi^e (s , I_k) \mathbf{1}_{\{\theta\leq \lambda_s^e(I_k, Z_{s^-})\}}\bar Q^n (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ). \end{aligned} \tag{26} Applying Equation 26 n times yields: \begin{aligned} Z_{t\wedge T_{n+1}} = Z_0 + \sum_{l=0}^n \int_{t \wedge T_l}^{t \wedge T_{l+1}}\int_{ \mathcal{J}_l \times \mathbb R^+} \phi^e (s , I_k) \mathbf{1}_{\{\theta\leq \lambda_s^e(I_k, Z_{s^-})\}}\bar Q^l (\mathrm{d}s ,\mathrm{d}k , \mathrm{d}e, \mathrm{d}\theta ). \end{aligned}

Step 3 Finally, let \tilde{Z} be the solution of Equation 12, with (\tilde T_n)_{n\geq 0} the sequence of its event times. Then, we can write similarly for all n\geq 0: \begin{aligned} \tilde Z_{t\wedge \tilde T_{n+1}} & = Z_0 + \sum_{l=0}^n \int_{t \wedge \tilde T_l}^{t \wedge \tilde T_{l+1}} \int_{\mathcal{J}\times \mathbb R^+} \phi^e (s , I_k) \mathbf{1}_{\{\theta\leq \lambda_s^e(I_k, \tilde Z_{s^-})\}}\mathbf{1}_{ \{k \leq \tilde N_{s^-} \}} Q(\mathrm{d}s, \mathrm{d}k , \mathrm{d}e , \mathrm{d}\theta ), \\ & = Z_0 + \sum_{l=0}^n \int_{t \wedge \tilde T_l}^{t \wedge \tilde T_{l+1}} \int_{\mathcal{J}\times \mathbb R^+} \phi^e (s , I_k) \mathbf{1}_{\{\theta\leq \lambda_s^e(I_k, \tilde Z_{s^-})\}} \mathbf{1}_{ \{\theta \leq \tilde \lambda^e_l \}}\mathbf{1}_{ \{k \leq \tilde N_{\tilde T_l} \}} Q(\mathrm{d}s, \mathrm{d}k , \mathrm{d}e , \mathrm{d}\theta ), \end{aligned} since \tilde Z_{s^-} = \tilde Z_{\tilde T_l} on [\tilde T_l, \tilde T_{l+1}], and \lambda_s^e(I_k, \tilde Z_{s^-}) \leq \tilde \lambda^e_l:= \bar \lambda^e \mathbf{1}_{e \in \mathcal{E}} + \bar W^e \tilde N_{\tilde T_l} \mathbf{1}_{e \in \mathcal{E}_W}. For each l \geq 0, let \tilde Q^l(\mathrm{d}t, \mathrm{d}k , \mathrm{d}e , \mathrm{d}\theta ) = \mathbf{1}_{\{t > \tilde T_l\}} \mathbf{1}_{ \{1, \dots , \tilde N_{\tilde T_l} \}}(k) \mathbf{1}_{[0, \tilde \lambda^e_l]}(\theta) Q(\mathrm{d}t, \mathrm{d}k , \mathrm{d}e , \mathrm{d}\theta ). By Proposition 4, \tilde Q^l is, conditionally on \mathcal{F}_{\tilde T_l}, a Poisson measure of intensity \mathbf{1}_{\{t > \tilde T_l\}} \mathrm{d}t \mathbf{1}_{ \{1, \dots , \tilde N_{\tilde T_l} \}}(k) \delta_{\mathcal{J}}(\mathrm{d}k, \mathrm{d}e)\mathbf{1}_{[0, \tilde \lambda^e_l]}(\theta) \mathrm{d}\theta. It then follows by induction that \tilde{Q}^l has the same distribution as \bar Q^l, the Poisson measure with conditional intensity \bar \mu^l defined in Equation 25. Thus, Z is an exact simulation of Equation 12.

8.4 Acknowledgements

The research of Sarah Kaakai is funded by the European Union (ERC, SINGER, 101054787). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

References

Bansaye, Vincent, and Sylvie Méléard. 2015. Stochastic Models for Structured Populations. Springer International Publishing.
Barrieu, Pauline, Harry Bensusan, Nicole El Karoui, Caroline Hillairet, Stéphane Loisel, Claudia Ravanelli, and Yahia Salhi. 2012. “Understanding, Modelling and Managing Longevity Risk: Key Issues and Main Challenges.” Scandinavian Actuarial Journal 2012 (3): 203–31.
Bensusan, Harry. 2010. “Interest rate and longevity risk: dynamic model and applications to derivative products and life insurance.” Theses, Ecole Polytechnique X.
Billiard, Sylvain, Pierre Collet, Régis Ferrière, Sylvie Méléard, and Viet Chi Tran. 2016. “The Effect of Competition and Horizontal Trait Inheritance on Invasion, Fixation, and Polymorphism.” Journal of Theoretical Biology 411: 48–58.
Boumezoued, Alexandre. 2016. “Micro-macro analysis of heterogenous age-structured populations dynamics. Application to self-exciting processes and demography.” Theses, Université Pierre et Marie Curie.
Brémaud, Pierre. 1981. Point Processes and Queues: Martingale Dynamics. Vol. 66. Springer.
Calvez, Vincent, Susely Figueroa Iglesias, Hélène Hivert, Sylvie Méléard, Anna Melnykova, and Samuel Nordmann. 2020. “Horizontal Gene Transfer: Numerical Comparison Between Stochastic and Deterministic Approaches.” ESAIM: Proceedings and Surveys 67: 135–60.
Çinlar, Erhan. 2011. Probability and Stochastics. Springer New York.
Collet, Pierre, Sylvie Méléard, and Johan AJ Metz. 2013. “A Rigorous Model Study of the Adaptive Dynamics of Mendelian Diploids.” Journal of Mathematical Biology 67: 569–607.
Costa, Manon, Céline Hauzy, Nicolas Loeuille, and Sylvie Méléard. 2016. “Stochastic Eco-Evolutionary Model of a Prey-Predator Community.” Journal of Mathematical Biology 72: 573–622.
Devroye, Luc. 1986. Nonuniform Random Variate Generation. Springer-Verlag, New York.
Eddelbuettel, Dirk, and Romain Francois. 2011. “Rcpp: Seamless r and c++ Integration.” Journal of Statistical Software 40 (8): 1–18. https://doi.org/10.18637/jss.v040.i08.
El Karoui, Nicole, Kaouther Hadji, and Sarah Kaakai. 2021. “Simulating Long-Term Impacts of Mortality Shocks: Learning from the Cholera Pandemic.” arXiv Preprint arXiv:2111.08338.
Ferrière, Régis, and Viet Chi Tran. 2009. “Stochastic and Deterministic Models for Age-Structured Populations with Genetically Variable Traits.” In, 27:289–310. ESAIM Proc. EDP Sci., Les Ulis.
Fournier, Nicolas, and Sylvie Méléard. 2004. “A Microscopic Probabilistic Description of a Locally Regulated Population and Macroscopic Approximations.” Ann. Appl. Probab. 14 (4): 1880–1919.
Hyndman, Rob, Heather Booth, Leonie Tickle, John Maindonald, Simon Wood, and R Core Team. 2023. demography: Forecasting Mortality, Fertility, Migration and Population Data. https://cran.r-project.org/package=demography.
Kaakai, Sarah, and Nicole El Karoui. 2023. “Birth Death Swap Population in Random Environment and Aggregation with Two Timescales.” Stochastic Processes and Their Applications 162: 218–48. https://doi.org/10.1016/j.spa.2023.04.017.
Kaakaï, Sarah, Héloïse Labit Hardy, Séverine Arnold, and Nicole El Karoui. 2019. “How Can a Cause-of-Death Reduction Be Compensated for by the Population Heterogeneity? A Dynamic Approach.” Insurance: Mathematics and Economics 89: 16–37. https://doi.org/10.1016/j.insmatheco.2019.07.005.
Kallenberg, Olav. 2017. Random Measures, Theory and Applications. Vol. 77. Probability Theory and Stochastic Modelling. Springer, Cham.
Lavallée, François, Charline Smadi, Isabelle Alvarez, Björn Reineking, François-Marie Martin, Fanny Dommanget, and Sophie Martin. 2019. “A Stochastic Individual-Based Model for the Growth of a Stand of Japanese Knotweed Including Mowing as a Management Technique.” Ecological Modelling 413: 108828.
Lewis, Peter, and Gerald Shedler. 1979. “Simulation of Nonhomogeneous Poisson Processes by Thinning.” Naval Research Logistics Quarterly 26 (3): 403–13.
Méléard, Sylvie, Michael Rera, and Tristan Roget. 2019. “A Birth–Death Model of Ageing: From Individual-Based Dynamics to Evolutive Differential Inclusions.” Journal of Mathematical Biology 79: 901–39.
Méléard, Sylvie, and Viet Chi Tran. 2009. “Trait Substitution Sequence Process and Canonical Equation for Age-Structured Populations.” Journal of Mathematical Biology 58: 881–921.
Roget, T., Claire Macmurray, P. Jolivet, S. Méléard, and Michael Rera. 2024. “A Scenario for an Evolutionary Selection of Ageing.” eLife 13.
Tran, Viet Chi. 2008. “Large Population Limit and Time Behaviour of a Stochastic Particle Model Describing an Age-Structured Population.” ESAIM: Probability and Statistics 12: 345–86. https://doi.org/10.1051/ps:2007052.
Villegas, Andrés M., Vladimir K. Kaishev, and Pietro Millossovich. 2018. “StMoMo: An R Package for Stochastic Mortality Modelling.” Journal of Statistical Software 84: 1–38.
Villegas, Andrés, Pietro Millossovich, and Vladimir Kaishev. 2018. StMoMo: Stochastic Mortality Modelling. https://cran.r-project.org/package=StMoMo.
Zinn, Sabine. 2014. “The MicSim Package of R: An Entry-Level Toolkit for Continuous-Time Microsimulation.” International Journal of Microsimulation 7 (3): 3–32.
Zinn, Sabine, Jutta Gampe, Jan Himmelspach, and Adelinde M Uhrmacher. 2009. “MIC-CORE: A Tool for Microsimulation.” In Proceedings of the 2009 Winter Simulation Conference (WSC), 992–1002. IEEE.

Session information

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] reshape2_1.4.4  StMoMo_0.4.1    forecast_8.23.0 gnm_1.1-5      
[5] IBMPopSim_1.1.0 ggplot2_3.5.1  

loaded via a namespace (and not attached):
 [1] dotCall64_1.2          gtable_0.3.5           spam_2.11-0           
 [4] xfun_0.44              lattice_0.22-6         tzdb_0.4.0            
 [7] quadprog_1.5-8         vctrs_0.6.5            tools_4.4.1           
[10] generics_0.1.3         curl_6.0.1             parallel_4.4.1        
[13] tibble_3.2.1           fansi_1.0.6            xts_0.14.1            
[16] pkgconfig_2.0.3        Matrix_1.7-0           checkmate_2.3.2       
[19] RColorBrewer_1.1-3     lifecycle_1.0.4        rootSolve_1.8.2.4     
[22] farver_2.1.2           stringr_1.5.1          compiler_4.4.1        
[25] fields_16.3            munsell_0.5.1          htmltools_0.5.8.1     
[28] maps_3.4.2.1           yaml_2.3.8             pillar_1.9.0          
[31] MASS_7.3-60.2          nlme_3.1-164           fracdiff_1.5-3        
[34] tidyselect_1.2.1       fanplot_4.0.0          digest_0.6.35         
[37] stringi_1.8.4          dplyr_1.1.4            labeling_0.4.3        
[40] qvcalc_1.0.3           tseries_0.10-58        RcppArmadillo_14.2.0-1
[43] fastmap_1.2.0          grid_4.4.1             colorspace_2.1-0      
[46] cli_3.6.2              magrittr_2.0.3         relimp_1.0-5          
[49] utf8_1.2.4             readr_2.1.5            withr_3.0.0           
[52] scales_1.3.0           backports_1.5.0        TTR_0.24.4            
[55] rmarkdown_2.27         quantmod_0.4.26        nnet_7.3-19           
[58] timeDate_4041.110      zoo_1.8-12             hms_1.1.3             
[61] urca_1.3-4             evaluate_0.23          knitr_1.46            
[64] lmtest_0.9-40          viridisLite_0.4.2      rlang_1.1.3           
[67] Rcpp_1.0.13-1          glue_1.7.0             renv_1.0.7            
[70] jsonlite_1.8.8         plyr_1.8.9             R6_2.5.1              

Footnotes

  1. The choices (\alpha, \beta) \in \{(1,2/300),(1, 1/300), (1.5, 1/300), (2, 1/300)\} lead to the stationary population sizes N \in \{360, 900, 1800, 2600\}. For each set of parameters, we generated a new initial population, which was used for a benchmark of 50 simulations with both the randomized and full algorithms. The simulations were run on an Intel Core i7-8550U CPU 1.80GHz × 8 processor, with 15.3 GiB of RAM, under Debian GNU/Linux 11.↩︎

Citation

BibTeX citation:
@article{giorgi2024,
  author = {Giorgi, Daphné and Kaakai, Sarah and Lemaire, Vincent},
  publisher = {Société Française de Statistique},
  title = {Efficient Simulation of Individual-Based Population Models},
  journal = {Computo},
  date = {2024-12-01},
  url = {https://computo.sfds.asso.fr/published-202412-giorgi-efficient},
  doi = {10.57750/sfxn-1t05},
  issn = {2824-7795},
  langid = {en},
  abstract = {The `R` Package `IBMPopSim` facilitates the simulation of
    the random evolution of heterogeneous populations using stochastic
    Individual-Based Models (IBMs). The package enables users to
    simulate population evolution, in which individuals are
    characterized by their age and some characteristics, and the
    population is modified by different types of events, including
    births/arrivals, death/exit events, or changes of characteristics.
    The frequency at which an event can occur to an individual can
    depend on their age and characteristics, but also on the
    characteristics of other individuals (interactions). Such models
    have a wide range of applications in fields including actuarial
    science, biology, ecology or epidemiology. `IBMPopSim` overcomes the
    limitations of time-consuming IBMs simulations by implementing new
    efficient algorithms based on thinning methods, which are compiled
    using the `Rcpp` package while providing a user-friendly interface.}
}
For attribution, please cite this work as:
Giorgi, Daphné, Sarah Kaakai, and Vincent Lemaire. 2024. “Efficient Simulation of Individual-Based Population Models.” Computo, December. https://doi.org/10.57750/sfxn-1t05.