2009, Vol. 37, No. 4, 1906–1945 DOI:10.1214/08-AOS631

© Institute of Mathematical Statistics, 2009

**ASYMPTOTICS FOR POSTERIOR HAZARDS**

BY PIERPAOLO DE BLASI,¹ GIOVANNI PECCATI² AND IGOR PRÜNSTER¹

*Università di Torino, Collegio Carlo Alberto, Université Paris Ouest and*
*Université Paris VI and Università di Torino*

*and Collegio Carlo Alberto and ICER*

An important issue in survival analysis is the investigation and the modeling of hazard rates. Within a Bayesian nonparametric framework, a natural and popular approach is to model hazard rates as kernel mixtures with respect to a completely random measure. In this paper we provide a comprehensive analysis of the asymptotic behavior of such models. We investigate consistency of the posterior distribution and derive fixed sample size central limit theorems for both linear and quadratic functionals of the posterior hazard rate. The general results are then specialized to various specific kernels and mixing measures, yielding consistency under minimal conditions and neat central limit theorems for the distribution of functionals.

**1. Introduction.** Bayesian nonparametric methods have found a fertile ground of applications within survival analysis. Indeed, given that survival analysis typically requires function estimation, the Bayesian nonparametric paradigm seems to be tailor-made for such problems, as already shown in the seminal papers by Doksum [4], Dykstra and Laud [6], Lo and Weng [24] and Hjort [11]. According to the approach of [6, 24], the hazard rate is modeled as a mixture of a suitable kernel with respect to an increasing additive process (see [32]) or, more generally, a completely random measure (see [21]). This approach will be the focus of the present paper: below we first present the model and then the two asymptotic issues we are going to tackle, namely weak consistency and the derivation of fixed sample size central limit theorems (CLTs) for functionals of the posterior hazard rate.

Received February 2008; revised June 2008.
¹Supported in part by MIUR, Grant 2006/133449.
²Supported in part by ISI Foundation, Lagrange Project.

*AMS 2000 subject classifications.* 62G20, 60G57.

*Key words and phrases.* Asymptotics, Bayesian consistency, Bayesian nonparametrics, central limit theorem, completely random measure, path-variance, random hazard rate, survival analysis.

*1.1. Life-testing model with mixture hazard rate.* Denote by $Y$ a positive absolutely continuous random variable representing the lifetime and assume that its random hazard rate is of the form

$$\tilde h(t)=\int_{\mathbb X}k(t,x)\,\tilde\mu(dx), \tag{1}$$

where $k$ is a kernel and $\tilde\mu$ a completely random measure on some Polish space $\mathbb X$ endowed with its Borel σ-field $\mathcal X$. The kernel $k$ is a jointly measurable application from $\mathbb R^+\times\mathbb X$ to $\mathbb R^+$, and the application $C\mapsto\int_C k(t,x)\,dt$ defines a σ-finite measure on $\mathcal B(\mathbb R^+)$ for any $x$ in $\mathbb X$. Typical choices, which we will also consider in this paper, are:

(i) the Dykstra–Laud (DL) kernel [6]

$$k(t,x)=\mathbb I_{(0\le x\le t)}, \tag{2}$$

which leads to monotone increasing hazard rates;

(ii) the rectangular kernel (see, e.g., [13]) with bandwidth $\tau>0$

$$k(t,x)=\mathbb I_{(|t-x|\le\tau)}; \tag{3}$$

(iii) the Ornstein–Uhlenbeck (OU) kernel (see, e.g., [25, 26]) with $\kappa>0$

$$k(t,x)=\sqrt{2\kappa}\,\exp\{-\kappa(t-x)\}\,\mathbb I_{(0\le x\le t)}; \tag{4}$$

(iv) the exponential kernel (see, e.g., [14])

$$k(t,x)=x^{-1}e^{-t/x}, \tag{5}$$

which yields monotone decreasing hazard rates.
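To make the four kernels concrete, the following minimal sketch evaluates (1) when the mixing measure is replaced by a finite sum of point masses $\sum_j w_j\delta_{x_j}$, so that $\tilde h(t)=\sum_j w_j k(t,x_j)$. The atom locations, the weights and the default bandwidth and $\kappa$ values are hypothetical and chosen only for illustration; the function names are likewise made up for this example. The DL mixture comes out nondecreasing and the exponential mixture decreasing, as stated above.

```python
import math


def dl_kernel(t, x):
    """Dykstra-Laud kernel (2): indicator of 0 <= x <= t."""
    return 1.0 if 0.0 <= x <= t else 0.0


def rect_kernel(t, x, tau=0.5):
    """Rectangular kernel (3): indicator of |t - x| <= tau (tau > 0, illustrative default)."""
    return 1.0 if abs(t - x) <= tau else 0.0


def ou_kernel(t, x, kappa=1.0):
    """OU kernel (4): sqrt(2 kappa) exp(-kappa (t - x)) on 0 <= x <= t (illustrative kappa)."""
    if 0.0 <= x <= t:
        return math.sqrt(2.0 * kappa) * math.exp(-kappa * (t - x))
    return 0.0


def exp_kernel(t, x):
    """Exponential kernel (5): x^{-1} exp(-t / x), for x > 0."""
    return math.exp(-t / x) / x


def mixture_hazard(t, kernel, atoms, weights):
    """Integral (1) for the purely atomic measure sum_j weights[j] * delta_{atoms[j]}."""
    return sum(w * kernel(t, x) for x, w in zip(atoms, weights))


# Hypothetical atoms and jump sizes, for illustration only.
atoms = [0.5, 1.0, 2.0]
weights = [0.3, 0.2, 0.5]
grid = [0.25 * i for i in range(1, 17)]  # t = 0.25, 0.50, ..., 4.00

h_dl = [mixture_hazard(t, dl_kernel, atoms, weights) for t in grid]
h_exp = [mixture_hazard(t, exp_kernel, atoms, weights) for t in grid]
h_ou = [mixture_hazard(t, ou_kernel, atoms, weights) for t in grid]
```

Because each DL term switches on at its atom and stays on, the DL hazard is a nondecreasing step function; each exponential term decreases in $t$, so their positive combination does too.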

As for the mixing measure in (1), letting $(\mathbb M,\mathcal B(\mathbb M))$ be the space of boundedly finite measures on $(\mathbb X,\mathcal X)$, $\tilde\mu$ is taken to be a completely random measure (CRM) in the sense of [21]. This means that $\tilde\mu$ is a random element defined on $(\Omega,\mathcal F,\mathbb P)$, taking values in $(\mathbb M,\mathcal B(\mathbb M))$ and such that, for any collection of disjoint sets $B_1,B_2,\ldots$, the random variables $\tilde\mu(B_1),\tilde\mu(B_2),\ldots$ are mutually independent. Appendix A.1 provides a brief account of CRMs, as well as justifications of the following statements. It is important to recall that a CRM is characterized by its Poisson intensity $\nu$, which we can write as

$$\nu(dv,dx)=\rho(dv\mid x)\,\lambda(dx), \tag{6}$$

where $\lambda$ is a σ-finite measure on $\mathbb X$. If, furthermore, $\nu(dv,dx)=\rho(dv)\lambda(dx)$, the corresponding CRM $\tilde\mu$ is termed homogeneous; otherwise it is said to be nonhomogeneous. We always consider kernels such that $\int_{\mathbb X}k(t,x)\,\lambda(dx)<+\infty$.

Throughout the paper, we will take $\nu$ and $\lambda$ to be nonatomic and, for inferential purposes, we shall moreover assume that

$$\rho(\mathbb R^+\mid x)=+\infty\ \ \text{a.e.-}\lambda\qquad\text{and}\qquad\operatorname{supp}(\lambda)=\mathbb X. \tag{H1}$$

In the examples we will focus on a large class of CRMs, which includes almost all CRMs used so far in applications and is characterized by an intensity measure of the type

$$\nu(dv,dx)=\frac{1}{\Gamma(1-\sigma)}\,\frac{e^{-\gamma(x)v}}{v^{1+\sigma}}\,dv\,\lambda(dx), \tag{7}$$

where $\sigma\in[0,1)$ and $\gamma$ is a strictly positive function on $\mathbb X$. Note that, if $\gamma$ is a constant, the resulting CRMs coincide with the generalized gamma measures [2], whereas when $\sigma=0$ they are extended gamma CRMs [6, 24].
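As a sanity check on the normalization $1/\Gamma(1-\sigma)$ in (7), one can verify numerically a standard computation (not carried out in the text): for constant $\gamma$, integrating $1-e^{-sv}$ against the $v$-part of (7) gives $\psi(s)=((\gamma+s)^\sigma-\gamma^\sigma)/\sigma$ per unit of $\lambda$, which tends to the gamma-case value $\log(1+s/\gamma)$ as $\sigma\to0$. By the Lévy–Khintchine representation of a CRM, $\mathbb E[e^{-s\tilde\mu(A)}]=\exp\{-\lambda(A)\psi(s)\}$. The sketch below is illustrative only; the function names and the integration cutoffs are assumptions, and the lower cutoff would need lowering for $\sigma$ close to 1.

```python
import math


def gg_levy_density(v, sigma, gamma):
    """v-part of intensity (7) with constant gamma: e^{-gamma v} / (Gamma(1-sigma) v^{1+sigma})."""
    return math.exp(-gamma * v) / (math.gamma(1.0 - sigma) * v ** (1.0 + sigma))


def laplace_exponent_numeric(s, sigma, gamma, lo=-40.0, hi=6.0, steps=40000):
    """Trapezoid rule for int_0^inf (1 - e^{-s v}) rho(dv), after substituting v = e^u."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = math.exp(lo + i * h)
        # dv = v du; -expm1(-x) computes 1 - e^{-x} accurately for small x
        val = -math.expm1(-s * v) * gg_levy_density(v, sigma, gamma) * v
        total += val if 0 < i < steps else 0.5 * val
    return total * h


def laplace_exponent_closed(s, sigma, gamma):
    """Closed form ((gamma + s)^sigma - gamma^sigma) / sigma; log(1 + s/gamma) when sigma = 0."""
    if sigma == 0.0:
        return math.log1p(s / gamma)
    return ((gamma + s) ** sigma - gamma ** sigma) / sigma


for sig in (0.0, 0.25, 0.5):
    print(sig, laplace_exponent_numeric(2.0, sig, 1.0), laplace_exponent_closed(2.0, sig, 1.0))
```

The substitution $v=e^u$ turns the singularity at $v=0$ into an exponentially decaying tail, which is why a plain trapezoid rule is accurate here.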

Having defined the ingredients of the mixture hazard (1), we can complete the description of the model, which is often referred to as the life-testing model. The cumulative hazard is given by $\tilde H(t)=\int_0^t\tilde h(s)\,ds$ and, provided

$$\tilde H(t)\to\infty\ \ \text{for } t\to+\infty\ \ \text{a.s.}, \tag{8}$$

one can define a random density function $\tilde f$ as

$$\tilde f(t)=\tilde h(t)\exp(-\tilde H(t))=\tilde h(t)\,\tilde S(t), \tag{9}$$

where $\tilde S(t):=\exp(-\tilde H(t))$ is the survival function, providing the probability that $Y>t$. Consequently, the random cumulative distribution function of $Y$ is of the form $\tilde F(t)=1-\exp(-\tilde H(t))$. Note that, given $\tilde\mu$, $\tilde h$ represents the hazard rate of $Y$, that is, $\tilde h(t)\,dt=\mathbb P(t\le Y\le t+dt\mid Y\ge t,\tilde\mu)$. Throughout the paper we will assume that

$$\mathbb E[\tilde H(t)]=\int_0^t\int_{\mathbb R^+\times\mathbb X}v\,k(u,x)\,\rho(dv\mid x)\,\lambda(dx)\,du<+\infty\qquad\forall t>0. \tag{H2}$$
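The passage from a hazard to a density in (8) and (9) can be checked on a grid: integrate $h$ to get $H$, set $S=e^{-H}$ and $f=hS$; then $\int_0^Tf(t)\,dt$ should equal $1-S(T)$, which tends to 1 under (8). A minimal sketch, assuming the hazard is supplied as a plain Python function; the choice $h(t)=t$ is illustrative (it is the DL mixture (2) obtained when the mixing measure is the Lebesgue measure), and the helper names are hypothetical.

```python
import math


def density_from_hazard(h, T, steps=20000):
    """Return grid, cumulative hazard H, survival S = exp(-H) and density f = h * S on [0, T]."""
    dt = T / steps
    ts = [i * dt for i in range(steps + 1)]
    hs = [h(t) for t in ts]
    H = [0.0]
    for i in range(steps):
        H.append(H[-1] + 0.5 * (hs[i] + hs[i + 1]) * dt)  # trapezoid rule for int_0^t h
    S = [math.exp(-x) for x in H]
    f = [a * b for a, b in zip(hs, S)]
    return ts, H, S, f


def trapezoid(ys, dt):
    """Trapezoid rule on an equally spaced grid."""
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


ts, H, S, f = density_from_hazard(lambda t: t, T=3.0)
mass = trapezoid(f, ts[1] - ts[0])  # should be close to 1 - S(3) = 1 - exp(-4.5)
```

For a linear hazard the trapezoid rule integrates $h$ exactly, so $H(3)=4.5$ up to rounding; the only discretization error left is in the integral of $f$.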

Such models have recently received much attention due to their relatively simple implementation in applications. Important developments, dealing also with more general multiplicative intensity models, can be found in [12–15, 25, 26], among others.

*1.2. Posterior consistency.* The study of consistency of Bayesian nonparametric procedures represents one of the main recent research topics in Bayesian theory. The “frequentist” (or “what if”) approach to Bayesian consistency consists of generating independent data from a “true” fixed density $f_0$ and checking whether the sequence of posterior distributions accumulates in some suitable neighborhood of $f_0$. Specifically, denote by $P_0$ the probability distribution associated with $f_0$ and by $P_0^\infty$ the infinite product measure. Moreover, the symbol $\mathbb F$ indicates the space of density functions absolutely continuous with respect to the Lebesgue measure on $\mathbb R$, endowed with the Borel σ-field $\mathcal B(\mathbb F)$ (with respect to an appropriate

interested in establishing sufficient conditions to have that, as $n\to+\infty$, for any $\varepsilon>0$,

$$\Pi_n(A_\varepsilon(f_0))\to1\qquad\text{a.s.-}P_0^\infty, \tag{10}$$

where $A_\varepsilon(f_0)$ represents an ε-neighborhood of $f_0$ in a suitable topology. If (10) holds, then $\Pi$ is said to be consistent at $f_0$. Now, if $A_\varepsilon(f_0)$ is chosen to be a weak neighborhood, one obtains weak consistency. Sufficient conditions for weak consistency of various important nonparametric models have been provided in, for example, [8, 33, 35, 37]. By requiring (10) to hold with $A_\varepsilon$ being an $L^1$-neighborhood, one obtains the stronger notion of $L^1$ consistency: general sufficient conditions for this to happen are provided in [1, 8, 36]. In the context of discrete models such as neutral to the right processes, posterior consistency has been studied in [9, 19, 20]. For a thorough review of the literature on consistency issues, the reader is referred to the monograph [10].

Turning back to the life-testing model defined by (1) and (9), little is known about consistency, since its structure is intrinsically very different from the models considered so far. First results were given in [5, 25]. In particular, in [5] consistency is established for the DL kernel with extended gamma mixing measure assuming a bounded “true” hazard. In this paper, we determine sufficient conditions for weak consistency of Bayesian nonparametric models defined in terms of mixture random hazard rates. We also cover the case of lifetimes subject to independent right-censoring. Then, we use this general result to establish weak consistency for mixture hazards with the specific kernels in (2)–(5) and CRMs characterized by (7). In particular, we obtain consistency essentially w.r.t. nondecreasing hazards for DL mixtures, w.r.t. bounded Lipschitz hazards for rectangular mixtures, w.r.t. hazards with a certain local exponential decay rate for OU mixtures and w.r.t. completely monotone hazards for exponential mixtures.

hazard with a DL kernel (2) with a homogeneous CRM is $T^2$, with the oscillations around the trend increasing like $T^{3/2}$, whereas with a rectangular kernel the trend is $T$ and the oscillations increase like $T^{1/2}$. Moreover, the parameters of the kernel and the CRM enter the variance of the asymptotic Gaussian random variable, thus leading to a rigorous procedure for their a priori selection.

Here, we face the more challenging problem of deriving CLTs for the posterior hazard rate: indeed, the model defined by (1) and (9) is not conjugate and, hence, the derivation of distributional results for posterior functionals is quite demanding. However, by exploiting the posterior representation of James [15] (to be detailed in Section 2), we are able to provide fixed sample size CLTs also for functionals of posterior hazard rates. One of our main findings is that, in all the considered special cases, the CLTs associated with the posterior hazard rate are the same as for the prior ones, and this holds for any number of observations. If one interprets CLTs as approximate “global pictures” of a model, the conclusions to be drawn from our results are quite clear. Indeed, although consistency implies that a given model can be asymptotically directed toward any deterministic target, the overall structure of a posterior hazard rate is systematically determined by the prior choice, even after conditioning on a very large number of observations.

As an example of the results derived in the sequel, consider again the hazard rate given by the DL kernel (2) with a homogeneous CRM, and let $\mathbf Y=(Y_1,\ldots,Y_n)$ be a set of observations. In Section 4.3.1, we will prove that

$$T^{-3/2}\big[\tilde H(T)-cT^2\big]\,\big|\,\mathbf Y\ \xrightarrow{\ \text{law}\ }\ X$$

(the precise meaning of such a conditional convergence in law will be clarified in the sequel), where $c$ is a constant and $X$ is a centered Gaussian random variable with variance $\sigma^2$. As anticipated, the crucial point will be that both $c$ and $\sigma^2$ are independent of $n$ and $\mathbf Y$, and that they are actually the same constants appearing in the prior CLTs proved in [27]. A more detailed illustration of these phenomena is provided in Section 4.3, where we also discuss analogous results involving other models, as well as limit theorems for quadratic functionals.
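For the prior version of this example the constants can be computed directly under extra assumptions not made in the text: take the gamma case of (7) ($\sigma=0$, constant $\gamma$) with $\lambda$ the Lebesgue measure. Then $\tilde H(T)=\int_0^T(T-x)\,\tilde\mu(dx)$, $\mathbb E[\tilde H(T)]=T^2/(2\gamma)$ and $\operatorname{Var}[\tilde H(T)]=T^3/(3\gamma^2)$, that is, $c=1/(2\gamma)$ and $\sigma^2=1/(3\gamma^2)$. The Monte Carlo sketch below checks these two numbers by simulating the prior only; it approximates the CRM by independent Gamma(shape $=dx$, rate $=\gamma$) increments on a grid, and all names and tuning values are hypothetical.

```python
import random


def simulate_cum_hazard(T, gamma, cells, rng):
    """One draw of H(T) = int_0^T (T - x) mu(dx) for a gamma CRM on [0, T].

    The CRM is approximated by independent Gamma(shape=dx, rate=gamma)
    increments on `cells` equal subintervals, each placed at its midpoint.
    """
    dx = T / cells
    total = 0.0
    for j in range(cells):
        x_mid = (j + 0.5) * dx
        # random.gammavariate takes (shape, scale); scale = 1 / rate
        total += (T - x_mid) * rng.gammavariate(dx, 1.0 / gamma)
    return total


def normalized_draws(T, gamma, cells, reps, seed=0):
    """Draws of T^{-3/2} [H(T) - c T^2] with the trend constant c = 1 / (2 gamma)."""
    rng = random.Random(seed)
    c = 1.0 / (2.0 * gamma)
    return [(simulate_cum_hazard(T, gamma, cells, rng) - c * T ** 2) / T ** 1.5
            for _ in range(reps)]


T, gamma = 10.0, 1.0
draws = normalized_draws(T, gamma, cells=100, reps=4000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
sigma2 = 1.0 / (3.0 * gamma ** 2)  # predicted limiting variance under these assumptions
```

The sample mean should be near 0 and the sample variance near $1/3$; a larger $T$ also makes the histogram of `draws` visibly closer to Gaussian.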

We stress that our choice of $+\infty$ as a limiting point is mainly conventional, and that one can easily modify our framework to deal with models that live within a finite window of time by using an appropriate deformation of the time scale. For instance, one can embed a hazard rate model defined on $[0,+\infty)$ into a finite time interval by substituting the time parameter $T$ in the previous discussion with an increasing function of the type $\log[T^*/(T^*-T)]$, where $T^*<+\infty$ and $0\le T<T^*$.

involving specific kernels and CRMs. In Section 5 some concluding remarks and future research lines are presented. Further results, which are also of independent interest, and the proofs are deferred to the Appendix.

**2. Posterior distribution of the random hazard rate.** In order to make Bayesian inference starting from model (1), an explicit posterior characterization is essential. Indeed, the first treatments of model (1) were limited to considering extended gamma CRMs, which allow for a relatively simple posterior characterization [6, 24]. Analysis beyond gamma-like choices of $\tilde\mu$ was not possible for a long time due to the lack of a suitable and implementable posterior characterization: however, in James [15] this goal has been achieved and many choices for $\tilde\mu$ can now be explored. See also [23] for a different derivation of these results. In what follows, we give an explicit description of the posterior characterization of the model (1).

Let $\tilde P_{\tilde f}$ be the random probability measure associated with (9) and denote by $(Y_n)_{n\ge1}$ a sequence of exchangeable observations, defined on $(\Omega,\mathcal F,\mathbb P)$ and taking values in $\mathbb R^+$, such that, given $\tilde P_{\tilde f}$, the $Y_n$'s are i.i.d. with distribution $\tilde P_{\tilde f}$, that is,

$$\mathbb P\big[Y_1\in B_1,\ldots,Y_n\in B_n\,\big|\,\tilde P_{\tilde f}\big]=\prod_{i=1}^n\tilde P_{\tilde f}(B_i)$$

for any $B_i\in\mathcal B(\mathbb R^+)$, $i=1,\ldots,n$, and $n\ge1$. The joint (conditional) density of $\mathbf Y=(Y_1,\ldots,Y_n)$ given $\tilde\mu=\mu$ is then given by

$$e^{-\int_{\mathbb X}\sum_{i=1}^n\int_0^{y_i}k(t,x)\,dt\,\mu(dx)}\ \prod_{i=1}^n\int_{\mathbb X}k(y_i,x)\,\mu(dx).$$

In this context it is also important to consider some censoring mechanism, specifically independent right-censoring. Hence, suppose there are additionally $Y_{n+1},\ldots,Y_m$ random times which are right censored by censoring times $C_{n+1},\ldots,C_m$, that is, $Y_i>C_i$ for $i=n+1,\ldots,m$ [by exchangeability, it would be equivalent to assume the right censored data to be an arbitrary $(m-n)$-dimensional subvector of $(Y_1,\ldots,Y_m)$]. It is well known that assuming the distribution of $C$ to be known is equivalent to assuming that the distribution of $C$ is a priori independent of the distribution of $Y$. Hence, the posterior distribution of $\tilde\mu$ may be obtained without even specifying the prior on the distribution of $C$. Then the likelihood function based on $\mathbf Y=(Y_1,\ldots,Y_m)$, where the vector $\mathbf Y$ is composed of $n$ completely observed times and $m-n$ right censored times, has the form

$$\mathcal L(\mu;\mathbf y)=e^{-\int_{\mathbb X}K_m(x)\,\mu(dx)}\prod_{i=1}^n\int_{\mathbb X}k(y_i,x)\,\mu(dx), \tag{11}$$

where $K_m(x)=\sum_{i=1}^m\int_0^{y_i\wedge c_i}k(t,x)\,dt$.

(11) reduces to

$$\mathcal L(\mu;\mathbf y,\mathbf x)=e^{-\int_{\mathbb X}K_m(x)\,\mu(dx)}\prod_{i=1}^nk(y_i;x_i)\,\mu(dx_i)=e^{-\int_{\mathbb X}K_m(x)\,\mu(dx)}\prod_{j=1}^k\mu(dx_j^*)^{n_j}\prod_{i\in D_j}k(y_i;x_j^*),$$

where $\mathbf X^*=(X_1^*,\ldots,X_k^*)$ denotes the vector of the $k\le n$ distinct latent variables, $n_j$ is the frequency of $X_j^*$ and $D_j=\{r:x_r=x_j^*\}$. Finally, set $\tau_{n_j}(x)=\int_{\mathbb R^+}v^{n_j}e^{-vK_m(x)}\rho(dv\mid x)$. We are now in a position to state the posterior characterization of the mixture hazard rate.
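For the DL kernel (2) and the generalized gamma intensity (7) with constant $\gamma$, the quantities just introduced have closed forms (a direct computation, not stated in the text): $\int_0^{z}\mathbb I_{(0\le x\le t)}\,dt=(z-x)^+$, so $K_m(x)=\sum_{i=1}^m(y_i\wedge c_i-x)^+$, and $\tau_n(x)=\Gamma(n-\sigma)/\{\Gamma(1-\sigma)(\gamma+K_m(x))^{n-\sigma}\}$. Consequently the jump law (14) in Theorem 1 below is a Gamma distribution with shape $n_i-\sigma$ and rate $\gamma+K_m(X_i^*)$, and (13) is again of the form (7) with $\gamma(x)=\gamma+K_m(x)$. A minimal sketch, assuming the observed times $z_i=y_i\wedge c_i$ are passed as a list; the function names are hypothetical.

```python
import math


def K_m_dl(x, observed_times):
    """K_m(x) for the DL kernel (2): sum_i max(z_i - x, 0), with z_i = y_i ∧ c_i and x >= 0."""
    return sum(max(z - x, 0.0) for z in observed_times)


def tau_gg(n, K, sigma, gamma):
    """Closed form of int v^n e^{-vK} rho(dv) for the generalized gamma rho in (7), n >= 1."""
    return math.gamma(n - sigma) / (math.gamma(1.0 - sigma) * (gamma + K) ** (n - sigma))


def tau_numeric(n, K, sigma, gamma, lo=-30.0, hi=5.0, steps=20000):
    """Trapezoid check of the same integral after substituting v = e^u."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = math.exp(lo + i * h)
        # v^n e^{-vK} * e^{-gamma v} v^{-1-sigma} / Gamma(1-sigma) * (dv/du = v)
        val = v ** (n - sigma) * math.exp(-(gamma + K) * v) / math.gamma(1.0 - sigma)
        total += val if 0 < i < steps else 0.5 * val
    return total * h


def jump_law_gg(n_i, K, sigma, gamma):
    """(shape, rate) of the Gamma law to which (14) reduces under (7); its mean is shape / rate."""
    return n_i - sigma, gamma + K


K = K_m_dl(1.0, [0.5, 2.0, 3.5])  # hypothetical data: 0 + 1.0 + 2.5
shape, rate = jump_law_gg(2, K, 0.5, 1.0)
```

The closed form follows because the integrand $v^{n-\sigma-1}e^{-(\gamma+K)v}$ is an unnormalized Gamma density.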

THEOREM 1 (James [15]). *Let $\tilde h$ be a random hazard rate as defined in (1), corresponding to model (9). Then, given $\mathbf Y$, the posterior distribution of $\tilde h$ can be characterized as follows:*

*(i) Given $\mathbf X$ and $\mathbf Y$, the conditional distribution of $\tilde\mu$ coincides with the distribution of the random measure*

$$\tilde\mu^{m,*}+\sum_{i=1}^kJ_i\,\delta_{X_i^*}, \tag{12}$$

*where $\tilde\mu^{m,*}$ is a CRM with intensity measure*

$$\nu^{m,*}(dv,dx):=e^{-vK_m(x)}\rho(dv\mid x)\,\lambda(dx), \tag{13}$$

*and, for $i=1,\ldots,k$, $X_i^*$ is a fixed point of discontinuity with corresponding jump $J_i$ distributed as*

$$f_{J_i}(dv)=\frac{v^{n_i}e^{-vK_m(X_i^*)}\rho(dv\mid X_i^*)}{\int_{\mathbb R^+}v^{n_i}e^{-vK_m(X_i^*)}\rho(dv\mid X_i^*)}. \tag{14}$$

*Moreover, the $J_i$'s are, conditionally on $\mathbf X$ and $\mathbf Y$, independent of $\tilde\mu^{m,*}$.*

*(ii) Conditionally on $\mathbf Y$, the distribution of the latent variables $\mathbf X$ is*

**3. Consistency.** Our first goal consists in deriving sufficient conditions for weak consistency of the Bayesian nonparametric life-testing model (9) with mixture hazard (1), which also covers the case of data subject to right-censoring. Then, we exploit this criterion to obtain consistency results for specific mixture hazards.

In the case of complete data, a general and widely used sufficient condition for weak consistency with respect to a “true” unknown density function $f_0$, due to Schwartz [33], requires a prior $\Pi$ to assign positive probability to Kullback–Leibler neighborhoods of $f_0$, that is,

$$\Pi\{f\in\mathbb F:d_{KL}(f_0,f)<\varepsilon\}>0\qquad\text{for any }\varepsilon>0, \tag{15}$$

where $d_{KL}(f_0,f)=\int\log(f_0(t)/f(t))\,f_0(t)\,dt$ denotes the Kullback–Leibler divergence between $f_0$ and $f$.
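Since every density in this model has the form $f=h\,e^{-H}$, the divergence in (15) can be computed directly from hazards: $\log(f_0/f)=\log(h_0/h)-H_0+H$. The sketch below evaluates $d_{KL}(f_0,f)$ this way by a trapezoid rule on $[0,t_{\max}]$ and compares it with the closed form $\log(a/b)+b/a-1$ for two constant (exponential) hazards $a$ and $b$. It only computes the divergence between two given densities, not the prior mass in (15); the function name, the cutoff $t_{\max}$ and the example hazards are illustrative assumptions.

```python
import math


def kl_from_hazards(h0, H0, h, H, t_max=60.0, steps=60000):
    """d_KL(f0, f) = int log(f0 / f) f0 dt with f = h exp(-H), via trapezoid rule on [0, t_max].

    Requires h > 0 wherever f0 puts mass; the tail beyond t_max is ignored.
    """
    dt = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * dt
        f0 = h0(t) * math.exp(-H0(t))
        log_ratio = math.log(h0(t) / h(t)) - H0(t) + H(t)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * log_ratio * f0
    return total * dt


a, b = 2.0, 0.5  # illustrative constant hazards
kl = kl_from_hazards(lambda t: a, lambda t: a * t, lambda t: b, lambda t: b * t)
closed_form = math.log(a / b) + b / a - 1.0
```

Working with hazards rather than densities avoids evaluating $e^{-H}$ in a ratio, which keeps the log term numerically stable for large $t$.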

In the presence of right-censoring, we do not actually observe the lifetime $Y$, but $(Z,\Delta)$, where $Z=Y\wedge C$, $\Delta=\mathbb I_{(Y\le C)}$, for $C$ a censoring time with distribution $P_c$ admitting density $f_c$. Clearly, this leads us to consider a prior on the space $\mathbb F\times\mathbb F$ and the corresponding prior $\Pi^*$ induced on the space of the distribution of the observables $(Z_i,\Delta_i)$'s.

The strategy of the proof consists in first rewriting the Kullback–Leibler condition in terms of the induced prior $\Pi^*$: this condition then guarantees consistency of $\Pi^*$. Moreover, it allows us to deduce the consistency of $\Pi$, the prior on the distribution of the lifetime $Y$, under independent right-censoring with the simple support condition

$$\operatorname{supp}(P_c)=\mathbb R^+. \tag{16}$$

The last step consists in translating the Kullback–Leibler condition into a condition in terms of uniform neighborhoods of the true hazard rate $h_0$ on the interval $(0,T]$ for any finite $T$. When dealing with models for hazard rates, the latter appears to be both more natural and easier to verify.

Without risk of confusion, in the following we denote by $\Pi$ both the prior on $\tilde f$ and the prior induced on $\tilde h$. Moreover, recall that the “true” density $f_0$ can always be represented in terms of the “true” hazard $h_0$ as $f_0(t)=h_0(t)\exp\big(-\int_0^th_0(s)\,ds\big)$.
THEOREM 2. *Let $\tilde f$ be a random density function defined by (1) and (9) with kernels (2)–(5) and denote its (prior) distribution by $\Pi$. Suppose the distribution of the censoring times $P_c$ is independent of the lifetime $Y$, absolutely continuous and satisfies (16). Moreover, assume that the following conditions hold:*

*(i) $f_0(t)$ is strictly positive on $(0,\infty)$ and $\int_{\mathbb R^+}\max\{\mathbb E[\tilde H(t)],t\}\,f_0(t)\,dt<\infty$;*

*(ii) there exists $r>0$ such that $\liminf_{t\downarrow0}\tilde h(t)/t^r=\infty$ a.s.*

*Then, a sufficient condition for $\Pi$ to be weakly consistent at $f_0$ is that*

$$\Pi\Big\{h:\sup_{0<t\le T}|h(t)-h_0(t)|<\delta\Big\}>0 \tag{17}$$

*for any finite $T>0$ and any $\delta>0$.*

Some comments regarding the conditions are in order at this point. Let us start with condition (i): the strict positivity of $f_0$ on $(0,\infty)$ is equivalent to strict positivity of the “true” hazard $h_0$ on $(0,\infty)$, which is a property satisfied by any reasonable $h_0$. The second part of condition (i), which is also related to the asymptotic characterizations considered in Section 4, clearly becomes more restrictive the faster the trend of the cumulative hazard grows. However, note that if $h_0$ is a power function, then $f_0$ admits moments of any order and, hence, it is enough that the trend of the cumulative hazard is a power function as well. Condition (ii) allows us to remove the somewhat artificial assumption $h_0(0)>0$ made in [5]. Indeed, $h_0(0)=0$ represents a common situation in practice, and condition (ii) covers such a case by controlling the small time behavior of $\tilde h$. Obviously, if $h_0(0)>0$, then one would adopt a random hazard $\tilde h$ nonvanishing at 0, and so condition (ii) would be automatically satisfied. Overall, the result can be seen as a general consistency criterion for mixture hazard models, and it deals automatically with the case of independent right-censoring. Moreover, it should be extendable in a quite straightforward way to mixture hazards with other reasonably behaved kernels.

Before entering a detailed analysis of specific models, we show how condition (ii) of Theorem 2 can be reduced to the problem of studying the short time behavior of the CRM and, moreover, we establish that the CRMs defined in (7) satisfy the corresponding short time behavior requirement. Throughout this section we assume $\mathbb X=\mathbb R^+$ and, hence, when useful, $\tilde\mu$ will be treated as an increasing additive process (see [32]), namely the càdlàg distribution function induced by $\tilde\mu$.
PROPOSITION 3. *Let $\tilde h$ be a mixture hazard (1). Then condition (ii) in Theorem 2 is implied by:*

*(ii1) there exists $\varepsilon>0$ such that $\tilde h(t)\ge c\,\tilde\mu((0,t])$ for $t<\varepsilon$, where $c$ is a constant not depending on $t$;*

*(ii2) there exists $r>0$ such that $\liminf_{t\downarrow0}\tilde\mu((0,t])/t^r=\infty$ a.s.*

*In particular, (ii1) holds if $k$ is either the DL (2) or the OU (4) kernel; (ii2) holds if $\tilde\mu$ is a CRM belonging to (7) with $\sigma\in(0,1)$ and $\lambda(dx)=dx$.*

Condition (ii1) requires that the random hazard leaves the origin at least as fast as the driving CRM, which is typically the case. Of the four considered kernels, the DL and OU mixtures are those for which $\tilde h(0)=0$ a.s., and for both kernels (ii1) is satisfied. Condition (ii2) requires control of the small time behavior of the CRM and is met by CRMs like (7). If one is interested in CRMs different from (7), one can try to adapt one of the several results on small time behavior known in the literature (see, e.g., [32] and references therein).

of mixture form $h_0(t)=\int_{\mathbb R^+}k(t,x)\,\mu_0(dx)$, where $k$ is the same kernel used for defining the specific model $\tilde h$; then, we show that these mixture $h_0$'s are arbitrarily close in the uniform metric to any $h_0$ belonging to a class of hazards having a suitable qualitative feature.

We first deal with DL mixture hazards $\tilde h(t)=\int_{\mathbb R^+}\mathbb I_{(0\le x\le t)}\,\tilde\mu(dx)$, which represent a model for nondecreasing hazard rates. The result establishes weak consistency of such models for any nondecreasing $h_0$ satisfying some mild additional conditions.

THEOREM 4. *Let $\tilde h$ be a mixture hazard (1) with DL kernel and $\tilde\mu$ satisfying condition (ii2) of Proposition 3. Then $\Pi$ is weakly consistent at any $f_0\in\mathbb F_1$, where $\mathbb F_1$ is defined as the set of densities for which: (i) $\int_{\mathbb R^+}\mathbb E[\tilde H(t)]\,f_0(t)\,dt<\infty$; (ii) $h_0(0)=0$ and $h_0(t)$ is strictly positive and nondecreasing for any $t>0$.*
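The proof strategy outlined above approximates $h_0$ uniformly by mixtures with the same kernel. For the DL kernel, an elementary deterministic version of this step is easy to see: putting atoms at $x_j=jT/m$ with weights $h_0(x_j)-h_0(x_{j-1})$ gives a DL mixture equal to a step version of $h_0$, whose sup-distance from $h_0$ on $(0,T]$ shrinks as $m$ grows. The sketch below illustrates only this approximation, not the prior-mass statement (17); the choice $h_0(t)=\sqrt t$ and all names are hypothetical, and the supremum is approximated on a finite grid.

```python
import math


def dl_atomic_approx(h0, T, m):
    """Atoms x_j = j T / m with weights h0(x_j) - h0(x_{j-1}), j = 1..m.

    With the DL kernel (2) the resulting mixture equals h0(x_j) on [x_j, x_{j+1}),
    a step version of h0 (this needs h0(0) = 0 and h0 nondecreasing).
    """
    xs = [j * T / m for j in range(1, m + 1)]
    prev = [h0(0.0)] + [h0(x) for x in xs[:-1]]
    ws = [h0(x) - p for x, p in zip(xs, prev)]
    return xs, ws


def dl_hazard(t, xs, ws):
    """DL mixture hazard sum_j w_j I(x_j <= t) for the atomic mixing measure."""
    return sum(w for x, w in zip(xs, ws) if x <= t)


def sup_distance(h0, xs, ws, T, grid=1000):
    """Approximate sup_{0 < t <= T} |h(t) - h0(t)| at grid midpoints (a lower bound)."""
    ts = [T * (i + 0.5) / grid for i in range(grid)]
    return max(abs(dl_hazard(t, xs, ws) - h0(t)) for t in ts)


h0 = math.sqrt  # illustrative "true" hazard: nondecreasing with h0(0) = 0
T = 4.0
errors = []
for m in (10, 100, 1000):
    xs, ws = dl_atomic_approx(h0, T, m)
    errors.append(sup_distance(h0, xs, ws, T))
```

For $\sqrt t$ the worst error sits just before the first atom, where the step function is still 0, so it decays roughly like $\sqrt{T/m}$.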

The second model we consider is represented by rectangular mixture hazards $\tilde h(t)=\int_{\mathbb R^+}\mathbb I_{(|t-x|\le\tilde\tau)}\,\tilde\mu(dx)$. In order to obtain consistency with respect to a large class of $h_0$'s, we treat the bandwidth $\tau$ as a hyperparameter and assign to it an independent prior $\pi$ whose support contains $[0,L]$ for some $L>0$. So we have two sources of randomness: $\tilde\tau$ with distribution $\pi$, and $\tilde\mu$, whose distribution we denote by $Q$. Hence, the prior distribution on $\tilde h$ is induced by $\pi\times Q$ via the map $(\tau,\mu)\mapsto h(\cdot\mid\tau,\mu):=\int\mathbb I_{(|\cdot-x|\le\tau)}\,\mu(dx)$. In this framework we are able to derive consistency at essentially any bounded and nonvanishing Lipschitz hazard $h_0$.

THEOREM 5. *Let $\tilde h$ be a mixture hazard (1) with rectangular kernel and random bandwidth $\tilde\tau$ independent of $\tilde\mu$. Moreover, assume that the support of the prior $\pi$ on $\tilde\tau$ contains $[0,L]$ for some $L>0$. Then $\Pi$ is weakly consistent at any $f_0\in\mathbb F_2$, where $\mathbb F_2$ is defined as the set of densities for which: (i) $\int_{\mathbb R^+}\max\{\mathbb E[\tilde H(t)],t\}\,f_0(t)\,dt<\infty$; (ii) $h_0(t)>0$ for any $t\ge0$; (iii) $h_0$ is bounded and Lipschitz.*

Now consider OU mixture hazards $\tilde h(t)=\int_{\mathbb R^+}\sqrt{2\kappa}\,e^{-\kappa(t-x)}\,\mathbb I_{(0\le x\le t)}\,\tilde\mu(dx)$. Define, for any differentiable decreasing function $g$, the local exponential decay rate as $-g'(y)/g(y)$. Our result establishes consistency at essentially any $h_0$ which exhibits, in regions where it is decreasing, a local exponential decay rate smaller than $\kappa\sqrt{2\kappa}$. This also sheds some light on the role of the kernel parameter $\kappa$: choosing a large $\kappa$ leads to less smooth trajectories of $\tilde h$ but, on the other hand, also ensures consistency with respect to $h_0$'s which have abrupt decays in certain regions.

THEOREM 6. *Let $\tilde h$ be a mixture hazard (1) with OU kernel and $\tilde\mu$ satisfying condition (ii2) of Proposition 3. Then $\Pi$ is weakly consistent at any $f_0\in\mathbb F_3$, where $\mathbb F_3$ is defined as the set*

*$h_0(t)>0$ for any $t>0$; (iii) $h_0$ is differentiable and, for any $t>0$ such that $h_0'(t)<0$, the corresponding local exponential decay rate is smaller than $\kappa\sqrt{2\kappa}$.*
REMARK 1. In the above three mixture hazard models, one typically selects CRMs with $\lambda$ in (6) being the Lebesgue measure on $\mathbb R^+$. If this is the case, then condition (i) in the definition of $\mathbb F_i$ ($i=1,2,3$) becomes $\int_{\mathbb R^+}t^2f_0(t)\,dt<\infty$ for DL mixture hazards and $\int_{\mathbb R^+}t\,f_0(t)\,dt<\infty$ for rectangular and OU mixtures.

Now we deal with mixture hazards based on an exponential kernel, $\tilde h(t)=\int_{\mathbb R^+}x^{-1}e^{-t/x}\,\tilde\mu(dx)$, which are used to model decreasing hazard rates. Note that, in contrast to the DL, rectangular and OU kernels, which all exhibit, for any fixed $t$, finite support on $\mathbb R^+$ when seen as functions of $x$, in this case the support is $\mathbb R^+$ for any fixed $t$. This implies the need for quite different techniques for handling it. Recall that a function $g$ on $\mathbb R^+$ is completely monotone if it possesses derivatives $g^{(n)}$ of all orders and $(-1)^ng^{(n)}(y)\ge0$ for any $y>0$. The next result shows that consistency holds at essentially any completely monotone hazard for which $h_0(0)<\infty$.
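Every exponential-kernel mixture is itself completely monotone, because each kernel is: the $n$th $t$-derivative of $x^{-1}e^{-t/x}$ is $(-1/x)^nx^{-1}e^{-t/x}$, so $(-1)^nh^{(n)}(t)=\int x^{-1-n}e^{-t/x}\mu(dx)\ge0$. The sketch checks this sign pattern for a purely atomic mixing measure and confirms the derivative formula against a central finite difference. The atoms and weights are hypothetical; note that $h(0)=\sum_jw_j/x_j$, which is finite here because there are finitely many atoms, all away from 0.

```python
import math


def exp_mixture_derivative(t, n, atoms, weights):
    """n-th t-derivative of h(t) = sum_j w_j x_j^{-1} exp(-t / x_j).

    Each term differentiates to w_j (-1/x_j)^n x_j^{-1} exp(-t / x_j), hence
    (-1)^n h^{(n)}(t) = sum_j w_j x_j^{-1-n} exp(-t / x_j) >= 0.
    """
    return sum(w * (-1.0 / x) ** n * math.exp(-t / x) / x for x, w in zip(atoms, weights))


# Hypothetical atoms (x_j > 0) and weights, for illustration only.
atoms = [0.5, 2.0, 5.0]
weights = [1.0, 0.5, 2.0]

h_at_zero = exp_mixture_derivative(0.0, 0, atoms, weights)  # sum_j w_j / x_j
signs_ok = all(
    (-1) ** n * exp_mixture_derivative(t, n, atoms, weights) >= 0.0
    for n in range(8)
    for t in [0.1 * i for i in range(50)]
)

# Central finite difference as an independent check of the n = 1 formula at t = 1.
eps = 1e-5
fd = (exp_mixture_derivative(1.0 + eps, 0, atoms, weights)
      - exp_mixture_derivative(1.0 - eps, 0, atoms, weights)) / (2 * eps)
```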

THEOREM 7. *Let $\tilde h$ be a mixture hazard (1) with exponential kernel such that $\tilde h(0)<\infty$ a.s. Then $\Pi$ is weakly consistent at any $f_0\in\mathbb F_4$, where $\mathbb F_4$ is defined as the set of densities for which: (i) $\int_{\mathbb R^+}t\,f_0(t)\,dt<\infty$; (ii) $h_0(0)<\infty$; (iii) $h_0$ is completely monotone.*

Note that the requirement that $\tilde h$ not explode at 0 is easily achieved by selecting $\lambda$ in (6) such that $\int_{\mathbb R^+\times\mathbb R^+}(1-e^{-ux^{-1}v})\,\rho(dv\mid x)\,\lambda(dx)<\infty$ for all $u>0$, which is equivalent to $\tilde h(0)<\infty$ a.s. [see (36) in Appendix A.1].

**4. Fixed sample size posterior CLTs.** In this section we derive CLTs for functionals of the random hazard given a fixed set of observations, as time diverges. For the sake of clarity, in the following we confine ourselves to the case of complete observations; however, all subsequent results immediately carry over to the case of data subject to right-censoring.

*4.1. Further concepts and notation.* Since we will heavily exploit the posterior characterization of $\tilde h$ recalled in Theorem 1, it is useful to first introduce some definitions related to quantities involved in its statement. Whenever convenient, we shall use the notation $\nu^{0,*}:=\nu$ and $\tilde\mu^{0,*}:=\tilde\mu$, that is, $\tilde\mu^{0,*}$ is the “prior” CRM and $\nu^{0,*}$ is its intensity measure. For every $n\ge0$ and $q,p\ge1$, we denote by

$$L^p((\nu^{n,*})^q)=L^p\big((\mathbb R^+\times\mathbb X)^q,\ (\mathcal B(\mathbb R^+)\otimes\mathcal X)^q,\ (\nu^{n,*})^q\big)$$

$p\ge1$. The symbol $L^2_s((\nu^{n,*})^2)$ is used to denote the Hilbert subspace of $L^2((\nu^{n,*})^2)$ generated by the symmetric functions on $(\mathbb R^+\times\mathbb X)^2$. Note that a function $f$ on $(\mathbb R^+\times\mathbb X)^2$ is said to be symmetric whenever $f(s,x;t,y)=f(t,y;s,x)$ for every $(s,x),(t,y)\in\mathbb R^+\times\mathbb X$.

Now we introduce various kernels which will enter either the statements or the conditions of the posterior CLTs. For $n\ge0$, given $\mathbf X$ and $\mathbf Y$, the posterior hazard rate and posterior cumulative hazard are

$$\int_{\mathbb X}k(t,x)\Big[\tilde\mu^{n,*}(dx)+\sum_{i=1}^kJ_i\,\delta_{X_i^*}(dx)\Big]=\tilde h^{n,*}(t)+\sum_{i=1}^kJ_i\,k(t,X_i^*), \tag{18}$$

$$\int_0^T\Big[\tilde h^{n,*}(t)+\sum_{i=1}^kJ_i\,k(t,X_i^*)\Big]dt=\tilde H^{n,*}(T)+\sum_{i=1}^kJ_i\int_0^Tk(t,X_i^*)\,dt. \tag{19}$$

In (18) and (19), we implicitly introduced the notation $\tilde h^{n,*}(t)$ and $\tilde H^{n,*}(T)$ for, respectively, the hazard rate and cumulative hazard without fixed points of discontinuity. Note that $\tilde h^{0,*}(t)$ coincides with $\tilde h(t)$, the prior hazard rate.

Furthermore, we need to define two basic classes of kernels:

(i) for every $n\ge0$ and every $f\in L^2_s((\nu^{n,*})^2)$, the kernel $f\star^1_{1,n}f$ is defined on $(\mathbb R^+\times\mathbb X)^2$ and is equal to the contraction

$$f\star^1_{1,n}f(t_1,x_1;t_2,x_2)=\int_{\mathbb R^+\times\mathbb X}f(t_1,x_1;s,y)\,f(s,y;t_2,x_2)\,\nu^{n,*}(ds,dy); \tag{20}$$

(ii) for every $n\ge0$ and every $f\in L^2_s((\nu^{n,*})^2)$, the kernel $f\star^1_{2,n}f$ is defined on $\mathbb R^+\times\mathbb X$ and is given by

$$f\star^1_{2,n}f(t,x)=\int_{\mathbb R^+\times\mathbb X}f(t,x;s,y)^2\,\nu^{n,*}(ds,dy). \tag{21}$$

The “star” notation is rather common; see, for example, [16, 28, 34]. Note that the Cauchy–Schwarz inequality yields that $f\star^1_{1,n}f\in L^2_s((\nu^{n,*})^2)$. It is worth noting that the two operators “$\star^1_{1,n}$” and “$\star^1_{2,n}$”, which appear in the statements of our CLTs, can be used to obtain explicit (combinatorial) expressions of the moments and of the cumulants associated with single and double integrals with respect to a Poisson (completely) random measure. See [31] for a discussion of this point.
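The two contractions become plain weighted linear algebra if $\nu^{n,*}$ is replaced by a finite atomic measure $\sum_la_l\delta_{z_l}$, a hypothetical discretization used only for illustration. With $F_{il}=f(z_i;z_l)$, (20) becomes $(FAF)_{ij}$ with $A=\operatorname{diag}(a)$, and (21) becomes $\sum_lF_{il}^2a_l$, which for symmetric $f$ is the diagonal of (20). The sketch also checks the Cauchy–Schwarz bound $\|f\star^1_{1,n}f\|^2\le\|f\|^4$ behind the $L^2$ claim above. The points, weights and the kernel $f$ are illustrative assumptions.

```python
import math


def contraction_11(F, a):
    """Discrete (20): (f *^1_1 f)(z_i; z_j) = sum_l f(z_i; z_l) f(z_l; z_j) a_l."""
    N = len(a)
    return [[sum(F[i][l] * F[l][j] * a[l] for l in range(N)) for j in range(N)]
            for i in range(N)]


def contraction_21(F, a):
    """Discrete (21): (f *^1_2 f)(z_i) = sum_l f(z_i; z_l)^2 a_l."""
    N = len(a)
    return [sum(F[i][l] ** 2 * a[l] for l in range(N)) for i in range(N)]


def l2_norm_sq(G, a):
    """Squared L^2 norm of a two-variable kernel under the product of the atomic measure."""
    N = len(a)
    return sum(G[i][j] ** 2 * a[i] * a[j] for i in range(N) for j in range(N))


# Hypothetical atoms z_l = (s_l, x_l) and weights a_l standing in for nu^{n,*}.
points = [(0.5, 0.1), (1.0, 0.7), (1.5, 1.2), (2.0, 2.5)]
a = [0.4, 0.3, 0.2, 0.1]


def f(p, q):
    """An illustrative symmetric kernel: f(p; q) = f(q; p)."""
    return p[0] * q[0] * math.exp(-abs(p[1] - q[1]))


F = [[f(p, q) for q in points] for p in points]
G = contraction_11(F, a)
g = contraction_21(F, a)
```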

Introduce now a last set of kernels which will appear in the conditions of the results discussed in Section 4. Fix $n\ge0$, take $T$ such that $0\le T<+\infty$ and define

$$k_T^{(2)}(s,x)=\frac{s^2}{T}\int_0^Tk(u,x)^2\,du; \tag{24}$$

$$k^{(3)}_{T,n}(s,x)=\int_{\mathbb R^+\times\mathbb X}k_T^{(1)}(s,x;u,w)\,\nu^{n,*}(du,dw). \tag{25}$$

Finally, for $(s,x)\in\mathbb R^+\times\mathbb X$ define the random kernel

$$k^{(4)}_{T,n,*}(s,x)=\frac sT\int_0^Tk(u,x)\int_{\mathbb X}k(u,y)\sum_{i=1}^kJ_i\,\delta_{X_i^*}(dy)\,du=\sum_{i=1}^kk_T^{(1)}(s,x;J_i,X_i^*). \tag{26}$$

*4.2. General results.* Before stating the results concerning the asymptotic behavior of functionals of random hazards, we need to make some more technical assumptions, which do not appear to be very restrictive; indeed, in the following examples, involving kernels and CRMs commonly exploited in practice, they will be shown to hold.

In the sequel we consider mixture hazards (1) which, in addition to (H1)–(H2), also satisfy

$$\begin{aligned}&\int_{\mathbb R^+\times\mathbb X}k(t,x)^j\,v^j\,\rho(dv\mid x)\,\lambda(dx)<+\infty&&\forall t,\ j=1,2,4;\\&\int_0^T\int_{\mathbb R^+\times\mathbb X}k(t,x)^j\,v^j\,\rho(dv\mid x)\,\lambda(dx)\,dt<+\infty&&\forall T\ge0,\ j=2,4.\end{aligned} \tag{H3}$$

See [27, 28] for a discussion of these conditions. Recall from (18) that $\tilde h^{n,*}(t)$ stands for the posterior hazard without fixed points of discontinuity (given $\mathbf X$ and $\mathbf Y$) and is characterized by (13). It is straightforward to see that, if the prior hazard rate satisfies (H1)–(H3), then $\tilde h^{n,*}(t)$ meets (H1)–(H3) as well.

Given an event $B\in\mathcal F$, we will say that $B$ has $\mathbb P\{\cdot\mid\mathbf X,\mathbf Y\}$-probability 1 whenever there exists $\Omega_0\in\mathcal F$ such that $\mathbb P\{\Omega_0\}=1$ and, for every fixed $\omega\in\Omega_0$, the random probability measure $A\mapsto\mathbb P\{\mathbf X\in A\mid\mathbf Y\}(\omega)$ has support contained in the set of those $(x_1,\ldots,x_n)\in\mathbb X^n$ such that $\mathbb P\{B\mid\mathbf X=(x_1,\ldots,x_n),\mathbf Y\}=1$.

Finally, fix a sample size $n\ge1$ for the remainder of the section. The following Theorems 8, 9 and 10 provide sufficient conditions for linear and quadratic functionals associated with posterior random hazard rates to verify a CLT. The first result deals with linear functionals.

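THEOREM 8 (Linear functionals). *Suppose: (i) $k_T^{(0)}\in L^3(\nu^{n,*})$ for every*

*such that, as $T\to+\infty$,*

$$C_0^2(n,k,T)\times\int_{\mathbb R^+\times\mathbb X}k_T^{(0)}(s,x)^2\,\nu^{n,*}(ds,dx)\to\sigma_0^2(n,k), \tag{27}$$

$$C_0^3(n,k,T)\times\int_{\mathbb R^+\times\mathbb X}k_T^{(0)}(s,x)^3\,\nu^{n,*}(ds,dx)\to0, \tag{28}$$

*where $\sigma_0^2(n,k)\in(0,+\infty)$. Also assume that, with $\mathbb P\{\cdot\mid\mathbf X,\mathbf Y\}$-probability 1,*

$$\lim_{T\to+\infty}C_0(n,k,T)\times\sum_{i=1}^kJ_i\int_0^Tk(t,X_i^*)\,dt=m\Big(n,\textstyle\sum_iJ_i\delta_{X_i^*},k\Big)\in[0,+\infty). \tag{29}$$

*Then, a.s.-$\mathbb P$, for every real $\lambda$,*

$$\mathbb E\Big[\exp\Big\{i\lambda C_0(n,k,T)\big[\tilde H(T)-\mathbb E[\tilde H^{n,*}(T)]\big]\Big\}\,\Big|\,\mathbf Y\Big]\xrightarrow[T\to+\infty]{}\mathbb E\Big[\exp\Big\{i\lambda\,m\Big(n,\textstyle\sum_iJ_i\delta_{X_i^*},k\Big)-\frac{\lambda^2}{2}\sigma_0^2(n,k)\Big\}\,\Big|\,\mathbf Y\Big].$$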

REMARK 2. When $n=0$ and setting, by convention, $\mathbf Y=\mathbf X=0$, so that $\sigma\{\mathbf Y,\mathbf X\}=\{\Omega,\emptyset\}$, one recovers Theorem 1 in [27] for prior random hazards. The same applies to the following two results concerning path-second moments and path-variances.

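THEOREM 9 (Path-second moments). *Suppose $k^{(3)}_{T,n}\in L^2(\nu^{n,*})\cap L^1(\nu^{n,*})$, $k_T^{(2)}\in L^3(\nu^{n,*})$ and that there exists a strictly positive function $C_1(n,k,T)$ such that the following asymptotic conditions are satisfied as $T\to+\infty$:*

*Then, a.s.-$\mathbb P$, for every real $\lambda$,*

$$\begin{aligned}&\mathbb E\Big[\exp\Big\{i\lambda C_1(n,k,T)\Big[\frac1T\int_0^T\tilde h(t)^2\,dt-A_T^{n,*}-\frac1T\int_0^T\mathbb E\big[\tilde h^{n,*}(t)^2\big]\,dt\Big]\Big\}\,\Big|\,\mathbf Y\Big]\\&\quad\xrightarrow[T\to+\infty]{}\mathbb E\Big[\exp\Big\{i\lambda\,v\Big(n,\textstyle\sum_iJ_i\delta_{X_i^*},k\Big)-\frac{\lambda^2}{2}\Big[\sigma_1^2(n,k)+\sigma_4^2\Big(n,\textstyle\sum_iJ_i\delta_{X_i^*},k\Big)\Big]\Big\}\,\Big|\,\mathbf Y\Big].\end{aligned}$$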

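THEOREM 10 (Path-variances). *Suppose that the assumptions of Theorem 8 and Theorem 9 are satisfied. Assume, moreover, that*

*1. $C_1(n,k,T)/(T\,C_0(n,k,T))^2\to0$;*

*2. $2C_1(n,k,T)\,\mathbb E[\tilde H^{n,*}(T)]/(T^2C_0(n,k,T))\to\delta(n,k)\in\mathbb R$;*

*3. $\big\|C_1(n,k,T)\big(k_T^{(2)}+2k^{(3)}_{T,n}+2k^{(4)}_{T,n,*}\big)-\delta(n,k)\,C_0(n,k,T)\,k_T^{(0)}\big\|^2_{L^2(\nu^{n,*})}\to\sigma_5^2\big(n,\textstyle\sum_iJ_i\delta_{X_i^*},k\big)\in[0,+\infty)$, with $\mathbb P\{\cdot\mid\mathbf X,\mathbf Y\}$-probability 1,*

*and $A_T^{n,*}$ is given by (30). Then, a.s.-$\mathbb P$, for every real $\lambda$,*

$$\begin{aligned}&\mathbb E\Big[\exp\Big\{i\lambda C_1(n,k,T)\Big[\frac1T\int_0^T\big[\tilde h(t)-\tilde H(T)/T\big]^2dt-A_T^{n,*}-\frac1T\int_0^T\mathbb E\big[\tilde h^{n,*}(t)^2\big]\,dt+\frac{\mathbb E[\tilde H^{n,*}(T)]^2}{T^2}\Big]\Big\}\,\Big|\,\mathbf Y\Big]\\&\quad\xrightarrow[T\to+\infty]{}\mathbb E\Big[\exp\Big\{i\lambda\Big[v\Big(n,\textstyle\sum_iJ_i\delta_{X_i^*},k\Big)-\delta(n,k)\,m\Big(n,\textstyle\sum_iJ_i\delta_{X_i^*},k\Big)\Big]-\frac{\lambda^2}{2}\Big[\sigma_1^2(n,k)+\sigma_5^2\Big(n,\textstyle\sum_iJ_i\delta_{X_i^*},k\Big)\Big]\Big\}\,\Big|\,\mathbf Y\Big].\end{aligned}$$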
_{.}REMARK 3. *We stress that, in general, the four quantities m(n, n,*∗*, k)*,

*σ*_{4}2*(n, n,*∗*, k), v(n, n,*∗*, k)* *and σ*_{5}2*(n, n,*∗*, k)* (appearing in the previous three
statements) can be random.

*4.3. Applications.* In this section we derive CLTs for functionals of posterior hazards based on the four kernels (2)–(5), combined with generalized gamma CRMs [2], namely CRMs as in (7) with $\gamma$ a positive constant. The measure $\lambda$ is chosen such that the life-testing model is well defined and (H1)–(H3) are met. Many other classes of CRMs represent possible alternatives, and one can proceed as below. It is important to recall that consistency of all the models dealt with below is easily deduced from the results in Section 3.

In all cases we reach the conclusion that the asymptotic behavior of functionals of the posterior hazard rate coincides exactly with the behavior of functionals of the prior hazard. To see why this happens, let us focus on the behavior of the trend of the posterior CRM. It turns out that $\mathbb E[\tilde H_{n,*}(T)]\sim\psi_1(T)+\psi_2(T;\mathbf Y)$, where $\psi_1(T)\sim\mathbb E[\tilde H(T)]$ and $\psi_2(T;\mathbf Y)$ explicitly depends on the data $\mathbf Y$, is different from 0 for every $T>0$ and $\psi_2(T;\mathbf Y)=o(\psi_1(T))$. Moreover, once the rate of divergence from the trend $C_0(n,k,T)$ is computed, one finds that $C_0(n,k,T)=C_0(0,k,T)$ and $C_0(0,k,T)\,\psi_2(T;\mathbf Y)\to0$ as $T\to\infty$. To fix ideas, consider a DL mixture hazard with generalized gamma CRM given one observation $Y_1$: since the divergence rate $C_0(n,k,T)^{-1}$ is equal to $T^{3/2}$, the influence of the data vanishes at a rate $T^{-1/2}$. Similar phenomena occur when studying the asymptotic behavior of the part of the posterior corresponding to the fixed points of discontinuity. This basically explains why the forthcoming CLTs do not depend on the data. Such an outcome is quite surprising, at least to us. Note, indeed, that the Poisson intensity of the posterior CRM (13) depends explicitly on the data $\mathbf Y$, which implies that the posterior hazard, and a fortiori the posterior cumulative hazard, depend on the data for any $T$. Also, the fact that the variance of the asymptotic Gaussian distribution is not influenced by the data is somewhat counterintuitive: since the contribution of the CRM vanishes in the limit, one would expect the variance to become smaller and smaller as more data come in. Since this does not happen, our findings provide some evidence that the choice of the CRM really matters whatever the size of the dataset. Hence, one should carefully select the kernel and CRM so as to incorporate prior knowledge appropriately into the model; the neat CLTs presented here provide a guideline in this respect by highlighting trend, oscillation around the trend and asymptotic variance.

*4.3.1. Asymptotics for kernels with finite support.* We start by considering kernels with finite support, namely, the DL, OU and rectangular ones, with generalized gamma CRM, and take $\lambda$ to be the Lebesgue measure on $\mathbb R_+$. This ensures that (H1)–(H3) are satisfied. For a generalized gamma CRM one has, for any $c>\sigma$,

$$\int_0^\infty s^c\rho(ds)=(1-\sigma)_{c-1}\,\gamma^{-c+\sigma}:=K_\rho^{(c)},$$

where $(a)_n:=\Gamma(a+n)/\Gamma(a)$ denotes the Pochhammer symbol. Since in the posterior the CRM becomes nonhomogeneous with updated intensity (13), the verification of the conditions of Theorems 8–10 can become cumbersome. However, for any $A\in\mathcal B(\mathbb R_+^2)$, one has

$$\underline\nu(A)\le\nu_{n,*}(A)\le\nu(A),\tag{31}$$

where $\underline\nu(dv,dx):=\exp\{-n\,k_{Y_{(n)}}^{(0)}(v,x)\}\,\nu(dv,dx)$ and $Y_{(n)}$ stands for the largest lifetime. Having a lower and an upper bound for the Poisson intensity $\nu_{n,*}$ then allows us to use, conditionally on $X,\mathbf Y$, a comparison result analogous to Theorem 4 of [27] in order to check the conditions of the posterior CLTs.
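The sandwich (31) can be checked pointwise on the densities of the three measures with respect to $\nu$. The sketch below is illustrative only: it assumes that, for uncensored data, the update (13) multiplies $\nu(dv,dx)$ by $\exp\{-v\sum_{i=1}^n\int_0^{Y_i}k(t,x)\,dt\}$; it uses the OU kernel $k(t,x)=\sqrt{2\kappa}\,e^{-\kappa(t-x)}\mathbb I_{(0\le x\le t)}$, whose integral reproduces the expression of $k_T^{(0)}$ used for the OU kernel in this section; and the lifetimes, $\kappa$ and function names are hypothetical.

```python
import math

def ou_cum_kernel(t, x, kappa):
    """int_0^t k(s, x) ds for the OU kernel k(s, x) = sqrt(2 kappa) e^{-kappa (s - x)} 1{0 <= x <= s}."""
    if x < 0.0 or x > t:
        return 0.0
    return math.sqrt(2.0 / kappa) * (1.0 - math.exp(-kappa * (t - x)))

def posterior_factor(v, x, lifetimes, kappa):
    """Assumed density of nu_{n,*} w.r.t. nu: exp{-v * sum_i int_0^{Y_i} k(t, x) dt}."""
    return math.exp(-v * sum(ou_cum_kernel(y, x, kappa) for y in lifetimes))

def lower_factor(v, x, lifetimes, kappa):
    """Density w.r.t. nu of the lower bound in (31): exp{-n k^{(0)}_{Y_(n)}(v, x)}."""
    n, y_max = len(lifetimes), max(lifetimes)
    return math.exp(-n * v * ou_cum_kernel(y_max, x, kappa))

lifetimes = [0.7, 1.9, 3.2]   # hypothetical lifetimes, Y_(n) = 3.2
kappa = 1.5
for v in (0.1, 1.0, 5.0):
    for x in (0.0, 1.0, 2.5, 4.0):
        lo = lower_factor(v, x, lifetimes, kappa)
        post = posterior_factor(v, x, lifetimes, kappa)
        print(f"v={v:3.1f} x={x:3.1f}  lower={lo:.4f} <= posterior={post:.4f} <= 1")
```

For $x$ beyond the largest lifetime both factors equal 1, so posterior and prior intensities coincide there; this is consistent with the dominating parts of the norms coming from $\mathbb R_+\times[Y_{(n)},\infty)$.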

Let us first consider linear functionals for the OU kernel. Note that $k_T^{(0)}(v,x)=v\sqrt{2/\kappa}\,\big(1-e^{-\kappa(T-x)}\big)\mathbb I_{(0\le x\le T)}$ and that $k_T^{(0)}\in L^3(\nu)$, so that condition (i) of Theorem 8 is a direct consequence of (31). Next, one can check that

$$\|k_T^{(0)}\|^2_{L^2(\underline\nu)}\sim\|k_T^{(0)}\|^2_{L^2(\nu)}\sim2\kappa^{-1}K_\rho^{(2)}\,T.$$

In fact, the dominating term in the norm with respect to $\underline\nu$ is the integral over $\mathbb R_+\times[Y_{(n)},\infty)$, which is in turn equal to the dominating term of $\|k_T^{(0)}\|^2_{L^2(\nu)}$. Moreover, $T^{-3/2}\|k_T^{(0)}\|^3_{L^3(\nu_{n,*})}\to0$, and we have that (27) and (28) are satisfied with $C_0(n,k,T)=C_0(0,k,T)=1/\sqrt T$. Since $C_0(n,k,T)\to0$ as $T\to\infty$, while $\int_0^Tk(t,x)\,dt\le\sqrt{2/\kappa}$ for every $x$, (29) holds with $m(n,n^*,k)=0$, not depending on $X,\mathbf Y$. Finally, since $\|k_T^{(0)}\|_{L^1(\underline\nu)}\sim\|k_T^{(0)}\|_{L^1(\nu)}\sim K_\rho^{(1)}\sqrt{2/\kappa}\,T$, we have $\mathbb E[\tilde H_{n,*}(T)]\sim\mathbb E[\tilde H(T)]$.
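The constants $K_\rho^{(c)}$ and the prior norms entering these asymptotics can be cross-checked numerically. This is a minimal sketch, assuming the generalized gamma Lévy density $\rho(ds)=s^{-1-\sigma}e^{-\gamma s}\,ds/\Gamma(1-\sigma)$, the normalization under which $\int_0^\infty s^c\rho(ds)=(1-\sigma)_{c-1}\gamma^{-c+\sigma}$. The quadrature (trapezoid rule after $s=e^x$) and the helper names are illustrative choices; the norms are the exact prior ($n=0$) values for the OU kernel with Lebesgue $\lambda$.

```python
import math

def pochhammer(a, n):
    """(a)_n = Gamma(a + n) / Gamma(a); n need not be an integer."""
    return math.gamma(a + n) / math.gamma(a)

def k_rho_closed(c, sigma, gamma):
    """K_rho^(c) = (1 - sigma)_{c-1} * gamma^(sigma - c)."""
    return pochhammer(1.0 - sigma, c - 1.0) * gamma ** (sigma - c)

def k_rho_numeric(c, sigma, gamma, h=0.02):
    """int_0^inf s^c rho(ds) by the trapezoid rule after s = e^x
    (the integrand e^{(c - sigma) x - gamma e^x} is smooth and decays fast)."""
    a = c - sigma                          # requires c > sigma
    lo = -60.0 / a                         # left tail ~ e^(a x) < e^-60
    hi = math.log(60.0 / gamma) + 2.0      # right tail ~ exp(-gamma e^x), negligible
    n = int((hi - lo) / h)
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(a * x - gamma * math.exp(x))
    return total * h / math.gamma(1.0 - sigma)

def ou_prior_norms(T, kappa, sigma, gamma):
    """Exact ||k_T^(0)||_{L^1(nu)} and ||k_T^(0)||^2_{L^2(nu)} for the OU kernel."""
    k1, k2 = k_rho_closed(1, sigma, gamma), k_rho_closed(2, sigma, gamma)
    e1, e2 = 1.0 - math.exp(-kappa * T), 1.0 - math.exp(-2.0 * kappa * T)
    l1 = k1 * math.sqrt(2.0 / kappa) * (T - e1 / kappa)
    l2_sq = k2 * (2.0 / kappa) * (T - 2.0 * e1 / kappa + e2 / (2.0 * kappa))
    return l1, l2_sq

kappa, sigma, gamma = 1.3, 0.4, 2.0
for T in (10.0, 1e3, 1e5):
    l1, l2_sq = ou_prior_norms(T, kappa, sigma, gamma)
    # both ratios approach 1, matching the asymptotics K^(1) sqrt(2/kappa) T and 2 K^(2) T / kappa
    print(T, l1 / (k_rho_closed(1, sigma, gamma) * math.sqrt(2 / kappa) * T),
          l2_sq / (2 * k_rho_closed(2, sigma, gamma) * T / kappa))
```

The relative corrections are of order $1/(\kappa T)$, which is why the leading terms alone determine $C_0$ and $\sigma_0^2$.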

Hence, from Theorem 8 combined with the fact that the limiting mean does not depend on $\mathbf Y$, it follows that

$$\mathbb E\Big[\exp\Big\{i\lambda\,\frac{\tilde H(T)-\sqrt{2/\kappa}\,\gamma^{-1+\sigma}\,T}{T^{1/2}}\Big\}\,\Big|\,\mathbf Y\Big]\xrightarrow[T\to+\infty]{}\exp\Big\{-\frac{\lambda^2}{2}\sigma_0^2(0,k)\Big\},\tag{32}$$

where $\sigma_0^2(0,k)=2\kappa^{-1}(1-\sigma)\gamma^{-2+\sigma}$. Therefore, the posterior cumulative hazard has the same asymptotic behavior as the prior cumulative hazard. As mentioned before, this is quite surprising, also in light of the consistency result.
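As an informal sanity check of the prior counterpart of (32) (the case $n=0$), one can simulate $\tilde H(T)$ in the special case $\sigma=0$, where the generalized gamma CRM reduces to a gamma CRM with $\rho(ds)=s^{-1}e^{-\gamma s}\,ds$ and the masses of disjoint intervals of length $\Delta$ are independent Gamma variables with shape $\Delta$ and rate $\gamma$. The grid discretization (kernel evaluated at cell midpoints), the parameter values and the seed are assumptions made for illustration; the sketch does not simulate the posterior (13).

```python
import math
import random

def simulate_H(T, kappa, gamma, dx, rng):
    """One draw of H(T) = int_0^T sqrt(2/kappa)(1 - e^{-kappa (T - x)}) mu(dx) for a
    gamma CRM: mu(cell of length dx) ~ Gamma(shape=dx, scale=1/gamma), independently."""
    c = math.sqrt(2.0 / kappa)
    total = 0.0
    for j in range(int(round(T / dx))):
        x_mid = (j + 0.5) * dx                      # kernel evaluated at the cell midpoint
        total += c * (1.0 - math.exp(-kappa * (T - x_mid))) * rng.gammavariate(dx, 1.0 / gamma)
    return total

def standardized_draws(T=80.0, kappa=1.0, gamma=2.0, dx=0.2, n_paths=600, seed=1):
    """(H(T) - trend) / sqrt(sigma_0^2 T) with the sigma = 0 constants of (32)."""
    rng = random.Random(seed)
    trend = math.sqrt(2.0 / kappa) * T / gamma      # sqrt(2/kappa) gamma^{-1+sigma} T at sigma = 0
    sd = math.sqrt(2.0 * T / kappa) / gamma         # sigma_0^2 = 2 kappa^{-1} gamma^{-2} at sigma = 0
    return [(simulate_H(T, kappa, gamma, dx, rng) - trend) / sd for _ in range(n_paths)]

z = standardized_draws()
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(f"sample mean {mean:.3f}, sample variance {var:.3f} (Gaussian limit: 0 and 1)")
```

At finite $T$ the mean carries an offset of order $1/(\kappa\sqrt T)$, coming from $\int_0^Te^{-\kappa(T-x)}dx\approx1/\kappa$; it vanishes as $T$ grows.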

Let us now consider the path-second moment. We obtain

$$k_T^{(1)}(v,x;u,y)=\frac{uv}{T}\,e^{\kappa(x+y)}\big(e^{-2\kappa(x\vee y)}-e^{-2\kappa T}\big)\mathbb I_{(0\le x,y\le T)},$$

and, as for condition 1 in Theorem 9, one finds that $\|k_T^{(1)}\|^2_{L^2(\underline\nu^2)}\sim\|k_T^{(1)}\|^2_{L^2(\nu^2)}\sim2\kappa^{-1}(K_\rho^{(2)})^2\,T^{-1}$ and, hence, $C_1(n,k,T)=\sqrt T$ and $\sigma_1^2(n,k)=2\kappa^{-1}(K_\rho^{(2)})^2$, which coincide with the case $n=0$. The idea here is the same as before, namely that the dominating term of the norm with respect to $\underline\nu^2$ is the integral over $(\mathbb R_+\times[Y_{(n)},\infty))^2$, which is equal to the dominating term of $\|k_T^{(1)}\|^2_{L^2(\nu^2)}$. Then, conditions 2, 3 and 4 are verified since they are verified for $n=0$. In particular, note that $k_T^{(1)}\star^1_{i,n}k_T^{(1)}\le k_T^{(1)}\star^1_{i,0}k_T^{(1)}$ for $i=1,2$. As for condition 5, one first checks that $k_{T,n,*}^{(4)}=O(T^{-1})$; then some tedious algebra allows one to verify that it is satisfied with

$$\sigma_4^2(n,n^*,k)=K_\rho^{(4)}+\frac8\kappa K_\rho^{(3)}K_\rho^{(1)}+\frac{16}{\kappa^2}K_\rho^{(2)}\big(K_\rho^{(1)}\big)^2.$$

This is, indeed, a delicate point, since both $k_{T,n}^{(3)}$ and the norm with respect to the updated Poisson intensity $\nu_{n,*}$ depend on the posterior. Once this is done, it is not difficult to check that condition 6 is satisfied. Moreover, the quantity $v(n,n^*,k)$ in condition 7 can be shown to be 0, whereas $A_T^{n,*}=O(T^{-1})$ in (30). Finally, one can check that

$$\frac1T\int_0^T\mathbb E[\tilde h_{n,*}(t)^2]\,dt\sim\frac1T\int_0^T\mathbb E[\tilde h_{0,*}(t)^2]\,dt\sim K_\rho^{(2)}+\frac2\kappa\big(K_\rho^{(1)}\big)^2,$$

so that, from Theorem 9, we deduce the following CLT for the path-second moment:

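$$\mathbb E\Big[\exp\Big\{i\lambda\sqrt T\Big(\frac1T\int_0^T\tilde h(t)^2\,dt-\gamma^{-2+\sigma}\Big(1-\sigma+\frac{2\gamma^\sigma}{\kappa}\Big)\Big)\Big\}\,\Big|\,\mathbf Y\Big]\xrightarrow[T\to+\infty]{}\exp\Big\{-\frac{\lambda^2}{2}\big(\sigma_1^2(n,k)+\sigma_4^2(n,n^*,k)\big)\Big\},\tag{33}$$

where

$$\sigma_1^2(n,k)+\sigma_4^2(n,n^*,k)=\frac{(1-\sigma)\big(16\kappa^{-1}\gamma^{2\sigma}+2(9-5\sigma)\gamma^\sigma+\kappa(2-\sigma)^2\big)}{\kappa\,\gamma^{4-\sigma}}.$$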

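As far as the path-variance is concerned, one verifies easily that the conditions of Theorem 10 are satisfied, with $\delta(n,k)=2^{3/2}\kappa^{-1/2}K_\rho^{(1)}$, so that Theorem 10 leads to

$$\mathbb E\Big[\exp\Big\{i\lambda\sqrt T\Big(\frac1T\int_0^T\Big(\tilde h(t)-\frac{\tilde H(T)}{T}\Big)^2dt-\frac{1-\sigma}{\gamma^{2-\sigma}}\Big)\Big\}\,\Big|\,\mathbf Y\Big]\xrightarrow[T\to+\infty]{}\exp\Big\{-\frac{\lambda^2}{2}\big(\sigma_1^2(n,k)+\sigma_5^2(n,n^*,k)\big)\Big\},\tag{34}$$

where

$$\sigma_1^2(n,k)+\sigma_5^2(n,n^*,k)=\frac{(1-\sigma)\big(2(1-\sigma)\gamma^\sigma+\kappa(2-\sigma)^2\big)}{\kappa\,\gamma^{4-\sigma}}.$$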
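The value of $\delta(n,k)$ and the centering constants in (33) and (34) can be traced back to $K_\rho^{(1)}$ and $K_\rho^{(2)}$. This minimal sketch assumes the generalized gamma normalization under which $K_\rho^{(c)}=(1-\sigma)_{c-1}\gamma^{-c+\sigma}$ and uses the exact prior mean $\mathbb E[\tilde H(T)]=\sqrt{2/\kappa}\,K_\rho^{(1)}\big(T-(1-e^{-\kappa T})/\kappa\big)$ of the OU kernel with Lebesgue $\lambda$. It checks that $2C_1\mathbb E[\tilde H(T)]/(T^2C_0)$ with $C_1=\sqrt T$, $C_0=1/\sqrt T$ approaches $2^{3/2}\kappa^{-1/2}K_\rho^{(1)}$, and that removing the squared trend slope from the centering of (33) leaves the centering of (34). Parameter values are arbitrary.

```python
import math

def k_rho(c, sigma, gamma):
    """K_rho^(c) = Gamma(c - sigma) / Gamma(1 - sigma) * gamma^(sigma - c)."""
    return math.gamma(c - sigma) / math.gamma(1.0 - sigma) * gamma ** (sigma - c)

def ou_prior_mean_H(T, kappa, sigma, gamma):
    """E[H(T)] for the OU kernel with Lebesgue lambda (prior, n = 0)."""
    return math.sqrt(2.0 / kappa) * k_rho(1, sigma, gamma) * (T - (1.0 - math.exp(-kappa * T)) / kappa)

def delta_T(T, kappa, sigma, gamma):
    """2 C_1 E[H(T)] / (T^2 C_0) with C_1 = sqrt(T) and C_0 = 1 / sqrt(T)."""
    return 2.0 * math.sqrt(T) * ou_prior_mean_H(T, kappa, sigma, gamma) / (T ** 2 / math.sqrt(T))

def centering_33(kappa, sigma, gamma):
    """K^(2) + (2 / kappa) (K^(1))^2, the limit of (1/T) int_0^T E[h(t)^2] dt."""
    return k_rho(2, sigma, gamma) + 2.0 / kappa * k_rho(1, sigma, gamma) ** 2

kappa, sigma, gamma = 0.8, 0.35, 1.7
delta_limit = 2 ** 1.5 / math.sqrt(kappa) * k_rho(1, sigma, gamma)
print(delta_T(1e6, kappa, sigma, gamma), delta_limit)
# closed form of the centering in (33): gamma^{-2+sigma} (1 - sigma + 2 gamma^sigma / kappa)
print(centering_33(kappa, sigma, gamma),
      gamma ** (sigma - 2) * (1 - sigma + 2 * gamma ** sigma / kappa))
# centering of (34): subtract the limit of (E[H(T)] / T)^2
slope = math.sqrt(2.0 / kappa) * k_rho(1, sigma, gamma)
print(centering_33(kappa, sigma, gamma) - slope ** 2, (1 - sigma) / gamma ** (2 - sigma))
```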

For the other two kernels, namely rectangular and DL, one can proceed along the same lines of reasoning and, again, the asymptotic posterior behavior coincides with that of the prior. In particular, for linear and quadratic functionals of hazard rates based on the rectangular kernel, the CLTs (32), (33) and (34) hold with the same rate functions and appropriately modified constants and variances (for the exact values see [27], since they coincide with the a priori ones). As for the DL kernel, the CLT for the posterior cumulative hazard is of the form

$$\frac{1}{T^{3/2}}\Big[\tilde H(T)-\frac{T^2}{2\gamma^{1-\sigma}}\Big]\,\Big|\,\mathbf Y\ \xrightarrow{\ \text{law}\ }\ X\sim\mathrm N\Big(0,\frac{1-\sigma}{3\gamma^{2-\sigma}}\Big).$$

With reference to quadratic functionals, in this case some of the conditions of Theorems 9 and 10 are violated already in the prior (see [27] for details).

*4.3.2. Asymptotics for exponential kernel.* Here we consider random hazards based on the exponential kernel. Indeed, it is crucial to consider also a kernel with full support, since one may think that the lack of dependence of posterior functionals on the data may be due to the boundedness of the support of the kernels dealt with in Section 4.3.1. However, it turns out that, again, the posterior CLTs coincide with the corresponding prior CLTs.

In particular, set, within (7), $\lambda(dx)=x^{-1/2}e^{-1/x}(2\sqrt\pi)^{-1}\,dx$: this implies that $\tilde h(0)<\infty$ a.s., (8) is in order and (H1)–(H3) are satisfied. This model is of interest also beyond the scope of the present asymptotic analysis; in fact, it leads to a prior mean $\mathbb E[\tilde h(t)]=\tfrac12K_\rho^{(1)}(t+1)^{-1/2}$ and, thus, we have a nonparametric prior centered on a quasi-Weibull hazard, which is a desirable feature in survival analysis.
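The prior mean of this model can be checked numerically. Since $k_T^{(0)}(v,x)=v(1-e^{-T/x})$ corresponds to the exponential kernel $k(t,x)=x^{-1}e^{-t/x}$, for a homogeneous CRM one has $\mathbb E[\tilde h(t)]=K_\rho^{(1)}\int_0^\infty x^{-1}e^{-t/x}\lambda(dx)$. The sketch below (taking $K_\rho^{(1)}=1$; the substitution $x=e^z$ and the trapezoid rule are illustrative choices) compares this integral with $\tfrac12(t+1)^{-1/2}$, and also recovers $\|k_T^{(0)}\|_{L^1(\nu)}=K_\rho^{(1)}(\sqrt{1+T}-1)$, whose derivative in $T$ is the same mean.

```python
import math

def lam_density(x):
    """Density of lambda(dx) = x^{-1/2} e^{-1/x} / (2 sqrt(pi)) on (0, inf)."""
    return x ** -0.5 * math.exp(-1.0 / x) / (2.0 * math.sqrt(math.pi))

def integrate_positive(f, lo=-8.0, hi=80.0, h=0.02):
    """Trapezoid rule for int_0^inf f(x) dx after x = e^z (dx = e^z dz)."""
    n = int((hi - lo) / h)
    total = 0.0
    for i in range(n + 1):
        x = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x) * x
    return total * h

def prior_mean_hazard(t):
    """E[h(t)] / K_rho^(1) = int x^{-1} e^{-t/x} lambda(dx)."""
    return integrate_positive(lambda x: math.exp(-t / x) / x * lam_density(x))

def l1_norm_k0(T):
    """||k_T^(0)||_{L^1(nu)} / K_rho^(1) = int (1 - e^{-T/x}) lambda(dx)."""
    return integrate_positive(lambda x: -math.expm1(-T / x) * lam_density(x))

for t in (0.0, 1.0, 10.0):
    print(t, prior_mean_hazard(t), 0.5 / math.sqrt(t + 1.0))
print(l1_norm_k0(3.0), math.sqrt(1.0 + 3.0) - 1.0)
```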

We start by investigating the linear functional of $\tilde h$: here we also provide details for the derivation of the prior CLTs, since this model has not been considered in [27]. In this case, we have that $k_T^{(0)}(v,x)=v(1-e^{-T/x})$ and $k_T^{(0)}\in L^3(\nu)$ for all $T>0$, and the same holds for the posterior. We also have that $\|k_T^{(0)}\|_{L^1(\nu)}=K_\rho^{(1)}(\sqrt{1+T}-1)$, so that, as $T\to\infty$, $\mathbb E[\tilde H(T)]\sim K_\rho^{(1)}\sqrt T$. Moreover, by a change of variable and dominated convergence,

$$\frac{\|k_T^{(0)}\|_{L^1(\underline\nu)}}{\sqrt T}=\frac1{\sqrt T}\int_{\mathbb R_+}\frac{1-e^{-T/x}}{[\gamma+n(1-e^{-Y_{(n)}/x})]^{1-\sigma}}\,\frac{e^{-1/x}x^{-1/2}}{2\sqrt\pi}\,dx=\frac1{2\sqrt\pi}\int_{\mathbb R_+}\frac{(1-e^{-y})\,e^{-y/T}\,y^{-3/2}}{[\gamma+n(1-e^{-yY_{(n)}/T})]^{1-\sigma}}\,dy\xrightarrow[T\to+\infty]{}\frac1{2\sqrt\pi}\int_{\mathbb R_+}\frac{1-e^{-y}}{y^{3/2}}\,\frac{1}{\gamma^{1-\sigma}}\,dy=K_\rho^{(1)}.$$

Therefore, $\mathbb E[\tilde H_{n,*}(T)]\sim\mathbb E[\tilde H(T)]$. Similar arguments show that

$$\|k_T^{(0)}\|^2_{L^2(\underline\nu)}\sim\|k_T^{(0)}\|^2_{L^2(\nu)}\sim(2-\sqrt2)\,K_\rho^{(2)}\sqrt T$$

and, hence, we have $C_0(n,k,T)=C_0(0,k,T)=T^{-1/4}$ and $\sigma_0^2(n,k)=\sigma_0^2(0,k)=(2-\sqrt2)K_\rho^{(2)}$. Moreover, $\|k_T^{(0)}\|^3_{L^3(\nu)}=O(\sqrt T)$ is sufficient for concluding that (28) holds both for the prior and the posterior. Finally, as $T\to\infty$, $\sum_{i=1}^kk_T^{(0)}(J_i,X_i^*)=O(1)$ with $\mathbb P\{\cdot\mid X,\mathbf Y\}$-probability 1; thus, also in this case, (29) holds with $m(n,n^*,k)=0$. We can then deduce from Theorem 8 that

$$\mathbb E\Big[\exp\Big\{i\lambda\,\frac{\tilde H(T)-\gamma^{-(1-\sigma)}T^{1/2}}{T^{1/4}}\Big\}\,\Big|\,\mathbf Y\Big]\xrightarrow[T\to+\infty]{}\exp\Big\{-\frac{\lambda^2}{2}\sigma_0^2(0,k)\Big\}$$

for any sample size $n\ge0$ and with $\sigma_0^2(0,k)=(2-\sqrt2)(1-\sigma)\gamma^{-2+\sigma}$. Hence, we have shown that the exponential kernel hazard exhibits a trend of order $T^{1/2}$ and oscillations of order $T^{1/4}$, and verifies exactly the same CLT for both prior and posterior cumulative hazard, thus confirming that the asymptotics is not influenced by the data.
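Two limits used in this argument can be illustrated numerically: $\|k_T^{(0)}\|^2_{L^2(\nu)}/(K_\rho^{(2)}\sqrt T)\to2-\sqrt2$, and the normalized $L^1$ norm in which the prior intensity is damped by the data through the factor $[\gamma+n(1-e^{-Y_{(n)}/x})]^{\sigma-1}$ tends to $K_\rho^{(1)}=\gamma^{\sigma-1}$. This is a sketch under stated assumptions: the damping factor is the one appearing in the display above, the sample size $n$, the largest lifetime and the parameters are hypothetical, and the quadrature (trapezoid rule after $x=e^z$) is an illustrative choice.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def integrate_positive(f, lo=-8.0, hi=110.0, h=0.02):
    """Trapezoid rule for int_0^inf f(x) dx after x = e^z (dx = e^z dz)."""
    n = int((hi - lo) / h)
    total = 0.0
    for i in range(n + 1):
        x = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x) * x
    return total * h

def lam_density(x):
    """Density of lambda(dx) = x^{-1/2} e^{-1/x} / (2 sqrt(pi))."""
    return x ** -0.5 * math.exp(-1.0 / x) / (2.0 * SQRT_PI)

def l2_ratio(T):
    """||k_T^(0)||^2_{L^2(nu)} / (K_rho^(2) sqrt(T)); the limit is 2 - sqrt(2)."""
    return integrate_positive(lambda x: math.expm1(-T / x) ** 2 * lam_density(x)) / math.sqrt(T)

def damped_l1_ratio(T, n, y_max, gamma, sigma):
    """Data-damped L^1 norm of k_T^(0) divided by sqrt(T); the limit is gamma^(sigma - 1)."""
    def f(x):
        bracket = gamma + n * (-math.expm1(-y_max / x))
        return -math.expm1(-T / x) * bracket ** (sigma - 1.0) * lam_density(x)
    return integrate_positive(f) / math.sqrt(T)

for T in (1e2, 1e4, 1e8):
    print(T, l2_ratio(T), damped_l1_ratio(T, n=5, y_max=3.0, gamma=1.0, sigma=0.5))
print("limits:", 2 - math.sqrt(2), 1.0)
```

The damped ratio increases towards its limit as $T$ grows, which is the numerical counterpart of the vanishing influence of the data.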

Our results for quadratic functionals do not apply to the exponential kernel. To see this, note that

$$k_T^{(1)}(v,x;u,y)=\frac{uv}{T}\,\frac{1}{x+y}\Big(1-\exp\Big\{-\frac{x+y}{xy}\,T\Big\}\Big)$$

and, by calculating the norm with respect to $\nu^2$, we get

$$\|k_T^{(1)}\|^2_{L^2(\nu^2)}=\frac{(K_\rho^{(2)})^2}{16(2T^2+3T+1)},$$

which implies $C_1(0,k,T)=T$. However, $\|k_T^{(1)}\|^4_{L^4(\nu^2)}\sim d\,T^{-4}$, $d$ being a positive constant, so that condition 2 in Theorem 9 does not hold.

in the parametric setup. The case $\alpha>1$ is covered by both Theorem 4 (DL kernel) and Theorem 6 (OU kernel). When $\alpha<1$, $h_0$ is a completely monotone function and it would naturally fall within the scope of Theorem 7; however, in such a case $h_0(0)$ is not finite and, hence, the required conditions are not met. Nonetheless, $h_0$ can be approximated to any order of accuracy by $h_\varepsilon(t)=(\alpha/\lambda)((t+\varepsilon)/\lambda)^{\alpha-1}$, for some small enough $\varepsilon>0$, when accuracy is measured in terms of survival functions. In fact, it is easy to see that, for $S_0(t)$ and $S_\varepsilon(t)$, the survival functions corresponding to $h_0$ and $h_\varepsilon$, respectively, $\sup_t|S_0(t)-S_\varepsilon(t)|$ goes to zero as $\varepsilon$ approaches zero. Finally, note that Theorem 7 applies to $h_\varepsilon$ for any $\varepsilon>0$. Further work is needed in order to extend the consistency result to completely monotone hazards which explode at zero; for such cases, condition (17) is probably too strong.
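The approximation claim admits a simple quantitative form. Since $u\mapsto u^\alpha$ is subadditive for $\alpha<1$, the cumulative hazards satisfy $0\le H_0(t)-H_\varepsilon(t)=\lambda^{-\alpha}[t^\alpha+\varepsilon^\alpha-(t+\varepsilon)^\alpha]\le(\varepsilon/\lambda)^\alpha$, and $|e^{-a}-e^{-b}|\le|a-b|$ for $a,b\ge0$ then gives $\sup_t|S_0(t)-S_\varepsilon(t)|\le(\varepsilon/\lambda)^\alpha$. The sketch below, with illustrative values of $\alpha$ and $\lambda$, evaluates both survival functions on a grid and compares the largest gap with this bound.

```python
import math

def survival_weibull(t, alpha, lam):
    """S_0(t) = exp{-(t / lam)^alpha}, the survival function of h_0."""
    return math.exp(-((t / lam) ** alpha))

def survival_shifted(t, alpha, lam, eps):
    """Survival function of h_eps(t) = (alpha / lam) ((t + eps) / lam)^(alpha - 1)."""
    return math.exp(-(((t + eps) / lam) ** alpha - (eps / lam) ** alpha))

def sup_gap(alpha, lam, eps, n=20000):
    """Largest |S_0(t) - S_eps(t)| over a log-spaced grid of t in [1e-12, 1e4]."""
    best = 0.0
    for i in range(n + 1):
        t = 10.0 ** (-12.0 + 16.0 * i / n)
        best = max(best, abs(survival_weibull(t, alpha, lam)
                             - survival_shifted(t, alpha, lam, eps)))
    return best

alpha, lam = 0.5, 2.0
for eps in (1e-1, 1e-2, 1e-4):
    print(eps, sup_gap(alpha, lam, eps), (eps / lam) ** alpha)   # gap never exceeds the bound
```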

Future work will also focus on achieving consistency with respect to stronger topologies; there are two possible routes in this direction. The first one is to investigate under which additional conditions on the CRM $\tilde\mu$ and restrictions on the form of the true hazard rate $h_0$ we get $L^1$-consistency at the density level, that is, (10) with $A_\varepsilon$ being an $L^1$ neighborhood of $f_0$. To this end, one has to consider the metric entropy of the subset of $\mathbb F$ corresponding to the qualitative condition given on $h_0$. Moreover, one has to investigate in detail the support of the prior on $\mathbb F$ via the mapping $\tilde h\mapsto\tilde f=\tilde h\exp(-\int_0^t\tilde h)$. This appears to be a rather difficult problem because $\tilde h$ appears twice, and existing results on random mixing densities are not easily extensible. The second strategy consists of investigating consistency directly at the hazard level. Indeed, weak consistency at the density level implies pointwise consistency of the cumulative hazard:

$$\Pi_n\Big\{h:\Big|\int_0^Th(t)\,dt-\int_0^Th_0(t)\,dt\Big|\le\varepsilon\Big\}\to1\quad\text{a.s.-}P_0^\infty$$

for any $\varepsilon,T>0$. Among stronger topologies, a promising one seems to be that induced by $\int_0^\infty|h(t)-h_0(t)|S_0(t)\,dt$, where $S_0(t)=\exp\{-\int_0^th_0(s)\,ds\}$.

With reference to the study of the asymptotic behavior of functionals of the random hazard, a further interesting development consists in studying the joint limit as both the number of observations and time diverge. To achieve such a result, one probably needs to find a right balance in the simultaneous divergence of the sample size and time, which lets the influence of the data emerge.

$\mathbb E[\tilde N(dv,dx)]=\nu(dv,dx)$ and, for any $A\in\mathcal B(\mathbb R_+)\otimes\mathcal X$ such that $\nu(A)<\infty$, $\tilde N(A)$ is a Poisson random variable of parameter $\nu(A)$. Given any finite collection of pairwise disjoint sets $A_1,\dots,A_k$ in $\mathcal B(\mathbb R_+)\otimes\mathcal X$, the random variables $\tilde N(A_1),\dots,\tilde N(A_k)$ are mutually independent. Moreover, the intensity measure $\nu$ must satisfy $\int_{\mathbb R_+}(v\wedge1)\,\nu(dv,\mathbb X)<\infty$, where $a\wedge b=\min\{a,b\}$.

Let now $(\mathbb M,\mathcal B(\mathbb M))$ be the space of boundedly finite measures on $(\mathbb X,\mathcal X)$, where $\mu$ is said to be boundedly finite if $\mu(A)<+\infty$ for every bounded measurable set $A$. We suppose that $\mathbb M$ is equipped with the topology of vague convergence and that $\mathcal B(\mathbb M)$ is the corresponding Borel $\sigma$-field. Let $\tilde\mu$ be a random element, defined on $(\Omega,\mathcal F,\mathbb P)$ and with values in $(\mathbb M,\mathcal B(\mathbb M))$, and suppose that $\tilde\mu$ can be represented as a linear functional of the Poisson random measure $\tilde N$ (with intensity $\nu$) as $\tilde\mu(B)=\int_{\mathbb R_+\times B}s\,\tilde N(ds,dx)$ for any $B\in\mathcal X$. From the properties of $\tilde N$ it easily follows that $\tilde\mu$ is a CRM on $\mathbb X$ [21], that is: (i) $\tilde\mu(\emptyset)=0$ a.s.-$\mathbb P$; (ii) for any collection of disjoint sets in $\mathcal X$, $B_1,B_2,\dots$, the random variables $\tilde\mu(B_1),\tilde\mu(B_2),\dots$ are mutually independent and $\tilde\mu(\bigcup_{j\ge1}B_j)=\sum_{j\ge1}\tilde\mu(B_j)$ holds true a.s.-$\mathbb P$.

Now let $\mathcal G_\nu$ be the space of functions $g:\mathbb X\to\mathbb R_+$ such that $\int_{\mathbb R_+\times\mathbb X}[1-e^{-sg(x)}]\,\nu(ds,dx)<\infty$. Then, the law of $\tilde\mu$ is uniquely characterized by its Laplace functional which, for any $g$ in $\mathcal G_\nu$, is given by

$$\mathbb E\Big[e^{-\int_{\mathbb X}g(x)\,\tilde\mu(dx)}\Big]=\exp\Big\{-\int_{\mathbb R_+\times\mathbb X}\big(1-e^{-sg(x)}\big)\,\nu(ds,dx)\Big\}.\tag{35}$$

From (35) it is apparent that the law of the CRM $\tilde\mu$ is completely determined by the corresponding intensity measure $\nu$. Letting $\lambda$ be a $\sigma$-finite measure on $\mathbb X$, we can always write the Poisson intensity $\nu$ as (6), where $\rho:\mathcal B(\mathbb R_+)\times\mathbb X\to\mathbb R_+$ is a kernel [i.e., $x\mapsto\rho(C|x)$ is $\mathcal X$-measurable for any $C\in\mathcal B(\mathbb R_+)$ and $\rho(\cdot|x)$ is a $\sigma$-finite measure on $\mathcal B(\mathbb R_+)$ for any $x$ in $\mathbb X$]. Note that the kernel $\rho(dv|x)$ is uniquely determined outside some set of $\lambda$-measure 0, and that such a disintegration is guaranteed by Theorem 15.3.3 in [17]. Finally, recall (see, e.g., Proposition 1 in [30]) that a linear functional of a CRM, $\int_{\mathbb X}f(x)\,\tilde\mu(dx)$, is a.s. finite if and only if

$$\int_{\mathbb R_+\times\mathbb X}\big(1-e^{-u|f(x)|v}\big)\,\rho(dv|x)\,\lambda(dx)<+\infty\qquad\forall u>0.\tag{36}$$
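Formula (35) can be illustrated on the simplest CRM, a compound Poisson one. In the sketch below (an assumption chosen for illustration, not a model used in the paper) $\mathbb X=[0,1]$ and $\nu(ds,dx)=\theta e^{-s}\,ds\,dx$, so $\tilde\mu$ has a Poisson($\theta$) number of atoms at uniform locations with independent Exp(1) masses. For $g(x)=x$ one has $\int_0^\infty(1-e^{-sx})e^{-s}ds=x/(1+x)$, hence the right-hand side of (35) equals $\exp\{-\theta(1-\log2)\}$, and a Monte Carlo average of $e^{-\int g\,d\tilde\mu}$ should match it.

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's multiplication method for a Poisson(mean) draw; fine for small means."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def laplace_functional_mc(g, theta, n_samples, seed=0):
    """Monte Carlo estimate of E[exp(-int g d mu)] for the compound Poisson CRM above."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        n_atoms = poisson_draw(theta, rng)
        # each atom: Exp(1) mass times g evaluated at a uniform location
        integral = sum(rng.expovariate(1.0) * g(rng.random()) for _ in range(n_atoms))
        acc += math.exp(-integral)
    return acc / n_samples

theta = 3.0
exact = math.exp(-theta * (1.0 - math.log(2.0)))   # right-hand side of (35) for g(x) = x
estimate = laplace_functional_mc(lambda x: x, theta, 100000)
print(estimate, exact)
```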

**A.2. Proofs of the results of Section 3.**

*Proof of Theorem* 2. The first step consists in adapting the K–L condition (15) to the case of right-censoring. Denote by $\mathbb F_0\subset\mathbb F\times\mathbb F$ the class of all pairs of density functions $(f_1,f_2)$ such that both $f_1$ and $f_2$ are supported on the entire positive real line. Let $X_i\sim f_i$, for $i=1,2$, suppose $X_1$ is stochastically independent of $X_2$ and define $\psi(X_1,X_2)=(X_1\wedge X_2,\mathbb I_{(X_1\le X_2)})$. The density of $\psi$ with respect to the Lebesgue measure and the counting measure on $\{0,1\}$ is given by

$$\phi(f_1,f_2)(z,d)=\{f_1(z)S_2(z)\}^{d}\{f_2(z)S_1(z)\}^{1-d},\qquad z>0,\ d\in\{0,1\},$$

where $S_i(z)=\int_z^\infty f_i(x)\,dx$. Then $\phi$ is one-to-one on $\mathbb F_0$ and the maps $\phi,\phi^{-1}$, defined on $\mathbb F_0$ and $\mathbb F_0^*=\phi(\mathbb F_0)$, respectively, are continuous with respect to the supremum distance on distribution functions; see Peterson [29]. Denote by $\bar\Pi$ the prior on $\mathbb F_0$ and by $\Pi^*=\bar\Pi\circ\phi^{-1}$ the induced prior on $\mathbb F_0^*$. Since $(f_0,f_c)\in\mathbb F_0$ by hypothesis, the continuity of $\phi^{-1}$ implies that the posterior $\bar\Pi(\cdot\mid(Z_1,\Delta_1),\dots,(Z_n,\Delta_n))$ is weakly consistent at $(f_0,f_c)$ if $\Pi^*(\cdot\mid(Z_1,\Delta_1),\dots,(Z_n,\Delta_n))$ is weakly consistent at $\phi(f_0,f_c)$. Indicate by $p(z,d)$, for $z\in\mathbb R$ and $d=0,1$, a generic element of $\mathbb F_0^*$. Then, the K–L support condition on $\Pi^*$ at $p_0\in\mathbb F_0^*$ takes the form

$$\Pi^*\Big\{p:\int_0^\infty p_0(z,1)\log\frac{p_0(z,1)}{p(z,1)}\,dz+\int_0^\infty p_0(z,0)\log\frac{p_0(z,0)}{p(z,0)}\,dz<\varepsilon\Big\}>0$$

for any $\varepsilon>0$. As observed in Section 2, since the prior on $f_c$ does not play any role in the analysis, we may treat $f_c$ as fixed, that is, take a prior on $\mathbb F\times\mathbb F$ of the form $\Pi\times\delta_{f_c}$. Hence, by setting $p_0(z,d)=\phi(f_0,f_c)(z,d)$, the K–L condition boils down to

$$\Pi\Big\{f:\int_0^\infty f_0(t)S_c(t)\log\frac{f_0(t)}{f(t)}\,dt+\int_0^\infty S_0(t)f_c(t)\log\frac{S_0(t)}{S_f(t)}\,dt<\varepsilon\Big\}>0\tag{37}$$

for any $\varepsilon>0$, where we defined the survival functions $S_0(t)=\int_t^\infty f_0(x)\,dx$, $S_f(t)=\int_t^\infty f(x)\,dx$ and $S_c(t)=\int_t^\infty f_c(x)\,dx$.

The next step consists in showing that, under the stated hypotheses, the K–L support condition (37) is satisfied, which in turn implies weak consistency. Specifically, we show that a sufficient condition for (37) is that, for any $\delta>0$, there exists $\bar T$ such that, for any $T>\bar T$,

$$\Pi\Big\{h:\sup_{t\le T}|h(t)-h_0(t)|<\delta,\ \int_T^\infty|H-H_0|f_0<\delta\Big\}>0,\tag{38}$$

where $H(t)=\int_0^th(s)\,ds$ and $H_0(t)=\int_0^th_0(s)\,ds$. By the structural properties of the model with (2)–(5), it follows that (38) holds under condition (17) and $\int_0^\infty|\tilde H(t)-H_0(t)|f_0(t)\,dt<\infty$ a.s. In particular, the latter is implied by condition (i) and the fact that $\int_0^\infty H_0(t)f_0(t)\,dt<\infty$.
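The finiteness of $\int_0^\infty H_0f_0$ holds for any continuous $f_0$: if $Y\sim f_0$, then $H_0(Y)=-\log S_0(Y)$ is a standard exponential random variable, so the integral equals 1. A quick numerical check for Weibull hazards (illustrative parameters; trapezoid rule after $t=e^z$, an arbitrary quadrature choice):

```python
import math

def int_H0_f0_weibull(alpha, scale, lo=-40.0, hi=10.0, h=0.01):
    """int_0^inf H_0(t) f_0(t) dt for a Weibull hazard, H_0(t) = (t / scale)^alpha and
    f_0 = h_0 exp(-H_0); trapezoid rule after t = e^z (dt = t dz)."""
    n = int((hi - lo) / h)
    total = 0.0
    for i in range(n + 1):
        t = math.exp(lo + i * h)
        H = (t / scale) ** alpha
        f = alpha / scale * (t / scale) ** (alpha - 1.0) * math.exp(-H)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * H * f * t
    return total * h

for alpha, scale in [(0.5, 2.0), (1.0, 1.0), (2.0, 3.0)]:
    print(alpha, scale, int_H0_f0_weibull(alpha, scale))   # each value is close to 1
```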

Define the set

$$V(\delta,T):=\Big\{h:\sup_{t\le T}|h(t)-h_0(t)|<\delta,\ \int_T^\infty|H-H_0|f_0<\delta\Big\},\tag{39}$$

which, by (38), has positive probability for any $\delta$ and any $T$ larger than a time point $\bar T$ that may depend on $\delta$. Our goal is then to show that, for any $\varepsilon>0$, there exist $\delta>0$ and $T$ sufficiently large such that, for any $h\in V(\delta,T)$,

$$\int_T^\infty\Big[\log\frac{f_0}{f}\,f_0S_c+\log\frac{S_0}{S_f}\,f_cS_0\Big]<\frac\varepsilon2,\tag{40}$$

$$\int_0^T\Big[\log\frac{f_0}{f}\,f_0S_c+\log\frac{S_0}{S_f}\,f_cS_0\Big]<\frac\varepsilon2,\tag{41}$$

where $f(t)=h(t)\exp(-\int_0^th(s)\,ds)$. Let us start from (40) by noting that

$$\int_T^\infty\Big[\log\frac{f_0}{f}\,f_0S_c+\log\frac{S_0}{S_f}\,f_cS_0\Big]\le\int_T^\infty\log(h_0)\,f_0S_c-\int_T^\infty\log(h)\,f_0S_c+\int_T^\infty|H-H_0|\,(f_0S_c+f_cS_0).\tag{42}$$

As for the first integral on the right-hand side of (42), it is easy to see that $\int_T^\infty\log(h_0)f_0S_c$ goes to zero as $T\to\infty$. As for the second integral, one needs to consider the case of $h(t)$ that eventually goes to zero, but then the negligibility of the integral as $T\to\infty$ is guaranteed by condition (i) and (8), which is needed for the model to be well defined. As for the third integral on the right-hand side of (42), notice that $f_0(t)S_c(t)+f_c(t)S_0(t)\le2f_0(t)$ for $t$ sufficiently large, since $S_c\le1$ and $f_c$ is eventually smaller than $h_0$. Therefore

$$\int_T^\infty|H-H_0|\,(f_0S_c+f_cS_0)<2\delta,$$

and we can conclude that there exist a positive $\delta$ sufficiently smaller than $\varepsilon/4$ and $T$ sufficiently large such that (40) holds for any $h\in V(\delta,T)$.

We are now left to show that (41) holds. Assume first that $h_0(0)>0$ and write

$$\int_0^T\Big[\log\frac{f_0}{f}\,f_0S_c+\log\frac{S_0}{S_f}\,f_cS_0\Big]=\int_0^T\log\frac{h_0(t)}{h(t)}\,f_0(t)S_c(t)\,dt+\int_0^T\int_0^t[h(s)-h_0(s)]\,ds\,[f_0(t)S_c(t)+f_c(t)S_0(t)]\,dt:=I_1+I_2.\tag{43}$$

Next, let $c:=\inf_{t\le T}h_0(t)$, which is positive by condition (i), and note that, for $\delta<c$ and $h\in V(\delta,T)$, using $\log x\le x-1$ and $h\ge h_0-\delta\ge c-\delta$ on $[0,T]$,

$$I_1\le\int_0^T\Big|\frac{h_0(t)}{h(t)}-1\Big|f_0(t)S_c(t)\,dt\le\int_0^T\frac{\delta}{c-\delta}\,f_0(t)\,dt\le\frac{\delta}{c-\delta},$$

$$I_2\le\int_0^T\sup_{s\le t}|h(s)-h_0(s)|\;t\,[f_0(t)S_c(t)+f_c(t)S_0(t)]\,dt\le\delta\int_0^\infty t\,[f_0(t)S_c(t)+f_c(t)S_0(t)]\,dt\le\delta E_0,$$

where $E_0:=\int_0^\infty tf_0(t)\,dt$ is finite by condition (i), and the last inequality follows from $f_0S_c+f_cS_0$ being the density of $Z=Y\wedge C$ which, in turn, is stochastically smaller than $Y$. Hence, $I_1+I_2\le\delta(c-\delta)^{-1}+\delta E_0$, so that $\delta<\min\{c\varepsilon/(4+\varepsilon),\varepsilon/(4E_0)\}$ implies (41) for any $h\in V(\delta,T)$, no matter how large $T$ is. Finally, one can choose $\delta$ small enough and $T$ large enough such that (40) and (41) are simultaneously satisfied for any $h\in V(\delta,T)$.
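The choice of $\delta$ in the last step is elementary arithmetic: for $\delta<c$, $\delta<c\varepsilon/(4+\varepsilon)$ is equivalent to $\delta/(c-\delta)<\varepsilon/4$, and $\delta<\varepsilon/(4E_0)$ gives $\delta E_0<\varepsilon/4$, so that $I_1+I_2<\varepsilon/2$. A minimal check with illustrative values of $c$, $\varepsilon$ and $E_0$ (the factor 0.99 is an arbitrary safety margin):

```python
def admissible_delta(c, eps, E0, margin=0.99):
    """A delta strictly below min{c eps / (4 + eps), eps / (4 E0)}."""
    return margin * min(c * eps / (4.0 + eps), eps / (4.0 * E0))

def bound_I1_I2(delta, c, E0):
    """Upper bound delta / (c - delta) + delta * E0 on I_1 + I_2 (requires delta < c)."""
    return delta / (c - delta) + delta * E0

for c, eps, E0 in [(0.5, 0.1, 2.0), (2.0, 1.0, 0.3), (0.05, 0.01, 10.0)]:
    d = admissible_delta(c, eps, E0)
    print(c, eps, E0, d, bound_I1_I2(d, c, E0), eps / 2)   # bound stays below eps / 2
```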

By allowing $h_0(0)=0$, we need a different bound for $I_1$ in (43). We proceed by taking $0<\varsigma<T$ and splitting $I_1$ into

$$I_1=\int_0^\varsigma\log\frac{h_0(t)}{h(t)}\,f_0(t)S_c(t)\,dt+\int_\varsigma^T\log\frac{h_0(t)}{h(t)}\,f_0(t)S_c(t)\,dt:=I_{11}+I_{12}.$$

As for $I_{12}$, for fixed $\varepsilon$, find $\delta$ and $T$ such that $h\in V(\delta,T)$ implies $I_{12}+I_2<\varepsilon/4$, for any $\varsigma$. As for $I_{11}$, we need to prove that, for the same $\varepsilon$ fixed above, there exists a small enough $\varsigma>0$ such that

$$\int_0^\varsigma\log(h_0(t)/h(t))\,f_0(t)S_c(t)\,dt<\varepsilon/4.\tag{44}$$

This is tantamount to showing that $\log(h_0/\tilde h)f_0S_c$ is integrable at 0 a.s., which in turn reduces to showing that $\log(h_0/\tilde h)f_0$ is integrable at 0 a.s., since $S_c(0)=1$. Note that it is sufficient to control the worst case, namely when $\tilde h(0)=0$ a.s., but then integrability at 0 follows from condition (ii). Indeed, we need to show that there exists $0<p<1$ such that

$$\limsup_{\tau\downarrow0}\frac{\log\{h_0(\tau)/\tilde h(\tau)\}f_0(\tau)}{\tau^{p-1}}=0\quad\text{a.s.}$$

First note that $\lim_{\tau\downarrow0}\log\{h_0(\tau)\}f_0(\tau)=0$. This can be deduced by reasoning in terms of $\log(f_0)f_0$ since, clearly, $h_0(\tau)\sim f_0(\tau)$ as $\tau\to0$. As for $\log(f_0)f_0$ vanishing at zero, we start by considering $f_0$ having regular variation of exponent $0<p<1$ at zero, that is, $f_0(\tau)\sim\tau^pL(1/\tau)$ as $\tau\to0$, for $L(\cdot)$ a slowly varying function at $\infty$. Recall that a positive function $L(x)$ defined on $\mathbb R_+$ varies slowly at $\infty$ if, for every fixed $x>0$, $L(tx)/L(t)\to1$ as $t\to\infty$. Hence,

$$f_0(\tau)\log[f_0(\tau)]\sim\tau^pL(1/\tau)\{\log(\tau^p)+\log[L(1/\tau)]\}:=\tau^pL^*(1/\tau),$$

where $L^*$ is a slowly varying function at $\infty$. Hence $f_0\log(f_0)$ is a regularly varying function at zero with exponent $p$ and, in turn, it vanishes at zero. Note that the larger $p$ is, the faster $\log\{h_0(\tau)\}f_0(\tau)$ vanishes as $\tau\to0$. Next, we have that, for any $0<p<1$,

$$\limsup_{\tau\downarrow0}\frac{\log\{h_0(\tau)/\tilde h(\tau)\}f_0(\tau)}{\tau^{p-1}}=0+\limsup_{\tau\downarrow0}\frac{-\log\{\tilde h(\tau)\}}{\tau^{p-1}}\le\lim_{\tau\downarrow0}\frac{-\log\{\tau^r\}}{\tau^{p-1}},$$

where the last limit is zero for any $0<p<1$. The integrability then follows for any $0<p<1$. Slightly different arguments can be used when $f_0$ has regular variation of exponent $p>1$ at zero, while the special case of $f_0$ slowly varying at zero (i.e., $p=0$) can be dealt with by using Lemma 2 of Feller [7], Section VII.8. The proof is then complete.

*Proof of Proposition* 3. The fact that (ii1) and (ii2) are sufficient for condition (ii)(b) of Theorem 2 to hold is straightforward.

Since for DL mixture hazards $\tilde h(t)=\tilde\mu([0,t])$ and for OU mixtures $\tilde h(t)\ge\sqrt{2\kappa}\,e^{-\kappa\varepsilon}\tilde\mu([0,t])$ for any $\varepsilon>t$, condition (ii1) is met for both.

by $\bar\gamma$. Hence, we have the generalized gamma subordinator, whose Laplace exponent is given by $\psi(u):=\sigma^{-1}[(u+\bar\gamma)^\sigma-\bar\gamma^\sigma]$. Moreover, $\int_0^\infty v^\varepsilon\rho(dv)=\infty$ for any $\varepsilon<\sigma$, and the inverse of $\psi(u)$ is of the form $\psi^{-1}(y)=(\sigma y+\bar\gamma^\sigma)^{1/\sigma}-\bar\gamma$. Thus, we are in a position to apply Proposition 47.18 in [32], which, in our case, allows us to state that there exists a constant $C$ such that

$$\liminf_{t\downarrow0}\frac{\tilde\mu([0,t])}{g(t)}=C\quad\text{a.s. with }0<C<\infty,\tag{45}$$

where $g(t)=\log\log(1/t)\,\big[(\sigma t^{-1}\log\log(1/t)+\bar\gamma^\sigma)^{1/\sigma}-\bar\gamma\big]^{-1}$. From (45) it follows immediately that, for any $\delta>0$, $\liminf_{t\downarrow0}\tilde\mu([0,t])/t^{1/\sigma+\delta}=\infty$ a.s. Hence, condition (ii2) is satisfied by taking $r=1/\sigma+\delta$. To see that condition (ii2) holds also if $\tilde\mu$ is a nonhomogeneous CRM, it is enough to note that the corresponding Laplace exponent $\sigma^{-1}\int_0^\infty[(u+\gamma(x))^\sigma-\gamma(x)^\sigma]\,dx$ is bounded above by $\psi(u):=\sigma^{-1}[(u+\bar\gamma)^\sigma-\bar\gamma^\sigma]$ with $\bar\gamma=\inf_{x\in\mathbb R_+}\gamma(x)\ge0$, and that, infinitesimally, a nonhomogeneous CRM behaves like a homogeneous one.
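The Laplace exponent and its inverse can be checked numerically. This minimal sketch assumes the generalized gamma Lévy density $\rho(dv)=v^{-1-\sigma}e^{-\bar\gamma v}\,dv/\Gamma(1-\sigma)$, the normalization for which $\int_0^\infty(1-e^{-uv})\rho(dv)=\sigma^{-1}[(u+\bar\gamma)^\sigma-\bar\gamma^\sigma]$, and restricts to $\bar\gamma>0$; the trapezoid rule after $v=e^x$ and the parameter values are illustrative choices.

```python
import math

def laplace_exponent_numeric(u, sigma, gbar, h=0.02):
    """int_0^inf (1 - e^{-u v}) rho(dv), rho(dv) = v^{-1-sigma} e^{-gbar v} dv / Gamma(1 - sigma),
    via v = e^x. The integrand (1 - e^{-uv}) v^{-sigma} e^{-gbar v} is evaluated as
    [(1 - e^{-uv}) / v] * v^{1 - sigma} * e^{-gbar v} to avoid overflow. Requires gbar > 0."""
    lo = max(-60.0 / (1.0 - sigma) - math.log(1.0 + u), -700.0)   # left tail ~ u v^{1-sigma}
    hi = math.log(60.0 / gbar) + 2.0                              # right tail ~ exp(-gbar v)
    n = int((hi - lo) / h)
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        v = math.exp(x)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (-math.expm1(-u * v) / v) * math.exp((1.0 - sigma) * x - gbar * v)
    return total * h / math.gamma(1.0 - sigma)

def psi(u, sigma, gbar):
    """Closed form sigma^{-1} [(u + gbar)^sigma - gbar^sigma]."""
    return ((u + gbar) ** sigma - gbar ** sigma) / sigma

def psi_inv(y, sigma, gbar):
    """Inverse (sigma y + gbar^sigma)^{1/sigma} - gbar."""
    return (sigma * y + gbar ** sigma) ** (1.0 / sigma) - gbar

for u, s, g in [(0.5, 0.5, 1.0), (3.0, 0.25, 2.0), (10.0, 0.8, 0.5)]:
    print(u, laplace_exponent_numeric(u, s, g), psi(u, s, g), psi_inv(psi(u, s, g), s, g))
```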

*An auxiliary lemma.* Before getting into the proofs of the consistency results, we provide a useful auxiliary result. Let $\mathbb M$ be the space of boundedly finite measures on $\mathbb R_+$ and denote by $\mathbb G$ the space of distribution functions associated to it: clearly, any $G\in\mathbb G$ will be a nondecreasing càdlàg function on $\mathbb R_+$ such that $G(0)=0$.

LEMMA 11. Let $\tilde\mu$ be a CRM on $\mathbb R_+$ satisfying (H1), and denote by $Q$ the distribution induced on $\mathbb G$. Then, for any $G_0\in\mathbb G$, any finite $M$ and $\eta>0$,

$$Q\Big\{G\in\mathbb G:\sup_{x\le M}|G(x)-G_0(x)|<\eta\Big\}>0.$$

PROOF. Fix $\varepsilon>0$ and choose $(z_0,\dots,z_N)$ such that (i) $0=z_0\le z_1<\cdots<z_N=M$; (ii) all locations where $G_0$ has a jump of size larger than $\varepsilon/2$ are contained in $(z_1,\dots,z_{N-1})$; (iii) for $l=1,\dots,N$, $G_0(z_l^-)-G_0(z_{l-1})\le\varepsilon$. Next, define

$$G_\varepsilon(x)=\sum_{l=1}^Nj_l\,\mathbb I_{(z_l\le x)},\tag{46}$$

where the jump $j_l$ at $z_l$ is given by $j_l=G_0\{z_l\}+G_0(z_l^-)-G_0(z_{l-1})$, for $l=1,\dots,N$. If $z_1=0$, then set by convention $G_0(z_0):=G_0(0^-)=0$ and

in $c$. Define $W_l(G_\varepsilon)=\{G\in\mathbb G:G(z_l)-G(z_{l-1})\in B_{\delta/(2N)}[G_\varepsilon(z_l)-G_\varepsilon(z_{l-1})]\}$ for $l=1,\dots,N$, with the convention that $G(z_0):=G(0^-)=0$ if $z_1=0$, so that $G(z_1)-G(z_0)=G\{0\}$. Then $\bigcap_{l=1}^NW_l(G_\varepsilon)\subset\{G\in\mathbb G:\sup_{x\le M}|G(x)-G_\varepsilon(x)|<\delta\}$. The sets $W_l(G_\varepsilon)$ are independent under $Q$ and each has positive probability. We conclude that, for any $\delta>0$, $Q\{G\in\mathbb G:\sup_{x\le M}|G(x)-G_\varepsilon(x)|<\delta\}\ge Q\{\bigcap_{l=1}^NW_l\}>0$. The proof is then completed by taking $\varepsilon$ and $\delta$ such that $\varepsilon+\delta<\eta$.
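The step function (46) is easy to build explicitly. The sketch below restricts attention, as a simplifying assumption, to a continuous $G_0$ with $G_0(0)=0$, so that condition (ii) is void and $j_l=G_0(z_l)-G_0(z_{l-1})$; then $G_\varepsilon=G_0(z_m)$ on $[z_m,z_{m+1})$, and condition (iii) yields $\sup_{x\le M}|G_0(x)-G_\varepsilon(x)|\le\varepsilon$. The uniform grid obtained by repeated halving, the choice $G_0(x)=x^2$ and the helper names are illustrative.

```python
import bisect

def make_grid(G0, M, eps):
    """Uniform grid 0 = z_0 < ... < z_N = M with G0(z_l) - G0(z_{l-1}) <= eps (condition (iii))."""
    N = 1
    while True:
        grid = [M * l / N for l in range(N + 1)]
        if max(G0(b) - G0(a) for a, b in zip(grid, grid[1:])) <= eps:
            return grid
        N *= 2

def step_approximation(G0, grid):
    """G_eps(x) = sum_l j_l 1{z_l <= x} with j_l = G0(z_l) - G0(z_{l-1}) (continuous G0)."""
    jumps = [G0(grid[l]) - G0(grid[l - 1]) for l in range(1, len(grid))]
    prefix = [0.0]
    for j in jumps:
        prefix.append(prefix[-1] + j)              # prefix[m] = j_1 + ... + j_m
    def G_eps(x):
        m = bisect.bisect_right(grid, x, 1) - 1    # number of z_l, l >= 1, with z_l <= x
        return prefix[m]
    return G_eps

G0 = lambda x: x * x
M, eps = 2.0, 0.05
grid = make_grid(G0, M, eps)
G_eps = step_approximation(G0, grid)
worst = max(abs(G_eps(M * i / 2000) - G0(M * i / 2000)) for i in range(2001))
print(len(grid) - 1, worst)   # number of cells N and the uniform error, at most eps
```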

Now, relying on Theorem 2 and Lemma 11, we are in a position to provide the proofs of Theorems 4–7. Showing that (17) is met for the specific kernels at issue represents a result of independent interest concerning small ball probabilities of mixtures with respect to CRMs; indeed, passing through Lemma 11, we actually show that (H1) is sufficient for (17), that is, for $\tilde h$ putting positive probability on uniform neighborhoods of $h_0$.

*Proof of Theorem* 4. The first step consists in verifying consistency with respect to hazards of mixture form. To this end we postulate the existence of a boundedly finite measure $\mu_0$ on $\mathbb R_+$ such that

$$h_0(t)=\int_{\mathbb R_+}k(t,x)\,\mu_0(dx).\tag{47}$$

Clearly, $\mu_0$ has to be such that $\int_0^Th_0(t)\,dt\to+\infty$, as $T\to\infty$, in order to ensure that the model is properly defined. In the case of the DL kernel, (17) is a direct consequence of Lemma 11 since $h_0(t)=G_0(t)$ and $\tilde h(t)=\tilde\mu([0,t])$.

The consistency result clearly extends to all increasing hazard rates $h_0$ with $h_0(0)=0$. To see this, let $\mu_0$ be the measure associated to $h_0$. Then $\mu_0\in\mathbb M$ since $\mu_0((0,\tau])=h_0(\tau)\to0$ as $\tau\to0$ and $h_0(t)=\int\mathbb I_{(0\le x\le t)}\,\mu_0(dx)$. Finally, note that the moment condition in (i) of Theorem 2 reduces to $\int_{\mathbb R_+}\mathbb E[\tilde H(t)]f_0(t)\,dt<\infty$ since, for any choice of $\lambda$ in (6) and for any large enough $t$, $\mathbb E[\tilde H(t)]>t$.

*Proof of Theorem* 5. As before, we first establish (17) for $h_0$ of mixture form (47) and assume $\tau$ to be fixed and (i) and (ii) to hold. Take $G\in\{G\in\mathbb G:\sup_{x\le T+\tau}|G(x)-G_0(x)|<\delta\}$ and let $h_G$ be the corresponding hazard rate. Then, one has