Ecological systems can often be characterised by changes among a finite set of underlying states pertaining to individuals, populations, communities or entire ecosystems through time. Owing to the inherent difficulty of empirical field studies, ecological state dynamics operating at any level of this hierarchy can often be unobservable or ‘hidden’. Ecologists must therefore often contend with incomplete or indirect observations that are somehow related to these underlying processes. By formally disentangling state and observation processes based on simple yet powerful mathematical properties that can be used to describe many ecological phenomena, hidden Markov models (HMMs) can facilitate inferences about complex system state dynamics that might otherwise be intractable. However, HMMs have only recently begun to gain traction within the broader ecological community. We provide a gentle introduction to HMMs, establish some common terminology, review the immense scope of HMMs for applied ecological research and provide a tutorial on implementation and interpretation. By illustrating how practitioners can use a simple conceptual template to customise HMMs for their specific systems of interest, revealing methodological links between existing applications, and highlighting some practical considerations and limitations of these approaches, our goal is to help establish HMMs as a fundamental inferential tool for ecologists.

INTRODUCTION

Ecological systems can often be characterised by changes among underlying system states through time. These state dynamics can pertain to individuals (e.g. birth, death), populations (e.g. increases, decreases), metapopulations (e.g. colonisation, extinction), communities (e.g. succession) or entire ecosystems (e.g. regime shifts). Gaining an understanding of state dynamics at each level of this hierarchy is a central goal of ecology and fundamental to studies of climate change, biodiversity, species distribution and density, habitat and patch selection, population dynamics, behaviour, evolution and many other phenomena (Begon et al., 2006). However, inferring ecological state dynamics is challenging for several reasons, including: (1) these complex systems often display nonlinear, non-monotonic, non-stationary and non-Gaussian behaviour (Scheffer et al., 2001; Tucker and Anand, 2005; Wood, 2010; Pedersen et al., 2011a; Fasiolo et al., 2016); (2) changes in underlying states and dynamics can be rapid and drastic, but also gradual and more subtle (Beisner et al., 2003; Scheffer and Carpenter, 2003; Folke et al., 2004); and (3) the actual state of an ecological entity, be it an individual plant or animal, or a population or community, can often be difficult or impossible to observe directly (Martin et al., 2005; Kéry and Schmidt, 2008; Royle and Dorazio, 2008; Chen et al., 2013; Kellner and Swihart, 2014). Ecologists must therefore often contend with pieces of evidence believed to be informative of the state of an unobservable system at a particular point in time (see Fig. 1).

**Figure 1**

System state processes that can be difficult to observe directly, but can be uncovered from common ecological observation processes using hidden Markov models. The state process (blue) can pertain to any level within the ecological hierarchy (‘Individual’, ‘Population’, ‘Community’ or ‘Ecosystem’) and for convenience is categorised as primarily ‘Existential’, ‘Developmental’ or ‘Spatial’ in nature. The observation process (green) can provide information about state processes at different levels of the hierarchy (green lines) and includes capture–recapture, DNA sampling, animal-borne telemetry, count surveys, presence–absence surveys and/or abiotic measurements. Observation and state processes from lower levels can be integrated for inferences at higher levels. For example, community-level biodiversity data could be combined with environmental data to describe ecosystem-level processes.

Whether for management, conservation or empirical testing of ecological theory, there is a need for inferential methods that seek to uncover the relationships between factors driving such systems, and thereby predict them in quantitative terms. Hidden Markov models (HMMs) constitute a class of statistical models that has rapidly gained prominence in ecology because they are able to accommodate complex structures that account for changes between unobservable system states (Ephraim and Merhav, 2002; Cappé et al., 2005; Zucchini et al., 2016). By simultaneously modelling two time series – one consisting of the underlying state dynamics and a second consisting of observations arising from the true state of the system – HMMs are able to detect state changes in noisy time-dependent phenomena by formally disentangling the state and observation processes. For example, using HMMs and their variants:

Historical regime shifts can be identified from reconstructed chronologies;
Long-term dynamics of populations, species, communities and ecosystems in changing environments can be inferred from dynamic biodiversity data;
Species identity and biodiversity can be determined from environmental DNA (eDNA);
Hidden evolutionary traits can be accounted for when assessing the drivers of diversification;
Species occurrence can be linked to variation in habitat, population density, land use, host–pathogen dynamics or predator–prey interactions;
Survival, dispersal, reproduction, disease status and habitat use can be inferred from capture–recapture time series;
Animal movements can be classified into foraging, migrating or other modes for inferences about behaviour, activity budgets, resource selection and physiology; and
Trade-offs between dormancy and colonisation can be inferred from standing flora or fungal fruiting bodies.

The increasing popularity of HMMs has been fuelled by new and detailed data streams, such as those arising from modern remote sensing and geographic information systems (Viovy and Saint, 1994; Gao, 2002), eDNA (Bálint et al., 2018) and genetic sequencing (Hudson, 2008), as well as advances in computing power and user-friendly software (Visser and Speekenbrink, 2010). However, despite their utility and ubiquity in other fields such as finance (Bhar and Hamori, 2004), speech recognition (Rabiner, 1989) and bioinformatics (Durbin et al., 1998), the vast potential of HMMs for uncovering latent system dynamics from readily available data remains largely unrecognised by the broader ecological community. This is likely attributable to a tendency for the existing ecological literature to characterise HMMs as a subject-specific tool reserved for a particular type of data rather than a general conceptual framework for probabilistic modelling of sequential data. This is also likely exacerbated by a tendency for HMMs to be applied and described quite differently across disciplines. Indeed, many ecologists may not recognise that some of the most well-established inferential frameworks in population, community and movement ecology are in fact special cases of HMMs.

Catering to ecologists and non-statisticians, we describe the structure and properties of HMMs (HIDDEN MARKOV MODELS), establish some common terminology (Table 1) and review case studies from the biological, ecological, genetics and statistical literature (ECOLOGICAL APPLICATIONS OF HIDDEN MARKOV MODELS). Central to our review and synthesis is a simple but flexible conceptual template that ecologists can use to customise HMMs for their specific systems of interest. In addition to highlighting new areas where HMMs may be particularly promising in ecology, we also demonstrate cases where these models have (perhaps unknowingly) already been used by ecologists for decades. We then identify some practical considerations, including implementation, software and potential challenges that practitioners may encounter when using HMMs (IMPLEMENTATION, CHALLENGES AND PITFALLS). Using an illustrative example, we provide a step-by-step tutorial on some of the more technical aspects of HMM implementation in the Supplementary Tutorial. The overall aim of our review is thus to provide a synthesis of the various ways in which HMMs can be used, reveal methodological links between existing applications and thereby establish HMMs as a fundamental inferential tool for ecologists working with sequential data.

Table 1. Glossary

Term	Definition	Synonyms
Conditional independence property	Assumption made for the state-dependent process: conditional on the state at time t, the observation at time t is independent of all other observations and states
Forward algorithm	Recursive scheme for updating the likelihood and state probabilities of an HMM through time	Filtering
Forward–backward algorithm	Recursive scheme for calculating state probabilities for any point in time: $\Pr(S_t = i \mid x_1, \ldots, x_T)$	Local state decoding; smoothing
Hidden Markov model (HMM)	A special class of state-space model with a finite number of hidden states that typically assumes some form of the Markov property and the conditional independence property	Dependent mixture model; latent Markov model; Markov-switching model; regime-switching model; state-switching model; multi-state model
Initial distribution $\boldsymbol{\delta}$	The probability of being in any of the $N$ states at the start of the sequence: $\boldsymbol{\delta} = (\Pr(S_1 = 1), \ldots, \Pr(S_1 = N))$	Initial probabilities; prior probabilities
Markov property	Assumption made for the state process: $\Pr(S_{t+1} \mid S_t, S_{t-1}, \ldots, S_1) = \Pr(S_{t+1} \mid S_t)$ (‘conditional on the present, the future is independent of the past’)	Memoryless property
Sojourn time	The amount of time spent in a state before switching to another state	Dwell time; occupancy time
State process $S_1, \ldots, S_T$	Unobserved, serially correlated sequence of states describing how the system evolves over time: $S_t \in \{1, \ldots, N\}$ for $t = 1, \ldots, T$	Hidden/latent process; system process
State transition probability $\gamma_{ij}$	The probability of switching from state $i$ at time $t$ to state $j$ at time $t+1$, $\gamma_{ij} = \Pr(S_{t+1} = j \mid S_t = i)$, usually represented as an $N \times N$ transition probability matrix $\boldsymbol{\Gamma} = (\gamma_{ij})$
State-dependent distribution $f(x_t \mid S_t = i)$	Probability distribution of an observation $x_t$ conditional on a particular state being active at time $t$, usually from some parametric class (e.g. categorical, Poisson, normal) and represented as an $N \times N$ diagonal matrix $\mathbf{P}(x_t) = \mathrm{diag}(f(x_t \mid S_t = 1), \ldots, f(x_t \mid S_t = N))$	Emission distribution; measurement model; observation distribution; output distribution; response distribution
State-dependent process $X_1, \ldots, X_T$	The observed process within an HMM, which is assumed to be driven by the underlying unobserved state process	Observation process
State-space model	A conditionally specified hierarchical model consisting of two linked stochastic processes, a latent system process model and an observation process model
Viterbi algorithm	Recursive scheme for finding the sequence of states which is most likely to have given rise to the observed sequence	Global state decoding

HIDDEN MARKOV MODELS

We begin by providing a gentle introduction to HMMs, including model formulation, inference and extensions. Although we have endeavoured to minimise technical material and provide illustrative examples wherever possible, we assume the reader has at least some basic understanding of linear algebra concepts such as matrix multiplication and diagonal matrices (e.g. see Appendix A in Caswell, 2001) and probability theory concepts such as uncertainty, random variables and probability distributions (Gotelli and Ellison, 2013, Chapters 1–2).

Basic model formulation

Hidden Markov models (HMMs) are a class of statistical models for sequential data, in most instances related to systems evolving over time. The system of interest is modelled using a state process (or system process; Table 1), which evolves dynamically such that future states depend on the current state. Many ecological phenomena can naturally be described by such a process (Fig. 1). In an HMM, the state process is not directly observed – it is a ‘hidden’ (or ‘latent’) variable. Instead, observations are made of a state-dependent process (or observation process) that is driven by the underlying state process. As a result, the observations can be regarded as noisy measurements of the system states of interest, but they are typically insufficient to precisely determine the state. Mathematically, an HMM is composed of two sequences:

An observed state-dependent process $X_1, X_2, \ldots, X_T$; and
An unobserved (hidden) state process $S_1, S_2, \ldots, S_T$.

In most applications, the indices refer to observations made over time at a regular sampling interval (e.g. daily or annual rainfall measurements), but they can also refer to position (e.g. in a sequence of DNA; Henderson et al., 1997; Eddy, 2004) or order (e.g. in a sequence of marine mammal dives; DeRuiter et al., 2017). HMMs can also be formulated in continuous time (Jackson et al., 2003; Amoros et al., 2019), but these models have tended to be less frequently applied in ecology (but see Langrock et al., 2013; Choquet et al., 2017; Olajos et al., 2018). Among the many HMM formulations of relevance to ecology that we highlight in ECOLOGICAL APPLICATIONS OF HIDDEN MARKOV MODELS, some example observation sequences $X_t$ and underlying states $S_t$ include:

$X_t =$ Observation of feeding/not feeding, with underlying state $S_t =$ Hungry or sated;
$X_t =$ Count of individuals, with underlying state $S_t =$ True population abundance; and
$X_t =$ Daily rainfall measurement, with underlying state $S_t =$ Wet or dry season.

Unlike the larger class of state-space models (see Box 1), the state process within an HMM can take on only finitely many possible values: $S_t \in \{1, \ldots, N\}$ for $t = 1, \ldots, T$. The basic HMM formulation further involves two key dependence assumptions: (1) the probability of a particular state being active at any time $t$ is completely determined by the state active at time $t-1$ (the so-called Markov property); and (2) the probability distribution of an observation at any time $t$ is completely determined by the state active at time $t$ (Fig. 2). The latter assumption is a conditional independence property, as this implies that $X_t$ is conditionally independent of past and future observations, given $S_t$. Whether these simplifying assumptions can faithfully characterise the underlying dynamics for the system of interest must be carefully considered (see Challenges and pitfalls).

Box 1. Where do HMMs reside in the taxonomic zoo of latent variable models?

Latent state (or latent variable) models come in many different forms, with a particular variant often evolving its own nomenclature, notation and jargon that can be confusing for non-specialists. Here we use broad and non-technical strokes to differentiate the HMM from its close relatives in the taxonomy of latent state models, with the aim to more clearly position HMMs relative to alternative modelling frameworks. Above all, these models are united by assuming latent states – a fundamental property of the system being modelled that is either partially, or completely, unobservable. They also tend to make a clear distinction between an observation process model – describing noise in the data – and the hidden state process model – describing the underlying patterns and dynamics of interest.

The umbrella terms mixed effects, multilevel or hierarchical models (e.g. Skrondal and Rabe-Hesketh, 2004; Gelman and Hill, 2006; Royle and Dorazio, 2008; Lee and Song, 2012) typically include the most widely known types of latent variable models (e.g. Clogg, 1995). These often treat latent variables as random effects assumed to arise from a distribution as structural elements of a hierarchical statistical model. There is therefore not only random variation in the observations, but also in the parameters of the model itself. While there are special cases and generalisations that are not so easily classified, a simplified taxonomy for a subset of hierarchical latent variable models can be based on the structural dependence in the hidden state process and whether the state space of this hidden process is discrete (i.e. taking on finitely many values) or continuous:

Hidden state process	Continuous state space	Discrete state space
Temporal dependence	State-space model	Hidden Markov model
Temporal independence	Continuous mixture model	Finite mixture model

Latent variable models with a continuous state space and no temporal dependence in the hidden state process fall under the broad class of continuous mixture models (e.g. Lindsay, 1995), with ecological applications including the modelling of closed population abundance (Royle, 2004), disease prevalence (Calabrese et al., 2011) and species distribution (Ovaskainen et al., 2017). State-space models (SSMs) are a special class of latent variable model where the observation process is conditionally specified by a (typically continuous) hidden state process with temporal dependence (e.g. Durbin and Koopman, 2012; Auger-Méthé et al., 2020), with applications including population dynamics (Schnute, 1994; Wang, 2007; Tavecchia et al., 2009; Newman et al., 2014), disease dynamics (Rohani and King, 2010; Cooch et al., 2012) and animal movement (Patterson et al., 2008; Hooten et al., 2017; Patterson et al., 2017). An HMM is a special class of SSM where the state space is finite (see ECOLOGICAL APPLICATIONS OF HIDDEN MARKOV MODELS for many ecological examples). Finite mixture models (e.g. Frühwirth-Schnatter, 2006) assume the state space is finite with no temporal dependence in the hidden state process (e.g. the latent states are non-Markov or do not change over time), with examples including static species occurrence (MacKenzie et al., 2002), closed population capture–recapture (Pledger, 2000) and species distribution (Pledger and Arnold, 2014) models. HMMs and SSMs can therefore be regarded as specific variations of a hierarchical model with serial dependence, where the random effects vary over time. Furthermore, an HMM can be viewed as a discrete version of an SSM or a time-dependent version of a finite mixture model.

It is important to note that things are not quite as simple as depicted above. For example, while an SSM with discrete latent variables can encompass features of an HMM (Jonsen et al., 2005), an SSM with a finite state space is not necessarily an HMM. An HMM might include continuous random effects on its parameters or a state-dependent observation distribution specified as a finite mixture (Altman, 2007). If the number of states becomes very large in an HMM, then it can become a discrete approximation of an SSM with a continuous state space (Besbeas and Morgan, 2019). In the Extensions section and the Challenges and Pitfalls section, we consider circumstances where application of a standard HMM is not supported and other approaches or extensions might be required. [Correction added on 10 November 2020, after first online publication: Box 1 has been relocated to page 5.]

**Figure 2**

Dependence structure of a basic hidden Markov model, with an observed sequence $X_1, \ldots, X_T$ arising from an unobserved sequence of underlying states $S_1, \ldots, S_T$.

As a consequence of these assumptions, HMMs generally facilitate model building and computation that might otherwise be intractable. A basic $N$-state HMM that formally distinguishes the state and observation processes can be fully specified by the following three components: (1) the initial distribution, $\boldsymbol{\delta}$, specifying the probabilities of being in each state at the start of the sequence; (2) the state transition probabilities, $\gamma_{ij} = \Pr(S_{t+1} = j \mid S_t = i)$, specifying the probability of switching from state $i$ at time $t$ to state $j$ at time $t+1$ and usually represented as an $N \times N$ state transition probability matrix:

$\boldsymbol{\Gamma} = \begin{pmatrix} \gamma_{11} & \cdots & \gamma_{1N} \\ \vdots & \ddots & \vdots \\ \gamma_{N1} & \cdots & \gamma_{NN} \end{pmatrix}$

where $\sum_{j=1}^{N} \gamma_{ij} = 1$; and (3) the state-dependent distributions, $f(x_t \mid S_t = i)$, specifying the probability distribution of an observation $x_t$ conditional on the state at time $t$ and usually represented as an $N \times N$ diagonal matrix:

$\mathbf{P}(x_t) = \begin{pmatrix} f(x_t \mid S_t = 1) & & 0 \\ & \ddots & \\ 0 & & f(x_t \mid S_t = N) \end{pmatrix}$

or, equivalently, $\mathbf{P}(x_t) = \mathrm{diag}(f(x_t \mid S_t = 1), \ldots, f(x_t \mid S_t = N))$ for computational purposes (see Inference). These distributions can pertain to discrete or continuous observations and are generally chosen from an appropriate distributional family. For example, behavioural observation $X_t$ could be modelled using a categorical distribution (MacDonald and Raubenheimer, 1995), count $X_t$ using a non-negative discrete distribution (e.g. Poisson; Besbeas and Morgan, 2019) and measurement $X_t$ using a non-negative continuous distribution (e.g. zero-inflated exponential; Woolhiser and Roldan, 1982). After specifying $\boldsymbol{\delta}$, $\boldsymbol{\Gamma}$ and $f(x_t \mid S_t = i)$ in terms of the particular system of interest, one can proceed to drawing inferences about unobservable state dynamics from the observation process.
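To make the three model-defining components concrete, the following minimal sketch simulates a two-state HMM with Poisson state-dependent distributions. It is an illustration under stated assumptions rather than part of any published analysis: the function names (`poisson_draw`, `simulate_hmm`) and all parameter values are hypothetical, and states are labelled from zero as is conventional in Python.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson variate (Knuth's method; adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_hmm(delta, gamma, draw_observation, n_steps, rng):
    """Simulate hidden states and observations from an N-state HMM.

    delta: initial distribution (length N)
    gamma: N x N transition probability matrix (each row sums to one)
    draw_observation: function (state, rng) -> one observation
    States are labelled 0..N-1 here, not 1..N as in the text.
    """
    labels = range(len(delta))
    states, observations = [], []
    state = rng.choices(labels, weights=delta)[0]
    for _ in range(n_steps):
        states.append(state)
        # Conditional independence: the observation depends only on the current state
        observations.append(draw_observation(state, rng))
        # Markov property: the next state depends only on the current state
        state = rng.choices(labels, weights=gamma[state])[0]
    return states, observations


# Hypothetical values chosen for illustration only
delta = [0.5, 0.5]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
rates = [0.5, 3.0]  # Poisson rate of the count in each state

rng = random.Random(42)
states, counts = simulate_hmm(
    delta, gamma, lambda s, r: poisson_draw(rates[s], r), 50, rng)
print(states)
print(counts)
```

In a real application only the second printed sequence (the counts) would be available to the analyst; the first (the states) is what the inferential tools described next aim to recover.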

We note that Markov models (Grewal et al., 2019) are commonly used for inferring community- or ecosystem-level dynamics (Waggoner and Stephens, 1970; Wootton, 2001; Tucker and Anand, 2005; Breininger et al., 2010) and providing measures of stability, resilience or persistence (Li, 1995; Pawlowski and McCord, 2009; Zweig et al., 2020), especially in systems composed of sessile organisms such as plant (Horn, 1975; van Hulst, 1979; Usher, 1981; Talluto et al., 2017, but see Chen et al., 2013) or benthic communities (Tanner et al., 1994; Hill et al., 2004; Lowe et al., 2011). A Markov model can simply be viewed as an HMM where it is assumed that the state process is perfectly observed, that is, $X_t = S_t$ with $\mathbf{P}(x_t)$ a matrix with entry one in row $x_t$, column $x_t$, and otherwise zeros. For example, patch dynamics HMMs (MacKenzie et al., 2003) are simply generalisations of well-known Markov models for patch dynamics (Hanski, 1994; Moilanen, 1999) for cases when presence–absence data are subject to imperfect detection. Likewise, any Markov model can naturally be embedded as the state process within an HMM for less observable phenomena.
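The claim that a Markov model is an HMM with a perfectly observed state process can be checked numerically. The sketch below assumes a hypothetical three-state chain with made-up probabilities; `hmm_likelihood` evaluates the matrix product form of the HMM likelihood described under Inference, and `perfect_observation` plays the role of a state-dependent distribution that puts probability one on the observation equal to the state. All names are illustrative.

```python
import math


def hmm_likelihood(obs, delta, gamma, state_prob):
    """HMM likelihood via the (unscaled) forward recursion."""
    n = len(delta)
    alpha = [delta[i] * state_prob(obs[0], i) for i in range(n)]
    for x in obs[1:]:
        alpha = [sum(alpha[i] * gamma[i][j] for i in range(n)) * state_prob(x, j)
                 for j in range(n)]
    return sum(alpha)


def perfect_observation(x, state):
    """State-dependent distribution when the state is observed without error."""
    return 1.0 if x == state else 0.0


def markov_chain_likelihood(states, delta, gamma):
    """Likelihood of a fully observed Markov chain."""
    likelihood = delta[states[0]]
    for previous, current in zip(states, states[1:]):
        likelihood *= gamma[previous][current]
    return likelihood


# Hypothetical three-state chain (states labelled 0, 1, 2), values for illustration only
delta3 = [0.6, 0.3, 0.1]
gamma3 = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1],
          [0.0, 0.3, 0.7]]
observed_states = [0, 0, 1, 1, 2, 1, 0]

as_hmm = hmm_likelihood(observed_states, delta3, gamma3, perfect_observation)
as_chain = markov_chain_likelihood(observed_states, delta3, gamma3)
print(math.isclose(as_hmm, as_chain))
```

Replacing `perfect_observation` with a function that allows detection error turns the same code into a likelihood for imperfectly observed data, which is the sense in which the HMM generalises the Markov model.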

Inference

In addition to the ease with which a wide variety of ecological state and observation process models can be specified (see ECOLOGICAL APPLICATIONS OF HIDDEN MARKOV MODELS), a key strength of the HMM framework is that efficient recursive algorithms are available for conducting statistical inference. Here we will briefly outline some of the most common inferential techniques for HMMs, but motivated readers can find additional technical material and a worked example on model fitting, assessment and interpretation in the Supplementary Tutorial. Using the forward algorithm (also known as filtering), the likelihood $\mathcal{L}(\boldsymbol{\theta} \mid x_1, \ldots, x_T)$ as a function of the unknown parameters $\boldsymbol{\theta}$ given the observation sequence $x_1, \ldots, x_T$ can be calculated at a computational cost that is (only) linear in $T$. The parameter vector $\boldsymbol{\theta}$, which is to be estimated, contains any unknown parameters embedded in the three model-defining components $\boldsymbol{\delta}$, $\boldsymbol{\Gamma}$ and $\mathbf{P}(x_t)$. Made possible by the relatively simple dependence structure of an HMM, the forward algorithm traverses along the time series, updating the likelihood step-by-step while retaining information on the probabilities of being in the different states (Zucchini et al., 2016, pp. 37–39). Application of the forward algorithm is equivalent to evaluating the likelihood using a simple matrix product expression,

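$\mathcal{L}(\boldsymbol{\theta} \mid x_1, \ldots, x_T) = \boldsymbol{\delta} \mathbf{P}(x_1) \boldsymbol{\Gamma} \mathbf{P}(x_2) \cdots \boldsymbol{\Gamma} \mathbf{P}(x_{T-1}) \boldsymbol{\Gamma} \mathbf{P}(x_T) \mathbf{1}$ (1)

where $\boldsymbol{\delta}$ is treated as a row vector and $\mathbf{1}$ is a column vector of ones (see Supplementary Tutorial for technical derivation).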
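The forward recursion behind eqn 1 can be sketched in a few lines. The example below is a minimal illustration, assuming Poisson state-dependent distributions and hypothetical parameter values and counts; the function names are not taken from any package. Rescaling the forward probabilities at each step is a standard device to avoid numerical underflow in long series, and the brute-force sum over all state sequences is included only to show that the recursion returns the same likelihood at a cost linear, rather than exponential, in the length of the series.

```python
import itertools
import math


def poisson_pmf(x, rate):
    return math.exp(-rate) * rate ** x / math.factorial(x)


def hmm_log_likelihood(obs, delta, gamma, state_pmf):
    """Log-likelihood via the scaled forward algorithm.

    state_pmf(x, i): probability (or density) of observation x under state i.
    """
    n = len(delta)
    # Forward probabilities at the first time point: delta P(x_1)
    alpha = [delta[i] * state_pmf(obs[0], i) for i in range(n)]
    log_lik = 0.0
    for x in obs[1:]:
        # Rescale to sum to one and accumulate the log of the scaling factor
        total = sum(alpha)
        log_lik += math.log(total)
        alpha = [a / total for a in alpha]
        # One step of the recursion: multiply by Gamma, then by P(x_t)
        alpha = [sum(alpha[i] * gamma[i][j] for i in range(n)) * state_pmf(x, j)
                 for j in range(n)]
    return log_lik + math.log(sum(alpha))


def brute_force_likelihood(obs, delta, gamma, state_pmf):
    """Sum the joint probability over every possible state sequence."""
    n = len(delta)
    total = 0.0
    for path in itertools.product(range(n), repeat=len(obs)):
        p = delta[path[0]] * state_pmf(obs[0], path[0])
        for t in range(1, len(obs)):
            p *= gamma[path[t - 1]][path[t]] * state_pmf(obs[t], path[t])
        total += p
    return total


# Hypothetical parameter values and counts, for illustration only
delta = [0.5, 0.5]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
rates = [0.5, 3.0]
counts = [0, 2, 5, 1, 0, 4]


def state_pmf(x, i):
    return poisson_pmf(x, rates[i])


forward_value = math.exp(hmm_log_likelihood(counts, delta, gamma, state_pmf))
enumerated_value = brute_force_likelihood(counts, delta, gamma, state_pmf)
print(math.isclose(forward_value, enumerated_value))
```

Passing a function such as `hmm_log_likelihood` to a general-purpose numerical optimiser is the essence of the direct maximisation approach to model fitting.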

In practice, the main challenge when working with HMMs tends to be the estimation of the model parameters. The two main strategies for fitting an HMM are numerical maximisation of the likelihood (Myung, 2003; Zucchini et al., 2016) or Bayesian inference (Ellison, 2004; Gelman et al., 2004) using Markov chain Monte Carlo (MCMC) sampling (Brooks et al., 2011). The former seeks to identify the parameter values that maximise the likelihood function (i.e. the maximum likelihood estimates $\hat{\boldsymbol{\theta}}$), whereas the latter yields a sample from the posterior distribution of the parameters (Ellison, 2004). Specifically for the maximum likelihood (ML) approach, the forward algorithm makes it possible to use standard optimisation methods (Fletcher, 2013) to directly numerically maximise the likelihood (eqn 1). An alternative ML approach is to employ an expectation–maximisation (EM) algorithm that uses similar recursive techniques to iterate between state decoding and updating the parameter vector until convergence (Rabiner, 1989). For MCMC, many different strategies can be used, but these tend to differ in appropriateness and efficiency in a manner that can strongly depend on the specific model and data at hand (Gilks et al., 1996; Gelman et al., 2004; Robert and Casella, 2004; Brooks et al., 2011).

The forward algorithm and similar recursive techniques can further be used for forecasting and state decoding, as well as to conduct formal model checking using pseudo-residuals (Zucchini et al., 2016, Chapters 5 & 6). State decoding is usually accomplished using the Viterbi algorithm or the forward–backward algorithm (also known as smoothing), which respectively identify the most likely sequence of states or the probability of each state at any time $t$, conditional on the observations. Fortunately, practitioners can often use existing software for most aspects of HMM-based data analyses and need not dwell on many of the more technical details of implementation (see IMPLEMENTATION, CHALLENGES AND PITFALLS and Supplementary Tutorial).
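For readers curious about what global state decoding involves, a minimal sketch of the Viterbi algorithm follows. It assumes a two-state HMM with Poisson state-dependent distributions; the parameter values and the short count series are hypothetical, and the function names are illustrative rather than drawn from existing software. Working on the log scale avoids numerical underflow.

```python
import math


def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, delta, gamma, state_pmf):
    """Most likely state sequence given the observations (states labelled 0..N-1)."""
    n = len(delta)
    # score[i]: log-probability of the best partial path ending in state i
    score = [safe_log(delta[i]) + safe_log(state_pmf(obs[0], i)) for i in range(n)]
    back_pointers = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor of state j at this time step
            best_i = max(range(n), key=lambda i: score[i] + safe_log(gamma[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + safe_log(gamma[best_i][j])
                             + safe_log(state_pmf(x, j)))
        score = new_score
        back_pointers.append(pointers)
    # Trace the best path backwards from the best final state
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical parameter values and counts, for illustration only
delta = [0.5, 0.5]
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
rates = [0.5, 3.0]
dive_counts = [0, 0, 4, 5, 0, 3]


def state_pmf(x, i):
    return math.exp(-rates[i]) * rates[i] ** x / math.factorial(x)


decoded = viterbi(dive_counts, delta, gamma, state_pmf)
print(decoded)
```

Note that the Viterbi path maximises the joint probability of the whole state sequence, so it need not coincide with the sequence obtained by picking the most probable state separately at each time point from the forward–backward algorithm.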

To illustrate some of the basic mechanics, we use a simple example based on observations of the feeding behaviour of a blue whale (Balaenoptera musculus; cf. DeRuiter et al., 2017). Suppose we assume that observations of the number of feeding lunges performed in each of $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0078$ consecutive dives ($x_t$ for $t = 1, \ldots, T$) arise from $N = 2$ states of feeding activity. Building on Fig. 2, we could for example have:

Fig. 3 displays the results for this simple two-state HMM assuming Poisson state-dependent (observation) distributions, $X_t \mid S_t = i \sim \mathrm{Poisson}(\lambda_i)$ for $i \in \{1, 2\}$, when fitted to the full observation sequence via direct numerical maximisation of eqn 1. The rates of the state-dependent distributions were estimated as $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0084$ and $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0085$ , suggesting states 1 and 2 correspond to ‘low’ and ‘high’ feeding activity respectively. The estimated state transition probability matrix,

**Figure 3**

Estimated state-dependent distributions (top row) and Viterbi-decoded states from a two-state HMM fitted to counts of feeding lunges performed by a blue whale during a sequence of $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0086$ consecutive dives. Here the most likely state sequence identifies periods of ‘low’ (state 1; blue) and ‘high’ (state 2; black) feeding activity.

suggests interspersed bouts of ‘low’ and ‘high’ feeding activity, but with bouts of ‘high’ activity tending to span fewer dives. The estimated initial distribution $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0087$ suggests this individual was more likely to have been in the ‘low’ activity state at the start of the sequence. Most ecological applications of HMMs involve more complex inferences related to specific hypotheses about system state dynamics, and a great strength of the HMM framework is the relative ease with which the basic model formulation can be modified to describe a wide variety of processes (Zucchini et al., 2016, Chapters 9–13). Next we highlight some extensions that we consider to be highly relevant in ecological research.

Extensions

The dependence assumptions made within the basic HMM are mathematically convenient, but not always appropriate (see Box 2). The Markov property implies that the amount of time spent in a state before switching to another state – the so-called sojourn time – follows a geometric distribution. The most likely length of any given sojourn time hence is one unit, which may not be realistic for certain state processes. The obvious extension is to allow for $k$th-order dependencies in the state process (Fig. 4a), such that the state at time $t$ depends on the states at times $t-1, t-2, \ldots, t-k$. An alternative assumes the state process is ‘semi-Markov’ with the sojourn time flexibly modelled using any distribution on the positive integers (Choquet et al., 2011; van de Kerk et al., 2015; King and Langrock, 2016).
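The geometric form of the sojourn time follows directly from the Markov property, as the short derivation below sketches. The notation is an assumption made here for illustration: $D_i$ denotes the sojourn time in state $i$, and $\gamma_{ii}$ (with $0 < \gamma_{ii} < 1$) is the probability of remaining in state $i$ from one time step to the next. A sojourn of length $d$ requires staying put $d - 1$ times and then leaving once.

```latex
\Pr(D_i = d) = \gamma_{ii}^{\,d-1} \, (1 - \gamma_{ii}), \qquad d = 1, 2, \ldots

\mathrm{E}(D_i) = \sum_{d=1}^{\infty} d \, \gamma_{ii}^{\,d-1} (1 - \gamma_{ii}) = \frac{1}{1 - \gamma_{ii}}

\frac{\Pr(D_i = d + 1)}{\Pr(D_i = d)} = \gamma_{ii} < 1
```

Because each successive probability is smaller than the last by the factor $\gamma_{ii}$, the mode is always at $d = 1$, however persistent the state. For instance, with $\gamma_{ii} = 0.9$ the expected sojourn is 10 time steps, yet a sojourn of exactly one step (probability 0.1) remains more likely than any other single length. Semi-Markov formulations relax precisely this constraint.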

Box 2. To HMM, or not to HMM, that is the question

The structure of a statistical model should be congruent with the data-generating process in question. HMMs are neither a panacea nor a black box – the appropriateness and feasibility of a particular model will be case-dependent and require careful consideration. In determining if HMMs are appropriate for describing a particular system, one must consider two questions:

Do the hidden state dynamics display time dependence which can be represented using Markov chains? If the current system state is not related to the previous state(s), then a latent variable model without time dependence should be considered (see Box 1). Diagnostics examining temporal patterns in residuals (Li, 2003) can help to empirically determine if the assumptions of conditional independence and Markovity are sufficient (see Supplementary Tutorial). When the first-order Markov assumption may not be appropriate for the state process, one can further ask the question: can system memory be adequately approximated while preserving Markovity? Faithful representation of system memory may require the inclusion of informative covariates or more complex time dependence structures, and it is possible to expand HMMs to higher order Markovian or semi-Markovian dependence (Zucchini et al., 2016, Chapter 12). While modelling this higher order temporal dependence is sometimes preferable (Hestbeck et al., 1991), it is more complex and thus less widely used. General time-series modelling often captures complex dependence structures using autoregressive processes (Durbin and Koopman, 2012, Chapter 3), and more complicated variations of HMMs can capture some of these features (Lawler et al., 2019). However, other latent variable approaches will often be better suited for more complex temporal dependence structures. There is no foolproof or automatic way to make this determination, and we must typically rely on residual diagnostics (Li, 2003; Zucchini et al., 2016, Chapter 6) and expert knowledge of the system dynamics.
Can the system be well described by a feasibly finite set of latent states? Our review highlights a wide range of ecological scenarios where the possible states of the system of interest form (or can be approximated by) a finite set. The number of parameters and the computational burden of an HMM can become large with increases in state dimension, and this can be of particular concern when the finite set of states is a coarser approximation of a finer discrete space (e.g. population abundance) or a continuous space (e.g. spatial location). Such approximations have strengths and weaknesses. When used as discrete approximations to state-space models with continuous support (see Box 1), HMMs can be useful when arbitrary constraints on the state space are required (e.g. restricting aquatic organisms to location states off land) or when combining both discrete and continuous state processes. However, an HMM for a large number of states with a fully parameterised transition probability matrix – where transitions between any of the states are possible – will be computationally expensive, perhaps prohibitively so. Systems with large state spaces can often be approximated by an HMM when transitions between states are local – where transitions can only occur between neighbouring states – and the transition probabilities therefore include a relatively small number of parameters that describe this local behaviour. For example, Thygesen et al. (2009), Pedersen et al. (2011b), and Glennie et al. (2019) use these properties of sparsity to make an HMM approach computationally efficient for very large state spaces. In short, large numbers of states do not necessarily prohibit application of an HMM; this is dependent on the computer resources available and the properties of the state process. Alternatively, it is possible to reduce the size of an infeasible state space by making a coarser approximation (e.g. binning abundance states together into larger states; Zucchini et al., 2016, pp. 162–163; Besbeas and Morgan, 2019). Appropriateness will depend on the sensitivity of the inference to the precise value of the state process and is best investigated by varying the coarseness of the approximation. If the set of states is too coarse-grained, approximation might lead to spurious inference about the latent states. For example, coarse-graining could result in masking or misclassification of meaningfully distinct states. The decision of the appropriate number of states can be challenging; there is again no foolproof or automatic way to determine this, and we must usually rely on expert knowledge of the specific system of interest. When the finite state space of an HMM is infeasible or inappropriate, it will often be better to consider other approaches (e.g. Patterson et al., 2008; Cooch et al., 2012; Patterson et al., 2017; Auger-Méthé et al., 2020).

**Figure 4**

Graphical models associated with different extensions of the basic HMM formulation: (a) state sequence with memory order 2; (b) influence of covariate vectors $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0091$ on state dynamics; (c) observations depending on both states and previous observations; (d) bivariate observation sequence, conditionally independent given the states.

HMMs are often used to infer the drivers of ecological state processes by relating the state transition probabilities to explanatory covariates (Fig. 4b). Indeed, any of the parameters of a basic HMM can be modelled as a function of covariates (e.g. sex, age, habitat type, chlorophyll-a) using an appropriate link function (McCullagh and Nelder, 1989). Link functions $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0092$ can relate the natural scale parameters $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0093$ to a $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0094$ design matrix of covariates $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0095$ and $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0096$ -vector of working scale parameters $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0097$ such that $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0098$ (see White and Burnham, 1999; MacKenzie et al., 2002; Patterson et al., 2009, for common examples of link functions in HMMs). When simultaneously analysing multiple observation sequences, heterogeneity across the different sequences can be modelled through explanatory covariates or mixed HMMs that include random effects (Altman, 2007; Schliehe-Diecks et al., 2012; Towner et al., 2016).
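One common way to relate transition probabilities to covariates is a multinomial logit link applied to each row of the transition probability matrix, with the diagonal entry acting as the reference category. The Python sketch below assumes this particular link. The function name, the coefficient layout and all numerical values are hypothetical and serve only to show how a single covariate can shift the state dynamics.

```python
import math


def transition_matrix(beta, covariates):
    """Build a transition probability matrix from working-scale coefficients.

    beta[i][j] holds [intercept, slope, ...] for the transition i -> j (i != j);
    diagonal entries are ignored because they act as the reference category.
    """
    design = [1.0] + list(covariates)
    n_states = len(beta)
    gamma = []
    for i in range(n_states):
        eta = [0.0 if i == j else sum(b * z for b, z in zip(beta[i][j], design))
               for j in range(n_states)]
        top = max(eta)  # subtract the maximum for numerical stability
        weights = [math.exp(e - top) for e in eta]
        total = sum(weights)
        gamma.append([w / total for w in weights])
    return gamma


# Hypothetical two-state example with a single covariate.
beta = [[None, [-2.0, 1.0]],
        [[-1.5, -0.5], None]]
gamma_low = transition_matrix(beta, [0.0])   # covariate value 0
gamma_high = transition_matrix(beta, [2.0])  # covariate value 2
```

Because the link maps any real-valued linear predictor to a valid probability vector, every row sums to one for every covariate value. In this hypothetical example, raising the covariate from 0 to 2 increases the probability of switching from state 1 to state 2 from about 0.12 to 0.5.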

At the level of the observation process, it is relatively straightforward to relax the conditional independence assumption. For example, it can be assumed that the observation at time $t$ depends not only on the state at time $t$ but also on the observation at time $t-1$ (Fig. 4c; Langrock et al., 2014b; Lawler et al., 2019). It is also straightforward to model multivariate observation sequences using multivariate state-dependent distributions (Choquet et al., 2013; Phillips et al., 2015; van Beest et al., 2019), where it is often assumed that the different variables observed are conditionally independent and a univariate distribution is specified for each of the variables (Fig. 4d). Owing to the Markov property, this does not imply that the individual components are serially independent or mutually independent (Zucchini et al., 2016, Chapter 9). However, this assumption is not required and will not always be appropriate, in which case a multivariate distribution should be considered.

ECOLOGICAL APPLICATIONS OF HIDDEN MARKOV MODELS

In their classic textbook, Begon et al. (2006) present the evolutionary foundation of ecology and its superstructure built from individual organisms to populations, communities and ecosystems. At each level of this hierarchy, we will illustrate how HMMs can be used for identifying patterns and dynamics of many different types of ecological state variables that would otherwise be difficult or impossible to observe directly. For each application, we emphasise the two principal components of any HMM – the observation process and the state process – as a conceptual template for ecologists to formulate HMMs in terms of their particular systems of interest.

The observation process in ecological studies is often driven by many factors, including the system state variable(s) of interest, the biotic and/or abiotic components of the system, and study design (Fig. 1). Among the most common types of observation processes in ecology are capture–recapture (Williams et al., 2002), DNA sampling (Bohmann et al., 2014; Rowe et al., 2017; Bálint et al., 2018), animal-borne telemetry (Cooke et al., 2004; White and Garrott, 1990; Hooten et al., 2017), count surveys (Buckland et al., 2004; Charmantier et al., 2006; Nichols et al., 2009), presence–absence surveys (Koleff et al., 2003; MacKenzie et al., 2018) and abiotic measurement (e.g. temperature, precipitation, sediment type). These observation processes are not mutually exclusive, can contribute information at different levels of the hierarchy and can be pooled for inference (Schaub and Abadi, 2011; Gimenez et al., 2012; Evans et al., 2016).

Using Fig. 1 as our expositional roadmap, we begin with applications for individual-level state dynamics. We then work our way up to the population, community and ecosystem levels. Within each level of the ecological hierarchy, we find it convenient to distinguish ‘existential’, ‘developmental’ and ‘spatial’ states. Although there is inevitably some degree of overlap, particularly at the higher levels of the hierarchy that are inherently spatial, we use this distinction in an attempt to separate states of being that in isolation can be viewed as essentially non-spatial from state dynamics that are more strictly spatial in nature. We further delineate the non-spatial states as ‘existential’ based on a fundamental measure of existence at each level of the hierarchy and ‘developmental’ based on state characteristics that can drive the dynamics of this fundamental measure of existence. We employ these categories simply for ease of exposition and view them as neither exhaustive nor mutually exclusive.

Although typically not referred to as HMMs in the ecological literature, several subfields of ecology have been using HMMs for individual- to community-level inference for decades. HMMs have also become standard in biological sequence analysis and molecular ecology (Durbin et al., 1998; Barbu and Limnios, 2009; Yoon, 2009), and there is much crossover potential for state-of-the-art bioinformatic methods to other applications in ecology (Jones et al., 2006; Tucker and Duplisea, 2012). HMMs are also used for very specialised tasks of relevance to ecology, such as counting annual layers in ice cores (Winstrup et al., 2012) or characterising plant architectures (Durand et al., 2005). There are therefore many example HMM applications within some areas of ecology, of which only a handful can be covered in the material that follows. However, in other areas the promise of HMMs has only just begun to be recognised.

Individual level

Existential state

At the level of an individual organism, a fundamental measure of existence is to be alive or not (i.e. dead or unborn). We will therefore begin by demonstrating that one of the oldest and most popular inferential tools in wildlife ecology, the Cormack-Jolly-Seber (CJS) model of survival (Williams et al., 2002), is a special case of an HMM. The CJS model estimates survival probabilities ( $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0102$ ) from capture–recapture data. Capture–recapture data consist of $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0103$ sequences of encounter histories for marked individuals collected through time, where for each individual the observed data are represented as a binary series of ones and zeros. For the CJS model, $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0104$ indicates a marked individual was alive and detected at time $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0105$ , while $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0106$ indicates non-detection. Marked individuals can either be alive or dead at time $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0107$ , but the ‘alive’ state is only partially observable and the ‘dead’ state is completely unobservable. Under this observation process, if $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0108$ it is known that the individual survived from time $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0109$ to time $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0110$ (with probability $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0111$ ) and was detected with probability $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0112$ . 
However, when $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0113$ there are two possibilities: (1) the individual survived to time $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0114$ (with probability $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0115$ ) but was not detected (with probability $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0116$ ); or (2) the individual did not survive from time $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0117$ to time $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0118$ (with probability $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0119$ ).

Although not originally described as such, the CJS model is simply a two-state HMM that conditions on first capture. Framing the observed and hidden processes within the dependence structure of a basic HMM (Fig. 2), we could for example have:

The state-dependent observation distribution for $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0120$ is a simple Bernoulli (i.e. a coin flip) with success probability $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0121$ if alive and success probability $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0122$ if dead:

We thus have the initial distribution

state transition probability matrix

and state-dependent observation distribution matrix

The CJS model is thus a very simple HMM with an absorbing ‘dead’ state and only two unknown parameters ($\phi$ and $p$). As an HMM, it can be used not only to estimate survival, but also to infer the point in time when any given individual was most likely to have died (based on local or global state decoding; see Table 1).
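The HMM formulation also provides the likelihood of an encounter history through the forward algorithm. The following Python sketch assumes the two-state structure described above (states ordered ‘alive’, ‘dead’), with `phi` and `p` as assumed names for the survival and detection probabilities. Both are held constant over time, and the values used in the example are hypothetical.

```python
from itertools import product


def cjs_likelihood(history, phi, p):
    """Forward-algorithm likelihood of a 0/1 encounter history, conditional on first capture."""
    if history[0] != 1:
        raise ValueError("history must start at the occasion of first capture")
    gamma = [[phi, 1.0 - phi],   # alive -> alive, alive -> dead
             [0.0, 1.0]]         # dead is an absorbing state

    def obs_probs(x):
        # P(x | alive), P(x | dead): dead individuals are never detected
        return [p, 0.0] if x == 1 else [1.0 - p, 1.0]

    alpha = [1.0, 0.0]  # alive with certainty at first capture
    for x in history[1:]:
        pred = [sum(alpha[i] * gamma[i][j] for i in range(2)) for j in range(2)]
        dens = obs_probs(x)
        alpha = [pred[j] * dens[j] for j in range(2)]
    return sum(alpha)


phi, p = 0.8, 0.5  # hypothetical survival and detection probabilities
lik_101 = cjs_likelihood([1, 0, 1], phi, p)
total = sum(cjs_likelihood([1, a, b], phi, p) for a, b in product([0, 1], repeat=2))
```

For the history 1, 0, 1 the recursion returns `phi * (1 - p) * phi * p`, which is the familiar CJS expression: the individual must have survived both intervals, was missed once and was detected once. Summing over all histories that begin with a capture returns one, as expected for a model that conditions on first capture.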

The classic Jolly-Seber capture–recapture model and its various extensions (Pradel, 1996; Williams et al., 2002) go a step further by incorporating both birth and death processes. This simply involves extending the two-state model with an additional ‘unborn’ (UB) state. We could for example now have:

To formulate a three-state HMM with an additional ‘unborn’ state, we must extend our components for the hidden and observed processes accordingly:

and

where

$urn:x-wiley:1461023X:media:ele13610:ele13610-math-0126$

$urn:x-wiley:1461023X:media:ele13610:ele13610-math-0127$ is the probability that an individual was already in the population at the beginning of the study, $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0128$ is the probability that any given individual was born at time $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0129$ , and $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0130$ is the probability that an individual entered the population on occasion $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0131$ given it had not already entered up to that time. Importantly, note that the two-state and three-state HMMs rely on the exact same binary data $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0132$ , but we are able to make additional inferences in the three-state model by re-formulating the observed and hidden processes in terms of both birth and death. While we have employed these well-known individual-level capture–recapture models to initially demonstrate the key idea of linking observed state-dependent processes to the underlying state dynamics via HMMs, these types of inferences are not limited to traditional capture–recapture observation processes. For example, telemetry and count data can also be used in HMMs describing individual-level birth and death processes (Schmidt et al., 2015; Cowen et al., 2017).
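The three-state formulation can be sketched in the same way. The Python example below is only one possible reading of the structure described above and rests on several assumptions: states are ordered ‘unborn’, ‘alive’, ‘dead’; `pi` is the probability of already being in the population at the first occasion; `b[t]` is the unconditional probability of entering between occasions `t + 1` and `t + 2`; and `phi` and `p` are constant survival and detection probabilities. All names and values are hypothetical.

```python
from itertools import product


def conditional_entry(pi, b):
    """Convert unconditional entry probabilities into probabilities of entering
    given that the individual has not yet entered."""
    beta, remaining = [], 1.0 - pi
    for b_t in b:
        beta.append(b_t / remaining if remaining > 0 else 0.0)
        remaining -= b_t
    return beta


def js_likelihood(history, pi, b, phi, p):
    """Forward-algorithm likelihood for states (unborn, alive, dead)."""
    beta = conditional_entry(pi, b)

    def dens(x):
        # Only 'alive' individuals can be detected.
        return [0.0, p, 0.0] if x == 1 else [1.0, 1.0 - p, 1.0]

    alpha = [a * d for a, d in zip([1.0 - pi, pi, 0.0], dens(history[0]))]
    for t, x in enumerate(history[1:]):
        gamma = [[1.0 - beta[t], beta[t], 0.0],
                 [0.0, phi, 1.0 - phi],
                 [0.0, 0.0, 1.0]]
        pred = [sum(alpha[i] * gamma[i][j] for i in range(3)) for j in range(3)]
        alpha = [pr * d for pr, d in zip(pred, dens(x))]
    return sum(alpha)


pi, b, phi, p = 0.5, [0.25, 0.25], 0.8, 0.5  # hypothetical values
beta = conditional_entry(pi, b)
lik_111 = js_likelihood([1, 1, 1], pi, b, phi, p)
total = sum(js_likelihood(list(h), pi, b, phi, p) for h in product([0, 1], repeat=3))
```

Unlike the two-state sketch, this version does not condition on first capture, so the all-zero history of an individual that is never detected has positive probability and the likelihoods of all possible binary histories sum to one.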

Developmental state

Individual-level data often contain additional information about developmental states such as those related to size (Nichols et al., 1992), reproduction (Nichols et al., 1994), social groups (Marescot et al., 2018) or disease (Benhaiem et al., 2018). However, assigning individuals to states can be difficult when traits such as breeding (Kendall et al., 2012), infection (Chambert et al., 2012), sex (Pradel et al., 2008) or even species (Runge et al., 2007) are ascertained through observations in the field. This difficulty has motivated models for individual histories that can not only account for multiple developmental states (Lebreton et al., 2009), but also uncertainty arising from partially or completely unobservable states (Pradel, 2005). Such multi-state models can be used for testing a broad range of formal biological hypotheses, including host–pathogen dynamics in disease ecology (Lachish et al., 2011), reproductive costs in evolutionary ecology (Garnier et al., 2016) and social dominance in behavioural ecology (Dupont et al., 2015). For example, it is straightforward to extend the capture–recapture HMM to multiple ‘alive’ states parameterised in terms of state-specific survival probabilities $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0133$ and transition probabilities between these ‘alive’ states $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0134$ . Consider a $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0135$ -state HMM for capture–recapture data that incorporates reproductive status, where $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0136$ indicates ‘alive and breeding’ and $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0137$ indicates ‘alive and non-breeding’:

and

where $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0138$ is an indicator function taking the value 1 when $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0139$ and 0 otherwise. To assess the costs of reproduction, a biologist will be interested in the probability of breeding in year $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0140$ , given breeding $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0141$ or not $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0142$ in year $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0143$ , as well as assessing any differences in survival probability between breeders $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0144$ and non-breeders $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0145$ . By simply re-expressing the $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0146$ , $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0147$ and $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0148$ components in terms of the specific state and observation processes of interest, such models can be used to infer the dynamics of conjunctivitis in house finches (Conn and Cooch, 2009), senescence in deer (Choquet et al., 2011), reproduction in Florida manatees (Kendall et al., 2012), interspecific competition between ungulates (Gamelon et al., 2020) and life-history trade-offs in elephant seals (Lloyd et al., 2020). Similar HMMs can also be used to investigate relationships between life-history traits and demographic parameters that are important in determining the fitness of phenotypes or genotypes (Stoelting et al., 2015). Several measures of individual fitness have been proposed, but one commonly used for field studies is lifetime reproductive success (Rouan et al., 2009; Gimenez and Gaillard, 2018). These approaches can be readily adapted to quantify other measures of fitness (McGraw and Caswell, 1996; Link et al., 2002; Coulson et al., 2006; Marescot et al., 2018).
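To illustrate how such a multi-state structure can be assembled, the Python sketch below builds a transition probability matrix for the states ‘alive and breeding’, ‘alive and non-breeding’ and ‘dead’, assuming that survival is followed by a transition between breeding states. It then uses the standard absorbing Markov chain result to compute the expected number of breeding occasions before death. This is one simple fitness-related summary, not the estimator used in the studies cited above. The parameter names (`phi_b`, `phi_n`, `psi_bb`, `psi_nb`) and their values are hypothetical.

```python
def breeding_transition_matrix(phi_b, phi_n, psi_bb, psi_nb):
    """Rows and columns ordered as (breeding, non-breeding, dead).

    phi_b, phi_n: survival of breeders and non-breeders;
    psi_bb, psi_nb: probability of breeding next year given breeding or not this year.
    """
    return [
        [phi_b * psi_bb, phi_b * (1.0 - psi_bb), 1.0 - phi_b],
        [phi_n * psi_nb, phi_n * (1.0 - psi_nb), 1.0 - phi_n],
        [0.0, 0.0, 1.0],
    ]


def expected_breeding_occasions(gamma):
    """Expected number of occasions spent breeding (including the current one)
    for an individual that starts as a breeder: element (1, 1) of (I - Q)^-1,
    where Q is the 2 x 2 block of transitions among the 'alive' states."""
    a, b = gamma[0][0], gamma[0][1]
    c, d = gamma[1][0], gamma[1][1]
    det = (1.0 - a) * (1.0 - d) - b * c
    return (1.0 - d) / det


# Hypothetical cost of reproduction: breeders survive less well and are less
# likely to breed again than non-breeders.
gamma = breeding_transition_matrix(phi_b=0.8, phi_n=0.9, psi_bb=0.5, psi_nb=0.75)
n_breeding = expected_breeding_occasions(gamma)
```

Under these hypothetical values a current breeder is expected to breed on roughly four occasions in total. Lowering `psi_bb` or `phi_b` reduces this number, which is how a cost of reproduction would show up in the state process.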

Inferences about developmental states are of course not limited to traditional capture–recapture data, and significant advancements in animal-borne biotelemetry technology have brought many new and exciting opportunities (Cooke et al., 2004; Hooten et al., 2017; Patterson et al., 2017). For example, telemetry location data can be used to identify migratory phases (Weng et al., 2007), predation events (Franke et al., 2006) or the torpor-arousal cycle of hibernation (Hope and Jones, 2012). The multi-state (i.e. hidden Markov) movement model is often used to infer these types of movement behaviour modes from trajectories in two-dimensional space, where the observations are typically expressed in terms of the bivariate sequence of Euclidean distances (or ‘step lengths’) and turning angles between consecutive locations (Franke et al., 2004; Morales et al., 2004). For a model involving $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0149$ states that assumes conditional independence between step length ( $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0150$ ; in meters) and turning angle ( $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0151$ ; in radians) as in Fig. 4d, we could for example have:

These states could correspond to ‘resident’ (state 1) and ‘transient’ (state 2) behavioural phases, such that within state 2 the movements tend to be longer and directionally persistent (i.e. with turning angles concentrated near zero). When assuming conditional independence of the observations, the bivariate state-dependent distribution for $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0152$ is simply the product of two univariate state-dependent distributions,

$urn:x-wiley:1461023X:media:ele13610:ele13610-math-0153$

These univariate distributions are typically assumed to be the gamma or Weibull distribution for step length and the von Mises or wrapped Cauchy distribution for turning angle. Unlike in our previous examples, the number of underlying states in these types of HMMs is generally not clear a priori and needs to be selected based on both biological and statistical criteria (Pohle et al., 2017). Another difference is that there is often no predetermined structure in the state transition probability matrix,

and all entries are freely estimated (but still subject to $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0154$ ). As a consequence, the characteristics of the model states as represented by the state-dependent distributions are fully data driven, and hence may not correspond exactly to biologically meaningful entities (see IMPLEMENTATION, CHALLENGES AND PITFALLS).
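The product form of the bivariate state-dependent distribution can be sketched as follows, assuming a gamma distribution for step length and a von Mises distribution for turning angle. The Python code uses only the standard library, so the modified Bessel function of order zero is evaluated with a truncated power series. The state labels and parameter values are hypothetical and chosen only to mimic ‘resident’ and ‘transient’ behaviour.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density for a step length x (same units as scale)."""
    if x <= 0:
        return 0.0
    log_f = ((shape - 1.0) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_f)


def bessel_i0(kappa, terms=50):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((m + 1) ** 2)
    return total


def von_mises_pdf(angle, mu, kappa):
    """von Mises density for a turning angle in radians."""
    return math.exp(kappa * math.cos(angle - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def joint_density(step, angle, par):
    """Conditional independence: product of the two univariate densities."""
    return (gamma_pdf(step, par["shape"], par["scale"])
            * von_mises_pdf(angle, par["mu"], par["kappa"]))


# Hypothetical state-dependent parameters (step lengths in meters).
params = {
    "resident": {"shape": 2.0, "scale": 50.0, "mu": 0.0, "kappa": 0.2},
    "transient": {"shape": 3.0, "scale": 400.0, "mu": 0.0, "kappa": 2.0},
}
long_straight = {s: joint_density(1000.0, 0.0, par) for s, par in params.items()}
short_reversal = {s: joint_density(50.0, math.pi, par) for s, par in params.items()}
```

With these values a long, straight step is far more plausible under the ‘transient’ state, whereas a short step with a reversal in direction is more plausible under the ‘resident’ state. These densities are what the forward algorithm weighs against the transition probabilities at each time step.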

Similar HMMs for animal movement have been used, inter alia, to identify wolf kill-sites (Franke et al., 2006), the relationship between southern bluefin tuna behaviour and ocean temperature (Patterson et al., 2009), activity budgets for harbour seals (McClintock et al., 2013), hunting strategies of white sharks (Towner et al., 2016), the behavioural response of northern gannets to frontal activity (Grecian et al., 2018) and how common noctules adjust their space use to the lunar cycle (Roeleke et al., 2018). Driven by the influx of new biotelemetry sensor technology, HMMs have also been used to analyse the sequences of dives of marine animals (Hart et al., 2010; Quick et al., 2017; DeRuiter et al., 2017; van Beest et al., 2019). The remote collection of activity data at potentially very high temporal resolutions using accelerometers is another emerging application area (Diosdado et al., 2015; Leos-Barajas et al., 2017b; Papastamatiou et al., 2018a,b; Adam et al., 2019b). These HMM formulations are conceptually very similar to the movement model outlined above, with the state process corresponding to behavioural modes (or at least proxies thereof), and the activity data represented by the state-dependent process. Fig. 5 illustrates a possible workflow for inferring four behavioural modes from high-resolution accelerometer data collected from a striated caracara (Phalcoboenus australis) over a period of 1 hour. Here the vector of dynamic body acceleration was used as a univariate summary of the three-dimensional raw acceleration data, and a gamma distribution was used for the state-dependent observation process. In this example, the HMM can be regarded as a clustering scheme which maps observed input data to unobserved underlying classes with biological interpretations roughly corresponding to ‘resting’, ‘minimal activity’ (e.g. preening), ‘moderate activity’ (e.g. walking, digging) and ‘flying’. 
Complete details of this analysis, including each step of the workflow and example R (R Core Team, 2019) code, can be found in the Supplementary Tutorial.

Spatial state

HMMs can also be used for inferences about the unobserved spatial location of an individual. For example, capture–recapture data can consist of sequences of observations arising from a set of discrete spatial states, where these often refer to ecologically important geographic areas, such as wintering and breeding sites for migratory birds (Brownie et al., 1993) or spawning sites for fish (Schwarz et al., 1993). For a $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0155$ -state HMM with two sites (A and B), where $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0156$ indicates ‘alive at site A’ and $urn:x-wiley:1461023X:media:ele13610:ele13610-math-0157$ indicates ‘alive at site B’, we could for example have:

Clearly, this discrete-space HMM is structurally identical to the multi-state capture–recapture HMMs described in the previous section; the only difference is that the state transition probability parameters are now interpreted as site-specific survival and movement probabilities between the sites (e.g. fidelity or dispersal; Lagrange et al., 2014; Cayuela et al., 2020). Based on global state decoding, these HMMs can therefore also be used to infer the most likely spatial state for periods when an individual was alive but its location was not observed.
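Global state decoding of this kind is usually carried out with the Viterbi algorithm. The Python sketch below applies it to a hypothetical two-site model with states ‘A’, ‘B’ and ‘dead’, where an observation is either the site of detection or 0 for non-detection. The survival, movement and detection values are invented for illustration. The initial distribution and first-occasion detection term are included only for simplicity: conditioning on first capture would rescale every path by the same constant and so would not change the decoded sequence.

```python
import math

STATES = ["A", "B", "dead"]


def safe_log(x):
    return math.log(x) if x > 0 else float("-inf")


def viterbi(history, delta, gamma, emission):
    """Most likely state sequence given the observations (global decoding)."""
    n = len(STATES)
    score = [safe_log(delta[j]) + safe_log(emission(j, history[0])) for j in range(n)]
    back = []
    for x in history[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + safe_log(gamma[i][j]))
            pointers.append(best)
            score.append(prev[best] + safe_log(gamma[best][j]) + safe_log(emission(j, x)))
        back.append(pointers)
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards in time
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


p = 0.6  # hypothetical detection probability at either site


def emission(j, x):
    if STATES[j] == "dead":
        return 1.0 if x == 0 else 0.0
    if x == 0:
        return 1.0 - p
    return p if x == STATES[j] else 0.0


# Hypothetical survival 0.9 at both sites; movement A -> B 0.3 and B -> A 0.1.
gamma = [[0.9 * 0.7, 0.9 * 0.3, 0.1],
         [0.9 * 0.1, 0.9 * 0.9, 0.1],
         [0.0, 0.0, 1.0]]
delta = [0.5, 0.5, 0.0]
decoded_gap = viterbi(["A", 0, "B"], delta, gamma, emission)
decoded_end = viterbi(["A", 0, 0], delta, gamma, emission)
```

In this hypothetical setting the unobserved middle occasion of the history A, 0, B is decoded as site B, whereas the history A, 0, 0 is decoded as death immediately after the first capture.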

Another important application of HMMs is for geolocation based on indirect measurements that vary with space, such as light, pressure, temperature and tidal patterns (Thygesen et al., 2009; Rakhimberdiev et al., 2015). Although too technical to be described in detail here, geolocation HMMs can be particularly useful for inferring individual location from archival tag data (Basson et al., 2016). These HMMs have even been extended to include state-switching behaviours such as those described in the previous section (Pedersen et al., 2008, 2011b). Animal movement behaviour HMMs have also been extended to accommodate partially observed location data common to marine mammal satellite telemetry studies (Jonsen et al., 2005; McClintock et al., 2012).

Population level

We consider two ways that inference on the population level can arise: (1) an individual-level model, based on data from multiple individuals (e.g. capture–recapture), quantitatively connected to a population-level concept through an explicit model; or (2) a population-level model, based on population-level data (e.g. counts or presence–absence), with no explicit model for processes at the individual level.