Individual differences in male reproductive success drive genetic drift and natural selection, altering genetic variation and phenotypic trait distributions in future generations. Therefore, identifying the determinants of reproductive success is important for understanding the ecology and evolution of plants. Here, based on the spatially explicit mating model (the neighborhood model), we develop a hierarchical probability model that links co-dominant genotypes of offspring and candidate parents with phenotypic determinants of male reproductive success. The model accounts for pollen dispersal and genotyping errors, as well as individual variation in selfing, pollen immigration, and differentiation of immigrant pollen pools. Unlike the classic neighborhood model approach, our approach is specifically designed to account for excessive variation (overdispersion) in male fecundity. We implemented a Bayesian estimation method (the Windows computer program available at: https://www.ukw.edu.pl/pracownicy/plik/igor_chybicki/1806/) that, among other things, allows for selecting phenotypic variables important for male fecundity and assessing the fraction of variance in fecundity (R²) explained by the selected variables. Simulations showed that our method outperforms both the classic neighborhood model and the two-step approach, where fecundities and the effects of phenotypic variables are estimated separately. The analysis of two data examples showed that in wind-pollinated trees, male fecundity depends on both the amount of pollen produced and the ability to disperse pollen. However, although tree size was positively correlated with male fecundity, it explained only a fraction of the total variance in fecundity, indicating the presence of additional factors. Finally, the case studies highlighted the importance of accounting for pollen dispersal in the estimation of fecundity determinants.

1 INTRODUCTION

Individuals in a sexually reproducing population can differ substantially in their reproductive success. The uneven contribution of individuals to the next generation is involved in genetic drift and natural selection mechanisms, altering genetic variation and phenotypic trait distributions in future generations. Therefore, understanding the determinants of reproductive success takes a central position in ecology, evolutionary and conservation biology (Knight, 2003; Lyons et al., 1989; Winter et al., 2008).

Much of the emphasis has been placed on identifying phenotypic variation that determines individual male reproductive success in plant populations (Meagher, 1991; Morgan & Conner, 2001; Smouse & Meagher, 1994; Smouse et al., 1999; Snow & Lewis, 1993). To overcome the difficulty related to the hidden nature of male reproductive success, marker-assisted paternity analysis has been used as a standard tool for estimating the numbers of effective pollen gametes contributed by pollen donors (Ashley, 2010; Snow & Lewis, 1993). In a simplistic approach, these numbers can be regressed on phenotypic variables to identify determinants of male reproductive success (Born et al., 2008; Hopley et al., 2015; Setsuko & Tomaru, 2011; Tambarussi et al., 2015). However, a fixed spatial distribution of adult plants together with spatially restricted pollen dispersal imposes natural constraints on mating opportunities, generating a nonrandom structure of pollen gametes captured by individual mother plants even under uniform fecundity (Meagher, 1991; Smouse et al., 2001). Therefore, it has become evident that the number of sired seeds is actually a function of two major factors acting simultaneously, that is, pollen dispersal capability and male reproductive potential (or male fecundity; Meagher, 1991; see also Figure 2B in Klein et al. (2008) for a schematic illustration). These factors cannot be separated easily and, instead, need to be accounted for simultaneously in the analysis of determinants of male reproductive success.

With the advent of the neighborhood model, the effects of pollen dispersal and fecundity have started to be explicitly taken into account (Adams & Birkes, 1989). The model came with the brilliant idea of embedding a regression analysis of variable effects into the fractional paternity analysis, where all possible pollen sources are modelled using a compositional approach. Consequently, any uncertainty about the pollen source contributes automatically to the precision of estimates of regression parameters. The neighborhood model approach allowed combining genetic and nongenetic information to increase the power of the paternity assignment (Hadfield et al., 2006). Therefore, besides the practical benefit, the neighborhood model offered a substantial statistical advantage over the approach based on a simple paternity assignment.

In the original neighborhood model, hereafter called the classic neighborhood model, the effects of phenotypic characters on male fecundity were assessed using a fixed-effects model. In this approach, slopes of a regression function (selection gradients; Lande & Arnold, 1983) were estimated under the assumption that variation in reproductive success follows the variation expected for a multinomial distribution with probabilities being a (soft-max) function of a linear combination of measured phenotypic variables. Hence, any excessive variance, often termed overdispersion, in male reproductive success can lead to statistical issues, namely to the elevated risk of false positives (or an extreme anticonservatism; Hadfield et al., 2010), as first reported by Klein and colleagues (2008, 2011). Although overdispersion can be stochastic, it often results from variables that contribute to fecundity variation but are not included in the regression function.

The classic neighborhood model has been adjusted specifically to avoid various problems occurring in the case of real data (Burczyk & Chybicki, 2004; Chybicki, 2018; Chybicki & Burczyk, 2010; Gérard et al., 2006; Oddou-Muratorio et al., 2005). However, none of these modifications substantially changed the built-in regression analysis. In this respect, an important step forward was the approach of Klein et al. (2008), who proposed to treat individual fecundities as random variables characterized by an arbitrarily assumed probability distribution, replacing the fixed-effects model of fecundity with a random-effects model. Using the hierarchical Bayesian approach, individual fecundities were made estimable parameters along with the dispersal kernel and the other components of the model. Simulations showed that the model effectively captures the actual variance in male fecundity (Klein et al., 2011), resolving the problem of occasional overdispersion.

Despite a clear statistical advantage, the random-effects model may be considered a half-step forward, especially if a study aims to identify phenotypic effects rather than variation in fecundity alone. This is because the estimation of selection gradients requires estimates of individual fecundities in the first step to proceed with a regression analysis in the second step. Although such an approach can be generally valid, it certainly under-exploits available data compared to the classic neighborhood model because potential determinants of fecundity are completely ignored in the estimation of fecundity. As a result, the effectiveness of the two-step approach based on the random-effects model may be questioned.

This study aimed at extending the classic neighborhood model in order to quantify effects of phenotypic variables on male reproductive success in the face of overdispersion in male fecundity. Drawing advantages from both the classic neighborhood model and the random-effects model as well as from recent statistical developments (Chybicki et al., 2019), we elaborated a new approach to study ecological determinants of male reproductive success in plant populations. The new approach was subjected to a simulation study to reveal basic statistical properties related to sampling design. In addition, the new approach was compared to the classic neighborhood model and the two-step approach based on the random-effects model. Finally, two real data examples were analysed to illustrate the potential of the method in studying plant populations.

2 MATERIALS AND METHODS

2.1 The hierarchical neighborhood model

The model considers the genealogy of a sample of offspring grouped into maternal families, given that genotypes of the offspring, mothers, and candidate fathers are known. In addition, spatial locations of individuals are taken into account in order to reflect nonrandom pollen dispersal. The total probability for the genealogy of the i-th offspring in the family of the j-th adult individual (mother) ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0001$ ) is

$urn:x-wiley:1755098X:media:men13307:men13307-math-0002$ (1)

where $urn:x-wiley:1755098X:media:men13307:men13307-math-0003$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0004$ , and $urn:x-wiley:1755098X:media:men13307:men13307-math-0005$ are (Mendelian) probabilities for the offspring's genealogy given the genotype of the offspring ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0006$ ), the genotype of the j-th adult individual (mother, $urn:x-wiley:1755098X:media:men13307:men13307-math-0007$ ), the genotype of the k-th candidate father ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0008$ ) and the background gene pool for the j-th family ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0009$ ); $urn:x-wiley:1755098X:media:men13307:men13307-math-0010$ is the probability that the j-th mother receives pollen from the background population; $urn:x-wiley:1755098X:media:men13307:men13307-math-0011$ is the probability that the j-th mother produces offspring through self-fertilization; $urn:x-wiley:1755098X:media:men13307:men13307-math-0012$ is the probability that the j-th mother receives pollen from one of the candidate fathers; and $urn:x-wiley:1755098X:media:men13307:men13307-math-0013$ is the probability that the k-th father fertilizes the j-th mother given that the pollen gamete comes from one of the sampled candidate fathers. In this model, for the j-th family $urn:x-wiley:1755098X:media:men13307:men13307-math-0014$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0015$ , where $urn:x-wiley:1755098X:media:men13307:men13307-math-0016$ is the number of adult individuals in the sample.

The probability for the genealogy of the offspring sample is

$urn:x-wiley:1755098X:media:men13307:men13307-math-0017$ (2)

where $urn:x-wiley:1755098X:media:men13307:men13307-math-0018$ is the number of offspring. Detailed assumptions for the model's components are described in the following subsections.
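As an illustration of the mixture structure described for equations (1) and (2), the following minimal sketch combines the three pollen sources into a per-offspring probability and sums the logarithms over offspring. The function and argument names are hypothetical, the three pollen source probabilities are assumed to sum to one, and the Mendelian transition probabilities are assumed to be computed elsewhere; it is not the implementation used in the software.

```python
import math

def offspring_probability(p_background, p_self, p_fathers, m_j, s_j, phi_j):
    """Mixture probability for one offspring of mother j.

    p_background, p_self : Mendelian probabilities given the background
                           gene pool and given selfing (assumed precomputed)
    p_fathers            : Mendelian probabilities, one per candidate father
    m_j, s_j             : immigration and selfing probabilities of family j
    phi_j                : mating probabilities of the candidate fathers
                           (assumed to sum to one)
    """
    local = 1.0 - m_j - s_j  # assumption: the three sources sum to one
    if local < 0.0:
        raise ValueError("m_j + s_j must not exceed 1")
    outcross = sum(phi * p for phi, p in zip(phi_j, p_fathers))
    return m_j * p_background + s_j * p_self + local * outcross

def log_likelihood(offspring):
    """Sum of log mixture probabilities over all offspring (log of a product)."""
    return sum(math.log(offspring_probability(**o)) for o in offspring)
```

Because unsampled fathers and selfing enter as separate mixture components, an offspring that is incompatible with every candidate father still has a nonzero probability as long as the background term is positive.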

2.1.1 Pollen source probabilities

The classic neighborhood model assumes that $urn:x-wiley:1755098X:media:men13307:men13307-math-0019$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0020$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0021$ are the same for all sampled families. Here, as in earlier studies (Chybicki & Burczyk, 2013; Chybicki et al., 2019; Tani et al., 2015), we assumed that these probabilities might show inter-family variation. To do so, a vector $urn:x-wiley:1755098X:media:men13307:men13307-math-0022$ for the j-th family was assumed to follow a Dirichlet distribution with three hyper-parameters $urn:x-wiley:1755098X:media:men13307:men13307-math-0023$ .

2.1.2 Background population

Background pollen donors are incorporated into the model as allele frequencies in the background population ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0024$ ). In order to reflect potential differences in background pollen pools among families, for the j-th mother and the l-th locus, the vector of allele frequencies $urn:x-wiley:1755098X:media:men13307:men13307-math-0025$ was assumed to follow a Dirichlet distribution with parameters $urn:x-wiley:1755098X:media:men13307:men13307-math-0026$ , where F is the divergence rate and $urn:x-wiley:1755098X:media:men13307:men13307-math-0027$ is the frequency of the first allele at the l-th locus in a global background population (Chybicki, 2013). Here, F is very similar to the global $urn:x-wiley:1755098X:media:men13307:men13307-math-0028$ parameter in the TwoGener approach (Smouse et al., 2001) that captures the differentiation of background pollen pools between families.
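The following sketch shows one way family-specific background allele frequencies could be drawn around global frequencies. It assumes the common parameterization in which the Dirichlet parameters equal the global frequencies multiplied by (1 − F)/F; this scaling and all names are assumptions made for illustration and should be checked against Chybicki (2013).

```python
import random

def draw_family_allele_freqs(global_freqs, F, rng):
    """Draw allele frequencies for one family and one locus.

    Assumed parameterization: Dirichlet(p_1 * c, ..., p_A * c), c = (1 - F) / F,
    so that a smaller F keeps family pools closer to the global frequencies.
    """
    if not 0.0 < F < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    c = (1.0 - F) / F
    # A Dirichlet draw is a vector of independent gamma draws, normalized.
    gammas = [rng.gammavariate(p * c, 1.0) for p in global_freqs]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(1)
family_freqs = draw_family_allele_freqs([1.0 / 8] * 8, 0.05, rng)
```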

2.1.3 Male reproductive success

The probability $urn:x-wiley:1755098X:media:men13307:men13307-math-0029$ that the k-th candidate father contributes to the pollen pool of the j-th mother, given that pollen comes from local pollen donors, was assumed to be a function of both pollen dispersal and individual male fecundity:

$urn:x-wiley:1755098X:media:men13307:men13307-math-0030$ (3)

where $urn:x-wiley:1755098X:media:men13307:men13307-math-0031$ when the k-th adult individual can act as a pollen donor and $urn:x-wiley:1755098X:media:men13307:men13307-math-0032$ otherwise, $urn:x-wiley:1755098X:media:men13307:men13307-math-0033$ is the probability of pollen dispersal (dispersal kernel) between the k-th father and the mother of the j-th family, and $urn:x-wiley:1755098X:media:men13307:men13307-math-0034$ is the male reproductive potential (fecundity) of the k-th candidate father. Note that $urn:x-wiley:1755098X:media:men13307:men13307-math-0035$ adjusts $urn:x-wiley:1755098X:media:men13307:men13307-math-0036$ for knowledge about the sex of an individual.

To model pollen dispersal, the exponential-power function (Austerlitz et al., 2004)

$urn:x-wiley:1755098X:media:men13307:men13307-math-0037$ (4)

was assumed, where $urn:x-wiley:1755098X:media:men13307:men13307-math-0038$ is the Euclidean distance between the k-th candidate father and the j-th mother, $urn:x-wiley:1755098X:media:men13307:men13307-math-0039$ is the scale parameter, $urn:x-wiley:1755098X:media:men13307:men13307-math-0040$ is the shape parameter, and $urn:x-wiley:1755098X:media:men13307:men13307-math-0041$ is a normalizing constant that cancels out in equation (3). For the purpose of estimation, the function was parameterized with the mean forward dispersal distance $urn:x-wiley:1755098X:media:men13307:men13307-math-0042$ (Austerlitz et al., 2004), so that the scale parameter was substituted with $urn:x-wiley:1755098X:media:men13307:men13307-math-0043$ , where $urn:x-wiley:1755098X:media:men13307:men13307-math-0044$ is the standard gamma function.
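To make the roles of the kernel and of equation (3) concrete, the sketch below evaluates an unnormalized two-dimensional exponential-power kernel parameterized by the mean dispersal distance and turns kernel values and fecundities into mating probabilities. The relation between scale and mean distance is the standard one for this kernel (mean = scale × Γ(3/b)/Γ(2/b)); the function names and the symbols `delta` and `b` are illustrative assumptions.

```python
import math

def scale_from_mean_distance(delta, b):
    """Scale parameter giving mean dispersal distance delta for shape b."""
    return delta * math.gamma(2.0 / b) / math.gamma(3.0 / b)

def kernel(d, delta, b):
    """Unnormalized exponential-power kernel; the constant cancels later."""
    a = scale_from_mean_distance(delta, b)
    return math.exp(-((d / a) ** b))

def mating_probabilities(distances, fecundities, is_donor, delta, b):
    """Share of each candidate father in one mother's local pollen pool.

    is_donor holds 1 for individuals that can act as pollen donors, else 0.
    """
    weights = [g * kernel(d, delta, b) * f
               for g, d, f in zip(is_donor, distances, fecundities)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: three candidates, the last one cannot donate pollen.
phi = mating_probabilities([10.0, 50.0, 20.0], [1.0, 2.0, 5.0], [1, 1, 0], 50.0, 0.5)
```

Since each weight is divided by the sum over all candidates, any constant multiplying the kernel or the fecundities leaves the probabilities unchanged.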

The male fecundity was assumed to be a random variable, such that the logarithm of fecundity $urn:x-wiley:1755098X:media:men13307:men13307-math-0045$ is normally distributed (see Klein et al., 2008). Specifically, we assumed that the expected log-fecundity of the k-th individual is $urn:x-wiley:1755098X:media:men13307:men13307-math-0046$ , that is $urn:x-wiley:1755098X:media:men13307:men13307-math-0047$ , where $urn:x-wiley:1755098X:media:men13307:men13307-math-0048$ is the mean and $urn:x-wiley:1755098X:media:men13307:men13307-math-0049$ is the standard deviation.

2.1.4 Selection gradients

In order to incorporate the effects of phenotypic variables on male reproductive potential, $urn:x-wiley:1755098X:media:men13307:men13307-math-0050$ was assumed to be a linear combination of variables $urn:x-wiley:1755098X:media:men13307:men13307-math-0051$ , where $urn:x-wiley:1755098X:media:men13307:men13307-math-0052$ is the m-th (normalized) variable measured for the k-th individual. Because the neighborhood model implements the expected proportion of pollen gametes of the k-th male in a given maternal pollen pool, the $urn:x-wiley:1755098X:media:men13307:men13307-math-0053$ term in equation (3), which appears in both the numerator and the denominator, needs to be defined only up to a constant. For this reason, $urn:x-wiley:1755098X:media:men13307:men13307-math-0054$ lacks a constant term, which would eventually cancel out in (3). It follows that $urn:x-wiley:1755098X:media:men13307:men13307-math-0055$ can be used as a reference mean value of the relative log-fecundity, so that $urn:x-wiley:1755098X:media:men13307:men13307-math-0056$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0057$ mean that the k-th individual has greater or lower fecundity, respectively, than the average-valued male. The null model for covariates, when $urn:x-wiley:1755098X:media:men13307:men13307-math-0058$ , is equivalent to the assumption $urn:x-wiley:1755098X:media:men13307:men13307-math-0059$ .

The parameter $urn:x-wiley:1755098X:media:men13307:men13307-math-0060$ models overdispersion, that is the variation in male fecundity that is not captured by the regression function. Consequently, for $urn:x-wiley:1755098X:media:men13307:men13307-math-0061$ diminishing to zero, the hierarchical neighborhood model becomes equivalent to the classic neighborhood model, that is $urn:x-wiley:1755098X:media:men13307:men13307-math-0062$ , which, therefore, can be treated as a special case of the hierarchical model.
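The two properties stated above (a constant term cancels, and a vanishing residual standard deviation recovers a deterministic soft-max model) can be checked with a small sketch. The names `betas`, `sigma` and the helper functions are hypothetical and serve only to illustrate the log-normal fecundity model with a linear predictor.

```python
import math
import random

def expected_log_fecundity(x_k, betas):
    """Linear predictor without an intercept (normalized covariates)."""
    return sum(b * x for b, x in zip(betas, x_k))

def draw_fecundities(X, betas, sigma, rng):
    """Log-normal fecundities: log f_k ~ Normal(mu_k, sigma)."""
    return [math.exp(rng.gauss(expected_log_fecundity(x, betas), sigma))
            for x in X]

def relative_fecundity(fecundities):
    """Fecundities rescaled to sum to one, as they act in the mating model."""
    total = sum(fecundities)
    return [f / total for f in fecundities]

rng = random.Random(0)
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
betas = [0.5, -0.2]

# sigma = 0: fecundity is fully determined by the covariates (classic model).
f_classic = draw_fecundities(X, betas, 0.0, rng)
# Adding an intercept of 3.0 multiplies every fecundity by exp(3.0).
f_shifted = [f * math.exp(3.0) for f in f_classic]
```

Comparing `relative_fecundity(f_classic)` with `relative_fecundity(f_shifted)` shows identical values, which is why the regression function needs no constant term.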

2.2 Estimation

The hierarchical neighborhood model has multiple estimable parameters. Parameters of interest include pollen source probabilities, dispersal kernel parameters, background pollen divergence, and selection gradients. In addition, allele frequencies in background pollen pools need to be estimated as well, but they are usually less important in empirical studies (they are nuisance parameters). Individual fecundity values can be treated as either nuisance parameters or goal parameters, depending on the context.

In the classic neighborhood model approach, the parameters are estimated via maximizing the log-likelihood function:

$urn:x-wiley:1755098X:media:men13307:men13307-math-0063$ (6)

Nonetheless, in the hierarchical model, selection gradients as well as $urn:x-wiley:1755098X:media:men13307:men13307-math-0064$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0065$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0066$ do not enter the likelihood function. To make the estimation of these parameters possible, a hierarchical Bayesian approach can be used. In this regard, our study followed the methods developed earlier (Chybicki, 2013; Chybicki & Burczyk, 2013; Klein et al., 2008). These methods implemented the Markov chain Monte Carlo (MCMC) algorithm to approximate the posterior distribution of parameters. Here, a Bayesian variable selection procedure was additionally implemented using the reversible-jump MCMC (RJMCMC) algorithm (details are given in Appendix S1), similar to that described in Chybicki et al. (2019). The RJMCMC algorithm was used to narrow the focus to parsimonious regression models that tend to include only important selection gradients. The algorithm allows the posterior probabilities of the regression models to be estimated. Hence, one can select, almost automatically, those effects that remain in the model with the highest probability.
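Once a variable-selection sampler has been run, posterior model probabilities are commonly approximated by the share of retained iterations spent in each model. The sketch below assumes that each retained iteration is stored as a tuple of 0/1 inclusion indicators, one per phenotypic variable; this storage format and the function names are assumptions, and the sampler itself (Appendix S1) is not reproduced here.

```python
from collections import Counter

def posterior_model_probabilities(indicator_draws):
    """Relative frequency of each visited regression model."""
    counts = Counter(tuple(d) for d in indicator_draws)
    n = len(indicator_draws)
    return {model: c / n for model, c in counts.items()}

def inclusion_probabilities(indicator_draws):
    """Marginal posterior probability that each variable is in the model."""
    n = len(indicator_draws)
    n_vars = len(indicator_draws[0])
    return [sum(d[m] for d in indicator_draws) / n for m in range(n_vars)]

# Toy chain of ten retained iterations over three candidate variables.
draws = [(1, 1, 0)] * 6 + [(1, 0, 0)] * 3 + [(1, 1, 1)]
probs = posterior_model_probabilities(draws)
best_model = max(probs, key=probs.get)
```

Parameter summaries can then be restricted to the iterations in which the chain visited `best_model`.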

For a rough assessment of a regression model's quality, the fraction of the total variance that is captured by the model (R²) is usually of interest. While a Bayesian R² cannot be derived from the standard definition without problems, an empirical R² equivalent can be computed easily following Gelman et al. (2019), as $urn:x-wiley:1755098X:media:men13307:men13307-math-0067$ , where $urn:x-wiley:1755098X:media:men13307:men13307-math-0068$ is the variance of the modelled predicted means, and $urn:x-wiley:1755098X:media:men13307:men13307-math-0069$ is the modelled residual variance. In our case, this approach results in $urn:x-wiley:1755098X:media:men13307:men13307-math-0070$ , where $urn:x-wiley:1755098X:media:men13307:men13307-math-0071$ is the variance estimator of a variable indexed from k = 1 to N_A, and $urn:x-wiley:1755098X:media:men13307:men13307-math-0072$ . Therefore, the estimation algorithm was designed to estimate $urn:x-wiley:1755098X:media:men13307:men13307-math-0073$ along with the model parameters.
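A minimal sketch of the R² computation in the spirit of Gelman et al. (2019) is given below: for each posterior draw, the variance of the predicted mean log-fecundities across individuals is divided by the sum of that variance and the residual variance. The use of the n − 1 variance estimator and all names are assumptions made for illustration.

```python
import statistics

def r2_for_draw(X, betas, sigma):
    """R-squared for one posterior draw of slopes and residual SD."""
    mu = [sum(b * x for b, x in zip(betas, row)) for row in X]
    var_fit = statistics.variance(mu)   # variance of predicted means
    var_res = sigma ** 2                # modelled residual variance
    return var_fit / (var_fit + var_res)

def r2_posterior(X, beta_draws, sigma_draws):
    """One R-squared value per draw; summarize them like any parameter."""
    return [r2_for_draw(X, b, s) for b, s in zip(beta_draws, sigma_draws)]

X = [[-1.0], [0.0], [1.0]]
r2_values = r2_posterior(X, [[2.0], [2.0]], [2.0, 0.0])
```

Computing the ratio draw by draw yields a posterior distribution of R², so its uncertainty can be reported with the same interval summaries as the other parameters.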

2.3 Simulations

Because the method described here is an extension of the existing methods, many of its properties were already tested in simulations (Burczyk et al., 2002; Burczyk & Chybicki, 2004; Burczyk & Koralewski, 2005; Chybicki, 2013, 2018; Chybicki et al., 2019; Klein et al., 2011, 2013). Therefore, our simulations were designed to focus on (i) the ability of the novel method to properly select and quantify selection gradients in the face of different levels of residual variance as well as different sampling designs, (ii) comparisons between methods based on the classic neighborhood model and the hierarchical neighborhood model in the face of nonzero residual variance and (iii) comparisons between the hierarchical approach and the two-step approach based on estimated fecundities (i.e. the null hierarchical model + a separate regression analysis).

The reference simulation algorithm began with generating $urn:x-wiley:1755098X:media:men13307:men13307-math-0074$ hermaphrodite plants. For each plant, spatial co-ordinates were drawn randomly within a quadrate to get a population density of 20 individuals per hectare. Then, genotypes were randomly assigned at 16 loci, each with 8 alleles. Finally, for each plant, three independent phenotype variables were drawn from a normal distribution. Phenotypes were normalized to get mean 0 and variance 1. $urn:x-wiley:1755098X:media:men13307:men13307-math-0075$ plants were drawn to serve as mother plants. To simulate a sample of offspring, 20 offspring individuals per mother ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0076$ ) were generated as in Chybicki (2018). First, for each (k-th) candidate parent, the (expected) log-fecundity $urn:x-wiley:1755098X:media:men13307:men13307-math-0077$ was drawn from the normal distribution with the mean $urn:x-wiley:1755098X:media:men13307:men13307-math-0078$ and the standard deviation $urn:x-wiley:1755098X:media:men13307:men13307-math-0079$ , where $urn:x-wiley:1755098X:media:men13307:men13307-math-0080$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0081$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0082$ . Thus, the first variable had a strong positive effect; the second variable had a moderate negative effect, while the third variable did not affect male fecundity. Then, each candidate father was assigned the expected proportion of gametes in a local pool of outcross pollen of a given maternal individual. To simulate pollen dispersal, the shape parameter of the exponential-power function was set to 0.5, while the scale parameter was adjusted to get the mean distance of pollen dispersal of 50 metres. 
Each offspring in the j-th family was randomly assigned as a result of either self-pollination (with the probability $urn:x-wiley:1755098X:media:men13307:men13307-math-0083$ ), pollen immigration (with the probability $urn:x-wiley:1755098X:media:men13307:men13307-math-0084$ ), or local outcross pollination (with the probability $urn:x-wiley:1755098X:media:men13307:men13307-math-0085$ ). For the j-th family, the vector $urn:x-wiley:1755098X:media:men13307:men13307-math-0086$ was drawn from a Dirichlet distribution with parameters $urn:x-wiley:1755098X:media:men13307:men13307-math-0087$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0088$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0089$ . The average probabilities of self-pollination, pollen immigration, and local outcrossing were 0.020, 0.196, and 0.784, and 90% of values lay within 0–0.055, 0.025–0.443 and 0.536–0.965, respectively. For an offspring with a local outcross origin, the father was drawn at random according to the expected proportions, computed based on both the pollen dispersal kernel and fecundity. Offspring genotypes were generated according to Mendelian laws. Background allele frequencies for each family were drawn from the Dirichlet distribution assuming $urn:x-wiley:1755098X:media:men13307:men13307-math-0090$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0091$ (for $urn:x-wiley:1755098X:media:men13307:men13307-math-0092$ from 1 to 8). As a result, a sample eventually consisted of $urn:x-wiley:1755098X:media:men13307:men13307-math-0093$ 100 parents (including mothers) and $urn:x-wiley:1755098X:media:men13307:men13307-math-0094$ 400 progeny. Due to pollen immigration and self-pollination, about 300 progeny contained information about the distribution of local pollen dispersal and male fecundity.
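The assignment of pollen sources described above can be sketched as follows. The Dirichlet parameters (0.1, 1, 4) are an assumption chosen only because they reproduce the reported average probabilities (about 0.020, 0.196 and 0.784); the actual values used in the study, as well as all function names, should be treated as hypothetical.

```python
import random

def draw_dirichlet(alphas, rng):
    """Dirichlet draw obtained by normalizing independent gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def assign_pollen_source(s_j, m_j, rng):
    """Return 'self', 'immigrant' or 'local' for one offspring of family j."""
    u = rng.random()
    if u < s_j:
        return "self"
    if u < s_j + m_j:
        return "immigrant"
    return "local"

def draw_father(phi_j, rng):
    """Index of the local father, drawn with the expected proportions phi_j."""
    return rng.choices(range(len(phi_j)), weights=phi_j, k=1)[0]

rng = random.Random(42)
s_j, m_j, o_j = draw_dirichlet([0.1, 1.0, 4.0], rng)  # assumed parameters
sources = [assign_pollen_source(s_j, m_j, rng) for _ in range(20)]
fathers = [draw_father([0.0, 1.0, 0.0], rng) for _ in range(5)]
```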

We first aimed to study the effect of unexplained variance in fecundity. For this purpose, data were generated as in the reference simulation modifying $urn:x-wiley:1755098X:media:men13307:men13307-math-0095$ to get different overdispersion levels. The resulting $urn:x-wiley:1755098X:media:men13307:men13307-math-0096$ values for $urn:x-wiley:1755098X:media:men13307:men13307-math-0097$ concentrated on 0.95 (±0.01), 0.83 (±0.02) and 0.56 (±0.04), respectively. Additionally, to study the omitted variable effect, we used the reference data with the first phenotypic variable ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0098$ ) replaced by a random normal deviate.

To study the impact of sampling effort on the quality of estimates, the reference algorithm was modified as follows. First, we focused on the impact of progeny sample size so that instead of $urn:x-wiley:1755098X:media:men13307:men13307-math-0099$ = 20, we simulated $urn:x-wiley:1755098X:media:men13307:men13307-math-0100$ = 10 or 40 to get the total number of progeny $urn:x-wiley:1755098X:media:men13307:men13307-math-0101$ = 200 or 800 (instead of 400). Subsequently, we studied a potential effect of the proportion of the offspring to adults by setting $urn:x-wiley:1755098X:media:men13307:men13307-math-0102$ to $urn:x-wiley:1755098X:media:men13307:men13307-math-0103$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0104$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0105$ . To assess whether the number of families alters the quality of estimates, instead of $urn:x-wiley:1755098X:media:men13307:men13307-math-0106$ = 20, data were generated assuming $urn:x-wiley:1755098X:media:men13307:men13307-math-0107$ = 10 or 40, but the total number of offspring remained as in the reference simulation.

For each scenario, 100 replicates were simulated and subjected to the analysis with the NM2F software (Chybicki & Burczyk, 2013), modified appropriately to incorporate the RJMCMC algorithm. The posterior distribution was approximated with 100,000 MCMC updates (keeping every 20th update), after 20,000 initial iterations for pilot adjustment. The marginal posterior distributions of parameters were computed based on the subset of MCMC samples representing the most probable regression model. Estimates of selection gradients, the standard deviation $urn:x-wiley:1755098X:media:men13307:men13307-math-0108$ and $urn:x-wiley:1755098X:media:men13307:men13307-math-0109$ were characterized by the bias, the mean squared error (MSE), and the coverage of the 95% highest posterior density interval (HPDI), approximated as the shortest interval containing 95% of the parameter draws. In the case of $urn:x-wiley:1755098X:media:men13307:men13307-math-0110$ , the expected value was computed empirically for each simulated data set using $urn:x-wiley:1755098X:media:men13307:men13307-math-0111$ , where $urn:x-wiley:1755098X:media:men13307:men13307-math-0112$ is the expected residual of male fecundity for the k-th individual. The model selection procedure was summarized with the frequency of selection of the true vs. the null regression model and the frequency of selection of a regression model including the m-th phenotypic variable. In this way, the frequencies of false positives and false negatives were assessed.
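The summary statistics named above can be computed from posterior draws as in the following sketch. The HPDI is approximated as the shortest window of sorted draws containing the requested share of draws; bias, MSE and coverage are then taken across replicates. Function names and the small numerical tolerance are illustrative assumptions.

```python
import math

def hpdi(draws, mass=0.95):
    """Shortest interval containing at least `mass` of the draws."""
    x = sorted(draws)
    n = len(x)
    # Number of draws inside the interval; the tolerance guards against
    # floating-point error in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    start = min(range(n - k + 1), key=lambda i: x[i + k - 1] - x[i])
    return x[start], x[start + k - 1]

def summarize(estimates, intervals, truth):
    """Bias, mean squared error and interval coverage across replicates."""
    n = len(estimates)
    bias = sum(e - truth for e in estimates) / n
    mse = sum((e - truth) ** 2 for e in estimates) / n
    coverage = sum(lo <= truth <= hi for lo, hi in intervals) / n
    return bias, mse, coverage

# A right-skewed toy sample: the shortest 95% interval drops the outlier.
interval = hpdi(list(range(19)) + [100])
stats = summarize([1.0, 3.0], [(0.0, 2.0), (2.5, 4.0)], 2.0)
```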

The hierarchical neighborhood model was designed to deal with overdispersion in male fecundity that is not explained by selection gradients implemented in the classic neighborhood model. In order to compare the two methods under overdispersion in fecundity, data were generated as for the reference scenario, except that (i) pollen source probabilities $urn:x-wiley:1755098X:media:men13307:men13307-math-0113$ were assumed uniform across families, (ii) background pollen pools were assumed uniform across families (i.e. $urn:x-wiley:1755098X:media:men13307:men13307-math-0114$ = 0) and equal to the frequencies in the simulated population. These modifications allowed simulated data to meet exactly the assumptions of the classic neighborhood model implemented in the NMπ software (Chybicki, 2018) so that the unexplained variance in fecundity remained the only source of potential differences between methods.

Three dispersion levels were simulated, that is $urn:x-wiley:1755098X:media:men13307:men13307-math-0115$ . Then, data were analysed with NMπ (the maximum likelihood approach, the classic model) and NM2F (the Bayesian approach, the hierarchical model) software in parallel. In the case of NMπ, estimates were derived based on the best model after model selection performed using the likelihood ratio test. Because this part of the simulation study required manual handling of the software, only 20 replicates per scenario were analysed.

The hierarchical method performs the estimation of the effects of phenotypic variables on male fecundity in a single estimation step. Nonetheless, it is also possible to run the two-step analysis, where individual fecundities are first estimated based on the null model (equivalent to that developed by Klein et al., 2008) and subsequent regression analysis is performed to estimate a regression model for fecundity (as in Oddou-Muratorio et al., 2018). Because the quality of fecundity estimates depends on the amount of information used in a model, the hierarchical approach and the two-step approach can differ in this respect. Therefore, we compared the quality of estimated regression parameters based on the hierarchical approach and the two-step approach. We focused on simulated data for the three levels of sampling effort, that is $urn:x-wiley:1755098X:media:men13307:men13307-math-0116$ . For each level, we analysed 20 replicates, assuming the null regression model (with slopes fixed at zero). Posterior median estimates of individual male fecundity were then used to run a stepwise regression analysis.

2.4 The case studies

Two real data sets were analyzed to demonstrate the capabilities of the new method for different sampling designs. In both cases, the analysis was conducted primarily using the new method (the hierarchical neighborhood model). In addition, data were analysed with the classic neighborhood model as well as the two-step approach, where individual fecundities and regression slopes are estimated separately. In these additional analyses, we focused on the estimates of regression slopes, treating the estimates based on the hierarchical model as a reference.

2.4.1 Norway spruce

The first example was the clonal seed orchard of Norway spruce (Picea abies (L.) Karst.) (Dering et al., 2014). Although the study plot did not represent a natural population, the study system possessed all major characteristics of a typical study designed for analyzing plant mating patterns. The trees produced seeds after unconstrained pollination, and there was no isolation from external pollen sources, whose contribution reached 58% on average (Dering et al., 2014). According to paternity analysis of the seed crop in three mast seasons (1996, 2004 and 2006), individual trees tended to contribute differently to maternal pollen pools, with 50% of the total seed crop produced by ca. 10% of trees. Paternity analysis revealed that the distance between mates is a significant factor in mating success. However, the impact of phenotypic variables was not studied.

The sample contained 447 trees, including five mother trees, characterized by microsatellite genotypes and spatial co-ordinates. In addition, each tree was characterized with three variables that are potentially related to male fecundity, that is male strobili abundance (i.e. the estimated number of male cones based on field observations), tree height (a field measurement) and crown volume (estimated based on measurements of tree height, first branching height and crown projection). For genotyping, five microsatellite loci were used (121 detected alleles in total), yielding the combined exclusion probability of 0.9879. Because field measurements were taken in the single mast year (2006), only seeds collected that year were analyzed in the case study (500 seeds).

To match the reality of the study plot and obtain extra information about mating patterns, the hierarchical neighborhood model was modified as follows. To account for possible directionality in pollen dispersal, the exponential-power-von Mises dispersal kernel was assumed (see Chybicki, 2018 for the equation). Hence, the dispersal kernel had two additional parameters: $urn:x-wiley:1755098X:media:men13307:men13307-math-0117$ – the rate of anisotropy and $urn:x-wiley:1755098X:media:men13307:men13307-math-0118$ – the prevailing direction of dispersal. Because the von Mises distribution is circular, $urn:x-wiley:1755098X:media:men13307:men13307-math-0119$ was assumed to be uniformly distributed between zero and infinity, while $urn:x-wiley:1755098X:media:men13307:men13307-math-0120$ was assumed to be uniformly distributed between 0 and 2π. In this way, we avoided potential problems with the posterior distribution due to the equivalence of ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0121$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0122$ ) and ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0123$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0124$ ). Parent–offspring genotype mismatches were treated as in the NMπ software (Chybicki, 2018), with locus-specific error rates assumed to follow the negative exponential distribution truncated at 1. Finally, the Dirichlet distribution for the background pollen pools was assumed to have family-specific divergence parameters $urn:x-wiley:1755098X:media:men13307:men13307-math-0125$ , with the exponential distribution, truncated at 1, taken as a prior (Chybicki, 2013).
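The directional kernel described above can be illustrated with a minimal sketch. It is not the NMπ implementation: it assumes that the kernel is the product of a two-dimensional exponential-power distance term (with mean distance δ and shape b) and a von Mises direction term (with rate of anisotropy κ and prevailing direction α₀), whereas the exact equation is given in Chybicki (2018) and may be parameterized differently. The function names `bessel_i0` and `exp_power_von_mises` and the internal scale `a` are hypothetical.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I0 by its power series (adequate for moderate |x|)."""
    total, term = 1.0, 1.0
    q = (x * x) / 4.0
    for k in range(1, terms):
        term *= q / (k * k)          # term_k = q**k / (k!)**2
        total += term
    return total

def exp_power_von_mises(r, alpha, delta, b, kappa=0.0, alpha0=0.0):
    """Assumed kernel density per unit area at distance r and direction alpha (radians)."""
    # Scale a follows from the mean distance: delta = a * Gamma(3/b) / Gamma(2/b).
    log_a = math.log(delta) + math.lgamma(2.0 / b) - math.lgamma(3.0 / b)
    a = math.exp(log_a)
    # Isotropic exponential-power part: b / (2*pi*a^2*Gamma(2/b)) * exp(-(r/a)^b).
    log_dist = (math.log(b) - math.log(2.0 * math.pi) - 2.0 * log_a
                - math.lgamma(2.0 / b) - (r / a) ** b)
    # Directional part: exp(kappa*cos(alpha - alpha0)) / I0(kappa); equals 1 when kappa = 0.
    log_dir = kappa * math.cos(alpha - alpha0) - math.log(bessel_i0(kappa))
    return math.exp(log_dist + log_dir)

# Illustrative values only (not estimates from the study).
print(exp_power_von_mises(r=50.0, alpha=1.0, delta=100.0, b=0.5, kappa=1.0, alpha0=0.3))
```

Under this assumed form, changing the sign of κ while shifting α₀ by π leaves the density unchanged, which is presumably the equivalence that the constrained priors are meant to remove.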

Although the assumption of nonrandom pollen dispersal was reasonable in the light of the earlier study (Dering et al., 2014), we also ran the analysis based on the assumption of completely random dispersal. In this case, $urn:x-wiley:1755098X:media:men13307:men13307-math-0126$ was fixed at zero, and the dispersal function $urn:x-wiley:1755098X:media:men13307:men13307-math-0127$ was fixed at 1. Hence, in the ‘random dispersal’ model, individual reproductive success depended only on fecundity parameters. To compare the relative predictive fit of the two models, we computed the widely applicable information criterion (WAIC) (Watanabe, 2010), which accounts for a different number of parameters between models and is well suited to hierarchical models (Gelman et al., 2014). The model with the lower WAIC was chosen as the one characterized by the better relative goodness of fit.
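For readers unfamiliar with WAIC, the following is a minimal sketch of how the criterion can be computed from posterior draws, following the variance-based definition in Gelman et al. (2014). It is not the code used in the study; the function name `waic` and the layout of `log_lik` (one row per posterior draw, one column per observation) are assumptions made for illustration, and the toy numbers are invented.

```python
import math
from statistics import variance

def waic(log_lik):
    """WAIC on the deviance scale from log_lik[s][i] = log p(y_i | draw s)."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        m = max(column)
        # Log of the posterior-mean likelihood, computed stably (log-sum-exp).
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        # Posterior (sample) variance of the log-likelihood of observation i.
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic)

# Toy input: 3 posterior draws x 2 observations (hypothetical values).
toy = [[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]]
print(waic(toy))
```

On this scale, a lower value indicates a better expected predictive fit, which matches the selection rule stated above.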

To get estimates of regression slopes under the classic neighborhood model, data were analyzed with the NMπ software (Chybicki, 2018). As in the above analysis, we set the following parameters as estimable: pollen immigration (m), self-fertilization (s), exponential-power-von Mises dispersal kernel (δ, b, κ, α₀), mistyping errors (ε₁, …, ε₅) and regression slopes (β₁, β₂, β₃). The selection of the best-fitting regression model was based on forward selection. Estimates for regression slopes under the two-step approach were obtained as in the simulation study; that is, individual log fecundities ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0128$ ), estimated based on the same model as described above except for the assumption $urn:x-wiley:1755098X:media:men13307:men13307-math-0129$ (no selection gradients involved), were used as a response variable in the stepwise regression analysis with the standardized phenotypes taken as explanatory variables.

2.4.2 English yew

The second example was microsatellite data collected to study seed and pollen dispersal patterns within a natural English yew population (Taxus baccata L.), a European dioecious conifer (Chybicki & Oleksa, 2018). The earlier study showed that pollen dispersal was nonrandom and followed a leptokurtic dispersal kernel. In addition, male reproductive success was linked with trunk diameter (with the slope of 0.278 ± 0.103). However, since the slope was estimated ignoring potential overdispersion in male fecundity, it might represent a false-positive result.

In contrast to the spruce study, here, progeny were sampled at the stage of naturally regenerated seedlings, without prior knowledge about maternity. However, thanks to the parentage analysis conducted with NMπ (Chybicki, 2018) and taking advantage of separate sexes, 121 out of 220 seedlings were assigned a mother tree with a probability ≥0.8. These seedlings were used as a sample of progeny with known mothers but unknown fathers. As a result, we identified 55 maternal families, with the number of progeny ranging between 1 and 10 (2.2 on average). As a local population of candidate pollen donors, 128 male trees were subsampled from the original data set.

Because of the low number of progeny per mother, we assumed that pollen source probabilities were common across families (i.e. with no variation among families). In addition, because of dioecy, we fixed the probability of self-fertilization at zero. Consequently, we took a uniform Beta distribution as a prior for pollen immigration (m) vs. local pollination (c). Also, because the pollen immigration appeared to be close to zero in the extracted subset of progeny (see the results), we estimated neither allele frequencies in the background population nor their divergence rate and, instead, we assumed that these frequencies were equal to frequencies in the sample of trees. Finally, because the sampled seedlings represented a mixture of multiple regeneration seasons, we assumed that the pollen dispersal kernel was isotropic.

The analysis was run twice, that is, under the assumption of random and of nonrandom pollen dispersal. The predictive fit of the two models was compared using WAIC. As in the spruce case study, the regression slope was also estimated using the NMπ software and the two-step approach.

3 RESULTS

3.1 Simulations

3.1.1 The effect of overdispersion

Generally, the simulations showed that the hierarchical neighborhood model correctly identified the regression model for male fecundity (Table 1). The increase in overdispersion resulted mostly in an increase in false negatives for the medium-level slope coefficient ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0130$ ). In addition, false positives appeared to increase (i.e. $urn:x-wiley:1755098X:media:men13307:men13307-math-0131$ was included in the model) as the overdispersion level increased, but the frequency of false positives remained at an acceptable level of only 1%–2%. As for the quality of parameter estimates, the slope parameters $urn:x-wiley:1755098X:media:men13307:men13307-math-0132$ were estimated with very little or no bias. In addition, their credible intervals showed good coverage. The level of overdispersion in fecundity ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0133$ ) appeared to be less accurately estimated, especially when $urn:x-wiley:1755098X:media:men13307:men13307-math-0134$ . On the other hand, the empirical estimate of model fit ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0135$ ) was in relatively good agreement with the expected value. Only the coverage of the credible interval for $urn:x-wiley:1755098X:media:men13307:men13307-math-0136$ decreased slightly below the nominal level under the highest overdispersion. Regardless of the level of overdispersion, the method never resulted in the selection of the null model.

Table 1. The impact of overdispersion in fecundity on the selection of the true regression model and estimates of variable effects on fecundity

Scenario	Parameter	Bias	MSE	Coverage	f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0137$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0138$ )
Low overdispersion in fecundity	$urn:x-wiley:1755098X:media:men13307:men13307-math-0139$ = 1	0.018	0.007	0.96	1.0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0140$ = −0.5	−0.001	0.008	0.95	0.99
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0141$ = 0	0.000	0.000	1	0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0142$ = 0.25	0.090	0.012	0.91
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0143$	−0.033	0.002	0.94
	f(True model) = 0.99
Medium overdispersion in fecundity	$urn:x-wiley:1755098X:media:men13307:men13307-math-0144$ = 1	0.002	0.011	0.95	1.0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0145$ = −0.5	−0.015	0.009	0.95	1.0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0146$ = 0	0.003	0.001	0.99	0.01
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0147$ = 0.5	0.018	0.007	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0148$	−0.006	0.002	0.98
	f(True model) = 0.99
High overdispersion in fecundity	$urn:x-wiley:1755098X:media:men13307:men13307-math-0149$ = 1	0.041	0.028	0.96	1.0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0150$ = −0.5	0.021	0.051	0.82	0.86
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0151$ = 0	−0.007	0.002	0.98	0.02
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0152$ = 1.0	0.033	0.025	0.95
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0153$	0.012	0.008	0.88
	f(True model) = 0.85
Omitted variable	$urn:x-wiley:1755098X:media:men13307:men13307-math-0154$ = 0	0.000	0.000	1	0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0155$ = −0.5	0.049	0.054	0.80	0.83
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0156$ = 0	0.008	0.004	0.98	0.02
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0157$ ^a	–	–	–
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0158$	0.001	0.009	0.81
	f(True model) = 0.81

f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0159$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0160$ ) – the frequency of selection of a regression model including $urn:x-wiley:1755098X:media:men13307:men13307-math-0161$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0162$ – the slope of the effect (a regression parameter) for the $urn:x-wiley:1755098X:media:men13307:men13307-math-0163$ -th variable, $urn:x-wiley:1755098X:media:men13307:men13307-math-0164$ – the standard deviation of the normal distribution that captures the overdispersion in fecundity, $urn:x-wiley:1755098X:media:men13307:men13307-math-0165$ – the fraction of variance explained by the regression model, f(True model) – the frequency of selecting the true model.
^a The actual value of overdispersion parameter is unknown because the omitted variable introduced extra unexplained variance.

The effect of the omitted variable was similar to the effect of increased overdispersion (Table 1). Because the omitted variable had a strong effect, the excess variation in the realized fecundity effectively weakened the ability of the method to properly identify the effect of the moderate-effect variable. As a result, we observed an increased frequency of selection of the null model (17%) instead of the actual model with one variable. On the other hand, the omitted variable scenario resulted in a very low frequency of false positives. Also, the variance proportion explained by the regression model was estimated without bias. The relatively low coverage level for $urn:x-wiley:1755098X:media:men13307:men13307-math-0166$ (81%) followed exactly the frequency of selection of the true model. Interestingly, for the subset of results where the best model was different from the null model, the coverage for $urn:x-wiley:1755098X:media:men13307:men13307-math-0167$ increased to 97% (results not shown).

3.1.2 The effect of sampling design

Simulations showed that the effect of the number of progeny on the quality of estimates of parameters related to the regression model is only moderate (Table 2). It was mainly manifested in the increased MSE. Because parameter estimates showed little bias, the increased MSE was mostly due to the increased variance of estimates. However, the quality of credibility intervals remained comparably good regardless of the sampling effort. A slightly decreased frequency of selection of the true model was another effect of the reduced number of progeny. Nonetheless, even for the sample size of 200 progeny individuals, the model selection procedure resulted in 95% accuracy. The reduction of progeny number also resulted in a slight decrease in the frequency of identification of the moderate effect, leading to 4% of false negatives. On the other hand, the frequency of false positives remained almost unaffected.

Table 2. The impact of the number of progeny on the selection of the true regression model and estimates of variable effects on fecundity

Number of progeny	Parameter	Bias	MSE	Coverage	f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0168$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0169$ )
200	$urn:x-wiley:1755098X:media:men13307:men13307-math-0170$	0.010	0.018	0.92	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0171$	−0.006	0.021	0.94	0.96
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0172$	0.003	0.001	0.99	0.01
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0173$	0.062	0.018	0.97
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0174$	−0.030	0.006	0.93
	f(True model) = 0.95
	f(Null model) = 0
400	$urn:x-wiley:1755098X:media:men13307:men13307-math-0175$	0.002	0.011	0.95	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0176$	−0.015	0.009	0.95	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0177$	0.003	0.001	0.99	0.01
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0178$	0.018	0.007	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0179$	−0.006	0.002	0.98
	f(True model) = 0.99
	f(Null model) = 0
800	$urn:x-wiley:1755098X:media:men13307:men13307-math-0180$	0.011	0.007	0.92	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0181$	−0.005	0.007	0.98	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0182$	0.000	0.000	1	0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0183$	0.015	0.005	0.97
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0184$	−0.001	0.001	0.95
	f(True model) = 1.00
	f(Null model) = 0

f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0185$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0186$ ) – the frequency of selection of a regression model including $urn:x-wiley:1755098X:media:men13307:men13307-math-0187$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0188$ – the slope of the effect (a regression parameter) for the $urn:x-wiley:1755098X:media:men13307:men13307-math-0189$ -th variable, $urn:x-wiley:1755098X:media:men13307:men13307-math-0190$ – the standard deviation of the normal distribution that captures the overdispersion in fecundity, $urn:x-wiley:1755098X:media:men13307:men13307-math-0191$ – the fraction of variance explained by the regression model, f(True model) – the frequency of selection of the true model.
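The statement that the increased MSE mostly reflects increased variance follows from the identity MSE = bias² + variance. The sketch below applies it to the first parameter row of each scenario in Table 2. The function name `variance_share` is hypothetical, and because the tabulated values are rounded, the results are approximate.

```python
def variance_share(bias, mse):
    """Split MSE into bias**2 + variance; return (variance, variance / MSE)."""
    var = mse - bias ** 2
    return var, var / mse

# (bias, MSE) for the first parameter row of each block in Table 2.
table2_first_row = {200: (0.010, 0.018), 400: (0.002, 0.011), 800: (0.011, 0.007)}

for n_progeny, (bias, mse) in table2_first_row.items():
    var, share = variance_share(bias, mse)
    print(f"{n_progeny} progeny: variance ~ {var:.4f}, share of MSE ~ {share:.1%}")
```

In all three scenarios, the squared bias accounts for only a small fraction of the MSE, so the change in MSE across sample sizes is essentially a change in the variance of the estimates.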

Generally, doubling the number of progeny per adult increased the frequency of selection of the true model and decreased the bias and MSE of estimates. Nonetheless, the relative sampling effort, in terms of the number of progeny to the number of adults, appeared to have little effect on the selection of the true regression model and the quality of estimates of variable effects, especially when compared against the sole effect of the number of adults in a sample. In contrast, for a fixed number of progeny (200 or 400), the sampling scenario with 50 adults in a sample was characterized by a lower inclusion probability of the moderate variable (Table 3) compared to the scenario with 100 adults (Table 2).

Table 3. The impact of the number of progeny to number of adults on the selection of the true regression model and estimates of variable effects on fecundity

Number of adults	Number of progeny	Parameter	Bias	MSE	Coverage	f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0192$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0193$ )
50	200	$urn:x-wiley:1755098X:media:men13307:men13307-math-0194$	0.036	0.019	0.99	1
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0195$	0.029	0.038	0.87	0.89
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0196$	0.004	0.001	0.99	0.01
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0197$	0.074	0.021	0.97
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0198$	−0.029	0.007	0.96
		f(True model) = 0.88
100	400	$urn:x-wiley:1755098X:media:men13307:men13307-math-0199$	0.002	0.011	0.95	1
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0200$	−0.015	0.009	0.95	1
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0201$	0.003	0.001	0.99	0.01
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0202$	0.018	0.007	1
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0203$	−0.006	0.002	0.98
		f(True model) = 0.99
50	400	$urn:x-wiley:1755098X:media:men13307:men13307-math-0204$	0.031	0.014	0.98	1
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0205$	−0.004	0.026	0.87	0.96
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0206$	−0.004	0.002	0.99	0.01
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0207$	0.038	0.012	0.99
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0208$	−0.001	0.004	0.94
		f(True model) = 0.95
100	800	$urn:x-wiley:1755098X:media:men13307:men13307-math-0209$	0.011	0.007	0.92	1
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0210$	−0.005	0.007	0.98	1
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0211$	0.000	0.000	1	0
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0212$	0.015	0.005	0.97
		$urn:x-wiley:1755098X:media:men13307:men13307-math-0213$	−0.001	0.001	0.95
		f(True model) = 1.0

f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0214$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0215$ ) – the frequency of selection of a regression model including $urn:x-wiley:1755098X:media:men13307:men13307-math-0216$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0217$ – the slope of the effect (a regression parameter) for the $urn:x-wiley:1755098X:media:men13307:men13307-math-0218$ -th variable, $urn:x-wiley:1755098X:media:men13307:men13307-math-0219$ – the standard deviation of the normal distribution that captures the overdispersion in fecundity, $urn:x-wiley:1755098X:media:men13307:men13307-math-0220$ – the fraction of variance explained by the regression model, f(True model) – the frequency of selection of the true model.

In simulations, we observed no clear tendency in bias, MSE or model selection that could correspond with the change in the number of maternal families (Table 4). To some extent, the coverage tended to decrease, and the frequency of false positives tended to increase as the number of families increased. However, differences between scenarios were too small to draw robust conclusions in this respect.

Table 4. The impact of the number of maternal families on the selection of the true regression model and estimates of variable effects on fecundity

Number of families	Parameter	Bias	MSE	Coverage	f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0221$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0222$ )
10	$urn:x-wiley:1755098X:media:men13307:men13307-math-0223$	0.028	0.012	0.99	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0224$	−0.004	0.006	0.99	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0225$	0.006	0.002	0.98	0.02
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0226$	0.052	0.013	0.92
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0227$	−0.016	0.003	0.96
	f(True model) = 0.98
20	$urn:x-wiley:1755098X:media:men13307:men13307-math-0228$	0.002	0.011	0.95	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0229$	−0.015	0.009	0.95	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0230$	0.003	0.001	0.99	0.01
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0231$	0.018	0.007	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0232$	−0.007	0.002	0.98
	f(True model) = 0.99
40	$urn:x-wiley:1755098X:media:men13307:men13307-math-0233$	0.019	0.011	0.93	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0234$	0.018	0.013	0.94	0.98
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0235$	0.000	0.000	1	0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0236$	0.036	0.010	0.94
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0237$	−0.015	0.003	0.95
	f(True model) = 0.98

f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0238$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0239$ ) – the frequency of selection of a regression model including $urn:x-wiley:1755098X:media:men13307:men13307-math-0240$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0241$ – the slope of the effect (a regression parameter) for the $urn:x-wiley:1755098X:media:men13307:men13307-math-0242$ -th variable, $urn:x-wiley:1755098X:media:men13307:men13307-math-0243$ – the standard deviation of the normal distribution that captures the overdispersion in fecundity, $urn:x-wiley:1755098X:media:men13307:men13307-math-0244$ – the fraction of variance explained by the regression model, f(True model) – the frequency of selection of the true model.

3.1.3 The hierarchical approach vs. the classic approach

In the case of overdispersion in fecundity, the classic approach tended to preserve sensitivity at the cost of specificity in identifying the effects of phenotypic variables on male fecundity (Table 5). In contrast, the hierarchical approach tended to behave in the opposite way. However, false negatives appeared less frequently in the hierarchical approach than false positives in the classic one. In terms of bias and MSE, the two methods showed comparable quality, suggesting that the estimation procedure (MLE vs. Bayesian approach) does not influence the quality of estimates.

Table 5. The comparison of the classic neighborhood model and the hierarchical neighborhood model (20 replicates per scenario)

Overdispersion	Parameter	Classic model			Hierarchical model
Overdispersion	Parameter	Bias	MSE	f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0245$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0246$ )	Bias	MSE	f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0247$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0248$ )
$urn:x-wiley:1755098X:media:men13307:men13307-math-0249$	$urn:x-wiley:1755098X:media:men13307:men13307-math-0250$	0.024	0.006	1	0.037	0.007	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0251$	−0.003	0.002	1	−0.002	0.002	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0252$	0.006	0.001	0.05	0	0	0
	f(True model)	0.95			1.00
$urn:x-wiley:1755098X:media:men13307:men13307-math-0253$	$urn:x-wiley:1755098X:media:men13307:men13307-math-0254$	0.042	0.017	1	0.055	0.015	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0255$	0.001	0.008	1	−0.019	0.005	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0256$	0.000	0.004	0.2	0	0	0
	f(True model)	0.800			1.00
$urn:x-wiley:1755098X:media:men13307:men13307-math-0257$	$urn:x-wiley:1755098X:media:men13307:men13307-math-0258$	−0.098	0.042	1	−0.056	0.022	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0259$	0.080	0.034	1	0.102	0.073	0.75
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0260$	−0.015	0.037	0.5	0	0	0
	f(True model)	0.50			0.75

f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0261$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0262$ ) – the frequency of selection of a regression model including $urn:x-wiley:1755098X:media:men13307:men13307-math-0263$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0264$ – the slope of the effect (a regression parameter) for the $urn:x-wiley:1755098X:media:men13307:men13307-math-0265$ -th variable, $urn:x-wiley:1755098X:media:men13307:men13307-math-0266$ – the standard deviation of the normal distribution that captures the overdispersion in fecundity, f(True model) – the frequency of selection of the true model.

3.1.4 The hierarchical approach vs. the regression analysis on estimated fecundities

In contrast to the method based on the hierarchical neighborhood model, the combination of the estimation of individual fecundities and the standard regression analysis (the two-step approach) generated biased slope estimates (Table 6). The two-step approach tended to produce a regression model characterized by the underestimated proportion of the explained variance (R²). However, the two-step approach was successful in identifying the true regression model. Comparing three levels of the sampling effort revealed that the overall quality of estimates based on the two-step approach increased together with the number of progeny in a sample. Nonetheless, assuming the bias in R² decreases linearly with the logarithm of the number of progeny (see Table 6), for the considered simulation setup, ca. 3000 progeny would be needed to reduce the bias to zero or to achieve the efficiency of the hierarchical approach.

Table 6. The comparison of the regression analysis on estimated fecundities and the hierarchical neighborhood model (20 replicates per scenario)

Number of seeds	Parameter	Regression analysis on estimated fecundities			Hierarchical model
Number of seeds	Parameter	Bias	MSE	f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0267$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0268$ )	Bias	MSE	f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0269$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0270$ )
200	$urn:x-wiley:1755098X:media:men13307:men13307-math-0271$	−0.487	0.252	1	0.003	0.019	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0272$	0.263	0.076	0.95	0.032	0.021	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0273$	−0.008	0.001	0.05	0	0	0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0274$	−0.397	0.163		−0.008	0.003
	f(True model)	0.90			1
400	$urn:x-wiley:1755098X:media:men13307:men13307-math-0275$	−0.352	0.134	1	−0.008	0.008	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0276$	0.147	0.025	1	−0.046	0.010	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0277$	0.006	0.001	0.05	0	0	0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0278$	−0.279	0.082		−0.016	0.002
	f(True model)	0.95			1
800	$urn:x-wiley:1755098X:media:men13307:men13307-math-0279$	−0.224	0.060	1	0.002	0.010	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0280$	0.107	0.016	1	−0.018	0.006	1
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0281$	0.008	0.001	0.05	0	0	0
	$urn:x-wiley:1755098X:media:men13307:men13307-math-0282$	−0.194	0.039		−0.007	0.001
	f(True model)	0.95			1

f( $urn:x-wiley:1755098X:media:men13307:men13307-math-0283$ in $urn:x-wiley:1755098X:media:men13307:men13307-math-0284$ ) – the frequency of selection of a regression model including $urn:x-wiley:1755098X:media:men13307:men13307-math-0285$ , $urn:x-wiley:1755098X:media:men13307:men13307-math-0286$ – the slope of the effect (a regression parameter) for the $urn:x-wiley:1755098X:media:men13307:men13307-math-0287$ -th variable, $urn:x-wiley:1755098X:media:men13307:men13307-math-0288$ – the fraction of variance explained by the regression model, f(True model) – the frequency of selection of the true model.
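The extrapolation to ca. 3000 progeny mentioned above can be retraced with a short sketch: fit a straight line to the bias in R² against the natural logarithm of the number of progeny and solve for the point where the fitted bias reaches zero. The sketch assumes that the fourth parameter row of each block in Table 6 is R² under the two-step approach and that ordinary least squares was used; the helper `fit_line` is hypothetical.

```python
import math

# Bias in R² under the two-step approach (Table 6; assumed to be the fourth row per block).
r2_bias = {200: -0.397, 400: -0.279, 800: -0.194}

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

log_n = [math.log(n) for n in r2_bias]
bias = list(r2_bias.values())
slope, intercept = fit_line(log_n, bias)

# Number of progeny at which the fitted line crosses zero bias.
n_zero_bias = math.exp(-intercept / slope)
print(round(n_zero_bias))
```

With the three tabulated points, the fitted line crosses zero at a little under 3000 progeny, in line with the approximate figure given in the text. The result rests on only three points and on the linearity assumption, so it should be read as an order-of-magnitude indication.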

3.2 The case studies

3.2.1 Norway spruce

The model with nonrandom pollen dispersal was characterized by WAIC = 10,674.2, while the model with random pollen dispersal had WAIC = 10,706.9. The difference in WAIC of 32.7 between models provided sufficient support for choosing the nonrandom dispersal model as the one with the better predictive fit. Therefore, we focused primarily on estimates under the model with nonrandom pollen dispersal, except for parameters related to the effects of phenotypic variables on male fecundity, where we compared the results under the two models.

The estimated frequencies of genotyping errors spanned between 0.5% and 7.2% (posterior medians) with a mean error frequency of 2.1% (posterior median). The posterior median of the parameter of truncated exponential distribution used as a prior for individual error frequencies was 0.022. The HPD interval spanned between 0.007 and 0.053. Overall, the estimates revealed that genotyping errors were scarce and concerned effectively only one marker.

The analysis revealed that, on average, 64% of pollen gametes came from the background population. In this regard, mother trees showed significant differences, as individual estimates varied between 48% and 79%. Almost 6% of successful pollen was produced by mother trees themselves, resulting in self-fertilization. Thus, 28% of pollen gametes came from local trees due to outcross fertilization. These gametes (roughly 140 seeds) can be considered as an effective sample for estimating parameters related to male reproductive success, that is pollen dispersal kernel and pollen donor fecundity.

The analysis also revealed that pollen gametes that originated in the background population were characterized by considerable genetic divergence levels among mother trees. Individual estimates of divergence parameter $urn:x-wiley:1755098X:media:men13307:men13307-math-0289$ ranged from 0.128 to 0.321, with the mean across mother trees of 0.228.

Among pollen dispersal kernel parameters, the mean forward dispersal distance $urn:x-wiley:1755098X:media:men13307:men13307-math-0290$ proved difficult to characterize precisely, as revealed by the 95% credible interval spanning between 2 and 56,233 m. The posterior median of $urn:x-wiley:1755098X:media:men13307:men13307-math-0291$ equalled 575 m. The posterior distribution of the shape parameter was characterized by the median of 0.073 and the 95% credible interval between 0.010 and 0.466, revealing that the dispersal function was highly fat-tailed. In addition, the dispersal function was highly anisotropic, with the prevailing direction of 173° from North clockwise (95% HPD between 130° and 210°), or approximately due South. The median estimate of the rate of anisotropy $urn:x-wiley:1755098X:media:men13307:men13307-math-0292$ equalled 1.005 with the 95% HPD interval between 0.281 and 1.714.

Individual posterior median estimates of male log-fecundity ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0293$ ) spanned between −2.92 and 3.85 (Figure 1), with the mean 0.028 and standard deviation 1.219. Although the empirical distribution of posterior medians approached a normal-like distribution (Figure 1), according to the Shapiro–Wilk test, it deviated significantly from normality (p = 0.017), mostly due to the presence of greater variation in the right tail of the distribution. Individual estimates of male fecundity exhibited high variance, so that only eight (out of 447) estimates showed a significant departure from zero, according to the estimated 95% credible intervals. In all significant cases, fecundity was greater than zero.

**Figure 1**

The estimated values of male log-fecundity ( $urn:x-wiley:1755098X:media:men13307:men13307-math-0294$ ) in the Norway spruce case study: (a) spatial distribution in the study plot (negative values shown as white and positive values shown as grey, mother trees shown as black circles), (b) empirical frequency distribution, (c) individual estimates with the 95% highest density intervals, (d) relationship between standardized tree phenotypes and the estimated male log-fecundity. The effect of male strobili abundance is shown as a partial residual plot to account for the effect of tree height

Three phenotypic variables generated eight alternative regression models (including the null model). Under the assumption of nonrandom pollen dispersal, the model with the highest posterior probability (0.57) contained male strobili abundance and tree height (Figure 2). The second-best model, with a probability of 0.30, contained only tree height as an explanatory variable. The remaining models had a cumulative posterior probability of 0.13. In contrast, under random pollen dispersal, the best model, containing tree height only, had a posterior probability of 0.41. The second-best model was the null model, with a probability of 0.26.
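The count of eight models follows from taking every subset of the three candidate variables. The short sketch below enumerates that model space; the variable labels are illustrative names for the three traits discussed in the text, and the function `enumerate_models` is hypothetical rather than part of the authors' software.

```python
from itertools import combinations

# Labels chosen for illustration only.
variables = ["tree_height", "male_strobili_abundance", "crown_volume"]


def enumerate_models(names):
    """Return every subset of `names`, from the null model to the full model."""
    models = []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            models.append(frozenset(subset))
    return models


models = enumerate_models(variables)
print(len(models))  # 2 ** 3 = 8 candidate models
for m in models:
    print(sorted(m) if m else "null model")
```

Because the model space doubles with every added variable, exhaustive listing is practical only for a handful of candidate predictors.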

Based on the best model selected under nonrandom pollen dispersal, tree height had the strongest, positive effect on male fecundity, with a posterior median regression slope of 0.932 (95% HPDI: 0.294–1.712) and a posterior inclusion probability of 0.903. For male strobili abundance, the posterior median regression slope was 0.445 (95% HPDI: 0.126–0.732) and the posterior inclusion probability was 0.655. The estimate of the overdispersion parameter was relatively high, with a posterior median of 1.243. However, the estimate had a wide credible interval, from 0.640 to 2.008 (see also Figure 3). The two variables explained roughly 50% of the total variance in male fecundity, but the precision of the $R^2$ estimate was relatively low, as shown by its 95% credible interval of 0.158–0.808. For comparison, the slope for tree height under the second-best model was 1.186 (95% HPDI: 0.444–2.008). However, the second-best regression model was characterized by increased overdispersion, 1.648 (95% HPDI: 0.993–2.470), and diminished $R^2$ = 0.352 (95% HPDI: 0.037–0.683), compared to the best model.

Based on the best model with random pollen dispersal, tree height again had a strong positive effect on male fecundity, with a similar slope of 0.970 (95% HPDI: 0.102–1.924) but a much lower posterior inclusion probability of 0.543 compared to the best model under nonrandom pollen dispersal. Moreover, the 95% HPDIs of both the overdispersion (1.206–2.296) and, especially, the proportion of explained variance in fecundity (0.000–0.526) indicated that this regression model was of poorer quality than the regression model under nonrandom dispersal.

Using both the classic neighborhood model and the two-step approach, we selected the same regression model for male fecundity as with the hierarchical approach. The classic model also yielded very similar slope estimates, that is, 0.506 (95% CI: 0.366–0.646) and 1.307 (95% CI: 0.775–1.840), but with narrower confidence bounds than the hierarchical approach. On the other hand, slopes estimated using the two-step approach were significantly lower than the estimates from the hierarchical approach, that is, 0.097 (95% CI: 0.031–0.127) and 0.066 (95% CI: 0.001–0.132). Interestingly, the relative strength of the two effects was reversed compared to the reference estimates.

3.2.2 English yew

The nonrandom pollen dispersal model showed a better predictive fit (WAIC = 2,607.9) than the model with random pollen dispersal (WAIC = 2,832.4). Therefore, we focused on the results obtained with the best-fitting model.
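As a reminder of what is being compared, the following sketch computes WAIC in the form described by Gelman et al. (2014) from a matrix of pointwise log-likelihoods. It assumes that the reported values are on the deviance scale (lower is better) and that `log_lik[s][i]` holds the log-likelihood of observation `i` under posterior draw `s`; how the authors' program evaluates WAIC internally is not shown in this section, so the function below is only an illustration.

```python
import math
from statistics import variance


def waic(log_lik):
    """WAIC on the deviance scale from log_lik[s][i] (draw s, observation i)."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        m = max(col)
        # Log of the posterior-mean likelihood, via a stable log-sum-exp.
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # Sample variance of the log-likelihood across draws.
        p_waic += variance(col)
    return -2.0 * (lppd - p_waic)


# Difference between the two reported values (random minus nonrandom dispersal).
delta_waic = 2832.4 - 2607.9
print(round(delta_waic, 1))
```

A positive difference of this size favours the nonrandom dispersal model, in line with the choice made in the text.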

Estimates of genotyping error frequencies ranged from 0.002 to 0.101 with a grand mean of 0.018. In agreement with the earlier study, only one marker (Me-998-304A) showed a non-negligible frequency of genotyping errors of 0.101. Generally, estimates of genotyping errors suggested that genotypic data provided solid information about genealogies.

The posterior median pollen immigration frequency was only 0.026 (95% HPDI: 0.000–0.090), suggesting that the great majority of seedlings had two parents within the study plot. Hence, estimates of parameters of male reproductive success reported below were effectively a function of 117 seedlings.

The posterior median of the mean forward dispersal distance was 2,399 m (95% HPDI: 50–69,081 m), whereas the shape parameter of the exponential-power function was characterized by the posterior median of 0.081 (95% HPDI: 0.022–0.237). In other words, the analysis revealed the pollen dispersal kernel function was strongly fat-tailed.
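To make the meaning of the two kernel parameters concrete, the sketch below uses the two-dimensional exponential-power kernel in the parametrization given by Austerlitz et al. (2004), with scale `a` and shape `b`, for which the mean dispersal distance equals a·Γ(3/b)/Γ(2/b) and shapes below 1 give fat tails. Whether the program uses exactly this parametrization is an assumption here, and the function names are hypothetical.

```python
import math


def scale_from_mean_distance(mean_dist, shape):
    """Scale a of a 2-D exponential-power kernel with the given mean distance."""
    # mean distance = a * Gamma(3/b) / Gamma(2/b); log-gamma avoids overflow.
    return mean_dist * math.exp(math.lgamma(2.0 / shape) - math.lgamma(3.0 / shape))


def log_kernel_density(r, scale, shape):
    """Log of p(r) = b / (2*pi*a^2*Gamma(2/b)) * exp(-(r/a)^b)."""
    return (math.log(shape) - math.log(2.0 * math.pi) - 2.0 * math.log(scale)
            - math.lgamma(2.0 / shape) - (r / scale) ** shape)


# Posterior medians reported for the yew case study.
a_yew = scale_from_mean_distance(2399.0, 0.081)
print(a_yew)
# Density ratio between 1 km and 10 m: a fat-tailed kernel decays slowly.
print(math.exp(log_kernel_density(1000.0, a_yew, 0.081)
               - log_kernel_density(10.0, a_yew, 0.081)))
```

Working on the log scale matters for shapes this small, because Γ(2/b) and Γ(3/b) become too large to evaluate directly.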

Estimates of individual male log-fecundity ranged from −1.121 to 2.611 (Figure 4). The empirical distribution of posterior medians departed significantly from the normal distribution, as revealed by the Shapiro–Wilk test (p < 0.001). Only four individual estimates deviated significantly from zero. In all four cases, males were characterized by higher fecundity than the mean.

**Figure 4**

The estimated values of male log-fecundity in the English yew case study: (a) spatial distribution in the study plot (negative values shown as white and positive values shown as grey, mother trees shown as black circles), (b) empirical frequency distribution, (c) individual estimates with the 95% highest density intervals, (d) relationship between the standardized tree phenotype and the estimated male fecundity.

Under the assumption of nonrandom pollen dispersal, the model with a significant effect of trunk diameter on male fecundity had the probability of 0.89 (Figure 2). Thus, despite the high uncertainty of individual fecundity estimates, we treated the null model as very unlikely. On the contrary, under the assumption of random pollen dispersal, the null model was characterized by a slightly higher posterior probability than the alternative model (0.53 vs. 0.47). Consequently, under random dispersal, we could not confirm that the trunk diameter is an important predictor of fecundity.

Under the nonrandom dispersal model, the regression slope for trunk diameter of 0.412 (95% HPDI: 0.102–0.717) pointed to a moderate effect of the variable on male fecundity (Figure 3). Indeed, the variation of individual fecundity estimates around the regression function was non-negligible (Figure 4), with a posterior median overdispersion of 1.022 and a 95% HPD interval between 0.650 and 1.444. The estimate of $R^2$ was only 14% (95% HPD: 1.6–37.1%), revealing that most of the individual variation in fecundity remained unexplained.
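The relationship between the slope, the overdispersion and the explained variance can be checked with a rough plug-in calculation. The sketch assumes that the predictor is standardized to unit variance and that the overdispersion parameter acts as the residual standard deviation of log-fecundity; both points are assumptions made for illustration, and `plug_in_r2` is a hypothetical helper. Plugging posterior medians into the formula is not the same as taking the posterior median of the explained variance, so only approximate agreement should be expected.

```python
def plug_in_r2(slopes, residual_sd):
    """R^2 = explained variance / (explained + residual variance).

    Assumes standardized, mutually uncorrelated predictors, so that the
    explained variance is simply the sum of squared slopes.
    """
    explained = sum(b * b for b in slopes)
    return explained / (explained + residual_sd ** 2)


# Posterior medians reported for the yew case study (nonrandom dispersal).
r2_yew = plug_in_r2([0.412], 1.022)
print(round(r2_yew, 2))  # 0.14, close to the reported 14%
```

For models with several correlated predictors, such as the spruce model, the sum of squared slopes would understate or overstate the explained variance, so this shortcut applies cleanly only to the single-predictor case.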

Under the model with random pollen dispersal, the overdispersion was the only nonzero fecundity hyper-parameter. It was characterized by a posterior median of 1.081 and a 95% HPD interval between 0.711 and 1.475.

Both the classic neighborhood model and the two-step approach confirmed that male fecundity is associated with trunk diameter. Although the estimated slopes were somewhat lower in these additional analyses, that is, 0.326 (95% CI: 0.103–0.549) and 0.126 (95% CI: 0.026–0.225) for the classic and two-step approach, respectively, they lay within the credible interval of the slope estimated with the hierarchical approach.

4 DISCUSSION

In this study, building on the neighborhood model (Adams & Birkes, 1989; Burczyk et al., 2002; Chybicki & Burczyk, 2013; Klein et al., 2008), we developed a hierarchical model approach to study the effects of phenotypic variables on male reproductive success in plant populations. We showed that our method has improved statistical properties compared to both the classic neighborhood model (the fixed-effects type of analysis) and the two-step approach, where the estimation of fecundities and the analysis of variable effects are conducted separately. In the presence of overdispersion in male fecundity, the classic approach appeared to be slightly more sensitive in detecting moderate-to-weak effects, at the cost of an elevated risk of false-positive effects. In contrast, the combined two-step approach yielded strongly biased slope estimates, although it identified the true regression model fairly reliably.

The simulations revealed that a sample of 200 progeny individuals is sufficient to correctly determine the relationship between phenotypic variables and male reproductive success. Doubling the number of progeny reduced the variance but had little influence on the sensitivity and specificity of regression parameter estimators. However, the sampling effort should be considered in the context of the effective sample size, that is, the sample size after accounting for pollen immigration. Pollen immigration is routinely observed in natural populations, and immigration rates vary greatly between populations and species (Ashley, 2010; Ellstrand, 2014). Numerous factors shape the observed level of immigration, including the distance to the nearest pollen sources, pollen dispersal mechanism (e.g. wind- vs. animal-mediated), life form (herbs vs. trees) as well as study plot characteristics such as dimensions, shape or density (Ashley, 2010; DiLeo et al., 2018; Ellstrand, 2014). Pollen immigration can also vary at the individual level, for example due to the location of a mother plant relative to plot boundaries (Chybicki & Burczyk, 2013). Therefore, at the research design stage, it is crucial to consider the negative impact of pollen immigration on the effective number of progeny. Assuming that pollen immigration is 50%, the minimum recommended sample size is 250–300 seeds. However, for species characterized by extensive pollen flow, the effective sample size may be as low as 20% of the total sample size or less (e.g. Ortego et al., 2014). In such cases, ≥600 seeds should be sampled. Excessive self-fertilization can be an additional limiting factor of the effective sample size, especially in species revealing high variation in self-fertilization rates either among populations or individuals (Setsuko et al., 2013; reviewed in Whitehead et al., 2018).
It is worth noting that the variation in self-fertilization rates can be explained with ecological variables using an analogous approach (Chybicki et al., 2019). Combining the two methods allows determinants of the observed selfing and outcrossing patterns to be fully characterized.
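The arithmetic behind the effective sample size can be sketched as follows. The helper functions are hypothetical and treat the effective number of progeny simply as the sampled number multiplied by the fraction sired by local fathers, optionally reduced further by the selfing rate; this is a planning approximation under those assumptions, not the estimator used in the model.

```python
import math


def effective_progeny(n_sampled, immigration_rate, selfing_rate=0.0):
    """Expected number of progeny sired by local fathers other than the mother."""
    return n_sampled * (1.0 - immigration_rate - selfing_rate)


def seeds_needed(target_effective, immigration_rate, selfing_rate=0.0):
    """Smallest sample expected to give `target_effective` informative progeny."""
    informative = 1.0 - immigration_rate - selfing_rate
    if informative <= 0:
        raise ValueError("no informative progeny expected")
    # Small guard so that float error does not push an exact result up by one.
    return math.ceil(target_effective / informative - 1e-9)


# Sample sizes mentioned in the text, with 50% and 80% pollen immigration.
print(effective_progeny(250, 0.5), effective_progeny(300, 0.5))
print(effective_progeny(600, 0.8))
```

Under these assumptions, the sample sizes quoted in the text correspond to roughly 120 to 150 locally sired progeny.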

The number of candidate fathers is as important as the number of offspring for the estimation of regression parameters. This stems from the fact that the estimation requires a sufficient number of data points at the higher hierarchy level. A sample of 100 adults was shown to satisfy the sample size demand, whereas a sample of 50 individuals appeared to reduce the sensitivity to low-to-moderate effects. This finding points to potential difficulties with applying the method to plants that form small populations, such as threatened species (Wade et al., 2016).

Interestingly, the number of mother plants seems to be of little importance for the correct identification of the effects of phenotypic variables. Nonetheless, the simulations represented a rather idealized situation compared to real populations. In real populations, for example, pollen dispersal may be very limited (Degen et al., 2004; Lloyd et al., 2018; Moracho et al., 2016). Under such conditions, sampling a limited number of families may restrict the phenotypic spectrum of successful pollen donors, leading to either false discoveries or false-negative effects. Because the number of progeny per family appeared to be unimportant for the quality of regression parameter estimates, if a study is focused on identifying phenotypic determinants of male reproductive success, it is advisable to sample as many mothers as possible, even at the cost of a low number of progeny per tree, as in the yew case study. Such a sampling scheme is in line with earlier suggestions (Chybicki et al., 2019; Koelling et al., 2012). On the other hand, if individual variation in selfing or pollen immigration rates is of interest, one should seek a compromise between the number of offspring per family and the total number of families. It seems that 10–15 mother plants and about 20 seeds per mother would satisfy both goals.

Comparison of the hierarchical neighborhood model approach and the two-step approach showed that the relatively high estimation efficiency of the former method results from the integration of genetic and nongenetic data to estimate fecundities. The strategy of combining different data types to increase the power of a statistical procedure is well known in molecular ecology (Chybicki et al., 2019; Gaggiotti et al., 2004; Guillot et al., 2012; Okuyama & Bolker, 2005). The classic neighborhood model can also be thought of as another example of a combined data approach. Therefore, the hierarchical neighborhood model is not truly novel in this context. However, the proposed approach adjusts the existing model to account for excessive variation in individual fecundity that goes significantly beyond the fixed-effects model. It should be stressed that the success of the hierarchical method relies on properly selected covariates of reproductive success. In plants, male fecundity is expected to correlate with flowering intensity (de Jong & Klinkhamer, 2005). The abundance of male flowers (or equivalents) translates directly into the abundance of produced pollen. In addition, in animal-pollinated plants, flower abundance attracts pollinators and increases pollen transfer (Glaettli & Barrett, 2008). Floral characters (floral display, floral colour, nectar and scent) may also be important for male success because they influence the visitation frequency and foraging time of pollinators and, indirectly, pollen dispersal between plants (Abraham, 2005; Huang et al., 2006; Yan et al., 2016). However, flower traits are often difficult to quantify, especially if there are many candidate parents within a study plot or if the study species is characterized by large dimensions, as are many trees.
Fortunately, flower abundance usually covaries with individual size, making size-related phenotypic variables a sensible choice for covariates of reproductive success (Avanzi et al., 2020; Younginger et al., 2017). In trees, trunk diameter is the easiest and the most accurately measurable predictor of this kind. Therefore, it should be, and usually is, the variable of first choice if flowering observations are not available. The analysed examples of spruce and yew data illustrate how important the selection of phenotypic variables is. In the spruce case study, explanatory variables included traits that a priori have a high likelihood of being predictors of male fecundity, and two out of three variables appeared to have a significant effect. As a result, the estimated regression model explained about 50% of the total variation in fecundity. In the yew case study, in contrast, only trunk diameter was included. Although it appeared to be significantly related to male reproductive success, variation in trunk diameter explained only 14% of the total variation in male fecundity. Interestingly, in both cases, the level of overdispersion was still relatively high, suggesting that male fecundity is shaped by additional unmeasured factors, such as flowering time, genetic compatibility between plants, as well as inbreeding and outbreeding depression that may lead to early-stage mortality of seeds and, consequently, to an unintended ascertainment bias in the sample of progeny.

The relevant question remains whether the classic neighborhood model approach is still useful. Clearly, the hierarchical neighborhood model offers better estimates of phenotypic effects than the classic neighborhood model. On the other hand, the classic model is more straightforward and allows the maximum-likelihood procedure to be used for parameter estimation. Therefore, the classic approach is more time-efficient than the hierarchical approach, which relies on the MCMC algorithm. This matters particularly when a method is validated using computer simulations. Also, due to its higher sensitivity, the classic method allows effects to be detected based on a small effective sample of progeny. This can be important for species with a high pollen dispersal capacity, which makes the effective progeny number much lower than the census number of progeny in a sample. However, since the classic method can detect false-positive effects in the face of overdispersion related to omitted factors, the results should always be considered with caution, especially when they point to counterintuitive or hard-to-explain relationships. More importantly, the classic neighborhood model remains useful for assessing ongoing gene flow (i.e. dispersal kernel, immigration). In this case, in order to yield unbiased gene flow parameters (Burczyk & Chybicki, 2004), the phenotypic variables need to be added to account for the actual, often size-dependent, distribution of pollen productivity (analogously to the seed shadow model; Clark et al., 1999), especially when plant sizes exhibit large variation.

Our simulations, as well as the case studies, suggested that the approach relying on separate estimation of male fecundity and the effects of phenotypic variables requires a substantial number of paternity assignments in order to provide accurate estimates of regression slopes. Therefore, it is recommended to sample a large number of progeny (as in Klein et al., 2008 or Chybicki & Burczyk, 2013) to saturate the mating model with informative data. Otherwise, fecundity estimates may tend to reflect priors, and the estimates of regression slopes may become severely biased or even meaningless. Interestingly, the dependency of fecundity estimates on priors can also be noted in the hierarchical neighborhood model approach, as clearly shown in the spruce case study. Inspection of Figure 1d reveals that low fecundity estimates tended to largely reflect their prior distributions (i.e. the estimated regression model). However, because the priors are informed by phenotypic data, the fecundities are estimated accurately, provided that the underlying regression model is identified correctly. Nonetheless, although the two-step approach has a high sampling demand, it still offers some advantages. While the regression model used in the hierarchical approach is relatively simple, the actual relationship between phenotypic variables and male reproductive success may involve more complex patterns. Inequality of variance over the range of individual fecundities seems to be an important statistical phenomenon not implemented in the hierarchical neighborhood model. Although heteroscedasticity does not introduce any bias to slope coefficients (White, 1980), it influences the variances of the estimates. Consequently, the statistical power of variable selection may be reduced (Cleasby & Nakagawa, 2011). With the two-step approach, advanced techniques of regression analysis can be easily implemented to overcome this problem.
For this reason, the hierarchical neighborhood model approach and the two-step approach can be considered as complementary methods, keeping in mind significant differences in data demand.

In our simulations and the case studies, we considered a somewhat limited number of potential determinants of reproductive success, so that we observed no problems with identifying the best model, suggesting that the designed RJMCMC algorithm efficiently explored the model space. However, in the case of a large number (e.g. dozens) of explanatory variables, it may be advisable to restrict the model space to the most promising variables, which can be preselected in an initial run as components of the median probability model, that is, the variables for which the inclusion probability is ≥0.5 (Barbieri & Berger, 2004), as suggested in our earlier work (Chybicki et al., 2019).
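The preselection rule can be illustrated with a small sketch. It assumes that an initial run has produced posterior probabilities for a set of visited models; the model probabilities and variable names below are hypothetical toy values, and both functions are illustrative rather than part of the authors' program. The inclusion probability of a variable is the summed probability of all models that contain it, and the median probability model keeps the variables whose inclusion probability reaches 0.5.

```python
def inclusion_probabilities(model_probs):
    """Sum posterior model probabilities over all models containing each variable."""
    incl = {}
    for variables, prob in model_probs.items():
        for v in variables:
            incl[v] = incl.get(v, 0.0) + prob
    return incl


def median_probability_model(model_probs, threshold=0.5):
    """Variables whose posterior inclusion probability is at least `threshold`."""
    incl = inclusion_probabilities(model_probs)
    return sorted(v for v, p in incl.items() if p >= threshold)


# Hypothetical posterior model probabilities (keys are tuples of variable names).
toy_models = {("x1", "x2"): 0.5, ("x1",): 0.3, ("x3",): 0.1, (): 0.1}
print(inclusion_probabilities(toy_models))
print(median_probability_model(toy_models))
```

Note that the median probability model need not coincide with the single most probable model, since it pools evidence across all models that contain a given variable.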

Our case studies provided a good illustration of the applicability of the hierarchical neighborhood model to real data. Here, we showed that male reproductive success in spruce tends to increase with both tree height and male strobili abundance, while it is not related to crown volume. Interestingly, tree height appeared to be the most important factor associated with fecundity. The slope close to unity indicates that a unit increase in tree height (1 unit = 1 standard deviation of the trait) gives approximately a unit increase in relative male log-fecundity. For comparison, the effect of male strobili abundance was only half of that of tree height. The relatively higher importance of tree height for male fecundity is somewhat unexpected given that male strobili abundance is directly correlated with the amount of produced pollen. This finding may reflect differences in measurement accuracy between the two traits. On the other hand, taller wind-pollinated trees are expected to shed pollen more effectively than shorter ones due to better exposure to wind (Di-Giovanni & Kevan, 1991; Tackenberg, 2003). The empirical evidence from the study of mating patterns in Pinus attenuata (Burczyk et al., 1996) provides good support for this expectation. Hence, our results are in line with earlier studies (see Petit & Hampe, 2006) showing that, in wind-pollinated plants, relative male reproductive success depends on both the relative amount of produced pollen and the relative ability to spread pollen.
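Because the slopes act on log-fecundity, they can be read as multiplicative effects on relative fecundity. The sketch below makes that conversion for the two reported spruce slopes, assuming that log-fecundity is expressed in natural logarithms (an assumption, since the base is not stated in this section); `fold_change` is a hypothetical helper.

```python
import math


def fold_change(slope, sd_units=1.0):
    """Multiplicative change in relative fecundity for a shift of `sd_units` SDs."""
    return math.exp(slope * sd_units)


# Posterior median slopes reported for the spruce case study.
height_effect = fold_change(0.932)
strobili_effect = fold_change(0.445)
print(round(height_effect, 2), round(strobili_effect, 2))
```

Under this assumption, a tree one standard deviation taller than average would be expected to have roughly 2.5 times the relative male fecundity, against roughly 1.6 times for one standard deviation more male strobili.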

The yew case study confirmed our previous finding (Chybicki & Oleksa, 2018), based on the classic neighborhood model, that male trees with greater trunk diameter tend to have higher reproductive success. However, with the new method, unlike the classic approach, we were able to show how individual trees deviate from the general pattern. The dispersion of fecundities around the regression line (Figure 4d) suggests that additional factors must shape the observed male fecundity. A future study could focus on explaining the presence of several male trees with fecundity above the predictions of the regression model, primarily because such high variation in reproductive success can be related to ongoing adaptation (Gerzabek et al., 2017). Given that the study population is close to the margin of the natural species distribution, identifying the determinants of the excessive variation in male fecundity could be important both for gene pool conservation and for understanding the mechanisms behind natural selection on male strobili traits (Mayol et al., 2020).

The application of the hierarchical neighborhood model approach to real data also showed that ignoring nonrandom pollen dispersal may lead to false-negative results, especially when the magnitude of a variable effect is moderate-to-low. Only strong effects seem to remain relatively robust to the assumption about random vs. nonrandom dispersal. We should underscore that this observation is not new, as mentioned in the introduction. However, the analyzed case studies provided an excellent opportunity to emphasize the consequences of ignoring pollen dispersal in studies of determinants of reproductive success. The problem is expected whenever the observed parentage counts are regressed on measured phenotypes, because such counts are a function of both pollen dispersal and fecundity, while only the latter factor is accounted for in such analyses. Consequently, future studies should rely on well-tested approaches, such as the hierarchical neighborhood model developed in this study.
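Why parentage counts confound dispersal with fecundity can be seen in a deliberately simplified sketch of the mating probabilities around a single mother. It assumes that the chance of a local father siring a seed is proportional to his fecundity multiplied by an exponential-power dispersal weight for his distance to the mother, and it ignores selfing, pollen immigration and genotyping errors; the function name and the numbers are hypothetical.

```python
import math


def paternity_shares(fecundities, distances, scale, shape):
    """Expected paternity shares of local fathers for one mother tree."""
    weights = [
        f * math.exp(-((d / scale) ** shape))  # fecundity times dispersal weight
        for f, d in zip(fecundities, distances)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Two equally fecund fathers at 10 m and 100 m from the mother (toy values).
shares_unequal_distance = paternity_shares([1.0, 1.0], [10.0, 100.0], scale=50.0, shape=1.0)
# Two fathers at the same distance whose fecundities differ threefold.
shares_equal_distance = paternity_shares([1.0, 3.0], [10.0, 10.0], scale=50.0, shape=1.0)
print(shares_unequal_distance)
print(shares_equal_distance)
```

In the first case the nearer father sires most seeds despite equal fecundity, so a regression of raw paternity counts on phenotype would attribute a purely spatial advantage to whatever traits that father happens to carry.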

5 CONCLUSIONS

Here, we showed that explicitly accounting for overdispersion in male fecundity improves the neighborhood model's performance, significantly reducing the frequency of false discoveries of the effects of phenotypic variables. Another important refinement, compared with the classic neighborhood model, is the ability of the hierarchical approach to quantify the proportion of the total variation in fecundity that is explained by the estimated regression model. Using our approach, it is now easy to assess whether additional factors may play a role in determining reproductive success. We believe this feature may stimulate future studies on mating patterns as well as on sexual selection in plants (Lankinen & Green, 2015).

ACKNOWLEDGEMENTS

The study was supported by the National Science Centre, Poland (the grant UMO-2018/31/B/NZ8/01808 to IJC) and Poznań University of Life Sciences (MD).

AUTHOR CONTRIBUTIONS

I.J.C. developed the statistical approach and ran simulations. A.O. and M.D. performed sampling, made ecological measurements and genotyped the case study populations. I.J.C. analysed the empirical data and wrote the first draft. All the authors contributed to the final version of the manuscript.

DATA AVAILABILITY STATEMENT

Sampling locations, morphological data and microsatellite genotypes of Picea abies and Taxus baccata: The Dryad repository (10.5061/dryad.51c59zw72).
The Windows computer program NM2F implementing the hierarchical neighborhood model is available at https://www.ukw.edu.pl/pracownicy/plik/igor_chybicki/1806/.

REFERENCES

Abraham, J. N. (2005). Insect choice and floral size dimorphism: Sexual selection or natural selection? Journal of Insect Behavior, 18(6), 743–756. https://doi.org/10.1007/s10905-005-8737-1
10.1007/s10905-005-8737-1
Web of Science® Google Scholar
Adams, W. T., & Birkes, D. S. (1989). Mating Patterns in Seed Orchards. Proceedings of the 20th Southern Forest Tree Improvement Conference, pp. 75–86. Retrieved from https://rngr.net/publications/tree-improvement-proceedings/sftic/1989/mating-patterns-in-seed-orchards
Google Scholar
Ashley, M. V. (2010). Plant parentage, pollination, and dispersal: How DNA microsatellites have altered the landscape. Critical Reviews in Plant Sciences, 29(3), 148–161. https://doi.org/10.1080/07352689.2010.481167
10.1080/07352689.2010.481167
CAS Web of Science® Google Scholar
Austerlitz, F., Dick, C. W., Dutech, C., Klein, E. K., Oddou-Muratorio, S., Smouse, P. E., & Sork, V. L. (2004). Using genetic markers to estimate the pollen dispersal curve. Molecular Ecology, 13(4), 937–954. https://doi.org/10.1111/j.1365-294X.2004.02100.x
10.1111/j.1365-294X.2004.02100.x
CAS PubMed Web of Science® Google Scholar
Avanzi, C., Heer, K., Büntgen, U., Labriola, M., Leonardi, S., Opgenoorth, L., Piermattei, A., Urbinati, C., Vendramin, G. G., & Piotti, A. (2020). Individual reproductive success in Norway spruce natural populations depends on growth rate, age and sensitivity to temperature. Heredity, 124(6), 685–698. https://doi.org/10.1038/s41437-020-0305-0
10.1038/s41437-020-0305-0
PubMed Web of Science® Google Scholar
Barbieri, M. M., & Berger, J. O. (2004). Optimal predictive model selection. Annals of Statistics, 32(3), 870–897. https://doi.org/10.1214/009053604000000238
10.1214/009053604000000238
Web of Science® Google Scholar
Born, C., Kjellberg, F., Chevallier, M. H., Vignes, H., Dikangadissi, J. T., Sanguié, J., & Hossaert-Mckey, M. (2008). Colonization processes and the maintenance of genetic diversity: Insights from a pioneer rainforest tree, Aucoumea klaineana. Proceedings of the Royal Society B: Biological Sciences, 275(1647), 2171–2179. https://doi.org/10.1098/rspb.2008.0446
10.1098/rspb.2008.0446
PubMed Web of Science® Google Scholar
Burczyk, J., Adams, W. T., Moran, G. F., & Griffin, A. R. (2002). Complex patterns of mating revealed in a Eucalyptus regnans seed orchard using allozyme markers and the neighbourhood model. Molecular Ecology, 11(11), 2379–2391. https://doi.org/10.1046/j.1365-294X.2002.01603.x
10.1046/j.1365-294X.2002.01603.x
CAS PubMed Web of Science® Google Scholar
Burczyk, J., Adams, W. T., & Shimizu, J. Y. (1996). Mating patterns and pollen dispersal in a natural knobcone pine (Pinus attenuate Lemmon.) stand. Heredity, 77(3), 251–260. https://doi.org/10.1038/hdy.1996.139
10.1038/hdy.1996.139
Web of Science® Google Scholar
Burczyk, J., & Chybicki, I. J. (2004). Cautions on direct gene flow estimation in plant populations. Evolution: International Journal of Organic. Evolution, 58(5), 956–963.
10.1111/j.0014-3820.2004.tb00430.x
PubMed Web of Science® Google Scholar
Burczyk, J., & Koralewski, T. E. (2005). Parentage versus two-generation analyses for estimating pollen-mediated gene flow in plant populations. Molecular Ecology, 14(8), 2525–2537. https://doi.org/10.1111/j.1365-294X.2005.02593.x
10.1111/j.1365-294X.2005.02593.x
CAS PubMed Web of Science® Google Scholar
Chybicki, I. J. (2013). Note on the applicability of the f-model in analysis of pollen pool heterogeneity. Journal of Heredity, 104(4), 578–585. https://doi.org/10.1093/jhered/est029
10.1093/jhered/est029
CAS PubMed Web of Science® Google Scholar
Chybicki, I. J. (2018). NMπ—improved re-implementation of NM+, a software for estimating gene dispersal and mating patterns. Molecular Ecology Resources, 18(1), 159–168. https://doi.org/10.1111/1755-0998.12710
10.1111/1755-0998.12710
PubMed Web of Science® Google Scholar
Chybicki, I. J., & Burczyk, J. (2010). NM+: Software implementing parentage-based models for estimating gene dispersal and mating patterns in plants. Molecular Ecology Resources, 10(6), 1071–1075.
10.1111/j.1755-0998.2010.02849.x
CAS PubMed Web of Science® Google Scholar
Chybicki, I. J., & Burczyk, J. (2013). Seeing the forest through the trees: Comprehensive inference on individual mating patterns in a mixed stand of Quercus robur and Q. petraea. Annals of Botany, 112(3), 561–574. https://doi.org/10.1093/aob/mct131
10.1093/aob/mct131
CAS PubMed Web of Science® Google Scholar
Chybicki, I. J., Iszkuło, G., & Suszka, J. (2019). Bayesian quantification of ecological determinants of outcrossing in natural plant populations: Computer simulations and the case study of biparental inbreeding in English yew. Molecular Ecology, 28(17), 4077–4096. https://doi.org/10.1111/mec.15195
10.1111/mec.15195
PubMed Web of Science® Google Scholar
Chybicki, I. J., & Oleksa, A. (2018). Seed and pollen gene dispersal in Taxus baccata, a dioecious conifer in the face of strong population fragmentation. Annals of Botany, 122(3), 409–421. https://doi.org/10.1093/aob/mcy081
10.1093/aob/mcy081
PubMed Web of Science® Google Scholar
Clark, J. S., Silman, M., Kern, R., Macklin, E., & HilleRisLambers, J. (1999). Seed dispersal near and far: Patterns across temperate and tropical forests. Ecology, 80(5), 1475. https://doi.org/10.2307/176541
10.1890/0012-9658(1999)080[1475:SDNAFP]2.0.CO;2
Web of Science® Google Scholar
Cleasby, I. R., & Nakagawa, S. (2011). Neglected biological patterns in the residuals: A behavioural ecologist’s guide to co-operating with heteroscedasticity. Behavioral Ecology and Sociobiology, 65, 2361–2372. https://doi.org/10.2307/41414703
10.1007/s00265-011-1254-7
Web of Science® Google Scholar
de Jong, T. J., & Klinkhamer, P. G. L. (2005). Evolutionary ecology of plant reproductive strategies. Cambridge University Press.
Google Scholar
Degen, B., Bandou, E., & Caron, H. (2004). Limited pollen dispersal and biparental inbreeding in Symphonia globulifera in French Guiana. Heredity, 93(6), 585–591. https://doi.org/10.1038/sj.hdy.6800560
Dering, M., Misiorny, A., & Chałupka, W. (2014). Inter-year variation in selfing, background pollination, and paternal contribution in a Norway spruce clonal seed orchard. Canadian Journal of Forest Research, 44(7), 760–767. https://doi.org/10.1139/cjfr-2014-0061
Di-Giovanni, F., & Kevan, P. G. (1991). Factors affecting pollen dynamics and its importance to pollen contamination: a review. Canadian Journal of Forest Research, 21(8), 1155–1170. https://doi.org/10.1139/x91-163
DiLeo, M. F., Holderegger, R., & Wagner, H. H. (2018). Contemporary pollen flow as a multiscale process: Evidence from the insect-pollinated herb, Pulsatilla vulgaris. Journal of Ecology, 106(6), 2242–2255. https://doi.org/10.1111/1365-2745.12992
Ellstrand, N. C. (2014). Is gene flow the most important evolutionary force in plants? American Journal of Botany, 101(5), 737–753. https://doi.org/10.3732/ajb.1400024
Gaggiotti, O. E., Brooks, S. P., Amos, W., & Harwood, J. (2004). Combining demographic, environmental and genetic data to test hypotheses about colonization events in metapopulations. Molecular Ecology, 13(4), 811–825. https://doi.org/10.1046/j.1365-294X.2003.02028.x
Gelman, A., Goodrich, B., Gabry, J., & Vehtari, A. (2019). R-squared for Bayesian regression models. American Statistician, 73, 307–309. https://doi.org/10.1080/00031305.2018.1549100
Gelman, A., Hwang, J., & Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6), 997–1016. https://doi.org/10.1007/s11222-013-9416-2
Gérard, P. R., Klein, E. K., Austerlitz, F., Fernández-Manjarrés, J. F., & Frascaria-Lacoste, N. (2006). Assortative mating and differential male mating success in an ash hybrid zone population. BMC Evolutionary Biology, 6. https://doi.org/10.1186/1471-2148-6-96
Gerzabek, G., Oddou-Muratorio, S., & Hampe, A. (2017). Temporal change and determinants of maternal reproductive success in an expanding oak forest stand. Journal of Ecology, 105(1), 39–48. https://doi.org/10.1111/1365-2745.12677
Glaettli, M., & Barrett, S. C. H. (2008). Pollinator responses to variation in floral display and flower size in dioecious Sagittaria latifolia (Alismataceae). New Phytologist, 179(4), 1193–1201. https://doi.org/10.1111/j.1469-8137.2008.02532.x
Guillot, G., Renaud, S., Ledevin, R., Michaux, J., & Claude, J. (2012). A unifying model for the analysis of phenotypic, genetic, and geographic data. Systematic Biology, 61(6), 897–911. https://doi.org/10.1093/sysbio/sys038
Hadfield, J. D., Richardson, D. S., & Burke, T. (2006). Towards unbiased parentage assignment: Combining genetic, behavioural and spatial data in a Bayesian framework. Molecular Ecology, 15(12), 3715–3730. https://doi.org/10.1111/j.1365-294X.2006.03050.x
Hadfield, J. D., Wilson, A. J., Garant, D., Sheldon, B. C., & Kruuk, L. E. B. (2010). The misuse of BLUP in ecology and evolution. American Naturalist, 175(1), 116–125. https://doi.org/10.1086/648604
Hopley, T., Zwart, A. B., & Young, A. G. (2015). Among-population pollen movement and skewed male fitness in a dioecious weed. Biological Invasions, 17(7), 2147–2161. https://doi.org/10.1007/s10530-015-0867-6
Huang, S.-Q., Tang, L.-L., Sun, J.-F., & Lu, Y. (2006). Pollinator response to female and male floral display in a monoecious species and its implications for the evolution of floral dimorphism. New Phytologist, 171(2), 417–424. https://doi.org/10.1111/j.1469-8137.2006.01766.x
Klein, E. K., Bontemps, A., & Oddou-Muratorio, S. (2013). Seed dispersal kernels estimated from genotypes of established seedlings: Does density-dependent mortality matter? Methods in Ecology and Evolution, 4(11), 1059–1069. https://doi.org/10.1111/2041-210X.12110
Klein, E. K., Carpentier, F. H., & Oddou-Muratorio, S. (2011). Estimating the variance of male fecundity from genotypes of progeny arrays: evaluation of the Bayesian forward approach. Methods in Ecology and Evolution, 2(4), 349–361. https://doi.org/10.1111/j.2041-210X.2010.00085.x
Klein, E. K., Desassis, N., & Oddou-Muratorio, S. (2008). Pollen flow in the wildservice tree, Sorbus torminalis (L.) Crantz. IV. Whole interindividual variance of male fecundity estimated jointly with the dispersal kernel. Molecular Ecology, 17(14), 3323–3336. https://doi.org/10.1111/j.1365-294X.2008.03809.x
Knight, T. M. (2003). Floral density, pollen limitation, and reproductive success in Trillium grandiflorum. Oecologia, 137(4), 557–563. https://doi.org/10.1007/s00442-003-1371-8
Koelling, V. A., Monnahan, P. J., & Kelly, J. K. (2012). A Bayesian method for the joint estimation of outcrossing rate and inbreeding depression. Heredity, 109(6), 393–400. https://doi.org/10.1038/hdy.2012.58
Lande, R., & Arnold, S. J. (1983). The measurement of selection on correlated characters. Evolution, 37(6), 1210–1226. https://doi.org/10.1111/j.1558-5646.1983.tb00236.x
Lankinen, Å., & Green, K. K. (2015). Using theories of sexual selection and sexual conflict to improve our understanding of plant ecology and evolution. AoB PLANTS, 7(1), 1–18. https://doi.org/10.1093/aobpla/plv008
Lloyd, M. W., Tumas, H. R., & Neel, M. C. (2018). Limited pollen dispersal, small genetic neighborhoods, and biparental inbreeding in Vallisneria americana. American Journal of Botany, 105(2), 227–240. https://doi.org/10.1002/ajb2.1031
Lyons, E. E., Waser, N. M., Price, M. V., Antonovics, J., & Motten, A. F. (1989). Sources of variation in plant reproductive success and implications for concepts of sexual selection. The American Naturalist, 134, 409–433. https://doi.org/10.2307/2462178
Mayol, M., Riba, M., Cavers, S., Grivet, D., Vincenot, L., Cattonaro, F., & González-Martínez, S. C. (2020). A multiscale approach to detect selection in nonmodel tree species: Widespread adaptation despite population decline in Taxus baccata L. Evolutionary Applications, 13(1), 143–160. https://doi.org/10.1111/eva.12838
Meagher, T. R. (1991). Analysis of paternity within a natural population of Chamaelirium luteum. II. Patterns of male reproductive success. American Naturalist, 137(6), 738–752. https://doi.org/10.1086/285191
Moracho, E., Moreno, G., Jordano, P., & Hampe, A. (2016). Unusually limited pollen dispersal and connectivity of Pedunculate oak (Quercus robur) refugial populations at the species’ southern range margin. Molecular Ecology, 25(14), 3319–3331. https://doi.org/10.1111/mec.13692
Morgan, M. T., & Conner, J. K. (2001). Using genetic markers to directly estimate male selection gradients. Evolution, 55(2), 272–281. https://doi.org/10.1111/j.0014-3820.2001.tb01292.x
Oddou-Muratorio, S., Gauzere, J., Bontemps, A., Rey, J.-F., & Klein, E. K. (2018). Tree, sex and size: Ecological determinants of male vs. female fecundity in three Fagus sylvatica stands. Molecular Ecology, 27(15), 3131–3145. https://doi.org/10.1111/mec.14770
Oddou-Muratorio, S., Klein, E. K., & Austerlitz, F. (2005). Pollen flow in the wildservice tree, Sorbus torminalis (L.) Crantz. II. Pollen dispersal and heterogeneity in mating success inferred from parent-offspring analysis. Molecular Ecology, 14(14), 4441–4452. https://doi.org/10.1111/j.1365-294X.2005.02720.x
Okuyama, T., & Bolker, B. M. (2005). Combining genetic and ecological data to estimate sea turtle origins. Ecological Applications, 15(1), 315–325. https://doi.org/10.1890/03-5063
Ortego, J., Bonal, R., Muñoz, A., & Aparicio, J. M. (2014). Extensive pollen immigration and no evidence of disrupted mating patterns or reproduction in a highly fragmented holm oak stand. Journal of Plant Ecology, 7(4), 384–395. https://doi.org/10.1093/jpe/rtt049
Petit, R. J., & Hampe, A. (2006). Some evolutionary consequences of being a tree. Annual Review of Ecology, Evolution, and Systematics, 37(1), 187–214. https://doi.org/10.1146/annurev.ecolsys.37.091305.110215
Setsuko, S., Nagamitsu, T., & Tomaru, N. (2013). Pollen flow and effects of population structure on selfing rates and female and male reproductive success in fragmented Magnolia stellata populations. BMC Ecology, 13(1), 10. https://doi.org/10.1186/1472-6785-13-10
Setsuko, S., & Tomaru, N. (2011). The effects of plant size and light availability on male and female reproductive success and functional gender in a hermaphrodite tree species, Magnolia stellata. Botany, 89(9), 593–604. https://doi.org/10.1139/b11-051
Smouse, P. E., Dyer, R. J., Westfall, R. D., & Sork, V. L. (2001). Two-generation analysis of pollen flow across a landscape. I. Male gamete heterogeneity among females. Evolution, 55(2), 260–271. https://doi.org/10.1111/j.0014-3820.2001.tb01291.x
Smouse, P. E., & Meagher, T. R. (1994). Genetic analysis of male reproductive contributions in Chamaelirium luteum (L.) Gray (Liliaceae). Genetics, 136(1), 313–322.
Smouse, P. E., Meagher, T. R., & Kobak, C. J. (1999). Parentage analysis in Chamaelirium luteum (L.) Gray (Liliaceae): Why do some males have higher reproductive contributions? Journal of Evolutionary Biology, 12(6), 1069–1077. https://doi.org/10.1046/j.1420-9101.1999.00114.x
Snow, A. A., & Lewis, P. O. (1993). Reproductive traits and male fertility in plants: Empirical approaches. Annual Review of Ecology and Systematics, 24(1), 331–351. https://doi.org/10.1146/annurev.es.24.110193.001555
Tackenberg, O. (2003). Modeling long-distance dispersal of plant diaspores by wind. Ecological Monographs, 73(2), 173–189. https://doi.org/10.1890/0012-9615(2003)073[0173:MLDOPD]2.0.CO;2
Tambarussi, E. V., Boshier, D., Vencovsky, R., Freitas, M. L. M., & Sebbenn, A. M. (2015). Paternity analysis reveals significant isolation and near neighbor pollen dispersal in small Cariniana legalis Mart. Kuntze populations in the Brazilian Atlantic Forest. Ecology and Evolution, 5(23), 5588–5600. https://doi.org/10.1002/ece3.1816
Tani, N., Tsumura, Y., Fukasawa, K., Kado, T., Taguchi, Y., Lee, S. L., Lee, C. T., Muhammad, N., Niiyama, K., Otani, T., Yagihashi, T., Tanouchi, H., Ripin, A., & Kassim, A. R. (2015). Mixed mating system are regulated by fecundity in Shorea curtisii (Dipterocarpaceae) as revealed by comparison under different pollen limited conditions. PLoS One, 10(5), e0123445. https://doi.org/10.1371/journal.pone.0123445
Wade, E. M., Nadarajan, J., Yang, X., Ballesteros, D., Sun, W., & Pritchard, H. W. (2016). Plant species with extremely small populations (PSESP) in China: A seed and spore biology perspective. Plant Diversity, 38(5), 209–220. https://doi.org/10.1016/j.pld.2016.09.002
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817. https://doi.org/10.2307/1912934
Whitehead, M. R., Lanfear, R., Mitchell, R. J., & Karron, J. D. (2018). Plant mating systems often vary widely among populations. Frontiers in Ecology and Evolution, 6, 38. https://doi.org/10.3389/fevo.2018.00038
Winter, C., Lehmann, S., & Diekmann, M. (2008). Determinants of reproductive success: A comparative study of five endangered river corridor plants in fragmented habitats. Biological Conservation, 141(4), 1095–1104. https://doi.org/10.1016/j.biocon.2008.02.002
Yan, J., Wang, G., Sui, Y., Wang, M., & Zhang, L. (2016). Pollinator responses to floral colour change, nectar, and scent promote reproductive fitness in Quisqualis indica (Combretaceae). Scientific Reports, 6(1), 1–10. https://doi.org/10.1038/srep24408
Younginger, B. S., Sirová, D., Cruzan, M. B., & Ballhorn, D. J. (2017). Is biomass a reliable estimate of plant fitness? Applications in Plant Sciences, 5(2), 1600094. https://doi.org/10.3732/apps.1600094

Volume 21, Issue 3

April 2021

Pages 781-800

Identification of determinants of pollen donor fecundity using the hierarchical neighborhood model


