The registration of herbicides in the European Union requires an assessment of risks to nontarget terrestrial plants (NTTPs). Regulatory plant studies are performed to determine risk-assessment-relevant endpoints (50% effect rate) for quantitative parameters, mostly biomass and survival. Recently, the European Food Safety Authority stated that endpoints for qualitatively assessed plant visual injuries (PVIs) such as necrosis, chlorosis, and so forth should be considered for the risk assessment as equal to endpoints derived from quantitatively determined parameters. However, the lack of guidance in the NTTP test guidelines on how to assess PVI and how to derive a statistically meaningful endpoint for PVI makes their use in risk assessments challenging. To evaluate and improve the reliability, reproducibility, and regulatory relevance of PVI assessments in NTTP studies, the PVI Working Group was formed in 2022 within the SETAC Plant Interest Group. In a first exercise, research needs, guidance gaps, and shortcomings in current methodologies were identified and are presented together with recommendations for a future, validated, and harmonized method for the assessment of PVI. Survey results revealed a high variability in how PVI are currently assessed, and that the reliability of these data is unclear. Under current conditions, the PVI data can rather be seen as supportive information instead of using the data for the statistically sound determination of a regulatory endpoint. Consequently, standardization and harmonization of procedures for the assessment of PVI are needed. An improved scoring methodology should be developed that allows for a precise, statistically sound endpoint determination. Regarding the regulatory relevance of PVI, further research is required to assess the biological meaning of PVI data and how this is connected to the regulatory requirements and protection goals. 
Last but not least, guidance is required on how to evaluate the historically available PVI data that are based on various assessment methodologies. Integr Environ Assess Manag 2024;20:915–923. © 2023 The Authors. Integrated Environmental Assessment and Management published by Wiley Periodicals LLC on behalf of Society of Environmental Toxicology & Chemistry (SETAC).

INTRODUCTION

Herbicides are important tools within the integrated pest management framework for farmers to reduce the competition between crops and weeds for limited resources such as light, water, and nutrients in agricultural areas. Herbicides have a specific impact on plant physiology and metabolism, the particulars of which depend on the product's mode of action; this impact suppresses the weed's vigor and therefore favors crop development (Cobb, 2022). Because herbicides are designed to control weed populations in the field, they might also impact nontarget terrestrial plants (NTTPs). In order to protect NTTPs, a comprehensive evaluation of the products' risks to NTTPs is required under Regulation (EC) No 1107/2009 (European Commission, 2009). The herbicide's potential effects on NTTPs are compared to the expected exposure in the off-field area in a risk assessment paradigm. The risk assessment for herbicides is based on effect data from seedling emergence (OECD 208 2006a) and vegetative vigor studies (OECD 227 2006b). In both study types, several rates of the herbicide under evaluation are tested on several species to determine effects on biomass (shoot fresh or dry weight, shoot length), survival, emergence, and plant visual injury (PVI) such as necrosis, chlorosis, color change, and so forth, also referred to as “visual phytotoxicity, visible detrimental effects, abnormalities in appearance or morphological symptoms of phytotoxicity.” As phytotoxicity refers to the inherent potential of a substance to cause PVIs as mentioned above, the term PVI is used in this brief communication. For the measurable parameters of biomass, survival, and emergence, the dose–response data are used to determine an effect rate endpoint (ER_x) with linear and nonlinear regression models for each test species, which is used for the risk assessment. In the EU, the relevant endpoint for the risk assessment is the ER₅₀.
In the past, it has been noted (European Food Safety Authority [EFSA], 2014, 2019) that in a number of cases, the endpoints for the visually assessed plant injuries (e.g., necrosis, chlorosis, color change, etc.) were reported to be lower than for the measurable parameters (e.g., biomass or survival). It was concluded that if the PVI endpoint is the lowest, it should be considered for the risk assessment. This requirement is included in the central zone guidance document (Central secretariat, 2022) and also in some national guidance documents in the EU (Agentur für Gesundheit und Ernährungssicherheit [AGES], 2021; College voor de toelating van gewasbeschermingsmiddelen en biociden [Ctgb], 2022), without providing any detailed guidance.

The lack of guidance in the guidance documents and the NTTP OECD test guidelines (OECD 208 2006a; OECD 227 2006b) on how to assess PVI and how to derive a statistically meaningful ER₅₀ from the PVI data leads to high variability in how laboratories assess PVI and how an endpoint is derived. Also, the subjective nature of the PVI assessment (based on qualitative observations such as low, medium, or high effects rather than on direct measurements) results in a low reliability of these data compared to quantitatively assessed parameters such as biomass, which is based on objective weight measurements. This makes it challenging to compare the PVI data with endpoints derived from the measurable parameters and to conclude which is the lowest of all available endpoints. Accordingly, there is a gap between what the current methods for PVI assessments can deliver in terms of reliability and reproducibility and what is requested to perform a protective NTTP risk assessment. Specifically, details on the relevant number of PVI scores that can be reliably and reproducibly assigned to affected plants by different assessors and a suitable statistical method for endpoint determination based on these data are missing in binding regulatory documents. The PVI Working Group (WG) was formed in 2022 within the SETAC Plant Interest Group to evaluate and improve the reliability, reproducibility, and regulatory relevance of PVI assessments in NTTP studies. This brief communication addresses the current state of the art of PVI assessments in NTTP studies, its shortcomings, and potential research needs.

METHODOLOGICAL ISSUES REGARDING PVI ASSESSMENTS IN NTTP STUDIES

In order to assess the risk of plant protection products to NTTPs, seedling emergence and vegetative vigor studies (OECD 208 2006a; OECD 227 2006b) are required under Commission Regulations (EU) No 283/2013 and 284/2013 (European Commission, 2013a, 2013b). In both study types, parameters such as plant biomass, survival, plant height, and additionally emergence for OECD 208 2006a studies are determined via quantitative, nonsubjective measurements. Besides these quantitative parameters, PVI is qualitatively assessed. Plant visual injury manifests in symptoms such as necrosis, chlorosis, color change, leaf curling, stunting, and so forth. Its assessment is a regular, qualitative, visual comparison of the appearance of treated plants with that of the untreated control plants and is performed as a subjective estimation by an individual assessor. The advantage of a visual rating of effects in plant studies is that it is a nondestructive, cost-effective, and time-effective method that provides qualitative results on the visually obvious effects on plants. The data derived from visual assessments are often used at the screening stage of an experimental series as a good indicator of effects, which can inform the design of further experiments with more detailed and quantitative data if required. In the broader field of plant research, different types of scoring systems are currently used in order to visually assess weed control performance of herbicides (Canadian Weed Science Society, 2018; Frans & Talbert, 1977; Hamill et al., 1977; Rasmussen, 1956), various plant injury symptoms (Anwar et al., 2010; European and Mediterranean Plant Protection Organization [EPPO], 2014; Pradeep et al., 2020), nutrient deficiencies in plants (Rodriguez de Cianzio et al., 1979; Santos et al., 2015), plant disease infestation (Henfling, 1987; Koyshibayev, 2016), or vegetation cover (Damgaard, 2014).
All types propose to visually rate the extent of abnormalities of single plants or plants in field plots in comparison to the plants or field plots of the control group on different scales as, for example, 1-5/A-E, 0-10, or 0-100. For instance, the EPPO (2014) recommends assessing and quantifying individual plant symptoms (e.g., modifications in color, necrosis, or deformation) via the number of affected plants, use of a scale (e.g., none, slight, medium, strong), percentage surface area affected, or via a rating relative to an untreated control.

The shortcomings of any kind of visual assessment lie within the subjectivity of the assessment, which is open to human bias, and might result in a low precision, reproducibility, and reliability of the results (Andújar et al., 2010; Donald, 2006; Klimeš, 2003; Nash, 1981). Plant visual injury assessments for NTTP studies are in particular based on expert judgment by trained personnel. However, assessment of different intensities of abnormalities at the same time might lead to an over- or underestimation of the effects (Nilsson, 1995). As such, the resulting data are dependent on the opinion of the individual conducting the assessment at the time of the assessment and do not reflect an accurate and objective measurement of visual injury symptoms in a way that it will be appropriately quantified for use in risk assessment. For example, when a scale of 11 different percent classes (0, 10, 20 … 100) is used, it requires that an assessor can detect differences of 10% in a specific injury by sight. As a result, the level of reproducibility decreases with an increasing number of scores as this requires an extremely precise level of visual judgment from the assessor. Additionally, the resulting data have a low reproducibility between individual assessors, as different assessors will not necessarily assign the same score to the same plant due to individual bias. Horst et al. (1984) showed that the visual evaluation of the quality and density of turf grasses based on a 1–9 scale showed a high variability among 10 evaluators. For example, the average rating for the turfgrass density of a specific cultivar (Kentucky bluegrass) ranged from 4.7 to 8.2. For regulatory NTTP studies, the OECD guidelines (OECD 208 2006a; OECD 227 2006b) propose the use of a uniform scoring for visual injury to evaluate the observable toxic response and provide two references (Frans & Talbert, 1977; Hamill et al., 1977) for the performance of a qualitative and quantitative visual rating. 
Both references describe scoring systems that can be used to assess the weed control performance of herbicides in field trials. The two examples recommend using a scale of 11 steps either as 0–10 (0, 1, 2, … 10) or as 0–100 (0, 10, 20, … 100). The ultimate goal of the described scoring systems in both references is to evaluate the performance of herbicides in terms of weed reduction in a field plot experiment by assessing the ground coverage of weeds. Accordingly, the visual estimation is based on the distinction between dark soil and green weed coverage, and the estimate is attributed to one of the available categories. Because the field plot size is consistent and known for each plot, and there is a clear visual distinction between the dark soil and the green plant coverage, we anticipate that the estimates will have some level of precision when performing weed coverage assessments. In contrast, the visual assessment in NTTP studies aims at evaluating the visual symptoms of a single plant or a few plants within one pot in reference to an average visual estimation of all control plants. While the detection of the symptom itself might be easy, the estimation of the extent of one or even several different symptoms (e.g., deformation, chlorosis, color change, etc.) on one plant or a few plants in a pot compared to a control requires detailed observation and is susceptible to a high variability of estimated results between different assessors. The color and intensity of illumination in the greenhouse as well as the color of the surroundings where the plants are located might also have an impact on the visual assessments (Nilsson, 1995).
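The variability between assessors described above could be summarized in a simple way. The following Python sketch is illustrative only: the function names, the plant identifiers, and the scores are hypothetical and are not taken from the WG surveys or from any of the cited studies. It reports the spread of scores across assessors for each plant, and the fraction of plants for which all assessors assigned exactly the same score.

```python
from statistics import median


def assessor_spread(scores_by_plant):
    """Summarize, per plant, how far the assessors' scores lie apart.

    scores_by_plant maps a plant identifier to the list of scores that
    the individual assessors assigned to that plant.
    """
    summary = {}
    for plant, scores in scores_by_plant.items():
        summary[plant] = {
            "min": min(scores),
            "max": max(scores),
            "range": max(scores) - min(scores),
            # The median is used because scores are ordinal data.
            "median": median(scores),
        }
    return summary


def exact_agreement(scores_by_plant):
    """Fraction of plants for which all assessors gave the same score."""
    agreed = sum(len(set(scores)) == 1 for scores in scores_by_plant.values())
    return agreed / len(scores_by_plant)


# Hypothetical scores from three assessors on a 0-100 scale in steps of 10.
example = {
    "plant_1": [20, 30, 20],
    "plant_2": [50, 50, 50],
    "plant_3": [40, 60, 70],
    "plant_4": [0, 0, 0],
}

spread = assessor_spread(example)
for plant, stats in spread.items():
    print(plant, stats)
print("exact agreement:", exact_agreement(example))
```

Summaries of this kind would only describe disagreement between assessors; they would not by themselves indicate which score is correct.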

Also, the aim of assessing weed control performance is different from the aim of assessing effects in NTTP studies. An herbicide can be seen as successful if the ground is bare of live weeds (Donald, 2006). This means that for weed control performance assessments, strong effects are relevant (80%–100% effects). In Europe, the current standard endpoint used in the risk assessment of NTTPs is the ER₅₀. However, visual assessments in general are not considered to be precise around and below the 50% effect level (Andújar et al., 2010; Sherwood et al., 1983), but are most accurately performed at the lower or upper end of the scale (around 10% or 90%) (Donald, 2006; Hebert, 1982). This means that a scoring system designed to assess weed control performance is not directly transferable to NTTP studies to determine an ER₅₀. In the system of Hamill et al. (1977), the weed cover of a field plot is rated from 0 = complete weed cover to 10 = absence of weeds. However, there are no specific details on how to account for different, nonlethal PVI symptoms that would be required for use in an NTTP study. In conclusion, using the guidance provided in Hamill et al. (1977) for performing PVI assessments for NTTPs is questionable.

Frans and Talbert (1977) propose a scoring system that differentiates between the effects of an herbicide on the weeds and the crop. For each of the scores, a qualitative description is provided (e.g., score: 20 = poor weed control or some crop discoloration, stunting, or stand loss; score: 50 = deficient to moderate weed control or crop injury more lasting, recovery doubtful). The crop-specific scoring in Frans and Talbert (1977) might be transferable to score PVI in NTTP studies. However, the qualitative description of the symptoms is not consistent over the whole scale. At the score 20, the symptoms on the crop to be considered are specified as “some crop discoloration, stunting, or stand loss.” At the score 50, it reads “crop injury more lasting, recovery doubtful,” which is more general compared to the description at the score of 20. As the granularity of the symptoms' description changes across the scale, the symptom descriptions of both scores might apply to the same plant, which should not be the case for a suitable scoring system. Ideally, the scoring scale should provide unambiguous descriptions of the symptomology to be considered at each of the different scores. In NTTP studies, often up to 10 different species are tested. As PVI symptoms (e.g., leaf deformation) might appear differently for each individual species and depend on the mode of action of the test substance, comprehensive investigations of all relevant combinations of PVI symptoms, test species, and modes of action would be required in order to develop a detailed symptom description. This would be similar to the approach of Clive (1971), who developed a reference catalogue for visual assessment of different specific crop diseases. However, the appearance of crop diseases is less variable than the appearance of different PVI symptoms, which makes it easier to create a reference description of disease symptoms.

Regarding the guidance and the provided references in the OECD guideline, one can conclude that there is too little specific guidance and a lack of harmonization as to how to perform the PVI assessments in NTTP studies. This often results in a lack of precisely defined terminology in many NTTP study reports, which does not allow for an appropriate interpretation of the assessed PVI data in terms of an endpoint. In order to identify the scientific community's perspective on the quality of the PVI data for regulatory purposes and the variability of methodological approaches to assess PVI, two surveys were initiated by the WG (Table 1, Figure 2, Supporting Information). One of the surveys (Table 1, Supporting Information) showed that there are some methodological differences in assessing PVI across different labs. For example, different scoring scales are currently used (e.g., 0–5, 0–10, etc.). In all labs, the scores are assigned to treated plants in comparison to the control plants. Usually, a total score is assigned combining all observed symptoms, but one lab assesses the symptoms separately and one only scores the most severe symptom. The survey also revealed large differences in which and how many symptoms are incorporated into one score. As an example, approximately half of the laboratories include visual estimates of change in biomass (e.g., stunted growth) in the assessment, while the other half do not, which has a strong impact on the overall score value (Figure 1). If stunted growth is included in the PVI assessment, it would potentially overestimate the overall effects, as stunting is covered in biomass-related parameters such as plant weight or height. The guidance documents (AGES, 2021; Central Zone Steering Committee, 2021; Ctgb, 2022) make reference to the EFSA (2014) when drawing conclusions about the necessity of an endpoint for PVI assessments.
It is assumed that this refers to the following statement: “biomass was a more sensitive endpoint in 46 cases (63%) and visual assessment was a more sensitive endpoint in 27 cases (37%)” (page 51). A comparison of PVI data with biomass data in terms of an endpoint is challenging, given the lack of a harmonized, usable assessment method and endpoint estimation approach, and care has to be taken in how best to compare endpoints taken from diverse studies. Unfortunately, the studies and data used for the comparison on which the necessity for using PVI data in the NTTP environmental risk assessment (ERA) is based are not publicly available (cited as “unpublished” in EFSA [2014]). The data used in this study are quite limited, consisting of only seven studies involving four herbicides. It is not evident how PVI was assessed and how the mentioned endpoints (ER₂₀, ER₂₅, and ER₅₀) were derived from visual observations in these seven studies. It is also unclear whether all studies used similar methods for assessment and endpoint estimation. Deriving a general statement based solely on this analysis seems questionable. A possible improvement and harmonization for the use of PVI data for the NTTP risk assessment would be to exclude changes referring to the biomass from the PVI assessments and assess the PVI symptoms separately. A thorough evaluation of the impact of each symptom on the potential protection goal (e.g., survival of a plant) could be done to rank symptoms according to their importance (see Regulatory Relevance section).

Table 1. Responses of 12 different laboratories to survey questions on the applied methodologies to assess PVI

Question: What rating scale do you use for visual injury assessments? Assume that all scales cover a range from “no injury” to “severe injury/death”
	Answers
Answer options	Number	In %
A scale with a small number of possible scores (e.g., 0, 1, 2, 3, 4)	1	8.33
A scale using 0–10 as whole numbers	1	8.33
A 0–100 scale where scores are only recorded in steps of 10 (e.g., as 10, 20, 30, etc.)	6	50.00
A 0, 1, 2, 3 … 100 scale where all whole numbers within that scale can be used for more precise scores	0	0.00
Other scale not noted above, or different scales used for different studies	4	33.33

Question: What visual injury symptoms are included in your assessments? Please check all that apply
	Answers
Answer options	Number	In %
Chlorosis	12	100.00
Necrosis	12	100.00
Leaf blotching	7	58.33
Leaf curling	9	75.00
Distorted plant tissue	6	50.00
Apical bud damage	4	33.33
Visual biomass change (e.g., larger/smaller)	5	41.67
Other—Please note any other visual injury symptoms you include in assessments	7	58.33

Note: Further results of the survey can be found in the Supporting Information.
Abbreviation: PVI, plant visual injury.

Figure 1. Control sunflower plant (left) in comparison to treated sunflower plant (right). If stunted growth were considered in the plant visual injury assessment on a scale of 11 percent classes (0, 10, 20 … 100), the treated plant could be rated as approximately 50% affected. If stunted growth is not considered, one might rate it as “no effect.”

STATISTICAL ISSUES REGARDING PVI ASSESSMENTS IN NTTP STUDIES

Another issue regarding PVI assessments and especially the use of the data for the NTTP risk assessment is the lack of guidance and harmonization on statistically robust and reliable endpoint estimation. The USEPA (1996) proposes that qualitative data based on ratings for visual injury symptoms are not statistically analyzed but may be used to report qualitative no-effect levels. A statistical evaluation requires conversion of the scores (qualitative) into averaged % values (semiquantitative). Although OECD guidance mentions that it is possible to establish such a link, it does not specify how to do it in a robust and reproducible manner. One option to link scores to percent could be to assign percent ranges to the different scores. Using, for example, a scale from 1 to 6 and considering that 1 is assigned only to those plants or replicates that show no PVI (0% effect), there would be five remaining classes (2 = slight symptoms, 3 = moderate symptoms, 4 = severe symptoms, 5 = total plant symptoms, 6 = moribund) to assess the extent of PVI. Taking into account that dead plants (>90% PVI) should not be rated, one could evenly distribute 90% to the five categories, giving each a width of 18%, as shown in Table 2. If attempting to determine the 50% effect level based on the 1–6 rating, one could conclude that the rating 4 represents the middle of the scale. It has to be noted, though, that this does not necessarily mean that this represents the true 50% effect level, as the score 4 constitutes a range (36%–54%). In addition to the required standardization and harmonization of how to define a score that represents the 50% effect level for the various available scales, the prevalence of the 50% scores needs to be considered within the plants or replicates in order to derive an ER₅₀. In theory, the ER₅₀ based on PVI scores could be set at the test rate where all plants or replicates were rated with the score representing the 50% effect level.
In reality, this test rate cannot be determined easily, as there are many cases where at one test rate no or only a few plants are rated at the 50% effect level score, and at the next higher test rate some or many of the plants are already rated with scores reflecting an effect range >50%. In this case, the 50% effect level is expected in between those two test rates and cannot easily be compared to the ER₅₀ of other parameters to conclude which is the more sensitive endpoint. A potential compromise for these cases could be to rely on the more robust, quantitative parameters (shoot dry weight, survival) if their determined ER₅₀ is also in between those two rates where the 50% effect level for PVI is expected. In order to use the data of the currently used scoring scales for the risk assessments or for comparison to the quantitative endpoints, procedural guidance would be required on how to define the 50% effect level score for the scale used and how to consider its prevalence within the replicates or plants to conclude where the 50% effect level for PVI is expected. However, independently of the scoring system used, the resulting data of PVI assessments are ordinal data, which do not easily allow for a linkage to a metric system (e.g., percent) or calculation of a mean value. Also, it is not mathematically correct to analyze such data with linear or nonlinear regression models, as is done for quantal (e.g., survival) or continuous (e.g., biomass) data, respectively (Green et al., 2018). Furthermore, it should be considered that the visual score data do not provide any indication of natural variability within the test species (e.g., all controls are scored as 0% effect), which has an impact on the statistical power of the results compared to more robust parameters such as shoot dry weight.
In addition, scoring all control plants with 0% effect by default requires mentally averaging the variable appearance of all control plants, which adds another layer of complexity for the assessor doing the PVI scoring.
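The idea of locating the 50% effect level between two test rates can be illustrated with a short Python sketch. It is only one possible reading of the approach discussed above and not an established method: the function name, the test rates, and the scores are hypothetical, and the criterion used (at least half of the plants rated at or above the score representing the 50% effect level) is an assumption made for illustration. The sketch works with proportions of plants per score class and does not average the ordinal scores.

```python
def bracket_50_percent_level(scores_by_rate, score_50=4):
    """Return the two adjacent test rates between which the 50% effect
    level would be expected, or None if no such pair is found.

    scores_by_rate maps each test rate to the list of ordinal scores
    assigned to the plants (or replicates) at that rate.
    score_50 is the score assumed to represent the 50% effect level.
    """
    rates = sorted(scores_by_rate)

    def fraction_at_or_above(rate):
        scores = scores_by_rate[rate]
        return sum(score >= score_50 for score in scores) / len(scores)

    for lower, upper in zip(rates, rates[1:]):
        # Assumed criterion: fewer than half of the plants reach score_50
        # at the lower rate, at least half reach it at the upper rate.
        if fraction_at_or_above(lower) < 0.5 <= fraction_at_or_above(upper):
            return lower, upper
    return None


# Hypothetical scores on a 1-6 scale for four plants per test rate.
example = {
    10: [1, 1, 2, 2],
    30: [2, 2, 3, 4],
    90: [4, 5, 5, 6],
    270: [6, 6, 6, 6],
}

bracket = bracket_50_percent_level(example)
print(bracket)
```

With these hypothetical data, the sketch only narrows the 50% effect level down to an interval between two tested rates; it does not yield a point estimate comparable to a regression-based ER₅₀.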

Table 2. One potential option to link PVI scores to percent ranges for a scale with six different scores

Scores	Representative percent ranges
1 (No symptoms)	No effect
2 (Slight symptoms)	≤18%
3 (Moderate symptoms)	>18%–36%
4 (Severe symptoms)	>36%–54%
5 (Total plant symptoms)	>54%–72%
6 (Moribund)	>72%–90%

Abbreviation: PVI, plant visual injury.
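The percent ranges in Table 2 follow from distributing 90% evenly across the five scores that indicate injury. A minimal Python sketch of this arithmetic is given below. The function names are hypothetical, and treating each range as open at its lower bound and closed at its upper bound is an assumption based on the notation used in Table 2.

```python
def score_ranges(n_scores=6, max_percent=90.0):
    """Evenly split 0..max_percent across the scores that indicate injury.

    Score 1 is reserved for "no symptoms"; each of the remaining
    n_scores - 1 scores receives an equally wide percent range.
    """
    n_bands = n_scores - 1
    width = max_percent / n_bands
    ranges = {1: (0.0, 0.0)}
    for k in range(n_bands):
        ranges[k + 2] = (k * width, (k + 1) * width)
    return ranges


def score_containing(percent, ranges):
    """Return the score whose range (lower, upper] contains percent."""
    if percent == 0:
        return 1
    for score, (lower, upper) in ranges.items():
        if lower < percent <= upper:
            return score
    return None  # above the rated range, e.g. dead plants (>90%)


ranges = score_ranges()
for score, (lower, upper) in ranges.items():
    print(score, lower, upper)

# Score 4 spans >36% to 54% and therefore contains the 50% effect level.
print(score_containing(50.0, ranges))
```

The sketch makes visible that a 50% effect is not a single score value but falls somewhere within the range assigned to score 4.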

REGULATORY RELEVANCE

Closely related to the methodological challenges is the question about the biological meaning and the ecological consequences of a certain change in the visual appearance of plants. How can the specific PVI symptoms (e.g., change in color, necrosis, leaf deformation) be connected to the regulatory requirements in terms of a protection goal and an endpoint? For instance, the meaning of symptoms such as change in color or leaf deformation could be defined by their impact on the plant's survival. Fifty percent leaf deformation might affect the plant's potential to survive differently than 50% necrosis. The different symptoms would need a relevance ranking in terms of their impact on survival (or any other yet to be defined protection goal). Since some symptoms are specific to the mode of action of an active substance, the mode of action could be a suitable indicator of the general relevance of the PVI data for the risk assessment. For substances that mainly have an impact on the growth of plants, the PVI does not seem to be crucial, as the effects will most likely be covered by other endpoints such as biomass. However, for a substance that is known to have bleaching effects, the PVI assessment (color change) could show stronger sensitivity than other parameters. This, in turn, raises the question of the relevance of a potentially lower PVI endpoint based on color changes with regard to the protection goal. Once the biological and regulatory relevance for PVI can be defined, it will be possible to develop a more suitable scoring and assessment scheme for PVI. If this new methodology enables the determination of an ER₅₀ based on PVI, one could evaluate in which and how many cases PVI is more sensitive and conclude on the regulatory relevance of this endpoint. As revealed by the first survey of the WG, it is unclear in how many cases PVI endpoints are lower than others and whether the current data can be used to inform the risk assessment (Figure 2).
Again, it remains to be determined to what extent such lower PVI endpoints are relevant for the population of plants and ultimately the protection goal. Considering the inherent unreliability of PVI assessments, one might question their regulatory significance. To the best of available knowledge, there are no other examples in the ERA where nonmeasured assessments, such as those done for fish behavior during regulatory testing, are used for risk assessment purposes. Typically, these assessments are done to record observations that are not assessable by measurements and are solely used as supportive information. The only exception is visual injury assessment in aquatic macrophytes, which some member states have recently utilized for endpoint determination in ERA. The results of the two surveys (Table 1, Figure 2, Supporting Information) need to be reflected and considered while also considering the regulatory relevance and developing a harmonized PVI assessment approach. If a new approach is developed, guidance is needed on how to consider results from studies that were done with a different approach.

SETAC WORKING GROUP ON PVI

Despite the lack of guidance on how to assess PVI in NTTP studies, how to statistically determine an endpoint based on the data, and how this is related to regulatory relevance, some guidance documents are requesting to perform a risk assessment based on PVI if the endpoint is the lowest (AGES, 2021; Central Zone Steering Committee, 2021; Ctgb, 2022). While no explicit guidelines are provided, there have been some suggestions regarding endpoint determination (Boutin et al., 1993). Boutin et al. (1993) proposed the use of 9–10 distinct scores for assessing PVI. To facilitate statistical analysis, it is recommended to normalize these scores to percentages, where each score corresponds to a specific percentage range. In terms of endpoint determination, each score represents the midpoint of its respective percentage range and may be used as an effect value for endpoint derivation using nonlinear regression or nonparametric models. However, it is important to note that assigning these scores heavily relies on the subjective judgment of the assessor, making it challenging to consistently and reproducibly assign 9–10 different scores. Consequently, it is crucial not to confuse the resulting data with objectively quantifiable metrics, such as biomass or plant height. Furthermore, even when scores are defined within percentage ranges, they still constitute ordinal data, which are not mathematically suitable for evaluation using nonlinear regression models (Green et al., 2018). This gap was identified and led to the formation of the SETAC Plants Interest Group WG on Visual Injury in 2022. The WG defined the following mission statement: Evaluation and improvement of reliability, reproducibility, and regulatory relevance of PVI assessments. To achieve this goal, a harmonized approach to assessing PVI for NTTPs regarding method, statistics, and regulatory relevance should be defined and implemented. 
In a first step, the WG has defined the following four properties that a future, validated, and harmonized method for assessment of PVI should fulfill in order to be reliable and reproducible:

1. Increments in scores are logically consistent in meaning across the scoring scale. This is not the case for most of the currently used scoring systems. Often, the highest score is defined as, for example, moribund or a dead plant, and the lowest score is defined as “no symptom.” However, a middle score (e.g., 50%) does not typically reflect a “half-moribund” or “half-dead” plant, but rather PVI seen on 50% of the shoot.
2. Scores are reproducible: For regulatory plant studies, this means that different assessors would give an injured plant the same score based on their visual assessment.
3. Scoring is objective (to the extent possible): The judgment of an assessor needs to be minimized to the extent possible, which could be achieved by using a scoring system with few (3–5) rankings that are clearly defined (e.g., by pictures). A good standard or benchmark to compare visual assessments is crucial to increase the objectivity (Horst et al., 1984).
4. Scores have clear biological meaning and are compatible with regulatory needs: Ideally, the biological meaning should be connected to regulatory needs that are defined in specific protection goals and laid out as regulatory endpoints (ER_x). Clear guidance from regulatory authorities on the purpose of the visual injury assessments is needed.

The listed desirable properties are purposely kept on an abstract level and need to be translated into more concrete, operable properties and instructions in the future, building on discussions with the whole regulatory and scientific community, which is one of the goals of the WG. This process will be challenging, as fulfilling all of the desired properties in one harmonized PVI assessment scheme seems impossible. For example, scoring the chance of survival of a plant and thereby giving the score a biological and eventually regulatory meaning (property 4) will most likely conflict with being objective (property 3). In order to develop a more suitable scoring system for the assessment of PVI in NTTP studies, to be used in the risk assessment, the biological and regulatory relevance of PVI has to be defined first.

CONCLUSION

Using endpoints derived from PVI assessments for the risk assessment of NTTPs is a requirement in the EU. However, the lack of guidance in the NTTP OECD test guidelines on how to assess PVI and how to derive statistically meaningful endpoints from the PVI data is at odds with this requirement. Under current conditions, PVI data are better regarded as supportive information than as a basis for the precise determination of a regulatory endpoint. The PVI WG formed within the SETAC Plant Interest Group evaluated the reliability, reproducibility, and regulatory relevance of current PVI assessments in NTTP studies. It became clear that further research is required to assess the biological meaning of PVI data and how this is connected to the regulatory requirements and protection goals. In addition, standardization and harmonization of procedures for plant scoring in the greenhouse and for determination of a PVI endpoint are needed in order to evaluate the regulatory relevance of PVI data. Potentially, a more suitable scoring methodology should be developed in the future. In a first step, the WG has defined four properties that a future, validated, and harmonized method for assessment of PVI should fulfill: (1) logical consistency of the scoring scale, (2) reproducible scores, (3) objectivity of scoring (to the extent possible), and (4) scores with clear biological meaning that are compatible with regulatory needs. Finally, a harmonized solution on how to evaluate historical and therefore diverse PVI data should be developed.

AUTHOR CONTRIBUTION

Sebastian Fellmann: Conceptualization; writing—original draft; writing—review and editing. Andreas Duffner: Conceptualization; visualization; writing—review and editing. Ashlee Kirkwood: Conceptualization; data curation; investigation; writing—review and editing. Patricia Lopez-Mancisidor: Conceptualization; project administration; writing—review and editing. Joshua Arnie: Conceptualization; project administration; writing—review and editing. Henry Krueger: Conceptualization; data curation; investigation; writing—review and editing. Gunther du Hoffmann: Conceptualization; writing—review and editing. Jeffrey Wolf: Conceptualization; writing—review and editing. Gwendolin Kraetzig: Conceptualization; writing—review and editing. Tim Springer: Conceptualization; writing—review and editing. Rena Isemer: Conceptualization; supervision; writing—original draft; writing—review and editing.

ACKNOWLEDGMENT

There are no funders to report for this submission. The authors wish to acknowledge participants of the survey (anonymous) and Stefanie Noeding (Bayer AG, Germany) for providing Figure 1.

CONFLICT OF INTEREST

Sebastian Fellmann, Patricia Lopez-Mancisidor, Gwendolin Kraetzig, and Rena Isemer are employees of plant protection products manufacturers. Andreas Duffner, Ashlee Kirkwood, Joshua Arnie, Henry Krueger, Gunther du Hoffmann, Jeffrey Wolf, and Tim Springer are employees of agricultural consulting or contract research companies. All authors are paid for their work by their employers.

Open Research

DATA AVAILABILITY STATEMENT

All data are available in the article and the Supporting Information. Please contact corresponding author Sebastian Fellmann ([email protected]) for more information.

REFERENCES

Agentur für Gesundheit und Ernährungssicherheit (AGES). (2021, April). National risk assessment for the authorization of plant protection products (PPP): Ecotoxicology non-target terrestrial plants (NTTP). Information for notifier/applicants and other interested parties. Version 2.
Andújar, D., Ribeiro, A., Carmona, R., Fernandez-Quintanilla, C., & Dorado, J. (2010). An assessment of the accuracy and consistency of human perception of weed cover. Weed Research, 50(6), 638–647. https://doi.org/10.1111/j.1365-3180.2010.00809.x
Anwar, N., Kikuchi, A., & Watanabe, K. N. (2010). Assessment of somaclonal variation for salinity tolerance in sweet potato regenerated plants. African Journal of Biotechnology, 9(43), 7256–7265.
Boutin, C., Freemark, K. E., & Keddy, C. J. (1993). Proposed guidelines for registration of chemical pesticides: Non-target plant testing and evaluation. Technical Report Series No. 145 (p. 91). Canadian Wildlife Service, Environment Canada.
Canadian Weed Science Society. (2018, January 15). Description of 0–100 rating scale for herbicide efficacy and crop phytotoxicity. https://weedscience.ca/cwss_scm-rating-scale/
Central secretariat. (2022). Bullet Points: Ecotoxicology Bullet points from the final agreements of the 5th CZHW in Ecotoxicology, November 12–14 2019 [2023/06/15]. https://circabc.europa.eu/ui/group/0b40948d-7247-4819-bbf9-ecca3250d893/library/127e5b52-9be7-435e-95a7-807ccdebd74d/details
Clive, W. (1971). An illustrated series of assessment keys for plant diseases, their preparation and usage. Canadian Plant Disease Survey, 7, 39.
Cobb, A. H. (2022). Herbicides and plant physiology (p. 23). John Wiley & Sons.
College voor de toelating van gewasbeschermingsmiddelen en biociden (Ctgb). (2022). Ecotoxicology; terrestrial; non target arthropods and plants. Version 2.5. In Evaluation manual for the authorization of plant protection products according to regulation (EC) No 1107/2009. Ctgb.
Damgaard, C. (2014). Estimating mean plant cover from different types of cover data: A coherent statistical framework. Ecosphere, 5(2), 1–7. https://doi.org/10.1890/ES13-00300.1
Donald, W. W. (2006). Between-observer differences in relative corn yield vs. rated weed control. Weed Technology, 20(1), 41–51. https://doi.org/10.1614/WT-04-294R.1
European Commission. (2009). Regulation (EC) No 1107/2009 of the European Parliament and of the Council of 21 October 2009 concerning the placing of plant protection products on the market and repealing Council Directives 79/117/EEC and 91/414/EEC. Official Journal of the European Union, 309, 1–50.
European Commission. (2013a). Commission Regulation (EU) No. 283/2013 of 1 March 2013 setting out the data requirements for active substances, in accordance with Regulation (EC) No. 1107/2009 of the European Parliament and of the Council concerning the placing of plant protection products on the market. Official Journal of the European Union, L 93, 1–84.
European Commission. (2013b). Commission Regulation (EU) No. 284/2013 of 1 March 2013 setting out the data requirements for plant protection products, in accordance with Regulation (EC) No. 1107/2009 of the European Parliament and of the Council concerning the placing of plant protection products on the market. Official Journal of the European Union, L 93, 85–152.
European Food Safety Authority (EFSA). (2014). Panel on plant protection products and their residues (PPR). Scientific opinion addressing the state of the science on risk assessment of plant protection products for non-target terrestrial plants. EFSA Journal, 12(7), 3800. https://doi.org/10.2903/j.efsa.2014.3800
European Food Safety Authority (EFSA). (2019). Outcome of the pesticides peer review meeting on general recurring issues in ecotoxicology. EFSA Journal, 16(7), 1673E.
European and Mediterranean Plant Protection Organization (EPPO). (2014). Bulletin OEPP/EPPO bulletin (Vol. 44, pp. 265–273). Wiley.
Frans, R. E., & Talbert, R. E. (1977). Design of field experiments and the measurement and analysis of plant response. In B. Truelove (Ed.), Research methods in weed science (pp. 15–23). South Weed Science Policy.
Green, J. W., Springer, T. A., & Holbech, H. (2018). Analysis of ordinal data. In J. W. Green, T. A. Springer, & H. Holbech (Eds.), Statistical analysis of ecotoxicity studies (p. 244). John Wiley & Sons. https://doi.org/10.1002/9781119488798.ch9
Hamill, A. S., Marriage, P. B., & Friesen, G. (1977). A method for assessing herbicide performance in small plot experiments. Weed Science, 25(5), 386–389. https://doi.org/10.1017/S0043174500033713
Hebert, T. T. (1982). The rationale for the Horsfall–Barratt plant disease assessment scale. Phytopathology, 72(10), 1269. https://doi.org/10.1094/Phyto-72-1269
Henfling, J. W. (1987). Late blight of potato: Phytophthora infestans (2nd ed., p. 25). International Potato Center.
Horst, G. L., Engelke, M. C., & Meyers, W. (1984). Assessment of visual evaluation techniques. Agronomy Journal, 76(4), 619–622. https://doi.org/10.2134/agronj1984.00021962007600040027x
Klimeš, L. (2003). Scale-dependent variation in visual estimates of grassland plant cover. Journal of Vegetation Science, 14, 815–821. https://doi.org/10.1111/j.1654-1103.2003.tb02214.x
Koyshibayev, M. (2016). Guidelines for monitoring diseases, pests and weeds in cereal crops. Food and Agriculture Organization of the United Nations.
Nash, R. G. (1981). Phytotoxic interaction studies: Techniques for evaluation and presentation of results. Weed Science, 29, 147–155. https://doi.org/10.1017/S0043174500061701
Nilsson, H. (1995). Remote sensing and image analysis in plant pathology. Annual Review of Phytopathology, 33(1), 489–528. https://doi.org/10.1146/annurev.py.33.090195.002421
Organisation for Economic Co-Operation and Development (OECD). (2006a). Test No. 208: Terrestrial plant test: Seedling emergence and seedling growth test (p. 21). OECD Guidelines for the testing of chemicals, Section 2.
Organisation for Economic Co-Operation and Development (OECD). (2006b). Test No. 227: Terrestrial plant test: Vegetative vigour test (p. 21). OECD Guidelines for the testing of chemicals, Section 2.
Pradeep, K., Bell, R. W., & Vance, W. (2020). Variation of Cicer germplasm to manganese toxicity tolerance. Frontiers in Plant Science, 11, 588065. https://doi.org/10.3389/fpls.2020.588065
Rasmussen, L. L. (1956). Evaluation of selective weed control in wheat. Weeds, 4, 15–21. https://doi.org/10.2307/4040005
Rodriguez de Cianzio, S., Fehr, D. W., & Anderson, I. C. (1979). Genotypic evaluation for iron deficiency chlorosis in soybeans by visual scores and chlorophyll concentration. Crop Science, 19(5), 644–646. https://doi.org/10.2135/cropsci1979.0011183X001900050024x
Santos, C. S., Roriz, M., Carvalho, S. M., & Vasconcelos, M. W. (2015). Iron partitioning at an early growth stage impacts iron deficiency responses in soybean plants (Glycine max L.). Frontiers in Plant Science, 6, 325. https://doi.org/10.3389/fpls.2015.00325
Sherwood, R. T., Berg, C. C., Hoover, M. R., & Zeiders, K. E. (1983). Illusions in visual assessment of Stagonospora leaf spot of orchardgrass. Phytopathology, 73(2), 173–177. https://doi.org/10.1094/Phyto-73-173
USEPA. (1996). Ecological effects test guidelines OCSPP 850.4000 background and special considerations-tests with terrestrial and aquatic plants, cyanobacteria, and terrestrial soil-core microcosms (EPA 712-C-96-151).

Volume 20, Issue 4, July 2024, Pages 915–923