Observer Classification of Live, Mechanically Damaged, and Dead Pink Salmon Eggs
Abstract
The susceptibility of pink salmon Oncorhynchus gorbuscha eggs to mechanical damage (shock) was studied to test the ability of observers to discriminate among live, dead, and damaged eggs. In a series of six laboratory trials, the mean error rate in discrimination did not exceed 12% and was 3.5% or less in four of six trials. The most common error was misclassification of damaged eggs as live (≤9 ± 1% (mean ± SE)), an error that is irrelevant in field studies designed to determine the natural death rate. The second most common error was damaged eggs classified as dead (≤4.6 ± 1%) when observation times were 60 min or less; this was reduced to less than 0.5% when observations were limited to 12 min or less. Inexperienced observers were easily trained (within 1 h) to classify eggs. To accurately describe natural systems before sample disturbance, damaged and dead egg categories should not be combined when reporting data.
Introduction
Hydraulic pumping is used as a research and management tool to collect eggs from salmon redds. For example, after the Exxon Valdez oil spill, pink salmon Oncorhynchus gorbuscha eggs were collected by hydraulic pumping to determine if exposure to oil had reduced embryo survival (e.g., Bue et al. 1998). This violent injection of an air–water mixture into streambed gravel can mechanically damage (shock) developing salmon eggs, particularly when embryos are the least mature, obscuring the distinction between natural and sampling mortality. The appearance of damaged eggs changes from pink to white as freshwater penetrates the ruptured vitelline barrier and causes protein coagulation. Resistance to damage increases as embryos mature and reinforce the barrier with epidermal tissue. In this paper, we use the word “egg” to refer to either infertile or fertilized eggs. Early embryonic development is not visible macroscopically; thus, the visual cues used by observers to classify condition are based on egg structures, not embryonic tissue. The same cues are used for infertile eggs and for fertilized eggs with visible (more mature) embryonic development.
Although the susceptibility of pink salmon eggs to shocking before embryonic eye pigmentation has been adequately described in both laboratory (Smirnov 1954, 1975; Jensen and Alderdice 1989; Jensen 1997) and field studies (Collins et al. 2000; Thedinga et al., unpublished), observer ability to discriminate among live, dead, and damaged eggs has not been reported. Only if the typical technician can readily discriminate egg condition will field data be accurate. Thus, our goal was to study observer ability to correctly identify damaged eggs rather than to study mechanical egg damage per se. We are aware of no other comparable research. To examine observer discrimination, 10 people repeatedly classified unknown mixtures of live, dead, and damaged eggs throughout early embryonic development and until remaining eggs were all resistant to damage. To supplement the primary observations and aid in data interpretation, the resistance of eggs to damage was recorded throughout the 1-month trial period. Additional eggs were incubated to compare laboratory shock procedures to hydraulic shock.
Methods
Most pink salmon eggs examined were collected from wild stock, fertilized, and maintained in hatchery conditions. Additional naturally spawned eggs were included in some trials (as explained later). Gametes collected on September 18, 2000, from pink salmon at Sashin Creek (Little Port Walter, Alaska) were kept cool and flown to the Auke Creek hatchery (Juneau, Alaska). Eggs were placed in plastic cups, a few milliliters of milt were added, freshwater was added, and the mixture was gently decanted between two cups three times. Eggs were then placed directly into a Heath tray incubator with flowing freshwater; 18 ± 2% [mean ± SE] remained infertile. To control microbial growth, approximately 100 L of seawater was added to the incubator every 2–3 d, the freshwater flow being interrupted for 1 h. Water temperature generally declined from 9.7°C at fertilization to 6.6°C on the final day of study (mean = 8.4°C; range = 6.6–9.9°C).
Egg resistance trials
At 3–7-d intervals, groups of eggs were shocked by dropping them from a height of 1 m onto a slightly angled hard surface (typically 20–130 eggs/replicate × 3–6 replicates). Shock intensity was intentionally high to ensure damage and to emulate the vigorous shock potential of hydraulic sampling methods. The numbers of damaged and surviving eggs were recorded (determined by color change). Water was always fresh in these tests. Samples of each classification were preserved and later inspected for development. Beginning 36 d after fertilization, infertile eggs were easily distinguishable from developing eggs, and observations were subdivided into developing and infertile groups. The percentage of developing eggs that survived shock was calculated from direct observation on or after day 36 or estimated from the average fertility rate in preserved early samples.
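The estimate based on the average fertility rate can be illustrated with a short calculation. The sketch below is an assumption about how such an estimate could be made, not the formula used in the study: it supposes that infertile eggs never survive the drop, so that every surviving egg is a developing egg, and divides the survivors by the expected number of developing eggs. The function name and the example counts are hypothetical.

```python
def estimated_developing_survival(n_total, n_survived, fertility_rate):
    """Estimate percent survival of developing eggs after shock.

    Assumption (not the paper's stated formula): infertile eggs never
    survive the shock, so every survivor is a developing egg.
    """
    if not 0 < fertility_rate <= 1:
        raise ValueError("fertility_rate must be in (0, 1]")
    if n_total <= 0 or not 0 <= n_survived <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_survived <= n_total")
    expected_developing = n_total * fertility_rate
    # Cap at 100% because sampling noise can push the ratio above 1.
    return min(100.0, 100.0 * n_survived / expected_developing)


# Hypothetical replicate: 100 eggs dropped, 41 remain pink, 82% fertility
# (the complement of the 18% infertile rate reported for the incubated eggs).
print(estimated_developing_survival(100, 41, 0.82))
```

Once infertile eggs can be identified directly, the same percentage can be computed from counted developing eggs and no fertility correction is needed.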
To compare laboratory and typical hydraulic sampling stresses, three female and two male pink salmon were artificially spawned, and their eggs were incubated in a hatchery. Eggs were tested for shock resistance 1, 7, 21, and 28 d after fertilization. The 1-m drop tests were completed as previously described (36 ≤ total n ≤ 129 per replicate, with three replicates each time). Eggs sampled hydraulically were placed in a 10-cm-diameter × 2-cm-high aluminum ring protected with eight 0.05-cm × 2-cm × 10-cm pieces of plastic and covered with approximately 20–25 cm of gravel, a depth typical in well-populated spawning grounds (Heard 1991). The barrel holding the gravel was filled with freshwater so that the gravel surface was submerged about 20 cm. Eggs were hydraulically sampled with a 1-m-long × 3.8-cm-diameter stainless steel probe partially thrust into the gravel and discharging a 170-L/min air–water mixture (181 ≤ total n ≤ 259, with one replicate each time). Water temperature declined from roughly 7°C to 5°C (mean = 5.7°C; range = 4.4–6.9°C).
Discrimination trials
Observer discrimination trials were completed weekly from September 18, 2000, through October 31, 2000, for a total of six trials. In each trial, 10 observers were presented random mixtures of live, dead, and damaged pink salmon eggs in 10 petri dishes, with 30–66 eggs/dish. Immediately prior to each trial, eggs for study were randomly drawn by two administrators from known live, dead, and shocked groups and randomly distributed among petri dishes according to randomly predetermined numbers. The number of eggs damaged by shock varied with egg maturity; thus after each 1-h trial, one administrator and the first author independently reinspected each petri dish, exchanged results, and arrived at final classifications by further inspection and consensus when there were differences. The numbers of eggs in each category were unknown to the observers.
Each observer began classifying eggs about 5 min after shocking, up to 5 min being allowed per dish for observation. Observers were allowed to sort eggs but were required to gently mix them before moving to a new dish and were not allowed to compare results. During each 1-h trial, observers rotated to a new dish every 5 min for the first 40 min and every 10 min thereafter. Based on their own assessments of prior experience, observers were subdivided into three classes: experienced, inexperienced, and intermediate.
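The rotation schedule fixes how much time has elapsed since shock when an observer starts each dish. The sketch below works out those times under stated assumptions: all dishes in a trial are shocked at the same moment, the first dish is started 5 min later, and the switch from 5-min to 10-min rotations occurs 40 min into the trial. The function and parameter names are illustrative.

```python
def dish_start_times(n_dishes=10, short=5, long=10, switch_at=40,
                     delay_after_shock=5):
    """Return (minutes after shock at the start of each dish, trial length).

    Assumes every dish was shocked at the same time and that observers
    rotate every `short` minutes until `switch_at` minutes into the
    trial, then every `long` minutes.
    """
    starts = []
    elapsed = 0  # minutes since the trial began
    for _ in range(n_dishes):
        starts.append(elapsed + delay_after_shock)
        elapsed += short if elapsed < switch_at else long
    return starts, elapsed


starts, trial_length = dish_start_times()
print(starts)
print(trial_length)
```

Under these assumptions, eight 5-min rotations followed by two 10-min rotations fill the 1-h trial, so each observer sees all 10 dishes once.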
Some variables were adjusted among trials. The total number of eggs per dish was constant (50) in trials 1 and 2 but variable in all other trials. In trial 1, nearly all dead eggs had visible microbial growth. Some live eggs were shocked during the course of this trial due to rough handling by inexperienced observers. In trials 2–4, dead eggs pooled from preceding trials were used as dead eggs, and the percentage of eggs with microbial growth was controlled (0–25% of dead eggs per dish). In trials 5–6, naturally spawned dead eggs were used; these were sampled by hydraulic pumping a week preceding each trial, fixed in 5–10% phosphate-buffered formalin, and soaked 1–2 d in flowing freshwater before observations. Dead eggs used in trial 5 were collected from Lover's Cove, near Little Port Walter, Alaska (October 17, 2000). Dead eggs in trial 6 were a mixture of dead, recently dead, and dead eyed eggs chosen at random (0–33% possible for each category).
Potential sources of error were recorded or calculated. All six potential sources of misclassification were calculated (live eggs scored as damaged or dead, damaged eggs scored as live or dead, and dead eggs scored as live or damaged) and expressed as a percentage of the total number of eggs reported. Other sources of error included incorrect egg counts, recordkeeping errors, and additional shocking during the course of observation. Obvious recordkeeping errors were infrequent and corrected. The number of damaged eggs was adjusted for change during the first trial (only) by assuming that the majority of observers had correctly identified all damaged eggs in each dish. Where the true number of damaged eggs was unclear, the assessment of experienced observers was favored and values were adjusted as infrequently as possible. Ambiguities in the first trial were due to additional egg shocking caused by inadvertent rough handling by inexperienced observers.
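The six misclassification types can be tabulated from paired true and scored categories. The sketch below is a minimal illustration, assuming counts are stored per (true, scored) pair and that each error is expressed as a percentage of all eggs reported for the dish; the function name and the example dish are hypothetical and are not data from the study.

```python
from itertools import permutations

CATEGORIES = ("live", "damaged", "dead")


def misclassification_percentages(counts):
    """Return the six error types as percentages of all eggs reported.

    `counts` maps (true_category, scored_category) to a number of eggs.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no eggs reported")
    # permutations(..., 2) yields every ordered pair with true != scored,
    # which gives exactly six error types for three categories.
    return {
        pair: 100.0 * counts.get(pair, 0) / total
        for pair in permutations(CATEGORIES, 2)
    }


# Hypothetical dish of 50 eggs (illustrative counts only).
dish = {
    ("live", "live"): 30,
    ("damaged", "damaged"): 10,
    ("damaged", "live"): 4,
    ("damaged", "dead"): 1,
    ("dead", "dead"): 5,
}
errors = misclassification_percentages(dish)
for (true_cat, scored_cat), pct in errors.items():
    print(f"{true_cat} scored as {scored_cat}: {pct:.1f}%")
```

Summing the six values gives the total percentage of misclassified eggs for the dish.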
For any given trial, 80–100% of the original observers were present. Seventy percent of the original observers participated in all trials. One to two observers were added as substitutes in three trials, and these substitutes each participated in two to three trials. Both substitute observers were inexperienced. Two of the original observers who could not participate in all trials were also inexperienced, and one was intermediate.
Data were analyzed with analysis of variance (ANOVA) or regression methods as appropriate (reported data are means ± SE). Percentages were arcsine-transformed (Snedecor and Cochran 1980) prior to ANOVA. Multifactor ANOVA (total percent eggs misclassified = trial + experience + time + dish nested in trial) was used to describe general observer discrimination across trials (trial = trial number, experience = experience level, and time = time after damage). The importance of time for each of the six possible error types was explored with multifactor ANOVA (pE = trial + experience + time + dish nested in trial, where pE = percent error). Time effects were further explored with logistic regression.
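The arcsine transformation is conventionally applied to a proportion as the arcsine of its square root. The sketch below assumes that conventional form, because the exact expression is not given here, and takes percentages on a 0–100 scale; the function name is illustrative.

```python
import math


def arcsine_transform(percent):
    """Angular transform arcsin(sqrt(p)) of a percentage, in radians.

    Assumes the common arcsine square-root form; `percent` is on a
    0-100 scale and is converted to a proportion first.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return math.asin(math.sqrt(percent / 100.0))


# Illustrative percentages spanning the range of error rates discussed.
for pct in (0.5, 4.6, 9.0, 12.0):
    print(pct, round(arcsine_transform(pct), 4))
```

The transform stretches values near 0% and 100%, which is why it is commonly applied to percentages before ANOVA.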
Results and Discussion
Resistance to Mechanical Damage
Developing eggs were initially susceptible to mechanical damage (shock), but resistance rose sharply after 15-d incubation and 95% of viable embryos survived shock on or after 26 d (following the onset of eye pigmentation; Figure 1b). Infertile eggs were invariably damaged. As embryos matured, damage was observed in a few percent of viable embryos between 15 d (head and trunk differentiation) and 26 d (after onset of eye pigmentation; Table 1).

Figure 1. (a–b) Resistance of embryos to mechanical shock and (c) total percent misclassification as time after fertilization increased (mean ± SE). Data in (a) represent an ancillary experiment designed to compare the shock intensity of a 1-m drop with that of hydraulic sampling. Observations with open symbols were not replicated.

The mechanical shock routinely used in this laboratory study is about the same as the shock expected for hydraulic sampling in natural stream systems. Survival curves for hydraulically shocked eggs and eggs dropped 1 m onto a hard surface were nearly identical where pink salmon eggs from the same crosses were reared under identical conditions, suggesting that these physical stresses are similar (Figure 1a). The offset in shock resistance evident between eggs in the ancillary experiment (Figure 1a) and those in the primary experiment (Figure 1b) is most likely due to differences in incubation conditions.
Observer Error
Average discrimination error (mean total percentage of misclassified eggs, averaged across all observation times and observers) ranged from 1% to 4% in the early and late trials but peaked at 10–12% in trials 2 and 3 (ANOVA: P < 0.001; Figure 1c). The high frequency of microbial growth on dead eggs probably helped with discrimination in trial 1. Dead egg condition in trials 2 and 3 may have contributed to the higher incidence of error in these trials because recently dead eggs are typically harder to distinguish from damaged eggs than are dead eggs that have been discolored or have microbial growth. Consistent with this inference, increases in discrimination error over time between damaged eggs and dead eggs were greatest in these two trials (Figure 2a). However, the source and condition of dead eggs in trial 4 were about the same as in trials 2 and 3, yet discrimination error was considerably smaller, suggesting other causal factors (such as a possible relationship between discrimination error and egg maturation). A population of immature embryos (prior to eyeing), which are susceptible to damage, and infertile eggs is likely the most challenging to classify accurately because there are no initial (predamage) color differences. When eyed embryos are frequent, classification may be easier because the color of older eggs (which are less susceptible to damage) is darker than that of immature and infertile eggs. (Older eggs tend to be covered with algal growth, perhaps because waste excretion supplies nutrients for such growth.) Discrimination between damaged and live eggs was clearly the most difficult in trials 2 and 3 but not in trial 1, where eggs were the least mature, indicating that other unknown factors are also involved (Figure 2b).
We suspect maturation and the easy identification of dead eggs (in trials 5–6) best explain the low incidence of observer error in the latter portion of the experiment, and the easy identification of dead eggs may best explain the low error incidence in the first trial.

Figure 2. Mean time-related changes (± SE) in (a) the percentage of damaged eggs scored as dead and (b) the percentage of damaged eggs scored as live.
Sources of Observer Error
The largest source of error (up to 9%, averaged across all observation times and observers) was the misclassification of damaged eggs as live eggs (Figure 3a). Assuming the underlying objective of typical hydraulic studies in natural systems is to distinguish the number of live and dead eggs in a stream before sampler influence, this type of error is unimportant.

Figure 3. Types and frequencies of observer classification errors. Hatched bars are means ± SEs for all observation times combined (5–60 min). Open bars are means ± SEs for which the maximum time between mechanical damage and observation was 12 min or less. The scale at the bottom indicates the times (days postfertilization) of the trials.
The misclassification of damaged eggs as dead eggs was the second largest source of error (up to 4.6%, averaged across all observation times and observers; Figure 3b). As damaged eggs in trials 2–3 whitened, they became more difficult to discriminate from the dead eggs produced in the hatchery. Discrimination between damaged eggs and dead eggs was relatively easy in the first trial because of the distinctive microbial growth on dead eggs. Dead eggs from a natural stream system were different enough from damaged eggs that damaged eggs were infrequently confused with dead, even when the least different wild dead egg category (recently dead) was included (in trial 6). While the misclassification of damaged eggs as dead eggs can cause important errors in the field, this can be minimized by the quick removal of eggs from water, and by restricting the time between shock (hydraulic pumping) and observation.
The third largest source of error (≤2.3%) was dead eggs classified as damaged eggs (Figure 3c). In trial 4, dead eggs were more frequently misclassified as damaged at the end of the trial, suggesting that this group of dead eggs looked more like damaged eggs than eggs that had been dead for a long period of time. This error can also be minimized by quickly removing eggs collected by hydraulic pumping from water; under these conditions, white eggs must have been dead before removal from water.
Other sources of error were minor (less than 1%) and generally may have been caused by recording errors rather than misclassification (Figure 3d–f). However, the misclassification of live eggs as damaged might have been caused by light refraction through the walls of the petri dishes. An inattentive observer might have confused the slight color change caused by refracted light with early damage. The misclassification of live eggs as dead (<0.25%) was almost certainly due to recording errors, not to actual misclassification.
Observer Experience
Inexperienced observers were easily trained to identify mechanically damaged eggs within a single session and can provide accurate data in field settings. Observer learning was evident in the first trial, and error rates declined once each individual understood the egg classification system. Overall experience level was significant (P = 0.042); inexperienced observers tended to make more errors than experienced observers. However, differences in the total error between inexperienced and experienced observers in individual trials were always less than 4% and averaged 1%. Furthermore, observer experience differences were significant in only one of six error categories (percent dead eggs misclassified as damaged; P = 0.008; mean difference between inexperienced and experienced observers = 0.8%). Experience was not significant in any other categories (0.083 ≤ P ≤ 0.976), including damaged eggs scored as dead (P = 0.976). Thus, the minor differences evident as a result of observer experience are not likely to bias assessment of naturally spawned pink salmon egg populations in field studies.
Error Control
Observer error rates for some classifications were time dependent (ANOVA: 0.0001 < P ≤ 0.699) and were consistent with the color change from pink (living) to white (dead) that occurs when eggs are damaged (Figure 2). The percentage of damaged eggs mistaken for live eggs decreased as time increased (regression: P < 0.05 in all six trials), and the percentage of damaged eggs mistaken for dead eggs increased with time (regression: P < 0.05 in two of six trials). No consistent relationship between time and other misclassifications was evident.
Errors in discriminating damaged and dead eggs can be reduced by limiting the time between sampling (mechanical shock) and egg assessment. The average percentage of damaged eggs erroneously scored as dead was reduced below 0.5% and was usually 0% when observation time was limited to 12 min or less. Because damaged eggs become increasingly difficult to distinguish from dead eggs when maintained in water, egg condition should be assessed as soon as possible after collection. Serendipitously, we also found that the color change can be arrested by placing egg samples in air; under these conditions, water does not enter damaged eggs and protein coagulation does not occur. Accordingly, classification errors are reduced, yet ample time is available for assessment. The quick removal of eggs from water is the critical issue. Thedinga et al. (unpublished) conservatively recommend limiting hydraulic pumping to 1-min intervals. Results of this study suggest that assessment within 10–12 min (when eggs are maintained in water) may be adequate. However, Collins et al. (2000) report that eggs shocked by hydraulic sampling can turn opaque white within minutes, suggesting that the collection of eggs in 1-min intervals is a good strategy.
More damaged eggs are misclassified as live when the time between shock and assessment is minimized, but this error is irrelevant in typical field studies where the objective is to distinguish live and dead eggs before sampler disturbance. Thus, trading a less accurate separation of live and damaged eggs for a more accurate separation of damaged and dead eggs is generally desirable. If a precise discrimination between live and damaged eggs is necessary, pink-colored eggs can be placed back into water after other data are collected. For example, shocking provides a way of discriminating between infertile and developing eggs, so extended hydration (an hour or more) can improve this measurement.
Finally, unlike the test circumstances in this study in which observers were not allowed to compare samples, routine comparison of data among observers should also reduce overall error rates. Site-specific factors, such as the intrusion of salt water, can complicate observation; dead and damaged eggs become translucent orange instead of opaque white when held in salt water. Eggs were no easier to classify in salt water than in freshwater (M.G.C., unpublished observations). Communication among observers is undoubtedly helpful in field studies.
In conclusion, we recommend that eggs obtained by hydraulic pumping be classified as live, damaged, or dead. Eggs should be quickly removed from water to arrest color changes, and classification should be prompt to limit overall error rates. The percentage of eggs damaged by mechanical shock may provide valuable insight into run timing and egg superimposition in wild runs, although a record of percent eyed eggs may provide the same information. At a minimum, if the objective is to study in situ mortality, live eggs and damaged eggs should be combined into a single “live” category, distinct from eggs that were dead before observer disturbance. Combining damaged and dead egg data does not accurately reflect presample conditions.
Acknowledgments
Our thanks go to the numerous colleagues who were willing to serve as observers in this study. The research described in this paper was supported by the Exxon Valdez Oil Spill Trustee Council. However, the findings and conclusions presented by the authors are their own and do not necessarily reflect the view or position of the Trustee Council.