Volume 39, Issue 4 e70145
STANDARD ARTICLE
Open Access

Evaluation of Subjective Assessment of Right Heart Size and Function Using Standard 2D-Echocardiographic Recordings in Horses With and Without Pulmonary Hypertension

Julia N. van Spijk (Corresponding Author)

Equine Department, University of Zurich, Vetsuisse Faculty, Zurich, Switzerland

Correspondence: Julia N. van Spijk ([email protected])

Hannah K. Junge

Equine Department, University of Zurich, Vetsuisse Faculty, Zurich, Switzerland

Christina Eberhardt

Department of Companion Animal Clinical Studies, Faculty of Veterinary Science, University of Pretoria, Onderstepoort, South Africa

Natalie Wolf

Equine Department, University of Zurich, Vetsuisse Faculty, Zurich, Switzerland

Debora Vogt

Equine Department, University of Zurich, Vetsuisse Faculty, Zurich, Switzerland

Paula Zscherpe

Equine Department, University of Zurich, Vetsuisse Faculty, Zurich, Switzerland

Equine Clinic, Centre for Clinical Veterinary Medicine, Faculty of Veterinary Medicine, Munich, Germany

Elena Herger

Equine Department, University of Zurich, Vetsuisse Faculty, Zurich, Switzerland

Manon Straub

Equine Department, University of Zurich, Vetsuisse Faculty, Zurich, Switzerland

Colin C. Schwarzwald

Equine Department, University of Zurich, Vetsuisse Faculty, Zurich, Switzerland

First published: 30 June 2025

Funding: The authors received no specific funding for this work.

ABSTRACT

Background

Echocardiographic evaluation of right heart (RH) size and function in horses is challenging and relies on subjective assessment.

Objectives

Evaluate inter- and intra-rater agreement of subjective assessment of RH size, RH function, and the presence of pulmonary hypertension (PHT) in horses. Assess subjective RH changes with and without PHT and the influence of observer experience.

Animals

Healthy horses (n = 30) and horses with Doppler measurements suggesting the presence or absence of PHT (n = 30 each).

Methods

Nine standard echocardiographic recordings were analyzed by experienced (n = 4) and inexperienced (n = 5) observers. RH size, RH function, pulmonary artery (PA) size and distensibility, and the presence of PHT were subjectively assessed as normal, mildly, moderately, or severely changed. Inter- and intra-rater agreement was calculated using percentual agreement (% agree) and kappa (k). Sensitivity and specificity to detect PHT were calculated.

Results

Overall inter-rater agreement in all observers was low with 61% perfect agreement and k of 0.21; it was higher in experienced observers (k = 0.34, 77% agree) than in beginners (k = 0.18, 52% agree). Intra-rater agreement showed % agree > 80% in experienced observers (k = 0.35–0.76) and < 80% in beginners (k = 0.33–0.54). RH size and function were more commonly abnormal in the PHT group, with high specificity but low sensitivity to detect PHT.

Conclusions and Clinical Importance

Rater agreement of subjective RH assessment was low and influenced by observer experience. Subjective absence of RH changes does not allow ruling out PHT, while the presence of characteristic RH changes suggests PHT.

Abbreviations

  • % agree: perfect percentual agreement
  • LV: left ventricle
  • NPV: negative predictive value
  • PA: pulmonary artery
  • PAP: pulmonary artery pressure
  • PHT: pulmonary hypertension
  • PPV: positive predictive value
  • PR: pulmonic valve regurgitation
  • PR Vmax: maximal velocity of pulmonic valve regurgitation jet measured by continuous-wave Doppler echocardiography
  • RA: right atrium
  • RH: right heart
  • RV: right ventricle
  • TR: tricuspid valve regurgitation
  • TR Vmax: maximal velocity of tricuspid valve regurgitation jet measured by continuous-wave Doppler echocardiography

    1 Introduction

    Historically, the clinical relevance of the right heart (RH) in various heart and lung diseases in people has been underestimated, and its importance has only recently become apparent [1]. Pulmonary hypertension (PHT), resulting in right ventricular (RV) pressure overload, volume overload, and functional impairment, is the most common cause of RH failure in humans [1-3]. The prognostic relevance of RH size and function has been demonstrated in a wide range of cardiac diseases, from subclinical mitral valve insufficiency to heart failure [4-10]. This development has led to the inclusion of RH assessment as a standard component of echocardiographic examination in people [1, 4, 11, 12]. The relevance of RH function and RH disease in horses is still largely unknown. As in people, PHT is a possible consequence of a variety of diseases, of which severe mitral valve disease, severe lung disease, and congenital disease are probably the most common ones [13].

    Echocardiographic assessment of the RH is challenging due to its complex geometric shape. In horses, echocardiographic studies are further complicated by the animals' size and by the anatomical position of the heart in the thorax, which limits the available sonographic windows [13]. Objective RH quantification is considered difficult and has scarcely been described in horses [14]. Current recommendations for echocardiography in horses emphasize the importance of subjective assessment of the RH in multiple echocardiographic image planes [15].

    Subjective assessment by visual estimation of the RH size and function in humans has been shown to be inaccurate, with a wide interobserver variability [16-18]. However, “eyeballing” has been shown to have good sensitivity for differentiating between normal and abnormal RV function and allows detection of severe systolic RV impairment [18, 19]. The level of observer experience was repeatedly shown to influence the ability to detect RH abnormalities by subjective echocardiographic assessment [17, 19, 20].

    To date, no data on observer agreement and reliability of subjective echocardiographic assessment to detect RH abnormalities in horses are available. The aims of this study were therefore (a) to evaluate the inter-rater and intra-rater agreement of subjective assessment of RH size and function in horses; (b) to evaluate the presence of subjectively assessed changes in RH size and function in horses with and without PHT; (c) to evaluate the subjective assessment of echocardiographic standard images to detect cases with PHT; and (d) to evaluate the influence of the level of experience on the rater agreement and the ability to detect cases with PHT.

    2 Materials and Methods

    2.1 Study Sample

    The study sample consisted of horses that had undergone a complete echocardiographic examination at the authors' institution between August 2007 and September 2023. Examinations of adult horses (≥ 3 years old) of all breeds and sexes were retrospectively chosen from the digital echocardiography database (EchoPac, GE Healthcare, Glattbrugg, Switzerland). All echocardiographic examinations were performed by experienced clinicians using a standardized protocol.

    Horses were either considered healthy or had been diagnosed with heart disease, with tricuspid valve regurgitation (TR) or pulmonic valve regurgitation (PR) interrogated by continuous-wave Doppler echocardiography. Cases were selected with the goal of including a variety of patients with normal RH size and function, different degrees of abnormal RH size and function, and the presence or absence of PHT.

    The cases were assigned to three groups according to their cardiac health status, similarly to a previous study conducted at this institution [21]. Group “Normal” included 30 horses with normal medical history, physical examination, electrocardiogram, and echocardiographic examination. Group “PHT” included 30 horses with Doppler echocardiographic findings suggesting PHT. Twenty-nine horses had TR or PR with maximal velocity of pulmonic regurgitation (PR Vmax) faster than 2.5 m/s, suggesting a mean pulmonary artery pressure (PAP) of > 30 mmHg (with an assumed right atrial pressure of 5 mmHg) or maximal velocity of tricuspid regurgitation (TR Vmax) faster than 3.2 m/s, suggesting a systolic PAP of > 46 mmHg (with an assumed right atrial pressure of 5 mmHg). The cut-offs were set based on previous studies in horses and in accordance with small animal guidelines [21-23]. All Doppler measurements had been taken as part of the initial examination. One horse was included with invasively measured PAP confirming the presence of PHT (systolic/diastolic (mean) PAP of 81/32 (51) mmHg) in the absence of pulmonic or tricuspid valve insufficiencies. Group “No-PHT” included 30 horses with heart disease but without Doppler echocardiographic evidence of PHT. All these horses had TR or PR detected, but maximum jet velocities did not suggest the presence of PHT (PR Vmax ≤ 2.5 m/s, TR Vmax ≤ 3.2 m/s).
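The pressure values quoted for the velocity cut-offs are consistent with the simplified Bernoulli equation (pressure gradient in mmHg = 4 × v², with v in m/s) plus the assumed right atrial pressure. The text does not name the equation, so the following is a minimal sketch under that assumption; the function name and its parameters are illustrative only.

```python
def estimated_pap_mmhg(vmax_m_per_s, assumed_right_atrial_pressure_mmhg=5.0):
    """Estimate pulmonary artery pressure from a regurgitant jet velocity.

    Assumes the simplified Bernoulli equation: gradient (mmHg) = 4 * v^2,
    with v in m/s, then adds the assumed right atrial pressure.
    """
    if vmax_m_per_s < 0:
        raise ValueError("velocity must be non-negative")
    gradient_mmhg = 4.0 * vmax_m_per_s ** 2
    return gradient_mmhg + assumed_right_atrial_pressure_mmhg


# Cut-off velocities used for group allocation in this study.
pr_cutoff_pressure = estimated_pap_mmhg(2.5)  # 4 * 6.25 + 5 = 30 mmHg
tr_cutoff_pressure = estimated_pap_mmhg(3.2)  # 4 * 10.24 + 5, about 46 mmHg
```

Because the gradient grows with the square of the velocity, a small underestimate of Vmax from poor beam alignment translates into a proportionally larger underestimate of pressure.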

    Information on age, sex, breed, body weight, heart rate, and main diagnosis was extracted from the echocardiography database.

    2.2 Echocardiographic Recordings

    For each horse, nine standard echocardiographic cine-loop recordings (Figure 1) including at least three consecutive cardiac cycles were selected to be included in this study. None of the image planes available in the database had originally been recorded with the main goal of RH assessment. Hence, none of the recordings were optimized to image the RH, but all image planes contained parts of the RH for subjective assessment. All recordings included a simultaneous electrocardiographic recording. All cine-loop clips were exported in .wmv video format and were stored in the same order in a case-specific folder that contained two subfolders: subfolder 1 contained 8 recordings (A–H in Figure 1) and subfolder 2 contained the last recording (I in Figure 1) separately. The main folder and all file names were coded for blinding purposes. Fifteen cases, 5 out of each group, were included twice, with different case identification (i.e., folder and file coding), for assessment of intra-rater agreement.

    FIGURE 1. The nine standard echocardiographic image planes provided as video clips to observers to grade right heart size and function, as well as pulmonary artery size, pulmonary artery distensibility, and the presence of pulmonary hypertension. (A) Right parasternal long axis four chamber view, focused on the left atrium; (B) Right parasternal long axis four chamber view, focused on the left ventricle (LV); (C) Right parasternal long axis right ventricular outflow tract view; (D) Right parasternal short axis view of the LV at the level of the apex; (E) Right parasternal short axis view of the LV at the level of papillary muscles; (F) Right parasternal short axis view of the LV at the level of the chordae tendineae; (G) Right parasternal short axis M-mode image of the LV at the level of the chordae tendineae; (H) Left parasternal long axis view of the aortic valve and LV; (I) Right parasternal long axis left ventricular outflow tract view.

    Based on videos A–H, observers were asked to grade RV size, RV function, right atrial (RA) size, septal flattening, and septal motion as “normal”, “mild change”, “moderate change”, or “severe change”. Furthermore, an overall assessment of RH size and an overall assessment of RH function were given using the same categories (normal, mild, moderate, severe change). Only after grading videos A–H were observers allowed to view video I (LV outflow tract with pulmonary artery in cross section). They were then asked to grade pulmonary artery (PA) size and distensibility (normal, mild, moderate, severe), and to make an overall judgment on the presence or absence of PHT (yes/no). No change in assessment scores of videos A–H was allowed after viewing video I. This protocol was used to preclude observer bias on RH assessment based on the PA assessment. The protocol used can be found in supplementary materials (Suppl. Mat. 1).

    2.3 Observers

    Observers were divided into two groups based on their experience. The aim was to have a minimum of three people per group, and suitable veterinarians were actively asked to participate in the study. The first group of observers (“experts”) consisted of four equine internal medicine specialists with at least three years' experience in equine echocardiography. The second group (“beginners”) comprised five veterinarians without specific experience in performing and interpreting echocardiographic examinations of horses, such as interns and equine internal medicine residents.

    In a joint introduction provided by the first author (Julia N. van Spijk), all observers were presented with the same basic principles of RH anatomy and function, standard echocardiographic image planes, normal findings, and clearly abnormal right heart findings. Observers were instructed not to measure any dimensions but to base their judgment solely on subjective evaluation. No specific definitions of what constitutes mild, moderate, or severe changes were given, and the observers were not provided with any reference cases for the grading. Observers were blinded to case information and group allocation. Case order was randomized and the same for all observers.
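The blinding scheme described above (coded case folders, five cases per group included twice, one randomized order shared by all observers) could be set up along the following lines. This is only a sketch: the text does not state how the repeated cases were chosen or how codes were generated, so the random selection, the fixed seed, the code format, and all identifiers below are assumptions.

```python
import random


def build_blinded_case_list(cases_by_group, repeats_per_group=5, seed=2023):
    """Return (code, original_case_id) pairs in one shuffled order.

    Every case appears once; `repeats_per_group` cases per group are
    drawn at random and added a second time under a different code.
    """
    rng = random.Random(seed)  # fixed seed: same order for all observers
    entries = []
    for cases in cases_by_group.values():
        entries.extend(cases)
        entries.extend(rng.sample(cases, repeats_per_group))
    rng.shuffle(entries)
    # Sequential codes carry no information about group or duplication.
    return [(f"case_{i:03d}", case) for i, case in enumerate(entries, start=1)]


# Hypothetical identifiers for 30 horses per group.
groups = {
    name: [f"{name}-{i:02d}" for i in range(1, 31)]
    for name in ("Normal", "No-PHT", "PHT")
}
blinded = build_blinded_case_list(groups)
```

With 90 cases and 15 repeats, each observer would grade 105 coded folders, and the key linking codes to horses stays with the study coordinator.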

    2.4 Data Analysis

    Normal distribution of data was tested using histogram inspection and Shapiro-Wilk testing for age, body weight, and heart rate data. Descriptive statistics were used showing median and range for non-parametric data and mean and range for normally distributed data. To test for differences between groups, Kruskal-Wallis test with Bonferroni post hoc analysis was used for age and heart rate, and analysis of variance (ANOVA) was used for body weight. Fisher's exact test was used to compare sex and breed distribution between groups.

    Inter-rater agreement was calculated using percentual agreement (% agree) and Fleiss Kappa statistics (k, agreement beyond chance) for each point of the subjective assessment (RV size, RV function, RA size, septal flattening, septal motion, overall RH size, overall RH function, PA size, PA distensibility, presence of PHT) and all ratings together. For the latter, the grading differences for all points of the analysis were analyzed in one statistic. Inter-rater agreement was calculated for all observers together and for each group of observers (i.e., experts and beginners).
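As an illustration of the two inter-rater statistics, the sketch below computes Fleiss' kappa from a table of category counts and a pooled pairwise percentual agreement. The study itself used SPSS; how percentual agreement was pooled across nine raters is not specified in the text, so the pairwise definition here is an assumption, and the example grades are hypothetical.

```python
from itertools import combinations


def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters who assigned
    exam i to category j (same number of raters for every exam)."""
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each exam needs the same number (>= 2) of ratings")
    n_exams = len(counts)
    n_categories = len(counts[0])
    total = n_exams * n_raters
    # Share of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-exam agreement: share of rater pairs that chose the same category.
    p_exam = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_exam) / n_exams
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


def pairwise_percent_agreement(ratings):
    """Percentage of rater pairs with identical grades, pooled over exams.
    ratings[i] lists the grades given to exam i by each rater."""
    same = total = 0
    for grades in ratings:
        for a, b in combinations(grades, 2):
            same += a == b
            total += 1
    return 100.0 * same / total


# Hypothetical grades (0 = normal ... 3 = severe) from 3 raters on 4 exams.
ratings = [[0, 0, 0], [0, 1, 0], [2, 2, 3], [1, 1, 1]]
counts = [[grades.count(c) for c in range(4)] for grades in ratings]
kappa = fleiss_kappa(counts)
agreement = pairwise_percent_agreement(ratings)
```

Kappa subtracts the agreement expected from the category frequencies alone, which is why it can be much lower than the raw percentage when most ratings fall into one category.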

    Intra-rater agreement was calculated using percentual agreement and weighted Cohen's Kappa statistics (k, agreement beyond chance) for each point of the subjective assessment and all ratings together.
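A weighted kappa penalizes a disagreement of one grade less than a disagreement of three grades. The text does not state which weighting scheme was used, so the sketch below assumes linear weights; the function name and the example gradings are hypothetical.

```python
def weighted_cohen_kappa(first, second, n_categories=4):
    """Linearly weighted Cohen's kappa for two gradings of the same exams.
    Grades are integers 0..n_categories-1 (e.g., 0 = normal, 3 = severe)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(first)
    k = n_categories
    # Observed joint proportions and their marginal proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weights grow linearly with the distance between grades.
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when all grades are identical")
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical first and second gradings of six repeated exams by one observer.
first_pass = [0, 1, 1, 2, 3, 0]
second_pass = [0, 1, 2, 2, 2, 0]
kappa_w = weighted_cohen_kappa(first_pass, second_pass)
```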

    The frequency of gradings (normal, mild, moderate, severe) per point of the subjective analysis (RV size, RV function, RA size, septal flattening, septal motion, overall RH size, overall RH function, PA size, PA distensibility, presence of PHT) and per group (Normal, No-PHT, PHT) was calculated and described. A heat map graph indicating the grading of each point per observer was created for each horse and arranged per group (Normal, No-PHT, PHT) and depending on PR Vmax and TR Vmax values.

    Sensitivity, specificity, positive predictive values (PPV), and negative predictive values (NPV) of subjective judgment of the presence of PHT were calculated per observer based on the group assignment (Normal, No-PHT, PHT).
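The four measures follow directly from a 2 × 2 table of observer judgment against group assignment. The sketch below assumes that group PHT counts as positive and that groups Normal and No-PHT together count as negative; the counts in the example are illustrative, not study data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV as fractions.

    tp/fn: PHT exams judged PHT / not PHT by the observer;
    fp/tn: non-PHT exams judged PHT / not PHT by the observer.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Illustrative observer: 30 PHT exams and 60 exams without PHT.
example = diagnostic_metrics(tp=10, fp=0, fn=20, tn=60)
```

In this illustrative case every positive call is correct (specificity and PPV of 100%), yet two thirds of the PHT exams are missed (sensitivity of 33%), so a negative judgment leaves a 25% chance of PHT (NPV of 75%).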

    Statistical analyses were performed using commercially available statistical software (IBM SPSS statistics, v28). The level of agreement beyond chance using kappa statistics was considered almost perfect (k = 0.81–1), substantial (k = 0.61–0.8), moderate (k = 0.41–0.6), fair (k = 0.21–0.4), slight (k = 0–0.2), or none (k < 0) [24].
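The interpretation bands above can be written as a small lookup. This is a convenience sketch; the function name is hypothetical, and values between two printed bands (for example 0.205) are assigned to the lower band as an assumption.

```python
def interpret_kappa(k):
    """Map a kappa value to the agreement label used in this study [24]."""
    if k < 0:
        return "none"
    if k <= 0.2:
        return "slight"
    if k <= 0.4:
        return "fair"
    if k <= 0.6:
        return "moderate"
    if k <= 0.8:
        return "substantial"
    return "almost perfect"


# Overall inter-rater kappa values reported in the abstract.
labels = {k: interpret_kappa(k) for k in (0.21, 0.34, 0.18)}
```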

    3 Results

    3.1 Study Sample

    An overview of age, sex, breed, body weight, and heart rate in different groups can be found in Table 1. Statistically significant differences between groups were found in age, with higher values in the group No-PHT compared to group Normal and group PHT, and in heart rate, with higher values in the group PHT compared to group No-PHT. The Fisher exact test showed differences in sex and breed distribution between groups. No statistically significant differences were found between groups for body weight.

    TABLE 1. Age, sex, breed, body weight, and heart rate data of horses in different groups for subjective evaluation of RH size and function using standard echocardiographic images.
    Variable | Overall (n = 90) | Group Normal (n = 30) | Group No-PHT (n = 30) | Group PHT (n = 30) | p
    Age | 3–27 (10) years | 3–22 (9) years [§] | 3–27 (15) years [§, ⁎] | 3–25 (8) years [⁎] | 0.01 [⁎, p = 0.001] [§, p = 0.048]
    Sex | 32 (36%) mares; 53 (59%) geldings; 5 (6%) stallions | 16 (53%) mares; 14 (47%) geldings; 0 (0%) stallions | 6 (20%) mares; 22 (73%) geldings; 2 (7%) stallions | 10 (33%) mares; 17 (57%) geldings; 3 (10%) stallions | 0.04
    Breed | 62 (69%) Warmblood; 7 (8%) Thoroughbred; 6 (7%) Arabian; 4 (4%) Standardbred; 11 (12%) Others [a] | 21 (70%) Warmblood; 4 (13%) Standardbred; 5 (17%) Others | 21 (70%) Warmblood; 4 (13%) Arabian; 1 (3%) Thoroughbred; 4 (13%) Others | 20 (67%) Warmblood; 6 (20%) Thoroughbred; 2 (7%) Arabian; 2 (7%) Others | 0.005
    Body weight | 350–720 (542) kg | 350–707 (547) kg | 420–660 (542) kg | 360–720 (537) kg | > 0.05
    Heart rate | 23–89 (39) bpm | 32–54 (39) bpm | 28–64 (36) bpm [⁎] | 27–89 (45) bpm [⁎] | 0.02 [⁎, p = 0.03]
    • Note: Statistical differences between groups are marked by identical symbols.
    • Abbreviations: bpm, beats per minute; PHT, pulmonary hypertension.
    • [a] Other breeds include: 3 Franche-Montagne, 2 Andalusian, 2 Appaloosa, 2 Paint Horse, 1 Pinto, 1 Quarter Horse.

    Group Normal included horses with normal clinical and echocardiographic examinations without pathologic findings except for trace valvular insufficiencies. Main diagnoses in horses of the group No-PHT were mild to severe valvular insufficiencies (n = 29) and diastolic LV dysfunction of unknown origin (n = 1). Nine horses in this group showed arrhythmias. Main diagnoses in horses of the group PHT were mild to severe valvular insufficiencies (n = 22), ventricular septal defect (n = 2), congestive heart failure secondary to valvular insufficiencies (n = 4), cardiomyopathy, and pulmonary arteritis (n = 1 each). Nine horses in this group showed arrhythmias. Detailed information can be found in supplementary materials (Suppl. Mat. 2).

    3.2 Inter-Rater Agreement

    The overall inter-rater agreement beyond chance in all observers was fair (k = 0.21) with a perfect agreement between observers in 61% of all ratings (Figure 2). The overall inter-rater agreement was higher in experienced observers (k = 0.34, 77% agree) than in beginners (k = 0.18, 52% agree). The kappa inter-rater agreement between all observers was slight (k = 0–0.2) for RA size, RV size, RV function, septal motion, septal flattening, overall RH size, and overall RH function, while kappa inter-rater agreement only including experienced observers for these criteria was fair (k = 0.2–0.4; Figure 3). The inter-rater agreement beyond chance for the assessment of PA size and distensibility was fair in beginners and when including all observers (k = 0.2–0.3), and moderate (k = 0.4–0.6) in experienced observers. The kappa inter-rater agreement on the presence of PHT was fair in all observers (k = 0.40) and beginners (k = 0.28), and substantial in the experienced observer group only (k = 0.79). Percentual agreement between observers is shown in Figure 3.

    FIGURE 2. Inter-observer agreement shown as percentual agreement including all gradings in subjective evaluation of right heart size and function, pulmonary artery, and the presence of pulmonary hypertension including 90 echocardiographic examinations of horses with and without PHT. k indicates Fleiss Kappa per observer group. The difference in gradings is depicted with different colors according to the legend.

    FIGURE 3. Inter-observer agreement shown as percentual agreement in subjective evaluation of different criteria of right heart size and function, pulmonary artery, and the presence of pulmonary hypertension including 90 echocardiographic examinations of horses with and without PHT. k indicates Fleiss Kappa per observer group. The difference in gradings is depicted with different colors, as shown in the legend.

    3.3 Intra-Rater Agreement

    One pair of exams for intra-rater assessment needed to be excluded due to wrong video assignment. Therefore, 14 examinations in pairs (Group Normal and No-PHT: 5 cases each, Group PHT: 4 cases) were included in the intra-rater assessment.

    Perfect percentual intra-rater agreement ranged from 52.1% to 93.6% with values > 80% in all experienced observers and < 80% in all observers defined as beginners (Figure 4). Weighted Cohen's Kappa ranged from 0.33 to 0.77, and therefore indicated fair agreement beyond chance in 3 observers, moderate agreement in 3 observers, and substantial agreement in 3 observers. In one observer (Expert 2), the two measures diverged considerably: percentual agreement was 82%, whereas Cohen's Kappa was k = 0.35, indicating only fair agreement. This can be explained by the asymmetric distribution of gradings of this observer in the first versus the second analysis of the same exam with an overrepresentation of the grade 0 in the second gradings [25]. Intra-rater agreement for each point of the assessment can be found in supplementary material (Suppl. Mat. 3).
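The divergence between a high percentual agreement and a low kappa can be reproduced with a toy example. The sketch below uses an unweighted, two-category Cohen's kappa for simplicity (the study used a weighted kappa on four grades), and both tables are hypothetical: they share 82% agreement but differ in how strongly one grade dominates.

```python
def agreement_and_kappa(table):
    """Percent agreement and unweighted Cohen's kappa from a square table:
    table[i][j] = number of exams graded i the first time and j the second."""
    n = sum(sum(r) for r in table)
    k = len(table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the marginal grade frequencies.
    p_exp = sum(row[i] * col[i] for i in range(k))
    return 100.0 * p_obs, (p_obs - p_exp) / (1.0 - p_exp)


balanced = [[41, 9], [9, 41]]  # both grades used about equally often
skewed = [[80, 10], [8, 2]]    # grade 0 dominates both gradings

agree_balanced, kappa_balanced = agreement_and_kappa(balanced)
agree_skewed, kappa_skewed = agreement_and_kappa(skewed)
```

When one grade dominates, chance agreement is already high (about 80% in the skewed table), leaving little room for agreement beyond chance, so kappa drops from 0.64 to roughly 0.08 despite identical raw agreement.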

    FIGURE 4. Intra-rater agreement of the subjective assessment of grading the right heart size, right heart function, and the presence of pulmonary hypertension based on 14 echocardiographic exams. k indicates calculated weighted Cohen's Kappa for each observer.

    3.4 Descriptive RH Changes

    Subjective assessment of RH size and function was more commonly abnormal in the PHT group than in the other groups (Figure 5). Overall RH size was considered abnormal in 63%, 32%, and 35%, and moderately or severely abnormal in 41%, 6%, and 6% of exams in the groups PHT, No-PHT, and Normal, respectively. Overall RH function was considered abnormal in 52%, 25%, and 19%, and moderately or severely abnormal in 24%, 4%, and 3% of exams in the groups PHT, No-PHT, and Normal, respectively. Overall subjective assessment of PHT suggested the presence of PHT in 47%, 11%, and 12% of exams in the groups PHT, No-PHT, and Normal, respectively. When considering the absolute PR Vmax values of individual cases, cases with faster PR Vmax values showed more severe changes in the RH and were more likely to be assessed as PHT cases (Figure 6). In all cases with PR Vmax > 3.0 m/s, a minimum of 8/9 observers agreed to the presence of PHT. With TR Vmax values there was no clear relation between the TR velocities and the agreement on the presence of PHT (Suppl. Mat. 4). An overview of gradings of all observers in all exams of different groups can be found in the supplementary material (Suppl. Mat. 5).

    FIGURE 5. Subjective evaluation of different criteria of right heart size and function, pulmonary artery, and the presence of pulmonary hypertension using standard echocardiographic images of 9 observers. Groups included 30 normal horses (group Normal), 30 horses with cardiac disease but without PHT (group No-PHT), and 30 horses with Doppler echocardiographic signs of PHT (group PHT).

    FIGURE 6. Heat maps to summarize subjective evaluation of different criteria of right heart size and function, pulmonary artery, and the presence of PHT by 9 different observers using standard echocardiographic images in horses with suggested PHT based on PR Vmax > 2.5 m/s.

    3.5 Sensitivity and Specificity to Detect PHT

    Subjective assessment of the available echocardiographic cine-loop recordings showed a high specificity but low sensitivity to detect PHT cases in all observers except one (Beginner 2; Table 2). This observer also showed the lowest intra-rater agreement value (see above).

    TABLE 2. Sensitivity, specificity, positive and negative predictive values of subjective assessment of the presence of PHT in 90 echocardiographic examinations of horses with and without PHT in experienced and non-experienced observers.
    Observer | Sensitivity | Specificity | PPV | NPV
    Expert 1 | 33% | 100% | 100% | 75%
    Expert 2 | 43% | 95% | 81% | 77%
    Expert 3 | 33% | 95% | 77% | 74%
    Expert 4 | 33% | 100% | 100% | 75%
    Beginner 1 | 40% | 95% | 80% | 76%
    Beginner 2 | 97% | 45% | 47% | 97%
    Beginner 3 | 43% | 95% | 81% | 77%
    Beginner 4 | 40% | 80% | 50% | 73%
    Beginner 5 | 57% | 88% | 71% | 80%
    • Abbreviations: NPV, negative predictive value; PPV, positive predictive value.

    In experienced observers, specificity was 95%–100%, sensitivity 33%–43%, PPV 77%–100%, and NPV 74%–77%. In beginners, values were more variable with a sensitivity of 40%–97%, specificity of 45%–95%, PPV of 47%–81%, and NPV of 73%–97%.

    4 Discussion

    This retrospective study evaluated the rater agreement of subjective assessment of RH size and function using standard echocardiographic recordings in healthy horses and in horses with heart disease with and without echocardiographic evidence of PHT. Furthermore, it evaluated the presence of subjectively assessed RH changes and the reliability of detecting horses with suspected PHT by using subjective RH assessment only. Results of this study show low inter-rater agreement of subjective analysis of RH size and function using standard echocardiographic images. While subjective echocardiographic assessment of RH size and function and ruling out PHT proved to be unreliable, experts showed substantial inter-rater agreement on the presence of PHT and were able to detect cases with suggested PHT with a high specificity.

    Grading of changes in RH size and function varied markedly between observers, and the inter-rater agreement of subjective assessments was poor overall. These findings are comparable to results on visual assessment of RV size and function in humans, showing considerable variability in gradings [16, 17, 20]. The visual assessment of RH size and function is therefore considered inaccurate, and the addition of quantitative assessment was shown to increase accuracy and decrease variability in humans [16]. In horses, quantitative measurement of the RH size and function is considered challenging; however, data on the variability of echocardiographic measurements of RV size and function in healthy horses have been published [14]. Whether the inclusion of RH measurements in echocardiographic protocols in horses increases rater agreement is unknown.

    Echocardiographically assessed PA size and PA-to-aortic diameter ratios correlate with PA pressures in humans and dogs and have been shown to be abnormal in horses with PHT [13, 15, 21, 26-29]. In the present study, the subjective assessment of the size and distensibility of the PA showed better inter-rater agreement than assessment of the RH. However, the agreement beyond chance was still only fair to moderate. Interestingly, the best inter-rater agreement was seen in the overall assessment of the presence of PHT with a substantial agreement beyond chance in experienced observers. The number of options influences the rater agreement, and agreement in binary criteria such as the presence or absence of PHT tends to be higher compared to criteria with gradings. However, the substantial agreement beyond chance with perfect match of grading in > 90% of cases in experienced and moderate agreement beyond chance with perfect match of grading in > 70% of cases in inexperienced observers is still remarkable. Overall, subjective assessment including all individually judged points, summarized in a binary outcome (PHT, No-PHT), improved the inter-rater agreement substantially.

    A good rater agreement does not ensure the correctness of a judgment. While the specificity to detect cases with suspected PHT was generally high, the sensitivity to detect these cases by subjective assessment of standard two-dimensional and M-mode echocardiographic images was low. Experienced observers reached specificities of 95%–100% with positive predictive values over 75%, indicating a high certainty of the presence of PHT if an experienced echocardiographer judges PHT to be present. On the other hand, sensitivity was low with values around 35%, suggesting that a considerable number of cases will be missed by only subjective assessment of two-dimensional and M-mode echocardiographic recordings. Inclusion of further techniques such as objective measurements of RH and PA dimensions, Doppler interrogation of PR or TR, if present, Doppler interrogation of pulmonic flow, or tissue Doppler imaging of the RV might increase the sensitivity to detect PHT. Invasive PA pressure measurement by catheterization remains the gold standard to detect PHT. Considering the definition of PHT based on Doppler measurements of PR and TR in the present study, misclassification of cases is possible. Consequently, the sensitivity of subjective PHT detection using gold standard measurements might be even lower, as some cases of PHT could have been missed.

    Overall, subjective RH assessment showed clear differences between groups of horses: RH size and function were judged to be abnormal in 63% and 52% of horses with PHT, but only in 32%–35% and 19%–25% of horses without PHT. RH enlargement and decreased RH systolic function are known changes associated with RH volume and pressure overload, eventually leading to RH failure. Notably, in the present study, one third of the PHT cases were not judged to have subjective changes in RH size, and almost half of the cases were even considered to have normal RH function. It is unclear whether including more objective methods such as measurement of RH dimensions and assessment of RH function using tricuspid annular plane systolic excursion (TAPSE), tissue Doppler methods, or two-dimensional strain analyses would have revealed abnormal findings in these cases. On the other hand, it is also unclear whether all horses with PHT indeed had RH changes that were not detected or if the degree of PHT did not lead to relevant RH changes in these horses. Furthermore, in this study, standard echocardiographic views were used, and it is unknown if inclusion of more focused RH views would have improved rater agreement, detection of RH changes, or detection of PHT. However, it seems likely that subtle changes in RH size and function are easily missed using subjective assessment only. This is in agreement with the low sensitivity to detect PHT cases in this study and is highlighted by the fact that cases with higher PR Vmax were more commonly judged to have PHT than cases with lower PR Vmax. Of interest, this correlation was not clearly seen with TR Vmax measurements. This might be explained by the greater difficulty in aligning the ultrasound beam with the tricuspid regurgitant jets compared to the pulmonic regurgitant jets, causing a higher error rate in this measurement.

    Observer experience has been shown to strongly influence the agreement of ratings on subjective RH assessment in humans [17, 19, 20]. This effect was also seen in the present study with higher inter-rater agreement values in all evaluated criteria in experts compared to less experienced observers. In addition to the inter-rater agreement, the intra-rater agreement was also clearly higher in experienced cardiologists, with values of perfect match of gradings consistently over 80%. Gradings of experienced observers were more consistent and showed considerable agreement within the group of experts, highlighting the importance of experience in this challenging field of veterinary medicine.

    Statistically significant differences in heart rates were found between different groups of horses. As PHT can be a sequela of heart failure, typically presenting with tachycardia, higher heart rates in these cases seem logical and inevitable. However, higher heart rates in the PHT group might have influenced observers towards the presence of more severe RH changes by assumption of more severe heart disease. More severe heart disease and an increased likelihood of PHT by obvious heart disease such as the presence of a VSD in this group might also have influenced observers. Left heart disease, however, was also present in the group No-PHT, which could have influenced the ratings and explained some right heart changes, such as abnormal septal motion, without the presence of PHT. In addition, the presence of arrhythmias (9/20 group PHT, 9/20 group No-PHT, 0/20 group Normal) might have biased observers towards the presence of RH changes. Furthermore, the influence of heart rate on pulmonary pressures and on TR Vmax and PR Vmax measurements is not completely known. Higher pressures are known to occur at higher heart rates during exercise, and higher heart rates may therefore have influenced pressure estimates and group allocation. However, a relevant influence of heart rate on maximum velocities leading to misclassification into different groups seems questionable. The higher age of horses in the group No-PHT might be related to the high percentage of horses with aortic valve insufficiency in this group, which is more likely to occur in older horses. Statistically significant differences found in age, sex, and breeds among horses in different groups are unlikely to have influenced the results of the present study.

    A main limitation of this study is the absence of a gold standard to define PHT. Group allocation of horses with and without PHT was based on indirect pressure estimates using PR Vmax and TR Vmax, assessed by Doppler echocardiography; direct pressure measurement was only performed in a single case. Doppler echocardiographic measurements easily underestimate pressures due to imperfect alignment or poor image quality of the Doppler signal. On the other hand, Doppler velocities can also be overestimated due to intrinsic spectral broadening or suboptimal gain settings. Underestimation of Doppler measurements seems more common due to difficult alignment with PR and TR jets in horses. Therefore, if PR Vmax and TR Vmax were discordant with respect to the given cut-offs and only one measurement was indicative of PHT, it was considered more likely that one measurement was underestimated, and horses then were still assigned to the PHT group. While this classification might have introduced some bias, it also reduced the likelihood of including horses with PHT in the No-PHT group. Furthermore, in the absence of right-sided valvular insufficiencies, some cases of PHT might have been missed. Without using direct pressure measurements, we cannot rule out the possibility that cases with PHT were missed, especially in the group No-PHT where some degree of precapillary or postcapillary PHT due to severe heart or lung disease cannot be precluded. In healthy horses, PR and TR velocities were not measured and measurable regurgitations were mostly absent. However, it seemed reasonable to assume the absence of PHT in these horses based on history, clinical, and echocardiographic examination. Cut-off velocities to diagnose PHT in horses are not well established or validated. Therefore, cut-offs were chosen based on published data on PA pressures in healthy horses, previously used definitions from another study, and respective small animal guidelines [21-23]. Hence, the group allocation might have been imperfect. However, this did not affect the assessment of rater agreement, which was a major goal of the study.
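    The allocation rule for discordant measurements described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the cut-offs are supplied by the caller because their values are not restated in this section, and treating a velocity strictly above the cut-off as indicative of PHT is an assumption. The simplified Bernoulli equation (pressure gradient in mmHg = 4 × velocity² with velocity in m/s) is the standard way to convert a regurgitant jet velocity into a pressure gradient; this section does not state whether the authors applied it in this form.

```python
from typing import Optional


def bernoulli_gradient_mmhg(vmax_m_per_s: float) -> float:
    """Simplified Bernoulli equation: pressure gradient (mmHg) = 4 * v^2, v in m/s."""
    return 4.0 * vmax_m_per_s ** 2


def indicates_pht(
    pr_vmax: Optional[float],
    tr_vmax: Optional[float],
    pr_cutoff: float,
    tr_cutoff: float,
) -> bool:
    """Return True if at least one available velocity exceeds its cut-off.

    Mirrors the rule that discordant measurements still lead to allocation to
    the PHT group, because underestimation of one velocity was considered more
    likely than overestimation of the other. None means "not measurable".
    The strict ">" comparison is an assumption of this sketch.
    """
    pr_high = pr_vmax is not None and pr_vmax > pr_cutoff
    tr_high = tr_vmax is not None and tr_vmax > tr_cutoff
    return pr_high or tr_high


# Hypothetical velocities and cut-offs in m/s, for illustration only.
discordant = indicates_pht(pr_vmax=2.0, tr_vmax=3.5, pr_cutoff=2.5, tr_cutoff=3.0)
gradient = bernoulli_gradient_mmhg(3.0)
```

    Because the gradient grows with the square of the velocity, a modest underestimation of Vmax from imperfect beam alignment produces a proportionally larger underestimation of the pressure gradient, which is consistent with the choice to let a single elevated measurement determine group allocation.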

    The gold standard to assess RH size and function is cardiac magnetic resonance, and studies evaluating subjective assessment of echocardiographic images in people compare their results to magnetic resonance results to determine true sensitivity and specificity [17, 18]. Due to the size of the horse, this is obviously not feasible, and a comparison to a true gold standard for RH assessment is not applicable.

    Another limitation of this study is the small number of observers. A larger group of observers, including multi-centric data, would have improved the study. However, this was not practicable within the scope of this investigation. All observers received a joint introduction to normal and abnormal echocardiographic findings of the right heart. The goal was to provide at least some basic knowledge to the inexperienced observers to make it feasible for them to assess the right heart by knowing the image planes, anatomical structures, and possible changes. This might have introduced some bias by slightly improving inter-rater agreement; however, inter-rater agreement was still low, and there was an obvious difference between experienced and inexperienced observers. Hence, inter-observer agreement would likely be even worse if observers with very different backgrounds and expertise were asked to subjectively assess right heart size and function.

    To avoid bias on the RH assessment because of the concurrent assessment of the pulmonary artery size and distensibility, this study included a 2-step approach to the assessment of RH size and function and the presence of PHT. The aim was to evaluate observer agreement of RH assessment without considering other information such as pulmonary artery size, presence of right-sided valve insufficiencies, or other clinical information on the case. This situation differs from a clinical setting, where all echocardiographic images are available, and all findings are taken together to make a final, most appropriate diagnosis.

    5 Conclusions

    This study shows low inter-rater agreement of subjective analysis of RH size and function using standard echocardiographic images in horses with and without PHT. Sensitivity to detect cases with PHT and associated RH changes was low; however, specificity to detect PHT was considerably higher. Severe cases of PHT, characterized by high PR Vmax, were more likely to be judged to have moderate to severe subjective changes in RH size and RH function and to be judged to have PHT. Experienced observers demonstrated greater accuracy and higher consistency in their assessments, underscoring the role of expertise in subjective echocardiographic evaluations.

    Acknowledgments

    Open access publishing facilitated by Universitat Zurich, as part of the Wiley - Universitat Zurich agreement via the Consortium Of Swiss Academic Libraries.

      Disclosure

      Authors declare no off-label use of antimicrobials.

      Ethics Statement

      Authors declare no institutional animal care and use committee or other approval was needed. Authors declare human ethics approval was not needed.

      Conflicts of Interest

      The authors declare no conflicts of interest.
