Prospective evaluation of the Sunshine Appendicitis Grading System score
Abstract
Background
Although there is a wealth of information predicting risk of post-operative intra-abdominal collection and guiding antibiotic therapy following appendicectomy, confusion remains because of lack of consensus on the clinical severity and definition of ‘complicated’ appendicitis. This study aimed to develop a standardized intra-operative grading system for acute appendicitis, the Sunshine Appendicitis Grading System (SAGS), that correlates independently with the risk of intra-abdominal collections.
Methods
Two-hundred and forty-six patients undergoing emergency laparoscopy for suspected appendicitis were prospectively scored according to the severity of appendicitis and followed up for complications including intra-abdominal collection. After termination of the study, the SAGS score was repeated by an independent surgeon based on operation notes and intra-operative photography to determine inter-rater agreement. The primary outcome measure was incidence of intra-abdominal collection; secondary outcome measures were all complications and length of stay.
Results
SAGS score demonstrated good inter-rater agreement (weighted kappa Kw 0.869; 95% CI 0.796–0.941; P < 0.001). A risk ratio of 2.594 (95% CI 1.655–4.065; P < 0.001) for intra-abdominal collection was found using SAGS score as a predictor. The discriminative ability of SAGS score was supported by an area under the curve value of 0.850 (95% CI 0.799–0.892; P < 0.001).
Conclusions
SAGS score can be used to simply and accurately classify the severity of appendicitis and to independently predict the risk of intra-abdominal collection. It can therefore be used to stratify risk, guide antibiotic therapy and follow-up, and standardize the definitions of appendicitis severity for future research.
Introduction
Appendicitis is one of the most common surgical emergencies, yet controversy remains regarding its management. Practice varies most in the duration of inpatient admission and the length of antibiotic use.
The rationale for the use of post-operative antibiotics is primarily to reduce the risk of post-appendicectomy intra-abdominal collection (IAC),1 which carries high morbidity and usually requires re-admission and intervention. Conversely, inappropriate antibiotic use places a burden on the patient and the population at large.2 Despite extensive research and meta-analyses aimed at determining which patients are at higher risk of IAC (and therefore require post-operative antibiotics), solid consensus guidance remains elusive.
A major contributing factor to the confusion surrounding treatment guidelines is the lack of consistent terminology when describing the severity of appendicitis. Basic principles would indicate that the severity of intra-abdominal contamination should correlate with the development of IAC and other complications. This theory has been well established in diverticulitis and has facilitated the recording of severity in this condition and therefore supported the generalizability of subsequent research.3 Currently, scoring systems for severity of appendicitis use preoperative biochemical markers, physiological markers or even imaging results to stratify severity.4, 5 Many of these parameters are either unavailable (e.g. imaging) or expensive and arguably unnecessary (e.g. C-reactive protein, CRP). A few use intra-operative findings to establish severity, including terms such as ‘gangrenous’, ‘necrotic’ and ‘perforated’, without clear definitions of each or inter-publication consistency.6, 7
The Sunshine Appendicitis Grading System (SAGS) uses clinical principles to provide a very simple score based on intra-operative findings to stratify severity of appendicitis. The hypothesis is that this simple score will accurately and independently predict the risk of intra-abdominal collection.
The aims of the study were to validate the SAGS score as a reproducible intra-operative tool and to determine if the SAGS score can independently predict development of intra-abdominal collection.
Methods
The study was a prospective observational study to evaluate SAGS. Ethical approval was gained from the committee prior to implementation.
Two-hundred and forty-six patients undergoing laparoscopic appendicectomy for presumed appendicitis were recruited between August 2012 and March 2013. Patient demographics, temperature, white cell count and presence or absence of diabetes were recorded along with duration and mode of antibiotics.
Two meetings were held prior to the commencement of the study to educate surgeons on scoring appendicitis with SAGS. Posters with example photographs of each score and a table of the scoring system (Table 1) were placed around the ‘write-up room’. Surgeons were reminded to visualize and photograph all four quadrants of the abdomen and asked to score the severity of intra-operative findings with the SAGS system (Table 1). Current treatment guidelines for acute appendicitis at Sunshine Hospital were followed to determine duration and route of administration of antibiotics.
SAGS score | Intra-operative findings |
---|---|
0 | No appendicitis |
1 | Simple appendicitis (any of the following): |
2 | Purulent appendicitis (any of the following): |
3 | Purulent appendicitis with four quadrant contamination |
4 | Perforated appendix (any of the following): |
- RIF, right iliac fossa.
Patients were followed up by clinical review at 2 weeks and telephone interview at 6 weeks. Case note review was also performed at this stage to identify re-admissions, re-presentations and re-operations.
Outcome data included length of stay, complications and re-presentations. The primary outcome measure was intra-abdominal collection. Intra-abdominal collection was diagnosed on computed tomography (CT) or by ultrasound scanning. Imaging was performed on the basis of symptoms at the discretion of the treating clinician.
SAGS was re-scored by an independent surgeon at the termination of the study based on the operative notes and intra-operative photos. No score was awarded if the operative note provided insufficient data.
Statistical analysis
Inter-rater agreement for SAGS scores was calculated using weighted kappa statistics (Kw) because the categories were ordered. A minimum sample size of 49 patients for two raters was calculated to achieve a power of 80% (two-sided alpha = 0.05) to detect Kw > 0.4. For analysis of bias between two raters (one giving consistently higher or lower scores than the other), an exact single binomial test was used to calculate a χ2 where two-sided P < 0.05 indicates bias between raters.8 Univariate analyses were carried out for SAGS scores and intra-abdominal collections. Binary logistic regression modelling was then used to see if SAGS scores and white cell count could predict intra-abdominal collections, the outcome variable. P < 0.05 was considered statistically significant. All analyses were performed with Stata v12 (StataCorp, College Station, TX, USA).
Results
Two-hundred and forty-six patients were included, of whom 199 (81%) completed follow-up at 6 weeks. For the 47 patients who could not be contacted, case notes were reviewed to identify re-presentations or re-admissions. SAGS scores were recorded for all patients at operation and for 245 patients by the independent observer.
The largest cohort was SAGS 1 with 120 patients. Male to female ratio was equal throughout the cohorts except in the SAGS 0 group, which contained 80% female patients.
Complication rate and length of stay (LOS) increased alongside SAGS score. Average LOS was 1.5 and 2.5 days in the SAGS 0 and 1 groups, respectively, and 10 and 8.8 days in the SAGS 3 and 4 groups (Table 2).
SAGS | Total number of patients | Male (%) | Age (years) | Average duration of antibiotics (days) | Length of stay (days) |
---|---|---|---|---|---|
0 | 54 | 20.3 | 24 | 2.2 | 1.5 |
1 | 120 | 57.5 | 28 | 2.2 | 2.5 |
2 | 37 | 54 | 30 | 2.2 | 4.6 |
3 | 2 | 50 | 57 | 4.5 | 10 |
4 | 30 | 50 | 33 | 5 | 8.8 |
- SAGS, Sunshine Appendicitis Grading System.
A weighted kappa value (Kw) of 0.869 (95% CI 0.796–0.941; P < 0.001) demonstrated good inter-rater agreement for the SAGS score. Furthermore, no bias between the raters was shown (P = 0.711; Table 3).
Rater 2 (rows) vs Rater 1 (columns) | 0 | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|---|
0 | 30 | 7 | 0 | 0 | 0 | 37 |
1 | 2 | 58 | 5 | 0 | 3 | 68 |
2 | 0 | 7 | 18 | 0 | 1 | 26 |
3 | 0 | 0 | 1 | 0 | 0 | 1 |
4 | 0 | 0 | 2 | 1 | 18 | 21 |
Total | 32 | 72 | 26 | 1 | 22 | 153 |
- Kw = 0.869; 95% CI 0.796–0.941; P < 0.001. Bias (by method of single binomial test): P = 0.711; no bias.
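The agreement statistics can be re-derived from the counts printed in Table 3. The following is a minimal sketch rather than the authors' Stata code: the quadratic disagreement weights are an assumption (the weighting scheme is not stated in the text), and the bias check is written here as a two-sided exact binomial (sign) test on the discordant pairs above versus below the diagonal. The function names are illustrative only.

```python
from math import comb

# Table 3 as printed: rows = Rater 2, columns = Rater 1, SAGS 0-4.
TABLE3 = [
    [30, 7, 0, 0, 0],
    [2, 58, 5, 0, 3],
    [0, 7, 18, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 2, 1, 18],
]


def weighted_kappa(table, power=2):
    """Cohen's weighted kappa with |i - j|**power disagreement weights.

    power=1 gives linear weights, power=2 quadratic weights (assumed here).
    """
    k = len(table)
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(table[i][j] * abs(i - j) ** power for i, j in cells)
    expected = sum(row[i] * col[j] * abs(i - j) ** power for i, j in cells) / n
    return 1 - observed / expected


def rater_bias_p(table):
    """Two-sided exact binomial test: are disagreements split evenly
    between 'Rater 1 higher' and 'Rater 2 higher'?"""
    k = len(table)
    above = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    below = sum(table[i][j] for i in range(k) for j in range(k) if j < i)
    n, smaller = above + below, min(above, below)
    tail = sum(comb(n, x) for x in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


kw = weighted_kappa(TABLE3)
p_bias = rater_bias_p(TABLE3)
print(f"Kw (quadratic weights) = {kw:.3f}, bias P = {p_bias:.3f}")
```

With the counts as printed, quadratic weighting gives a value close to the reported Kw, and the sign test on the 29 discordant pairs gives a P value close to the reported 0.711; linear weights (`power=1`) give a lower kappa.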
There were 11 intra-abdominal collections identified (4.5%). Nine occurred in the SAGS 1–4 groups (appendicitis) and two in the SAGS 0 group (no appendicitis); of the latter two, one was a tubo-ovarian abscess and one an infected haematoma.
Univariate analysis revealed a significant association between SAGS score and intra-abdominal collection (P < 0.001; Table 4). Binary logistic regression was performed excluding the SAGS 0 group (because of the confounding effects of an alternative diagnosis) using SAGS score and white cell count (WCC) as independent predictors. A risk ratio of 2.594 (P < 0.001) was found for SAGS score. WCC was not a predictor in this model (Table 5).
SAGS | Intra-abdominal collection: No | Intra-abdominal collection: Yes | Total |
---|---|---|---|
0 | 55 | 0 | 55 |
1 | 120 | 1 | 121 |
2 | 34 | 3 | 37 |
3 | 3 | 0 | 3 |
4 | 25 | 5 | 30 |
Total | 237 | 9 | 246 |
- χ² (4 d.f.) = 21.437; P < 0.001. SAGS, Sunshine Appendicitis Grading System.
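The univariate statistic in the footnote to Table 4 can be reproduced from the printed counts. This is a minimal sketch assuming an ordinary Pearson chi-square test of independence on the 5 × 2 table; the function name `pearson_chi2` is illustrative and not taken from the paper.

```python
# Table 4 as printed: SAGS score -> (no collection, collection).
TABLE4 = {0: (55, 0), 1: (120, 1), 2: (34, 3), 3: (3, 0), 4: (25, 5)}


def pearson_chi2(rows):
    """Pearson chi-square statistic for an r x c table of counts."""
    n = sum(sum(r) for r in rows)
    col_totals = [sum(r[j] for r in rows) for j in range(len(rows[0]))]
    chi2 = 0.0
    for r in rows:
        row_total = sum(r)
        for j, observed in enumerate(r):
            # Expected count under independence of score and outcome.
            expected = row_total * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


rows = list(TABLE4.values())
chi2 = pearson_chi2(rows)
dof = (len(rows) - 1) * (len(rows[0]) - 1)
print(f"chi-square = {chi2:.3f} on {dof} d.f.")
```

Run on the table as printed, the statistic agrees with the footnote value to three decimal places.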
IAC | Risk ratio | SE | z | P | 95% CI |
---|---|---|---|---|---|
SAGS | 2.594 | 0.595 | 4.1 | <0.001 | 1.655–4.065 |
WCC ≥ 13 | 0.284 | 0.188 | −1.9 | 0.057 | 0.078–1.037 |
- CI, confidence interval; SAGS, Sunshine Appendicitis Grading System; SE, standard error; WCC, white cell count.
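The internal consistency of each row of Table 5 can be checked from the ratio and its standard error alone. The sketch below assumes the SE is reported on the ratio scale (as Stata does for exponentiated logistic coefficients) and that the interval is a Wald interval computed on the log scale; `wald_check` is a hypothetical helper name.

```python
from math import exp, log
from statistics import NormalDist


def wald_check(ratio, se_ratio, level=0.95):
    """Return (z, lower, upper) for a ratio whose SE is on the ratio scale."""
    # Delta method: SE(log ratio) is approximately SE(ratio) / ratio.
    se_log = se_ratio / ratio
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    centre = log(ratio)
    return (
        centre / se_log,
        exp(centre - crit * se_log),
        exp(centre + crit * se_log),
    )


for name, ratio, se in [("SAGS", 2.594, 0.595), ("WCC >= 13", 0.284, 0.188)]:
    z, lower, upper = wald_check(ratio, se)
    print(f"{name}: z = {z:.2f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Under these assumptions the WCC row comes out close to the tabulated interval, and the SAGS row gives a lower bound near 1.65 and an upper bound near 4.07. Small differences in the last digit are expected because the inputs are rounded.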
Finally, the area under the receiver operating characteristic curve was calculated at 0.85, indicating that SAGS score is a robust predictor of intra-abdominal collection (Fig. 1).

The predictive power of the model. Area under the curve (AUC): 0.850; 95% CI 0.799–0.892; P < 0.001.
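An area under the curve for an ordinal score can be computed directly from grouped counts as the probability that a randomly chosen patient with a collection has a higher score than one without, counting ties as one half (the Mann-Whitney form). The sketch below applies this to the counts printed in Table 4; treating the SAGS score alone as the predictor across all five groups is an assumption, since the exact inputs to the published curve are not given.

```python
# Table 4 as printed: SAGS score -> (no collection, collection).
TABLE4 = {0: (55, 0), 1: (120, 1), 2: (34, 3), 3: (3, 0), 4: (25, 5)}


def auc_from_counts(table):
    """AUC = P(score_case > score_control) + 0.5 * P(tie)."""
    n_neg = sum(neg for neg, _ in table.values())
    n_pos = sum(pos for _, pos in table.values())
    wins = 0.0
    for s_pos, (_, pos) in table.items():
        for s_neg, (neg, _) in table.items():
            if s_pos > s_neg:
                wins += pos * neg        # case scored higher than control
            elif s_pos == s_neg:
                wins += 0.5 * pos * neg  # tie counts as half
    return wins / (n_pos * n_neg)


auc = auc_from_counts(TABLE4)
print(f"AUC = {auc:.3f}")
```

With the table as printed, this calculation agrees with the reported area under the curve to three decimal places.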
Discussion
A large body of literature is published about appendicitis, and the approach to diagnosis, post-operative care and management is changing. There is a move towards minimizing inpatient stay and post-operative antibiotic use, but it can be difficult to predict which patients require more intense follow-up and longer courses of treatment. It is challenging to compare published literature as there is little consistency in the variables assessed, and ‘complicated appendicitis’ is grouped arbitrarily without a specific definition.
Investigators have attempted to use physiological parameters to predict intra-abdominal abscess. Fraser et al. found a correlation between fever and white cell count and IAC formation; however, this only became statistically significant at days 3 and 5 post-operatively, respectively, limiting its clinical utility.9 An initially high WCC and CRP do correlate with a higher risk of having a perforated appendix10 and are therefore surrogate markers for the intra-operative severity of disease rather than independent predictors of post-operative course. As many patients are diagnosed and treated prior to mounting an inflammatory response, it seems more reasonable to use clinical over physiological markers of severity.
Despite this, there is considerable variation in the clinical grading of severity of appendicitis, and larger studies group together ‘complicated appendicitis’ as a risk factor for IAC and other complications. This amorphous group usually contains perforated appendicitis and a combination of free pus, obvious necrosis or gangrenous appendicitis or any other associated complications.6, 7 It is this ‘complicated appendicitis’ that is used to develop guidelines for post-operative antibiotic use.11
The accuracy of grouping these findings together has been challenged: Emil et al.12 found, on subset analysis, that clinically gangrenous and perforated appendicitis have very different outcomes, with gangrenous appendicitis having a similar complication rate to simple appendicitis. One possible explanation is that a clinical impression of gangrenous appendicitis may reflect the degree of difficulty the surgeon feels at the time of operation. Another explanation is that it is the degree of intra-abdominal contamination alone that should predict the risk of intra-abdominal collection and the post-operative course.6
SAGS score is a simple intra-operative score with a high rate of inter-observer agreement, making it more useful than subjective terms such as ‘necrotic’, ‘gangrenous’ or ‘suppurative’ appendicitis. This facilitates communication between clinicians and, importantly, consistency when reporting new research and guidelines on the management of appendicitis.
Statistical analysis supports the hypothesis that SAGS score is an independent risk factor for development of intra-abdominal collection. The risk increases incrementally with the SAGS score, which also predicts length of stay and the risk of all complications.
Interestingly, there were only two patients in the SAGS 3 category. The authors suspect that this reflects the significant association of four quadrant contamination with perforation of the appendix. It may be that in these two patients, there was an unrecognized perforation.
Further study is needed in the form of a randomized controlled trial using SAGS score to guide antibiotic use. Based on the current study, the authors would recommend perioperative antibiotics only for SAGS scores 1 and 2, and 5 days of antibiotic therapy for SAGS scores 3 and 4.