Use of Artificial Intelligence as an Innovative Method for Liver Graft Macrosteatosis Assessment
Abstract
The worldwide implementation of a liver graft pool using marginal livers (ie, grafts with a high risk of technical complications and impaired function or with a risk of transmitting infection or malignancy to the recipient) has led to a growing interest in developing methods for accurate evaluation of graft quality. Liver steatosis is associated with a higher risk of primary nonfunction, early graft dysfunction, and poor graft survival rate. The present study aimed to analyze the value of artificial intelligence (AI) in the assessment of liver steatosis during procurement compared with liver biopsy evaluation. A total of 117 consecutive liver grafts from brain-dead donors were included and classified into 2 cohorts: ≥30 versus <30% hepatic steatosis. AI analysis required the presence of an intraoperative smartphone liver picture as well as a graft biopsy and donor data. First, a new algorithm arising from current visual recognition methods was developed, trained, and validated to obtain automatic liver graft segmentation from smartphone images. Second, a fully automated texture analysis and classification of the liver graft was performed by machine-learning algorithms. Automatic liver graft segmentation from smartphone images achieved an accuracy (Acc) of 98%, whereas the analysis of the liver graft features (cropped picture and donor data) showed an Acc of 89% in graft classification (≥30 versus <30%). This study demonstrates that AI has the potential to assess steatosis in a handy and noninvasive way to reliably identify potential nontransplantable liver grafts and to avoid improper graft utilization.
Abbreviations
- Acc: accuracy
- AI: artificial intelligence
- DBD: donation after brain death
- FCNN: fully convolutional neural network
- GGT: gamma-glutamyltransferase
- HS: hepatic steatosis
- IQR: interquartile range
- LT: liver transplantation
- ML: machine-learning
- Prec: precision
- Rec: recall
- RGB: red green blue
- SVM-SIL: support vector machines–single instance learning
In liver transplantation (LT), the percentage of hepatic steatosis (HS) in the liver graft is associated with increased risk of graft dysfunction or early nonfunction,(1, 2) and fast and accurate assessment of HS is of paramount importance in the setting of organ procurement for LT.
Although a wide range of methods for the assessment of HS are currently available, pathological examination remains the gold standard for both HS diagnosis and HS grading, and it is used as the reference method for the evaluation of any new HS measurement technique.(3) However, routine use of pathological examination is limited by its invasiveness and by the need for additional time-consuming procedures and instrumentation, which are often unavailable in remote graft procurement hospitals.(4) Therefore, the decision to accept or discard a liver graft still relies on indirect parameters, such as clinical history, blood tests, liver/spleen attenuation ratio on imaging (when available), and, more importantly, on the harvesting surgeon’s personal evaluation of the liver texture and color. Although this latter parameter achieves an accuracy (Acc) of 86%(5) for experienced surgeons, it remains a qualitative and subjective evaluation and a source of major bias when repeated measures are compared.
Several recent reports focused on tissue analysis using artificial intelligence (AI), including machine-learning (ML) approaches and fully convolutional neural network (FCNN) concepts, from intraoperative optical images. These types of computerized analyses may help to obtain standardized basic pathological analysis without interobserver variability, classification biases, or technical constraints.(6-9) On the basis of the recent promising strategy reported by our team on HS graft analysis with AI ML technology, the present study aims at automating liver graft segmentation from smartphone images and validating the robustness of this approach to assess HS on a larger cohort of liver grafts using different cutoff values.
Patients and Methods
Participants
This prospective study involved 2 academic, high-volume LT centers and was approved by the institutional review board of AP-HP. From January to June 2018, any liver graft from a donation after brain death (DBD) or donation after circulatory death donor that was proposed for LT to 1 of the 2 participating centers, regardless of the setting of the organ procurement, was eligible for inclusion. The decision to accept or decline the proposal and to subsequently proceed with graft procurement rested with the on-call transplant surgeon.
Inclusion in the study required the presence of an intraoperative picture (taken with a smartphone by one of the harvesting team members); results of a graft biopsy; and clinical, biochemical, and radiological donor data.
Data Collection
Important data for decision making in graft acceptance and for LT outcome were recorded and used for ML analysis to improve the performance of the algorithm. Donor variables were selected based on the factors used by the surgeon to reach a decision,(10) including age, weight, and height. The following biochemical variables, recorded at the time of referral, were included: gamma-glutamyltransferase, alanine aminotransferase, aspartate aminotransferase, and total bilirubin. Lastly, the liver/spleen attenuation ratio from a CT scan was also recorded.
Test Methods
The test method was HS tissue classification developed with an AI ML algorithm based on images taken during graft harvesting with a smartphone.
AI Algorithms
For HS classification, ML algorithms were used. ML is an AI method commonly used to build algorithms that learn from data and make predictions based on those data. In this study, we used the support vector machines–single instance learning (SVM-SIL) system. SVM-SIL are semisupervised ML algorithms that use both labeled and unlabeled data to build a predictive algorithm for classification analyses. More specifically, the SVM-SIL training algorithm builds a model from a set of training examples (ie, with or without significant HS) that assigns new examples to one category or the other, making it a nonprobabilistic binary linear classifier. This process, performed as many times as the number of included cases, allows the sensitivity, specificity, and Acc of the classification model to be assessed. This method is currently the most advanced existing system in ML to avoid selection and classification biases. For liver segmentation from intraoperative photographs, we used an FCNN, a deep-learning strategy that involves convolutional filters that can learn hierarchical features from data. The role of the filters is to extract characteristics from the input images and collect them in feature maps. The number of filters for each layer is chosen according to the time necessary for training the network and the complexity of the problem. In general, a higher number of filters gives better results, but only up to a certain threshold, beyond which adding filters no longer improves performance.
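As a reminder of what "nonprobabilistic binary linear classifier" means for the SVM described above, the decision rule of a generic linear SVM can be written as follows. The symbols are illustrative assumptions and do not appear in the original text: $\mathbf{x}$ is a feature vector, $\mathbf{w}$ and $b$ are the learned weights and bias, and the two output values stand for the two graft categories.

```latex
f(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}^{\top}\mathbf{x} + b\right),
\qquad f(\mathbf{x}) \in \{-1, +1\}
```

The output is a class label rather than a probability, which is why the classifier is called nonprobabilistic.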
Smartphone Images
During the procurement, at least 1 digital image of the liver graft was taken with the personal smartphone of a member of the harvesting surgical team, using a previously published standardized protocol to obtain the best possible image quality and reproducibility.(11, 12) The same smartphone model (iPhone 6S; Apple Inc., Cupertino, CA) was used. The camera was automatically white-balanced and used in the Macro mode. On these smartphones, exposure and focus area selection modes were nonadjustable.
Pathological Analysis
The existence and quantification of HS were estimated on frozen sections from liver graft biopsies, and these sections were considered the reference control test. Surgical biopsies were performed during liver procurement, in the case of suspected severe liver steatosis, or after reperfusion, as routinely performed in both institutions. The presence and proportion of HS were assessed by an experienced pathologist (N.P.), supervised by a referenced pathologist (V.P.), and expressed as the percentage of hepatocytes with macrosteatosis (0%-100%).
Hepatic Steatosis
Test Positivity Cutoffs
An HS rate >60% is associated with a high risk of primary graft failure(13) and is widely accepted as a cutoff value for liver graft discard. However, early allograft dysfunction has been reported even in grafts with HS >30%, which has led to differing acceptance policies. For this reason, and because 30% is the cutoff value used at our centers, the ML method for HS assessment was predetermined for 2 categories of grafts: HS <30% and HS >30%.
AI Cutoff
In our previous study,(14) a set of images was created by manually cropping the original photographs so that the target organ (liver) occupied 100% of the frame, eliminating the background. In contrast, in this study, an FCNN was used that was composed of several layers, as shown in Fig. 1, inspired by U-net(15) and Resnet.(16) In our case, the network consisted of convolutional (descending) and up-convolutional (ascending) paths. In the descending path, the combination of a convolutional block and 2 identity convolutional blocks is repeated 4 times (in Fig. 1: from stage 1 to stage 4). In each stage, the number of convolutional kernels per layer is doubled. The up-convolutional path is symmetric to the convolutional one: each stage, repeated 4 times (in Fig. 1: from stage 5 to stage 8), presents an up-convolutional block instead of the convolutional one. The ascending path ends with an up-sampling block of (2, 2) size and a convolutional block, in this case with a (3 × 3) kernel and a sigmoid activation. The network was trained and validated with red green blue (RGB) images and with grayscale images obtained by converting the original ones. The results of the automatic segmentation were compared with those of the manual segmentation, which was considered the gold standard. To evaluate the segmentation performance of the FCNN, we calculated the Acc, recall (Rec), and precision (Prec) using MATLAB software (MathWorks, Inc., Natick, MA, USA).
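To make the comparison between automatic and manual segmentation concrete, the following is a minimal sketch of how pixel-wise Acc, Rec, and Prec could be computed from two binary masks. It is not the authors' MATLAB code: the function name `segmentation_metrics`, the nested-list mask representation, and the toy masks are assumptions made for illustration only.

```python
def segmentation_metrics(pred_mask, ref_mask):
    """Pixel-wise Acc, Rec, and Prec of a predicted binary mask against a reference mask."""
    tp = tn = fp = fn = 0
    for pred_row, ref_row in zip(pred_mask, ref_mask):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1          # liver pixel found by both
            elif p and not r:
                fp += 1          # background labeled as liver
            elif not p and r:
                fn += 1          # liver pixel missed
            else:
                tn += 1          # background in both
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return acc, rec, prec


# Toy 3 x 4 masks (1 = liver pixel, 0 = background); values are illustrative only.
manual = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
automatic = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
acc, rec, prec = segmentation_metrics(automatic, manual)
print(f"Acc={acc:.2f} Rec={rec:.2f} Prec={prec:.2f}")
```

In this toy case, the automatic mask misses one liver pixel and adds one background pixel, so Rec and Prec are both 3/4 while Acc is 10/12.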

Once liver graft masks were obtained by the FCNN, ML was used to classify features. Because the liver texture is heterogeneous, the texture analysis was performed on liver image patches. Each image was divided into 15 nonoverlapping patches of 100 × 100 pixels. Each patch was classified as transplantable (HS <30%) or nontransplantable (HS >30%), and the features of each patch according to its category were extracted by rotation-invariant local binary patterns, which are robust to light and camera pose variations and accurately render the liver tissue (Fig. 2). To classify patches together with donor data, multiple-instance learning with SVM-SIL was used, which has the strong advantage of allowing the fusion of patch-wise information (such as textural features) with image-wise information (such as donor data features). The feature classification was implemented with scikit-learn (http://scikit-learn.org). For robust performance evaluation, we performed leave-one-patient-out cross-validation (ie, the images from all patients but one were used for training and the remaining one for testing). The duration of the test classification analysis was 10⁻³ seconds. Classification of donor features (photographs and data) was analyzed in terms of Acc, Rec, and Prec using MATLAB software.
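The patch tiling and leave-one-patient-out scheme described above can be sketched as follows. This is an illustrative outline in plain Python, not the authors' scikit-learn implementation. The helper names (`tile_patches`, `leave_one_patient_out`), the 300 × 500 pixel liver region, and the three toy grafts are all hypothetical. The sketch also assumes the usual meaning of single-instance learning, in which every patch inherits the label of the image it comes from.

```python
def tile_patches(height, width, patch=100):
    """Return (row, col) top-left corners of non-overlapping patch x patch tiles."""
    return [
        (r, c)
        for r in range(0, height - patch + 1, patch)
        for c in range(0, width - patch + 1, patch)
    ]


def leave_one_patient_out(patient_ids):
    """Yield (held_out, train_idx, test_idx) with exactly one patient held out per fold."""
    for held_out in sorted(set(patient_ids)):
        train_idx = [i for i, p in enumerate(patient_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(patient_ids) if p == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy data: 3 grafts, each with a 300 x 500 pixel liver region.
image_labels = {"graft_A": 1, "graft_B": 0, "graft_C": 1}  # 1 = nontransplantable

patient_ids, patch_labels = [], []
for graft, label in image_labels.items():
    for _corner in tile_patches(300, 500):   # 3 rows x 5 columns = 15 patches
        patient_ids.append(graft)
        patch_labels.append(label)           # assumption: patch inherits the image label

folds = list(leave_one_patient_out(patient_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, len(train_idx), "training patches,", len(test_idx), "test patches")
```

Splitting by patient rather than by patch keeps all patches of one graft on the same side of the split, so the classifier is never tested on a graft it has partly seen during training.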

Pathology
According to the most commonly used scoring system, HS was categorized as follows: normal (grade 0), when the proportion of HS-affected cells ranged from 0% to 5%; mild (grade 1), between 5% and 33%; moderate (grade 2), between 34% and 66%; and severe (grade 3), when the proportion of affected cells was >67%.(17)
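The grading scale above can be expressed as a small lookup, shown here next to the study's binary cutoff. This is a sketch under stated assumptions: the function names are hypothetical, the text does not say how values falling exactly on a boundary (5%, between 33% and 34%, between 66% and 67%) are graded, so the boundary handling below is one possible choice, and a graft at exactly 30% is counted as above the cutoff, following the "≥30 versus <30%" wording of the abstract.

```python
def hs_grade(percent):
    """Map the percentage of hepatocytes with macrosteatosis to a grade from 0 to 3."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent <= 5:       # assumption: exactly 5% is graded as normal
        return 0
    if percent <= 33:      # mild
        return 1
    if percent <= 66:      # moderate (assumption: values just above 33% fall here)
        return 2
    return 3               # severe


def exceeds_study_cutoff(percent, cutoff=30.0):
    """True when a graft falls in the nontransplantable class at the given cutoff."""
    return percent >= cutoff   # assumption: 30% itself counts as above the cutoff


for hs in (3, 20, 31, 50, 80):
    print(hs, "-> grade", hs_grade(hs), "| above cutoff:", exceeds_study_cutoff(hs))
```

The 30% cutoff does not coincide with a grade boundary: under these assumptions a graft with 31% HS is still grade 1 (mild) but already falls in the nontransplantable class.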
Blindness
Clinical, biological, and radiological information was available to the surgeons performing the organ procurement, to the pathologist performing the graft biopsy analyses, and to the team in charge of the AI assessment. The results of HS based on the pathology report were available to AI assessors, but pathologists were blinded to AI HS assessment results.
Statistical Analysis
Quantitative continuous variables were expressed in median and interquartile range (IQR; 25th to 75th percentile) for discrete variables, as appropriate. AI liver graft segmentation from intraoperative photographs and classification of donor features were analyzed in terms of Acc ([true positive + true negative]/whole sample); Rec, or sensitivity (true positive/[true positive + false negative]); and Prec, or positive predictive value (true positive/[true positive + false positive]), compared with manual photograph cropping and with the reference test (liver biopsy), respectively.
No sample size calculation was needed. For HS classification, a balanced data set with 1:1 ratio groups (with the same number of liver grafts in each group) is required for an appropriate analysis by the SVM-SIL system. Therefore, any liver graft with proven HS >30% during the study period was included in the HS >30% group. A control group was formed by randomly including liver grafts with HS <30% procured with the same technique during the same study period, on a 1:1 ratio. No other baseline variables were considered for matching.
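The 1:1 balancing step described above can be illustrated with a short sketch. It is not the authors' procedure in detail: the function name `build_balanced_cohort`, the fixed random seed, and the toy biopsy values are assumptions for illustration, and grafts at exactly 30% are placed in the steatotic group by assumption.

```python
import random


def build_balanced_cohort(hs_by_graft, cutoff=30, seed=0):
    """Keep every graft at or above the cutoff and draw an equal-sized random control group."""
    positives = sorted(g for g, hs in hs_by_graft.items() if hs >= cutoff)
    candidates = sorted(g for g, hs in hs_by_graft.items() if hs < cutoff)
    if len(candidates) < len(positives):
        raise ValueError("not enough grafts below the cutoff for 1:1 sampling")
    rng = random.Random(seed)                      # fixed seed for a reproducible draw
    controls = rng.sample(candidates, len(positives))
    return positives, controls


# Hypothetical biopsy results (% macrosteatosis) for 10 grafts.
hs_values = [5, 10, 35, 0, 60, 15, 20, 45, 25, 5]
biopsies = {f"graft_{i}": hs for i, hs in enumerate(hs_values)}

positives, controls = build_balanced_cohort(biopsies)
print("steatotic group:", positives)
print("control group:  ", controls)
```

Because no other baseline variable is used for matching, the control draw is a plain random sample of the grafts below the cutoff.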
The design of this study was based on the Essential Items for Reporting Diagnostic Accuracy Studies guidelines.(18)
Results
From January to June 2018, 117 consecutive liver grafts from deceased donors were photographed and biopsied with the intention to be transplanted. Of these, 28 had HS >30% and were included in the analysis without being transplanted. A total of 28 liver grafts with HS <30% were procured during the same study period, and they were randomly selected on a 1:1 ratio and included in the control group. This created a balanced data set, which is required for an appropriate SVM-SIL analysis. Thus, the final inclusion cohort for HS classification comprised 56 liver grafts.
A liver biopsy was performed without any complications in all 117 patients. A total of 40 RGB liver graft images (size 3264 × 2448 pixels, 8 megapixels) were taken during each graft procurement and were analyzed. For liver segmentation analysis (liver graft image extraction), all 117 intraoperative pictures were used: 50 for the training data set and 67 for the testing data set. The median Prec was 95% for grayscale images and 97% for RGB images. The median Acc and Rec were 92% and 89% for grayscale images and 98% and 97% for RGB images, respectively (Fig. 3).

Texture analysis by SVM-SIL was performed on liver image patches, and a balanced data set of 600 patches was obtained. A flow diagram of participants is summarized in Fig. 4, and baseline characteristics are summarized in Table 1. After analysis of the features of each data set, the SVM-SIL showed good results in graft classification according to their acceptance or discard for LT, with a Rec of 93% and a Prec of 82% for transplanted grafts and with a Rec of 97% and Prec of 83% for discarded grafts.

Preoperative Characteristics | Transplanted Grafts (n = 28) | Not Transplanted Grafts (n = 28) |
---|---|---|
Age, years | 52 (18-88) | 62 (17-86) |
Sex, n | | |
Male | 16 | 17 |
Female | 12 | 11 |
BMI, kg/m² | 23.6 (17-31) | 30 (16-40) |
Height, cm | 170 (150-190) | 170 (150-190) |
Weight, kg | 69 (52-90) | 90 (44-178) |
ICU, days | 3 (1-10) | 3 (1-11) |
AST, IU/L | 57 (17-177) | 193 (24-2000) |
ALT, IU/L | 37 (11-136) | 196 (14-2000) |
GGT, IU/L | 69 (15-318) | 95 (16-569) |
Bilirubin, µmol/L | 11 (3-23) | 17 (3.6-57) |
Lactate, mmol/L | 2 (0.5-6.2) | 3.4 (0.8-14) |
Liver/spleen density, Hounsfield units | 16 (2-70) | 17 (5-79) |
Macrosteatosis (%, mean and min-max) | 15 (5-30) | 40 (30-90) |
Microsteatosis (%, mean and min-max) | 2 (0-5) | 50 (20-80) |
Fibrosis (stage) | 0 | 0 |
NOTE:
- Data are given as median (IQR) unless otherwise noted.
The classification of liver pictures and donor data according to the proportion of steatosis (HS above or below 30%) showed an Acc of 89%.
Discussion
This is the first multicenter prospective study to assess the performance of AI technologies to evaluate HS through smartphone pictures and donor data with comparison to liver biopsy. The AI approach achieved a sensitivity (Rec) of 97% and 93% for the classification of nonsteatotic and steatotic grafts based on a cutoff value of 30% with an Acc of 89%. These results validate our primary monocentric study where an analysis of 40 liver grafts (2 classes of 20 grafts) showed a sensitivity (Rec) of 80% and 95% for the classification of nonsteatotic, mild, and moderate steatotic versus severe steatotic grafts (cutoff of HS = 60%) with an Acc of 88%.(14) Furthermore, the use of an FCNN for liver graft image extraction from the donor picture, reported here for the first time in the literature, leads to a fully automated HS assessment method. Testing this AI approach with the most frequently used HS cutoffs (30% and 60%) for liver graft acceptance policy supports the view that this method represents a promising step toward a clinically relevant processing system for automatic, noninvasive, and objective HS assessment in the setting of LT.
The main strengths of this study are the noninvasiveness, rapidity, and utilization of a device that is available worldwide (ie, smartphone) for the analysis as compared with the standard reference examination, the liver biopsy. Liver photographs and donor data are processed and classified within seconds with the computer-assisted semisupervised system. No additional procedure or equipment is needed during graft harvesting, and no invasive liver sampling is necessary. By comparison, a liver biopsy requires surgical graft sampling, which may cause complications such as bleeding, and involves an examination by a pathologist, which is time consuming and frequently unavailable around the clock in remote places. In contrast, this AI-based steatosis assessment technique is a real-time, noninvasive method to assess HS.
The pictures were acquired with an easy-to-use and widely available smartphone camera. Although smartphones have a significant cost, they are prevalent in the general population (100% of our surgical team). Moreover, the smartphone camera used in this study (Apple iPhone) has a medium performance when compared with other smartphones,(19) so it would be reasonable to assume that newer smartphones could also be used for graft picture acquisition. The second advantage of this method is the standardization of HS evaluation. Even though a liver biopsy is considered the standard examination for HS assessment, this technique is subject to interobserver variability and sampling size limitations, as underlined by previous publications.(3) Furthermore, when pathological examination is not available, the decision to accept or discard a liver graft is usually based on donor data, liver texture, macroscopic evaluation, and the subjective clinical experience of the harvesting surgeon, making the graft evaluation highly susceptible to bias. By using an AI-based classification, interobserver variability and classification bias are dramatically limited. Indeed, the use of AI allows converting a subjective decision-making process into a standardized, computerized, and homogeneous process.
In this experimental study, the highest classification performance was obtained using texture features combined with significant donor data. Indeed, the inclusion of donor features in the algorithm helped increase the Acc of the classification by SVM-SIL, as previously demonstrated.(14) The real innovation of this study, through AI computerization of a human process, is 3-fold: first, to translate the subjective visual assessment of liver texture into an objective and standardized method (smartphone picture); second, to develop a fully automated HS assessment by the liver graft image extraction; and third, to replicate the clinical experience of a transplant surgeon (donor data analysis). A major limitation of this study is the small sample size, which is a common problem within the computer-assisted diagnosis community,(20) particularly in the LT setting wherein larger cohorts of discarded grafts (without any intention to be transplanted) are available only for machine perfusion studies.(21) In contrast, the discarded grafts in our sample were accepted with the intention to be transplanted but then discarded for HS >30% (measured by intraoperative biopsy). A second limitation is that SVM-SIL analysis cannot differentiate macrosteatosis from microsteatosis on the basis of the liver picture and biochemistry, although only macrosteatosis is predictive of liver graft dysfunction. Therefore, the present results will require confirmation in larger multicenter studies, probably in the machine perfusion study setting. Enlarging the training data set would also allow investigating more advanced machine learning methods.
Nevertheless, it has been reported that SVM-SIL achieves competitive results as compared with other more sophisticated semisupervised methods.(7) The third limitation is that the HS assessment method described here, like all methods reported in the literature,(5) cannot substitute for liver biopsy because it is not designed to detect other liver conditions (such as ballooning degeneration, centrilobular necrosis, or chronic hepatitis), which (more rarely) can also adversely affect graft outcome. Lastly, the use of 30% steatosis as a cutoff value may not reflect liver graft acceptance policies worldwide, and the older smartphone technology used (the iPhone 6S was first officially released on September 25, 2015) could limit algorithm performance.
As the LT scientific community reaches for new standards on HS assessment, research to find alternatives to histopathology is highly encouraged. Interesting results are available on magnetic resonance imaging and elastography,(22, 23) but neither is practical or performs well enough in the very specific setting of organ procurement.
This research showed that liver texture analysis from pictures and donor data features, analyzed by an AI approach, could represent a promising step toward a helpful processing system to support the surgeon’s decision, particularly the younger surgeons of the future, to accept or discard a liver graft during procurement.
Acknowledgments
We thank Evelyne Monmignot, Anne-Gaëlle Ceres, Anne Buisine, and Smices Surgical 5 Square de la Poste, 34920 Le Crès, France for providing the technical support for the study.