Volume 35, Issue 1 pp. 28-35
Review
Free Access

Artificial Intelligence in Musculoskeletal Imaging: A Paradigm Shift

Joseph E Burns

Joseph E Burns

Department of Radiological Sciences, University of California-Irvine School of Medicine, Orange, CA, USA

Search for more papers by this author
Jianhua Yao

Jianhua Yao

Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences Department, Clinical Center, National Institutes of Health, Bethesda, MD, USA

Search for more papers by this author
Ronald M Summers

Corresponding Author

Ronald M Summers

Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences Department, Clinical Center, National Institutes of Health, Bethesda, MD, USA

Address Correspondence to: Ronald M Summers, MD, PhD, Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Building 10, Room 1C224D MSC 1182, Bethesda, MD 20892-1182, USA. E-mail: [email protected]Search for more papers by this author
First published: 09 August 2019
Citations: 33
This work was supported in part by the Intramural Research Program of the National Institutes of Health, Clinical Center. The opinions expressed herein are those of the authors and do not necessarily represent those of the DHHS, NIH, or UCI.

ABSTRACT

Artificial intelligence is upending many of our assumptions about the ability of computers to detect and diagnose diseases on medical images. Deep learning, a recent innovation in artificial intelligence, has shown the ability to interpret medical images with sensitivities and specificities at or near that of skilled clinicians for some applications. In this review, we summarize the history of artificial intelligence, present some recent research advances, and speculate about the potential revolutionary clinical impact of the latest computer techniques for bone and muscle imaging. © 2019 American Society for Bone and Mineral Research. Published 2019. This article is a U.S. Government work and is in the public domain in the USA.

Introduction

Artificial intelligence (AI), a subfield of computer science, can be broadly defined as the use of a computer to replicate human intelligence in performing tasks. Nested within this umbrella term of “artificial intelligence” is the field of machine learning. Important goals of machine learning are to impart to computers the capacity to learn without explicit programming and to make more accurate predictions from data. Although machine learning algorithms in general do progressively improve with experience at performing tasks, they still may need some human input or guidance to improve performance.

Computer analysis of images has a long history in musculoskeletal radiology. The last 5 years have seen a dramatic improvement in such analyses. In particular, recent developments in artificial intelligence have caused a paradigm shift in the ability of computers to detect, characterize, and measure complex pathologies in the musculoskeletal system.

These advances have arisen in large measure because of improvements in the field of computer vision, a branch of computer science. The computer scientists in this field have developed a suite of technologies loosely called deep learning that are able to recognize objects in natural world images like the photographs we take while on vacation.1 It turns out that these same technologies are highly effective at detecting abnormalities on medical images, including radiology, pathology, and ophthalmologic images.2-5

For this review, we will focus on musculoskeletal radiology images. We performed a search of PubMed for papers with the following key word set: (machine learning OR deep learning OR convolutional OR neural network) AND (musculoskeletal OR skeletal OR bone OR skeleton OR muscle) AND (X-ray OR xray OR radiography OR computed tomography OR CT OR magnetic resonance OR MRI). This search yielded more than 800 results as of this writing, the majority (71%, 581/821) from 2013 and later. Clearly this field is growing rapidly. In this review, we focus on the authors' experience with the application of artificial intelligence in musculoskeletal imaging and how we believe this represents a paradigm shift in this field.

Background

Although deep learning has only been applied to radiology over the past few years, radiology researchers have been developing computer tools for automated detection and characterization of musculoskeletal disease on radiologic images for decades. Three generations of development are loosely defined centered about the parallel evolution of medical imaging technology and computer hardware and software, and availability of very large labeled data sets.

First-generation development was inhibited by primary image acquisition on analog photographic film, with ad hoc digital conversion, and the markedly slower speeds of available computers. Some early examples in musculoskeletal radiology include classifying bone tumors on radiographs (Lodwick and colleagues6), segmenting the femur and tibia on AP knee radiographs (Ausherman and colleagues7), and assessing bone mineral density on hand and wrist radiographs (Geraets and colleagues8).

Second-generation development is characterized by increasingly prevalent primary digital image acquisition, more capable computers, and new algorithms. This generation continued until the early to mid-2010s. Some examples include texture-based morphometry of osteoporosis on CT (Mundinger and colleagues9), estimation of skeletal age on radiographs (Pietka and colleagues10; Frisch and colleagues11), assessment of bone trabeculae on MRI and CT (Link and colleagues12), measurement of bone mineral density on CT (Lang and colleagues13; Summers and colleagues14) (Fig. 1), and detection of lytic and sclerotic metastases on CT (O'Connor and colleagues15; Burns and colleagues16) (Fig. 2 and Fig. 3). More examples can be found in a recent review.17

Details are in the caption following the image
Fully automated phantomless bone mineral densitometry for osteoporosis assessment on contrast-enhanced CT of a 65-year-old woman using second-generation AI software. The green oval indicates the region of interest (ROI) in the L1 vertebral body automatically located and measured by the software. The ROI is placed in the anterior vertebral body so as to avoid the basivertebral vein. The green and red boxes indicate the automatically located vertebral body/spinous process and entire vertebra, respectively.
Details are in the caption following the image
Lytic spine metastasis detection using second-generation AI software. There are two lytic lesions, 3.4 cm (5.1 cm2) and 1.2 cm (1.1 cm2), presumably metastatic, in L1 vertebral body of a 47-year-old man with melanoma. (A) Original axial bone window image showing lesions (arrows). (B) Color-coded image showing lesion detection and classification. (B): Green = computer-aided detection of lytic lesions; red = spine detection; blue = spinal canal detection; brown = Lytic lesion candidate rejected by filter or classifier. Reproduced from O'Connor and colleagues.15
Details are in the caption following the image
Sclerotic spine metastasis detection using second generation AI software. (A) Sclerotic vertebral body tumors (focal regions of high bone density) denoted by blue arrows. (B) Watershed method for segmentation of regions of similar bone and soft tissue density. (C) CAD system detected high-density lesions (green). (D) Reference standard segmentation (blue). (E) CAD system 3D detections (green) with ground truth lesions superimposed.

In these first two generations, researchers attempted to replicate human perception and cognition in the interpretation of medical images. They did so by handcrafting computer models to mimic human perception of the features of the anatomy or pathology of interest. While neural networks were available at this time, use was inhibited because of CPU (rather than graphics processing unit or GPU) based computation and inability to train deep networks. Consequently, other analytical techniques such as support vector machines (SVMs) rose in prominence. Unfortunately, those other techniques were insufficient to achieve clinician-level performance.

Recent advances have led to the third generation of development. These advances include faster and cheaper GPUs, the ability to train deep (as opposed to shallow) neural networks with many layers, and the increasing availability of very large labeled data sets. These deep learning systems can learn features from medical images without being specifically designed to assess these features (ie, handcrafted features are no longer necessary). The third-generation techniques are more accurate and rapidly developed (from years to months) than prior generations.

Imaging Modality Examples

Artificial intelligence techniques have been applied to the various modalities of musculoskeletal imaging, including DXA, radiography, CT, and MRI. In this section, we will review a number of such applications.

DXA

On DXA, applications include detection of osteoporotic vertebral compression fractures and femoral segmentation for assessment of bone density.18, 19 Using semi-automated vertebral segmentation and appearance models, Roberts et al. attained 88% sensitivity for compression fracture detection at a 5% false-positive rate.

Radiography

Kim and MacKinnon20 and Lindsey and colleagues21 used deep convolutional neural networks to assess wrist radiographs for fractures. Kim and MacKinnon attained a sensitivity of 0.9 and specificity of 0.88 in wrist fracture detection with transfer learning on a deep CNN model pretrained on non-medical images. Lindsey and colleagues showed that with deep learning, wrist fractures were correctly detected by emergency medicine clinicians with a relative reduction in misinterpretation rate of 47.0%.

Bone age analysis from pediatric hand radiographs has been performed recently using deep learning.22, 23

Computed tomography

Our group and others have focused to a great extent on musculoskeletal pathology of the central body or axial skeleton on CT. Reasons for this focus include the widespread use of body CT and the relatively infrequent use of extremity CT. For AI research, CT also has some benefits over MRI, including fewer artefacts and the direct physical meaning of the pixel intensities. For example, CT Hounsfield units (HU) are calibrated daily to a standard, but MR signal intensity is determined by a host of machine-specific parameters and characteristics.

Examples of this work include automated detection of traumatic and compression fractures, degenerative changes, epidural masses, bone metastases, and bone mineral density of the spine on CT utilizing both second- and third-generation techniques.15, 16, 24-38 The typical strategy for these applications involves vertebral level labeling and segmentation, followed by feature- or learning-based automated measurement or detection.39-42

Magnetic resonance imaging

Because MRI is widely used for assessing spinal degenerative changes in the setting of neck and back pain, significant efforts have been made toward automated delineation of spinal anatomy on MRI images.43-48 Forsberg and colleagues studied detection and labeling of cervical and lumbar vertebrae using third- and second-generation techniques, respectively, with sensitivities of 99.1% to 99.8% for detection and accuracies of 96.0% to 97.0% for labeling.44 Lu and colleagues segmented the lumbar vertebrae and assessed for spinal stenosis using a U-Net deep learning architecture.47 For vertebral detection, the Dice Similarity Coefficient, a measure of segmentation accuracy, was 0.93, and for spinal canal and foraminal stenosis grading, the accuracies were 80.4% and 78.1%, respectively.

Neoplasia detection in bone on MRI has been performed by Jerebko and Wang.49, 50 Automated detection of the fascia lata in the thigh was used to assess for fatty replacement of muscle in patients with muscular dystrophy.51 Knee cartilage defects and meniscal tears have been automatically assessed on MRI.52

In the next sections, we discuss specific applications in more detail.

Fracture detection

There has been considerable interest in developing automated tools to detect a variety of bone fractures. One such system automatically detected acute traumatic fractures of thoracic and lumbar vertebral bodies on CT24, 53, 54 (Fig. 4). The sensitivity for detection of fractures within each vertebra was 81%, with a false-positive rate of 2.7 per patient in the test-set patients. Acute pelvic fractures can also be automatically detected.55, 56

Details are in the caption following the image
Spine acute traumatic fracture detection using second generation AI software. (A) Spine segmentation. There is a fracture at T12 (green arrow). (B) 2D and (C) 3D cortical shell segmentation maps. Cyan = periosteal surface; red: endosteal surface. (D) Unwrapped cortical shell. (E) Detections on unwrapped map. Fracture detections 1 and 2 are focal curvilinear regions of low bone density. (F) Detections on axial slice. (G) 3D display of detections. Adapted from Burns and colleagues.24

Radiologists may overlook spine compression fractures if they do not routinely review sagittal midline images on body CT.57 In response, a system was designed for the automated detection and localization of thoracic and lumbar vertebral body compression fractures on CT. In addition to detection, the system determined Genant classification of the fractures26, 58 (Fig. 5). In a set of CT scans with 210 fractured vertebrae, the sensitivity for detection of vertebrae with compression fractures was 95.7%, with a false-positive rate of 0.29 per patient.26

Details are in the caption following the image
Vertebral body compression fracture detection and characterization in an osteopenic patient using second-generation AI software. Eighty-six-year-old female, compression fractures at T3 and T7. (A, B) the program detects the vertebral endplates, creates a coordinate axis, and establishes distances between endplates along the z axis. (C) Sagittal CT section shows automated spine segmentation and vertebral partitioning. (D, E) Cross section of stacked (D) sagittal and (E) axial vertebral height compasses. Each compass is an axial sector map of the vertebral height loss. White = no height loss. Gray to black = increasing height loss. (E) Height compasses of a grade 2 concave fracture at T3 (top), a grade 3 wedge fracture at T7 (middle), and a normal vertebral body at L2 (bottom). A = anterior; P = posterior; L = left; R = right. Adapted from Burns and colleagues.26

Bone oncology

Automated detection of lytic, sclerotic, and mixed density metastatic bone lesions of the spine and sclerotic lesions of the ribs has been developed (Figs. 2 and 3).15, 16, 59-61 In one such system, the sensitivity (and false-positive rate per patient) was 81% (2.1), 81% (1.3), and 76% (2.1) for sclerotic, lytic, and mixed lesions of the spine, respectively, using SVM classifiers.27 This system is a first step toward the quantitative analysis of metastatic spine disease for determination of tumor burden, assessment of lesion change over time, and inclusion of bone lesions into treatment response criteria such as RECIST. In other work, 185 sclerotic lesions were identified in patient ribs, and a system with 75.4% sensitivity at an average of 5.6 false-positives per case was designed and tested using an SVM classifier.59 Performance improvements will be required for such systems to be used clinically.

Osteoporosis and assessment of bone mineral density

Automated bone mineral density determination on CT may be performed with a densitometry phantom in the field-of-view or with a calibrated scanner14 (Fig. 1). Calibration curves may be constructed from the phantoms in dedicated QCT scans, mapping CT attenuation in HU to bone mineral density in mg/cc on that scanner. This calibration curve may then be used to convert the mean trabecular bone CT attenuation on this now-calibrated scanner to give a BMD estimation without the presence of the phantom. DXA and automated quantitative CT of the lumbar spine have been compared for assessment of bone mineral density on CT with an area under the ROC curve of 0.888.62

Osteoporosis and fragility fracture risk have also been assessed on dental panorex radiographs, hip radiographs, and on MRI.63-65 For example, MRI in combination with FRAX score, BMD, and patient physical characteristics has been used to predict osteoporotic bone fractures.65

Opportunistic screening

Opportunistic screening means the detection of abnormalities unrelated to the primary indication for the scan. Examples of opportunistic screening include bone mineral densitometry, visceral fat analysis, and sarcopenia assessment on CT scans obtained for colorectal cancer screening or other indications14, 62, 66-71 (Fig. 6). Such opportunistic screening does not require additional radiation exposure and provides additional information from images that already exist. It is envisioned that such opportunistic screening may lead to early detection, risk assessment, and favorable treatment outcomes in such conditions as osteoporosis, metabolic syndrome, and sarcopenia.

Details are in the caption following the image
Muscle volumetry for sarcopenia assessment using third-generation AI software. (A) Original axial contrast-enhanced abdominal CT image of a 65-year-old woman at the L3 level. (B) Muscle group segmentation automatically segmented and measured by the software (the skeletal muscle index group) (red).

Other applications

Other AI applications include bone strength determination and osteoarthritis evaluation.72

Purpose-Driven Development: Putting Medicine on a Scientific Quantitative Basis

Although in many cases, the automated detection of a specific pathology is the initial goal, the ultimate goal of much of this research is quantitative characterization of the disease of interest. In this way assessment of disease can be taken from the qualitative or semiquantitative historical basis of 19th- and 20th-century medicine into the realm of other scientific disciplines using the full power of modern computing. Potential advantages of doing so include quantitative analysis of static and dynamic characteristics, reduced diagnostic variability, integration of imaging, pathology, genomic and laboratory data, and discovery of hidden correlations heretofore unknown and uninvestigated, akin to the advances made in data mining genomic data.

Challenges in Development

As alluded to previously, one of the enabling (as well as limiting) factors in AI development for musculoskeletal imaging is the public availability of well-labeled image data sets. Examples of available data sets include those for bone age assessment, knee MRI, bone radiographs, and spine imaging.41, 73-75 Data sets such as these are often used for competitive challenges typically in association with an image processing conference. Examples include SpineWeb and intervertebral disc segmentation (https://ivdm3seg.weebly.com/) at the MICCAI meeting and the Pediatric Bone Age Machine Learning Challenge at the 2017 meeting of the RSNA.41, 75

There are also many possible clinical tasks to be automated, many of which might be used only infrequently.76 This leads to development challenges such as a lack of relevant data sets and difficulties in attaining high performance. Difficulties in labeling the reference standard, such as uncertainties in lesion boundaries or diagnosis, and variabilities in human labeling accuracies, can also lead to reduced performance.

Challenges in Implementation

There have also been challenges in transitioning computer technologies from the lab bench (ideation, design, and testing) to the patient's bedside (direct clinical application). Computer-aided diagnosis has had a mixed track record in patient care, dampening enthusiasm for these technologies in clinical settings. Early CAD systems, which were FDA-approved and integrated into direct clinical patient care, predominantly for mammography, often had high sensitivities but also high false-positive rates, leading to inefficiencies and unnecessary testing in clinical practice.77 The new deep learning-based techniques may reduce the false-positive rates and provide more accurate diagnoses. Another challenge to implementing these technologies in clinical settings has been practice workflow and economic factors. For example, it is not clear how these automated tools would be paid for because there are few mechanisms for reimbursement for such practice enhancements.

Opportunities

Automated image analysis tools can have other applications that are underrecognized. For example, population analyses and epidemiologic assessments can be performed more readily when images can be converted to measurements. Automated analyses can enable triage, for example, by quickly identifying critical exam findings and alerting physicians to their presence. They can permit better integration with other quantitative clinical measures such as lab values and genomic data. They can also improve measurement reproducibility, for example, in the setting of RECIST measurements of solid tumors, which are known to demonstrate large observer variability.78

The Future

The power of the latest machine learning algorithms represents a paradigm shift over the failed promises of earlier generations of CAD. There is huge excitement about the application of machine learning to images. This has led to burgeoning investment in startup companies in this space. Consequently, we can expect to see many commercial products and much physician interest in these technologies. Potential benefits to the patient include more prompt and accurate diagnoses, more coordination with other clinical data, and more accurate disease prognoses. Better understanding of the natural progression of disease, an enlarged spectrum of epidemiologic studies, and diffusion of diagnostic technologies to underserved communities for global health improvements are other potential benefits. Whether these benefits will come to pass depends on many factors in addition to basic research progress, including clinical acceptance and means of reimbursement.

Disclosures

RMS has pending and/or awarded patents for automated image analyses, and receives royalty income from iCAD, Ping An, ScanMed, and Koninklijke Philips. His lab received research support from Ping An Technology Company Ltd. and NVIDIA. JY has pending and/or awarded patents for automated image analyses, and receives royalty income from iCAD and Ping An. He is presently an employee at Tencent Holdings.

Acknowledgments

We thank Drs Nathan Lay and Daniel Elton for software engineering. Tatiana Wiese is thanked for Fig. 3.

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.