Original Contribution
Computer-Aided Diagnosis for Breast Ultrasound Using Computerized BI-RADS Features and Machine Learning Methods

https://doi.org/10.1016/j.ultrasmedbio.2015.11.016

Abstract

This work identifies effective computable features from the Breast Imaging Reporting and Data System (BI-RADS) to develop a computer-aided diagnosis (CAD) system for breast ultrasound. Computerized features corresponding to ultrasound BI-RADS categories were designed and tested using a database of 283 pathology-proven benign and malignant lesions. Features were selected based on classification performance using a “bottom-up” approach for different machine learning methods, including decision tree, artificial neural network, random forest and support vector machine. Using 10-fold cross-validation on the database of 283 cases, the highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.84, from a support vector machine with 77.7% overall accuracy; the highest overall accuracy, 78.5%, was from a random forest with an AUC of 0.83. Lesion margin and orientation were optimum features common to all of the different machine learning methods. These features can be used in CAD systems to help distinguish benign from worrisome lesions.

Introduction

Breast cancer is the leading cause of cancer-related death in women (Cheng et al. 2010). In 2014, approximately 232,670 new cases of breast cancer were diagnosed in the United States, and the disease caused approximately 40,000 deaths (Siegel et al. 2014). Screening mammography is widely used and recommended for the early detection of breast cancer. Studies have indicated that the addition of ultrasound can increase the overall cancer detection rate and reduce the number of unnecessary biopsies (Costantini et al. 2006; Hwang et al. 2005). Screening ultrasound is becoming an important addition to routine breast cancer screening because of its superior ability to image dense breast tissue and its lack of ionizing radiation.

Despite its many advantages, however, the image quality of ultrasound has been relatively low because of intrinsic speckle noise and low contrast between different tissue types. Digital image processing techniques and machine learning methods have been applied to improve the detection rate and increase specificity (Chen et al. 2003; Huang et al. 2006; Segyeong et al. 2004). Advances in the field of medical image processing have improved the ability of computer-aided diagnosis (CAD) to reduce background noise, improve image contrast, detect regions of interest, differentiate a tumor from background and therefore help differentiate benign from worrisome lesions (Drukker et al. 2006; Moon et al. 2013a; Shen et al. 2007). Among all these functionalities of CAD systems, classifying a tumor into benign or worrisome categories is the ultimate objective.

The performance of machine learning methods relies heavily on how well the characteristics of tumors are represented by digital features, which can be separated into two categories: knowledge-based and statistics-based. Knowledge-based features are derived from the Breast Imaging Reporting and Data System (BI-RADS) lexicon (Mendelson et al. 2013), which is used to characterize lesions based on shape, margin, orientation, echo pattern and acoustic shadowing (Chen et al. 2003; Moon et al. 2013a; Song et al. 2005). The other category of features is obtained from statistical computation, such as auto-covariance coefficients and frequency domain features (Huang and Chen 2005; Mogatadakala et al. 2006). These features capture the correlation between pixels and do not necessarily correspond to any observable features in ultrasound images.

The BI-RADS lexicon aims to standardize mammography and ultrasound reports so that reports are clear, succinct and consistent among readers. Because all BI-RADS terms are descriptive, not quantitative, they need to be “translated” into computerized features so that a CAD system can compute these features automatically. Several groups have proposed approaches to quantify BI-RADS features (André et al. 2007; Mainiero et al. 2005; Moon et al. 2013b), including a comprehensive study by Alam et al. (2011). For example, the most commonly used ultrasound BI-RADS feature is the “parallel” orientation, which corresponds to the “long axis of a lesion paralleling the skin line.” To quantify this feature, an equivalent ellipse of the lesion was identified, and the ratio between the horizontal axis and the vertical axis of the ellipse was computed (Chen et al. 2004; Moon et al. 2013a; Sahiner 2007). If the ratio is larger than one, the tumor is more likely benign; if the ratio is less than one, it is more likely malignant.
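The orientation ratio described above can be illustrated with a minimal sketch. It is not the implementation used in the cited studies: it assumes the lesion is available as a binary mask (a list of rows of 0/1 values), and it assumes the equivalent ellipse is axis-aligned and matched to the second moments of the mask, so that the axis ratio reduces to the square root of the horizontal-to-vertical variance ratio. The function name `orientation_ratio` is hypothetical.

```python
import math


def orientation_ratio(mask):
    """Horizontal-to-vertical axis ratio of an axis-aligned, moment-matched ellipse.

    `mask` is a list of rows; nonzero entries mark lesion pixels.
    Assumption: the ellipse is axis-aligned, so its axis lengths are
    proportional to the standard deviations of the pixel coordinates.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no lesion pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Second central moments; the 1/12 term accounts for the unit extent of
    # each pixel, so a one-pixel-thick lesion does not give zero variance.
    var_x = sum((x - cx) ** 2 for x, _ in pts) / n + 1.0 / 12.0
    var_y = sum((y - cy) ** 2 for _, y in pts) / n + 1.0 / 12.0
    return math.sqrt(var_x / var_y)


# Made-up example masks, not data from the study.
wide = [[1] * 6 for _ in range(3)]   # 6 pixels wide, 3 pixels tall
tall = [[1] * 3 for _ in range(6)]   # 3 pixels wide, 6 pixels tall
print(orientation_ratio(wide))       # approximately 2.0: wider than tall
print(orientation_ratio(tall))       # approximately 0.5: taller than wide
```

Under these assumptions, a ratio above one corresponds to a lesion whose long axis runs parallel to the skin line, and a ratio below one to a lesion that is taller than it is wide.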

For this study, we performed a complete translation of the entire ultrasound BI-RADS lexicon into digital features, which are used in machine learning methods for the purpose of developing an effective CAD system for breast ultrasound. We have proposed new and validated digital features to distinguish benign from worrisome lesions with the ultimate goal of improving the accuracy of breast cancer diagnosis.

Methods

The database used in this study contains 283 breast ultrasound images. The images were collected consecutively, without excluding any data, by the Second Affiliated Hospital of Harbin Medical University (Harbin, China), using a VIVID 7 (GE, Horten, Norway) with a 5- to 14-MHz linear probe. The aperture of the transducer is 4 cm. Harmonics, spatial compounding and speckle reduction were not used in obtaining the original ultrasound images. The average size of the images is 500 × 420

Student's t-test for computerized BI-RADS features

The mean value and standard deviation of each computerized BI-RADS feature for the benign and malignant groups are listed in Table 2. According to Student's t-test, seven features differed significantly between the benign and malignant groups at a significance level of 0.01 (p < 0.01): (f1) ADEE, (f2) height/width, (f3) AvgDiff, (f4) NumPeaks, (f5) AvgPeaks, (f6) ADCH and (f8) entropy. The other three features did not differ significantly at the 0.01 level, including (f7) echogenicity, (f9) shadow
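As an illustration of the kind of per-feature comparison reported in this section, the following is a minimal sketch of the pooled-variance (Student's) two-sample t statistic. It assumes equal variances in the two groups. The function name `student_t` and the sample values are made up for illustration and are not taken from Table 2 or from the study's data.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t-test."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    df = na + nb - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical feature values for two groups of lesions.
benign = [1.0, 2.0, 3.0, 4.0, 5.0]
malignant = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = student_t(benign, malignant)
print(t_stat, dof)  # t is approximately -1.0 with 8 degrees of freedom
```

To decide significance, the absolute value of the statistic would be compared with the critical value of the t distribution for the given degrees of freedom at the chosen level (0.01 in the text above).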

Discussion and Conclusions

Computer-aided diagnosis for breast ultrasound is a field that has been extensively studied. A crucial task for a CAD system is discovering efficient computerized features to distinguish benign from malignant tumors. The fifth edition of the Ultrasound BI-RADS lexicon (Mendelson et al. 2013) was quantified into computerized features and evaluated using Student's t-test. Multiple features were combined to serve as input for machine learning methods, and the bottom-up feature selection procedure

Acknowledgments

Thanks are due to radiologists Dr. Jiawei Tian and Dr. Yanxin Su from the Second Affiliated Hospital of Harbin Medical University (China) for their efforts in collecting the images and labeling the database.
