Machine Learning in Melanoma Diagnosis. Limitations About to be Overcome

González-Cruz, C.; Jofre, M.A.; Podlipnik, S.; Combalia, M.; Gareau, D.; Gamboa, M.; Vallone, M.G.; Faride Barragán-Estudillo, Z.; Tamez-Peña, A.L.; Montoya, J.; América Jesús-Silva, M.; Carrera, C.; Malvehy, J.; Puig, S.

doi:10.1016/j.adengl.2019.09.003

Abstract

Background

Automated image classification is a promising branch of machine learning (ML) that is useful for skin cancer diagnosis, but little is known about the limitations on its general use in current clinical practice.

Objective

To determine limitations in the selection of skin cancer images for ML analysis, particularly in melanoma.

Methods

Retrospective cohort study including 2,849 consecutive high-quality dermoscopy images of skin tumors, collected from 2010 to 2014, for evaluation by an ML system. Each dermoscopy image was classified according to its eligibility for ML analysis.

Results

Of the 2,849 images chosen from our database, 968 (34%) met the inclusion criteria for analysis by the ML system. Only 64.7% of nevi and 36.6% of melanomas met the inclusion criteria. Of the 528 melanomas, 335 (63.4%) were excluded. An absence of normal surrounding skin (40.5% of all melanomas from our database) and absence of pigmentation (14.2%) were the most common reasons for exclusion from ML analysis.

Discussion

Only 36.6% of our melanomas were admissible for analysis by state-of-the-art ML systems. We conclude that future ML systems should be trained on larger datasets that include relevant non-ideal images of lesions evaluated in real clinical practice. Fortunately, many of these limitations are being overcome by the scientific community, as recent work shows.

Keywords: Melanoma; Skin cancer; Dermoscopy; Image classification; Machine learning; Artificial intelligence; Convolutional neural networks

Introduction

Automated image classification by pattern recognition is a branch of machine learning (ML) that offers the dermatologist a useful tool for the diagnosis of skin cancer [1]. Deep convolutional neural networks (DCNN) have dramatically improved accuracy in feature learning and object classification [2] and have been successfully used to classify dermoscopic images of skin lesions [3]. However, images may have certain features that lead to their exclusion from analysis, which currently prevents the universal use of these systems. In this study we assessed some exclusion criteria in the selection of skin cancer images (with an emphasis on melanoma) for ML analysis, according to recent works in this field [1,4,5].

Materials and Methods

This study was conducted in a tertiary academic skin cancer center in Barcelona, Spain. A retrospective cohort study was designed that included 2,849 consecutive high-quality dermoscopy images of skin tumors from the Melanoma Unit database, collected from 2010 to 2014. Images were acquired with a DermLite® photo digital epiluminescence microscopy system (3Gen) with a 37 mm thread size and a Canon G16 camera. Pathological diagnosis was available for 2,429 images. Finally, the images were classified according to their theoretical eligibility for ML analysis on the basis of the following potential exclusion criteria [1,4,5]: difficulty in lesion border detection (absence of pigmentation, absence of normal surrounding skin, presence of hair, or location on volar skin), metastasis, or an ulcerated lesion.
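
The eligibility rule described above can be sketched in a few lines: an image is eligible for ML analysis only if none of the potential exclusion criteria applies. This is a minimal illustration, not the authors' actual procedure; the `ImageRecord` class and its field names are hypothetical and simply mirror the criteria listed in the text.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ImageRecord:
    """Hypothetical per-image annotation: one boolean per exclusion criterion."""
    no_pigmentation: bool = False
    no_normal_surrounding_skin: bool = False
    hair_present: bool = False
    volar_skin: bool = False
    metastasis: bool = False
    ulcerated: bool = False


def exclusion_reasons(record: ImageRecord) -> List[str]:
    """Return the names of all exclusion criteria that apply to the image."""
    return [f.name for f in fields(record) if getattr(record, f.name)]


def is_eligible(record: ImageRecord) -> bool:
    """An image is eligible only if no exclusion criterion applies."""
    return not exclusion_reasons(record)


# Two made-up records, for illustration only.
clean = ImageRecord()
hairy_acral = ImageRecord(hair_present=True, volar_skin=True)

print(is_eligible(clean))
print(exclusion_reasons(hairy_acral))
```

Recording every applicable reason, rather than stopping at the first, allows a single lesion to count toward several criteria at once.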

This study was approved by the institutional review board. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Results

Of the 2,849 images from our database, 968 (34%) were eligible, as they had no potential exclusion criteria for analysis by an ML system. Nevi, melanomas, and basal cell carcinomas were the most frequent lesions in our database. Only 64.7% of nevi and 36.6% of melanomas had no potential exclusion criteria (Table 1).

Of 528 melanomas, 335 (63.4%) could potentially be excluded. An absence of normal surrounding skin (40.5% of all melanomas) and absence of pigmentation (14.2%) were the most common reasons for exclusion from ML analysis. Other reasons for exclusion are shown in Table 1.

Table 1.

A. Images Chosen for Analysis by ML. Location and Diagnosis.

	With Any Potential Exclusion Criterion, n (% of Total by Location or Diagnosis)	Without Any Potential Exclusion Criterion, n (% of Total by Location or Diagnosis)	Total
Location
Head and neck	633 (76.8%)	191 (23.2%)	824
Upper limbs	159 (62.1%)	97 (37.9%)	256
Lower limbs	297 (60.4%)	195 (39.6%)	492
Volar skin	62 (100%)	0 (0%)	62
Trunk	538 (53.1%)	475 (46.9%)	1013
Mucosa	15 (83.3%)	3 (16.7%)	18
Other	149 (81%)	35 (19%)	184

Diagnosis
Basal cell carcinoma	295 (69.6%)	129 (30.4%)	424
Squamous cell carcinoma	59 (89.4%)	7 (10.6%)	66
Scar	21 (77.8%)	6 (22.2%)	27
Dermatofibroma	17 (77.3%)	5 (22.7%)	22
Lentigo	26 (66.7%)	13 (33.3%)	39
Melanoma	335 (63.4%)	193 (36.6%)	528
Cutaneous metastasis	9 (100%)	0 (0%)	9
Nevus	256 (35.3%)	470 (64.7%)	726
Actinic keratosis	137 (78.3%)	38 (21.7%)	175
Seborrheic keratosis	95 (67.9%)	45 (32.1%)	140
Other	225 (82.4%)	48 (17.6%)	273
Pathological diagnosis not available	–	–	420

B. Reasons for Exclusion (Melanoma)
Reason for exclusion	Number Excluded (% of All Melanomas)
Absence of pigmentation	75 (14.2%)
Absence of normal surrounding skin	214 (40.5%)
Presence of hair	28 (5.3%)
Metastasis	29 (5.5%)
Location on volar skin	23 (4.4%)
Ulcerated lesion	19 (3.6%)
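
As a quick arithmetic check on Table 1B, the per-reason percentages can be recomputed from the counts, using the 528 melanomas as the denominator. The sketch below uses only figures from the table; the variable and function names are illustrative. It also shows that the per-reason counts add up to more than the 335 excluded melanomas, which is consistent with some lesions meeting more than one criterion.

```python
TOTAL_MELANOMAS = 528
EXCLUDED_MELANOMAS = 335

# Counts taken from Table 1B.
reason_counts = {
    "Absence of pigmentation": 75,
    "Absence of normal surrounding skin": 214,
    "Presence of hair": 28,
    "Metastasis": 29,
    "Location on volar skin": 23,
    "Ulcerated lesion": 19,
}


def pct(count: int, total: int) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


reason_pct = {reason: pct(n, TOTAL_MELANOMAS) for reason, n in reason_counts.items()}
excluded_pct = pct(EXCLUDED_MELANOMAS, TOTAL_MELANOMAS)
eligible_pct = pct(TOTAL_MELANOMAS - EXCLUDED_MELANOMAS, TOTAL_MELANOMAS)

# Reason assignments beyond one per excluded melanoma.
surplus_assignments = sum(reason_counts.values()) - EXCLUDED_MELANOMAS

for reason, value in reason_pct.items():
    print(f"{reason}: {value}%")
print(f"Excluded: {excluded_pct}%, eligible: {eligible_pct}%")
print(f"Surplus reason assignments: {surplus_assignments}")
```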

Discussion

Melanoma accounts for the majority of skin cancer deaths. Early diagnosis and treatment significantly improve its prognosis. An effective screening method is needed, and automated image classification by pattern recognition may achieve diagnostic accuracy similar to that of expert dermatologists [6]. However, some limitations have to be overcome. One of these is the set of exclusion criteria applied in the selection of skin cancer images. Although only high-quality dermoscopy images were selected from our database, only 34% had no potential exclusion criteria for classification by most state-of-the-art ML algorithms. Moreover, 63.4% of our melanomas had at least one of the potential exclusion criteria mentioned above. This considerably decreases the diagnostic accuracy and utility of some ML systems.

Large lesions are a serious problem for ML algorithms, as they do not fit within the diameter of the majority of dermoscopy lenses, and this renders unusable all the state-of-the-art systems that need to pre-compute lesion segmentation [1]. Although some works have proposed hair detection and removal methods [5], the performance of most ML systems deteriorates in the presence of hair. Since most dermoscopy datasets used for algorithm training do not include volar skin lesions, systems trained on them will not be able to classify acral lesions correctly. Nevertheless, the artificial intelligence community is moving rapidly to overcome these limitations. Yu et al. [7] recently published a study in which a DCNN was used for acral melanoma and nevus classification. In this work we consider the limitations of most, but not all, ML systems.

Our study shows that the main potential exclusion criteria were the absence of normal surrounding skin and the absence of pigmentation. Many melanomas developed in sun-damaged skin with abnormal surrounding skin, which makes them unsuitable for analysis by most current ML systems because of difficulties in lesion border detection [5]. Moreover, amelanotic melanoma, which accounts for 2%–8% of all melanomas [8], cannot yet be diagnosed by most current ML systems. These limitations could be addressed by designing ML systems that can work with images that do not contain the entire lesion and by increasing dataset size with a larger number of representative dermoscopy images.
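
One way to picture the "absence of normal surrounding skin" criterion is in terms of a binary lesion mask: if lesion pixels reach the image frame, no margin of surrounding skin is visible on that side. The sketch below is an illustrative heuristic only, not the method of any system cited here; the function name, the nested-list mask format, and the `margin` parameter are all assumptions.

```python
from typing import Sequence


def lesion_touches_frame(mask: Sequence[Sequence[int]], margin: int = 1) -> bool:
    """Return True if any lesion pixel (value 1) lies within `margin` pixels of the image frame."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            near_edge = (
                y < margin or y >= height - margin
                or x < margin or x >= width - margin
            )
            if value and near_edge:
                return True
    return False


# Toy masks (1 = lesion, 0 = skin); hypothetical data for illustration.
centered = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cropped = [
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

print(lesion_touches_frame(centered))
print(lesion_touches_frame(cropped))
```

A check of this kind presupposes that a lesion mask is available, which is exactly what is hard to obtain when the border is poorly defined.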

In conclusion, we consider that ML systems, especially those based on new developments in the deep learning field, will make ML a valuable tool not only for the dermatologist but also for the general population. However, these systems must overcome some limitations in order to enlarge the spectrum of images that can be analyzed. It is clear, though, that researchers are moving in this direction, since some of the exclusion criteria mentioned in this work have already been overcome by recent algorithms included in the ISIC International Symposium [3].

Funding/Support

The study in the Melanoma Unit, Hospital Clínic, Barcelona was supported in part by grants from Fondo de Investigaciones Sanitarias P.I. 12/00840, PI15/00956 and PI15/00716 Spain; by the CIBER de Enfermedades Raras of the Instituto de Salud Carlos III, Spain, co-funded by “Fondo Europeo de Desarrollo Regional (FEDER). Unión Europea. Una manera de hacer Europa”; by the AGAUR 2014_SGR_603 and 2017_SGR_1134 of the Catalan Government, Spain; by a grant from “Fundació La Marató de TV3, 201331-30”, Catalonia, Spain; by the European Commission under the 6th Framework Programme, Contract n°: LSHC-CT-2006-018702 (GenoMEL); by CERCA Programme/Generalitat de Catalunya and by a Research Grant from “Fundación Científica de la Asociación Española Contra el Cáncer” GCB15152978SOEN, Spain. Part of the work was developed at the building Centro Esther Koplowitz, Barcelona.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgements

Thanks to our patients and their families who are the main reason for our studies; to nurses from the Melanoma Unit of Hospital Clínic of Barcelona, Daniel Gabriel, Pablo Iglesias and Maria E Moliner for helping to collect patient data and to Paul Hetherington for helping with English editing and correction of the manuscript.

References

[1] D.S. Gareau, J. Correa da Rosa, S. Yagerman, J.A. Carucci, N. Gulati, F. Hueto, et al. Digital imaging biomarkers feed machine learning for melanoma screening. Exp Dermatol, 26 (2017), pp. 615-618. http://dx.doi.org/10.1111/exd.13250

[2] Y. Fujisawa, Y. Otomo, Y. Ogata, Y. Nakamura, R. Fujita, Y. Ishitsuka, et al. Deep-learning-based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis. Br J Dermatol (2018). http://dx.doi.org/10.1111/bjd.16924

[3] M.A. Marchetti, N.C.F. Codella, S.W. Dusza, D.A. Gutman, B. Helba, A. Kalloo, et al. Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. J Am Acad Dermatol, 78 (2018), pp. 270-277. http://dx.doi.org/10.1016/j.jaad.2017.08.016

[4] P. Tschandl, C. Rosendahl, H. Kittler. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data, 5 (2018), pp. 180161. http://dx.doi.org/10.1038/sdata.2018.161

[5] M. Celebi, Q. Wen, H. Iyatomi, K. Shimizu, H. Zhou, G. Schaefer. A state-of-the-art survey on lesion border detection in dermoscopy images. Dermoscopy image analysis.

[6] A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542 (2017), pp. 115-118. http://dx.doi.org/10.1038/nature21056

[7] C. Yu, S. Yang, W. Kim, J. Jung, K.Y. Chung, S.W. Lee, et al. Acral melanoma detection using a convolutional neural network for dermoscopy images. PLoS One, 13 (2018), pp. e0193321. http://dx.doi.org/10.1371/journal.pone.0193321

[8] M.A. Pizzichetta, H. Kittler, I. Stanganelli, G. Ghigliotti, M.T. Corradin, P. Rubegni, et al. Dermoscopic diagnosis of amelanotic/hypomelanotic melanoma. Br J Dermatol, 177 (2017), pp. 538-540. http://dx.doi.org/10.1111/bjd.15093

☆

Please cite this article as: González-Cruz C, Jofre MA, Podlipnik S, Combalia M, Gareau D, Gamboa M, et al. Uso del aprendizaje automático en el diagnóstico del melanoma. Limitaciones por superar. Actas Dermosifiliogr. 2020;111:313–316.
