Since the field of dermatopathology is not an exact science, it is prone to personal subjectivity, which sometimes causes disagreements on the diagnosis and assessment of some histological features. In the case of melanoma, some variables such as regression are associated with low interobserver agreement. On the contrary, other variables such as the measurement of Breslow thickness show high reproducibility.
Objective
The main objective of our study was to investigate multiple features of 60 consecutive cases of melanoma to establish interobserver reproducibility.
Methods and main results
We conducted an observational and descriptive study at Hospital de Manises, Valencia, Spain; IVO Foundation, Valencia, Spain; and Hospital 12 de Octubre, Madrid, Spain. The mean level of agreement across all study variables was moderate (Cohen's kappa coefficient = 0.5). The highest agreement corresponded to polypoid morphology, pigmentation, ulceration, and solar elastosis. The lowest level of agreement was reached for the presence of cellular pleomorphism and tumor necrosis.
Conclusions
Our mean level of agreement was moderate, which indicates that some of the measured characteristics, such as cellular pleomorphism or the presence of necrosis, cannot be used for future studies or must be redefined and their reproducibility reestablished. When conducting a research study, it is necessary to analyze the study variables to demonstrate their validity for measuring or classifying a certain feature. It is also advisable to ensure that the variables are reproducible so that they can be used in other studies or in routine clinical practice.
Dermatopathology is not an exact science and is subject to personal subjectivity, which sometimes leads to interobserver variability in the diagnosis and assessment of certain histological features. In melanoma, some variables such as regression show low interobserver agreement. In contrast, other variables such as the measurement of Breslow thickness show high reproducibility.
Objective
The main objective of our study was to investigate the interobserver reproducibility of multiple features in a total of 60 consecutive cases of melanoma.
Methods and main results
An observational and descriptive study was conducted at Hospital de Manises (Manises), Fundación IVO (Valencia), and Hospital 12 de Octubre (Madrid). The mean agreement across the study variables was moderate. The highest agreement rates were obtained for polypoid morphology, pigmentation, ulceration, and solar elastosis. In contrast, the lowest agreement was found for the presence of cellular pleomorphism and tumor necrosis. When conducting a research study, it is necessary to analyze the study variables and demonstrate their validity for measuring or classifying a given feature. In addition, it is advisable to ensure that the variables are reproducible so that they can be used in other studies or in routine clinical practice.
Conclusions
Our mean agreement was moderate, which indicates that some of the measured features, such as cellular pleomorphism or the presence of necrosis, cannot be used in future studies or, alternatively, must be redefined and their reproducibility reestablished.
Multiple morphological features are often used in the histopathological diagnosis of cutaneous melanoma. These can be categorized into two major types: architectural features (such as pagetoid infiltration and solar elastosis) and cytological features (melanocytic atypia and rate of mitoses, among others).
Several features have prognostic implications, the most important being the presence of ulceration and Breslow thickness.
Molecular biology is changing the way we see and classify melanocytic lesions, and there have been interesting advances in this field over the past few years.1
Moreover, melanoma is remarkably heterogeneous in its histopathological appearance. Although this is probably due to multiple factors, mutational status is gaining interest as it might determine some clinical and histopathological features, such as the degree of pagetoid spread or the shape and size of melanoma cells. In a previous study, Viros et al.2 defined some histopathological features associated with the presence of BRAF mutations. BRAF-mutated melanomas displayed increased upward migration and nest formation of intraepidermal melanocytes, a sharper demarcation from the surrounding skin, thickening of the involved epidermis, and rounder, larger and more pigmented melanoma cells.
Histopathological reports of tumor characteristics are subject to interobserver variability, so diagnosis and description of melanoma can often be problematic. In melanocytic lesions, several studies have reported a low level of agreement for some semi-quantitative features, such as lymphocytic infiltration or regression.3–6 In addition, the differential diagnosis between dysplastic nevi and early-stage invasive melanoma can sometimes be challenging too.7,8
On the other hand, some prognostic features generally have high interobserver reproducibility, such as Breslow thickness and the presence of ulceration.4,6,9
The main outcome was the concordance between the two groups for each pathologic feature assessed.
Material and methods
Clinician panel and review procedure
Our study was conducted by two groups of researchers: the first included two dermatopathologists from the same institution (IVO), and the second included one pathologist (HUDO) and one dermatologist (HM).
The dermatologist, who had experience in dermatopathology (a 4-month internship in the dermatopathology unit at Hospital Universitario 12 de Octubre, Madrid, Spain, and a 15-day internship at the Ackerman Academy, New York City, NY, United States), always performed the histological assessment together with, and under the supervision of, the pathologist.
The study was conducted in two stages. Stage #1 was the pre-selection of the histologic features to be studied, as well as their definitions. This was done considering the previous literature, specifically the work of Viros et al.,2 and preliminary meetings of the research group in which a subset of 40 samples with different mutational status was examined. We decided to include some extra features based on their prevalence. This stage also included an assessment of the level of agreement in a training set of 10 samples to identify and resolve discrepancies in the assessment of the definitive variables. This assessment was performed separately by each of the two groups.
Stage #2 was performed in the same way, but with the final sample set on which the present study was conducted.
Cases studied
The histological sections were obtained consecutively from the IVO pathology files from January 2004 through December 2004.
Demographic data including age, sex, location, type of melanoma, Breslow thickness and Clark level were retrieved from the IVO Melanoma Database.
Tissue fragments were fixed in formalin, routinely processed and stained with haematoxylin and eosin. All cases that were not optimal for review were excluded.
Histopathological features
The specific features analyzed by Viros et al.2 were included in our analysis:
Upward spread of intraepidermal melanocytes or pagetoid spread (100% melanocytes at the dermoepidermal junction, 75% up to 100% melanocytes at the dermoepidermal junction, 50% pagetoid spread, >50% pagetoid spread); nest formation of intraepidermal melanocytes (no nests, <25% melanocytes in nests, 25% up to 50% melanocytes in nests, >50% melanocytes in nests); pigmentation of melanocytes (absence, slight, moderate, high, very high); epidermal contour (atrophic, thinned, normal, thickened, hypertrophic); and lateral circumscription (discontinuous, gradual but continuous, abrupt).
We also included some additional features given their potential interest:
Pattern of growth (expansive vs infiltrative), solar elastosis (2 modalities: 1 – absence, minimum, moderate, high and 2 – low vs high), presence of ulceration (absence vs presence), type of ulceration (expansive vs infiltrative), polypoid shape (absence vs presence), regression (absence, ≤50%, >50%), presence of necrosis (absence vs presence), and presence of pleomorphism (absence vs presence).
Their definitions are described in detail in the supplementary data.
Statistical analysis
Results were exported to an Excel table. When a feature or an entire case could not be evaluated for any reason, it was recorded as non-applicable.
Agreement between the two groups was assessed using Cohen's kappa coefficient (κ), a well-known index of chance-corrected agreement on a nominal or ordinal scale. According to Landis and Koch,10 values >0.75 represent excellent agreement beyond chance, values between 0.40 and 0.75 represent fair to good agreement beyond chance, and values <0.40 represent poor agreement beyond chance. A κ value close to 1 means almost perfect agreement.10
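For reference, the κ statistic can be written in its standard form as shown here. The symbols p_o (observed proportion of agreement between the two groups) and p_e (proportion of agreement expected by chance from each group's marginal frequencies) are notation introduced for illustration and do not appear in the original methods.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

With this definition, κ = 1 when the two groups agree on every case and κ = 0 when the observed agreement is no better than chance.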
For ordinal variables, we used Cohen's weighted kappa, a modification of the original kappa statistic (the original having been proposed for nominal variables rated by two observers).11
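A weighted kappa for ordinal grades might be computed along the following lines. This is only a sketch: the weighting scheme is not stated in the text, so linear weights are assumed here (quadratic weights are offered as an alternative), the function name `weighted_kappa` is hypothetical, and the ratings are invented for illustration rather than taken from the study.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale 0..n_categories-1."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of cases")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    n = len(ratings_a)
    k = n_categories

    # Joint distribution of the two raters' grades, as proportions of cases.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n

    # Marginal distribution of each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant grades.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_penalty = sum(
        penalty(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_penalty = sum(
        penalty(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_penalty == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_penalty / expected_penalty


# Hypothetical grades (0-3) for eight cases; not data from the study.
group_1 = [0, 1, 2, 3, 0, 1, 2, 3]
near_miss = weighted_kappa(group_1, [0, 1, 2, 3, 1, 0, 3, 2], n_categories=4)
far_miss = weighted_kappa(group_1, [0, 1, 2, 3, 3, 2, 1, 0], n_categories=4)
```

In this illustration both second raters disagree with the first on four of eight cases, but disagreements by one grade are penalized less than disagreements by two or three grades, which is the reason for preferring a weighted statistic on ordinal scales.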
For each characteristic considered, a 2×2 diagnostic table was built using dichotomous categories, and specific κ values were calculated.
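The calculation of κ from a 2×2 table can be sketched as follows. The function name `kappa_from_2x2` and the cell counts are hypothetical and serve only to illustrate the arithmetic for a dichotomous feature (for example, presence vs absence); they are not study data.

```python
def kappa_from_2x2(both_present, only_a, only_b, both_absent):
    """Cohen's kappa for two raters scoring one dichotomous feature."""
    n = both_present + only_a + only_b + both_absent
    if n == 0:
        raise ValueError("the table contains no cases")

    # Observed agreement: cases on the diagonal of the 2x2 table.
    p_observed = (both_present + both_absent) / n

    # Chance agreement from each rater's marginal proportion of 'present'.
    a_present = (both_present + only_a) / n
    b_present = (both_present + only_b) / n
    p_expected = a_present * b_present + (1 - a_present) * (1 - b_present)

    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts for 55 cases; not data from the study.
example_kappa = kappa_from_2x2(both_present=10, only_a=3, only_b=2, both_absent=40)
```

Cases recorded as non-applicable would have to be left out of the table before the counts are entered, since κ is defined only for cases scored by both groups.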
Results
The first 60 consecutive cases diagnosed in the IVO Dermatology service from January 2004 through December 2004 were selected.
Five cases were excluded: four because they could not be interpreted due to their small size and one because of duplication (two sections of the same case). Therefore, 55 valid cases were ultimately used to establish kappa values.
The study population included 55 patients, 28 (50.9%) men and 27 (49.1%) women, with a mean age at diagnosis of 58.9 years (range, 23–82 years). Data on the location of the primary tumor, the histological type and the tumor stage are shown in Table 1 of the supplementary data.
General concordance was moderate (median kappa value, 0.5). The highest values were for presence of polypoid shape (0.8), pigmentation (0.7), presence of ulceration (0.7), elastosis graded as high vs low cumulative sun damage (high CSD/low CSD; 0.7), and degree of elastosis (0.7).
The most discordant values were the presence of pleomorphism (0.2) and necrosis (0.3).
Table 1 shows the kappa values from different studies, specifically our results and those from the studies conducted by Viros et al.,2 and Broekaert et al.12
Table 1. Kappa values of the present and former studies.
Feature | Viros et al.2 | Broekaert et al.12 | Our study |
---|---|---|---|
Spread | 0.7 | 0.7 | 0.5 |
Nesting | 0.6 | 0.4 | 0.5 |
Pigmentation | 0.7 | 0.7 | 0.7 |
Lateral circumscription | 0.4 | 0.4 | 0.3 |
Epidermal contour | 0.5 | 0.4 | 0.4 |
Solar elastosis (4 grades) | 0.8 | 0.6 | 0.7 |
Pattern of growth | N/A | N/A | 0.4 |
Elastosis (high CSD/low CSD) | N/A | N/A | 0.7 |
Ulceration | N/A | N/A | 0.7 |
Type of ulceration | N/A | N/A | 0.4 |
Polypoid shape | N/A | N/A | 0.8 |
Regression | N/A | N/A | 0.5 |
Necrosis | N/A | N/A | 0.3 |
Pleomorphism | N/A | N/A | 0.2 |
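The summary figures quoted for our study can be checked directly against the "Our study" column of Table 1. The short sketch below does so with the Python standard library; the variable names are chosen for illustration only.

```python
from statistics import mean, median

# Kappa values transcribed from the "Our study" column of Table 1.
our_study_kappa = {
    "Spread": 0.5,
    "Nesting": 0.5,
    "Pigmentation": 0.7,
    "Lateral circumscription": 0.3,
    "Epidermal contour": 0.4,
    "Solar elastosis (4 grades)": 0.7,
    "Pattern of growth": 0.4,
    "Elastosis (high CSD/low CSD)": 0.7,
    "Ulceration": 0.7,
    "Type of ulceration": 0.4,
    "Polypoid shape": 0.8,
    "Regression": 0.5,
    "Necrosis": 0.3,
    "Pleomorphism": 0.2,
}

values = list(our_study_kappa.values())
median_kappa = median(values)
mean_kappa = mean(values)

# Features with the highest and lowest agreement.
most_reproducible = max(our_study_kappa, key=our_study_kappa.get)
least_reproducible = min(our_study_kappa, key=our_study_kappa.get)
```

Both the median and the rounded mean of the 14 values come to 0.5, and the extremes correspond to polypoid shape and pleomorphism.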
Finally, we conducted an adjustment of the variables pagetoid spread and formation of intraepidermal nests and assessed the interobserver agreement. Specifically, we reduced the number of categories from 4 down to 2.
The κ value for pagetoid spread was 0.7 and the κ value for nest formation, 0.4.
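The regrouping step can be sketched as follows: four ordinal grades are collapsed into two, and κ is recomputed on the dichotomized ratings. The grades, the cut-off and the helper names `cohen_kappa` and `collapse` are hypothetical and chosen only to illustrate the procedure; they do not reproduce the study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of cases")
    n = len(ratings_a)
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the raters' marginal frequencies per category.
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1 - p_expected)


def collapse(ratings, cut):
    """Map ordinal grades to 0 (below the cut) or 1 (at or above the cut)."""
    return [int(r >= cut) for r in ratings]


# Hypothetical four-grade ratings (0-3) for ten cases; not data from the study.
group_1 = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
group_2 = [0, 1, 1, 0, 2, 3, 3, 2, 0, 1]

kappa_four = cohen_kappa(group_1, group_2)
kappa_two = cohen_kappa(collapse(group_1, cut=2), collapse(group_2, cut=2))
```

In this invented example every disagreement lies between neighboring grades on the same side of the cut-off, so dichotomizing removes it. That outcome is not guaranteed: when disagreements straddle the cut-off, or when one of the merged categories becomes rare, κ can stay the same or fall, as the value obtained for nest formation shows.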
Discussion
In this study, we examined the reproducibility, or interobserver agreement, of several characteristics of 60 cases of malignant melanoma, some of them with prognostic implications. The overall concordance was moderate. The highest kappa values were for polypoid shape and solar elastosis (high CSD/low CSD). The lowest values were for necrosis and pleomorphism.
Melanoma is a heterogeneous tumor as it shows different clinical and histopathological characteristics, sometimes making diagnosis challenging.
Former studies have evaluated the interobserver reproducibility of diagnostic criteria in melanoma and other melanocytic lesions with heterogeneous results.2,4,8,9,13–16
These studies typically circulate histological sections of cutaneous melanomas or other melanocytic lesions among different combinations of pathologists and dermatopathologists, who categorize multiple histopathological variables.
The highest concordance rates were achieved for Breslow thickness, the most important variable in terms of prognostic value. This is to be expected, as it is a quantitative variable.4,6,9,14,17–20
In contrast, some other variables with prognostic importance, such as Clark level, show low or moderate reproducibility.9,14,19,20
Our study is based on several variables that were defined in the study by Viros et al.2 That study reported better kappa values than ours for most of the variables tested2 (see Table 1).
Another study undertaken by the same group (Broekaert et al.)12 presented similar kappa values, except for nesting, epidermal contour and circumscription that were lower than the values obtained in the study conducted by Viros et al.2 (Table 1).
One explanation for some of the discordances reported may be that melanoma can be a very large lesion and show overt heterogeneity per se. Therefore, when considering nest formation or pagetoid spread, an area with a high number of nests or marked intraepidermal spread can lie next to another where these are totally absent, which makes it difficult to assign a single value that quantifies the finding. Additionally, some features can be detected only in some glass sections.
Regarding lateral circumscription and epidermal contour, we have detected that often, melanoma can show different transitions from the intraepidermal growth portion of the tumor to normal skin from one to the other side of the glass section.
It is remarkable that solar elastosis was more reproducible than other variables when assessed in two categories (high CSD/low CSD), a simpler but valid way of classifying elastosis, first defined by Landi et al.,21 and considered by the World Health Organization (WHO) as a major characteristic that categorizes different types of melanoma.22
Kappa concordance for ulceration was substantial, as former studies have stated.6,9,17,19,20 Assessment of ulceration may be possible only in some histological sections of a melanoma; for instance, ulceration may affect only a small portion of the tumor and go unnoticed in some glass sections.
When there is a focal loss of epidermis, it may be problematic to establish ulceration, as it can be an actual ulceration or a sectioning artifact. Unless there is evidence of a dermal scar or a previous biopsy, it can be troublesome to distinguish between traumatic and non-traumatic ulceration.9
Polypoid shape was highly reproducible in our experience as it is easy to assess.
In our study, the evaluation of regression showed moderate concordance; heterogeneous rates have been reported in former studies. Most studies that assessed several features showed low reproducibility,6,19,20 while others showed higher rates.23 Kang et al. showed better reproducibility for regression, although their study focused on regression alone. A study devoted to a single feature can be expected to reach higher concordance than studies assessing several variables, especially if there is previous training. The categorization of regression and its criteria have changed over time, which may contribute to the variable concordance. In fact, some studies consider early, intermediate or late regression,23 while others consider only two grades (presence or slight/absence).19,20
Literature is scarce on the interobserver reproducibility regarding necrosis and pleomorphism.
Pleomorphism is a very subjective variable, and melanoma cells are generally pleomorphic, particularly in some areas of the tumor. This feature can therefore be difficult to establish.
Regarding the assessment of necrosis, in the study by Urso et al.,13 interobserver concordance was established for 55 cases of melanoma, and necrosis was found in only one case. Because it is an uncommon feature, necrosis may be present only in a small area and go unnoticed.
Finally, subjectivity is an important factor: it has been confirmed that there is even intraobserver discordance for some observations.
Consistent with this, Elmore et al. showed intraobserver discordance (and, logically, even lower interobserver concordance) when categorizing atypia in melanocytic lesions, especially for lesions in categories other than the two extremes.8
We propose that, if possible, variables should be redefined and regrouped into fewer categories. For example, pagetoid spread could be categorized into two rather than four groups: 0, absent or minor pagetoid spread (<25% of the cells), and 1, overt pagetoid spread (>25% of the cells). Similarly, nest formation could be redefined with only two categories: 0, no or minimal nest formation (<25% of cells in intraepidermal nests), and 1, marked nest formation (>25% of cells in intraepidermal nests). Shape of cells, epidermal contour and pigmentation could also be simplified into fewer categories.
In contrast, there are some variables with low concordance that cannot be adjusted, such as lateral circumscription. Therefore, we will probably not use this variable as currently defined in the future.
Any adjustment of the categories of a variable, such as regrouping, should be tested again for reproducibility.
In our experience, after regrouping pagetoid spread and nest formation and reassessing interobserver concordance, only pagetoid spread improved. This illustrates that regrouping the categories of a variable can sometimes improve its reproducibility or, at least, maintain it, provided that its relevance remains intact.
It may still be early, but new technologies and artificial intelligence (AI) could mark a turning point in the histological categorization of melanocytic lesions. For this purpose, it is necessary to develop useful algorithms and train them until their diagnostic sensitivity approaches that of an experienced pathologist.24
As a matter of fact, a three-dimensional histology computer model of malignant melanoma has been tested with promising results. It evaluated different tissue levels while avoiding the problem of some features being present in a limited subset of slides. Although these are promising technologies in the field of histopathological diagnosis, there are still some limitations that should be addressed.25
In conclusion, our study showed median moderate reproducibility of several histopathological features of melanoma, meaning that, if possible, some variables should be redefined and evaluated for interobserver agreement for future research.
Conflict of interest
The authors declare that they have no conflict of interest.