Interrater reliability for the detection of cortical lesions on phase-sensitive inversion recovery magnetic resonance imaging in patients with multiple sclerosis

Objective To assess the reliability of phase-sensitive inversion recovery (PSIR) magnetic resonance imaging (MRI) and its accuracy for determining the topography of demyelinating cortical lesions in patients with multiple sclerosis (MS). Materials and Methods This was a cross-sectional study conducted at a tertiary referral center for MS and other demyelinating disorders. We assessed the agreement among three raters for the detection and topographic classification of cortical lesions on fluid-attenuated inversion recovery (FLAIR) and PSIR sequences in patients with MS. Results We recruited 71 patients with MS. The PSIR sequences detected 50% more lesions than did the FLAIR sequences. For detecting cortical lesions, the level of interrater agreement was satisfactory, with a mean free-response kappa (κFR) coefficient of 0.60, whereas the mean κFR for the topographic reclassification of the lesions was 0.57. On PSIR sequences, the raters reclassified 366 lesions (20% of the lesions detected on FLAIR sequences), with excellent interrater agreement. There was a significant correlation between the total number of lesions detected on PSIR sequences and the Expanded Disability Status Scale score (ρ = 0.35; p < 0.001). Conclusion It seems that PSIR sequences perform better than do FLAIR sequences, with clinically satisfactory interrater agreement, for the detection and topographic classification of cortical lesions. In our sample of patients with MS, the PSIR MRI findings were significantly associated with the disability status, which could influence decisions regarding the treatment of such patients.


INTRODUCTION
Multiple sclerosis (MS) is a chronic, inflammatory, demyelinating degenerative disease of the central nervous system (CNS) and has a markedly heterogeneous clinical cross the blood-brain barrier and induce CNS damage, leading to the formation of demyelinating lesions (1)(2)(3)(4) .
Historically, MS was considered as a disease of the white matter.However, various magnetic resonance imaging (MRI) studies of patients with MS have consistently shown that demyelinating lesions are common in the gray matter of cortical and deep brain structures of such patients and can be extensive (5)(6)(7)(8)(9) .Accurate assessment of gray matter damage has contributed significantly to the interpretation of the association that cortical lesions have with clinical and cognitive outcomes in MS (8,9) .Depending on their location, cortical lesions can be classified as leukocortical (type I), intracortical (type II), or subpial (subdivided into types III and IV).These diverse topographies suggest that the onset and growth of cortical lesions occur through various mechanisms, and their distinct locations probably account for the heterogeneous course of MS (3,4,9,10) .Type I cortical lesions can be confused with juxtacortical lesions, which are quite common, being found in 50% of patients with MS in autopsy studies (4,11) .
For the diagnosis and follow-up of patients with MS, it is essential to perform MRI examination of the CNS.That examination can provide information regarding the activity and progression of the disease, which are used as criteria for treatment failure, as well as being useful in screening for complications of MS treatment (12,13) .The standard MRI protocol for MS includes volumetric fluidattenuated inversion recovery (FLAIR), T1-weighted, gadolinium contrast-enhanced T1-weighted, and T2-weighted sequences.Those conventional sequences present low sensitivity for the detection of cortical lesions, even if the images are acquired in high-power scanners.In contrast, non-conventional pulse sequences, such as those based on phase-sensitive inversion recovery (PSIR) or double inversion recovery (DIR), substantially improve the sensitivity for the detection of cortical lesions (7,8) .
A PSIR sequence is T1-weighted and presents a marked contrast between white and gray matter.It has a higher signal-to-noise ratio than does a DIR sequence, with faster acquisition times, resulting in images with higher resolution.On a PSIR sequence, the distinct boundary between the cortical strip and adjacent white matter provides a more accurate differentiation between cortical lesions, juxtacortical lesions, and white matter lesions (14)(15)(16) .Lesions identified as fully localized in gray matter on other sequences, such as FLAIR and DIR sequences, might in fact be juxtacortical lesions or be totally located in the white matter.The intrinsically low signal-to-noise ratio of DIR sequences, which are widely used in the detection of cortical lesions, does not allow increases in resolution beyond that achieved during the long periods required for the acquisition of clinically acceptable scans.In contrast, PSIR sequences can provide higher image resolution within shorter, clinically feasible periods, the typical scan time being approximately 11 min (15,16) .Some studies have demonstrated that PSIR sequences have higher accuracy than do FLAIR sequences and are 3-4 times more sensitive than are DIR sequences for the detection of cortical lesions (12,16,17) .In addition, the PSIR findings have been shown to correlate significantly with physical and cognitive function in patients with MS; therefore, given the clinical relevance of cortical damage in MS, the PSIR findings could have important diagnostic and prognostic implications (11,18) .
The most recent revision of the diagnostic criteria for MS incorporated the presence of cortical lesions as a discriminating criterion for the spatial dissemination of the disease (19) .This new parameter made it crucial to assess the accuracy of recently developed MRI sequences used in order to detect these lesions.Therefore, we assessed the performance of PSIR sequences and their accuracy for determining the topography of demyelinating cortical lesions in patients with MS.

MATERIALS AND METHODS
This was a cross-sectional study of consecutive patients with MS who underwent MRI of the CNS within a 12-month period, based on the recommendations put forth in the Standards for Reporting of Diagnostic Accuracy Studies (20) and in the Guidelines for Reporting Reliability and Agreement Studies (21) .The study was approved by the local research ethics committee, and all participating patients gave written informed consent.
The MRI sequences were acquired in a 3.0-T scanner (Spectra; Siemens Medical Systems, Erlangen, Germany), with a 32-channel head/neck coil.The image acquisition protocol consisted of the following: unenhanced T1-weighted turbo spin-echo, 3D T1-weighted gradientecho, and 3D FLAIR sampling perfection with applicationoptimized contrasts using different flip angle evolution sequences, all in the sagittal plane; T2-weighted turbo spin-echo and diffusion-weighted echo-planar imaging sequences, both in the axial plane; intravenous infusion of gadolinium-based contrast (Gadovist; Bayer Schering Pharma AG, Berlin, Germany), with dosing according to patient body weight; and acquisition of contrast-enhanced axial susceptibility-weighted imaging and sagittal 3D T1weighted gradient-echo sequences, together with axial T1weighted fast spin-echo magnetization transfer contrast and axial PSIR sequences.The main sequences of interest were 3D FLAIR and PSIR.The following parameters were used for 3D FLAIR: field of view, 260 mm; repetition time/ echo time, 5,000/395 ms; matrix, 256 × 256; voxel size, 0.5 × 0.4 × 1.0 mm; and acquisition time, 5 min 15 s.The parameters used for PSIR were field of view, 220 mm; repetition time/echo time, 2,600/10 ms; matrix, 195 × 320; voxel size, 0.4 × 0.4 × 3.0 mm; and acquisition time, 3 min 47 s.Cortical lesions were identified on the basis of recommendations in the literature (12,15,17) .Two neuroradiology interns, each with one year of experience, designated rater 1 and rater 2, respectively, evaluated images obtained from the picture archiving and communication system, and their findings were compared with those of a senior rater with 14 years of experience (six dedicated to patients with MS), designated rater 3, who evaluated the same images.
The Expanded Disability Status Scale (EDSS) score, duration of disease, MS phenotype, sex, and age were extracted from patient electronic medical records.All of the raters were blinded to the patient clinical data and to the findings of their peers.Each rater identified and counted the juxtacortical and cortical lesions in the supratentorial and infratentorial regions, first on the FLAIR sequences and then on the PSIR sequences, which were considered the standard sequences to define the topographic classification of the cortical lesions.

Statistical analysis
The findings of each rater are presented as the means and standard deviations (SDs) of the lesion counts.After applying the Shapiro-Wilk test to verify the distribution of the variables, we estimated the correlation coefficients for the associations that the numbers of juxtacortical and cortical lesions had with the EDSS score, duration of disease, and patient age.
To calculate the agreement between the findings of the two inexperienced raters and those of rater 3, for the presence of lesions, their localization, and topographic classification, we used the free-response kappa (κ FR ) statistic.That was chosen because the traditional assessment of interrater agreement can be misleading in situations in which only positive findings are computed, such as in imaging examinations.In such examinations, the lack of a specific number for negative observations (absence of lesions) limits the use of the classical kappa statistic.In many cases, the images must cover extensive areas of bulky organs.In addition, multiple abnormalities, such as MS plaques, make the measurement of agreement impractical or biased by a free-response paradigm, leading to underestimation of kappa coefficients (22) .With the κ FR statistic, the agreement does not depend on unknown/negative data and can be estimated from positive results alone, which is particularly useful in imaging studies (22) .The classification of the κ FR follows the standard Fleiss ratings employed for the kappa coefficient (23) , categorizing agreement as poor (< 0.40), satisfactory (0.40 < κ FR < 0.75), or excellent (> 0.75).Intraclass correlation coefficients (ICCs), with twoway random effects for agreement between raters, which is appropriate for examiners with different levels of experience (24,25) , were calculated for the total lesion counts on FLAIR and PSIR sequences, as well as for the reclassification of cortical lesions (defined as any change in their topographical classification on PSIR sequences).
The concordance between FLAIR and PSIR images for the same rater (intrarater agreement) was assessed with a Bland-Altman plot, which represents the repeatability between different methods, analyzing the averages of the differences in the number of lesions found by one rater.To be clinically acceptable, 95% of the differences between the methods must remain within a range of the mean of the differences ± 1.96 SD, which would indicate the potential for interchangeability between the methods (26) .Therefore, the results are expressed as means and SDs, with 95% confidence intervals (CIs).To exclude a proportion bias, we performed regression analysis of the differences between the methods.On the basis of data in the literature, we established a priori a limit of seven lesions as a clinically relevant and acceptable difference between the sequences (27)(28)(29) .The sample size was calculated in order to provide clinically satisfactory coefficients of agreement, at a significance level of p < 0.05 and a power of 0.80.For the correlation analysis, Spearman's rho (ρ) was calculated.The statistical analysis was performed using the Stata software, version 14.0 (Stata Corp LP, College Station, TX, USA).

RESULTS
The sample comprised 71 patients, all with the relapsing-remitting MS phenotype.The mean age was 47.3 ± 11.7 years.Among the 71 patients evaluated, 47 (62%) were female, the mean score on the EDSS was 2.4 ± 1.8, and the mean duration of disease was 12.9 ± 7.1 years.The raters detected 1,796 lesions on the FLAIR sequences (mean of 8.4 lesions/patient/rater), all of which were also visualized on the PSIR sequences (Figure 1).On the PSIR sequences, the raters detected 870 lesions (mean of 4.0 lesions/patient/rater) that were not visualized on the FLAIR sequences (Table 1), translating to approximately 50% additional lesions detected (Figure 2).Most (84%) of the lesions were in the supratentorial compartment.There were 28 patients (39.4%) in whom no cortical lesions were detected on the FLAIR sequences.In 12 of those patients, the raters found cortical lesions on PSIR images that were not apparent on FLAIR images.Therefore, the adjusted number of lesion-free individuals was 16 (22.5% of the sample).There was no significant difference between the patients without cortical lesions on FLAIR images and those without cortical lesions on PSIR images, in terms of mean age (46 ± 11.9 years vs. 43.6 ± 11.8 years; p = 0.47) or the mean of duration of disease (12.5 ± 6.8 years vs. 11.0 ± 6.3 years; p = 0.68).However, the mean EDSS score was 40% lower in the latter group (1.5 ± 0.9 vs. 2.5 ± 1.8).
The PSIR and FLAIR sequences both showed satisfactory interrater agreement in distinguishing between patients with and without lesions (mean agreement, 80.0%; κ FR , 0.58-0.62).For the reclassification of lesions, the mean agreement was 73.2%, with an κ FR ranging from 0.56 to 0.58 (Table 2).The raters reclassified 366 lesions (1.7 lesions/patient/rater), corresponding to 20% of the lesions detected on the FLAIR sequences.The most common reclassification was from juxtacortical lesions to type I cortical lesions (in 81.5%), followed by juxtacortical lesions to agreement for the total lesion counts (Table 3).Spearman's ρ showed a significant correlation between the total number of lesions detected on PSIR sequences and the EDSS score (ρ = 0.35; p < 0.001).However, the total lesion count was not found to correlate significantly with any of the other variables assessed.
As can be seen in the Bland-Altman plot (Figure 3), the mean intrarater difference between the FLAIR and PSIR sequence counts was 0.23 ± 3.09 lesions (inverse logit-transformed 95% CI: 0.19-0.27),approximately 97% of the differences remaining within the limits of the preestablished range for the expected agreement (Figure 3).The regression analysis ruled out a systematic variation in the means of differences (p = 0.27), thus excluding a proportion bias in the differences between the FLAIR and PSIR sequences.type II cortical lesions (in 11.5%), juxtacortical lesions to white matter lesions (in 4.5%), and white matter lesions to juxtacortical lesions (in 2.5%).
The ICC demonstrated excellent interrater agreement for the number of lesions reclassified and satisfactory

DISCUSSION
To our knowledge, this is the fi rst study to assess the reliability of PSIR MRI among raters with markedly different levels of experience, which rules out the infl uence of training effects on the agreement coeffi cients (21,23,24) .We found clinically satisfactory levels of intrarater and interrater agreement for PSIR sequences, as well as showing that PSIR MRI had better sensitivity for the detection of cortical and juxtacortical lesions (50% better than FLAIR MRI), with no loss of reliability, and for the topographic reclassifi cation of lesions.
The heterogeneity of the course of MS seems to be multifactorial, and it is likely that the topographic and volumetric characteristics of the demyelinating lesions contribute to that heterogeneity (30)(31)(32)(33) .Therefore, MRI sequences that are more sensitive can help explain the variability in the course of the disease (31)(32)(33) .Despite having incorporated the presence of cortical lesions as a discriminating criterion for the spatial dissemination of MS, an expert panel concluded that conventional MRI sequences are incapable of accurately detecting and localizing cortical lesions and recommended the use of advanced sequences (19) .Likewise, the current consensus guidelines issued by the Magnetic Resonance Imaging in MS (MAGNIMS) group suggest considering the number and location of cortical lesions as a marker of MS progression, as long as the rater has a high level of expertise in MRI analysis (32) .Therefore, the accurate detection of cortical lesions has gained greater relevance, given its impact on the diagnosis and prognosis of MS.Consequently, assessing the reliability of advanced MRI sequences has become essential, and the present study meets that demand.
In our study, the raters detected higher numbers of lesions on PSIR images than on FLAIR images, with satisfactory interrater agreement for the detection and topographic reclassifi cation of the lesions.In our study, the interrater ICC was moderate for the total lesion count on PSIR sequences, which is in contrast with the excellent agreement reported in some other studies (16,17) .However, in those studies, the raters had more experience than did raters 1 and 2 in our study.Overall, the higher agreement coeffi cients in previous studies suggest that training improves the rate of cortical lesion detection.However, our results indicate that a lack of training does not reduce the clinical effi cacy of PSIR sequences.
The ICC showed excellent interrater agreement for the total number of lesions reclassifi ed, whereas the κ FR statistic showed only moderate interrater agreement for lesion reclassifi cation.One possible explanation for this discrepancy is the difference in the size of the observed lesions.In the case of small lesions, the exact topographic localization of cortical lesions can be a challenge for less experienced examiners.However, the size of the lesions does not seem so relevant for their counting.That might have also accounted for the fact that the ICC for the lesion counts was higher on FLAIR sequences than on PSIR  Because larger lesions are easier to visualize, they are easier to count, leading to higher agreement coefficients.The fact that the FLAIR and PSIR sequences presented good intrarater agreement and repeatability on the Bland-Altman plot, with mean differences close to zero, indicates a non-significant systematic bias between the methods.The CIs and SDs of the mean differences in agreement remained at values without clinical significance and within the parameters established a priori.The intrarater mean of differences ≤ 3.3 lesions (upper limit of agreement added to the SD, after inverse logit transformation) suggested the potential for interchangeability between the sequences.
Cortical lesions constitute a common finding in MS, and the frequency of such lesions in the present study was comparable to those reported in studies involving patients with MS phenotypes similar to those identified in our sample (3,(34)(35)(36) .In our study, 77.5% of the patients presented cortical lesions, which is higher than the proportions reported in studies that included patients in the early stages of relapsing-remitting MS or clinically isolated syndrome, all of which were around 20% (36,37) .In contrast, studies including patients with progressive forms of MS showed higher proportions of patients with cortical lesions, ranging from 82% to 100% (8,13,15,16) .These findings suggest that the MS stage, which is directly associated with the EDSS score, as well as being indirectly associated with the duration of disease and with patient age, correlates with an increased risk of cortical lesions.We detected a significant correlation between the number of lesions detected on PSIR sequences and the EDSS scores, which is in line with data in the literature.A previous study, conducted by Nielsen et al. (9) , which used advanced 7.0-T MRI, found a significant association between cortical lesions and the EDSS score.A more recent study, conducted by Magliozzi et al. (31) , suggested that cortical pathology has a robust correlation with physical and cognitive disability in MS.These data should alert neuroradiologists to be aware of the clinical and demographic features of individual patients and to look for red flags during an MRI analysis.
In various studies, the presence of cortical lesions has been shown to be a robust independent predictor of the MS disability progression (8,9,17,31,33,34,38,39) .In the present study, the detection of cortical lesions on PSIR sequences (lesions that were not detected on FLAIR sequences) decreased the number of lesion-free patients by 43%.This reclassification lead to a relevant 40% reduction in the mean EDSS score when compared with the lesion-free group as determined on the FLAIR sequences.The reclassification from negative to positive for cortical lesions can have a significant influence on the prognosis of MS, given that the presence of cortical lesions has been correlated with pronounced disease activity and predisposition to progress to the advanced stages of the disease (40) .Our results indicate that the use of PSIR MRI is crucial for identifying patients with false-negative results for cortical lesions.Notably, 20% of the lesions detected on FLAIR sequences were topographically reclassified on PSIR sequences, and > 90% of the reclassifications were from juxtacortical to cortical.Juxtacortical lesions can occur in natural aging and may or may not be associated with progression of MS (8,9,11,13,40) .Therefore, if the presence of cortical lesions is considered a criterion for MS activity, their reclassification can influence decisions regarding treatment.One concern of the MAGNIMS group is the need for a higher level of expertise for the detection of cortical lesions in MRI analysis.The clinically satisfactory agreement observed in the present study, even among raters with limited experience, suggests that MRI protocols including PSIR sequences have an advantage over other MRI protocols, such as those including DIR sequences (16)(17)(18) .Although the MAGNIMS group recommended using the presence of cortical lesions as a marker of MS progression, they emphasized the low sensitivity of traditional MRI sequences for the detection of such lesions.However, our findings raise a question about using the emergence of cortical lesions as a definition of MS progression.One concern is that standardized advanced MRI sequences for the accurate detection of cortical lesions are not always available in some clinical settings.
Our study has some limitations.First, the fact that DIR sequences were not included in the local MRI protocol impeded the comparison with some data in the literature.In addition, we did not assess the cognitive status of the patients, which correlates significantly with cortical lesions in MS.However, we considered that evaluation to be beyond the scope of our study, the main objective of which was to assess the reliability of PSIR MRI, although such evaluation could be valuable in future studies of the validity of the PSIR protocol.Finally, the differences between PSIR sequences without volumetric acquisition and volumetric FLAIR sequences, in terms of spatial resolution and slice thickness (3.0 mm and 1.0 mm, respectively) should be taken into consideration.Those differences might even have influenced the detection and reclassification of some lesions, possibly leading to lower agreement coefficients, in our patient sample.

CONCLUSION
Cortical and juxtacortical lesions have taken on greater relevance after being incorporated into the diagnostic criteria for MS.New, advanced MRI sequences, such as PSIR, with a higher sensitivity to detect such lesions, have been gaining importance in investigative protocols, which creates the need to assess their accuracy in the clinical context.Our results indicate that PSIR sequences perform better in the detection and topographic localization of cortical lesions than do conventional MRI sequences, such as FLAIR sequences.In addition, our data show that the detection of cortical lesions by PSIR MRI affects the EDSS scores, which could influence decisions regarding the treatment MS.Future prospective studies aimed at determining whether and to what extent the findings on PSIR MRI correlate with physical and cognitive function over time in patients with MS could consolidate support for its incorporation into the routine MRI protocol for such patients.

9 *
One year of experience.† 14 years of experience.‡ Counts per patient.

Figure 3 .
Figure3.Intrarater differences and means of those differences.ULoA, upper limit of agreement; M, mean; LLoA, lower limit of agreement; Diff, difference.

Table 1 -
Frequency of cortical and juxtacortical lesions in patients with MS.

Table 3 -
Interrater reliability for the total counts of cortical and juxtacortical lesions in patients with MS.