Dynamic and static ultrasound features predictive of vesicoureteral reflux and renal damage in children and adolescents with neurogenic bladder

ABSTRACT Purpose: This study aimed to analyze the diagnostic accuracy of dynamic and static ultrasound (DSUS) in detecting vesicoureteral reflux (VUR) and renal scarring in a cohort of children with neurogenic bladder (NB). Materials and Methods: A retrospective, longitudinal, observational study was conducted following the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guideline. DSUS data (index test) were compared with voiding cystourethrography (VCUG) and 99mTc-dimercaptosuccinic acid (DMSA) renal scintigraphy (reference tests). Overall performance for predicting VUR was assessed using renal pelvic diameter (RPD) and distal ureteral diameter on DSUS, and for predicting renal scarring using renal parenchymal thinning on DSUS. Results: A total of 107 patients (66 girls; median age, 9.6 years) were included. Seventeen patients (15.9%) presented VUR, eight of them bilaterally. For VUR of any grade, the area under the receiver operating characteristic curve (AUC) was 0.624 for RPD and 0.630 for distal ureteral diameter. DSUS parameters performed slightly better for detecting high-grade VUR, with an AUC of 0.666 for RPD and 0.691 for distal ureteral diameter. Cut-offs of 5 mm for RPD and 6.5 mm for distal ureteral diameter yielded the best diagnostic odds ratio (DOR) for identifying high-grade VUR. An increase in RPD during detrusor contractions showed an accuracy of 89.2%. Renal parenchymal thinning showed an accuracy of 88% for renal scarring. Conclusion: DSUS predicts VUR and renal scarring in children with NB with fair to good accuracy, and all measurements exhibited a high negative predictive value (NPV). An increase in RPD during voiding or detrusor contractions was the most accurate parameter for indicating the presence of VUR in this study.


INTRODUCTION
The most common cause of neurogenic bladder (NB) in children is neurospinal dysraphism (1)(2)(3). NB is present in up to 98% of children with myelomeningocele (4). About 25% of the most severe symptoms in pediatric urology are associated with NB (5), and 40% of children with NB develop chronic kidney disease (6). Patients with NB may present with various patterns of detrusor-sphincter dyssynergia and increased intravesical pressure, which can lead to urinary and/or fecal incontinence, urinary tract infections (UTIs), vesicoureteral reflux (VUR), and renal impairment (1,3,7). The diagnosis and follow-up of patients with NB involve a multidisciplinary approach, including serial clinical, laboratory, and imaging tests. The goals of managing bladder dysfunction in children are to maintain a low-pressure, high-compliance bladder and to prevent upper urinary tract deterioration (8).
VUR, an important risk factor for pyelonephritis and renal scarring (1,7,9,10), is present in up to one-third of children with NB (8), making its diagnosis and management essential (3,8). Voiding cystourethrography (VCUG) and renal scintigraphy are the gold standard tests for diagnosing VUR and renal scarring, respectively (9,11,12). The role of renal and bladder ultrasound (US) as a screening tool for VUR and kidney damage in children with NB has been debated (13). The limited accuracy of US for detecting VUR or renal scarring may hinder its use in NB, given the need to prevent irreversible renal damage (3,9). However, the development of the dynamic and static ultrasound (DSUS) technique made it possible to obtain essential data for the diagnosis and follow-up of patients with NB (14). We hypothesized that the magnitude of specific DSUS measurements could predict the presence of VUR and renal scars. Therefore, this study aimed to analyze the diagnostic accuracy of DSUS in detecting VUR and kidney damage in our cohort of children and adolescents with NB.

Ethical approval
The study was approved by the Institutional Review Board (IRB) under protocol CAAE 37450820000005149, position statement number 4.487.114. Legal guardians signed the Informed Consent Term, and participants aged 10 to 17 years signed the Assent Term. Medical records were retrieved through an active search in the Medical Service and Archive after the institution's consent and signature of the Data Use Commitment Term.

Study design and patients
This retrospective cohort study included 127 consecutive patients enrolled in the Multidisciplinary Outpatient Clinic for Children and Adolescents with NB. We designed the study and reported our findings following the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guideline, presented in Supplement 1 (15). Eligible patients were all those with NB enrolled in the service between 1997 and 2022 who underwent DSUS, VCUG, and renal scintigraphy (99mTc-DMSA) according to the care protocol. Twenty patients were excluded from the analysis: 15 because of missing information in the medical records and five who declined to participate in the study.

Study protocol
A systematic clinical protocol was applied to all patients with NB enrolled in the multidisciplinary outpatient clinic (1-3). On admission, we performed clinical, laboratory, and imaging investigations to diagnose NB status: DSUS, a urodynamic study, VCUG to assess VUR, and renal scintigraphy (99mTc-DMSA) to assess renal scarring. Our follow-up protocol included clinical examination and laboratory analysis at semiannual intervals, and DSUS annually or as clinically necessary.

Index test
DSUS was the index test (the test being evaluated) and was performed by the same trained examiner using a standard method (14). Examinations were performed annually using a Toshiba/Canon® Prima SLC ultrasound device, model Aplio 300 or 400, equipped with multifrequency convex (3.7 to 7.6 MHz), linear (8.0 to 12.0 MHz), and high-frequency electronic linear (13.0 to 18.0 MHz) transducers.
The index tests assessed were based on the first ultrasound after enrollment in the outpatient clinic. For patients with bilateral urinary tract alterations, index tests were recorded for each renal unit. The following sonographic indexes were determined as proposed by Dinkel et al. (16): renal pelvic diameter (RPD), defined as the greatest anteroposterior diameter of the renal pelvis acquired in a transverse plane on dorsal ultrasound images; distal ureter diameter; renal parenchyma thickness (RPT), measured in the transverse view for each kidney; bladder wall thickening; bladder capacity; presence of residual urine in the bladder (absent, mild, severe); bladder trabeculation; and renal scarring. Renal scarring on DSUS was assessed using the following criteria: proximity of sinus echoes to the cortical surface, loss of pyramids, irregular outline, and loss of definition of the capsular echo (17). In addition, we evaluated the increase in RPD during voiding or detrusor contractions on DSUS as an indirect indicator of the presence of VUR (14) (Figure-1).

Reference tests
The index and reference tests were performed by examiners who were unaware of the results of the other tests.
VCUG was the reference standard for VUR. It was requested at the beginning of follow-up, within a maximum interval of six months from the DSUS. VUR was graded according to the classification proposed by the Reflux Study Committee (18). In addition, according to reflux grade, we classified reflux as low-grade (I), mild to moderate-grade (II to III), and high-grade (IV to V) (19).
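The three-level grouping of reflux grades can be expressed as a small lookup, which is convenient when per-unit VCUG grades are tabulated for analysis. The sketch below is a hypothetical Python helper, not the study's software; encoding grades I to V as the integers 1 to 5 and a non-refluxing unit as 0 are assumptions made for illustration.

```python
# Hypothetical helper (not from the study): maps a VUR grade, encoded as
# 1-5 for grades I-V, to the three categories used in the study.
# Encoding a non-refluxing renal unit as 0 is an assumption of this sketch.
def vur_category(grade: int) -> str:
    if grade == 0:
        return "none"
    if grade == 1:
        return "low-grade"
    if grade in (2, 3):
        return "mild to moderate-grade"
    if grade in (4, 5):
        return "high-grade"
    raise ValueError(f"VUR grade must be an integer from 0 to 5, got {grade!r}")


def is_high_grade(grade: int) -> bool:
    """True for grades IV-V, the target condition of the high-grade analyses."""
    return vur_category(grade) == "high-grade"


if __name__ == "__main__":
    # Hypothetical per-unit grades, for illustration only.
    unit_grades = [0, 2, 4, 5, 3, 0]
    print([vur_category(g) for g in unit_grades])
    print(sum(is_high_grade(g) for g in unit_grades), "high-grade units")
```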
The reference test for renal scarring was renal scintigraphy (99mTc-DMSA), performed on admission (after the fourth month of life for infants) and later according to a clinical decision (episodes of recurrent UTIs/pyelonephritis) (1-3, 7).

Statistical Analysis
Continuous data were recorded as median and interquartile range (IQ, 25th to 75th percentile) and compared using the nonparametric Mann-Whitney test. Dichotomous variables were compared using the two-sided chi-square test.
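For readers who want to reproduce a comparison of dichotomous variables, a minimal sketch of the Pearson chi-square test for a 2x2 table follows, using only the Python standard library. It omits Yates' continuity correction; the manuscript does not state whether a correction was applied, and packages differ on this point (SciPy's `chi2_contingency`, for instance, applies it to 2x2 tables by default). The function name and the example counts are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test without continuity correction for the table

        [[a, b],
         [c, d]]

    Returns (statistic, two-sided p-value); a 2x2 table has 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all row and column totals must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)),
    # so no statistics library is needed.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: rows = group 1 / group 2, columns = event / no event.
    stat, p = chi_square_2x2(20, 10, 10, 20)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

When expected cell counts are small, an exact test is generally preferred over this large-sample approximation.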
The diagnostic accuracy of the index tests was assessed by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), likelihood ratios (LR), and diagnostic odds ratio (DOR). Receiver operating characteristic (ROC) curves were used to assess the overall diagnostic accuracy of the continuous indexes (RPD and distal ureteral diameter) in discriminating patients who presented the events of interest (VUR and renal scarring). The area under the curve (AUC) was interpreted as the probability that a randomly selected patient with the event of interest had a larger maximum diameter than a randomly selected patient without the event of interest.
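All of these accuracy measures derive from the 2x2 cross-tabulation of the index test against the reference standard, and the probabilistic interpretation of the AUC corresponds to the Mann-Whitney form of the area. A minimal Python sketch under those definitions follows; the class and function names and all numbers are hypothetical, and the sketch assumes that no cell of the table is empty.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test results cross-tabulated against the reference standard."""
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative

    def metrics(self) -> dict[str, float]:
        # Assumes every cell is non-zero; an empty cell makes the LRs or the
        # DOR undefined and would need a continuity correction.
        if min(self.tp, self.fp, self.fn, self.tn) <= 0:
            raise ValueError("all four cells must be positive in this sketch")
        sens = self.tp / (self.tp + self.fn)
        spec = self.tn / (self.tn + self.fp)
        return {
            "sensitivity": sens,
            "specificity": spec,
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "lr_positive": sens / (1 - spec),
            "lr_negative": (1 - sens) / spec,
            "dor": (self.tp * self.tn) / (self.fp * self.fn),
            "accuracy": (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn),
        }


def auc_pairwise(with_event: list[float], without_event: list[float]) -> float:
    """Probability that a randomly chosen patient with the event has a larger
    value than one without it, counting ties as one half. This equals the
    Mann-Whitney U statistic divided by (n_with * n_without)."""
    if not with_event or not without_event:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for x in with_event:
        for y in without_event:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(with_event) * len(without_event))


if __name__ == "__main__":
    # Hypothetical counts and diameters (mm), not study data.
    print(TwoByTwo(tp=8, fp=4, fn=2, tn=86).metrics())
    print(auc_pairwise([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
```

The pairwise loop is quadratic in the group sizes, which is negligible for a cohort of this size; rank-based formulas give the same value more efficiently for large samples.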
We also analyzed the combined results of the two continuous indexes (RPD and distal ureteral diameter) (20). Two rules were analyzed: the "OR" rule, in which the diagnosis was considered positive if either test was positive and negative only if both tests were negative; and the "AND" rule, in which the diagnosis was considered positive only if both tests were positive and negative if either test was negative.
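The two combination rules can be sketched as follows. The cut-offs are those reported in the Results for high-grade VUR (5 mm for RPD and 6.5 mm for distal ureteral diameter); treating a value at or above the cut-off as positive, the function names, and the example renal units are assumptions for illustration only.

```python
# Cut-offs reported in the Results for high-grade VUR. Treating a value at
# or above the cut-off as positive is an assumption of this sketch.
RPD_CUTOFF_MM = 5.0
URETER_CUTOFF_MM = 6.5

# Hypothetical renal units: (RPD mm, distal ureteral diameter mm, VUR on VCUG).
EXAMPLE_UNITS = [
    (7.0, 4.0, True),
    (3.0, 8.0, True),
    (6.0, 7.0, True),
    (2.0, 3.0, True),
    (6.0, 3.0, False),
    (2.0, 2.0, False),
    (3.0, 4.0, False),
    (5.5, 7.0, False),
]


def combined_positive(rpd_mm: float, ureter_mm: float, rule: str) -> bool:
    rpd_pos = rpd_mm >= RPD_CUTOFF_MM
    ureter_pos = ureter_mm >= URETER_CUTOFF_MM
    if rule == "OR":   # positive if either test is positive
        return rpd_pos or ureter_pos
    if rule == "AND":  # positive only if both tests are positive
        return rpd_pos and ureter_pos
    raise ValueError(f"unknown rule {rule!r}; use 'OR' or 'AND'")


def sensitivity_specificity(units, rule: str) -> tuple[float, float]:
    tp = fn = fp = tn = 0
    for rpd_mm, ureter_mm, has_vur in units:
        predicted = combined_positive(rpd_mm, ureter_mm, rule)
        if has_vur and predicted:
            tp += 1
        elif has_vur:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    for rule in ("OR", "AND"):
        print(rule, sensitivity_specificity(EXAMPLE_UNITS, rule))
```

On these made-up units, the OR rule gives higher sensitivity (0.75 vs 0.25 for AND) at the cost of specificity (0.50 vs 0.75), which is the general trade-off between the two rules.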

RESULTS
A total of 107 patients (66 females) were included in the analysis. The main baseline clinical characteristics of the patients are summarized in Table-1. Seventeen patients (15.9%) presented VUR, eight of them bilaterally, giving a total of 25 refluxing units (11 mild to moderate-grade (II-III) and 14 high-grade (IV-V)).
Table-3 shows a similar analysis for high-grade VUR (grade IV-V). Of note, the same cut-off points of 5 mm for RPD and 6.5 mm for distal ureteral diameter had the best DOR for identifying children with high-grade reflux. When the two tests were combined in parallel using the "OR" rule, the sensitivity increased to 92.9% (95% CI, 66.1-99.8) and the NPV to 94.8% (95% CI, 93.5-99.8), and the specificity was 89.5% (95% CI, 81.8-91.9). Table-3 also shows the diagnostic performance of categorical variables for indicating high-grade VUR (grade IV-V). Overall performance was relatively poor for all measurements, except for an increase in RPD during urination observed in the dynamic phase of the test, which had a specificity of 92.5% (95% CI, 87.9-95.7), an NPV of 95.9% (95% CI, 93.6-97.3), and an accuracy of 89.2% (95% CI, 84.3-93.0) for high-grade reflux.

Renal damage
A total of 92 patients (86%) had renal scintigraphy (99mTc-DMSA) data available. Twenty of these 92 children had renal damage (two bilaterally), giving a total of 22 affected kidney units. Renal damage was associated with high-grade reflux: of the 12 units with high-grade reflux, 4 (33.3%) had renal damage, whereas of the 172 units with mild to moderate or no reflux, 18 (10.5%) had an abnormality on renal scintigraphy (P = 0.04). Thinning of the renal parenchyma on DSUS predicted damage on renal scintigraphy. This finding presented a sensitivity of 40.9% (95% CI, 20.7-63.6), specificity of 94.4% (95% CI, 89.7-97.4), a PPV of

DISCUSSION
In this retrospective cohort study, we evaluated DSUS measurements as predictors of VUR and renal scarring in children and adolescents with NB. Our findings showed that sonographic kidney measurements predict the presence of VUR and renal scarring, both crucial to managing children with NB, with fair to good accuracy. Overall, the PPV was low because of the relatively low prevalence of VUR, but the NPV was high for all renal measurements.
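Why a low prevalence yields a low PPV alongside a high NPV can be made explicit with Bayes' rule: for fixed sensitivity and specificity, PPV falls and NPV rises as prevalence decreases. A minimal sketch follows; the function name and the hypothetical test with 70% sensitivity and 80% specificity are illustrative assumptions, not values from this study.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """PPV and NPV implied by sensitivity, specificity and prevalence (Bayes' rule)."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    # Expected fraction of the population in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Hypothetical test (70% sensitivity, 80% specificity) at several
    # prevalences, including the 15.9% of patients with VUR in this cohort.
    for prev in (0.05, 0.159, 0.30, 0.50):
        ppv, npv = predictive_values(0.70, 0.80, prev)
        print(f"prevalence {prev:.3f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```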
Clinical studies have reported that secondary VUR is prevalent in children and adolescents with NB (8). For instance, Bortolini et al. (21) reported VUR in 19% of patients with NB, similar to our rate (15.9%), whereas Sidi et al. (22) found a higher prevalence of 52% (46.7% high-grade). We evaluated the performance of two continuous measurements, RPD and distal ureter diameter, in predicting VUR. As previously mentioned, we hypothesized that the magnitude of both measurements could indicate the presence of VUR. However, both showed relatively low accuracy in identifying VUR of any grade, which did not improve substantially for high-grade reflux. The literature on the contribution of specific ultrasound findings to predicting VUR in children with NB is limited (13). Our findings agree with those of Naseri et al. (19), who reported that hydronephrosis (RPD ≥ 5 mm) has low accuracy (0.65) for VUR of any grade, which does not improve for high-grade VUR (0.66), in patients aged 1-18 years without NB who presented with UTI. In contrast, another study found hydronephrosis in 28.8% of patients aged 1-144 months with a first episode of febrile UTI and in 18.5% of those with high-grade VUR (DOR 18.8) (23). Swanton et al. (24) proposed the distal ureteral diameter as a measure to predict VUR. In our analysis, a distal ureteral diameter > 6.5 mm had relatively low accuracy (DOR 4.5) for VUR of any grade and slightly better accuracy (DOR 7.5) for high-grade VUR. This agrees with a recent study in which hydroureteronephrosis had low accuracy (0.67) for VUR of any grade but moderate accuracy (0.82) for high-grade VUR (19). Lee et al. (23) reported that a hydroureter ≥ 7 mm in children without NB with a first UTI had a DOR of 20.4 for high-grade VUR. Recent studies suggest that measurement of the distal ureteral diameter is objective and reliable and is more predictive of clinical outcome, regardless of VUR grade (24,25). Our findings showed a considerable improvement in overall performance when the two measurements (RPD and distal ureter diameter) were combined, with a sensitivity of 92.9% and an NPV of 94.8% for high-grade VUR.
Regarding the categorical measurements, overall performance was low for all measures except the increase in RPD in the dynamic phase of the test. A distinctive feature of DSUS is the assessment of RPD during urination or detrusor contractions as an indirect sign of VUR (14). In our analysis, this finding showed good accuracy (89%) in predicting high-grade VUR.
DSUS has been used in our clinic since the technique was developed in 2003 (14), including for diagnosing patients with non-neurogenic bladder dysfunction (26). Filgueiras et al. (14) demonstrated that DSUS is a sensitive method that correlates well with urodynamic findings. Similarly, Bortolini et al. (21) showed excellent accuracy of DSUS (90% accuracy; kappa coefficient 0.8; p < 0.001) in identifying detrusor overactivity in patients with NB due to myelomeningocele when compared with urodynamic testing. As a noninvasive test, DSUS has guided the follow-up of children with NB in our clinic, as it can anticipate clinical worsening and support decision-making.
One of the main goals in monitoring children with NB is to identify early changes in the upper urinary tract and thus prevent long-term kidney damage (2,7). Renal scintigraphy (99mTc-DMSA) is the gold standard test for detecting renal scarring, which is present in 25% of children with spina bifida and some degree of VUR (27). In adults with spinal dysraphism, scars were detected in 10% of patients by ultrasonography and in 46% by renal scintigraphy (99mTc-DMSA). Renal injury has been associated with high-grade VUR (28). Finkelstein et al. (12) demonstrated low accuracy of ultrasonography in detecting renal scars. In our cohort, renal scarring was present in 19.2% of patients who underwent renal scintigraphy (99mTc-DMSA). On DSUS, renal parenchymal thinning predicted renal scarring on renal scintigraphy with moderate accuracy (88%). In a previous study in our clinic, renal scarring was detected in 31.7% of patients, and bladder wall thickness on DSUS was a marginal risk factor for renal scarring (29).
Our study has limitations. First, it is a retrospective study with issues inherent to this design, such as missing data; we had to exclude some patients from the analysis because the index tests were incompletely recorded. In addition, we tried to minimize variability in the interpretation of DSUS and VCUG findings by relying on a highly trained team applying the same methodology. We also tried to mitigate the risk of verification bias by selecting index and reference tests performed at the closest possible intervals and read by blinded radiologists. Even though this is a retrospective study, we believe that a noninvasive test such as DSUS, with the potential to predict VUR in children with NB, can be of great value in managing these patients. It could help minimize harm to children and adolescents with this severe and complex condition, including the risk of urinary tract infection, exposure to ionizing radiation, and discomfort and anxiety during an invasive test such as VCUG.

CONCLUSIONS
Dynamic and static ultrasound predicts vesicoureteral reflux and renal scarring in children with neurogenic bladder with fair to good accuracy, and all ultrasound measurements exhibited a high negative predictive value, meaning that the absence of these findings makes clinically significant vesicoureteral reflux unlikely. The increase in renal pelvic diameter during urination or detrusor contractions was the most accurate parameter for indicating the presence of vesicoureteral reflux in this study. Thinning of the renal parenchyma showed good accuracy for renal scarring. Thus, our findings suggest that dynamic and static ultrasound and voiding cystourethrography should be considered complementary in the initial approach to children and adolescents with neurogenic bladder.

AIM
STARD stands for "Standards for Reporting Diagnostic accuracy studies". This list of items was developed to contribute to the completeness and transparency of reporting of diagnostic accuracy studies. Authors can use the list to write informative study reports. Editors and peer-reviewers can use it to evaluate whether the information has been included in manuscripts submitted for publication.

EXPLANATION
A diagnostic accuracy study evaluates the ability of one or more medical tests to correctly classify study participants as having a target condition. This can be a disease, a disease stage, response or benefit from therapy, or an event or condition in the future. A medical test can be an imaging procedure, a laboratory test, elements from history and physical examination, a combination of these, or any other method for collecting information about the current health status of a patient.
The test whose accuracy is evaluated is called index test.A study can evaluate the accuracy of one or more index tests.
Evaluating the ability of a medical test to correctly classify patients is typically done by comparing the distribution of the index test results with those of the reference standard. The reference standard is the best available method for establishing the presence or absence of the target condition. An accuracy study can rely on one or more reference standards.
If test results are categorized as either positive or negative, the cross tabulation of the index test results against those of the reference standard can be used to estimate the sensitivity of the index test (the proportion of participants with the target condition who have a positive index test), and its specificity (the proportion without the target condition who have a negative index test). From this cross tabulation (sometimes referred to as the contingency or "2x2" table), several other accuracy statistics can be estimated, such as the positive and negative predictive values of the test. Confidence intervals around estimates of accuracy can then be calculated to quantify the statistical precision of the measurements.
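The manuscript reports 95% confidence intervals without naming the method used. One common choice for proportions such as sensitivity or NPV is the exact Clopper-Pearson interval, sketched below with the standard library only; the function names are hypothetical. As an arithmetic check, 92.9% corresponds to 13 of the 14 high-grade refluxing units, and the Clopper-Pearson interval for 13/14 is about 66.1-99.8%, the same interval reported for the OR-rule sensitivity. This suggests, but does not confirm, that an exact method was used.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 when k < 0."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(g, iterations: int = 100) -> float:
    """Root on [0, 1] of an increasing function with g(0) < 0 < g(1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) 100*(1 - alpha)% interval for the proportion x/n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    half = alpha / 2
    # Lower bound: the p at which P(X >= x) equals alpha/2 (increases with p).
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - half)
    # Upper bound: the p at which P(X <= x) equals alpha/2 (decreases with p).
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: half - binom_cdf(x, n, p))
    return lower, upper


if __name__ == "__main__":
    lo, hi = clopper_pearson(13, 14)
    print(f"13/14 = {13 / 14:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```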
If the index test results can take more than two values, categorization of test results as positive or negative requires a test positivity cut-off. When multiple such cut-offs can be defined, authors can report a receiver operating characteristic (ROC) curve which graphically represents the combination of sensitivity and specificity for each possible test positivity cut-off. The area under the ROC curve informs in a single numerical value about the overall diagnostic accuracy of the index test.
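A sweep over candidate cut-offs produces the points of the ROC curve and, as in the Results, allows one to pick the cut-off with the best diagnostic odds ratio. The sketch below is illustrative only: it treats each observed value as a candidate cut-off, counts values at or above the cut-off as positive, and skips cut-offs that leave an empty cell because the DOR is undefined there (a continuity correction is a common alternative). Function names and data are hypothetical.

```python
def cutoff_table(with_event: list[float], without_event: list[float]):
    """(cut-off, TP, FP, FN, TN) for every observed value used as a cut-off.
    A value at or above the cut-off is treated as positive (assumption)."""
    rows = []
    for cut in sorted(set(with_event) | set(without_event)):
        tp = sum(v >= cut for v in with_event)
        fp = sum(v >= cut for v in without_event)
        rows.append((cut, tp, fp, len(with_event) - tp, len(without_event) - fp))
    return rows


def roc_points(with_event, without_event) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs, one per cut-off."""
    n1, n0 = len(with_event), len(without_event)
    return [(fp / n0, tp / n1)
            for _, tp, fp, _, _ in cutoff_table(with_event, without_event)]


def best_dor_cutoff(with_event, without_event):
    """Cut-off with the highest diagnostic odds ratio, skipping cut-offs that
    leave an empty cell; returns (cut-off, DOR) or None if none qualifies."""
    best = None
    for cut, tp, fp, fn, tn in cutoff_table(with_event, without_event):
        if 0 in (tp, fp, fn, tn):
            continue
        dor = (tp * tn) / (fp * fn)
        if best is None or dor > best[1]:
            best = (cut, dor)
    return best


if __name__ == "__main__":
    # Hypothetical diameters (mm) in units with and without the condition.
    with_vur = [4.0, 6.0, 7.5, 9.0]
    without_vur = [2.0, 3.0, 4.0, 5.0, 6.5, 3.5]
    print(roc_points(with_vur, without_vur))
    print(best_dor_cutoff(with_vur, without_vur))
```

Maximizing the DOR can favor extreme cut-offs in small samples, so in practice it is usually read together with sensitivity and specificity at the chosen cut-off.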
The intended use of a medical test can be diagnosis, screening, staging, monitoring, surveillance, prediction or prognosis. The clinical role of a test explains its position relative to existing tests in the clinical pathway. A replacement test, for example, replaces an existing test. A triage test is used before an existing test; an add-on test is used after an existing test.
Besides diagnostic accuracy, several other outcomes and statistics may be relevant in the evaluation of medical tests. Medical tests can also be used to classify patients for purposes other than diagnosis, such as staging or prognosis. The STARD list was not explicitly developed for these other outcomes, statistics, and study types, although most STARD items would still apply.

DEVELOPMENT
This STARD list was released in 2015. The 30 items were identified by an international expert group of methodologists, researchers, and editors. The guiding principle in the development of STARD was to select items that, when reported, would help readers to judge the potential for bias in the study, to appraise the applicability of the study findings and the validity of conclusions and recommendations. The list represents an update of the first version, which was published in 2003.
More information can be found on http://www.equator-network.org/reporting-guidelines/stard.

NB = neurogenic bladder; VUR = vesicoureteral reflux; VCUG = voiding cystourethrography; DSUS = dynamic and static ultrasound; RPD = renal pelvic diameter; RPT = renal parenchyma thickness; ROC = receiver operating characteristic curve; AUC = area under the ROC curve; IQ = interquartile range; LR = likelihood ratio; DOR = diagnostic odds ratio; NPV = negative predictive value; PPV = positive predictive value; US = ultrasound; 99mTc-DMSA = technetium-99m dimercaptosuccinic acid.

STARD 2015 checklist (excerpt)
9. Whether participants formed a consecutive, random or convenience series
Test methods
10a. Index test, in sufficient detail to allow replication
10b. Reference standard, in sufficient detail to allow replication
11. Rationale for choosing the reference standard (if alternatives exist)
12a. Definition of and rationale for test positivity cut-offs or result categories of the index test, distinguishing pre-specified from exploratory
12b. Definition of and rationale for test positivity cut-offs or result categories of the reference standard, distinguishing pre-specified from exploratory
13a. Whether clinical information and reference standard results were available to the performers/readers of the index test
13b. Whether clinical information and index test results were available to the assessors of the reference standard

Table 1 - Patient clinical and demographic characteristics.
The combined results of both pelvic and ureter diameters are also shown in Table-2.