Accuracy of six-minute walk test, stair test and spirometry using maximal oxygen uptake as gold standard

Purpose: To assess the accuracy of the variables stair climbing time (SCt), stair climbing power (SCP), six-minute walk test distance (6MWT), and forced expiratory volume in 1 second (FEV1) using maximal oxygen uptake on exercise (VO2max) as the gold standard. Methods: Tests were performed in 51 patients. FEV1 was measured by spirometry and 6MWT was performed in a flat 120-m corridor. Stair climbing test was performed on a 6-flight stairway to obtain SCt and SCP. VO2max was measured by ergospirometry, using the Balke protocol. Pearson’s linear correlation and p values were calculated between VO2max and the other variables tested. For accuracy calculations, variable cutoff points were obtained through receiver operating characteristic (ROC) curves, dividing individuals into normal or unhealthy. Kappa statistic was used to calculate concordance. Results: Accuracy was: SCt – 86%, 6MWT – 80%, SCP – 71%, FEV1(L) – 67%, FEV1(%) – 63%. SCt and 6MWT showed 93.5% sensitivity when combined in parallel, and 96.4% specificity in series. Conclusion: SCt presented the best accuracy. SCt and 6MWT combined showed nearly 100% sensitivity or specificity. Thus, these simple exercise tests should be more routinely used, especially when an ergospirometer is not available to measure VO2max.


Introduction
Major surgery under general anesthesia poses considerable stress to the cardiopulmonary system, increasing morbidity and mortality in individuals with low cardiopulmonary reserves. Patients with lung disease have a higher incidence of postoperative complications, with complication frequency increasing in proportion to the severity of the lung impairment. Similarly, the presence of several cardiac risk factors increases the risk of postoperative complications in patients with cardiac disease. The incidence of postoperative cardiopulmonary complications is highest in patients undergoing upper abdominal and thoracic surgery, leading to longer hospital stays and higher costs. Risk stratification may help to select patients for appropriate counseling and prophylactic treatment 1,2.
The ideal test to predict the risk of postoperative complications should determine the aerobic capacity and functional reserve that would enable the patient to cope with the physical demands of surgery.
Cardiopulmonary exercise testing (CPET) has a wide spectrum of clinical applications. It can be used to evaluate patient physical fitness and has been considered the gold standard for predicting surgical risk. In this regard, the best measure is maximal oxygen uptake during exercise (VO2max) determined by ergospirometry, which reflects the maximum volume of oxygen consumed during exercise and is therefore considered a maximal test. VO2max seems to be the best indicator of exercise capacity 3, but despite its usefulness, ergospirometry is not available in most hospitals. Nonetheless, while cardiac performance and respiratory function can each be evaluated individually, cardiopulmonary exercise testing allows the examination of both systems in a single study. During exercise, oxygen consumption, carbon dioxide production, and cardiac output increase, while the work level reached reflects how well the heart, lungs, and circulatory system interact to transport oxygen to the tissues.
CPET is the closest to ideal of all surgical risk prediction tools. It allows improved assessment of postoperative risk and has been increasingly used by surgeons for preoperative evaluation 4. At first, its efficacy was not clear, but more recent works have resolved these questions and demonstrated that CPET, including VO2max, is the most important step in the physiologic assessment of a thoracotomy candidate 5. Thus, CPET can be considered the gold standard test for surgical risk evaluation.
A systematic review of the literature 6 showed that exercise capacity expressed as VO2max is lower in patients who develop clinically relevant complications after lung resections. Other CPET forms, which do not take the patient to maximum fatigue, are considered submaximal. These tests, namely the twelve-minute walk test 7, the six-minute walk test (6MWT) 8, and the stair climbing test (SCT) 9, can be used to confirm whether an individual is physically fit for a specific surgical procedure. They are simple, low-cost tests that do not require specialized equipment.
VO2max and 6MWT distance have proven to predict prognosis better than resting lung and/or cardiac function. In major surgery, exercise testing is recommended for the preoperative evaluation of patients 10.
Given that the current economic environment calls for cheaper and more accessible forms of testing, and that ergospirometers are not available at all hospitals, the purpose of this study was to standardize SCT in our service 11 and to assess the accuracy of stair climbing time (SCt), stair climbing power (SCP), 6MWT distance, and forced expiratory volume in one second (FEV1) measured by spirometry, in order to determine which best predicts surgical risk using VO2max as the gold standard.

Methods
After approval by the Research Ethics Committee of São Paulo State University, this study was initiated by contacting patients over 18 years of age who had been referred for spirometry and who agreed to participate and signed the informed consent form. Eligible patients were those referred for spirometry for any clinical or preoperative reason. Exclusion criteria were the same as for ergospirometry: any acute conditions, systolic arterial pressure > 200 mmHg and diastolic arterial pressure > 110 mmHg, decompensated heart failure, infarction within the past 40 days, decompensated COPD, electrocardiogram showing complete left bundle branch block, walking difficulty (orthopedic, neurological, or vascular changes), and inability to ascend the complete staircase or to perform ergospirometry. All patients enrolled underwent history taking, physical examination, and electrocardiography at rest before exercise testing. Spirometry, followed by 6MWT and SCT, was performed on the same day. Minimum recovery time between tests was 30 minutes. Ergospirometry was scheduled for a later date according to laboratory availability.
Spirometry was performed using a Med-Graphics Pulmonary Function System 1070, according to the American Thoracic Society guidelines 12. Forced vital capacity was measured at least three times, and the curve with the highest FEV1 was chosen. Readings were expressed in liters and percent predicted.
The 6MWT consisted of measuring the distance covered by the patient in six minutes of encouraged walking, according to the guidelines of the American Thoracic Society 13. It was performed in the shade, at a fast pace with encouragement from the examiner, along a flat 120-m corridor marked every 0.75 m.
SCT was performed in the shade, on a staircase with a 30° incline consisting of six flights of twelve steps each (72 steps in total), each step measuring 16.9 cm, for a total ascent height of 12.16 m 11. Patients were asked to climb all the steps in the shortest possible time, with verbal encouragement from the same examiner. Between flights, patients had to take two or three paces on a flat surface, trying to maintain the same speed, while the examiner inquired if everything was fine. Testing was stopped only for fatigue, limiting dyspnea, thoracic pain, or exhaustion. The time taken to climb the stairs (SCt) was expressed in seconds. The amount of work (W) done to climb the stairs was calculated in joules using the formula "W = m x g x h", where m is patient mass in kilograms, g is gravitational acceleration (9.8 m/s²), and h is the height of the staircase in meters (12.16 m). Stair-climbing power (SCP) was calculated in watts as W / SCt.
VO2max was measured using a Quinton ergospirometer (Q4500, Quinton Instruments, Seattle, WA, USA) coupled to a treadmill in a standard climate-controlled environment. Heart and respiratory rates, arterial blood pressure, oxygen saturation, and 12-lead electrocardiograms were monitored throughout the test. All ergospirometry variables were measured, but VO2max, expressed in mL/kg/min, was the only one used. Testing was performed using the Balke protocol, which is an incremental protocol indicated for individuals with comorbidities. The examination was interrupted in the event of a systolic arterial pressure drop > 10 mmHg as compared to rest, angina, symptoms related to the central nervous system (ataxia, dizziness, lightheadedness), signs of low perfusion (cyanosis, pallor), technical difficulties in monitoring electrocardiograms or arterial blood pressure, sustained ventricular tachycardia, ST-segment elevation > 2 mm or depression > 3 mm, patient request, fatigue, dyspnea, wheezing, cramps, limping, left bundle branch block or conduction delay, increasing chest pain, or hypertensive response. When testing was interrupted for any of the above reasons, the patient was excluded from the study.
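The stair-climbing work and power calculations described above (W = m x g x h and SCP = W / SCt) can be sketched as follows. This is only an illustrative sketch: the function names and the 70-kg patient climbing in 40 s are hypothetical and are not data from the study; g and h are the values stated in the text.

```python
# Illustrative sketch of the work and power formulas given in the Methods.
# Function names and the example patient are hypothetical, not study data.

G_M_PER_S2 = 9.8         # gravitational acceleration used in the text
STAIR_HEIGHT_M = 12.16   # total ascent height of the staircase in the text


def stair_work_joules(mass_kg, height_m=STAIR_HEIGHT_M, g=G_M_PER_S2):
    """Work done against gravity to climb the staircase: W = m * g * h."""
    return mass_kg * g * height_m


def stair_power_watts(mass_kg, sct_seconds, height_m=STAIR_HEIGHT_M, g=G_M_PER_S2):
    """Stair-climbing power: SCP = W / SCt, with SCt in seconds."""
    if sct_seconds <= 0:
        raise ValueError("stair climbing time must be positive")
    return stair_work_joules(mass_kg, height_m, g) / sct_seconds


if __name__ == "__main__":
    mass_kg, sct_s = 70.0, 40.0  # hypothetical patient
    print(f"W   = {stair_work_joules(mass_kg):.1f} J")
    print(f"SCP = {stair_power_watts(mass_kg, sct_s):.1f} W")
```

Note that work depends only on body mass for a fixed staircase, whereas power also depends on the climbing time.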
Data were statistically analyzed using Pearson's coefficient to estimate the correlations of VO2max with the other variables, along with p-values. The sensitivity, specificity and accuracy of the variables that significantly correlated with VO2max were determined. The cut-off points used to distinguish normal from abnormal test results were calculated using receiver operating characteristic (ROC) curves 14 and rounded to whole numbers. The Kappa (k) statistic was used to assess concordance. The two most accurate tests were combined in series to increase specificity and in parallel to increase sensitivity, and the sensitivity and specificity of each combination were determined. The software utilized was SAS 9.1.
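As a rough illustration of the accuracy measures named above, the sketch below computes sensitivity, specificity, accuracy and Cohen's Kappa from a 2x2 table of test result versus gold standard. The study itself used SAS 9.1; the function name and the counts in the example are hypothetical and are not taken from the study tables.

```python
# Illustrative sketch only: the 2x2 counts below are made up, not study data.

def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy and Cohen's Kappa for a 2x2 table.

    tp: test abnormal, gold standard abnormal
    fp: test abnormal, gold standard normal
    fn: test normal,   gold standard abnormal
    tn: test normal,   gold standard normal
    """
    n = tp + fp + fn + tn
    if (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("both gold-standard classes must be present")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical table: 50 patients, 25 abnormal by the gold standard.
    for name, value in diagnostic_metrics(tp=20, fp=5, fn=5, tn=20).items():
        print(f"{name}: {value:.2f}")
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it can be noticeably lower than accuracy.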

Results
Tests were performed in 51 patients (30 males and 21 females) aged 18-77 years (mean ± SD = 52 ± 16). Testing interruption was not considered necessary in any case. Table 1 shows mean and standard deviation (SD), maximum and minimum values, as well as the cut-off points obtained in all tests performed. Linear correlations between VO2max and test variables are presented in Table 2. Since no significant correlation was observed between VO2max and W, data on this variable were discarded and its accuracy was not determined. Table 3 shows the sensitivity, specificity, accuracy and Kappa concordance of the tests using VO2max as the gold standard. As 6MWT distance and stair-climbing time had the best accuracy and concordance, they were combined in parallel and in series. Parallel combination yielded 93.5% sensitivity, 59.6% [...] (Table 4), whereas series combination showed 50.7% sensitivity, 96.4% specificity, and 82% accuracy (Table 5).

Discussion
Preoperative CPET can detect changes in oxygen transport that would not be discovered unless metabolic demand increased during or after surgery. Ergospirometry, which can be used for this purpose, is not available in most services and requires costly equipment. Therefore, despite being highly efficient and considered the gold standard for surgical risk prediction by most authors 4, CPET is still far from being feasible, especially in poor countries.
The ideal test for preoperative investigation must be simple, cheap, and widely available. SCT has these characteristics and has been used in developed countries to evaluate cardiopulmonary training 15-17. Considering that exercise capacity is limited by cardiac or pulmonary disease, it is not surprising that patients with cardiopulmonary disorders have difficulty climbing stairs, with the degree of limitation being proportional to the severity of cardiopulmonary impairment, while patients who accomplish a rapid multiple-flight stair climb without symptoms have considerable cardiopulmonary reserve. However, SCT standardization requires the use of an accurate, universal and adequately determined variable, as previously done for 6MWT. Attempts to stratify postoperative complications just by the number of floors or steps completed have sometimes been frustrating 18. The step is not a universal unit of measurement, so the ideal would be to measure the height reached in meters rather than in flights or floors. It would also be difficult to apply SCT if height were the variable, since very high stairways, which are not available in all services, including ours, would be needed. Nevertheless, taking a minimum height of 12 m as a constant, it is possible to use SCt as the variable. Lower heights may not be as useful. According to some studies 15, 50% of the patients unable to ascend 12 m are likely to have complications.
Along the same lines, a previous study 19 showed a significant correlation between stair climbing speed and VO2max measured by cycle ergometry, a correlation much greater than that of VO2max with the height climbed. In that study, patients were asked to climb as high and as fast as they could, to a maximum elevation of 20 m. The height reached and the average speed of ascent were compared to VO2max. In patients with a speed > 15 m/min (80 s over 20 m), VO2max was > 20 mL/kg/min, and in patients with a speed > 12 m/min (100 s over 20 m), VO2max was > 15 mL/kg/min. In our study, patients with a speed > 18 m/min (40 s over 12 m) had VO2max > 25 mL/kg/min. When height is constant, time is the only variable and speed calculation is not necessary. Nevertheless, if different stair heights are to be compared, the ideal is to use average speed. Our results are similar to those reported by these authors 19 (Figure 1). SCt must be adequately determined, and there must be encouragement during stair climbing to prevent patients from walking at their own pace. Time measured without encouragement, besides not reflecting the actual physical capacity of an individual, also affects other SCT parameters already used by other authors, such as power and estimated VO2 15,18. Ultimately, time is needed to calculate these variables.
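The relation between climbing time and average speed of ascent used in the comparisons above is simple arithmetic (speed = height / time). A minimal sketch follows, with hypothetical function names, which reproduces the height and time pairs quoted in the text.

```python
# Illustrative sketch: converts between climbing time and average ascent speed.
# Function names are hypothetical; the height/time pairs are those in the text.

def ascent_speed_m_per_min(height_m, time_s):
    """Average vertical speed in m/min for a climb of height_m in time_s."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    return height_m / time_s * 60.0


def climb_time_s(height_m, speed_m_per_min):
    """Time in seconds needed to climb height_m at a given average speed."""
    if speed_m_per_min <= 0:
        raise ValueError("speed must be positive")
    return height_m / speed_m_per_min * 60.0


if __name__ == "__main__":
    for height_m, time_s in [(20.0, 80.0), (20.0, 100.0), (12.0, 40.0)]:
        speed = ascent_speed_m_per_min(height_m, time_s)
        print(f"{height_m:.0f} m in {time_s:.0f} s = {speed:.1f} m/min")
```

Expressing results as speed is what allows staircases of different heights to be compared, as the text notes.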
Regarding SCP, other authors 18 have estimated the amount of work done to ascend a stairway (W) by the formula "W = step height (m) x number of steps / t (min) x mass (kg) x 0.1635". This formula does not actually calculate work but power, expressed in watts. It is, indeed, a more complicated form of the classical formula for power (P = m x g x h / t) 11, and is still used by some authors 15 who refer to this variable as "work". Work calculations do not require information on SCt. As a matter of fact, in our study VO2max was found to correlate not with work but with power, which does depend on time. In order to estimate VO2 during stair climbing, the above-mentioned authors 15,18 used power under the name of "work": VO2 (mL/min) = 5.8 x m (kg) + 151 + (10.1 x "W"). Thus, time remains an important variable in estimating VO2. It must be measured as strictly as weight and height, and will never be adequate without encouragement. However, the literature suggests that encouragement and adequate time measurement have not been a matter of concern.
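The point that the literature "work" formula is in fact a power formula can be checked numerically. The constant 0.1635 equals 9.81 / 60, that is, gravitational acceleration divided by the number of seconds in a minute, so the formula reduces to m x g x h / t with t in seconds. The sketch below assumes g = 9.81 m/s² for this comparison (the Methods use 9.8 m/s², which gives a marginally different value); function names and the example patient are hypothetical.

```python
# Illustrative sketch: compares the literature "work" formula with the
# classical power formula P = m * g * h / t. Names and example values are
# hypothetical. g = 9.81 m/s^2 is an assumption made so that 0.1635 = g / 60.

def literature_work_formula(step_height_m, n_steps, time_min, mass_kg):
    """'W' as quoted in the text; dimensionally this is power in watts."""
    return step_height_m * n_steps / time_min * mass_kg * 0.1635


def classical_power_watts(mass_kg, height_m, time_s, g=9.81):
    """Classical power formula: P = m * g * h / t, with t in seconds."""
    return mass_kg * g * height_m / time_s


def estimated_vo2_ml_per_min(mass_kg, power_w):
    """VO2 estimate quoted in the text: 5.8 * m + 151 + 10.1 * 'W'."""
    return 5.8 * mass_kg + 151 + 10.1 * power_w


if __name__ == "__main__":
    mass_kg, time_s = 70.0, 40.0          # hypothetical patient
    step_height_m, n_steps = 0.169, 72    # staircase described in the Methods
    p_lit = literature_work_formula(step_height_m, n_steps, time_s / 60.0, mass_kg)
    p_cls = classical_power_watts(mass_kg, step_height_m * n_steps, time_s)
    print(f"literature formula: {p_lit:.1f} W, classical formula: {p_cls:.1f} W")
    print(f"estimated VO2: {estimated_vo2_ml_per_min(mass_kg, p_lit):.0f} mL/min")
```

Both functions return the same value for the same inputs, and both require the climbing time, which is the argument made in the text.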
Under our experimental conditions, SCt showed the best linear correlation with VO2max and the highest accuracy, indicating that patients who take < 40 s to climb 12 m have a high probability of having VO2max > 25 mL/kg/min, and those who take > 40 s have a high probability of having VO2max < 25 mL/kg/min. SCP greater or smaller than 200 W indicated a high probability of VO2max being greater or smaller than 25 mL/kg/min, respectively. However, using SCP rather than SCt can lead to more errors. Patients who do not manage to complete the 12 m, for whom SCt is considered infinite, should be carefully evaluated in order to detect and attempt to correct any changes in the oxygen transport system, as previously demonstrated 15. Brunelli et al. 20 demonstrated that cardiopulmonary complications were 2-fold higher and mortality was 13-fold higher in patients climbing less than 12 m than in patients who climbed more than 22 m. Our findings differ from those reported by Brunelli et al. 15,20 because they used height as the variable in their study, while stair climbing speed was used here. Nonetheless, patients should not be allowed to climb the stairs at their own pace. Instead, they should be encouraged to ascend as fast as possible, as suggested by Koegelenberg et al. 19.
In this study, 6MWT showed the third best linear correlation with VO2max, the second best accuracy, and the highest specificity, with a strong ability to detect fit individuals. Thus, it may be safely said that individuals who walk more than 500 m in 6 minutes have a high probability of having VO2max > 25 mL/kg/min, but not that individuals who walk less than 500 m have VO2max < 25 mL/kg/min, as test sensitivity was not so high.
FEV1 results in liters were better than those expressed in percent predicted. The linear correlation with VO2max was good for FEV1 (L), but worse for FEV1 (%). Both parameters were less accurate than the 6MWT and SCT variables. FEV1 (L) was better than FEV1 (%) in detecting individuals without complications (VO2max > 25 mL/kg/min) due to its high specificity, but it poorly detected individuals with complications because of its low sensitivity.
Given that SCt and 6MWT were considered the best tests, and there was an 80% concordance between them, their sensitivity could be improved by a parallel association, which increased sensitivity to 93.5%. In this case, one positive test was enough to identify an individual with complications. This can be very useful when deciding on a minor resection such as a segmentectomy rather than a lobectomy, when insisting on better preoperative preparation for elective surgery, or even when predicting the need for greater support during postoperative intensive care.
When it is necessary to increase specificity, 6MWT and SCt should be combined in series, which provides 96.4% specificity, with both tests having to be positive for an individual to be considered unhealthy. Thus, the ability to identify individuals without complications is enhanced, facilitating decision making in cases of patients with few or no comorbidities, young and clinically healthy patients, and pulmonary resection when postoperative predicted values are below the acceptable level. This way, surgery would only be contraindicated if results were abnormal in both tests.
This study aimed at assessing the accuracy of the parameters obtained during stair climbing with encouragement, 6MWT and FEV1, not at finding the cut-off points for the identification of patients at high or low surgical risk. The next step, which is in progress, is to ascertain the cut-off points for patients at high, average, and low surgical risk by evaluating the correlations of postoperative complications with the parameters provided by these tests.
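The parallel and series combinations discussed above amount to an OR rule (positive if either test is abnormal) and an AND rule (positive only if both tests are abnormal). The sketch below illustrates how each rule trades sensitivity against specificity. The ten patients are invented purely for illustration and are not study data; the function names are hypothetical.

```python
# Illustrative sketch of combining two binary tests in parallel (OR) and in
# series (AND). True means "abnormal/unhealthy". All data below are made up.

def combine_parallel(test_a, test_b):
    """Parallel rule: positive if either test is positive."""
    return [a or b for a, b in zip(test_a, test_b)]


def combine_series(test_a, test_b):
    """Series rule: positive only if both tests are positive."""
    return [a and b for a, b in zip(test_a, test_b)]


def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) of predicted against truth."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    positives = sum(truth)
    negatives = len(truth) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in truth")
    return tp / positives, tn / negatives


if __name__ == "__main__":
    T, F = True, False
    truth = [T, T, T, T, F, F, F, F, F, F]  # gold standard (hypothetical)
    sct   = [T, T, T, F, T, F, F, F, F, F]  # stair climbing time abnormal?
    mwt   = [T, T, F, T, F, T, F, F, F, F]  # 6MWT distance abnormal?
    for label, pred in [("SCt alone", sct), ("6MWT alone", mwt),
                        ("parallel", combine_parallel(sct, mwt)),
                        ("series", combine_series(sct, mwt))]:
        sens, spec = sensitivity_specificity(pred, truth)
        print(f"{label}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy example the parallel rule raises sensitivity at the cost of specificity, and the series rule does the opposite, which is the same direction of effect reported for the two combinations in the study.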

Conclusion
By comparing the surgical risk prediction tests used in this group of patients, it may be concluded that the cheapest tests, namely the stair climbing test and the six-minute walk test, showed better accuracy than spirometry, and should, therefore, be more frequently used in preoperative assessments.

FIGURE 1 - Speed of ascent and VO2max. Full lines are the results of Koegelenberg et al. 19 and interrupted lines are our results.

TABLE 1 - Mean, standard deviation (SD), minimal and maximal test values, and cut-off points

TABLE 2 - Linear correlation between VO2max and test variables (r and p values)

TABLE 3 - Testing sensitivity, specificity, accuracy and Kappa agreement using VO2max as gold standard

TABLE 5 - Series combination of the six-minute walk test (6MWT) and stair climbing time (SCt). Sensitivity, specificity, accuracy