Multi-center epidemiological studies must ascertain that their measurements are accurate and reliable. For laboratory measurements, reliability can be assessed through investigation of the reproducibility of measurements in the same individual. In this paper, we present results from the quality control analysis of the baseline laboratory measurements from the ELSA-Brasil study. The study enrolled 15,105 civil servants at 6 research centers in 3 regions of Brazil between 2008 and 2010, with multiple biochemical analytes being measured at a central laboratory. Quality control was assessed through standard laboratory evaluation of intra- and inter-assay variability and test-retest analysis in a subset of randomly chosen participants. An additional sample of urine or blood was collected from these participants, and these samples were handled in the same manner as the original ones, locally and at the central laboratory. Reliability was assessed with the intraclass correlation coefficient (ICC), estimated through a random effects model. Coefficients of variation (CV) and Bland-Altman plots were additionally used to assess measurement variability. Laboratory intra- and inter-assay CVs varied from 0.86% to 7.77%. From test-retest analyses, the ICCs were high for the majority of the analytes. Notably lower ICCs were observed for serum sodium (ICC=0.50; 95%CI=0.31–0.65) and serum potassium (ICC=0.73; 95%CI=0.60–0.83), due to the small biological range of these analytes. The CVs ranged from 1 to 14%. The Bland-Altman plots confirmed these results. The quality control analyses showed that the collection, processing and measurement protocols utilized in the ELSA-Brasil produced reliable biochemical measurements.
Multi-center studies; Laboratory measurements; Quality control
The validity of inferences from clinical and epidemiological studies critically depends on the validity and reliability of measurements in the data collection process. Quality assurance and quality control measures are therefore needed throughout the stages of planning and data collection. For an individual patient in continued care, the use of quality specifications based on the intraperson biological variation of an analyte through a certain period of time is internationally accepted (1). For epidemiological studies, in which greater laboratory precision facilitates the detection of associations between variables, quality can be assessed through the agreement of repeated measurements from biological samples of a single participant visit (2).
Investigators in Brazil and in other low- and middle-income countries have increasingly been involved in multi-center studies with centralized laboratory analysis, but the necessary procedures for quality control have been infrequently reported. The ELSA-Brasil (Estudo Longitudinal de Saúde do Adulto - Brazilian Longitudinal Study of Adult Health) study utilized a central laboratory for most of its analyses, offering an opportunity to describe the methodology and results of the quality assessment of these measurements.
Having a centralized laboratory in the ELSA-Brasil allowed each research center to have a team for the collection and processing of the biological material, which was stored in cryotubes at -80°C for up to 30 days and then transported to the central facility. The drawbacks of this strategy were the need for centralized training and certification of the teams from each center, and the increased risk of delay in the return of results to participants. The most important advantages were the ease of acquiring the lab kits and the absence of inter-laboratory variability. The local processing of the samples also decreased the volume of material transported, reducing costs.
The objective of this report is to present the results of quality control analyses of baseline laboratory measurements of the ELSA-Brasil.
Material and Methods
The ELSA-Brasil study enrolled 15,105 participants in 6 field centers located in 3 different regions of the country from 2008 to 2010 (3). The study protocol was approved by ethics committees at each institution, and all participants gave their written informed consent. Participants underwent interviews, examinations and collection of blood and urine specimens, in approximately 6 h at the local research clinics.
As has been previously described in more detail (4), blood collection was performed in a fasting state and then, among those without a diagnosis of diabetes, 2 h after the ingestion of a 75-g oral glucose solution.
Blood was centrifuged within 30 min of collection, with aliquots then being separated in an ice bath into cryotubes previously labeled with bar codes, and stored in freezers at -80°C until transportation to the central laboratory on dry ice. Once samples were received, analysis for glucose (fasting and 2 h), glycated hemoglobin (HbA1c), creatinine, sodium, potassium, uric acid, aspartate transaminase (AST), alanine transaminase (ALT), gamma-glutamyl transferase, total cholesterol, HDL-cholesterol, LDL-cholesterol, triglycerides, thyroid stimulating hormone (TSH), insulin (fasting and 2 h) and ultrasensitive C-reactive protein, as well as serology for Chagas disease were performed.
Urine was collected over 12 h during the night, prior to the clinic visit, locally processed and stored for shipment in aliquots as described above. Determinations were performed centrally for sodium, potassium, calcium, creatinine and albumin.
Given the large number of participants, multi-center nature and diversity of measurements, the ELSA-Brasil required effective and efficient mechanisms of quality assurance and control for laboratory determinations. As previously reported (5), the main quality assurance activities were careful selection of research instruments, centralized training and certification, pretesting and pilot studies, and preparation of procedure manuals.
For laboratory measurements, inter-assay CVs were calculated using the results of internal controls, and intra-assay CVs using data from several pilot studies on both fresh and frozen samples.
In addition, we performed test-retest analyses of specific analytes in 10% of a randomly selected sample of study participants, with determination of intra- and inter-assay variability at the central laboratory. In selected participants, a single additional blood or urine sample was collected for duplication. Extra blood samples were collected at the end of the blood drawing process. These tubes ("blind replicates"), identified only by their bar code and thus blinded for determination, were handled identically to the original tubes and sent to the central laboratory at the same time as the original tubes.
Laboratory results were transmitted to the study's data center, where the original and quality control (QC) results were matched through the bar codes to individual study participants.
Reliability was estimated through the calculation of the intraclass correlation coefficient (ICC) (6), using a random effects model. A mixed model that also included a fixed order effect (QC sample minus original sample) was used for the four analytes for which the fixed effect was statistically significant at the 0.05 level. Confidence intervals were estimated by the 2.5 and 97.5 percentiles of the empirical distribution obtained with bootstrap sampling with 1000 repetitions.
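The ICC and its percentile bootstrap interval can be illustrated with a short sketch. This is not the SAS code used in the study: it swaps in the classical one-way ANOVA estimator of the ICC for duplicate measurements, ignores the fixed order effect, and runs on simulated data. The function names, the seed and the simulated values are all assumptions made for illustration.

```python
import random
from statistics import fmean


def icc_oneway(pairs):
    """One-way random-effects ICC (ANOVA estimator) for duplicates, k = 2."""
    n = len(pairs)
    grand = fmean(v for pair in pairs for v in pair)
    # Between-participant mean square: k * sum((pair mean - grand)^2) / (n - 1)
    msb = 2 * sum((fmean(p) - grand) ** 2 for p in pairs) / (n - 1)
    # Within-participant mean square: for k = 2 this is sum(d^2) / (2n)
    msw = sum((a - b) ** 2 for a, b in pairs) / (2 * n)
    # ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2
    return (msb - msw) / (msb + msw)


def bootstrap_ci(pairs, reps=1000, seed=2016):
    """Percentile bootstrap: resample participants (pairs) with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    estimates = sorted(
        icc_oneway([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(reps)
    )
    # Approximate 2.5 and 97.5 percentiles of the empirical distribution
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1]


# Simulated data (hypothetical): 80 participants, between-person SD 10,
# measurement-error SD 1, so the true ICC is 100 / 101.
sim = random.Random(1)
true_values = [sim.gauss(100, 10) for _ in range(80)]
pairs = [(t + sim.gauss(0, 1), t + sim.gauss(0, 1)) for t in true_values]

icc_hat = icc_oneway(pairs)
ci_low, ci_high = bootstrap_ci(pairs)
print(f"ICC = {icc_hat:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Resampling whole pairs rather than single values keeps each participant's original and QC results together, which is what the test-retest design requires.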
We calculated the CV as the ratio of the standard deviation of measurement error to the mean of the analyzed variable.
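As an illustration of this definition, the sketch below estimates the standard deviation of measurement error from the within-pair differences of duplicates and divides it by the overall mean. The within-pair estimator is an assumption (the text does not state how the error standard deviation was obtained), and the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import fmean


def duplicate_cv(pairs):
    """CV (%) = SD of measurement error / overall mean, from duplicate pairs."""
    n = len(pairs)
    # With two measurements per participant, the error variance can be
    # estimated as sum(d^2) / (2n), where d is the within-pair difference.
    error_sd = sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * n))
    overall_mean = fmean(v for pair in pairs for v in pair)
    return 100 * error_sd / overall_mean


# Hypothetical duplicate results (original, QC) for three participants
example = [(100.0, 102.0), (98.0, 100.0), (101.0, 99.0)]
cv_percent = duplicate_cv(example)
print(f"CV = {cv_percent:.2f}%")  # sqrt(12 / 6) / 100 * 100, about 1.41%
```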
Bland-Altman plots were also produced to graphically explore the agreement between measurements (7).
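The quantities drawn on a Bland-Altman plot can be computed as in the following sketch: the pair averages and differences (QC minus original), the mean difference, limits at 2 standard deviations around it, and a 95%CI for the mean difference. The normal approximation used for that interval, the function name and the example values are assumptions for illustration, not details taken from the study.

```python
from math import sqrt
from statistics import fmean, stdev


def bland_altman(original, qc):
    """Summary statistics behind a Bland-Altman plot for paired results."""
    diffs = [q - o for o, q in zip(original, qc)]          # y-axis values
    averages = [(o + q) / 2 for o, q in zip(original, qc)]  # x-axis values
    bias = fmean(diffs)   # mean within-pair difference
    sd = stdev(diffs)     # sample SD of the differences
    # 95% CI for the mean difference (normal approximation, an assumption)
    half_width = 1.96 * sd / sqrt(len(diffs))
    return {
        "averages": averages,
        "differences": diffs,
        "bias": bias,
        "limits": (bias - 2 * sd, bias + 2 * sd),
        "bias_ci": (bias - half_width, bias + half_width),
    }


# Hypothetical original and QC results for four participants
original = [10.0, 12.0, 14.0, 16.0]
qc = [11.0, 12.0, 15.0, 18.0]
result = bland_altman(original, qc)
print(result["bias"], result["limits"], result["bias_ci"])
```

A confidence interval for the mean difference that excludes zero points to a systematic difference between the original and QC samples.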
Analyses were performed initially on all pairs. To minimize the effect of extreme outliers, we then reanalyzed the differences after removing pairs presenting a difference between measurements >3 standard deviations from the mean or for which the pair's mean was >5 standard deviations from the overall sample mean. Analyses were performed using SAS software (USA), version 9.3 (8).
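The exclusion rule can be sketched as a simple filter. The interpretation below is an assumption: the difference criterion is applied to the deviation of each within-pair difference from the mean difference, and the second criterion uses the standard deviation of the pair means. The function name and the data are hypothetical.

```python
from statistics import fmean, stdev


def split_outliers(pairs, diff_sd=3.0, mean_sd=5.0):
    """Separate (original, QC) pairs into kept and removed lists."""
    diffs = [qc - orig for orig, qc in pairs]
    means = [(orig + qc) / 2 for orig, qc in pairs]
    d_mean, d_sd = fmean(diffs), stdev(diffs)
    m_mean, m_sd = fmean(means), stdev(means)
    kept, removed = [], []
    for pair, d, m in zip(pairs, diffs, means):
        # Criterion 1: within-pair difference > 3 SD from the mean difference
        # Criterion 2: pair mean > 5 SD from the overall mean of pair means
        is_outlier = (abs(d - d_mean) > diff_sd * d_sd
                      or abs(m - m_mean) > mean_sd * m_sd)
        (removed if is_outlier else kept).append(pair)
    return kept, removed


# Hypothetical data: 29 well-behaved pairs plus one discordant pair
pairs = [(20.0 + i % 5, 20.0 + i % 5 + (1.0 if i % 2 else -1.0))
         for i in range(29)]
pairs.append((20.0, 60.0))
kept, removed = split_outliers(pairs)
print(len(kept), removed)
```

Because the thresholds are computed from the full sample, a very extreme pair inflates the standard deviations it is judged against; the rule is applied once here rather than iteratively.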
Results
The inter- and intra-assay CVs for the analytes are reported in Table 1. This table also presents external analytical CV (CVa) specifications based on biological variation, for comparison. CVs obtained were generally quite small. Intra-assay CVs varied from 0.86% for HbA1c to 3.97% for urinary calcium; inter-assay CVs varied from 1.28% for sodium to 7.77% for insulin. Most CVs were lower than the CVa, with serum creatinine, serum sodium, and glycated hemoglobin being notable exceptions.
The number of blind replicate QC samples obtained varied from 72 to 94 across analytes. Fewer QC samples were available for analytes obtained 2 h after the glucose load, as participants with diabetes were excluded from this test.
Table 2 presents QC results for all blind replicate pairs for each analyte, and Table 3 shows results for pairs after the removal of outliers. The number of pairs removed varied from 0 to 4. All analytes had an ICC above 0.93 except for serum sodium (0.50) and potassium (0.73) (Table 2). The removal of pairs with outliers (Table 3) generally produced only a small improvement in ICC values; the exception was serum sodium, for which the ICC increased from 0.50 to 0.61.
Figure 1 presents Bland-Altman plots of the pair differences against average pair values for selected analytes. Panel A shows data for glycated hemoglobin and Panel B for fasting insulin, the latter demonstrating a small systematic difference between the original sample and the QC sample (-3.79; 95%CI=-5.83 to -1.76). Other similarly small differences (Supplementary Figures) were seen between pairs for serum creatinine (2.34; 95%CI=1.19–3.49), serum potassium (0.16; 95%CI=0.12–0.20), and TSH (0.06; 95%CI=0.01–0.11). Although both glycated hemoglobin and insulin had outliers, their removal had only a small effect on CV and ICC values.
Bland-Altman plots for glycated hemoglobin as a proportion of total hemoglobin (all pairs, Panel A) and insulin in pmol/L (all pairs, Panel B). The red line indicates the mean within-pair difference found between the original and the quality control (QC) measures. The green dashed lines indicate 2 standard deviations above and below this mean, and the blue dotted line the 95%CI for the mean of the within-pair difference.
In order to visualize the pairs and the influence of outliers, Figure 2 shows results for alanine transaminase, prior to (Panel A) and after exclusion of outliers (Panel B). As can be seen in Panel A, a single extreme outlier, probably due to liver disease, makes QC analysis difficult. In fact, as can be seen when comparing results for this analyte in Tables 2 and 3, after outlier exclusions the CV decreased from 12.5 to 6.33%. The ICCs, prior to and after this exclusion were high with laboratory variability accounting for less than 2% of the overall variance.
Bland-Altman plots for alanine transaminase in U/L (all pairs, Panel A; after exclusion of outliers, Panel B). The red line indicates the mean within-pair difference found between the original and the quality control (QC) measures. The green dashed lines indicate 2 standard deviations above and below this mean, and the blue dotted line the 95%CI for the mean of the within-pair difference.
Figure 3 presents the Bland-Altman plot for serum sodium, demonstrating the narrow biological range of this analyte.
Bland-Altman plot of serum sodium in mmol/L (all pairs), an example of an analyte with a narrow biological range. The red line indicates the mean within-pair difference found between the original and the quality control (QC) measures, the green dashed lines indicate 2 standard deviations above and below this mean, and the blue dotted line the 95%CI for the mean of the within-pair difference.
Discussion
This study demonstrates the excellent quality of the sample collection and processing, and of the laboratory measurement of biochemical analytes, in the ELSA-Brasil study, in a sample of free-living Brazilian adults. Laboratory intra- and inter-assay variability was almost always below recommended international standards. Between-person variability accounted for more than 90% of the total variability with few exceptions, and was usually greater than 98%.
For epidemiological studies involving associations between exposures and diseases, the most inclusive and relevant of the QC measures is the ICC. The ICC, also called the reliability coefficient, can be interpreted as the ratio of the between-individual variance to the total variance. The existence of noise (underlying within-person variability) as a percentage of the signal (true value of exposure, in this case the laboratory values) lowers the ICC values. Therefore, when testing epidemiological associations, the greater the noise, the more difficult it is to detect real associations at a level of statistical significance. One important and partially controllable source of this variability is the intraindividual variation, sometimes called "measurement error". The overall reliability coefficient evaluates the fraction of the population variability of a given variable that is not due to measurement error (intra-individual variability).
This measurement error could originate from biological variation or from either pre-analytical or analytical sources. Here, with samples taken from a single participant visit, the biological variation was not evaluated. Pre-analytical errors include those related to variability in collection (e.g., hemolysis), processing (e.g., delay in chilling and centrifugation leading to metabolism of glucose in the collection tube), storage (e.g., inadequate temperature maintenance) and transportation (e.g., premature thawing).
Also to be considered is the possible mixing of tubes among participants during the collection processes. In this study, none of the outliers were clustered within a particular participant, indicating that serious errors resulting from switching or erroneously labeling collection tubes or aliquots, or assigning laboratory results to the wrong participant, are unlikely.
Analytical errors include laboratory determination errors (e.g., reagent variability between kits). The intra- and inter-assay CVs available from the ELSA central laboratory indicate, in general, very small error in this part of the measurement process. With few exceptions, CVs of measured analytes were lower than the analytical quality specifications based on components of biological variation (9). Of note, some analytes, such as insulin, are typically measured with greater variability.
Thus, the bulk of the variability in reported measurements seems likely to originate from a combination of smaller problems in analyte collection, processing and transportation, coupled with the susceptibility of each analyte to these problems.
The markedly lower ICCs of serum sodium and potassium highlight an interesting phenomenon. As the CVs for these analytes were quite low, the low ICCs can be attributed to their low between-person variability, as illustrated by the narrow distribution of the values in the Bland-Altman plots for serum sodium (Figure 3) and potassium (Supplementary Figures). The ranges of these analytes, vital to body function, are tightly controlled within narrow limits. Nevertheless, their ICCs indicate that measurement error, which is large relative to this narrow between-person variability, would make it more difficult to find statistically significant associations involving either of them in the ELSA sample.
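A small numerical sketch may help show how a small CV and a low ICC can coexist. The values below are hypothetical and are not ELSA-Brasil estimates; they only contrast an analyte with a narrow between-person range against one with a wide range, using the definitions of the CV (error standard deviation over the mean) and the ICC (between-person variance over total variance).

```python
def cv_and_icc(mean, between_sd, error_sd):
    """Return (CV in %, ICC) for given variance components."""
    cv = 100 * error_sd / mean
    icc = between_sd ** 2 / (between_sd ** 2 + error_sd ** 2)
    return cv, icc


# Hypothetical narrow-range analyte: error SD equals the between-person SD
narrow_cv, narrow_icc = cv_and_icc(mean=140.0, between_sd=1.4, error_sd=1.4)

# Hypothetical wide-range analyte: larger CV, but far larger between-person SD
wide_cv, wide_icc = cv_and_icc(mean=100.0, between_sd=35.0, error_sd=5.0)

print(f"narrow range: CV = {narrow_cv:.1f}%, ICC = {narrow_icc:.2f}")
print(f"wide range:   CV = {wide_cv:.1f}%, ICC = {wide_icc:.2f}")
```

In this sketch the narrow-range analyte has the smaller CV (1% versus 5%) but the lower ICC (0.50 versus 0.98), because the ICC compares measurement error with between-person spread rather than with the mean.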
The CV, popular in reports of measurement error, is of utility in determining the extent to which the analyte value of a given participant will vary, in relation to the analyte's mean value, from one measurement to the next - a relevant issue in the determination of the analyte's clinical utility. However, as the CV does not take other sources of variability into consideration, it has a lesser role in evaluating the quality of measurements in epidemiologic studies of associations. In this study, the blind replicate CVs were higher than the intra- and inter-assay CVs because they include not only laboratory measurement variability, but also the pre-analytical sources of error usually present in multi-center studies with centralized measurements.
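The relation between the blind replicate CVs and the assay CVs can be sketched with a simple variance decomposition. This is only an illustration, assuming that pre-analytical and analytical errors are independent and additive and that all CVs are expressed relative to the same mean; the subscripts are introduced here and are not taken from the study.

```latex
\sigma^2_{\mathrm{replicate}} = \sigma^2_{\mathrm{pre}} + \sigma^2_{\mathrm{analytical}}
\quad\Longrightarrow\quad
\mathrm{CV}_{\mathrm{replicate}}
  = \sqrt{\mathrm{CV}_{\mathrm{pre}}^{2} + \mathrm{CV}_{\mathrm{analytical}}^{2}}
  \;\ge\; \mathrm{CV}_{\mathrm{analytical}}
```

Under these assumptions, the replicate CV can equal the analytical CV only when the pre-analytical contribution is zero.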
The Bland-Altman plots provide a visual analysis of these QC results. One can see potentially systematic differences among replicates, such as that found for fasting insulin (Figure 1). This difference, when present, was always small, and most likely due either to the fact that the QC sample was always drawn last, or to chance, given the multiple comparisons performed. The plots show no evidence of increased or decreased variability as a function of the analyte average value, which might be present, for example, if inadvertent thawing had led to loss of analytes. The +/- 2 standard deviation boundaries were relatively close to the means, once again indicating a small measurement error.
The decision to report results with and without outliers is not always easy. For example, it seems logical to use the ICC and CV obtained for alanine transaminase after exclusion of the outlier seen in the Bland-Altman plot, given that the mean of the outlier pair was more than 30 times the mean of the remaining pairs, leading to a potentially large weighting of this single value. The outlier value was apparently obtained from a participant with liver disease and, given the large difference within the pair (>50 U/L), likely outside the range of precision of the measurement technique. For precision evaluation in an epidemiological study in which the overwhelming majority of participants have values within or near the normal range, inclusion of such outliers makes interpretation of the findings difficult. As can be seen when comparing the Bland-Altman plots before and after the outlier exclusion, it obscures the analysis of measurement error of values within the relevant range. In contrast, outliers in other analytes, both for average values and those for differences between the QC sample and its pair, were close to the exclusion boundaries. As such, these values should probably be considered measurements within the relevant range of study, and their differences should be included in our final estimates of measurement variability.
Two limitations of our analysis merit mention. First, as QC samples were collected, processed, and shipped simultaneously with their corresponding pair samples, some of the measurement variability occurring during these steps could have been underestimated. Additionally, an important aspect of variability - temporal biological variability - was not assessed, as participants were not recalled to repeat measurements on another day. Thus, for a thorough evaluation of the measurement reliability for epidemiological studies, consideration of this within-participant day-to-day variability would require incorporation of data from other studies.
In conclusion, these analyses of measurement variability are important for QC documentation in multi-center studies during which strict QC measures are necessary to guarantee accurate results. The biochemical analytes of the ELSA-Brasil here reported were measured with high reliability. Based on this, they should serve well as exposure variables and co-variables for most studies of associations, especially given the large sample size of the cohort.
We acknowledge the life-long commitment of the participants from the Brazilian Longitudinal Study of Adult Health, and the dedication of the study's staff. Without their commitment and enthusiasm, the ELSA-Brasil could have never reached its current state.
The ELSA-Brasil study was supported by the Brazilian Ministry of Health (Science and Technology Department) and the Brazilian Ministry of Science, Technology and Innovation (Financiadora de Estudos e Projetos - FINEP and the National Council for Scientific and Technological Development - CNPq) (grants #01 06 0010.00 RS; #01 06 0212.00 BA; #01 06 0300.00 ES; #01 06 0278.00 MG; #01 06 0115.00 SP; #01 06 0071.00 RJ).
1. Kenny D, Fraser CG, Hyltoft Petersen P, Kallner A. Consensus agreement. Scand J Clin Lab Invest 1999; 59: 585–586, doi: 10.1080/00365519950185409.
2. Szklo M, Nieto F. Epidemiology, beyond the basics. 3rd edn. Boston: Jones and Bartlett Publishers; 2012.
3. Aquino EM, Barreto SM, Bensenor IM, Carvalho MS, Chor D, Duncan BB, et al. Brazilian Longitudinal Study of Adult Health (ELSA-Brasil): objectives and design. Am J Epidemiol 2012; 175: 315–324, doi: 10.1093/aje/kwr294.
4. Fedeli LG, Vidigal PG, Leite CM, Castilhos CD, Pimentel RA, Maniero VC, et al. [Logistics of collection and transportation of biological samples and the organization of the central laboratory in the ELSA-Brasil]. Rev Saúde Pública 2013; 47 (Suppl 2): 63–71, doi: 10.1590/S0034-8910.2013047003807.
5. Schmidt MI, Griep RH, Passos VM, Luft VC, Goulart AC, Menezes GM, et al. [Strategies and development of quality assurance and control in the ELSA-Brasil]. Rev Saúde Pública 2013; 47 (Suppl 2): 105–112, doi: 10.1590/S0034-8910.2013047003889.
6. Lachin JM. The role of measurement reliability in clinical trials. Clin Trials 2004; 1: 553–566, doi: 10.1191/1740774504cn057oa.
7. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1: 307–310, doi: 10.1016/S0140-6736(86)90837-8.
8. Statistical Analysis System (SAS). [Computer program]. Version 9.3. Cary: SAS Institute Inc.; 2011.
9. Fraser CG, Stevenson HP, Kennedy IMG. Biological variation data are necessary prerequisites for objective autoverification of clinical laboratory data. Accred Qual Assur 2002; 7: 455–460, doi: 10.1007/s00769-002-0526-3.
Publication in this collection
2 Mar 2016
20 Apr 2016