Aspect | Characteristic | Conceptual definition and strategies | What to observe
Aspect: Validity
Characteristic: Dimensional validity
Conceptual definition and strategies: The correspondence that should exist between the instrument's internal structure and the structure theorized for the phenomenon under evaluation. For example, if the instrument aims to measure common mental disorders and includes depression and anxiety as its two dimensions of interest, a statistical analysis of the instrument should reveal those dimensions.
What to observe: Results of exploratory and confirmatory factor analyses demonstrating the correspondence between the postulated structure of the phenomenon and the loading of the instrument's items on their respective dimensions. Returning to the example, a factor analysis of the instrument for common mental disorders should show that the questions regarding anxiety group on their underlying factor (anxiety) and the questions about depression are associated with theirs (depression).
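As a rough illustration of the pattern a factor analysis should reveal, the sketch below uses invented responses and hypothetical item names to check that items written for the same dimension correlate more strongly with each other than with items from the other dimension. A real validation study would run exploratory or confirmatory factor analysis; pairwise correlations are only a simplified proxy.

```python
# Toy check of dimensional validity: anxiety items should correlate with each
# other more strongly than with depression items. All data are hypothetical.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical responses of five subjects to four items (1-5 scale)
items = {
    "anx1": [1, 2, 3, 4, 5],
    "anx2": [2, 1, 3, 5, 4],
    "dep1": [2, 5, 1, 4, 3],
    "dep2": [2, 4, 1, 5, 3],
}

within_anx = pearson(items["anx1"], items["anx2"])  # high: same dimension
within_dep = pearson(items["dep1"], items["dep2"])  # high: same dimension
cross      = pearson(items["anx1"], items["dep1"])  # low: across dimensions
```

With these invented numbers the within-dimension correlations (0.8 and 0.9) clearly exceed the cross-dimension one (0.1), which is the grouping pattern the factor analysis is expected to formalize.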
Aspect: Validity
Characteristic: Construct validity
Conceptual definition and strategies: The instrument's ability to measure what it intends to assess when there is no other tool considered the "gold standard" for the phenomenon of interest. Construct validity can be established by several methods, including:
• Extreme groups: the instrument is applied to two groups, one presumed to have the characteristic of interest and the other presumed not to have it.
• Convergent validity: comparison between the assessments obtained with the instrument of interest and those obtained with another scale that measures the same phenomenon.
• Discriminant or divergent validity: obtained by testing the correlation between the results of the instrument and those of another one used to measure a different construct.
What to observe: In the extreme-groups comparison, finding that the instrument confirms the hypothesis that one group has the feature of interest and the other does not is an indication of validity. In the convergent case, the results of the two instruments are expected to point in the same direction (to be positively correlated with each other). In the discriminant case, the correlation between the results of the different instruments should be close to zero.
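A minimal sketch of the convergent and discriminant checks, using invented scores: a new scale should correlate positively with an established scale for the same phenomenon, and only weakly with a scale for an unrelated construct. All instrument names and data here are hypothetical.

```python
# Convergent vs. discriminant validity with hypothetical scores from five
# subjects on three instruments.
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

new_scale        = [10, 12, 15, 18, 20]  # instrument under validation
established      = [11, 13, 14, 19, 21]  # same phenomenon, existing scale
unrelated_scale  = [16, 12, 19, 11, 18]  # different construct

convergent_r   = pearson(new_scale, established)      # expected: strongly positive
discriminant_r = pearson(new_scale, unrelated_scale)  # expected: near zero
```

With these numbers the convergent correlation is about 0.98 and the discriminant one about 0.12, matching the expected pattern.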
Aspect: Validity
Characteristic: Criterion-related validity
Conceptual definition and strategies: The ability of the instrument to measure what it proposes whenever an instrument considered the "gold standard" exists. Verifying this validity involves applying two instruments, the one intended for use and the reference, and observing the correlation between them. Criterion validity is typically divided into two subtypes:
• Concurrent or simultaneous validity: tests the correlation of the instrument of interest with the "gold standard" when both are applied at the same time.
• Predictive validity: determined by the ability of the instrument to predict a future event, verified by a subsequent application of the reference instrument.
What to observe: In both cases, the correlation between the instrument of interest and the "gold standard" supports the validity argument for the former.
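The predictive-validity idea can be sketched with invented data: subjects scoring high on the baseline instrument should more often receive the diagnosis when the reference ("gold standard") instrument is applied later. The cutoff and all values below are hypothetical.

```python
# Hypothetical predictive-validity check: baseline screening scores vs. a
# later gold-standard diagnosis (1 = diagnosed, 0 = not diagnosed).
baseline        = [3, 5, 12, 14, 4, 15, 2, 13]
diagnosed_later = [0, 0,  1,  1, 0,  1, 0,  0]

cutoff = 10  # hypothetical screening threshold
high = [d for s, d in zip(baseline, diagnosed_later) if s >= cutoff]
low  = [d for s, d in zip(baseline, diagnosed_later) if s < cutoff]

rate_high = sum(high) / len(high)  # diagnosis rate among high scorers
rate_low  = sum(low) / len(low)    # diagnosis rate among low scorers
```

Here 3 of 4 high scorers but none of the low scorers go on to be diagnosed, which is the direction of association predictive validity requires; a formal analysis would use a correlation or an ROC-type measure rather than a single cutoff.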
Aspect: Reliability
Characteristic: Internal consistency
Conceptual definition and strategies: The degree to which the items of the instrument are intercorrelated, measuring the same underlying construct. As an illustration, if we wish to measure individuals' functional capacity and have several items (questions) for it, those items should correlate highly with one another. Measures used to assess internal consistency include Cronbach's alpha coefficient and the Kuder-Richardson coefficient, among others. In all cases, internal consistency can be estimated with a single application of the instrument to the sample under evaluation.
What to observe: The minimum acceptable value for these coefficients is 0.8.
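Cronbach's alpha can be computed from a single application of the instrument; a minimal sketch with hypothetical scores follows. Population variance is used on both the items and the total, so the sample-variance correction cancels in the ratio and the result is the conventional alpha.

```python
# Cronbach's alpha for a k-item scale: rows are subjects, columns are items.
# Data are hypothetical.
from statistics import pvariance

def cronbach_alpha(rows):
    k = len(rows[0])
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

scores = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 3, 4],
    [4, 5, 4],
]
alpha = cronbach_alpha(scores)  # 0.95 here, above the 0.8 threshold in the text
```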
Aspect: Reliability
Characteristic: Temporal stability
Conceptual definition and strategies: The consistency of the instrument's results when it is applied repeatedly under similar conditions. Stability may be assessed in different ways, including:
• The degree of agreement between different observers using the same instrument (inter-observer reliability).
• The consistency of the observations made by the same examiner at different moments in time (intra-observer reliability or test-retest).
What to observe: The minimum acceptable value for these coefficients is 0.5.
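Inter-observer reliability for categorical ratings is commonly quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch with hypothetical binary ratings (presence/absence) from two observers:

```python
# Cohen's kappa for two observers rating the same ten subjects (hypothetical
# data): kappa = (observed agreement - chance agreement) / (1 - chance).
def cohen_kappa(a, b):
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                    # observed agreement
    cats = set(a) | set(b)
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)   # chance agreement
    return (po - pe) / (1 - pe)

rater1 = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
rater2 = [1, 1, 0, 0, 0, 0, 1, 1, 0, 1]
kappa = cohen_kappa(rater1, rater2)  # 0.6 here, above the 0.5 threshold in the text
```

Raw agreement here is 0.8, but half of that is expected by chance, which is why kappa (0.6) is the more informative reliability figure.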