Introduction

Comparison is a fundamental part of science and the social sciences. The field of comparative education was initially based on more qualitative cultural and historical approaches to making comparisons (^{Edwards; Holmes; Van de Graff, 1973}; ^{Schriewer; Holmes, 1992}). I would argue this was generally true across many of the social sciences and applied fields like education until the 1950s and 1960s when statistical methods began to be widely applied. A turning point in comparative education was the publication of Harold ^{Noah and Max Eckstein’s 1969} book, Toward a Science of Comparative Education. They argued that the cultural and historical methods of comparison that had been used in comparative education were generally not sufficiently scientific or precise and that what was needed was the widespread application of the quantitative methods that were being used in social sciences like economics and sociology. While today there has been a resurgence of interest in qualitative methods, even in comparative education (^{Bray; Adamson; Mason, 2007}), quantitative methods still dominate, especially in the policy arena.

The focus of this paper is regression analysis. Regression analysis forms the core for a family of techniques including path analysis, structural equation modelling, hierarchical linear modelling, and others. Regression analysis makes all comparisons straightforward by turning all categories into variables (countries, regions, races, genders, classes, programs, policies etc.) whose impact is measured in regression results. Regression analysis is perhaps the most-used quantitative method in the social sciences, most especially in economics and sociology but it has made inroads even in fields like anthropology and history. Regression analysis in education research (along with experiments) is still seen as the most objective and scientific approach. Regression analysis forms the principal basis for determining the impact of education and other social policies and, as such, has enormous influence on almost all public policy decisions.

This paper raises fundamental questions about the utility of regression analysis for causal inference. I argue that the conditions necessary for regression analysis to yield valid causal inferences are so far from ever being met or approximated that such inferences are never valid. This dismal conclusion follows clearly from examining these conditions in the context of three widely-studied examples of applied regression analysis: earnings functions, education production functions, and aggregate production functions. Since, within comparative education, my field of specialization is the economics of education, I approach each of these examples from that perspective. Nonetheless, I argue that my conclusions are not particular to looking at the impact of education or to these three examples, but that the underlying problems exhibited therein generally hold true when making causal inferences from regression analyses about other variables and on other topics.

Overall Argument

In some fields, regression analysis is used as an ad hoc empirical exercise for moving beyond simple correlations. Researchers are often interested in the impact of a particular independent variable on a particular dependent variable and use regression analysis as a way of controlling for a few covariates. Despite being common, in many fields such empirical fishing expeditions are frowned upon because the result of particular interest (the coefficient on the key independent variable under examination) will depend on which covariates are selected as controls.
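The dependence of the key coefficient on the choice of controls can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the variable names x, c1, and c2, the covariance matrix, and the "true" coefficients are invented, and the calculation uses large-sample (population) regression coefficients rather than any real data set.

```python
from itertools import combinations

# Hypothetical setup: y = 0.5*x + 1.0*c1 - 2.0*c2 + noise (all numbers invented).
NAMES = ["x", "c1", "c2"]
COV = [  # assumed covariance matrix of the regressors (unit variances)
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
TRUE_BETA = [0.5, 1.0, -2.0]

# Covariance of each regressor with y implied by the assumed model.
COV_WITH_Y = [sum(COV[i][j] * TRUE_BETA[j] for j in range(3)) for i in range(3)]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def coefficient_on_x(controls):
    """Large-sample coefficient on x when regressing y on x plus the given controls."""
    idx = [0] + list(controls)
    a = [[COV[i][j] for j in idx] for i in idx]
    b = [COV_WITH_Y[i] for i in idx]
    return solve(a, b)[0]


# Try every subset of the two candidate controls.
results = {}
for k in range(3):
    for controls in combinations([1, 2], k):
        results[tuple(NAMES[i] for i in controls)] = coefficient_on_x(controls)

for controls, coef in results.items():
    print(f"controls={controls or 'none'}: coefficient on x = {coef:.2f}")
```

Under these assumed numbers the coefficient on x is 0 with no controls, about -0.83 controlling only for c1, about 1.17 controlling only for c2, and 0.5 (the assumed true value) only when both are included. The sign and size of the result of interest are driven entirely by which covariates happen to be selected.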

To the contrary, nowadays, most fields teach that one has to be serious about causal modeling in order to use regression analysis for causal inference. Causal models require certain conditions to hold for regression coefficients to be accurate and unbiased estimates of causal impact. While these conditions are often expressed as properties of regression residuals, they also may be expressed as three necessary conditions for the proper specification of a causal model examining a particular (or set of) dependent variable(s):

All relevant variables are included in the model;

All variables are measured properly; and

The correct functional interrelationships of the variables are specified.

In order to achieve proper specification, one must have a very well elaborated theory that allows one to fulfill these conditions^{2}. The fundamental problem with regression analyses is that we do not have sufficiently complete theories in any of our fields to properly specify causal models. Regression analysis application literatures therefore generally become discussions about the degree of misspecification and its consequences. Unfortunately, regression analysis theory is very unforgiving; with just one omitted variable, all regression coefficients may be biased to an unknown extent and in an unknown direction. While researchers sometimes use ad hoc reasoning to infer the direction of bias of particular omitted variables, they do so based on its potential correlation with a particular included independent variable of interest. However, this ad hoc reasoning is not valid. The direction of bias will depend on the intercorrelation of the omitted variable with all the included variables. Ad hoc reasoning does not offer a clue as to how biased included coefficients are.
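The point about the direction of bias can be made concrete with the standard large-sample omitted-variable formula. The sketch below is illustrative only: it assumes two included variables (x1, x2) and one omitted variable (z), all with unit variance, and the correlations and the effect of z (gamma) are invented numbers.

```python
def omitted_variable_bias(r12, r1z, r2z, gamma):
    """Large-sample bias in the coefficients on x1 and x2 when z is omitted.

    Assumes y = b1*x1 + b2*x2 + gamma*z + noise, unit variances, and
    correlations r12 = corr(x1, x2), r1z = corr(x1, z), r2z = corr(x2, z).
    The bias is gamma times the coefficients from regressing z on (x1, x2).
    """
    det = 1.0 - r12 ** 2
    delta1 = (r1z - r12 * r2z) / det
    delta2 = (r2z - r12 * r1z) / det
    return gamma * delta1, gamma * delta2


# Hypothetical case: z is POSITIVELY correlated with x1 and has a POSITIVE effect
# on y, so ad hoc reasoning would predict an upward bias on x1.
bias_x1, bias_x2 = omitted_variable_bias(r12=0.8, r1z=0.3, r2z=0.6, gamma=1.0)
print(f"bias on x1: {bias_x1:+.2f}, bias on x2: {bias_x2:+.2f}")
```

With these assumed correlations the bias on x1 is negative (about -0.5) even though z is positively correlated with x1 and positively related to y, because z is more strongly tied to the other included variable, x2. With dozens of included and omitted variables, no such sign can be reasoned out informally.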

More to the point, we are never talking about the simple case of a single omitted variable. We are faced with multiple failures of all three assumptions: many variables are always omitted; we have little idea of how best to measure the variables we are able to include; and we have hardly any idea of their functional form. This is best illustrated by looking at concrete examples of regression analysis literatures, as I do below.

Earnings Functions

Earnings functions are used principally by economists and sociologists to investigate the determinants of earnings differences. They are probably one of the most-regressed topics of study and have been especially relevant to the economics of education as the source of rate of return to education estimates (^{Blaug, 1976}; ^{Psacharopoulos; Patrinos, 2004}). I find earnings functions especially interesting because they are one of the few terrains where social scientists on the left and the right have competed, principally because of arguments about labor market segmentation. In economics, this was about a challenge to the neoclassical idea that there was just one big perfect labor market in which success was determined by your individual human capital characteristics. To the contrary, political economists and other critics of the neoclassical story saw an imperfect labor market with fractures (e.g., divisions into primary and secondary labor markets) and structures (e.g., large firms, unions, sexism, and racism) that greatly influenced whether an individual succeeded. In sociology, this was about a similar challenge to the idea, generated from the dominant structural-functionalist theory and its derivative status attainment theory, that, as in economics, individual success was determined chiefly by individual characteristics. To the contrary, critical sociologists, often sharing a conflict theory critique of structural functionalism, argued, like political economists, that success in the labor market was greatly determined by structural factors. Each side in this debate used regression analysis to prove their point of view (^{Klees; Milton, 1993}).

More to the point here is that there have literally been hundreds of earnings function studies, with each study using anywhere from somewhat to vastly different specifications. The three principal conditions necessary for the regression coefficients of an earnings function to be accurate estimators of true causal impact are very far from being fulfilled. First, all relevant variables that may affect earnings can never be included. Our theories literally posit dozens of variables, and which variables are included in a particular regression study is again idiosyncratic. Examples of variables that some researchers have considered relevant are: health status, years of schooling, quality of schooling, type of schooling, cognitive ability, race, ethnicity, religion, socioeconomic status, gender, immigration status, marital status, participation in a union, job search, occupation status and differentiation, labor market segment, firm and industry characteristics, and many more. Second, we do not know the right way to measure most of these variables. Measurement is ad hoc and varies from study to study. Third, the functional interrelationship between variables is not known. While it is common to use the natural logarithm of income as the dependent variable, even neoclassical economists admit the basis for doing so is very weak and, in actuality, what would be needed is to specify some unknown set of complex simultaneous equations filled with variables subject to complex interactions (^{Blaug, 1976}; ^{Klees; Milton, 1993}).

The result of this state of affairs is endless misspecification - by necessity^{3}. Each researcher has an almost infinite array of choices in how they specify the earnings function they estimate. Each regression study is never a replication but always different from others in many respects. The upshot is each regression study is idiosyncratic. Since it is relatively easy to get significant coefficients, especially with large data sets, everyone finds their particular variable of interest to be significant. When there is controversy, everyone finds empirical evidence to support their side of the debate. Every segmentation theorist finds labor market segments to be a significant factor in determining earnings and other labor market outcomes, yet no neoclassical economist or structural functionalist sociologist ever does.
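The remark that significant coefficients are easy to obtain with large data sets can be checked with simple arithmetic. The sketch below uses the standard t-statistic for a bivariate regression slope, t = r * sqrt((n - 2) / (1 - r^2)); the correlation of 0.02 and the sample sizes are hypothetical values chosen only for illustration.

```python
import math


def slope_t_statistic(r, n):
    """t-statistic for the slope in a bivariate regression with sample correlation r."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


R = 0.02  # hypothetical correlation: the regressor "explains" 0.04% of the variance
t_by_n = {n: slope_t_statistic(R, n) for n in (100, 10_000, 1_000_000)}

for n, t in t_by_n.items():
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"n={n:>9,}: t = {t:6.2f} ({verdict} at the 5% level)")
```

An association that is negligible in substantive terms stays far from conventional significance at n = 100 but clears the 5% threshold by a wide margin at n = 1,000,000. Significance in large samples therefore says little about whether a variable matters, and even less about whether the specification is right.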

With respect to education, most everyone finds some effect of education on earnings, reports it, and sometimes uses it to estimate a rate of return. But alternative specifications always yield different results, and so the estimates are notoriously unstable and inconsistent. ^{Hanushek (1980}, p. 240) argued that

[…] estimated rates of return for years of schooling particularly in regression estimates [on earnings], considering other individual differences appear very unstable: changes in sample, changes in time periods, changes in precise model specifications yield enormous changes in estimated rates of return.

The estimated impacts of education on earnings and associated rates of return are basically arbitrary, the result of ad hoc empiricism run rampant.

Education Production Functions

Another very common use of regression analysis is to estimate what are called education production or input-output functions (^{Levin, 1976}; ^{Hanushek, 1986}). The dependent variable usually studied is a student’s score on some achievement test. The three conditions for proper specification are again impossible to fulfill. First, the array of potential independent variables is huge, including, for example: socioeconomic status, gender, race, ethnicity, age, homework effort, computer use in the home, previous learning, ability, motivation, aspiration, peer characteristics, teacher degree level, teacher practices, teacher ability, teacher experience, class size, school climate, principal characteristics, curriculum policies, to name a few. Second, there is no agreement on how to measure most, if not all, of these variables. Third, again the possible functional interrelationships are innumerable. Contrary to the linear formulation usually run, recursive and simultaneous equation formulations with an array of interaction terms among the independent variables have been posited but little used (Levin, 1976; Hanushek, 1986).

Economists of education, sociologists of education, and other educational researchers have estimated hundreds of these functions. Again, with such an infinite array of specification choices, almost every study is unique and idiosyncratic. ^{Hanushek (1979}; 1986; 2004) has, over the long term, studied and summarized the results of such studies. Not surprisingly, he and others have found inconsistent results. However, he and the vast majority of quantitative researchers cling to the hope that improvements in models and data can eventually show some clear results. To the contrary, I see the complete indeterminacy of this form of research built into the very assumptions on which it is based.

A particularly destructive use of these functions is for so-called performance pay for teachers. The value added to student achievement test scores by individual teachers is ascertained through estimating an educational production function, usually using only a few control variables, with teacher effects determined by dummy variables or residuals (American Educational Research Association, 2016). The problem, of course, is that with different control variables different teachers are ranked high or low, and there is neither rhyme nor reason to choosing one specification over another. Yet around the U.S. teachers are being hired and fired based on these completely spurious results.

Aggregate Production Functions

While many economics of education studies have looked at the impact of education inputs on student achievement, and others have focused on the connection between education and earnings as a proxy for productivity, some studies have tried to look more directly at the connection between education and productivity by looking at the effect of education on economic growth, as measured by GNP. Indeed, some of the earliest work on human capital examined the correlation between levels of education or school enrollments in a country and its GNP (^{Bowman, 1966}; ^{Blaug, 1970}). However, correlation is not causation, and these studies were quickly dismissed as neither controlling for other differences between countries nor demonstrating which was cause and which was effect (Blaug, 1970).

The most significant early, and still widely quoted, work that tried to take a more sophisticated look at the connection between education and GNP was by Edward ^{Denison (1961}, 1967). Denison focused on a particular form of what economists call an “[…] aggregate production function” (Denison, 1961; 1967). Just as an earnings function tries to look at all the variables that might affect earnings, a production function looks more directly at all the variables that might affect production output in a particular industry. An aggregate production function, as the name implies, looks at the effect of inputs on total production output, that is, GNP. This approach, in theory, could get around the need to assume earnings reflect productivity by directly looking at the impact of education on output. However, Denison’s famous work did not do this. Instead of estimating an aggregate production function, it assumed one of a particular form and then used education’s association with earnings as the evidence of education’s impact on GNP, thus offering nothing different than the results offered by the problematic education-earnings connection discussed above. ^{Blaug (1970}) dismissed all this early research: “In short, we learn from international comparisons [of education and GNP]…that we do not learn from international comparisons” (Blaug, 1970, p. 100).

Attempting to connect education directly to GNP generally fell out of favor until the late 1980s and 1990s when a few works in the area of what was called new growth theory signaled a broader vision of education’s contribution (^{Romer, 1986}; ^{Lucas, 1996}; ^{Psacharopoulos; Patrinos, 2004})^{4}. This vision is theoretically interesting in that education is seen not just as contributing to worker productivity but as enhancing growth through a variety of mechanisms and externalities. However, empirically these new directions have proven extremely difficult to model mathematically. Almost every researcher who attempts to estimate these connections therefore uses a different model and the results are, as one would expect, typically idiosyncratic, unstable, and inconsistent (Psacharopoulos; Patrinos, 2004; ^{Stevens; Weale, 2004}). In 1970, Blaug said the “Mecca of the economics of education lies elsewhere” (^{Blaug, 1970}, p. 100), and I think that holds true today, for reasons similar to the ones I discussed for education and earnings and for educational inputs on outputs.

As I said, the results of the empirical research estimating the impacts above have been idiosyncratic, unstable, and inconsistent. The same is true for the impact of education on GNP for similar reasons. First, there is no agreement on how to measure the stock or flow of human capital in a country. Various proxies have been used but, as ^{Psacharopoulos and Patrinos (2004}) admit, such measurement may be the weakest point of these studies: “Such data have serious intertemporal and inter-country comparability problems, and there are data gaps often filled with constructed data based on interpolations and extrapolations” (Psacharopoulos; Patrinos, 2004, p. 13-14).

Second, more to my general point, as ^{Psacharopoulos and Patrinos (2004}, p. 15) also admit: “Countries also differ in many other aspects than those measured by physical and human capital stock…” that can affect GNP. Estimates of aggregate production functions have literally used dozens of different variables as inputs, such as climate, latitude, access to waterways, transportation infrastructure, technological development, investment climate, cultural and political differences, fiscal and monetary policy etc. (^{Stevens; Weale, 2004}; ^{Hulten; Issakson, 2007}; Hulten, 2009)^{5}. Empirical studies idiosyncratically choose some of these input variables, from those available in the data set being used, and always omit many others. As Psacharopoulos and Patrinos (2004, p. 15) again admit: “These omitted variables can lead to margins of error of hundreds of per cent in accounting for differences in the economic growth path between countries”.

Third, it is widely recognized by economists that the linear functional form so commonly used in regression analysis studies is not applicable to aggregate production functions. However, there is considerable debate over what functional form to use and different functional forms yield different estimates of the impact of education (and of all other inputs) on GNP (^{Stevens; Weale, 2004}). There is even a respected school of economics that says that there is no theoretical basis for even believing that an aggregate production function actually exists. Each good and service may have a production function, meaning some mathematical regularity in how inputs like land, labor, capital, and technology combine to produce televisions, yachts, insurance policies, hamburgers, etc. However, since there is no physical process by which aggregate GNP is produced, nor, from this perspective, is there some way to aggregate and measure physical capital, trying to specify an aggregate production function is seen as nonsensical (^{Cohen; Harcourt, 2003}). ^{Guerrien and Gun (2015}, p. 100) note that Paul Samuelson, Nobel laureate in economics, pointed out that aggregate production functions wrongly offer “[…] a statistical test of an accounting identity (which is by definition always true)”. They argue for the need “[…] to convince everyone to definitively abandon the aggregated [sic] production functions, both in theory and practice” (Guerrien; Gun, 2015, p. 99) (also see ^{Felipe; McCombie, 2013}).

Given these fundamental problems with fulfilling the conditions for regression analysis to yield accurate estimates of causal impact (discussed earlier), it is no wonder that consistent results of the impact of education on GNP are not found. Reviews of this literature report a bewildering array of idiosyncratic methodological choices resulting in a bewildering array of different results (^{Stevens; Weale, 2004}). ^{Psacharopoulos and Patrinos (2004}, p. 15) quote Temple and Voth (1998, p. 1359): “[A]ttempting to impose the framework of an aggregate production function is almost certainly the wrong approach for many developing countries”. I would say that this is the wrong approach for any country^{6}.

It should be noted that almost all these studies only offer some measure of the quantity of education, not its quality. In a widely quoted recent study, ^{Hanushek and Woessmann (2008}) try to remedy this by adding country average PISA test scores as a proxy for the quality of education in a country, concluding that a one standard deviation difference in test scores yields a 2 percentage point higher growth rate of GNP/capita. In the light of the foregoing problems, I find this claim completely unreasonable and its uncritical reception due to ignorance of the fundamental problems with human capital theory and empirics discussed in this paper (also see ^{Klees, 2016}). Hanushek and Woessmann’s measures of the quantity and quality of education, choice of other inputs to control for, and choice of functional form are all idiosyncratic^{7}. They are only one of literally thousands of reasonable alternative specifications of an aggregate production function. Different specifications will yield different results^{8}.

Discussion

While I have approached the examples above as an economist most interested in the impact of education, the problems are identical in looking at the impact of any of the other myriad independent variables in these equations. Moreover, as far as I can see, the impossibility of proper specification is true generally in regression analyses across the social sciences, whether we are looking at the factors affecting occupational status, voting behavior, etc. The problem is that, as implied by the three conditions for regression analyses to yield accurate, unbiased estimates, you need to investigate a phenomenon that has underlying mathematical regularities - and, moreover, you need to know what they are. Neither seems true. I have no reason to believe that the way in which multiple factors affect earnings, student achievement, and GNP has some underlying mathematical regularity across individuals or countries. More likely, each individual or country has a different function, and one that changes over time. Even if there were some constancy, the processes are so complex that we have no idea of what the function looks like.

Researchers recognize that they do not know the true function and seem to treat, usually implicitly, their results as a good-enough approximation. But there is no basis for the belief that the results of what is run in practice are anything close to the underlying phenomenon, even if there is an underlying phenomenon. This just seems to be wishful thinking. Most regression analysis research doesn’t even pay lip service to theoretical regularities. But you can’t just regress anything you want and expect the results to approximate reality. And even when researchers take somewhat seriously the need to have an underlying theoretical framework - as they have, at least to some extent, in the examples of studies of earnings, educational achievement, and GNP that I have used to illustrate my argument - they are so far from the conditions necessary for proper specification that one can have no confidence in the validity of the results.

Moreover, what researchers do in practice invalidates their results even further. In theory, when using regression analysis, you are supposed to start with a complete model specification, and then take your data and estimate it, a one-shot deal. Given the indeterminacy of model specification, no one does that in practice. In his now classic article, Let’s Take the Con Out of Econometrics, ^{Leamer (1983}, p. 36) describes regression analysis in the real world and its consequences:

The econometric art as it is practiced at the computer … involves fitting many, perhaps thousands, of statistical models….This searching for a model is often well-intentioned, but there can be no doubt that such a specification search invalidates the traditional theories of inference. The concepts of unbiasedness, consistency, efficiency, maximum likelihood estimation, in fact, all the concepts of traditional theory utterly lose their meaning by the time an applied researcher pulls from the bramble of computer output the one thorn of a model he likes best, the one he chooses to portray as a rose.

The practical question to me then becomes whether we have learned anything from all this research. Most quantitative researchers would say they have, but I believe that such learning, if examined, would turn out to be from a subset of studies done from a perspective with which the researcher agreed. As ^{Leamer (1983}, p. 37) put it: “Hardly anyone takes data analyses seriously. Or, perhaps more accurately, hardly anyone takes anyone else’s data analyses seriously” (also see Leamer, 2010). Hardly anyone ever uses anyone else’s specification without improving on it, arguing explicitly or implicitly that the previous study was incorrect.

These remarks do not imply that, at least within paradigms, there is no cumulative learning from one another’s arguments. Such learning does take place. However, the argument here suggests that regression-based causal inference is simply an excuse for theorizing but does not provide any valid evidence for it. There’s an old saw in economics: If you torture the data long enough, nature will confess. In reality, nature never confesses. Studies from the three examples I have chosen have commanded the attention of educators and policymakers for over 50 years, yet, in reality, I believe that this approach has no validity, providing no reliable, or even approximate, information to help a sensible allocation of societal resources.

Econometricians and other regression analysts do recognize that there are many sources for biases of regression coefficients. They spend a lot of time on ways to correct for things like sample selection bias and measurement error - without much success unless you are willing to make some heroic assumptions. But these problems are minor compared to rampant misspecification. Regression analysts have tried to deal with one misspecification problem - that of omitted variables - through the use of instrumental variables (IVs). But this generally requires accurate measurement of included variables and correct specification of functional form, neither of which is ever true. Instrumental variable techniques give different results depending on the IV chosen and have other problems as well (^{Heckman; Urzua, 2009}; ^{Leamer, 2010}). Again, these and other techniques (regression discontinuity, differences-in-differences) require heroic assumptions to deal with any aspect of misspecification (^{Angrist; Pischke, 2009})^{9}.

I believe that, unfortunately, regression analysis methodology is a dead end, no better than alchemy and phrenology, and someday people will look back in wonder at how so many intelligent people could convince themselves otherwise. This is not a problem that better modeling, techniques, and data can fix^{10}.

Alternatives

I do not see the essence of the problem as quantification. Nor do I think it futile to try to look for causes and consequences of our practices and policies. Quantifying social phenomena clearly has its limits and, at best, yields approximations (^{Samoff, 1991}). But cross-tabulations and correlations are useful to suggest interrelationships. As is well-known, however, any associations found may be spurious or have a myriad of alternative explanations. For example, crosstabs may reveal that, on average, women earn $.75 compared to $1.00 earned by men. We can unpack this some by looking at women and men working full-time, where perhaps the data show a comparison of $.80 to the $1.00. We can further look at college-educated women working full-time compared to men in similar circumstances, perhaps giving us a comparison of $.90 to the $1.00. Crosstabs can give even finer comparisons. These comparisons, despite limitations, offer real, descriptive, face valid data. Unfortunately, social sciences’ hope that we can control simultaneously for a range of factors like education, labor force attachment, discrimination, and others is simply more wishful thinking.
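The kind of successive cross-tabulation described above can be sketched in a few lines. The cell counts and mean earnings below are invented for illustration; they are not taken from any survey and do not reproduce the $.75, $.80, and $.90 figures exactly, only the pattern of a gap that narrows as the comparison groups are made more alike.

```python
# Hypothetical cells: (gender, full_time, college) -> (number of workers, mean earnings)
CELLS = {
    ("men", True, True): (40, 100.0),
    ("men", True, False): (40, 70.0),
    ("men", False, True): (10, 40.0),
    ("men", False, False): (10, 30.0),
    ("women", True, True): (30, 90.0),
    ("women", True, False): (30, 50.0),
    ("women", False, True): (20, 36.0),
    ("women", False, False): (20, 24.0),
}


def mean_earnings(gender, keep):
    """Weighted mean earnings for one gender over the cells selected by keep(full_time, college)."""
    total = 0.0
    count = 0
    for (g, full_time, college), (n, mean) in CELLS.items():
        if g == gender and keep(full_time, college):
            total += n * mean
            count += n
    return total / count


def earnings_ratio(keep):
    """Women's mean earnings per unit of men's mean earnings within the selected cells."""
    return mean_earnings("women", keep) / mean_earnings("men", keep)


overall = earnings_ratio(lambda full_time, college: True)
full_time_only = earnings_ratio(lambda full_time, college: full_time)
full_time_college = earnings_ratio(lambda full_time, college: full_time and college)

print(f"all workers:             {overall:.2f}")
print(f"full-time workers:       {full_time_only:.2f}")
print(f"full-time, college-educ: {full_time_college:.2f}")
```

Each ratio is a plain description of the people in the selected cells. Nothing is being "held constant" by a model, which is why such comparisons remain face valid but also why they cannot be stretched to many factors at once: the cells quickly become too small to compare.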

The problem is that the causal relations underlying such associations are so complex and so irregular that the mechanical process of regression analysis has no hope of unpacking them. One hope for quantitative researchers who recognize the problems I have discussed is the use of experimentation - with the preferred terminology these days being randomized controlled trials (RCTs). RCTs supposedly get around the issues faced by regression analysis through the use of careful physical, experimental controls instead of statistical ones. The idea is that doing so will let one look at the effect of an individual factor, such as whether a student attended a particular reading program. In order to do this, one randomly assigns students to an experimental group and control group, which, in theory, will allow for firm attribution of cause and effect. Having done this, one hopes that the difference in achievement between the groups is a result of being in the reading program. Unfortunately, it may or may not be. You still have the problem that the social and pedagogical processes are so complex, with so many aspects for which to account, that, along some relevant dimensions, the control and experimental group will not be similar. That is, if you look closely at all potentially relevant factors, control groups almost always turn out systematically different from the experimental group, and the result is we no longer have the ability to make clear inferences. Instead, we need to use some form of statistical analysis to control for differences between the two groups. However, the application of statistical controls becomes an ad hoc exercise, even worse than the causal modeling regression approach. In the latter, at least there is a pretense of developing a complete model of potentially intervening variables whereas with the former a few covariates are selected rather arbitrarily as controls. In the end, one does not know whether to attribute achievement differences to the reading program or other factors (^{Leamer, 2010}).

If we are interested in looking at quantitative data, I am afraid we are mostly stuck with arguing from cross-tabulations and correlations. This is a dismal prospect for most quantitative researchers who have spent years becoming virtuosos at data analysis and see the implications of my argument as essentially abandoning the research enterprise. Fortunately, for many of us, the research enterprise is alive and well, with a myriad of more qualitative alternative methodologies with which to investigate our educational and social world.

When I went to graduate school, introduction to research methods courses often focused on regression analysis or on Campbell and Stanley’s (1963) examination of the design and analysis of quantitative experimental and quasi-experimental studies. This is still true today within certain fields and university departments. However, the past 30 years have seen a blossoming of alternative approaches to research methods, especially in education, but in other fields as well. Education has been at the forefront of such changes, largely, in my view, because many of the changes were generated within the field of program evaluation which grew, in large part, from the educational evaluations that were mandated by the U.S. Congress in the 1960s and 1970s. Many of those involved in evaluation fieldwork simply found that the quantitative approach to research and evaluation could not capture the experience of the programs they were studying and drew upon other traditions, such as in sociology or anthropology, or invented new approaches. In subsequent years, these forays yielded a wide array of alternative methods for research and evaluation (^{Mertens, 2015}).

For a number of years, I have been fortunate to teach our department’s Introduction to Research Methods course. While any grouping of methods is somewhat arbitrary and their labeling always problematic, the course is divided in three, focusing in turn on quantitative/positivist methods, qualitative/interpretive methods, and critical/transformative methods. Comparisons are as essential to the latter two paradigms as they are to the quantitative methods. There is a large literature on the qualitative/quantitative debate. Some argue too much has been made of it, while others, with whom I agree, argue that there are fundamental theoretical differences in outlook that need to be considered (^{Smith; Heshusius, 1986}; ^{Mertens, 2015}). Regardless, it is clear that there are many qualitative alternatives to quantitative experimental and regression analysis approaches, including case study, ethnography, grounded theory, phenomenology, narrative, and oral history, to name a few.

Additional methodological alternatives are offered by critical/transformative perspectives which come out of the array of theories in the social sciences and applied fields such as radical political economy, critical sociology, feminisms, queer theory, and others focused on issues of marginalization (^{Klees, 2008}). These perspectives generally criticize the fundamental lack of objectivity of positivist/quantitative research and qualitative/interpretive research, arguing that there is no neutral research, and that too often such studies are done in support of dominant interests. Critical/transformative research takes an explicit position to work in the interests of marginalized people. This includes research under the labels of participatory, action, feminist, indigenous, critical, critical ethnography, and critical race (^{Denzin; Lincoln; Smith, 2007}; Smith, 2012; ^{Mertens, 2015})^{11}.

Proponents of quantitative research recognize that some of these alternative methods exist but usually, at best, relegate them to the realm of generating ideas, not to the scientific process of building knowledge of the social world. On the contrary, many proponents of alternative methods argue that they are as valid, reliable, and generalizable as quantitative methods, or more so^{12}. For example, ^{Miles and Huberman (1994}, p. 434) go so far as to argue:

Qualitative studies…are especially well-suited to finding causal relations; they look directly and longitudinally at the local processes underlying a temporal series of events and states, showing how these led to specific outcomes, and ruling out rival hypotheses. In effect we get inside the black box; we can understand not just that a particular thing happened, but how and why it happened.

Similarly, strong arguments are made for the transferability and generalizability of qualitative and critical research (^{Donmoyer, 1990}; ^{Mertens, 2015}).

Conclusions

For a number of years, a sociologist of education colleague and I taught a regression lab course. We used a good national data set and had the students spend the semester running alternative specifications of education production functions. Each class, groups came in and explained their specifications and their results. As expected, different specifications of included variables, decisions on how to measure variables, and functional forms yielded substantially different results. Each group was asked to explain their results as if they had written them up for a journal article. My colleague always used to comment on a group’s explanation of their results with: “That makes sense.” And it always did. So do the articles in the literature I reviewed above. We can always make sense of our results and always do. When running regressions, we stop making the many adjustments to our regressions, adjustments that always must be made, when we get results that make sense to us. However, taking these literatures as a whole, they simply result in divergent findings, all based on alternative specifications of their regression equation models that are reasonable, at least to some.
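The sensitivity to specification described above can be sketched with simulated data. The example below is a minimal illustration, not a reconstruction of the lab course or its data set: the variable names (`ses`, `program`, `score`), the sample size, and every coefficient are assumptions I have made up. Because the data are simulated, the "true" program effect (set to 2 points) is known here, which is precisely what is never available with real data. The same outcome is regressed on program participation under three specifications: no controls, socioeconomic status measured as a crude high/low indicator, and socioeconomic status measured continuously.

```python
import random


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations,
    solved with Gauss-Jordan elimination and partial pivoting."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Hypothetical data: all numbers below are illustrative assumptions.
rng = random.Random(42)
n = 2000
ses = [rng.gauss(0, 1) for _ in range(n)]
# Higher-SES students are more likely to end up in the program.
program = [1.0 if s + rng.gauss(0, 1) > 0 else 0.0 for s in ses]
# Simulated "truth": the program adds 2 points, SES adds 5 points per unit.
score = [50 + 2.0 * p + 5.0 * s + rng.gauss(0, 3) for p, s in zip(program, ses)]
# An alternative measurement decision: SES recorded only as high vs. low.
high_ses = [1.0 if s > 0 else 0.0 for s in ses]

effect_no_controls = ols([program], score)[1]
effect_binary_ses = ols([program, high_ses], score)[1]
effect_continuous_ses = ols([program, ses], score)[1]

for label, value in [
    ("no controls", effect_no_controls),
    ("SES as high/low indicator", effect_binary_ses),
    ("SES as continuous measure", effect_continuous_ses),
]:
    print(f"Estimated program effect ({label}): {value:.2f}")
```

Each of the three estimates could be written up with an explanation that "makes sense," and with real data there would be no known benchmark against which to choose among them.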

In conclusion, I wish to say I’d like to be wrong about my argument in this paper. It would be useful if the emperor’s not-so-new clothes were more than the nakedness researchers seem to avoid looking at too closely. Unfortunately, theory and practice seem to strongly indicate otherwise. The theoretical conditions for regression analysis to work are never close to being met. And, in practice, regression analysis applications seem to result in interminable debates because specifications are so loose that researchers seem to be able to use this family of techniques to prove almost anything they want. Nonetheless, while we cannot find the simple cause-effect regularities that regression analysts would like to uncover, at the very least there are still many alternative methods for investigating and making comparisons in our educational and social world.