Using Deep Learning Language Models as Scaffolding Tools in Interpretive Research

ABSTRACT Objective: the paper introduces a framework for conducting interpretive research using deep learning algorithms that blur the boundaries between qualitative and quantitative approaches. The work evidences how research might benefit from an integrated approach that uses computational tools to overcome traditional limitations. Proposal: the increased availability and diversity of data raises the utility of algorithms as research tools for social scientists. Furthermore, tuning and using such computational artifacts may benefit from interpretive procedures. Such circumstances turn the traditional debate between quantitative and qualitative research on its head: the research strategy that likely yields the most assertiveness and rigor is the one that may require vigorous hermeneutic effort. Along these lines, neural word embeddings can be instrumental in allowing researchers to read the data closely before and after interpretation. Conclusions: to take advantage of the opportunity generated by these new algorithms, researchers may broaden their previous conceptions and adopt a participative point of view. In the coming decades, the interweaving of computational and interpretive methods has the potential to integrate rigorous social science research.


INTRODUCTION
A paramount article recently published in Nature shows the decline of innovation and disruption in science. The authors analyzed more than 45 million papers, noticing a substantive change in science and technology that reinforces concerns about slowing innovative activity. They attribute this trend partly to scientists' reliance on a narrower set of existing knowledge, with papers increasingly less likely to break with the past in ways that push science in new directions. This situation aligns with a period of normal science, according to the definition proposed by Kuhn (Matthews, 2022), that we may be experiencing since researchers' beliefs and practices have become more pragmatic (Morgan, 2007). Given the greater complexity of present-day phenomena and the structuring of academic careers, among other factors, the article mentioned above seems to herald a period of crisis in science, evidenced by stagnation. In this sense, it also highlights the urgency of new proposals for reformulating and adapting principles and methods.
Focusing on business and management research, we believe that scholars may face even more challenging situations in the future. Indeed, society's transformations demand changes in how managers run firms. To mention just one of the many aspects, such a requirement arises partly from stakeholders increasingly adopting a position in which businesses must contribute to contemporary issues (Friede et al., 2015; Scheyvens et al., 2016). However, the decision-making (epistemology) that can lead to this contribution depends on a growing and increasingly unstructured volume of data (El-Kassar & Singh, 2019). Therefore, management processes and toolboxes may need to change for organizations to bring such a desire to fruition (Sinkovics et al., 2021).
The observed lack of innovation in academic production combined with ongoing changes in its objects of study (businesses) represents one of the most critical issues researchers must address. This situation resembles the internet's early days when a portion of the interaction among consumers migrated to virtual spaces for the first time. As a result, a firm's efforts to gain marketing insights suddenly required analyzing data produced by individuals on digital platforms (Godes & Mayzlin, 2004). The same circumstances affected management researchers. Their ability to explain consumer behavior also started to require collecting and analyzing vast amounts of digital data. Traditional interpretive research methods, such as ethnography, were expanded and adapted to cope with these novel aspects (Kozinets, 2002).
This transformation has accelerated in the last few years. The amount and variety of marketing-related digital data have significantly increased (Balducci & Marinova, 2018). Thus, it is hard to deny that the explosion of new data over the last two decades has limited management researchers' ability to observe and interpret relevant marketing information using solely old approaches (Berger et al., 2022; Erevelles et al., 2016). In addition, research methods previously adjusted to deal with virtual contexts must be updated again. This time, it may not be possible to merely incorporate the observation of new data types while maintaining a position in which human reasoning is the only thing to rely on (Kozinets et al., 2018). Indeed, it may be the case that the fundamental notion of observing a business-related process has to change, and such a reconceptualization may require the adoption of a more participative epistemology (Campagnolo, 2021).
Aligned with the necessity of reflecting on new methodologies, we suggest a potential disruption with traditional methods, blurring the boundaries between qualitative and quantitative approaches. We advocate that research (and researchers) might benefit from an integrated approach that uses computational tools to overcome traditional limitations. In this sense, we aim to contribute to the contemporary debate on research methods by anticipating disruptions, detaching from the present, and trying to understand the contemporary as the result of a historical process (Bispo, 2022).
We structure this paper as follows: in the next section, we outline the current status of the quantitative-qualitative debate. Next, we discuss deep learning language models as reading aids in qualitative research. Later, we propose a framework to integrate algorithmic artifacts in interpretative research and briefly discuss how those elements interact in such a situation. Then we present an actual research case using a language model to scaffold interpretive research.

Fuzzies strike back: The quantitative-qualitative debate in an era of data abundance
"In interpretive research, BERT could potentially be used to analyze and interpret large amounts of text data, such as qualitative interview transcripts or social media posts. It could be used to identify patterns and trends in the data that might not be immediately apparent to a human researcher, or to generate summaries or extract key themes from the text data ..." Answer given by the ChatGPT language model to the question: "Do you believe it is a good idea to use BERT to facilitate interpretive research?" (December 23, 2022)
Reviving the debate over quantitative versus qualitative research near the end of the first quarter of the twenty-first century risks instilling readers with a sense of boredom. Indeed, social scientists and philosophers have discussed this subject with less or more energy for several decades (Bryman, 1984). As a result, the relationship between these two ways of researching social phenomena has changed over time, going through periods of conflict and cooperation (Smith & Heshusius, 1986). A recent empirical article demonstrates that commonplace beliefs in the existence of entirely distinct methodological cultures, quantitative and qualitative, between which communication is supposed to be difficult (Goertz & Mahoney, 2012), are somewhat incorrect (Kuehn & Rohlfing, 2022).
Rapid social change rejuvenates the playing field on which the quantitative-qualitative debate occurs. Social science researchers must all contend with an explosion of new data in terms of volume and heterogeneity as new digital sources provide "rich detail about the evolution of social relationships across large populations as they unfold" (Edelmann et al., 2020, p. 2). Indeed, the unmeasurable was turned into measurable by "the technological revolution in mobile, web, and internet communications" (Watts, 2011, p. 266). Such change "has the potential to revolutionize our understanding of ourselves and how we interact" (Watts, 2011, p. 266), amongst other elements. Thus, we argue that the societal transformation described above fuels discussions and redefinitions about rigor in social science research (Sells et al., 1995) for all practical matters. Closed-ended surveys, an example of a popular tool in quantitative-deductive inquiry, allow for progressively limited investigations if used alone. Indeed, scholarly work may become more inductive (Grimmer et al., 2021) and unobtrusive in comparison to asking individuals to answer a set of predefined questions in a form due to the data's permanent state of flow and researchers' interest in investigating each phenomenon with the maximum possible degree of realism (Chang et al., 2014). In addition, data abundance and complexity make it less practical or desirable to organize information in a conventional quantitative manner, with variables in columns and observations in rows (Lazer et al., 2020). In sum, the increasing intricacy of social phenomena and the abundance of "unfamiliar variables and data formats" (Brandt & Timmermans, 2021, p. 191) may make the use of hypothesis-testing approaches less practical.
That being the case, machine learning algorithms (Borch, 2021) may be needed to identify hidden concepts in phenomena-related massive heterogeneous datasets (Lazer et al., 2021). As a result, it seems plausible to conjecture that a portion of the rigor of future social science research may derive from the appropriate selection and usage of AI-related computational artifacts to scaffold both quantitative and qualitative work.
What matters most for the argument we aim to advance is that the transformations listed above may represent an opportunity for interpretive researchers, individuals commonly associated with the term fuzzy, as Hartley (2017) proposed. Similar to what happened with the designers of self-driving vehicles that had to ask for the help of anthropologists to "cope with the social and contextual complexity of mixed-traffic interaction" (Rothmüller et al., 2018, p. 482), the use of new algorithms in scholarly research on social sciences may create a larger space for qualitative reasoning. Indeed, when computational methods that can "provide much better predictive performance" than statistical inference methods (Grimmer et al., 2021, p. 398) are used to analyze large heterogeneous datasets, social science scholars face a dilemma. The mapping from data to the predictions made by such algorithms is not directly given as in a regression, where the estimated coefficients indicate the effects caused by the independent variables (Gentzkow et al., 2019). Thus, researchers' ability to explain social phenomena by leveraging deep learning models, for example, may strongly depend on a theory-based interpretation of results produced by brute-force computation (Grimmer et al., 2022).
These transformations seem to be aligned with Campagnolo's (2021) vision of participative epistemology (Heron & Reason, 1997) as "an empiricist notion of knowledge production whereby all those involved in the research endeavor (i.e., theories, methods, epistemic objects, and subjects) are seen as both co-researchers, whose agency contributes to generating ideas, designing and managing the project, and drawing conclusions; and co-subjects, participating in the activity that is being researched" (Campagnolo, 2021, p. 4). We claim that the increasing amount and complexity of data used in social sciences and the power of machine learning algorithms to uncover hidden relationships may end up generating a new way of producing knowledge in social sciences. In this sense, we follow Campagnolo's call for transcending the current division between paradigms and research approaches (Campagnolo, 2021).
Substantive research increasingly benefits from an epistemology in which the engagement of researchers before and after utilizing machine learning algorithms becomes necessary to produce helpful descriptions and explanations. According to Grimmer et al. (2021), the "current abundance of data allows us to break free from the deductive mindset that was previously necessitated by data scarcity" (p. 396). As a result, it seems likely that scholars and institutions will eventually treat the quantitative-qualitative debate as an "internal and obsolete distinction" (Campagnolo, 2021).
The table summarizes some elements that we develop throughout this article. The first is that the proposed data analysis is neither sequential nor concurrent; it is rhizomatic. That is, it is not hierarchical and has no fixed order. Each step along the research path can lead to any other portion of the same course.
The second concerns ontology, which is at the same time subjective (reality is constructed), objective (reality is what the data show), and virtual (reality is what emerges from the execution of algorithms).The idea of virtual comes from the work of Deleuze (1968): 'the virtual is real'.
The third point refers to abductive epistemology. After analyzing the algorithms' outputs, the researcher asks: What could have made these algorithms tell me this? In such cases, without algorithms, the researcher would not see what they produce and, therefore, would not make such logical inferences.
In addition, the approach is participative, meaning that all forms of knowledge construction are involved in generating conclusions, followed by abductive reflection. Thus, provided that the widespread presence of computational elements in society expands the possibilities of academic research and modifies the social processes one wishes to study, we follow Marres (2020) in saying that algorithmic approaches to research greatly benefit from interpretative procedures and vice versa. All data types, particularly textual ones, can now be 'read' in "its full hermeneutic complexity and nuance" (Mohr et al., 2015, p. 3).

Deep learning language models as reading aids in qualitative research
"Deep learning text models are a type of neural network model that are specifically designed to process and understand natural language text. These models can be used for a wide range of natural language processing tasks, such as language translation, text summarization, sentiment analysis, and question answering. These models are usually trained on large amounts of text data and are able to learn to understand and generate human language." Answer given by the ChatGPT language model to the question: "What are deep learning text models?" (January 17, 2023)
Computational methods are 'observation and reading tools' (Mohr et al., 2020). They may assist scholars in perceiving and incorporating into their hermeneutics patterns that are difficult to visualize, such as those connected to the less accessible informants in an ethnographic study (Campagnolo, 2021). For example, algorithms can enhance interpretative research by accurately analyzing large amounts of text data, such as interviews or social media posts. Algorithms can be helpful during coding processes, identifying patterns or themes in the data. Thus, questioning whether a researcher would profit from learning to use word embedding models like Word2Vec (Mikolov et al., 2013) and BERT (Devlin et al., 2019), for example, corresponds to asking how well that researcher wishes to 'read' in contrast to others. Although it may seem so, there is no exaggeration in the above statement. The truth is that the world's increasing complexity recommends adopting such methods in qualitative research. While the area of ethnography, for example, is rich in epistemic diversity (Abramson et al., 2018), even an academic who strictly views his work as a "humanist enterprise within the general field of cultural production" (Abramson et al., 2018, p. 257) may enormously benefit from the new possibilities offered by those algorithms. Indeed, Marcus (1995) warns us that "any ethnography of a cultural formation in the world system is also an ethnography of the system" (Marcus, 1995, p. 99), which means that it examines "the circulation of cultural meanings, objects, and identities in diffuse time-space" (Marcus, 1995, p. 96). Thus, an ethnographer's reading and observation abilities must be capable of dealing with the 'rhizomatic' (Deleuze & Guattari, 1980), in the sense of not being hierarchical, fixed, and stable, conditions inherent in modern social systems. That includes various contexts and circumstances involved so dynamically with each cultural setting that it is sometimes challenging for the researcher to observe and read everything. Why would a scholar attempt to deal with such diversity without utilizing those newly available computational tools? What epistemic cost, if any, do many researchers associate with using algorithms?

Integrating algorithms in interpretative research: A framework proposal
For decades, the goal of automated textual content analysis has essentially been to read "large textual corpora in such a way that critical bits of information could be extracted and a measure of informational reliability could be calibrated" (Mohr et al., 2015). This approach is about extracting only the essential ideas in a text, precisely those with little room for argument regarding their existence (Mohr et al., 2015). This characteristic means that these methods commonly disregard the "subtleties of expression, the complexities of phrasing and the more nuanced meanings of text corpora" (Mohr et al., 2015, p. 2). Because of this limitation, it is not surprising that researchers interested in executing a 'close reading' (Gavin et al., 2019) of the texts involved in each phenomenon have not historically shown much enthusiasm for this way of doing social science research.
However, access to new forms of data, such as transactional purchase data, records of social media usage, and data from public and private surveillance, has considerably increased the volume and variety of 'texts' that need to be read to produce insights into social science (Meckin, 2021). Due to this, algorithms such as BERT, which make it feasible to apply previously acquired learning to interpret new text sentences, can make a massive difference in interpretive research. These algorithms can execute the coding of interviews, for example, "at a fraction of the effort of human-only coding while improving reliability" (Li et al., 2021, p. 1). Indeed, the explosion of data makes it expensive and difficult to 'close read' the totality of crucial 'texts' in each research situation. Luckily, deep learning algorithms may help reduce, amongst other elements, the number of dimensions in textual data, thus facilitating the interpretative process (Grimmer et al., 2021).
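To make the idea of algorithm-assisted coding more concrete, the following is a minimal sketch and not the procedure of Li et al. (2021). It assumes that each interview excerpt has already been converted into a numeric vector by a language model such as BERT; here, toy two-dimensional vectors stand in for those embeddings. The function name `propose_codes`, the code labels, and the 0.9 threshold are hypothetical choices made for illustration. The sketch proposes a code only when an excerpt is close to the researcher's hand-coded examples and otherwise returns the excerpt for human close reading.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def propose_codes(coded_examples, uncoded, threshold=0.9):
    """Suggest a code for each uncoded excerpt, or None if no code is close enough.

    coded_examples: {code label: [vectors of excerpts the researcher coded by hand]}
    uncoded:        {excerpt id: vector}
    threshold:      hypothetical cut-off below which the excerpt is left to the researcher
    """
    centroids = {code: centroid(vecs) for code, vecs in coded_examples.items()}
    proposals = {}
    for excerpt_id, vec in uncoded.items():
        best_code, best_sim = max(
            ((code, cosine(vec, c)) for code, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        proposals[excerpt_id] = best_code if best_sim >= threshold else None
    return proposals


# Toy vectors (assumption): real embeddings would have hundreds of dimensions.
coded_examples = {
    "enchantment": [[0.9, 0.1], [0.8, 0.2]],
    "disenchantment": [[0.1, 0.9], [0.2, 0.8]],
}
uncoded = {"excerpt_1": [0.85, 0.1], "excerpt_2": [0.5, 0.5]}
proposals = propose_codes(coded_examples, uncoded)
```

In this toy run, the first excerpt lies close to the hand-coded 'enchantment' examples, while the second is ambiguous and stays with the researcher. The abstention step reflects the division of labor discussed above: the algorithm reduces the reading load, and interpretation remains a hermeneutic task.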
Because of the possibilities outlined above, we propose a framework for interpretive research in which algorithms and hermeneutic procedures are integrated. By doing this, we generalize the three-stage computational grounded theory framework proposed by Nelson (2020). According to the notion that we want to advance, computational tools help researchers account for the multifaceted characteristics of modern social science phenomena. Thus, for researchers to have a better chance of answering "contemporary data-rich questions" (Nelson, 2020, p. 34), their method of conducting interpretive social science research should have the characteristics of an assemblage (Deleuze & Guattari, 1980) capable of producing a significant number of research paths using both algorithms and interpretative tasks. Figure 1 depicts a 'research-as-assemblage-framework' containing two spaces of possibilities, hermeneutic and algorithmic, whose elements interact in a multidirectional way and can be activated by researchers without a predetermined order. Such activations originate actual research paths.
Furthermore, as illustrated in Figure 1 in a nonexhaustive manner, this characterization of interpretive research as an assemblage includes several other elements. It encompasses, for example, the coders who design and develop the algorithms and their beliefs and models. It also contains the data used to train the algorithms, which play an essential role in shaping their outputs. The precise balance between interpretation and computation in each research path is determined by the factors discussed by Campagnolo (2021). In the following section, we define an algorithm to help clear up any misconceptions preventing interpretive researchers from embracing such tools.

Who is afraid of the big bad algorithm?
In recent years, the term 'algorithm' has been used to refer to various items that were formerly grouped under 'software' (Matzner, 2022). Scholars commonly take advantage of such a broad concept to talk about technosocial phenomena, such as the alleged influence of some computational systems on the "structure of public discourse or the functioning of state power" (Matzner, 2022, p. 2). This imprecise and abstract notion of an algorithm may lead interpretive scholars to ascribe to its use an epistemological cost that is barely grounded in reality.
Indeed, a scholar may associate using an algorithm in qualitative research with surrendering one's epistemic beliefs to a "backstage device" that is responsible for "the constant optimization of experience" and undertakes forms of "infrastructural surveillance" (Cellard, 2022, p. 983). In these terms, it should be no surprise that ethnographers, for example, are often not very enthusiastic about using algorithms, especially those associated with deep learning. How would someone who views the research process as aiming for an "in-depth description of the phenomenon from the perspective of the people involved" (Yilmaz, 2013, p. 312) positively evaluate the use of a tool whose nature is currently associated with the construction of a social reality that disregards and possibly manipulates human will (O'Neil, 2016)?
The academic literature's casual usage of the term 'algorithm' has resulted in a very non-technical understanding, making it a pejorative "catchy word symbolizing an opacity and growing dehumanization" (Cellard, 2022, p. 983). Indeed, we posit that the more fundamental definition of an algorithm as the sequence of steps required to complete a task (Christin, 2020) that is not always computational (Cellard, 2022) needs to be reinstated. Interpretive scholars should view an algorithm's execution as an 'assemblage' (Siles et al., 2020) that encompasses the software being run in a computer (non-human) and a chain of human actors that includes each author of the code and the individuals who take advantage of the outputs, that is, themselves (Christin, 2020). Such an assemblage does not exist in a social vacuum. Because of that, Amoore (2020) argues that answering the ethical question of whether algorithms are good or bad depends on how we use their signals. Instead of criticizing the opacity of deep learning algorithms, she advocates for a larger 'aperture of observation' in the decision-making processes that are supported by such assemblages. Such a notion means observing how the moments when the algorithm is wrong "give accounts of algorithmic reason" (Amoore, 2020, p. 167).
Due to the impossibility of assigning an ethical value to algorithmic decisions, we follow Matzner (2022) in suggesting that researchers commonly embrace a relational account of algorithms to consider each of them "an abstraction that is inherently related to a necessary complement of that abstraction in something concrete and particular" (Matzner, 2022, p. 2). In other words, "algorithms need to be seen as becoming what they 'are' only in relations" (Matzner, 2022, p. 3). The data that an interpretive scholar decides to feed the algorithm is one of these complementary relations. The second is the 'aperture of observation' through which the algorithm's output influences the scholar's interpretation. Thus, it is necessary to dispel the notion that deep learning algorithms possess an agency that would compete with the researcher's agency. As we have said in this article, societal changes have pushed the amount of data available beyond the threshold that can be treated with the naked eye. While it is true that the employment of algorithms generates new ethical, ontological, and epistemological concerns (Christin, 2020), it is also true that they confer new capabilities on interpretive scholars. We believe developing a participative mindset is the key to taking advantage of these opportunities. Keeping an open and dogma-free mind, a crucial aspect of exploratory and inductive research, becomes increasingly important for developing substantive social science research that makes a difference.

Studying technology adoption with the help of a deep learning language model
In the following lines, we shall illustrate what it means to use machine learning in interpretive research by describing one of our most recent research projects. In work still in progress, we broaden the understanding of the continuous cycles of consumers' enchantment and disenchantment in technology adoption (Belk et al., 2021). The first author's research seeks to answer the following question: What produces, from a cultural point of view, the perpetuity of the desire to acquire each new version of a product whose current model may already meet consumer needs?
The project examines how current iPhone users express their feelings about new models on social media. Its goal is to compare the semantic distances between what they say and how Steve Jobs, who introduced the original product to the market, described the first version. The differences in these distances are then interpreted using cultural theories on consumer desire. Indeed, it would be challenging to comprehend the phenomenon in all of its nuances without the possibility provided by computation. The semantic proximity of the two types of discourses provides strong empirical evidence that supports and expands on the interpretation produced by articulating the chosen theoretical lens. In addition, the first author's understanding of the entrepreneurship phenomenon based on his experience and a dense reading of the same corpus, which the algorithm can scan to identify patterns and themes, contributes to developing a new theorization.
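The notion of semantic distance used in the project can be illustrated with a small sketch. It is not the project's actual pipeline: it assumes that a keynote sentence and each social media post have already been mapped to vectors by a language model such as BERT, and it uses cosine similarity, a common measure of proximity between embeddings, as the assumed metric. The three-dimensional vectors, the post labels, and the function `rank_by_proximity` are hypothetical and serve only to show the mechanics.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; values near 1 mean similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_proximity(reference, candidates):
    """Return (label, similarity) pairs sorted from most to least similar to the reference."""
    scored = [
        (label, cosine_similarity(reference, vector))
        for label, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): stand-ins for model-generated sentence embeddings.
keynote_sentence = [0.9, 0.1, 0.2]
posts = {
    "post_a": [0.8, 0.2, 0.1],
    "post_b": [0.1, 0.9, 0.3],
}
ranking = rank_by_proximity(keynote_sentence, posts)
```

The ranking only tells the researcher which posts sit closest to the keynote's language. Why they do so is the question that the cultural theories on consumer desire are then asked to answer.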
The actual research is described briefly and shown in Figure 2. Its path corresponds to one of the infinite possible actualizations inherent in the assemblage of interpretive research that employs deep learning language models. The order of activation of each hermeneutic and computational possibility is determined by the researcher's desire and the project's circumstances. The interaction between these two spaces of possibilities can be described as a dialogue with the following format: 'What I think -> What the algorithm says -> What I think -> What the algorithm says', and so on. The epistemic process is participative in that it directly depends on all the elements being able to participate.
The dialogue concludes when the element that thinks last says enough for abductive reasoning to take place and produce a statement of the form: "Based on what we learned… it must be that…" In short, there are no set beginning or endpoints. The study concludes when the multiple interactions between the participating elements yield a sufficiently good explanation according to the criteria established inside the assemblage. The research path goes from left to right to represent 'the passage of time'. While there is just one instance of every activity described here, our framework places no restriction (see Figure 1) on the repetition of algorithmic and interpretation tasks. In the general case, any activity can occur an arbitrary number of times at any point.
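The alternation between interpretation and computation can be sketched as a simple loop. This is an illustrative simplification and not part of the framework itself: the framework allows any activity to occur at any point, whereas the sketch assumes, for brevity, that the researcher speaks first and that turns strictly alternate. The names `run_dialogue`, `interpret`, `compute`, and `is_sufficient` are hypothetical, and the 'algorithm' is a stand-in that merely suggests the next unused theme from a fixed list.

```python
def run_dialogue(interpret, compute, is_sufficient, initial_reading, max_rounds=10):
    """Alternate researcher and algorithm turns until the reading is judged sufficient.

    interpret(reading, signal) -> new reading (hermeneutic step)
    compute(reading)           -> signal      (algorithmic step)
    is_sufficient(reading)     -> bool        (criterion set inside the assemblage)
    """
    reading = initial_reading
    trace = [("researcher", reading)]
    for _ in range(max_rounds):
        if is_sufficient(reading):
            break
        signal = compute(reading)
        trace.append(("algorithm", signal))
        reading = interpret(reading, signal)
        trace.append(("researcher", reading))
    return reading, trace


# Toy stand-ins (assumptions): themes are plain strings and the "model" is a lookup.
candidate_themes = ["novelty", "ritual", "status"]


def compute(reading):
    # Suggest the first candidate theme that the current reading does not contain.
    return next(theme for theme in candidate_themes if theme not in reading)


def interpret(reading, signal):
    # The researcher accepts the suggestion and extends the reading.
    return reading + [signal]


def is_sufficient(reading):
    # Hypothetical stopping criterion: three themes are enough for abduction.
    return len(reading) >= 3


final_reading, trace = run_dialogue(interpret, compute, is_sufficient, ["novelty"])
```

The recorded trace mirrors the format 'What I think -> What the algorithm says -> What I think', and the stopping rule is supplied by the researcher rather than by the algorithm.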

CONCLUSIONS
The proposed debate would be meaningless if not coherently aligned with relevant contemporary themes (Bispo, 2022). We believe that using advanced present-day computational tools in qualitative research represents an opportunity for social science scholars from developing countries. Aside from allowing researchers to understand and explain phenomena in a way that sometimes is impossible otherwise, algorithmic methods contribute positively to various operational aspects of theory-generating research (Nelson, 2020). They can, for example, help reduce the costs associated with coding textual content (Rodriguez & Storer, 2020). They also make it more practical and scalable to handle vast amounts of heterogeneous data obtained from various sources (Evans, 2014). In addition, computational methods can allow qualitative research in the Global South (Dados & Connell, 2012) to benefit from datasets originating in developed areas more effectively and accomplish the breadth of analysis that was previously only achievable in contexts or circumstances with larger budgets. This last aspect means that incorporating computational tools can help lessen firepower inequality in the global landscape of qualitative social science research.
Using computational methods in qualitative research may increase overall productivity (Nelson et al., 2018). This benefit should suffice in justifying their adoption, given the difficulty of undertaking substantive social research in countries like Brazil. However, we posit that such an element is not the primary reason why an interpretivist, for example, needs to consider employing algorithms in his work. Indeed, deep learning algorithms play an ever-increasing role in modern society. Text models such as ChatGPT, which was recently introduced, have the potential to completely transform content-generating professions such as teaching and journalism (Haque et al., 2022). Instead of arguing that algorithms such as ChatGPT reduce the value of a researcher's thinking to the point that they may pose a risk, this article seeks to advance a different type of insight.
ChatGPT, derived from the GPT-3 algorithm developed by the company OpenAI, can be used by social science researchers as a type of co-author with whom they can converse. This article talks about applying an equally powerful algorithm: BERT. It enables the first author to identify semantically similar elements of a 2011 keynote speech in the discourse of present-day consumers. From this identification, he derives his interpretation. Although the dialogue with BERT was not conducted in natural language, the outcome was comparable: BERT enabled us to read vast amounts of text.
Business and management research may benefit from adopting a more participative epistemology. Marketers, for example, already use conventional data mining tools to generate insights. However, present-day scholarship has not contributed significantly to that. Adopting deep learning language models as components of the research process corresponds to a window of opportunity that can enhance researchers' capacity to address contemporary issues by properly informing business practices. Conversely, if traditional epistemological silos are maintained, and researchers decide not to expand their toolboxes, there is a risk that marketing research will not adequately explain the world of contemporary consumers, widening the relevance gap (Starkey & Madan, 2001).
The importance of a researcher's ability to interpret is multiplied many times by algorithms such as ChatGPT, BERT, and others. In other words, we argue that these models help capture the more general and quantifiable relationships. What remains to be done is primarily contextual and incommensurable, for instance, that which depends on local cultural elements. If the researcher adopts a participatory epistemological stance incorporating what the algorithm has learned as his or her learning, the dialogue with such computational artifacts shortens the distance separating the research and what is more substantive while assuring a high level of accuracy in the predictions made. In sum, deep learning language models are not only fuzzies' best friends but may also increase the ability of social scientists as a whole to conduct research that can benefit society.

Figure 1. A 'research-as-assemblage-framework' for interpretive research in an era of data abundance. Hermeneutic tasks and algorithmic procedures interact epistemically. What the researcher interprets/learns serves as input for the execution of algorithms and vice versa. Source: Elaborated by the authors.

Figure 2. An interpretive study's research path using a deep learning language model as a scaffolding tool.
Note. Source: Elaborated by the authors based on Creswell (2013),