The need to enhance reproducibility in interaction research

Recent efforts to improve the reproducibility of psychological science have developed procedures for avoiding common pitfalls in confirmatory research. This paper aims to contribute to this process by introducing a set of inductive methodological procedures drawn from the field of human interaction research. Debate surrounding the current replication crisis in psychology (Pashler & Harris, 2012; Pashler & Wagenmakers, 2012) has focused on identifying and mitigating the biases and incentives that lead researchers to adopt questionable research practices (QRPs): a range of methods for manipulating experimental results and processes. Several common threats to experimental reproducibility include ‘P-hacking’ (Gelman & Loken, 2013) (using multiple statistical tests until achieving a p < .05 result), ‘HARKing’ (Kerr, 1998) (contriving a hypothesis after the results are known), and the long-standing ‘publication bias’ against negative findings (Dickersin, 1990). These threats to reproducibility can be positioned at specific points within an idealized hypothetico-deductive research cycle such as in Figure 1 (Munafò et al., 2017). Two of the most prominent mitigation strategies are preregistration (Nosek, Ebersole, DeHaven, & Mellor, 2018), which enforces an operational distinction between exploratory and confirmatory research by asking researchers to specify their hypotheses and methods before they begin (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012), and registered reports (Chambers, Feredoes, Muthukumaraswamy, & Etchells, 2014), which mitigate the publication bias by committing journals to publishing results even if they turn out to be negative. This paper seeks to contribute to these methodological reform efforts by proposing a series of inductive research procedures to improve ecological validity and the chances of scientific success without cutting corners or cheating. 
In the following sections we explore the problems of low ecological validity in psychological theorizing. We then outline the methodological principles and formal research procedures used in the field of conversation analysis (Hoey & Kendrick, 2017; Sidnell & Stivers, 2012) to mitigate these problems. We illustrate these procedures by working through a successful example of how conversation analytic research has formed the basis of a highly ecologically valid experimental study of doctor/patient interaction (Heritage, Robinson, Elliott, Beckett, & Wilkes, 2007). Finally we argue that even in research that is not centrally concerned with human interaction, conversation analytic methods can provide an effective approach to ecologically grounding the process of theorizing, leading to more robust and relevant research.

Figure 1 

An idealized version of the hypothetico-deductive model of the scientific method, including commonly identified potential threats to this model, from Munafò et al. (2017). The figure is CC-BY licensed; see https://www.nature.com/articles/s41562-016-0021 and https://creativecommons.org/licenses/by/4.0/legalcode.

Problems of groundless theorizing

In many of the failed replications reported in Open Science Collaboration (2015) it seems that QRPs are used to increase the probability of finding an effect predicted by the stated theory. Theorizing about a phenomenon that has no grounding in reality makes QRPs attractive simply because they make it more likely that researchers will be able to report significant effects that support their theory. One of the key issues underlying the use of QRPs in human interaction research, for example, is this broader problem of theories being formulated without reference to the everyday interactional situations that presumably give rise to the psychological effects and mechanisms being studied. In this section we argue that the problem of groundless theorizing stems from uncritical assumptions about the philosophy of science and human interaction. As a result, researchers often fail to limit their freedom to theorize when designing experimental studies, and can be too flexible about procedures for coding data during the inductive phases of a research cycle.

The risk of too much freedom to theorize

The problem of researcher degrees of freedom (Simmons, Nelson, & Simonsohn, 2011) is usually seen as a risk to the integrity of data collection and analysis in confirmatory research where late-stage methodological choices can bias results. But too much freedom to theorize can be equally risky at the exploratory stage. A common assumption about falsificationism, still implicitly or explicitly a major philosophical underpinning of empirical science, is that as long as a theory can be falsified by testing a hypothesis, the scientist is free to theorize any conceivable causal relationship between any measurable variables. This freedom to theorize is tremendously powerful and there is nothing inherently wrong with this approach as long as all plausible confounding variables can be controlled and accounted for. Popper (2005, pp. 8–9) was inspired by the way cosmologists like Einstein used their freedom to theorize by making audacious leaps of intuition and then testing their predictions deductively. However, human interaction and communication tends to be under-theorized or inconsistent in its use of theory (Berger, 1991; Roloff, 2015) compared to fields such as cosmology. Therefore, taking large leaps of intuition and then using controlled experimental deduction is not necessarily a fruitful starting point. Complex systems such as, for example, patterns of gaze organization in multi-party interaction, may only emerge in hard-to-control conditions where many variables are in play simultaneously (Ward, 2002, pp. 53–60). Indeed there are many interactional behaviors that are very difficult, if not impossible, to emulate in controlled conditions (De Ruiter, 2013; Schegloff, 2006). Furthermore, there are serious difficulties with controlling for social context since people often behave very differently in specific institutional or experimental settings, and may limit or constrain their interactional behaviors accordingly (Schegloff, 1992). 
Even the act of recording or observing people may change the ways they interact in unpredictable ways (Labov, 1972) and these observer-influences must be analyzed and taken into account (Duranti, 1997). These limits to the extent of experimental control make the freedom to theorize particularly risky for interaction research.

The risk of too much inductive flexibility

To address the problem of too much freedom to theorize, data-driven, inductive methods such as grounded theory (Corbin & Strauss, 1990; Glaser & Strauss, 1967) use a highly flexible initial phase of iterative inspection, categorization and coding of data. This methodological flexibility can be helpful because it can accommodate empirical materials and methods from many different research areas (Urquhart & Fernandez, 2013). However, in human interaction research, too much flexibility to intuitively interpret, categorize, and code social behaviors can lead to misleading or circular inferences. Coding schema are often based on ‘common-sense’ categories rather than on formal, procedural descriptions of sequences of observable events (Stivers, 2015). Once video data are coded for quantification and experimentation in this flexible way, minor errors and misleading assumptions can be magnified and exacerbated by deductive hypothesis testing. In practice, detailed, reproducible schema for coding social behaviors (e.g. Dingemanse, Kendrick, and Enfield (2016); Stivers and Enfield (2010)) are rarely published, and many studies simply report a procedurally unspecified ‘qualitative’ phase before going on to test ecologically ungrounded theories (Hepburn & Potter, 2011). Furthermore, Potter and te Molder (2005) point out that experiments rarely involve detailed, empirical studies of naturalistic interaction, so ecologically ungrounded variables and untested assumptions are imported directly into experimental designs. Kingstone, Smilek, and Eastwood (2008) suggest that psychological science has repeatedly re-discovered these looming problems, but has tended to ignore them because transforming established research procedures would be too inconvenient. It is precisely these problems of too much freedom to theorize and overly flexible induction that have led conversation analysts to limit processes of coding and quantifying interaction (Schegloff, 1993). 
However, researchers are now beginning to combine conversation analysis with experimental methods (Kendrick, 2017), and to document the methodological issues and practical research procedures involved (De Ruiter & Albert, 2017; Hoey & Kendrick, 2017; Stivers, 2015). In this paper we argue that if we use inductive research procedures in a systematic, theoretically constrained way, and are clear about the methodological implications, we can ground our theories and hypotheses in the ecological context of natural human interaction.

Theoretical constraints and inductive research procedures

In the following sections, we recommend four practical methods inspired by conversation analysis (CA), an approach that places principled constraints on the freedom to theorize (Schegloff, 2007, pp. xii–xiii). This ‘theoretical asceticism’ (Levinson, 1983, p. 295) mitigates problems of groundless theorizing through a coherent set of research procedures that are systematic, inductive and rigorously empirical (Haddington, Mondada, & Nevile, 2013, p. 7). These include methods for choosing a relevant study setting, making detailed analytic transcriptions, building ‘collections’ of procedurally similar cases, and doing regular, informal peer review at analytic ‘data sessions’, all of which take place before any formal hypothesizing is allowed. These constraints are essential for research into human interaction to avoid common pitfalls and threats to the integrity of the research process. For example, in experiments using linguistic corpora, the ‘results’ (i.e. what actually happened in the interactions) usually are known before any hypotheses or research questions are formulated. It would be strange to avoid inspecting these data before coming up with testable hypotheses, but doing so without a clear set of exploratory, inductive research procedures risks (possibly unintentional) HARKing and/or P-hacking. Since existing data, researchers’ intuitions, and past results often provide the basis for pre-experimental theorizing, we advocate using CA’s research procedures transparently as a part of the published research process. We will first describe why and how conversation analysis (CA) has constrained its theorizing to match the relevant, reproducible facts in the domain of human interaction. We then introduce four core research procedures,1 and provide an illustrative example of each showing how it has been used in the development of an influential experimental study.

How to constrain theory to match the interactionally relevant facts

To test theories in an ecologically valid way, it is important to distinguish between the facts available to participants in the context of an interaction and those that may become available to researchers in the context of analysis. A related distinction is often made in the philosophy of science between ‘contexts of discovery’ and ‘contexts of justification’ (Schickore & Steinle, 2006), although there are many field-specific interpretations and applications of this distinction (Hoyningen-Huene, 2006). In the context of a broader project to improve research reproducibility, Nosek et al. (2018) suggest this distinction is equivalent to the differences between hypothesis-generation and hypothesis-testing, inductive and deductive methods, or exploratory and confirmatory research. However, here we argue that a particular interpretation of this distinction should be used in the field of human interaction research, and suggest that this interpretation is especially useful for constraining the process of theorizing in ways that can improve ecological validity.

Distinguish contexts of discovery from contexts of justification

The ‘context of discovery’ is the situation in which a phenomenon of interest is first encountered. For example, when studying human interaction, a useful context of discovery would be an everyday conversation that happened to be recorded for analysis (Potter, 2002). ‘Contexts of justification’, in this example, might include the lab meeting, the conference discussion, and the academic literature within which the empirical details are reported, analyzed, and formulated as a scientific discovery (Bjelic & Lynch, 1992). Table 1 lists some resources for making sense of an interaction that either participants or analysts can use when discovering and justifying interactional phenomena. The third column shows some interactional resources that are available from both perspectives. For example, both participants and overlooking analysts can use observable features of the setting and the visible actions of the people within it to discover new phenomena. Both participants and analysts can also detect when these actions are produced smoothly, contiguously, and without interruption (Sacks, 1987). Both can see if certain actions are routinely matched into patterns of paired or ‘adjacent’ initiations and responses (Heritage, 1984, p. 256). Similarly, both analysts and participants can observe when flows of initiation and response seem to break down, falter, or require ‘repair’ to re-establish orderliness and ongoing interaction (Schegloff, Jefferson, & Sacks, 1977). By contrast, many other resources and methods for making sense of the situation are exclusively available from one perspective or the other. For example, analysts can repeatedly listen to a recording, slow it down, speed it up, and can precisely measure, quantify, and deduce cumulative facts that would be inaccessible to participants in the interaction. 
Similarly, participants may draw on tacit knowledge and use introspection—options which are not necessarily available for overlooking analysts—to make sense of the current state and likely outcomes of the interaction. The risk of ignoring these distinctions is that theories about how people make sense of social interaction can easily become uncoupled from empirical evidence about what the participants themselves treat as meaningful through their behavior in the situation (Garfinkel, 1964; Lynch, 2012).

Table 1

Participants’, analysts’ and shared resources between contexts of discovery and justification.

Context | Participants’ resources | Analysts’ resources | Shared resources
Discovery | Knowledge & experience beyond current interaction | Ability to fast forward, rewind, & replay interactions | Observable social actions & settings
Justification | Introspection, inductive reasoning | Quantification & deductive analysis | Sequential organization of talk & social action

Consider participants’ situational imperatives

In order to ecologically ground theories in the context of interaction, we should constrain our theorizing to take account of what can be tested using the different kinds of evidence and methods available to both analysts and participants. Analysts should try to harness as many resources from the participants’ ‘context of discovery’ as possible, but it is also important that they take into account how the participants’ involvement in the situation is motivated by entirely different concerns. The drinker and the bartender do not usually go to a bar to provide causal explanations for interactional phenomena discovered in that setting for the benefit of scientific research. Their actions are mobilized by the mutual accountability of one person ordering a drink and the other person pouring it. As Garfinkel (1967, p. 38) demonstrates, failure to fulfill mutually accountable social roles can threaten to ‘breach’ the mutual intelligibility of the situation itself. Bartenders who fail to recognize the behavior of thirsty customers risk appearing inattentive or unprofessional. In an extreme case, failing to behave as a bartender may lead to getting fired and actually ceasing to be one. Similarly, customers who fail to exhibit behaviors recognizable as ordering a drink risk remaining unserved or, in an extreme case, being kicked out of the bar. If neither participant upholds their interactional role, the entire situation risks becoming unrecognizable as the jointly upheld ordinary activity of ‘being in a bar’ (Sacks, 1984b). Interactional situations have this reflexive structure: they depend on participants behaving in certain ways in order to make the situation recognizable as the kind of situation where that kind of behavior is warranted. This makes it especially important to ground theories about interaction with reference to the resources and methods that are accessible to participants in the situation, and to take account of participants’ situational imperatives.

Focus on reciprocal interactional behaviors

Theories about interaction, then, should focus on whatever people in a given interactional situation discover and treat as relevant through their actions. For participants in an interaction what counts as a ‘discovery’ is any action that they, in conversation analytic terms, observably orient towards and treat as relevant in the situation. Justification in the participants’ terms, then, consists of doing the necessary interactional ‘work’ to demonstrate their understanding and make themselves understood to others (Sacks, 1995, p. 252). When people interact they display their understandings and uphold the intelligibility and rationale of their actions (Hindmarsh, Reynolds, & Dunne, 2011). This reflexive process upholds the intelligibility of the social situation they’re currently involved in: an imperative that Garfinkel (1967) describes as ‘mutual accountability’. In our bar example, prospective drinkers and bartenders monitor one another’s behavior and discover, respectively, who is going to serve a drink, and who needs one. The resources they may rely on in order to make these discoveries include their bodily positions, head and gaze orientation, speech, and gesture. Each participant may also rely on cultural knowledge and prior experience of this kind of situation. However, these tacit resources are not directly accessible, either to the other participants or to the overlooking analysts. Similarly, analysts could code and quantify any visible bodily movements then use statistical methods for ‘exploratory data analysis’ (Jebb, Parrigon, & Woo, 2016) to develop a theory. This could be very misleading and ecologically invalid though, since this form of analysis is not something participants could use as a resource to make sense of the situation, and it doesn’t necessarily take account of their displays of mutual observation and accountability. 
Theories about behavior in bars, therefore, should start by trying to explain this situation using only resources that are mutually accessible to participants and analysts. These resources could include any reciprocal interactional behaviors such as how drink-offerings and drink-requests are linked in closely timed sequences of social interaction.

Research procedures for ecological grounding

In this section we introduce research procedures developed by conversation analysts for grounding theories and findings in the contexts of discovery and justification shared by both analysts and participants. We introduce four key conversation analytic research practices using an illustrative example of how they were used to develop an experimental study. Firstly, we explain how to get started by matching a research question to a relevant naturally occurring interactional situation where informative recordings can be made. Secondly, we introduce methods for the detailed transcription of recordings of interaction. Thirdly, we describe methods for inductive analysis through rapid peer review during collaborative ‘data sessions’. Finally, we describe the process of building ‘collections’ of specific constructs or phenomena, from which we can develop conversation analytic findings and ecologically grounded coding schema.

Match interactional settings with research questions

The first step in ecologically grounding a theory in natural interaction is to find a situation where a relevant interactional behavior already takes place. In practice, a research question might arise from observing any interactional situation, or analysts may look for a situation to match a related research question. The challenge is to find a situation where the outcomes of the interaction are evidently relevant to the participants themselves, and where observable variations in their behavior can be shown to influence the end results systematically. A ‘result’ in the participants’ context of discovery can be as simple as successfully ordering a drink in a bar. For example, Loth, Huth, and De Ruiter (2013) showed that observing how drink-ordering is achieved through interaction in a bar provides informative and surprising results as the basis for formulating theories. To initiate a successful drink-ordering, customers simply had to stand at the bar looking towards the bartender. Use of stereotypical ordering actions such as calling or waving to the bartender proved to be unnecessary and even potentially disruptive. These results were very different from what they had anticipated when intuitively discussing the behaviors that should enable people to obtain drinks in bars. The value of this first step in drawing together contexts of discovery and justification is to ecologically ground theory in observable social actions. The researcher needs to find a setting where participants do observable interactional work to achieve their results (getting a drink in a bar) in ways that are informative for the analyst’s research questions (finding out how people go about getting drinks in bars). 
The bar is an obvious choice as a setting for exploring drink-ordering, but even if a researcher has no specific domain of inquiry yet, new research questions and ideas for a study may emerge from repeated viewing and ‘unmotivated’ analysis of any interactional data (Sacks, 1984a, p. 27). For example, a corpus of video recordings of guided walking tours has provided a setting for discovering questions about how people organize themselves into mobile groups (De Stefani & Mondada, 2013), about the roles and procedures involved in getting the group to examine something (De Stefani, 2010), and about how people coordinate the process of walking away together through interactional behaviors (Broth & Mondada, 2013).

The ‘perspicuous setting’ of the doctor-patient interaction

Social situations that provide a starting point for observational analysis are sometimes called ‘perspicuous settings’ because they work like a microscope that analysts can use to explore details and answer questions about human affairs. For example, Heritage et al. (2007) used acute primary care visits as a perspicuous setting to develop an ecologically valid experimental study about how to improve a key outcome of the doctor-patient interaction: the number of unmet patient concerns at the end of the visit. This experiment started with conversation analytic studies of the interactional structure of primary care visits (Heritage & Maynard, 2006; Maynard & Heritage, 2005). These revealed that the visit was usually organized into six distinct phases. There is an opening phase for greetings, a problem presentation phase, a data-gathering phase for history taking and/or a physical exam, a diagnosis phase, and a treatment phase before a closing phase for goodbyes. These phases are not announced explicitly, but the shifts between them are evidently observable to participants and analysts alike. Getting from one phase to the next is clearly relevant to both participants, and is very consequential for the outcomes of the visit. For example, in some of the earliest conversation analytic work in this area, Heath (1984) describes cases where the doctor initiates the shift from the opening greetings to the problem presentation phase with minor variations on the question “What can I do for you today?”, at which point the patient responds with a concern. Subsequent conversation analytic work on how doctors solicit patients’ concerns (Heritage & Robinson, 2006; Robinson, 2006) shows how doctors use variations of this question systematically. 
These variations index aspects of their relationship with the patient. For example, if the doctor uses the patient’s name they are treating the patient as a regular, whereas if no name is used the doctor displays that they are treating the situation as a first visit. Similarly, omission of this initial problem-solicitation question and a move ‘straight to business’ shows whether the doctor is treating this as a ‘new’ concern or an ongoing one. The primary care visit works as a perspicuous setting because we get to see how both participants work to achieve orderly shifts between these routine phases interactionally. Garfinkel and Wieder (1992, pp. 184–186) emphasize that, in perspicuous settings, participants’ affairs are “locally produced, locally occasioned and locally ordered”. These perspicuous settings work as shared contexts of discovery and justification, co-constructed and motivated by participants themselves, revealing what is relevant for them without reference to analysts’ concerns.

Transcribe interactionally relevant details

After recording audiovisual data in a perspicuous setting, the next research procedure involves creating detailed analytic transcriptions of talk and social interaction.2 In the 1960s Gail Jefferson, one of the founders of conversation analysis, designed its transcription system to highlight patterns of overlap and variations in prosody in a simple and intuitive way (Hepburn & Bolden, 2012). A technical transcription in IPA notation (International Phonetic Association, 1999) would provide more objective accuracy about phonetic pronunciation than standard orthography or Jeffersonian transcription. However, this level of detail is not necessarily relevant to the participants in the interaction, so for interaction-oriented analysis a specific pronunciation should only be picked out on the rare occasions when participants themselves make an issue of it by, for example, re-doing a mis-pronunciation (Hepburn & Bolden, 2017, p. 16). Jeffersonian transcription is relatively simple to read and use, and it is optimized to represent the features of talk most easily recognized as relevant to participants in an interactional situation such as speed-ups, prosodic stress, sound stretches, overlaps, pauses, and gaps between speaker turns. Most importantly, manually transcribing conversational data is a very useful analytical activity in itself through which researchers can become intimately familiar with their data. Watching and listening repeatedly while trying to capture the fine details of talk helps analysts to focus on transcribing the features that are observably relevant to the participants themselves (Bolden, 2015). Of course all transcription systems introduce their own analytic perspectives, assumptions, and biases (Ochs, 1979). Jeffersonian transcription is intentionally biased towards emphasizing the details that participants use to maintain the smooth operation of naturalistic conversation. 
This bias makes it ideal for highlighting details that fall into the shared contexts of discovery and justification between participants and analysts. On the one hand, these details tend to show how people accomplish basic conversational procedures such as one-at-a-time turn-taking (Sacks, Schegloff, & Jefferson, 1974). On the other hand, minute details such as disfluencies, hesitations and overlaps in talk also reveal when participants are having trouble with the larger task, situation, or topic at hand (Jefferson, 1974).

What transcripts can reveal about doctor-patient interactions

In a primary care visit, transcripts can reveal ‘surface level’ troubles with speaking, and these tiny details may also point to wider situational problems. For example, in Extract 1 from Robinson (2006, p. 41), the patient hesitates when the doctor moves to shift the interaction from the greeting to the problem presentation phase of the visit. Lines 6, 7, and 9 show ‘micro-pauses’3 where a response may be due, but is hearably withheld by the speaker. In lines 6 and 7 the patient’s “Uh:m-” (the colon denotes a sound-stretch on the preceding syllable) suggests that the patient is having trouble responding to the doctor’s question.

Extract 1

10: DIZZINESS from Robinson (2006, p. 41).


      5    DOC: So what can I do for you today.
->    6    (0.2)
->    7    PAT: Uh:m- (0.2)
      8    DOC: Oh yes. yes.
->    9    (0.2)
      10   DOC: .hhh How’s the dizziness.=hhh
      11   PAT: Well I went to a therapi:st ...

Before going through Extract 1 to show how these minute disfluencies can reveal the participants’ broader situational problems, it is useful to show how this shift to patient presentation is usually structured.4

In Extracts 2 and 3 the patient responds to the doctor’s ‘new problem solicitation’ question by immediately addressing the problem. This is how doctors routinely initiate a shift from greetings into the problem presentation phase, and how patients routinely respond. Extract 4 contains disfluencies and pauses, but these still constitute an orderly shift into the problem presentation phase in accordance with these norms. This is because the choking, pausing and coughing literally present the problem: that the patient can’t clear their throat. If we now look again at Extract 1, we can see that both the structure and texture of the talk are different. The patient does not immediately present their problem. The disfluencies and hesitations in the patient’s responses are not part of their problem presentation. Instead the doctor treats these disfluencies as related to their own failure to remember. For example, in line 8 the doctor says “Oh yes. yes”, then after a short pause and a loud inbreath (transcribed as “.hhh”, a period followed by a series of ‘h’s), they re-do the problem-solicitation question. This time, however, the doctor does not use the standard ‘new concern solicitation’ format (i.e. “what can I do for you?”). Instead, the doctor uses a question format that displays their awareness of the patient’s pre-existing problem, in this case by asking “How’s the dizziness”. Transcribing interaction at this level of detail gives us access to insights that would be lost in a standardized orthographic transcript,5 and even this tiny collection of three cases demonstrates the importance and value of detailed transcription. This example of analyzing a small collection of individual cases also provides an indication of the value of the next research procedure: building large ‘collections’ of procedurally similar cases.

Extract 2

AudioBNC 021A-C0897X098900XX-0200P0/Acne.


1    DOC: What can I do for you this morning.
2    PAT: Ah it’s the acne ...
Audio (http://bit.ly/ecological_grounding_eg2)

Extract 3

AudioBNC 021A-C0897X00971X-0100P0/Back.


1    DOC: Well Curly what can we do for you.=
2    PAT: =It’s my back
Audio (http://bit.ly/ecological_grounding_eg3)

Extract 4

AudioBNC 021A-C0897X098900XX-0200P0/Throat.


1    DOC:    What can I do for you today.
2    (0.6)
3    PAT:    It’s (.) ((chokes)) me throat at the back eh:: trying to keep-
4    trying to clear it and I ca::nt. ((cough))
Audio (http://bit.ly/ecological_grounding_eg4)
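For readers who want to work with such transcripts computationally, the notation discussed in this section can be tagged mechanically. The following is a minimal sketch in Python, not part of the conversation analytic procedure itself: the regular expressions cover only the few conventions mentioned above (timed pauses, the ‘(.)’ micro-pause mark, sound stretches, cut-offs, and in-breaths), and the names tag_features and total_silence are hypothetical helpers introduced here for illustration. It assumes speaker labels are written in capitals followed by a colon, as in Extracts 1 to 4.

```python
import re

# Simplified, assumed patterns for a handful of Jeffersonian conventions.
# A full transcription system has many more symbols than are handled here.
SPEAKER = re.compile(r"^\s*([A-Z]+):\s*(.*)$")   # e.g. "DOC: ..." or "PAT: ..."
TIMED_PAUSE = re.compile(r"\((\d+\.\d+)\)")      # (0.2) = silence in seconds
MICRO_PAUSE = re.compile(r"\(\.\)")              # (.) = very brief silence
STRETCH = re.compile(r"\w:+")                    # colon(s) after a sound
CUT_OFF = re.compile(r"\w-(?=\s|$)")             # word ending in a hyphen
INBREATH = re.compile(r"(?<!\S)\.h+")            # .hhh = audible in-breath


def tag_features(line: str) -> dict:
    """Return the speaker label (if any) and counts of the marked features."""
    match = SPEAKER.match(line)
    speaker, talk = (match.group(1), match.group(2)) if match else (None, line.strip())
    return {
        "speaker": speaker,
        "timed_pauses": [float(p) for p in TIMED_PAUSE.findall(talk)],
        "micro_pauses": len(MICRO_PAUSE.findall(talk)),
        "stretches": len(STRETCH.findall(talk)),
        "cut_offs": len(CUT_OFF.findall(talk)),
        "inbreaths": len(INBREATH.findall(talk)),
    }


def total_silence(lines: list[str]) -> float:
    """Sum all timed pauses (in seconds) across a list of transcript lines."""
    return sum(sum(tag_features(line)["timed_pauses"]) for line in lines)


# Lines 5 to 11 of Extract 1, with line numbers and arrows removed.
EXTRACT_1 = [
    "DOC: So what can I do for you today.",
    "(0.2)",
    "PAT: Uh:m- (0.2)",
    "DOC: Oh yes. yes.",
    "(0.2)",
    "DOC: .hhh How’s the dizziness.=hhh",
    "PAT: Well I went to a therapi:st ...",
]

if __name__ == "__main__":
    for line in EXTRACT_1:
        print(line, tag_features(line))
    print("Total timed silence:", round(total_silence(EXTRACT_1), 2))
```

A tagger of this kind only counts symbols that a transcriber has already chosen to record. It does not replace the analytic work of deciding which details participants treat as relevant, which is the point of transcribing manually.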

Build collections of procedurally similar cases

‘Collections’ include multiple instances of a target phenomenon with minor variations in terms of their composition, sequential structure, and what they accomplish through interaction. Analysts usually begin with a ‘single case analysis’ involving a few highly detailed episodes. Over time analysts tend to build up collections of many similar cases (Schegloff, 1996) that may eventually form the basis of a more ambitious systematic study. Once a phenomenon has been described procedurally it becomes much easier to spot variations and ‘deviant cases’ where things do not go as usual (such as the failed shift into the problem presentation phase we saw in Extract 1). For example, in one of CA’s foundational studies, Schegloff (1968) describes collecting 499 cases of telephone call openings before considering his collection almost complete and ready to be analyzed. It was the 500th case that provided him with a single ‘deviant case’ that forced him to re-evaluate his findings about the sequential order of ringing sounds and greeting exchanges in telephone call openings. This famous ‘500th call’ is one type of deviant case (Maynard & Clayman, 2003) that is often cited to demonstrate the difference between CA’s approach to data and more conventional qualitative ‘case studies’. Each single case analysis starts from first (interactional) principles in trying to explore the setting from a vantage point as close to the participants’ contexts of discovery and justification as possible. For this reason, Schegloff’s (1968) example functions as a kind of applied falsificationism: the only way the 500th case could make sense from within the analyst’s context of justification was to radically alter the theory to fit the data. This is the benefit of empirically constrained but flexible induction.
Long-standing collections of often-analyzed phenomena become theory-like over time, and can be subject to a form of inductive falsification and modification through contradiction by subsequent findings. This flexibility also allows for widespread societal changes in people’s patterns of interactional behavior. For example, since the mid-2000s, telephone call opening sequences have changed significantly due to the prevalence of caller-ID on phone handsets (Raudaskoski, 2009).
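
The logic of testing a procedural description against a collection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how conversation analysts actually store their collections. The `Case` class, its field names, and the `deviant_cases` helper are hypothetical. The true/false codes simply restate the analysis of Extracts 1 to 4 given above, where only Extract 1 lacks an immediate problem presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One hand-coded case in a collection (field names are hypothetical)."""
    label: str
    problem_presented_in_next_turn: bool


def deviant_cases(collection, rule):
    """Return the cases that do not fit the candidate procedural description."""
    return [case for case in collection if not rule(case)]


def immediate_presentation(case):
    """Candidate description: the patient presents the problem straight away."""
    return case.problem_presented_in_next_turn


# Codes follow the analysis of Extracts 1-4 in the text above.
collection = [
    Case("Extract 1", False),
    Case("Extract 2", True),
    Case("Extract 3", True),
    Case("Extract 4", True),
]

deviants = deviant_cases(collection, immediate_presentation)
print([case.label for case in deviants])
```

Each flagged case sends the analyst back to the recording and transcript. Either the case turns out to be orderly in a way the description missed, or the description itself has to change, as it did with the 500th call.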

Building collections forces researchers to test their detailed single case analyses against one another over a long term research process (Clift & Raymond, 2018). This process is based on structured, empirical work, but it still provides researchers with the flexibility to discover and test inductive analyses rather than prematurely theorizing about ecologically ungrounded constructs. Over time, collections can support other kinds of research alongside conversation analysis. As our example below shows, they can also provide a valuable resource for the design and operationalization phases of experimental research that can enhance construct validity and reproducibility.

Operationalizing a collection of doctor-patient interactions

Heritage and Robinson (2011) describe how long-held collections featuring doctor-patient interactions informed the experimental design they used to test the effects of a medical communication intervention (Heritage et al., 2007). Doctor-patient interactions are a particularly useful study setting since the institutional roles and phase structure of the primary care interview are intended to have routine, predictable outcomes (Drew & Heritage, 1992, pp. 43–45). This makes it easier to spot ‘deviant cases’ where an interaction is structured differently and has different outcomes. Ecologically grounded variations in natural interaction, once identified, can be operationalized as independent variables with interactional outcomes providing the dependent variables. In the case of studying medical communication, Heritage et al. (2007) noticed that doctors often (but not always) asked medical history-taking questions using the word “any”, and that the interactional outcome was usually a “no” response.

In Extract 5, the question in Line 4 “No heart disease?” is negatively framed to favor a “no” response. Kadmon and Landman (1993) show how “any” functions as a ‘negatively polarizing’ word by exploring its frequent use in sentences such as “I haven’t got any X”. It makes sense, then, that doctors use this term in the context of history taking during a primary care interview while running through a long list of possible medical conditions where the ideal answer is “no”. This regular pattern of questioning is easy to find in any corpus of doctor-patient interactions. For example, in Extract 6 the doctor uses “any” in a sequence of negatively framed questions, each of which gets a negative answer.

Extract 5

A series of history-taking questions from (Heritage et al., 2007, p. 1430).


1    DOC  –>    And do you have any other medical problems?
2    PAT    Uh No
3    (7.0)
4    DOC  –>    No heart disease?
5    PAT    ((cough)) No
6    (1.0)
7    DOC  –>    Any lung disease as far as you know?
8    PAT    No

Extract 6

AudioBNC 021A-C0897X098900XX-0100P0/Any.


–>    1    DOC:    We don’t need to do any blood tests if you’re fi::ne. .hh you’ll be
   2    delighted to hear [having checked that in the first place. ]
   3    PAT:                                [((laughs in overlap with doctor’s talk))]
–>    4    DOC:    Erm (0.4) any questions?
   5    PAT:    No:,
–>    6    DOC:    No? okay you’re not on any other pills and tablets are you.
   7    PAT:    Not really no.
   8    DOC:    Good.
                                                                         Audio (http://bit.ly/ecological_grounding_eg5)

Their large collection of primary care consultations allowed Heritage et al. (2007) to spot the opportunity to test the effect that variations in the design of the doctor’s questions could have on the likelihood of getting a “no” response. Specifically, they noticed that if the patient did not bring up a concern at the opening phase after the problem-solicitation question (as in Extracts 2, 3, and 4) then these unmet concerns would rarely get discussed before the end of the visit. They decided to test whether the number of unmet patient concerns would be affected by the doctor using two differently formulated questions to shift into the closing phase of the visit. The independent variable was the doctor either asking “Are there any other concerns you’d like to address during this visit?” or asking “Are there some other concerns…” (substituting the word ‘some’ for ‘any’). The dependent variable was the number of unmet concerns remaining after the end of the consultation, which was counted using pre- and post-consultation patient surveys.6 The way these variables were ecologically grounded in hundreds of closely transcribed observational analyses made it far more likely that experimental tests would be relevant and reproducible in the context of doctor-patient interactions. In the following section we discuss some inductive methods conversation analysts use to develop and refine these kinds of observational analyses.
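
One way to picture this operationalization is as a small data structure. The Python sketch below is an assumption-laden illustration, not the procedure used by Heritage et al. (2007). It assumes that an unmet concern is one listed before the visit but not reported as addressed afterwards. The `Visit` class, its field names, and the example records are all invented for illustration and are not data from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One consultation (hypothetical structure)."""
    condition: str                 # independent variable: wording of the closing question
    pre_visit_concerns: frozenset  # concerns listed in the pre-consultation survey
    addressed_concerns: frozenset  # concerns reported as dealt with afterwards

    def unmet(self):
        """Dependent variable: concerns listed beforehand but not addressed."""
        return self.pre_visit_concerns - self.addressed_concerns


def mean_unmet_by_condition(visits):
    """Average number of unmet concerns per visit within each condition."""
    counts = defaultdict(list)
    for visit in visits:
        counts[visit.condition].append(len(visit.unmet()))
    return {cond: sum(vals) / len(vals) for cond, vals in counts.items()}


# Invented records for illustration only; these are not data from the study.
visits = [
    Visit("some", frozenset({"rash", "sleep"}), frozenset({"rash", "sleep"})),
    Visit("some", frozenset({"knee"}), frozenset({"knee"})),
    Visit("any", frozenset({"cough", "diet"}), frozenset({"cough"})),
    Visit("any", frozenset({"back"}), frozenset({"back"})),
]
result = mean_unmet_by_condition(visits)
```

The point of the sketch is that both variables map directly onto things participants observably do or report: the wording of a question and the concerns that were or were not dealt with.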

Use informal peer review as a check on inductive analysis

The ‘data session’ is essential to improving the reliability of the collection-building process. Researchers present transcripts and recordings of single cases and prototypical collections for repeated viewing and group observation. In this situation other analysts can check the validity and recognizability of procedural descriptions and test individual cases against agreed criteria for their inclusion in a collection. Data sessions are interactional settings in themselves, and empirical studies of these sessions show how analysts use their interactional abilities to check each other’s intuitions (Antaki, 2008; Harris, Theobald, Danby, Reynolds, & Rintel, 2012; Tutt & Hindmarsh, 2011). Data sessions are such effective analytical procedures because where the object of inquiry is human interaction itself, our own interactional abilities may be our most useful heuristic measuring device, even if we do not yet fully understand how our abilities work. The problem, of course, is that if we use our intuition to describe and code interactional phenomena we will be subject to unacknowledged, unchallenged biases and may make unreliable subjective judgments. Data sessions mitigate this problem because analysts must describe each single case and how it matches the criteria of a collection in precise detail in a form of live peer review. In effect, data sessions test our intuitions and descriptions against the observations and reasoned arguments of other analysts. This process is ongoing at every stage of the research cycle, from the first data viewing to in-depth analysis of fully developed collections. Ten Have (1999, pp. 140–141) provides a brief description of how a data session works.

“[The data session] often involves playing (a part of) a tape recording and distributing a transcript…The session starts with a period of seeing/hearing and/or reading the data, sometimes preceded by the provision of some background information by the ‘owner’ of the data. Then the participants are invited to proffer some observations on the data, to select an episode which they find ‘interesting’ for whatever reason, and formulate their understanding, or puzzlement, regarding that episode. Then anyone can come in to react to these remarks, offering alternatives, raising doubts, or whatever.”

The data session is also a reminder to focus on phenomena that both analysts and participants can access within the interactional context of discovery. Heath et al. (2010, pp. 156–157) provide a useful guide to running a data session, which warns analysts not to “cheat and look ahead, or rely on information exogenous to the clip itself”. The data session gives analysts a chance to re-examine their cases from first interactional principles, and to justify their inclusion or exclusion from collections.7

How the culture of the data session enhances reproducibility

The kind of collaborative inductive analysis that happens in data sessions enhances reproducibility by encouraging ongoing critical reflection and fostering a rigorous, cooperative research culture, which reduces the incentives to cheat. The data session is the culmination of all the research processes detailed here, from analytical transcription to collection-building. Working with naturally occurring data also facilitates continuous revision and checking of theories and constructs throughout, and beyond, a research cycle. For example, where an experiment shows a surprisingly strong effect, it is useful to be able to quickly check the construct validity of its variables and design. The ‘some’/‘any’ study by Heritage et al. (2007) showed that 37% of patients reported still having more than one unmet concern at the end of their visit to the doctor. In the condition where the doctor ended the visit by asking if they had ‘some’ other concerns, this proportion went down to only 9%. This effectively eliminated 78% of unmet concerns compared to the ‘any’ condition, which showed no significant effect relative to the ‘no intervention’ control group. Without replicating the whole experiment, we could check the construct validity of the study by finding cases similar to the ‘any’-formulated questions in Extract 6, then transcribing and studying them at a data session. Another useful test of the ecological validity of the ‘some’/‘any’ manipulation variable would be to collect naturally occurring cases of what happens when a doctor uses a different formulation of the problem-solicitation question. In Extract 7, for example, the doctor solicits additional patient concerns towards the end of the interview after dealing with the initially cited ‘main’ issue.

Extract 7

AudioBNC 021A-C0897X098900XX-0200P0/Else.


1    DOC:    NOW. (1.2) what else can I do for you.
2    PAT:    I just need a repeat prescr[iption] for Dianette [please, ]
3    DOC:                               [OH YES]              [Dianette]
                                                                       Audio (http://bit.ly/ecological_grounding_eg6)

The doctor marks the end of the previous business with “NOW” and a long pause in line 1, then asks “what else can I do for you”, which turns out to be a highly effective format for problem-solicitation at this late stage in the visit. The square brackets on consecutive lines in the transcript show where the doctor’s speech overlaps with the patient’s response. His closely timed overlaps display his immediate uptake and recognition of the patient’s request as something he has already anticipated or known about. This is a deviant case of problem-solicitation since it occurs so late in the visit. It also provides us with a naturalistic example of how the doctor solves the problem of soliciting a patient concern outside of the usual early-stage ‘slot’. Analysis of these kinds of ‘deviant’ cases not only raises new questions about problem-solicitation in general; the “what else” format that the doctor uses in Extract 7 could also inspire new ecologically grounded manipulation variables for use in a follow-up study.
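
When checking a surprisingly strong effect such as the one reported for the ‘some’/‘any’ study, it can also help to make the arithmetic behind a headline figure explicit. The Python sketch below computes a simple relative reduction from the two rounded proportions quoted above (37% and 9%). The function name `relative_reduction` is a hypothetical helper. The calculation is a back-of-envelope check, which comes to roughly 76% and need not coincide with the 78% reported, since that figure derives from the study’s own analysis.

```python
def relative_reduction(baseline, treated):
    """Proportional drop from a baseline rate to a treated rate."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - treated) / baseline


# Rounded proportions quoted in the text for the 'some'/'any' study.
baseline_rate = 0.37  # patients reporting unmet concerns
some_rate = 0.09      # the same proportion in the 'some' condition

reduction = relative_reduction(baseline_rate, some_rate)
print(f"Relative reduction: {reduction:.1%}")
```

A check of this kind says nothing about construct validity. That still depends on returning to naturally occurring cases such as Extracts 6 and 7.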

Conclusion: Ecological grounding for greater reproducibility

Research practices in psychological science are currently under review (Nosek et al., 2018), which presents a useful opportunity to challenge some conventional methodological assumptions. From the perspective of the predominant research practices in cognitive science and psychology (Toomela, 2014), inductive methods such as the ones outlined here are seen as qualitative preliminaries to deductive experimental research. However, the last fifty years of conversation analysis have shown that incremental research into the procedural structures of natural interaction can lead to generalizability across languages and contexts (Heritage, 2008). Taken together, the body of work derived from these empirical studies constitutes a broad set of findings about interaction against which psychological theories can be developed, operationalized, and tested deductively.8 Although all research procedures are designed to prioritize the central methodological issues and research questions within their fields of origin, they can also be effectively transposed and adapted for cross-disciplinary exchange. We suggest that experimental studies of human interaction should adopt the inductive research procedures outlined here in order to ecologically ground the process of theorizing. This is especially important within interaction research since experimental results in the field are usually several steps of task design, coding and quantification removed from the natural behaviors they investigate. Thankfully this form of cross-disciplinary exchange is already underway. CA researchers are beginning to adapt their research processes of transcription, collection, and detailed procedural analysis to create coding schema (Dingemanse et al., 2016; Stivers, 2015; Stivers & Enfield, 2010) and to quantify phenomena for statistical tests and experimental studies.
These developments are opening up new opportunities for laboratory-based CA (Kendrick, 2017) and new cross-disciplinary methods of replicating and testing the reliability of experimental designs (Hofstetter, 2018). These cross-disciplinary initiatives build on the successful model of ‘naturalistic experiments’ (Heath & Luff, 2017) exemplified here by Heritage et al. (2007). This article aims to encourage many more such cross-overs by providing an overview of key research procedures that can work as practical methodological interfaces between conversation analytic and experimental methods for human interaction research within psychology and cognitive science (Albert & de Ruiter, 2018; De Jaegher, Peräkylä, & Stevanovic, 2016; De Ruiter & Albert, 2017).9

In this paper we have suggested that researchers in the field of human interaction can enhance the reproducibility and reliability of their findings by grounding their theories in the details of the interactional ‘context of discovery’. To achieve this, we recommend researchers devote attention and resources to exploring this context while constraining their theorizing to phenomena and resources that are observable to both analysts and participants. We have introduced a structured set of inductive research procedures including choosing a ‘perspicuous setting’, doing detailed analytic transcription, building collections of cases, and participating in data sessions. These procedures can be used before, during, and after the process of testing theorized predictions experimentally. These constraints and procedures help us to theorize about interactional practices and variables that are psychologically relevant, interactionally consequential, and ecologically grounded in natural behaviors. We also advocate that these inductive research practices should be used alongside formally structured deductive procedures such as pre-registration of experimental designs, hypotheses and data analysis plans. This will further mitigate publication bias since the inductive research procedure becomes an end in itself rather than just a token ‘qualitative’ preliminary. What might be seen as ‘negative results’ in an experimental study can become a useful and publishable contribution to the cumulative wealth of inductive, observational conversation analytic research. If an empirical study is based on naturalistic interaction, its findings can always broaden our understanding of interactional norms and specific patterns of deviation from them. Most importantly, while these research procedures are clearly laborious, they do provide the reassuring advantage that the phenomena described are guaranteed to have actually occurred in reality, not only in our theoretical imagination.
Ecological grounding should, in the end, save us the far greater wasted effort, expense and time of pursuing weak effects and ecologically ungrounded theories.