An English speaker in a coffee shop, desiring coffee, might say, “A coffee,” or “I’ll take coffee, a small one, please,” or “May I have a small?” Even in this highly constrained context, many different utterances can convey the speaker’s message. This flexibility of language production, and speakers’ ability to navigate the large number of lexical and syntactic options to converge on a single utterance form, are central features of the production process and have become important topics of research. The research presented here addresses these issues by comparing production across three different languages and two different sentence constructions. This approach provides a detailed picture of how production flexibility—the availability of alternative syntactic and lexical options to convey a message—can vary across language and message, and how these factors shape speakers’ implicit choices of words and sentence structures. These choices, in turn, provide insight into how language production processes settle on an utterance plan over alternative possibilities.

Multiple Forces Shaping Production Choices

Language production researchers have long recognized that the properties of both words and sentence structures affect producers’ implicit utterance choices. Recently, several researchers have aimed to characterize a range of these effects by contrasting two means by which speakers converge on a single string of to-be-articulated words. Sentence construction can be word driven, in which a word or words are selected first and the order of their selection dictates sentence structure, or structure driven, in which relations between entities in the message are encoded first and words are then fit into this structure (Bock & V. Ferreira, 2014). Sentence production need not proceed only one way or the other, and indeed examples of both types of production processes are evident in the existing literature.

In word-driven sentence production, highly accessible words—those that are easily recalled from long-term memory—are the first to be encoded into their linguistic form and planned for articulation, and the sentence is built up, incrementally, as additional words are encoded and planned. For example, conceptual representations of animate entities are thought to be more accessible, and consequently retrieved from long-term memory more quickly, than representations of inanimate entities, with the consequence that animate entities tend to be mentioned earlier in utterances than inanimate entities (e.g., Bock & Warren, 1985), a finding often interpreted as a direct link between conceptual accessibility and word order (for review see Jaeger & Norcliffe, 2009; Tanaka, Branigan, McLean, & Pickering, 2011). Externally cued entities also seem to be encoded and articulated more easily, with cued entities generally taking more prominent positions, or positions earlier in the sentence (Gleitman, January, Nappa & Trueswell, 2007; Myachykov & Tomlin, 2008; cf. Konopka & Meyer, 2014; Myachykov, Garrod & Scheepers, 2010). Taken together, these findings suggest that noun accessibility can have enormous consequences for sentence structure in many languages, particularly in utterances in which word order—through structure choice or scrambling—is flexible.

Whereas word-driven processes have typically focused on the mapping between event entities and the nouns that describe them, structure-driven processing addresses the mapping of events into language. In structure-driven sentence production, the gist of the message (generally the inter-relationships between the entities in the message) is encoded first, so that the hierarchical sentence structure is built early, and later stages of planning fill the appropriate words into that structure. Structure-driven production is often observed when a message contains a highly namable event (Brown-Schmidt & Konopka, 2015; Konopka & Meyer, 2014), in other situations in which the message is characterized by highly salient relations between entities in the utterance (Bock, Irwin, Davidson & Levelt, 2003), or when certain sentence structures are primed or otherwise highly accessible (Konopka, 2012; Myachykov, Garrod & Scheepers, 2010; Myachykov & Tomlin, 2008).

These results emphasize how choice of utterance form is guided by properties of both words and event-structure mappings, and they suggest that the relative balance of these factors can vary across situations. Consider someone describing a picture of a boy kicking a ball, where the boy, ball, and kicking event must all be recognized and relevant words integrated into the linguistic description. If picture properties, an interlocutor’s question, or an attention-directing manipulation results in the producer retrieving the word boy first, boy might be placed early in the utterance, such as the active the boy is kicking the ball. Likewise, if ball were retrieved and planned first, a speaker might produce an utterance placing ball early in the utterance, such as the passive the ball is being kicked by the boy. The third alternative is that the verb kick could be retrieved and planned first, owing to some visual context making the event highly codable or some contextual factor making kick highly accessible. In this situation, kick does not appear first, but the lexico-syntactic verb properties of kick strongly influence the final utterance form.

All accounts of word- and structure-driven planning suggest that both types of planning can occur, but here we focus on several factors that modulate the nature of this planning. First, we propose that structure-driven planning could be tied to activation of verbs, as in the early planning of kick in the example above, meaning that structure- and word-driven processing might both be linked to lexical accessibility. Second, we emphasize statistical dependencies between events and event participants, and between nouns and verbs, in our account of how planning proceeds. Claims for the role of statistical information in language use have become central to many accounts of language comprehension, but such information has been less widely studied in production research. For example, in comprehension, there is evidence that the frequency with which verbs are used in certain syntactic contexts has important implications for ambiguity resolution and sentence comprehension (Garnsey, Perlmutter, Myers & Lotocky, 1997; Juliano & Tanenhaus, 1994; MacDonald, Pearlmutter & Seidenberg, 1994; Roland, Mauner, O’Meara & Yun, 2012; Staub & Clifton, 2006; Trueswell, 1996; Trueswell, Tanenhaus & Kello, 1993). Similarly, comprehension research has suggested that learning the statistical patterns of both events in the world and lexico-syntactic pairings guides sentence comprehension (Altmann & Mirkovic, 2009; Hare, Elman, Tabaczynski & McRae, 2009; McRae & Matsuki, 2009; Willits, Amato & MacDonald, 2015). These same contingencies also appear to shape sentence planning. For example, in language production, the frequencies with which verbs appear in alternative syntactic contexts have consequences for production choices of sentences containing those verbs (Arnold, Wasow, Asudeh & Alrenga, 2004; Bernolet & Hartsuiker, 2010; Stallings et al., 1998), as do the distributional pairings between noun animacy and sentence structure (Bresnan & Ford, 2010; Reali & Christiansen, 2007; Gennari & MacDonald, 2009).
Like comprehenders, language producers implicitly learn statistical patterns of their linguistic environment, and this information affects production choices and accuracy (Boyd & Goldberg, 2011; Chang, 2009; Dell, Reed, Adams & Meyer, 2000; Warker & Dell, 2006). Language users have also learned the statistics of their visual environment, with consequences for codability in picture description tasks, where, for example, recognition of a ball is influenced by recognition of a throwing action and vice versa (Almor et al., 2009; Handy et al., 2003; Knoeferle & Crocker, 2006; Palmer, 1975). Implicit learning over these lexical-syntactic-event contingencies (Chang, Dell & Bock, 2006) should promote higher frequency event-lexico-syntactic choices over atypical mappings, including higher frequency lexico-syntactic pairings and event-syntactic pairings. Artificial language studies show that speakers learn event-structure mappings, with consequences for the particular construction used to describe an event (Perek & Goldberg, 2015; in press). Similarly, a natural language example can be seen in Korean, where events in which a patient is adversely affected are more associated with passive structure descriptions than events with non-adverse outcomes (Oshima, 2006; Park, 2005; Song & Choe, 2007).

These results suggest that a complex set of factors shapes the nature of utterance planning and producers’ production choices. It is difficult to investigate interactions between many forces at once, but here we begin this investigation by comparing production in three languages. Specifically, we investigated how noun accessibility (in this case, noun animacy), word order flexibility, and mappings between events and the passive form affected speakers’ sentence structure choices. We investigated these factors in English, a language with limited word order flexibility that is strongly affected by noun animacy (Bock & Warren, 1985; F. Ferreira, 1994; McDonald, Bock & Kelly, 1993); Japanese, a language with flexible word order for which word order alternatives (often called scrambling) and passive use both seem to be viable options (Tanaka et al., 2011); and Korean, which is syntactically similar to Japanese in both word order and the availability of scrambled alternatives but has the event-structure link (adversity and passives) noted above. We compared active and passive production rates in two different clause types: main clauses, where (unlike English) Japanese and Korean allow word order variation, and relative clauses, where (like English) Japanese and Korean do not allow word order variation.

The intersection of three languages and two structures allows us to examine several contrasts relevant to sentence planning. We next describe the similarities and differences across these three languages that appear to be most relevant to speakers’ production of active and passive forms in main and relative clauses, and we develop predictions for production choices across languages and clauses.1

Actives and Passives in English, Japanese, and Korean

Our discussion of cross-linguistic differences in this section is focused on the specific production demands of our experiments. In Experiment 1, we showed participants pictures and provided a prompt about the entity being acted on in the picture, such as Tell me about the ball. This patient-focusing prompt, also used by Christianson and Ferreira (2005) in their study of Odawa, yields either simple main clause active sentences such as The boy is kicking the ball or passives such as The ball is being kicked by the boy. In Experiment 2, we used more complex pictures and question prompts that have previously been used to elicit relative clauses (Gennari et al., 2012; Hsiao & MacDonald, 2016; Montag & MacDonald, 2014; 2015). In this case the alternative forms elicited are the active voice object relative The ball that the boy kicked (called an object relative or object extracted relative because the head noun ball is the object of the relative clause verb kicked) versus a passive voice subject relative (or passive relative) The ball that was kicked by the boy.

The top portion of Table 1 shows four ways in which cross-linguistic differences may affect production choices in Experiments 1–2. The first feature is case marking on nouns. Whereas English has case marking only on pronouns (e.g., she vs. her) to indicate grammatical roles, Japanese and Korean mark all nouns for grammatical case (optionally omitted in some discourse conditions). Case marking is likely related to the second feature, flexibility of word order in active main clauses. In addition to case marking and greater word-order flexibility, Japanese and Korean both have a Subject-Object-Verb (SOV) main clause basic word order in contrast to English’s SVO order, and this word order difference itself may contribute to different planning constraints and demands. Recent work suggests that incremental sentence planning may proceed differently in verb-final languages, when the verb is uttered at the end (SOV word order), rather than in active sentences with SVO order (Hwang & Kaiser, 2014; Momma, Slevc & Phillips, 2016). Although the basic word order in Japanese and Korean may be SOV, these languages also have an alternative scrambled main clause word order in which the object precedes the subject in an OSV order. Evidence from Japanese suggests that speakers tend to produce sentences with a scrambled word order in order to place more accessible nouns, such as animate or discourse-given words, earlier in the utterance (V. Ferreira & Yoshita, 2003; Kondo & Yamashita, 2011; Tanaka et al., 2011; Yamashita, 2002), a goal that can also be accomplished by producing a passive. Most relevant to the current studies, scrambling allows the grammatical role assignment and word order of an utterance to vary independently, which may have implications for the underlying planning processes of these utterances. We expect that the patient-focusing prompt in Experiment 1 (e.g., Tell me about the ball) will yield a high proportion of passives in English and a mix of passives and scrambled sentences in Japanese and Korean, because this prompt makes the patient of the action a given (already mentioned) entity in the discourse and highly salient as the topic of discussion. The balance between scrambled and passive forms may vary in Korean and Japanese, however; given that passives are known to be disfavored in Korean, it is likely that scrambled active sentences will predominate in this language. Thus, we expect that structure choices will vary with both the conceptual properties of the entity being discussed (animates should still differ from inanimates) and the viability of alternative structures in each language.

Table 1

Features of English, Japanese, and Korean that may contribute to cross-linguistic differences in structure choice.


1. Case Marking on Nouns
   English: no; Japanese: yes; Korean: yes

2. Main Clause Word Order and Flexibility
   English: Active: SVO-fixed
      Boy kicked ball
      Ball was kicked {by boy}
   Japanese: Active: SOV-flexible
      Boy ball kicked
      Scrambled: OSV
      Ball boy kicked
      Passive:
      Ball boy kicked-passive
   Korean: Active: SOV-flexible
      Boy ball kicked
      Scrambled: OSV
      Ball boy kicked
      Passive (dispreferred):
      Ball boy kicked-passive

3. Relative Clause Head Direction
   English: Head-first: Head-noun [Rel clause]
   Japanese: Head-final: [Rel clause] head-noun
   Korean: Head-final: [Rel clause] head-noun

4. Active-Passive Word Orders in Relative Clauses
   English: Different
      Active: ball [that* boy kicked]
      Passive: ball [that was kicked by boy]
   Japanese: Same
      Active: [boy kicked] ball
      Passive: [boy kicked-passive] ball
   Korean: Same
      Active: [boy kicked ADN] ball
      Passive: [boy kicked-passive ADN] ball

Additional Potential Influences on Structure Choices
   Patient Animacy
   Event Adversity
   Main Clause – Relative Clause Similarities

Note. S = Sentence Subject, V = Verb, O = Object, ADN = Adnominal, an affix that allows the verb to modify a noun but, unlike a nominalizer, does not change the semantic category of the verb (Han, 2013). [Square brackets] mark relative clauses. Case marking on Japanese and Korean nouns is not shown here.


*Rates of relative pronoun omission (e.g., that, who) in English active and passive relative clauses are discussed in Montag and MacDonald (2014).

1 Japanese has an “indirect passive” form that is used for adverse events, but as we are not eliciting that kind of passive in these studies, no adversity effect is predicted for Japanese.

Features 3 and 4 in Table 1 refer to relative clauses and are therefore most relevant to Experiment 2. Head direction (Feature 3) refers to the position of the relative clause relative to the noun (the head noun) it modifies. Whereas English relative clauses follow their heads, as in the girl [sleeping quietly], where [sleeping quietly] is the relative clause modifying the head noun girl, Japanese and Korean have the word order [sleeping quietly] girl, in which the relative clause precedes the head noun. Because we manipulate noun animacy, we can investigate whether head noun animacy affects active versus passive relative clause choice to the same degree when the head noun appears early (English) versus after the relative clause (Japanese and Korean). This question is highly relevant to questions of incremental production and the degree to which utterance planning precedes execution during language production. On radically incremental accounts of production (Levelt, 1999; Roelofs, 1998; Wheeldon & Lahiri, 1997), portions of the utterance are planned and uttered in quick succession, so that it is unlikely that speakers would plan ahead beyond a clause boundary. However, other researchers have suggested that, depending on speaker needs, there is more advance planning (F. Ferreira & Swets, 2002; Konopka & Meyer, 2014; Stallings, MacDonald & O’Seaghdha, 1998). Languages with head-final relative clauses would seem to provide an argument for advance planning beyond the clause boundary, because speakers must presumably plan the head noun before, or simultaneously with, the relative clause that modifies it, even though the head noun is uttered later.
If so, we would expect that animacy properties of the head noun, known to affect relative clause structure in languages with head-first relative clauses (English, Spanish, Serbian: Gennari et al., 2012; English: Montag & MacDonald, 2014; 2015), would also affect relative clause structure in head-final languages, as in head-final relative clauses in Mandarin (Hsiao & MacDonald, 2016). Thus we expect clear animacy effects in Japanese and Korean, even in head final structures in Experiment 2.

Finally, Feature 4 refers to word order within relative clauses. In contrast to the flexibility of word order in Japanese and Korean main clauses, Japanese and Korean relative clauses are not flexible. Another feature of Japanese and Korean relative clauses is that the active and passive forms have identical word order, so that the only production consequences of choosing the active versus the passive form are the case markings on the nouns and the presence of a passive marker on the verb. This allows us to examine choices of syntactic structure that have no consequences for constituent word order, a property with interesting implications for sentence planning processes and for speakers’ motivations for producing one sentence structure alternative over another.

The bottom portion of Table 1 provides some additional considerations and predictions for passive use across the three languages. First, we consider factors related to the mapping between a conceptual message and utterance form. As noted, we expect that we will see effects of animacy in all languages and across both main and relative clauses. The second message factor is adversity, where Korean passives are thought to be generally reserved for adverse events, while Japanese and English do not have this restriction (though Japanese does have an adversity restriction on another kind of passive construction; see Oshima, 2006). Our picture materials were not originally designed to manipulate adversity, but subsequent norming shows that they do show a range of pleasant and adverse events, and so it is reasonable to expect passive rates to vary with adversity in Korean. The adversity-passive link in Korean may reveal effects of event-structure patterns, or event-verb-structure mappings on producers’ choice of sentence structures (Konopka & Meyer, 2014; Perek & Goldberg, 2015; in press).

Finally, we will investigate the degree to which there is consistency in passive use in main and relative clauses within each language. It is reasonable to expect that, all else being equal, passive behaviors in main clauses should predict passive use in the much rarer relative clauses, on the view that the lifetime of experience with main clauses provides syntactic priming or plan reuse (Chang, Dell & Bock, 2006; MacDonald, 2013) for subsequent use of the passive. That is, we view syntactic priming as a form of implicit learning (Bock & Griffin, 2000; Chang, Dell, Bock & Griffin, 2000; cf. Pickering, Branigan, Cleland & Stewart, 2000), and we expect that the effects would generalize from main clauses to relative clauses.

In sum, and not surprisingly, predictions across three languages, two clause types, and two animacy configurations are rather complex. However, it is this complexity that motivates comparison across structures and across languages, potentially yielding a better understanding of the processes that underlie sentence production than could be gleaned from a much narrower investigation. A number of our predictions might be characterized as word-based structure choices: structure choice and scrambling in main clause utterances in all three languages and structure choice in English relative clauses, where noun animacy or accessibility can have consequences for word order. Other predictions might be characterized as structure-based and/or as reflecting mappings between events and structures: production choices in Japanese and Korean relative clauses, where word order is fixed, consequences of message-level factors such as adversity, and main clause-relative clause parallels across the three languages. Whether these choices are interpreted as word- or structure-driven is a consequence of the affordances of the language being spoken and of the broader speaker goals and experimental context. For these reasons, we present a counter-hypothesis: that word- and structure-driven production processes reflect two potential outcomes of a single lexically-driven process. We discuss this proposal in greater detail in the General Discussion.

We first investigate production of main clauses (Experiment 1), followed by relative clauses (Experiment 2), and then present a series of analyses investigating adversity and other factors that account for production choices across the two experiments.

Experiment 1: Main Clause Production

To investigate the rate of passive production in main and relative clauses, we examined the choice of active and passive sentences in main clauses using a task and context similar to that of our relative clause task in Experiment 2, which has previously been shown to be effective in eliciting relative clauses in several languages (Gennari et al., 2012; Hsiao & MacDonald, 2016; Montag & MacDonald, 2014; 2015). The materials for Experiment 1 were therefore derived from the relative clause materials for Experiment 2.

Method

Participants

Twenty-four native English-speaking undergraduates at the University of Wisconsin-Madison participated in this experiment in exchange for course credit in an introductory psychology course. In addition, 24 native Japanese-speaking and 24 native Korean-speaking UW-Madison students and community members participated for course credit or pay. The average age of the Japanese speakers was 31.8 years (SD = 8.1), with an average age of first exposure to English of 10.8 years (SD = 3.4) and an average of 5.0 years (SD = 5.2) in the United States. The average age of the Korean speakers was 24.4 years (SD = 3.97), with an average age of first exposure to English of 10.8 years (SD = 3.6) and an average of 4.5 years (SD = 2.7) in the United States.

Materials

Twenty cartoon images were selected from those previously used to elicit active and passive relative clauses (Gennari et al., 2012; Montag & MacDonald, 2014). An example is shown on the left in Figure 1a. These complex scenes each depicted two actions that are representative of a single verb, in this example, the verb “kick.” In one instance of the action, a human agent is acting on another human patient, and in the other instance, a human agent is acting on an inanimate theme. To develop stimuli for Experiment 1, the 20 complex scenes for Experiment 2 were cropped and edited to make two simple illustrations, one with the animate agent acting on the animate patient (Figure 1b), and the other showing the animate agent acting on the inanimate theme (Figure 1c).

Figure 1 

Examples of pictures used in Experiments 1–2, in this case for the verb kick. 1a: a relative clause elicitation picture for Experiment 2. 1b–c: Edited pictures for Experiment 1 showing the action with an animate patient (1b) and an inanimate theme (1c). 1b–c were also used for a pretraining task in Experiment 2 that familiarized speakers with the verbs to be used in the main experiment.

Procedure

Participants were tested by experimenters who spoke the native language of the participant, and all interactions with participants, including consent forms, written and spoken instructions, and experiment materials, were conducted in the participant’s native language.

Participants were seated in front of a computer and instructed that they would see some illustrations and should type responses to the questions that appeared under the illustrations. Japanese and Korean fonts were installed on experiment computers so participants could use the keyboard to type in Japanese and Korean. The questions that appeared asked participants to describe the patient or theme of the picture (e.g. Tell me about the ball; Tell me about the girl) and were designed to elicit active or passive main clause utterances. On each trial, participants were presented with an illustration and after two seconds, a written question appeared under that picture. Participants were instructed to type responses to the written questions. In addition to 20 test pictures (10 with an animate patient of an action, 10 inanimate, with different groups of participants answering questions about the animate patient and inanimate theme of each illustration), each participant completed 22 filler trials randomly intermixed with the experimental trials.

Participants’ responses were coded by a native speaker of the language as active (e.g., The boy is kicking the ball), passive (The ball is being kicked by the boy), or, for Japanese and Korean, which permit this construction, benefactive (described below).

Results

Trials were excluded in which participants did not produce a sentence with an action verb (e.g., It is an orange soccer ball). This accounted for 30.8% of animate and 36.7% of inanimate trials in English, 22.5% of animate and 30.4% of inanimate trials in Japanese, and 35.4% of animate and 34.2% of inanimate trials in Korean. These rates are not unexpected in a task in which there was no explicit instruction to describe actions. Rates of exclusion did not differ as a function of the animacy of the element acted on or across languages.

In English, on the trials in which participants produced a sentence with a verb, 92.0% (SD = 16.9) of animate and 89.4% (SD = 20.2) of inanimate trials were passive.

In Japanese, on the trials in which participants produced a sentence with a verb, animate trials were 3.2% active (SD = 5.8), 86.3% passive (SD = 13.5) and 10.5% benefactive (SD = 12.8). Inanimate trials were 74.5% passive (SD = 32.5). Benefactives cannot be used to describe an inanimate entity (Shibatani, 1994), so none were produced. The benefactive is a sentence type permitted in Japanese and Korean that is used in situations in which an entity being acted on is benefitting in some way from the action, but not taking part in the action. It would not be used to describe the illustration in Figure 1, as the girl being kicked is not benefitted by that action, but it would be an acceptable form to describe a picture in which a woman is pushing a girl on a swing. In that case, the benefactive translates approximately as The girl is having the woman push the swing for her, with the entity being acted upon as the subject. The benefactive is not a passive, as the benefactive does not employ passive morphology, and in a true passive, the subject would be taking part in the pushing action. However, the subject and object of the sentence are both marked with the same subject and object case markers as they would be in a passive sentence. The benefactive thus shares some features with passive utterances and differs in others, and for this reason we performed all analyses twice, once with the benefactive utterances eliminated and once with them grouped with the passive utterances.

In Korean, on the trials in which participants produced a sentence with a verb, 18.3% (SD = 21.9) of animate trials yielded active responses, 74.8% (SD = 23.6) yielded passive responses, and 6.9% (SD = 7.3) yielded benefactive responses. When the target entity was inanimate, 64.7% (SD = 33.6) of inanimate trials were passive. As in Japanese, in Korean benefactives cannot be used to describe an inanimate entity. Figure 2 displays the rate of passive and benefactive productions to animate and inanimate target nouns in the three languages.

Figure 2 

Production choices in main clauses in English, Japanese, and Korean as a function of the animacy of the patient/theme (the entity being acted on) in the picture.

In addition to these structure choices, we investigated scrambling rates in Japanese and Korean, which allow scrambling in main clause utterances. In Japanese, 14.5% (SD = 18.4) of active responses to inanimate targets were scrambled and in Korean, 18.2% (SD = 35.6) of active responses to animate and 14.9% (SD = 21.1) of active responses to inanimate targets were scrambled. This provides some evidence that speakers of Japanese and Korean occasionally produce scrambled active utterances rather than passive utterances in response to a prompt focusing the object of the action (e.g. Tell me about the ball), but the passive is still a favored option. These results are similar to those observed in Odawa (Christianson & Ferreira, 2005) where speakers often produced passive utterances despite the availability of a scrambled structure.

Rates of active and passive utterances varied across the three languages and varied by the animacy of the target noun. Data analysis was performed using mixed-effects logistic regression (glmer) analysis (Baayen, Davidson & Bates, 2008) with the lme4 package version 1.1-12 (Bates, Maechler, Bolker, & Walker, 2015) in R (version 3.3.2). In this model, we employed the maximal random effects structure (Barr, Levy, Scheepers & Tily, 2013) to investigate the effects of both animacy and language on production choices. Random intercept-slope correlations were included in all models. In these analyses, which are summarized in Table 2, we used mean-centered classic dummy coding of our language variable to investigate differences in production choices across languages. Japanese was coded as our reference group, so in the model presented below, one dummy variable refers to the difference between English and Japanese, the other to the difference between Korean and Japanese. We opted for this coding scheme for a number of reasons. First, it was the coding scheme that matched the descriptive statistics of the production frequencies. Second, it allowed us to investigate differences between typologically dissimilar languages (English and Japanese) as well as typologically similar languages (Japanese and Korean). Despite the fact that Japanese and Korean are typologically similar, passive formation in the two languages tends to be quite different, with a higher rate of lexicalized passives in Korean and additional lexical constraints that do not exist in English or Japanese (Park, 2005).
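The coding scheme just described can be sketched concretely. The snippet below is an illustrative Python stand-in rather than the original R/lme4 code; the function name, level labels, and the 0.5 animacy mean are assumptions made for the sketch.

```python
# Illustrative sketch of the predictor coding described in the text:
# dummy coding of language with Japanese as the reference level, plus a
# mean-centered animacy predictor. (Python stand-in for the R analysis;
# names and the 0.5 animacy mean are assumptions of this sketch.)

def code_trial(language, animacy, p_animate=0.5):
    """Return (english_dummy, korean_dummy, centered_animacy) for one trial."""
    # Each dummy contrasts one language against the Japanese reference,
    # so Japanese trials receive 0 on both dummies.
    english_dummy = 1.0 if language == "English" else 0.0
    korean_dummy = 1.0 if language == "Korean" else 0.0
    # Centering animacy (animate = 1, inanimate = 0) around its mean makes
    # the model intercept the log-odds of a passive at average animacy
    # for the reference language.
    centered_animacy = (1.0 if animacy == "animate" else 0.0) - p_animate
    return english_dummy, korean_dummy, centered_animacy

print(code_trial("Japanese", "animate"))   # (0.0, 0.0, 0.5)
print(code_trial("Korean", "inanimate"))   # (0.0, 1.0, -0.5)
```

Under this coding, the two language coefficients in Table 2 can be read directly as English-versus-Japanese and Korean-versus-Japanese differences.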

Table 2

Effect of animacy and language on structure choice: Results of mixed-effects logistic model (Jaeger, 2008) predicting structure choice of active (reference group) or passive main clause utterances by animacy (centered) of the target noun and dummy coded language (Japanese as reference group).

Benefactives coded as Passives Coefficient SE z p Random Slope

Intercept 2.85 0.35 8.03 < 0.001*
Animacy 1.28 0.57 2.24 < 0.05* s, i
Korean (vs. Japanese) –1.85 0.69 –2.69 < 0.01* i
English (vs. Japanese) 0.41 0.73 0.57 < 1 i
Animacy * (Korean) –1.70 0.84 –2.03 < 0.05*
Animacy * (English) –2.88 0.95 –3.03 < 0.01*
Benefactives Removed Coefficient SE z p Random Slope

Intercept 2.82 0.36 7.75 < 0.001*
Animacy 1.24 0.58 2.12 < 0.05* s, i
Korean (vs. Japanese) –1.85 0.70 –2.64 < 0.01* i
English (vs. Japanese) 0.55 0.74 0.75 < 1
Animacy * (Korean) –1.69 0.82 –2.05 < 0.05*
Animacy * (English) –2.61 0.95 –2.76 < 0.01*

All models contained random intercepts for subjects (s) and items (i). Random slopes were selected to build the fullest model that converged.

* p < 0.05.

We found a significant main effect of animacy, such that participants in all three languages produced more passive utterances when describing animate entities. This result is consistent with other work showing that speakers tend to describe animate and inanimate entities differently in main clauses (e.g., Bock & Warren, 1985; Christianson & F. Ferreira, 2005; F. Ferreira, 1994; Tanaka et al., 2011). We further found an effect of language: Korean participants produced fewer passive utterances than Japanese speakers. This pattern is consistent with previous work suggesting that passive use is restricted in Korean (Song & Choe, 2007). The Korean (that is, the Japanese vs. Korean contrast) by animacy interaction reflects the fact that while passives were produced at lower rates in Korean than Japanese, the language difference was larger for animate trials than for inanimate ones. There was no overall difference between speakers of English and Japanese in passive rates, though the significant interaction between the English-Japanese contrast and animacy suggests that Japanese speakers produced fewer passive utterances than English speakers when describing inanimate target entities, but not on animate trials. Removing benefactive utterances did not change the pattern of results.


We found an effect of animacy on production choices in all three languages, despite differences in word order, word-order flexibility, and case marking. These results suggest that the mapping from animate patient to grammatical subject is a strong cross-linguistic bias, even in the face of other options (scrambling) for promoting the animate patient to an early sentence position.

We also found several other interesting patterns in production choices across these three languages. First, despite the typological similarity between Japanese and Korean, including highly similar word order and case marking patterns in actives and passives, Korean participants produced significantly fewer passive utterances than Japanese participants. Word order, case marking, and availability of the scrambling option therefore do not fully predict passive choices. However, typology still affected structure choice in some way: both Japanese and Korean have the option of benefactive constructions, and speakers of both languages produced this sentence type.

Interestingly, despite the restriction on passives in Korean, participants did not produce more scrambled sentences, one means of fronting the patient/theme of the action that is focused in the question prompt (Tell me about the girl/ball). Passive rates in Korean were reliably lower than in Japanese, but as Figure 2 shows, scrambled sentences are relatively rare in both languages. Indeed, in both languages, passives are by far the most frequent structure produced. At least for this task, with its strongly patient-focusing prompts, the passive was a viable option in Korean, and scrambling was a rarely used alternative.

Experiment 2 will provide additional evidence to help us better understand the role of animacy, as well as specific language features and typology, in a relative clause production task. In main clauses, all three languages have a different word order in their active and passive utterances. However, in the relative clause utterances elicited in Experiment 2, noun order (in English) and all word order (in Japanese and Korean) are the same in active and passive utterances. Moreover, scrambling is not an option in Korean and Japanese relative clauses, as it is in main clauses. If the choice of an active or passive reflects solely a noun-ordering choice, we may not see a systematic preference for actives or passives in Japanese and Korean, where structure choice is not accompanied by a different word order. Investigating production choices in both a main and relative clause production task will help us better understand how semantic factors (like animacy) might interact with language-specific features and typology to drive structure choice. Experiment 2 also affords an opportunity to revisit the passive restriction in Korean, because we can compare Korean speakers’ passive rates to those of Serbian speakers who performed the same task with similar materials and who produced far lower rates of passives than English or Spanish speakers (Gennari et al., 2012). Thus, these results could offer a preliminary comparison of passive rates in two unrelated languages (Korean and Serbian) that have both been described as tending to avoid passives.

Experiment 2: Relative Clause Production

The goal of this study was to examine production choices in utterances containing relative clauses because, as Table 1 indicates, relative clauses differ in several ways from the main clauses examined in Experiment 1, and these differences could provide insight into several aspects of sentence production. As in Experiment 1, we manipulated the animacy of the entities being described in order to investigate the effect of animacy on structure choice.

Following these investigations of relative clause production below, we also combine the data sets across the two experiments, which will allow us to test effects of clause type and other factors. We will first investigate whether rates of passive use are similar across clause types within a language. Next, we will investigate whether across the three languages, independent of overall passive rates, speakers tend to produce passive utterances on similar trials or different trials. Korean speakers more often produce passive utterances when the entity being acted on is adversely affected by that action, so we may see a pattern of by-item passive use in Korean that reflects this constraint. Speakers of Japanese and English, who purportedly do not share this adversity constraint, may choose to produce passives in different contexts than Korean speakers.



Participants

Sixty-eight native English-speaking undergraduates at the University of Wisconsin-Madison participated in this experiment in exchange for course credit in an introductory psychology course. A portion of these participants’ data was reported in Montag and MacDonald (2014).

In addition, 58 native Japanese speakers participated in exchange for pay: 19 tested at the University of Wisconsin-Madison and other American universities, and 39 tested at Hiroshima University. The sample of Japanese speakers living in the United States had an average age of 27.4 years (SD = 6.9), an average age of first exposure to English of 10.8 years (SD = 5.4), and an average of 3.2 years (SD = 3.6) in the United States. All participants from Hiroshima University were native Japanese-speaking undergraduate students. In all data reported below, there were no significant differences between the two groups of Japanese speakers, and this factor is not discussed further.

Finally, 39 native Korean-speaking University of Wisconsin-Madison students and community members participated for course credit or pay. These speakers had an average age of 21.0 years (SD = 3.0), an average age of first exposure to English of 9.7 years (SD = 4.7), and an average of 3.6 years (SD = 2.5) in the United States.


Materials

Twenty pictures such as the one in Figure 1a were used for this study. The materials were taken from Montag & MacDonald (2014) and were similar to those used in Gennari et al. (2012). The 20 pictures each depicted a transitive action with both an animate patient and an inanimate theme. These animate and inanimate entities were the target items in the experiment. Pictures included other entities similar to the animate and inanimate targets; for example, there is more than one girl and more than one ball in Figure 1a. This addition, together with the cover task described below, encouraged relative clause modifications of the target entities in order to distinguish them from the other similar entities in the picture. For the target item girl in Figure 1a, relative clauses that distinguish this girl from the other girl in the picture are the active relative form “the girl that the boy is kicking” or the passive “the girl being kicked by the boy”.

In addition to the twenty test pictures, there were 43 filler pictures for a total of 63 trials. The filler pictures showed a variety of actions, humans, and objects.

Questions were recorded in each language to elicit descriptions for each picture. Two questions were recorded for each experimental item, one for the animate target and one for the inanimate target. For example, the English questions corresponding to Figure 1a were “Who is wearing blue?” for the animate girl target and “What is orange?” for the inanimate ball target. One question was recorded for each filler picture. Filler questions addressed a variety of topics and were designed to elicit a variety of non-relative clause responses. Examples include “What is the boy doing?” and “What is on the table?”

All written and spoken materials were prepared in English, Japanese, and Korean so that all instructions and task materials were entirely in the participant’s native language. All spoken materials were recorded in a quiet room by a native speaker of the language being recorded (English, Japanese or Korean).


Procedure

All test sessions were conducted in the participant’s native language. Participants were seated at a computer and first completed a pretraining task designed to encourage them to use the specified verb associated with each picture (for example, to use “carry” rather than “hold” for a picture of a carrying event) when describing the pictures in the later task. Different verbs tend to occur in active and passive sentences with different frequencies, so the verb pretraining was designed to limit the effects of these verb-specific tendencies. In pretraining, participants viewed only the segments of each test picture that illustrated the verb, as in Figures 1b and 1c. All participants saw both the animate and inanimate uses of each verb so they could not anticipate their target when viewing the complete picture in the main task. After two seconds of exposure, a verb describing the action appeared underneath the picture. Participants were instructed simply to read aloud the word underneath the picture. For filler pictures, participants viewed a picture segment containing a person or object and a corresponding noun. The order of presentation was randomized.

After completing the pretraining task, participants performed the main task of the experiment. Detailed instructions with a cover task were utilized to prompt relative clause productions. Participants were told that the experiment was about interpreting pictures, and that their responses would be shown to a later group of participants who would try to guess which pictures their responses described. They were told that because colors or clothing might be changed, or items in the picture might be rearranged, describing the actions in which the people and objects were taking part would be the best strategy to employ in order to complete the task.

In each trial, a color picture appeared on the screen. After three seconds, participants heard a question asking about the target person or object in the picture. Participants were instructed to answer the question by speaking into a microphone. Each participant saw ten pictures with a question about an animate patient (e.g. the girl being kicked in Figure 1a) and ten pictures with questions about inanimate themes (e.g. the ball being kicked in Figure 1a). A different set of participants saw the other half of the animate-inanimate target pairs, so that participants saw each picture only once. Test and filler trials were pseudo-randomized such that there were always at least two filler trials between test trials.


Results

Before analysis of relative clause choice and production difficulty measures, several types of irrelevant responses were removed.

One English participant, two Japanese participants and one Korean participant were excluded for producing almost no relative clauses with verbs. An additional two Japanese participants were excluded due to equipment failure.

In English, most responses for the verb spray contained a different verb, so all trials for this verb were removed. Though the present study does not employ any initiation latency analyses, we excluded the same three participants as in Montag and MacDonald (2014), as well as 25 trials with excessively long initiation latencies, to be consistent with the data reported there; including these trials and participants did not change the pattern of results in any analysis. In Japanese, we also excluded all responses to the verb spray, as participants never used the trained verb to describe the animate targets. Likewise, in Korean, responses to the verbs spray, splash and throw were removed, as participants produced very few utterances with the verb trained during pretraining. These exclusions mean that there were slightly more items in the Japanese and English analyses than in the Korean analyses. We opted not to exclude the throw and splash items in English and Japanese for the sake of consistency with prior reports with these materials (Gennari et al., 2012; Montag & MacDonald, 2014). None of the patterns reported here changed when we eliminated throw and splash from the English and Japanese data to equalize the number of items across languages.

Next, we excluded individual utterances that did not contain relative clauses (e.g. the ball, or the ball on the right), affecting 17.6% of animate trials and 25.8% of inanimate trials (English), 4.9% of animate trials and 22.8% of inanimate trials (Japanese) and 27.9% of animate and 25.5% of inanimate trials (Korean). We suspect the higher exclusion rate among inanimate trials in English and Japanese is due to the fact that participants occasionally failed to locate some inanimate competitors in the pictures and subsequently produced a simple noun phrase (e.g. the ball) that did not distinguish the target from similar elements in the scene. Participants almost never failed to notice an animate competitor, with the result that animates were more often modified; see Montag & MacDonald (2014) for discussion of animacy and visual salience effects in picture descriptions. It is unclear why we did not see this pattern among Korean speakers, or why overall rates of errors might be different among speakers of different languages. The different populations or different experimenters that are necessarily a component of cross-linguistic research may have contributed to these group differences, but rates of non-target forms are all well within the normal range for sentence production experiments and are not unexpected in this task, in which there was never an explicit instruction to use relative clauses.

Next, relative clauses in which the participant produced a different verb than the one provided in pretraining were also excluded from analysis, affecting an additional 8.8% of animate and 12.3% of inanimate trials (English), 16.4% of animate and 13.1% of inanimate trials (Japanese) and 21.8% of animate and 13.9% of inanimate trials (Korean). Inclusion of these different-verb trials in the analyses reported below did not change any of the results, and we took the more conservative path of excluding these trials so that for each item, speakers of each language all produced the same verb.

After these exclusions, in English, a total of 834 relative clause utterances from 64 participants (65% of total utterances) were analyzed. In Japanese, 723 trials from 54 participants were analyzed (67%), and in Korean, where two additional experimental pictures were excluded, 421 responses from 38 participants were analyzed (55%). As a check against consequences of these uneven sample sizes, we drew 30 random samples of 38 participants (the Korean sample size) from our samples of English and Japanese participants; mean production frequencies of these size-matched samples did not differ from the total sample means (one-sample t-tests, English: Animate: t(29) = 0.82, p > 0.4, Inanimate: t(29) = 0.31, p > 0.7; Japanese: Animate: t(29) = 0.39, p > 0.6, Inanimate: t(29) = 0.42, p > 0.6). Because subsetting our data to match sample sizes across languages did not change the pattern of effects, the full samples are reported below.
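The size-matched subsampling check described above can be sketched as follows. This is a hypothetical Python illustration; the per-participant rates below are invented, and the reported analysis compared subsample means to the full-sample mean with one-sample t-tests:

```python
import random
import statistics

# Sketch of the sample-size check: repeatedly draw subsamples matching
# the smallest group (n = 38) and collect their mean production rates.

def subsample_means(per_participant_rates, n=38, draws=30, seed=0):
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(per_participant_rates, n))
            for _ in range(draws)]

# Hypothetical per-participant passive rates for 64 speakers.
rng = random.Random(1)
rates = [rng.uniform(0.4, 0.9) for _ in range(64)]

full_mean = statistics.mean(rates)
means = subsample_means(rates)
# Each subsample mean can then be compared to the full-sample mean
# (in the reported analysis, via one-sample t-tests).
print(round(full_mean, 3), round(statistics.mean(means), 3))
```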

Participants’ responses were coded as active or passive relative clauses (or causative, a construction permitted in Japanese and Korean relative clauses). Figure 3 illustrates the proportion of active, passive, and causative utterances across all three languages. In English, relative clauses with animate target head nouns were overwhelmingly passive (98.7% passives, SD = 4.1), while inanimate targets were nearly evenly split between active and passive relative clauses, with 50.1% (SD = 38.9) passives. This analysis was reported in Montag & MacDonald (2014) and is consistent with previous studies of English relative clause frequencies (Gennari et al., 2012; Gennari & MacDonald, 2009). In Japanese, when the target noun was animate, speakers also produced relative clauses that were overwhelmingly passive (90.0%, SD = 11.4). An additional 7.3% (SD = 9.7) of productions were causative, a construction that shares some features with passives (e.g., The boy that is caused to be wiped by the woman). With inanimate target nouns, relative clauses were 38.9% (SD = 37.3) passive, and no responses were causative. Finally, in Korean, when the target noun was animate, speakers produced 43.5% (SD = 28.9) passive relatives and 7.4% (SD = 12.6) causatives, and when the target noun was inanimate, 6.6% (SD = 15.3) passive relatives. This rate of passives for inanimates is similar to the rate of about 4% passives with inanimate heads in this task with Serbian speakers (Gennari et al., 2012), but Korean speakers here produced far more passives with animate heads than did the Serbian speakers in that study (15%). Thus even though Korean is widely described as having a passive restriction, it is not as severe as in Serbian and possibly other Slavic languages, at least within these experimental and discourse conditions.

Figure 3 

Production choices in relative clauses in English, Japanese and Korean.

To understand the contributions of animacy and language to production choices, we performed a mixed-effects logistic regression predicting structure choice from the animacy of the target noun and the language spoken. We performed two separate analyses, one in which causative utterances were grouped with passive utterances and one in which causative utterances were excluded. We analyzed the data in this way because causative utterances share some features with passives: the grammatical roles of the agents and patients are the same across the two structures, but causatives are constructed with causative, not passive, inflectional morphology. The models are summarized in Table 3.

Table 3

Effect of animacy and language on structure choice in Relative Clauses: Results of mixed-effects logistic model predicting structure choice of active (reference group) or passive relative clause utterances by animacy (centered) of the target noun and dummy coded language (Japanese as reference group).

Causatives coded as Passives Coefficient SE z p Random Slope

Intercept 0.76 0.41 1.86 < 0.1
Animacy 6.11 0.56 10.90 < 0.001* s, i
Korean (vs. Japanese) –4.73 0.64 –7.37 < 0.001* i
English (vs. Japanese) 1.21 0.61 2.00 < 0.05* i
Animacy * (Korean) 0.41 1.13 0.37 < 1
Animacy * (English) 0.50 0.96 0.53 < 1
Causatives Removed Coefficient SE z p Random Slope

Intercept 0.75 0.42 1.77 < 0.1
Animacy 6.07 0.58 10.52 < 0.001* s, i
Korean (vs. Japanese) –4.76 0.64 –7.42 < 0.001* i
English (vs. Japanese) 1.32 0.60 2.19 < 0.05* i
Animacy * (Korean) 0.29 1.15 0.25 < 1
Animacy * (English) 0.72 0.98 0.73 < 1

* p < 0.05.

All three languages showed a robust effect of animacy such that passive structures were more common in productions to animate targets across languages. This pattern of results is consistent with Gennari et al. (2012), who found that despite large differences in absolute passive frequencies in English, Spanish and Serbian, speakers of all languages produced more passive utterances when describing animate targets than inanimate ones.

This pattern of results is interesting for a number of reasons. First, we found an effect of animacy not only when noun order differed between the active and passive forms (English) but also when all word order was constant across the two forms (Japanese and Korean). Second, despite the typological differences between Japanese and English, and the typological similarities between Japanese and Korean, Korean speakers produced far fewer passive utterances than Japanese speakers. That both Japanese and Korean have case marking, head-final relative clauses, and identical word order in active and passive relative clauses, yet their speakers made quite different production choices, suggests that other factors beyond these features of the language contribute to production choices in profound ways.

In the next analyses, we report additional, finer-grained cross-clause and cross-language analyses to better understand speakers’ productions across clause types and languages.

Comparisons of Main and Relative Clauses

To better understand the similarities and differences across languages and clauses, we created one large dataset, merging the main and relative clause utterances in order to test several additional hypotheses concerning production processes. First, we investigate the effect of Clause, that is, the contrast between main clauses in Experiment 1 and relative clauses in Experiment 2, in order to better quantify the impression from the individual experiments that the passive rates are similar across the two clause types. Second, we investigate cross-language behavior by items, specifically whether the adversity of the depicted event tended to promote the production of passives in Korean, where this tendency is well-attested in these sentence types (Oshima, 2006), as well as in English or Japanese.

Production in Main vs. Relative Clauses

To compare rates of passive vs. active forms in main and relative clauses, we created models that include Clause (main vs. relative clause) as a fixed factor, to address whether there are different proportions of passive/active forms across clause type, and the degree to which Clause interacts with the other factors we investigate. Table 4 summarizes these models.

Table 4

Effect of animacy, language and clause type on structure choice in Main and Relative Clauses: Results of mixed-effects logistic model predicting structure choice of active (reference group) or passive utterances by animacy (centered) of the target noun, dummy coded language (Japanese as reference group) and clause type (Main Clause or Relative Clause, centered).

Causatives and Benefactives Coded as Passives Coefficient SE z p Random Slope

(Intercept) 1.67 0.25 6.59 < 0.001*
Animacy 2.98 0.38 7.79 < 0.001* s, i
Clause 2.11 0.36 5.88 < 0.001*
Korean –3.10 0.43 –7.24 < 0.001*
English 0.46 0.40 1.14 < 1
Animacy x Korean –0.37 0.64 –0.58 < 1
Animacy x English –1.11 0.58 –1.91 = 0.056
Clause x Animacy –4.60 0.56 –8.22 < 0.001*
Clause x Korean 2.55 0.76 3.38 < 0.001*
Clause x English –0.81 0.82 –0.98 < 1
Causatives and Benefactives Removed Coefficient SE z p Random Slope

(Intercept) 1.66 0.27 6.20 < 0.001*
Animacy 2.97 0.41 7.28 < 0.001* s, i
Clause 2.06 0.36 5.70 < 0.001*
Korean –3.17 0.43 –7.30 < 0.001*
English 0.60 0.41 1.46 < 1
Animacy x Korean –0.55 0.65 –0.86 < 1
Animacy x English –0.87 0.59 –1.49 < 1
Clause x Animacy –4.68 0.56 –8.32 < 0.001*
Clause x Korean 2.71 0.78 3.49 < 0.001*
Clause x English –0.70 0.84 –0.83 < 1

* p < 0.05.

These results show important effects of the contrast between main and relative clauses. First, there is a main effect of clause type, such that speakers produced fewer passives in the relative clause task than in the main clause production task. This main effect likely stems from the fact that we used a patient-focusing question (Tell me about the girl) in Experiment 1, which strongly elicits passives; this type of question cannot be used in relative clause elicitation, so we instead used the identification prompts (e.g., Who is wearing blue?) of previous relative clause elicitation studies. Another difference across the experiments was that Experiment 1 participants produced written responses, while Experiment 2 participants gave spoken responses. Gennari et al. (2012) found no differences in structure choice between English written and spoken responses to the relative clause elicitation pictures used in Experiment 2 here, so we do not expect that response modality is a strong contributor.

We also found an interaction of Clause by Animacy. While animate patients of action promote passive use in both clause types (the main effect of Animacy), the effect of animacy is larger in the relative clause production task than in the main clause task. This result likely reflects the strong passive promoting effect of the patient focusing question in Experiment 1, showing that under appropriate discourse context, speakers of all languages produce numerous passives—more often for animate patients, but also a fair amount for inanimates as well. Under the less strong passive-promoting context in Experiment 2, the effect of animacy is stronger—passives continue to be highly frequent for animate relative clause heads, but they are less frequent for inanimates. These results are consistent with Prat-Sala and Branigan’s (2000) observations that two kinds of noun accessibility can affect structure choices: “inherent accessibility,” owing to the conceptual prominence of animate entities; and “derived accessibility” stemming from a discourse that makes an entity given and salient.

Next, the main effects of language suggest continuity within language across the two clause types. In both main and relative clauses, regardless of whether causative and benefactive utterances were coded as passives or removed, Korean speakers produced fewer passives than Japanese (and English) speakers. When causative and benefactive utterances were coded as passives, English and Japanese speakers produced similar overall rates of passives, though the marginal Animacy by English interaction reflects the fact that English speakers produced numerically more passives when describing inanimate entities. These effects suggest that though overall rates of passives varied across the two clause types, there was consistency within each language across the two production tasks.

Finally, the Clause by Korean interaction reflects a greater difference between Korean and Japanese speakers in the relative clause task than in the main clause task. Again, this interaction may reflect the prompts by which we elicited utterances in the two tasks and the ceiling effect in passive rates among English and Japanese speakers in the main clause production task. Alternatively, animacy effects or cross-linguistic differences in sentence preferences may be greater for certain sentence types. And of course discourse factors may themselves vary with structure, in that main and relative clauses have different discourse functions. Future work is needed to investigate these factors, and a central result here, that cross-linguistic differences in structure choice can vary with the structural or discourse environment, will be an important guide to those additional investigations.

Adversity in Passives

To this point, we have argued that the lower rate of passives in Korean likely stems from a restriction of passive use mainly to adverse events (Oshima, 2006), a restriction not present in Japanese and English. Japanese does have an adverse passive, but the adverse meaning is generally associated with indirect passives, as when an intransitive verb is passivized, as in a sentence that could translate as Becky was showed up late by Karen, meaning that Becky was adversely affected by Karen’s showing up late. Because we did not study this structure and elicited only direct passives with transitive verbs, we predicted no adversity effects in Japanese or English. We tested these hypotheses by collecting adversity ratings of the pictures used in our studies, including these ratings in a mixed-effects model predicting passive utterance choices in both main and relative clauses, and correlating these ratings with the rates of passives over items (that is, pictures) by animacy, averaged across clause type, with causative and benefactive utterances, along with actives, coded as non-passive responses.


Participants

Twenty-three native English-speaking undergraduates at the University of Wisconsin-Madison who did not take part in either the main clause or relative clause production tasks participated in this experiment in exchange for course credit in an introductory psychology course.


Materials

The pictures were the same as those used in our main clause production task, the picture segments shown in Figures 1b and 1c. Each participant saw all pictures (20 animate-patient and 20 inanimate-theme events).


Procedure

Participants viewed each picture and were asked to rate it on a scale of 1–7 according to how positively affected the patient or theme of the event seemed. Participants began with six practice trials to practice using the rating scale and to experience a broad range of positive and negative events. Pilot testing revealed that the task was conceptually easier when higher rating values referred to more positive events, so participants used a scale on which 1 was maximally negative and 7 maximally positive. In subsequent analyses, we reversed the direction of this scale to make the results more interpretable in the context of our predictions, so that higher values refer to more adverse events.
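The reversal itself is simple arithmetic: on the 1–7 scale, a rating r becomes 8 - r. A trivial sketch (the function name is ours):

```python
# Reverse a 1-7 rating so that higher values mean more adverse events.
def reverse_rating(r):
    assert 1 <= r <= 7
    return 8 - r

print([reverse_rating(r) for r in (1, 4, 7)])  # [7, 4, 1]
```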

Table 5 shows that our adversity predictions were correct for Korean, but not for the other languages. The results of the mixed-effects model predicting passive choice (versus all other non-passive options) across the main and relative clause tasks showed a main effect of adversity ratings, suggesting that across all three languages, speakers were more likely to produce passive utterances when the patient of the depicted action was more adversely affected by the action. Adversity did not interact with language, and including clause type in the model did not yield a significant main effect of clause type or any interactions between clause type and language. This result is somewhat surprising, given that we had predicted an effect of adversity ratings only in Korean. To better understand the pattern of results, and to address a potential concern that these mixed-effects logistic models may not be robust when there is little variability (there were very few non-passive responses in English), we calculated, as a follow-up analysis, correlations between the proportion of passive utterances for each verb and adversity ratings, averaged across clause type.

Table 5

Effect of adversity and language on structure choice in main and relative clauses with animate target nouns: results of a mixed-effects logistic model predicting structure choice (passive utterances vs. non-passive utterances, the reference group) from the adversity of the depicted action (mean centered; higher values refer to more adverse events) and dummy-coded language (Japanese as reference group).

                      Coefficient    SE       z       p          Random Slope
(Intercept)               3.99      0.84     4.74   < 0.001*
Adversity                 0.84      0.35     2.41   < 0.05*     s
Korean                  –21.00      2.17    –9.67   < 0.001*    i
English                   1.01      1.69     0.60   < 1         i
Adversity x Korean       –2.04      0.83    –2.45   < 0.05*
Adversity x English       0.37      0.56     0.66   < 1
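The predictor coding described in the table note (mean-centered adversity ratings and dummy-coded language with Japanese as the reference group) can be sketched as follows. This is a minimal illustration; the data and function names are hypothetical, not from the original analysis.

```python
def mean_center(values):
    """Subtract the grand mean, so the model intercept reflects average adversity."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def dummy_code(languages, reference="Japanese"):
    """Dummy code language with the reference group coded 0 on all indicators:
    one 0/1 indicator per non-reference language."""
    levels = [lang for lang in sorted(set(languages)) if lang != reference]
    return {lvl: [1 if lang == lvl else 0 for lang in languages] for lvl in levels}

# Hypothetical trial-level data for illustration only.
adversity = [2.0, 5.0, 6.5, 2.5]
langs = ["Japanese", "Korean", "English", "Korean"]

centered = mean_center(adversity)  # sums to 0 by construction
dummies = dummy_code(langs)        # {'English': [0, 0, 1, 0], 'Korean': [0, 1, 0, 1]}
```

With this coding, the intercept and the Adversity coefficient in Table 5 describe Japanese (the reference group), while the Korean and English terms describe departures from it.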

Figure 4 illustrates these adversity-verb relationships across languages. We indeed found significant correlations between adversity ratings and passivization in both Japanese (r(17) = .64, p < 0.01) and Korean (r(17) = .64, p < 0.01). However, in this analysis, we found no correlation between adversity ratings of events containing animate patients and rates of passives in English (r(17) = .15, p = 0.53). Given the small amount of variability in response type in English, it is difficult to say definitively whether we detected a true effect of event adversity on the choice of passives or whether the significant result in the mixed-effects model is an artifact of low variability. Future work using paradigms that elicit a greater proportion of non-passive utterances will be key to understanding this possible effect of adversity in English. After numerous findings across Experiments 1–2 that Japanese resembles the typologically dissimilar English in passive production rates and differs from its typological neighbor Korean, here we see the opposite pattern for the first time: the pictures and events that Japanese speakers tend to passivize are the same ones that yield passives in Korean, while English is, at least possibly, the outlier. We found no reliable correlation in any language between adversity ratings of events containing inanimate patients and rates of passives (English: r(17) = .38, p = 0.11; Japanese: r(17) = .20, p = 0.42; Korean: r(17) = .14, p = 0.56).

Figure 4 

Correlations between the overall passive rate for animate targets and adversity ratings by picture in English, Japanese and Korean. In these graphs, higher ratings indicate more adverse events.

To our knowledge, discussions of adversity in Japanese passives are limited to the indirect passive (e.g., Kuno, 1973; Oshima, 2006), which is not tested here. However, it is possible that Japanese speakers generalize from the adversity restriction in indirect passives to direct passives—essentially a syntactic neighborhood effect in which patterns in one construction affect another (e.g., Jared, McRae & Seidenberg, 1990; Juliano & Tanenhaus, 1994; Pearlmutter & MacDonald, 1995). While an investigation of indirect vs. direct passives is beyond the scope of this study, our data are consistent with adversity as a source of influence on structure choice in Japanese direct passives. This result runs contrary to claims of distinct differences between Korean and Japanese in the use of the passive: "In Japanese an adversative implicature arises only in an indirect passive (which is not attested in Korean at all)" (Oshima, 2006, p. 138). We find evidence of an adversity effect both in Korean, where it has previously been attested in direct passives, and in Japanese, where it has not. We also see the suggestion of an effect in English, which is likewise not previously attested, though there is some evidence that the English get-passive is commonly used to describe particularly adverse or beneficial events (as in The window got smashed or The intern got promoted); Collins (1996) reported that the adverse use of the get-passive was nearly three times as common as the beneficial usage. If subsequent investigations indeed detect an adversity effect in English, this would constitute an interesting example of event-structure motivations of structure choice that persist across a variety of languages. In this regard, it will be important to distinguish adversity from the general degree to which patients are affected by the events that producers describe.
That is, passives are used to focus the patient, often because the patient is affected by the event in some important way. While both positive and negative events can affect patients, it may be that adversity and affectedness tend to be correlated in our materials, in other materials, and even in the world. For example, the adverse events of getting punched or pushed may more strongly affect a patient than the positive events of getting hugged or kissed. This potential relationship between affectedness and adversity suggests several possible scenarios for some or all languages: a) what has seemed to be adversity is really an influence of affectedness on passive use, b) affectedness and adversity are at least partially independent forces in passive use, or c) adversity, not affectedness, is the primary driver of passive use. Future investigations that include strongly positive, affecting events (was rescued, was elected, was worshipped) could help to disentangle adversity and affectedness of the patient and clarify motivations to produce passives.


There are a number of similarities and differences in structure choice in relative clauses across English, Japanese and Korean. Despite typological differences between English and both Japanese and Korean, speakers of all three languages produced more passive utterances when describing animate than inanimate targets. This is consistent with the empirical findings of Tanaka et al. (2011) as well as with linguistic analyses (Oshima, 2006) suggesting that, even in languages with more fluid word order, speakers more often produce passive utterances with animate than with inanimate subject nouns. Though this effect of animacy was robust in all three languages, the absolute numbers of active and passive utterances varied across languages. Notably, these differences did not fall along typological lines: despite the nearly identical typology of these sentences in Japanese and Korean, Japanese speakers produced proportions of active and passive utterances almost identical to those of English speakers, while Korean speakers produced significantly fewer passives. This was expected, given that passive structures tend to be disfavored in Korean (Park, 2005; Song & Choe, 2007).

That said, characteristics of the three individual languages indeed contributed to production choices. In Japanese and Korean, languages that permit scrambled word order, speakers produced scrambled active sentences rather than passive utterances on some trials, but the overall rate of scrambling was low. Japanese tends to have a low overall rate of scrambling in transitive sentences like those elicited here (Kondo & Yamashita, 2011; Yamashita, 2002), so the low rate of scrambling and high rate of passive utterances in our task is not surprising. More generally, our results fit into a growing body of work showing variability in how speakers of different languages choose between passive and scrambled options: speakers of Slavic languages appear to produce many more scrambled sentences than passives (Gennari, Mirkovic & MacDonald, 2012; Myachykov & Tomlin, 2008), while speakers of other languages that permit scrambling eschew that option and frequently produce passives (Christianson & Ferreira, 2005).

The variability in overall passive rate across languages is consistent with our finding that structure choice in main clause utterances (Experiment 1) predicted production choices in relative clause utterances (Experiment 2). Languages may or may not permit scrambled sentences as an alternative to passive utterances, but this language affordance itself does not fully predict the overall rate with which speakers of a language produce passive utterances. Other language-specific factors must play a role.

We did identify some factors that seemed to affect production choices across multiple languages. We found that in both Japanese and Korean (and with some evidence for English), speakers were more likely to produce passive utterances when the patient was adversely affected by the action. This is somewhat surprising because we predicted this effect in Korean but not in Japanese, where the implied adverse meaning of a passive is thought to arise only in indirect passives, not the direct passives elicited here (Kuno, 1973; Oshima, 2006). This finding, along with the finding that passive rates in main clause utterances predicted passive rates in relative clause utterances, is consistent with claims for the non-independence of different structures within a language, i.e., priming or neighborhood effects among related structures (Juliano & Tanenhaus, 1994; Pearlmutter & MacDonald, 1995), though more work is needed to identify the boundaries over which structures can be "related." If speakers' production choices reflect a lifetime of implicit learning, then speakers seem to be learning across structure neighborhoods: patterns of main clause passives are not learned or produced independently of relative clause passives, and indirect passives are not learned or produced independently of direct passives.

General Discussion

In a main clause production task and a relative clause production task, we investigated production choices in English, Japanese and Korean. Looking across languages, clause types and head noun animacy, we saw three important commonalities: 1) effects of noun animacy on sentence structure across languages and clause types, 2) structure choice varying with event adversity in Japanese and Korean, and 3) parallel production choices in main and relative clauses within each of the three languages. In the next sections, we consider implications of these results for theories of sentence production, as well as some future directions.

Implications for Conceptual Accessibility

Across all three languages and both clause types, animate entities elicited more passive utterances than inanimate ones. Effects of animacy on passives have been replicated many times (Bock, Loebell & Morey, 1992; Bock & Warren, 1985; F. Ferreira, 1994; Gennari et al., 2012; Prat-Sala & Branigan, 2000; Tanaka et al., 2011; van Nice & Dietrich, 2003), but the Japanese and Korean results are nonetheless interesting because they show that the availability of alternative word orders (scrambling in main clauses in Experiment 1) does not supplant the passive option, which was more common for these speakers, consistent with some other studies of passive choice in the face of other word order options (Christianson & F. Ferreira, 2005; Tanaka et al., 2011). More novel are the results of the relative clause experiment, where the effect of animacy held for Japanese and Korean speakers despite the fact that active and passive relative clauses have identical word order, so that structure choices cannot be attributed to pressures to put animate nouns in an early sentence position; indeed, the animacy effects we see arise before the head noun, which is fixed at the end of the relative clause. To our knowledge, these are the first demonstrations of such effects in Japanese and Korean production (see Hsiao & MacDonald, 2016, for similar results in Mandarin). Together, these results support a strong link between animacy and grammatical role assignment, beyond any effects on linear order (Christianson & F. Ferreira, 2005; McDonald, Bock & Kelly, 1993; Tanaka et al., 2011). On this view, speakers implicitly assign some element of their message as the topic, then follow language-specific constraints for an utterance about a topic or focused entity.
Generally, this places the more accessible topic entity into a more prominent sentence position, or assigns that accessible entity a more prominent grammatical role, and these prominent positions are often early in the sentence, allowing speakers to place more fully planned words (topics) earlier in an utterance, or to conform to well-rehearsed patterns in a language (Arnold, 2016; Bock et al., 1992; Christianson & F. Ferreira, 2005; Prat-Sala & Branigan, 2000; Tanaka et al., 2011). However, an important goal for future research is to understand why some other languages appear to resist the effect of animacy on passive choice, including in experiments using the same tasks and materials as in the current studies (Gennari et al., 2012; Perera & Srivastava, 2015).

Implications for Lexical, Syntactic, and Event Knowledge in Sentence Production

The noun accessibility effects described above can be seen as examples of word-driven production. However, there are also clear effects of events on structure choices: Japanese and Korean speakers (and possibly English speakers) were more likely to use passives when animate patients were adversely affected by their depicted events. Whether these results turn out to reflect adversity, affectedness of the patient, or some combination, they show clear effects of event encoding on language production. It is less clear that these event effects warrant a different mechanism of grammatical encoding than the lexically-driven processes that seem to underlie animacy and other accessibility effects. An alternative view is that the effects of animacy and accessibility of nouns are filtered through the affordances of the language, the linguistic context, and the rich event-lexico-syntactic statistics that speakers learn from their language environment.

Figure 5 illustrates important relationships between events, words and structures that appear to emerge from language users' experience and that affect the form of utterances. Support for Leg 1, between Events and Structures, comes from evidence that language users learn the statistics of events in the world and their correlations with structures (Altmann & Mirkovic, 2009; Goldberg, Casenhiser & Sethuraman, 2004; Willits, Amato & MacDonald, 2015). Applied to language production, we see evidence of these relationships in the adversity effect in the present studies, in that certain events were more often mapped to certain sentence structures as a consequence of aspects of the event. Other evidence for fine-grained event-lexico-syntactic learning can be found in studies suggesting that the ease with which a speaker can capture the gist of an event has consequences for structure choice (Konopka & Meyer, 2014), and in findings of statistical learning between events and constructions. In artificial grammar learning studies, Perek and Goldberg (2015, in press) found that speakers learned not only lexico-syntactic patterns (the likelihood of certain verbs appearing in certain syntactic frames) but also event-syntactic patterns (the likelihood of certain syntactic forms describing certain event types), and applied this knowledge when using the novel language to describe pictures. The mapping between adverse events and passive sentence structures is a clear natural-language example of these same phenomena.

Figure 5 

Schematic of the relationships between events, structures and words that speakers learn and apply to online sentence production.

Evidence for Leg 2, between Structures and Words, comes from the observed animacy effect: animate entities tended to be described with different structures than inanimate entities. This type of lexico-syntactic learning also underlies a variety of verb-bias effects (F. Ferreira, 1994; Stallings et al., 1998; in comprehension: Garnsey et al., 1997; Trueswell, Tanenhaus & Kello, 1993) and, more generally, findings that certain words are more likely than others to occur in particular contexts.

The present study does not provide direct evidence for Leg 3 (participants were explicitly pre-trained on a set of verbs to use to describe the depicted events), but had we not done so, certain verbs would surely have been mapped to certain events more often than others. In addition, there is evidence that aspects of events, such as the salience or namability of event entities, have consequences for the lexical encoding of those entities (Gleitman, January, Nappa & Trueswell, 2007; Konopka & Meyer, 2014; Myachykov, Thompson, Scheepers & Garrod, 2011; Perek & Goldberg, 2015, in press), which in turn has consequences for word ordering or structure choice. We believe that this type of combinatorial learning—event-structure, lexico-structure, word co-occurrence, and others—is an important aspect of the sentence production process and can guide many of the choices that speakers make in these and other sentence production tasks (Perek & Goldberg, 2015, in press). These relationships merit further study.

Implications for priming across structures (neighborhoods)

Claims for learning combinatorial patterns across words, structures, and events naturally lead to questions about generalization across exemplars, a central component of language acquisition (Tomasello, 2000; Goldberg, Casenhiser & Sethuraman, 2004). In these studies, we found evidence for syntactic "neighborhoods" of similar sentence types and/or event types, in that patterns of main clause productions predicted patterns of relative clause productions describing the same events. These patterns may reflect priming during language use, as a form of implicit learning (Chang et al., 2006) from message to structure, whereby mappings from producing common main clauses in a language prime the same mappings in rarer relative clauses. Alternatively, these patterns might reflect animacy and adversity effects arising in main clauses and relative clauses independently, with no generalization from one clause type to the other. Some evidence for this view comes from research showing priming of sentence structure as a function of event parsing: thematic role ordering can be primed independently of sentence structure (Chang, Bock & Goldberg, 2003), and structured non-linguistic events, such as action sequences, have been shown to prime similarly structured linguistic sequences (Allen, Ibara, Seymour, Cordoba & Botvinick, 2010; Kaiser, 2012). Our studies are not priming or training studies that directly manipulate speakers' experience with one structure to observe effects on another, but several results in the literature make priming/learning across structures a logical contributor to our results, although priming from events to structures may also occur.
Some evidence for cross-structural priming comes from language comprehension, where a number of researchers have found sentence-level frequency-by-regularity interactions, such that some low-frequency structures benefit from their similarity to high-frequency ones (Juliano & Tanenhaus, 1994; Pearlmutter & MacDonald, 1995; Wells, Christiansen, Race, Acheson & MacDonald, 2009). In addition, there are many other reported instances in which patterns that emerge in certain sentence contexts are hypothesized to influence patterns in unrelated contexts that nonetheless share certain features. Evidence from English, Portuguese and German suggests that children's acquisition of relative clauses may depend on patterns in main clause utterances, with relative clauses that maintain more word order and morphological features of main clauses generally being easier to learn, both within and between languages (Brandt, Diessel & Tomasello, 2008; Kidd & Bavin, 2002; Kidd, Brandt, Lieven & Tomasello, 2007). These results are consistent with the model presented in Chang (2009), which suggests that representations across similar structures, like the English main clause and the short-before-long dative, are shared, and that it is these shared representations, and their relative frequencies, that bring about short-before-long preferences in English and long-before-short preferences in Japanese. These results suggest a number of different regularities across sentence types: surface-level word order, morphology or event structure may all contribute to the observed phenomenon of active and passive relative clause frequencies in a language mirroring those of main clause active and passive sentence frequencies.

In addition, the adversity effect we found in Japanese passive utterances may be further evidence for syntactic neighborhood effects and for generalization of event-lexico-syntactic pattern learning across different sentence types. Recall that Japanese has an intransitive passive, not studied here, whose use is modulated by adversity; if Japanese speakers' experience with this intransitive adversity passive contributes to the passive inflection becoming associated with adverse events, speakers may generalize the adversity effect more broadly, to any context in which any passive is used. This hypothesis is speculative but consistent with a linguistic analysis holding that the intransitive adversative passive in Japanese derives from an extension of the direct (transitive) passive, and that both are related to a single passive prototype (Shibatani, 1985). Future work may clarify the contribution of neighborhood effects and generalization across similar structures in both language use and diachronic language change.

In sum, speakers seem to learn fine-grained patterns regarding the use of different words and structures to describe different events and event participants, so that subtle aspects of event adversity, noun animacy, and even aspects of verb type (F. Ferreira, 1994), individual verbs (Stallings et al., 1998), and word co-occurrences (Wasow, 1997) affect the tendency to produce particular lexico-syntactic combinations to describe different events. However, there also appears to be generalization from one neighborhood to another, where neighborhood here can be seen both as structure (main clause and relative clause) and event-structure, where intransitive adverse events may generalize to transitive ones in Japanese. While future research is necessary to further delineate how learned patterns may affect sentence production, these potential neighborhood effects are informative about the range of statistical regularities that language users are capable of learning and applying to online sentence production tasks, and they provide evidence of the types of pattern-learning abilities that contribute to human sentence production abilities.

Future Directions

The goal of the present work is not to simply say that "everything matters" in sentence production, but rather to identify the range and strength of factors that contribute to producers' choices. Beyond documenting these forces, an important ongoing research goal is to identify why production choices have this character. We see at least part of the answer in the inherent difficulty of language production, as described in the Production-Distribution-Comprehension framework (PDC; MacDonald, 2013), which hypothesizes that producers make implicit choices of utterance forms that minimize production difficulty (for related claims for production efficiency, see Kurumada & Jaeger, 2015). The repeated pairings of events, words, and structures seen in the current studies are examples of Plan Reuse in this framework, where previously-uttered forms become easier as a function of implicit learning over past production and comprehension events (e.g., Bock & Griffin, 2000; Chang, 2009). The preference to map animate patients to early sentence positions or prominent grammatical roles can be seen as emergent both from plan reuse (Montag & MacDonald, 2014) and from the efficient interleaving of execution and planning in an incremental production system, which promotes early execution of easier plan components (Bock & Warren, 1985; Branigan, Pickering & Tanaka, 2008). On this view, learning over past experience serves to reduce production difficulty. The results from these studies clearly support a role for learning and applying regularities across words, structures, and events. An important goal for future research will be to understand this learning—its limits, its origin in past productions and/or past comprehension of others, and how these experiences are weighted similarly or differently across languages.