The Ontogenesis of Action Syntax

Authors: Laura Maffongelli, Alessandro D'Ausilio, Luciano Fadiga, Moritz M. Daum


Language and action share similar organizational principles. Both are thought to be hierarchical and recursive in nature. Here we address the relationship between language and action from developmental and neurophysiological perspectives. We discuss three major aspects: the extent of the analogy between language and action; the need to extend research on the still largely neglected aspect of action syntax; and the positive contribution of a developmental approach to this topic. We elaborate on the claim that adding an ontogenetic approach will help to obtain a comprehensive picture of both the interplay between language and action and its development, and to answer the question of whether the mechanisms underlying the detection of syntactic violations in action sequences are similar to or different from those underlying the processing of syntactic violations in language.
Keywords: action, language, development, infants, syntax
Submitted on 11 Dec 2018; accepted on 10 Apr 2019

Language and action are fundamental for human social cognition (Repetto, Colombo, & Riva, 2012). However, the relationship between language and action, to date, is still controversially discussed (Caramazza, Anzellotti, Strnad, & Lingnau, 2014). On the one hand, traditional accounts of cognition assume conceptual representations as amodal, thus independent from representations in the perceptual and motor systems of the brain (Repetto et al., 2012). On the other hand, recent accounts argue that representations of different formats such as language and action are not independent from each other but (highly) intertwined (Egorova, Shtyrov, & Pulvermüller, 2016). Accordingly, actions and bodily experiences are considered necessary for cognitive processing and their interplay is crucial, especially regarding the development of higher order capabilities (Engel, Maye, Kurthen, & König, 2013).

The assumption that human cognition is grounded in action has similarly been adapted to theoretical considerations regarding human language processing (Glenberg & Gallese, 2012). Human language represents one of the most prominent characteristics of mankind and its primary function is to allow efficient communication between two interlocutors in social interactions. However, humans use various means to communicate, and do not rely exclusively on verbal communication but also use gestures, body movements, and facial expression to interact with others (Pezzulo et al., 2018). The comparison between language and action has itself been debated for a long time in cognitive neuroscience (Boeckx & Fujita, 2014). Understanding whether language relies on a unique cognitive mechanism, or whether it has homologues in other cognitive domains, is of central importance. Thus, it is necessary to investigate whether language-specific computational components use neurocognitive mechanisms grounded in the human sensorimotor system (Pulvermüller & Fadiga, 2010; Rizzolatti & Craighero, 2004).

Taking into account recent experimental results (Maffongelli et al., 2015; Maffongelli, Antognini, & Daum, 2018) when addressing this question, we propose that one way to contribute to the on-going debate is to add a description of the language-action interrelation starting from two basic linguistic concepts, semantics and syntax, and to relate them to the perception and production of actions. Both concepts are assumed to be shared between language and action (e.g., Pulvermüller & Fadiga, 2010). Semantics refers to the meaning of language (e.g., words in a sentence) and action (bodily movements in an action sequence). Syntax refers to organizational rules governing how words or bodily movements are ordered to achieve a meaningful linguistic or action sequence (see Box 1). Thus, perceiving how (syntax) actions are performed and why (semantics) they are performed in a certain way is crucial for the observer to anticipate the actor’s goal and to prepare an appropriate response.

We wish to stress that in both domains, the separation between syntax and semantics is an artificial distinction between highly interdependent components, made to simplify their definition and investigation. Concerning syntax, psycholinguistic theories usually take for granted the central role of hierarchical sentence structure (Hauser, Chomsky, & Fitch, 2002) at all levels of description of language, that is, comprehension, production, and acquisition. However, it has to be noted that the role given to language comprehension is still a matter of debate. It has indeed been proposed that hierarchical structures are not central for language use (Frank, Bod, & Christiansen, 2012) and that language structure is processed in a sequential fashion. In this view, embedding, a key component of the hierarchical description of language, is not considered to be important for language processing (Goldberg, 2006; Tabor, Galantucci, & Richardson, 2004).

So far, the language-action interrelation has primarily been addressed with respect to action semantics, both in adults (Balconi & Vitaloni, 2014) and infants (Kaduk et al., 2016; Ní Choisdealbha & Reid, 2014). The violation of semantic correctness within a sentence results in a brain response similar to that elicited by the violation of expectancy in an observed action sequence. Specifically, such events elicit an N400, an event-related potential (ERP) with a central-parietal topography. Accordingly, domain-independent semantic principles seem to govern the prediction of upcoming events in a given information stream. A violation in the prediction of single elements engages semantic-related brain operations independent of the kind of perceived information (Amoruso et al., 2013; Reid et al., 2009). Moreover, evidence derived from different experimental techniques suggests that the N400s reported for action and language share an anatomical similarity (i.e. they show overlapping activation of brain areas) and supports the existence of a widely distributed semantic network (for a review see Amoruso et al., 2013). Although the semantic aspects of action processing and their relation to aspects of language processing are taken for granted, the aspects concerning the syntactic processing of action are less discussed and investigated. In the following section we discuss theoretical assumptions supporting the language-action interrelation that might apply to syntactic processing as well. We then present first evidence from investigations in adults and infants pointing to the notion that language and action may share similar neural underpinnings as far as syntactic processing is concerned.

A look at the syntactic processing of linguistic and non-linguistic stimuli in adults

Recent evidence suggests that during action processing the brain performs syntactical operations similar to those used during language processing. A sentence, a grammatically structured unit of language, is based on a set of elements (i.e. words) that follow a well-defined scheme, governed by a precise hierarchical organization (Chomsky, 1957). Although controversially debated (Martins & Fitch, 2015; Vicari & Adenzato, 2014), it is generally assumed that language is of recursive nature and that recursion is language-specific: Linguistic elements and structures may be used repeatedly in sequence to form a part of a larger structure of the same kind (Hauser et al., 2002). Similarly, action sequences consist of basic elements (movements), organized in action chains, and subject to a hierarchical organization (Behmer & Crump, 2017; Lashley, 1951). A combination of basic units only results in a meaningful goal-directed action sequence if these units are put together in a correct sequence. In this vein, we assume an action structure to be hierarchical when single basic units are subordinated to other units. For example, the action of drinking water from a glass requires the succession of precise motor acts including grasping, lifting, tilting the glass at the correct location (the mouth), with the required angle, adjusting the angle to the amount of water remaining and so on (Pulvermüller & Fadiga, 2010). Here, the motor system can embed formerly acquired motor elements into action to initiate a new movement pattern that is structured similarly. This adaptation ability is known – in analogy to language – as recursion principle (Pastra & Aloimonos, 2012). For instance, as described by Pulvermüller (2014), the elementary rule open x to perform some other actions, close x can be recursively applied to increase the self-embedding levels of actions. 
Thus, the motor system can embed single action elements or structures (e.g., open x) into other similar structures to produce a potentially infinite number of more complex actions. In the language and action domains there are rules that concatenate and coordinate the order in which single elements are executed according to a hierarchical organization. This organization not only constrains the production of well-formed sequences but also serves internal simulation processes (Jeannerod, 2001) and therefore impacts the prediction of our own and others’ behaviour. Accordingly, actions, similar to language, follow a syntactical organization with a specific temporal and hierarchical order of constituent elements, and share the recursivity principle (Pulvermüller, 2014).
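The self-embedding rule described above (open x, perform some other actions, close x) can be illustrated with a minimal Python sketch. The function name `embed` and the example objects (cupboard, jar, cookie) are hypothetical and chosen only for illustration; the sketch is not taken from Pulvermüller (2014) and makes no claim about how the motor system implements the rule.

```python
def embed(containers, core_action):
    """Wrap `core_action` in nested open/close pairs.

    `containers` lists the objects to open, outermost first. Each level
    applies the same rule (open x ... close x) to the result of the
    level below, which is what makes the rule recursive.
    """
    if not containers:
        # Base case: no further embedding, only the core action remains.
        return [core_action]
    outer, *inner = containers
    return [f"open {outer}"] + embed(inner, core_action) + [f"close {outer}"]


# Two levels of self-embedding around a single (hypothetical) core action.
sequence = embed(["cupboard", "jar"], "take cookie")
print(sequence)
# ['open cupboard', 'open jar', 'take cookie', 'close jar', 'close cupboard']
```

Each additional container adds one level of embedding, and the closing steps necessarily appear in the reverse order of the opening steps, mirroring center-embedded structures in language.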

A candidate neural substrate for the hierarchical and recursive analysis of language is Broca’s area, located in the caudal part of the inferior frontal gyrus (IFG). This area is also involved in the processing of extra-linguistic tasks, such as action production and perception (Pulvermüller & Fadiga, 2010; van Schie, Toni, & Bekkering, 2006). Importantly, when looking at the clinical literature, lesions in and around the IFG are associated with Broca’s aphasia (Sirigu et al., 1998). Indeed, agrammatic patients are impaired in the extraction and generalization of the abstract structure underlying action sequences and linguistic sequences (Dominey, Hoen, Blanc, & Lelekov-Boissard, 2003; van Schie et al., 2006). Broca’s aphasic patients are impaired in the reconstruction of sequences depicting biological actions, as opposed to physical events (Clerget, Winderickx, Fadiga, & Olivier, 2009; Fazio et al., 2009; Pazzaglia, Smania, Corato, & Aglioti, 2008). These studies provide further evidence for the intriguing possibility that Broca’s area could represent both the hierarchy of linguistic and action sequences (Christiansen, Conway, & Onnis, 2012; Dominey et al., 2003; Pulvermüller & Fadiga, 2010).

The question of whether the ability to deal with complex hierarchical structures is language-specific or likewise applies to other cognitive systems has led to the development of experimental paradigms that focus on non-linguistic stimuli. Brain-imaging investigations on the processing of linguistic hierarchical structures considering center-embedded rules (A^nB^n) as opposed to adjacent dependency rules (AB)^n in natural language (Friederici, Bahlmann, Heim, Schubotz, & Anwander, 2006; Tettamanti et al., 2002), in artificial grammar paradigms (Bahlmann, Schubotz, & Friederici, 2008) as well as in the visuo-spatial domain (Bahlmann, Schubotz, Mueller, Koester, & Friederici, 2009) suggest that manipulating the regularities within sequentially occurring linguistic and extra-linguistic stimuli engages Broca’s area.
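The difference between the two rule types can be made concrete with a short Python sketch that generates sequences under the center-embedded rule, written here as A^nB^n, and under the adjacent dependency rule, written as (AB)^n. This is a simplification under stated assumptions: the categories are reduced to the bare labels "A" and "B", and the function names are hypothetical; the stimuli used in the cited studies are richer than this.

```python
def adjacent_dependency(n):
    """Return an (AB)^n sequence: each A is immediately followed by its B."""
    return ["A", "B"] * n


def center_embedded(n):
    """Return an A^nB^n sequence: all A elements precede all B elements."""
    return ["A"] * n + ["B"] * n


def is_center_embedded(seq):
    """Check whether `seq` has the form A^nB^n for some n >= 1."""
    n, remainder = divmod(len(seq), 2)
    return (
        remainder == 0
        and n > 0
        and seq[:n] == ["A"] * n
        and seq[n:] == ["B"] * n
    )


print(adjacent_dependency(3))   # ['A', 'B', 'A', 'B', 'A', 'B']
print(center_embedded(3))       # ['A', 'A', 'A', 'B', 'B', 'B']
```

For n = 1 the two rules produce the same sequence, so a contrast between them only becomes informative from n = 2 onwards.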

First attempts to investigate the processing of action syntax adopted the expectancy-violation paradigm, a widely used method to investigate language syntax processing during the presentation of ungrammatical sentences, as assessed in electroencephalography (EEG) studies. In this paradigm the order of adjacent elements constituting the action sequence was disrupted. In this way, the violation is perceived as such when, for a given action goal, one action part is temporally prior or later to, and necessary for, the subsequent action part (see Figure 1 and Box 1). This kind of violation recalls the typological generalization regarding the majority of human languages, where the basic word order is either subject-object-verb or subject-verb-object. Specifically, such an arrangement is explained by the prototypicality of transitive action scenarios in which an animate agent acts on an inanimate patient (i.e. the entity upon which an action is carried out) to induce a change of state. Indeed, actions, like verbs, show a similar argument structure, which connects agents and objects (Comrie, 1989; Greenberg, 1963). Along these lines, it is suggested that violating the transitive relation between specific action parts will interrupt the essential agent-patient relation. These violations elicited brain responses in the observer typically associated with the processing of syntactic violations in language (ELAN/P600; Friederici, 2004; Kim & Osterhout, 2005), occurring in frontal brain regions (Maffongelli et al., 2015). Further, a study on the role of long-distance dependencies (as assessed in relative clause constructions) in sentence and action processing supports the idea of a tight relationship between motor and linguistic structural processing (Casado et al., 2017). 
The non-linear self-administration task (involving a discontinuity in pressing a button with the foot depending on prior finger button press) as opposed to a linear one (involving a linear sequence of finger button presses) resulted in an increased P600 component, reflecting late syntactic processing. Taken together, these studies suggest that structural manipulations in both domains activate similar cortical regions of the adult brain (Maffongelli et al., 2015) and that the execution of a motor sequence driven by center-embedded relative clauses may share similar neural resources (Casado et al., 2017).

Box 1: Dichotomy of principles: Action Semantics and Action Syntax.

A detailed and comprehensive examination of the interrelation between language and action requires an unambiguous terminology, in particular with respect to the terms semantics and syntax. In the domain of language, semantics refers to meaning, whereas syntax refers to structure. In the domain of action, in particular in the literature on infants’ action processing, the terminology used is, in our view, too heterogeneous. Concepts belonging to action semantics are often described with a terminology that would be more appropriate to the description of hierarchical rules and action syntax. To exemplify, it is not clear what is meant by semantic rules (Reid et al., 2009; Reid & Striano, 2008) or semantic structure (Kaduk et al., 2016) when referring to action stimuli. Further, the use of the term sequential information (e.g., Reid et al., 2009) is ambiguous with respect to actions. It can be contrasted to the sequential information pertaining to the action structure, that is, action syntax. For future investigation of the interrelation between language and action, it is essential to use an unambiguous terminology, where terms are specifically explained and consistently used in both domains to avoid contradictions and misconceptions.

So far, action semantics has often been described in analogy to the language N400 paradigm, in which words that do not fit within the preceding sentence context elicit a negativity over central-parietal brain sites. The classical example The pizza was too hot to… eat/cry (Kutas & Hillyard, 1984) shows that this response is specific to the processing of semantic information in sentences, in which the prediction of the final verb is violated. The increased N400 response for violations of expectations is used as a marker of semantic processing in a broad sense (Kutas & Federmeier, 2011). In action, this marker has been observed in manipulations occurring along various dimensions such as action purpose, inappropriateness, or plausibility of events occurring in an action sequence (for a detailed description see Amoruso et al., 2013; Kutas & Federmeier, 2011).

At a general level of description and starting from a general definition of the N400 marker shared between language and action, we propose a dichotomy describing what the two action principles, semantics and syntax, might represent in the action domain. Action semantics, as suggested by other authors, describes the build-up of meaning arising from the expectancy created by contextual information and previous experiences (Amoruso et al., 2013). Along these lines, the interpretation of others’ actions is context-dependent. If the processed event information fits within the previous context, the processing of upcoming information is facilitated. By contrast, when this information does not fit with the prior prediction, a mismatch occurs, resulting in an N400 component similar to that observed in language.

We define as action syntax a sequence of bodily movements (movements, motor acts, actions) required to follow a given order for an overall goal to be achieved. The correctness of the movement sequence is the basic element for the achievement of the action and, as a consequence, indispensable for an understanding of the action sequence. If action steps are presented in a different order, the rules underlying the action hierarchy are violated. At the neural level this results in specific brain signatures, typically elicited when the rules underlying language hierarchy are violated (ELAN/P600 in adults; late positivity effects in infants).

Figure 1 

Example of an experimental stimulus used for the investigation of action semantics and action syntax in the mature brain. In the upper panel of the figure a correct action sequence for the action “to prepare a coffee” is depicted. In the middle of the figure, zoomed views of two individual frames belonging to the sequence show the syntactic (red frame) and semantic (green frame) manipulation of the action sequence. Figure adapted and modified from Maffongelli et al., 2015.

Processing sophisticated structure events: The valuable contribution of a developmental approach to the study of action syntax and first experimental evidence

The perception of complex structure events in language, such as the perception of cue boundaries in speech, seems to be an easy task already at a young age. Indeed, human speech puts together sounds into a linear speech stream to convey complex meanings. Talkers hardly pause between words. Instead, one word usually glides into the other. Infants, similar to adults, use boundary cues (e.g., pause, preboundary pitch-change) to segment the incoming speech into prosodic phrases. For example, 8- to 10-month-olds detect speech units embedded in a continuous speech flow using the speaker’s intonation as a cue (Jusczyk, 1997). Further, considering phonotactics, newborns are sensitive to a universal phonological constraint concerning the internal structure of syllables, revealing that certain syllable structures are preferred to others (e.g., lbif is dispreferred to blif; Gómez et al., 2014). Additionally, taking into account the structural regularities of speech, it has been shown that newborns are able to detect speech structure. For instance, the immediate repetition of auditory syllable sequences, such as ABB as opposed to control sequences such as ABC, causes an increased response to repetition sequences over temporal and left frontal brain areas (Gervain, Macagno, Cogoi, Peña, & Mehler, 2008), suggesting that the newborn brain is sensitive only to adjacent repetitions and that in general it is able to detect structural regularities. Moreover, 3-month-olds already manifest adaptation to regular temporal sequences and react differently when regularities are violated in an auditory local-global violation paradigm (Basirat, Dehaene, & Dehaene-Lambertz, 2014). Taking into account the ability of young children to retain sequential order information of words in sentences, 2-month-olds detect changes in word order when these are embedded within a meaningful prosodic structure as compared to the detection of sentential fragments (Mandel, Nelson, & Jusczyk, 1996). 
Together, previous research indicates that sequential information promotes memories for speech, and that the infant brain detects violations pertaining to the hierarchical structure underlying speech, suggesting that starting very early on in development infants process linguistic content and already show first manifestation of hierarchical processing.

As far as language production is concerned, around 24 months of age, children start to combine words into multiword sentences (Lieven, Behrens, Speares, & Tomasello, 2003). Only from the age of 3 do children begin to form syntactically correct short sentences (Silva-Pereyra, Rivera-Gaxiola, & Kuhl, 2005). The understanding of passive or non-canonical object-first word order constructions appears to be difficult for children until at least 4 years of age (e.g., Schipke, Knoll, Friederici, & Oberecker, 2012). Two years later they correctly understand embedded relative clauses (de Villiers, Tager Flusberg, Hakuta, & Cohen, 1979) and can produce embedded structures (e.g., Clahsen & Hansen, 2012).

With this in mind, we shift the focus to the development of action processing and its hierarchical characteristics. The scientific evidence resulting from research with adults (Maffongelli et al., 2015) is, in our view, not sufficient to demonstrate specialized brain mechanisms controlling action syntax processing. Adults are highly language proficient and brain activity during action syntax violation tasks may suffer from a fundamental confound: when observing particular familiar actions, adults might represent observed actions in linguistic terms (e.g., using silent verbalization/implicit descriptions). The observed neural connection between language and action might thus be independent of a shared action network (Perani et al., 1999). The question of whether language and action develop separately or not is a long-standing debate based on two contrasting views. On the one hand, the step-wise view of development suggests an interrelation between perception, cognition and language (e.g., Barsalou, 2008; Bruner, 1964). On the other hand, a parallel development of cognition and language has been proposed (e.g., Mandler, 1988). Both views stand in contrast to the Chomskyan perspective, suggesting that language is innate and that it does not relate to any other representation (Chomsky, 2006). Recently, neuroconstructivist theories have suggested a bidirectional interaction among neural structures, functions, genes and the environment as the basis of cognitive development (Johnson & De Haan, 2015; Westermann et al., 2007). Here, the specialisation of neural structures is taken to be driven by experience, understood as both the external/environmental inputs and the bidirectional interactions between the different levels of analysis (Westermann et al., 2007).

Accordingly, in adults, language and action are so intertwined that unveiling the specific impact of each one of them in terms of brain mechanisms is challenging. Developmental research, conducted at an age when both systems start to become functional, is essential to increase our knowledge on the language-action interrelation. In this way, it will be possible to clarify whether brain mechanisms involved in processing language and action syntax are from the beginning distinct, already share core neural functions, or become related/separated with increasing age and experience. We suggest that taking a developmental perspective can provide insights into the nature of the interrelation of mechanisms by looking at the relation between changes in brain structure/function and cognition and behaviour (Crone, Poldrack, & Durston, 2010), at least when considering a system in which language production is not yet at work. Indeed, considering children who do not yet produce verbs (i.e. action words), provides the unique opportunity to study the characteristics of action processing in isolation, that is, before being influenced by the productive language, which will lead to reorganization of conceptual knowledge (Göksun, Hirsh-Pasek, & Michnick Golinkoff, 2010).

Compared to language studies in infancy, the question of how syntactical regularities underlying the achievement of action goals are processed early in life has been insufficiently considered. A substantial amount of previous research focused on children’s processing of action semantics. These studies showed that observing action sequences in which the action outcome was unexpected (e.g., a cup was brought to the ear) elicited a greater EEG mu-desynchronization (as an indicator of motor activation; Marshall & Meltzoff, 2011) in 12-month-olds when compared to an expected action end state (e.g., a cup was brought to the mouth; Stapel, Hunnius, van Elk, & Bekkering, 2010). Similarly, 6-month-olds anticipated an unexpected action outcome less frequently than an expected outcome (Hunnius & Bekkering, 2010), showing a stronger expectancy-violation response (indicated by a larger EEG N400 amplitude; Kaduk et al., 2016; Reid et al., 2009).

Around their first birthday, infants are able to predict one single subsequent action step (Paulus, 2011). One year later, they understand that different action steps are interconnected (Paulus, Hunnius, & Bekkering, 2011). The ability to plan and coordinate joint actions leading to a common goal develops between 3 and 6 years of age (Paulus, 2016; Warneken, Steinwender, Hamann, & Tomasello, 2014). At the same age children are able to ensure that goal-directed movements end in a way that avoids uncomfortable postures (Knudsen, Henning, Wunsch, Weigelt, & Aschersleben, 2012), and they can structure action steps to follow a hierarchy of rules (Freier, Cooper, & Mareschal, 2017). Further, tool-use and tool-making abilities also develop in the same age range. Altogether, these findings suggest that this ability relies on hierarchical action control (Gönül, Takmaz, Hohenberger, & Corballis, 2018).

Few studies have investigated infants’ responses to the interruption of an on-going intentional action. Infants can already detect that an action is stopped before its goal has been reached. Eight-month-olds showed stronger EEG gamma activity over left-frontal regions during the observation of incomplete vs. complete actions (Reid, Csibra, Belsky, & Johnson, 2007). In behavioural studies, 6-month-olds inferred the goal of an uncompleted reaching action (i.e. the reaching movement was stopped between the starting position and the position of the object to be grasped; Daum, Prinz, & Aschersleben, 2008), and 10- and 11-month-olds reacted with increased attention when actions were paused within action sequences compared to being paused at intentional action boundaries (Baldwin, Baird, Saylor, & Clark, 2001). This suggests that infants, like adults (Zacks, Tversky, & Iyer, 2001), represent action goals, extract relevant information based on recurrent regularities, and rely on syntactical properties of perceived actions to categorize and make sense of others’ actions (Baldwin et al., 2001; Saylor, Baldwin, Baird, & LaBounty, 2007).

Moreover, imitation studies showed that children, starting early on in development, tend to imitate hierarchically executed action sequences more than action sequences based on a simple juxtaposition of action events (Whiten, Flynn, Brown, & Lee, 2006). At test, they organize new action sequences following the hierarchical rules applied in the previously observed action sequences performed by an experimenter. They therefore transfer the knowledge of hierarchical planning of action. This further supports the hypothesis that children are sensitive to hierarchical structure (Bauer, Hertsgaard, Dropik, & Daly, 1998; Want & Harris, 2001) and that the hierarchical organization of action may be important for action understanding.

Recently, following the paradigm introduced by Maffongelli and colleagues (2015), we investigated the neural processing of structural violations of observed actions in infants at 6–7 months (Maffongelli et al., 2018). We presented sequences of familiar goal-directed actions either in the correct temporal order (control condition) or we inverted the order of two temporally adjacent steps of the sequence (violation condition) (see Figure 2). Importantly, for each action, one particular hierarchical order was necessary; the action goal could only be achieved if the order of the elements constituting the sequence was preserved. By changing the order of the individual steps of the actions, we violated the action structure. Indeed, in the incorrect position, the structure of the action sequence and the respective within-sequence subgoals and their related expectedness were explicitly violated. The processing of such a violation resulted in bilateral frontal positivity effects in the EEG. The positivity effects might reflect – in analogy to language studies (e.g., Friederici, 2004) – the reanalysis of the processed sequences and the structural reintegration of the inverted frame. This study adds a crucial element to the comprehension of general syntactic regularities and their violation from an ontogenetic perspective, and suggests that infants capture structural regularities.
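The construction of the violation condition, in which two temporally adjacent steps of an otherwise correct sequence are inverted, can be sketched in a few lines of Python. The step labels and the position of the inversion below are hypothetical placeholders; they are not the actual stimuli or frame positions used in Maffongelli et al. (2018).

```python
def make_violation(steps, index):
    """Return a copy of `steps` with the steps at `index` and `index + 1` swapped.

    The control sequence is left untouched; only one pair of temporally
    adjacent steps is inverted in the returned violation sequence.
    """
    if not 0 <= index < len(steps) - 1:
        raise ValueError("index must point to a step that has a successor")
    violated = list(steps)
    violated[index], violated[index + 1] = violated[index + 1], violated[index]
    return violated


# Hypothetical labels for an eating sequence (control condition).
control = ["context", "grasp spoon", "scoop food", "lift spoon", "spoon to mouth"]

# Violation condition: invert the second and third action steps.
violation = make_violation(control, 1)
print(violation)
# ['context', 'scoop food', 'grasp spoon', 'lift spoon', 'spoon to mouth']
```

Because both conditions contain exactly the same pictures, any difference in the response to the critical frame can be attributed to the order of the steps rather than to their content.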

Figure 2 

Experimental design and action sequence structure. Panel 1a: Example of an action sequence in the control condition (eating action sequence). Following the presentation of the context picture (A), all other pictures were presented. Each picture was presented for 1200 ms. Panel 1b: Example of the same action sequence in the violation condition. The order of the pictures in this condition was equal to the sequence represented in panel 1a except for the two pictures depicted. Pictures highlighted within the dotted black frame show the point in which the inversion of the temporal order of two adjacent pictures occurred (black arrow). Pictures with the red outline and connected by the red arrow represent the critical frames considered in the analysis: F (control) belonging to the control condition and F (violation) belonging to the violation condition. Figure adapted from Maffongelli et al., 2018.

How to further investigate this issue in infants: Proposal for experiments

Taking a developmental approach to investigate the onset of the language-action interrelation goes beyond previously suggested comparative studies in adults. It allows us to explore whether, when, and how brain mechanisms involved in the processing of language and action structure become related. Potential questions to guide future research are: When and how do these cognitive and neural processes become similar to those elicited in adults? How is a manipulation of the action hierarchy processed? Does an increase in the complexity of the action hierarchy elicit different brain signatures as compared to a less complex one?

To tackle the hypothesis that already in infants, and therefore in the absence of productive language skills, it is possible to track the development of the perception of such a complex system, we can benefit from paradigms used in research with adults. Besides the use of a violation paradigm as in Maffongelli and colleagues (2015, 2018), another fruitful approach may be to study how action sequence complexity is processed in the infant brain. Two potential paradigms might be applicable. First, a long-distance dependency rule can be established between action parts: by increasing the dependency length between observed single action parts, we might expect effects correlating with the dependency length, in analogy to relative clause processing in language (Phillips, Kazanina, & Abada, 2005) as well as in the action execution domain (Casado et al., 2017). Second, hierarchical sequences of action elements (A^nB^n) can be compared to adjacent dependency structures (AB)^n. As suggested by studies on natural (Friederici et al., 2006; Tettamanti et al., 2002) and artificial grammar (Bahlmann et al., 2008), the hierarchical embedding should result in an activity increase in frontal brain regions.
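To illustrate how dependency length differs between the two proposed structures, the following Python sketch pairs each B element with the most recent unmatched A element and reports the distance between them. This nested pairing is an assumption made for illustration, as are the labels "A" and "B" and the function name; it is not a description of the stimuli or analyses in the cited studies.

```python
def dependency_lengths(sequence):
    """Return the distance between each B and the A it is paired with.

    Pairing is nested: every B closes the most recent A that is still
    open, so distances are listed in the order in which B elements occur.
    """
    open_positions = []  # positions of A elements waiting for their B
    lengths = []
    for position, element in enumerate(sequence):
        if element == "A":
            open_positions.append(position)
        elif element == "B":
            if not open_positions:
                raise ValueError("B element without a preceding A")
            lengths.append(position - open_positions.pop())
        else:
            raise ValueError(f"unexpected element: {element!r}")
    if open_positions:
        raise ValueError("A element without a matching B")
    return lengths


hierarchical = ["A", "A", "A", "B", "B", "B"]   # center-embedded, n = 3
adjacent = ["A", "B", "A", "B", "A", "B"]       # adjacent dependency, n = 3

print(dependency_lengths(hierarchical))  # [1, 3, 5]
print(dependency_lengths(adjacent))      # [1, 1, 1]
```

In the adjacent structure every dependency has length 1, whereas in the embedded structure the outermost dependency grows with each additional level, which is the quantity that a dependency-length manipulation would vary.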

A suitable technique to further investigate this issue in infants is electroencephalography (de Haan, 2013). With its precise temporal resolution, it is well suited to investigating perceptual processes. Further, based on research with adults, the neurophysiological signatures of both language and action processing are well defined and described. The comparison with language studies might therefore be more direct, since language-relevant paradigms and language-relevant ERPs can be adapted to and tested in action processing paradigms. Moreover, research on “action mirroring” suggests that cortical motor activity is modulated by action observation (e.g., Hari, 2006; Marshall & Meltzoff, 2011; Rizzolatti & Craighero, 2004). For example, sensorimotor oscillations are modulated during the observation of erroneous or unexpected actions (Meyer, Braukmann, Stapel, Bekkering, & Hunnius, 2016; Stapel et al., 2010). Therefore, particularly in the case of a temporal disruption of action sequences, the same attenuation of oscillatory power is likely to be found, and it might be interpreted as an index of sensorimotor involvement in the processing of structural incongruences in action sequences.

A second suitable means of investigating the ontogenesis of the interrelation between language and action is eye-tracking. In particular, the dilation and constriction of the participant’s pupil during the perception of events is an efficient marker of arousal, cognitive load, and surprise in the observer (Aslin, 2012; Gredebäck, Johnson, & von Hofsten, 2010; Porter, Troscianko, & Gilchrist, 2007). The idea is that action sequence incongruences, introduced either by disrupting elements in the temporal sequence of events or by increasing the complexity of the action, can provide information on action processing mechanisms. For example, if the observed action sequence contains structural incongruences, pupil dilation should differ from when the same action sequence is presented without incongruence.
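
The predicted comparison could be quantified, for instance, as a baseline-corrected change in pupil size per trial. The sketch below assumes that each trial provides a list of pupil diameter samples whose first `n_baseline` samples precede the critical frame; the function names, the sampling layout, and the numbers are invented for illustration and are not data.

```python
from statistics import mean
from typing import List


def baseline_corrected_dilation(trace: List[float], n_baseline: int) -> float:
    """Mean pupil size after the baseline window minus the mean within it."""
    if not 0 < n_baseline < len(trace):
        raise ValueError("n_baseline must split the trace into two non-empty parts")
    baseline = mean(trace[:n_baseline])
    return mean(trace[n_baseline:]) - baseline


def condition_difference(
    violation_trials: List[List[float]],
    control_trials: List[List[float]],
    n_baseline: int,
) -> float:
    """Average dilation in violation trials minus average dilation in control trials."""
    violation = mean(
        [baseline_corrected_dilation(trial, n_baseline) for trial in violation_trials]
    )
    control = mean(
        [baseline_corrected_dilation(trial, n_baseline) for trial in control_trials]
    )
    return violation - control


# Made-up pupil diameters (arbitrary units); two baseline samples per trial.
control_trials = [[3.0, 3.0, 3.0, 3.0], [4.0, 4.0, 4.0, 4.0]]
violation_trials = [[3.0, 3.0, 3.5, 3.5], [4.0, 4.0, 4.5, 4.5]]
difference = condition_difference(violation_trials, control_trials, n_baseline=2)
```

Subtracting a per-trial baseline is one common way to make trials with different resting pupil sizes comparable; a positive `difference` would correspond to larger dilation for sequences containing an incongruence.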


In this paper, we focused on the interrelation between language and action. We highlighted findings of previous research and suggested an extended approach with emphasis on two as yet under-researched aspects: (1) action-syntax processing and (2) an ontogenetic approach to the topic (Caramazza et al., 2014; Meteyard, Cuadrado, Bahrami, & Vigliocco, 2012). This may help to translate the broad knowledge about syntactic aspects of language to the domain of action, where our knowledge about syntax is still very limited. With the current overview and outlook, we suggest that future investigations are essential for comprehensively describing the language-action interrelation and for understanding how action is related to language in ontogeny (Egorova et al., 2016; Leshinskaya & Caramazza, 2016).

Importantly, we do not claim that this relation can be investigated with respect to all facets of language. Language is unique to humans, and some linguistic operations (e.g., active/passive constructions) or linguistic categories (e.g., nouns, verbs) cannot be translated into action terms (Moro, 2014). Another limitation concerns the degrees of freedom problem. Degrees of freedom refers to the number of independent elements of a system, and the problem arises when a complex system needs to be organized to produce a result (for example, the movement of the articulators needed to produce a specific sound) (Magill, 2001). It might be argued that the degrees of freedom problem is much larger for action than for language. Language has arbitrary boundaries (e.g., morphological markers that make sentences acceptable or unacceptable, based on language-specific conventions), which reduce the possible degrees of freedom any sentence can have. In contrast, if movements are considered the lowest unit of action, degrees of freedom derive, at first sight, from what is biomechanically possible rather than from the more arbitrary (i.e., language-specific) rules of syntax and/or morphology.

Given these foundational differences, the questions that should be addressed in comparative studies are: What are the mechanisms of action syntax, and how do those mechanisms relate to the mechanisms of other cognitive domains? We might focus on a more abstract level by avoiding the adaptation of linguistic concepts to action. One could therefore argue that the interrelation between action and language might be better studied from the reverse perspective, that is, by starting from specific action markers and trying to transfer these to language. However, this approach can be taken only after an in-depth investigation of how manipulations of action semantics and action syntax are reflected in the human brain. This would also support the evolutionary proposal that language may have developed from action (e.g., Leroy-Gourhan, 1964; Rizzolatti & Arbib, 1998; Rizzolatti & Craighero, 2004). In sum, the approach discussed in this paper provides an informative starting point for future research on the basic architecture of human cognition.


1. A^nB^n rules involve the processing of elements of the two categories A and B with a hierarchical, complex dependency. In contrast, adjacent-dependency rules, (AB)^n, involve the processing of local transitions between the two categories (A and B).

Funding Information

This work was supported by the Forschungskredit of the University of Zurich, awarded to LM under Grant no. FK-15-077, and by Italian Ministry of University PRIN funds awarded to LF.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

LM, AD, LF, MMD conceived the ideas and wrote the paper. All authors gave final approval for publication.


  1. Amoruso, L., Gelormini, C., Aboitiz, F., Alvarez González, M., Manes, F., Cardona, J. F., & Ibanez, A. (2013). N400 ERPs for actions: Building meaning in context. Frontiers in Human Neuroscience, 7(3), 57.

  2. Aslin, R. N. (2012). Infant eyes: A window on cognitive development. Infancy, 17(1), 126–140.

  3. Bahlmann, J., Schubotz, R. I., & Friederici, A. D. (2008). Hierarchical artificial grammar processing engages Broca’s area. NeuroImage, 42(2), 525–34.

  4. Bahlmann, J., Schubotz, R., Mueller, J. L., Koester, D., & Friederici, A. D. (2009). Neural circuits of hierarchical visuo-spatial sequence processing. Brain Research, 1298, 161–70.

  5. Balconi, M., & Vitaloni, S. (2014). N400 effect when a semantic anomaly is detected in action representation. A source localization analysis. Journal of Clinical Neurophysiology, 31(1), 58–64.

  6. Baldwin, D. A., Baird, J. A., Saylor, M. M., & Clark, M. A. (2001). Infants parse dynamic action. Child Development, 72(3), 708–717.

  7. Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59(1), 617–645.

  8. Basirat, A., Dehaene, S., & Dehaene-Lambertz, G. (2014). A hierarchy of cortical responses to sequence violations in three-month-old infants. Cognition, 132(2), 137–50.

  9. Bauer, P. J., Hertsgaard, L. A., Dropik, P., & Daly, B. P. (1998). When even arbitrary order becomes important: developments in reliable temporal sequencing of arbitrarily ordered events. Memory (Hove, England), 6(2), 165–98.

  10. Behmer, L. P., & Crump, M. J. C. (2017). The dynamic range of response set activation during action sequencing. Journal of Experimental Psychology: Human Perception and Performance, 43(3), 537–554.

  11. Boeckx, C. A., & Fujita, K. (2014). Syntax, action, comparative cognitive science, and Darwinian thinking. Frontiers in Psychology.

  12. Bruner, J. S. (1964). The course of cognitive growth. American Psychologist, 19(1), 1–15.

  13. Caramazza, A., Anzellotti, S., Strnad, L., & Lingnau, A. (2014). Embodied cognition and mirror neurons: A critical assessment. Annual Review of Neuroscience, 37(1), 1–15.

  14. Casado, P., Martín-Loeches, M., León, I., Hernández-Gutiérrez, D., Espuny, J., Muñoz, F., de Vega, M., et al. (2017). When syntax meets action: Brain potential evidence of overlapping between language and motor sequencing. Cortex.

  15. Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.

  16. Chomsky, N. (2006). Language and mind. Cambridge University Press.

  17. Christiansen, M. H., Conway, C. M., & Onnis, L. (2012). Similar neural correlates for language and sequential learning: Evidence from event-related brain potentials. Language and Cognitive Processes, 27(2), 231–256.

  18. Clahsen, H., & Hansen, D. (2012). Profiling Linguistic Disability in German-Speaking Children. In P. Ball, M. Crystal, & D. Fletcher (Eds.), Assessing grammar: The languages of LARSP (pp. 77–91). Bristol: Multilingual Matters.

  19. Clerget, E., Winderickx, A., Fadiga, L., & Olivier, E. (2009). Role of Broca’s area in encoding sequential human actions: a virtual lesion study. Neuroreport, 20(16), 1496–9.

  20. Comrie, B. (1989). Language universals and linguistic typology. Cambridge, MA: MIT Press.

  21. Crone, E. A., Poldrack, R. A., & Durston, S. (2010). Challenges and methods in developmental neuroimaging. Human Brain Mapping, 31(6), 835–837.

  22. Daum, M. M., Prinz, W., & Aschersleben, G. (2008). Encoding the goal of an object-directed but uncompleted reaching action in 6- and 9-month-old infants. Developmental Science, 11(4), 607–619.

  23. de Haan, M. (2013). Infant EEG and event-related potentials. Hove & New York: Taylor & Francis Group.

  24. de Villiers, J. G., Tager Flusberg, H. B., Hakuta, K., & Cohen, M. (1979). Children’s comprehension of relative clauses. Journal of Psycholinguistic Research, 8(5), 499–518.

  25. Dominey, P. F., Hoen, M., Blanc, J.-M., & Lelekov-Boissard, T. (2003). Neurological basis of language and sequential cognition: Evidence from simulation, aphasia, and ERP studies. Brain and Language, 86(2), 207–225.

  26. Egorova, N., Shtyrov, Y., & Pulvermüller, F. (2016). Brain basis of communicative actions in language. NeuroImage, 125, 857–867.

  27. Engel, A. K., Maye, A., Kurthen, M., & König, P. (2013). Where’s the action? The pragmatic turn in cognitive science. Trends in Cognitive Sciences, 17(5), 202–209.

  28. Fazio, P., Cantagallo, A., Craighero, L., D’Ausilio, A., Roy, A. C., Pozzo, T., Fadiga, L., et al. (2009). Encoding of human action in Broca’s area. Brain: A Journal of Neurology, 132(Pt 7), 1980–8.

  29. Frank, S. L., Bod, R., & Christiansen, M. H. (2012). How hierarchical is language use? Proceedings. Biological Sciences, 279(1747), 4522–31.

  30. Freier, L., Cooper, R. P., & Mareschal, D. (2017). Preschool children’s control of action outcomes. Developmental Science, 20(2), e12354.

  31. Friederici, A., Bahlmann, J. R., Heim, S., Schubotz, R. I., & Anwander, A. (2006). The brain differentiates human and non-human grammars: Functional localization and structural connectivity. Proceedings of the National Academy of Sciences, 103(7), 2458–2463.

  32. Friederici, A. D. (2004). Event-related brain potential studies in language. Current Neurology and Neuroscience Reports, 4(6), 466–470.

  33. Gervain, J., Macagno, F., Cogoi, S., Peña, M., & Mehler, J. (2008). The neonate brain detects speech structure. Proceedings of the National Academy of Sciences of the United States of America, 105(37), 14222–7.

  34. Glenberg, A. M., & Gallese, V. (2012). Action-based language: A theory of language acquisition, comprehension, and production. Cortex, 48(7), 905–922.

  35. Göksun, T., Hirsh-Pasek, K., & Michnick Golinkoff, R. (2010). Trading Spaces: Carving up Events for Learning Language. Perspectives on Psychological Science, 5(1), 33–42.

  36. Goldberg, A. E. (2006). Constructions at work: the nature of generalization in language. Oxford University Press.

  37. Gómez, D. M., Berent, I., Benavides-Varela, S., Bion, R. A. H., Cattarossi, L., Nespor, M., & Mehler, J. (2014). Language universals at birth. Proceedings of the National Academy of Sciences of the United States of America, 111(16), 5837–41.

  38. Gönül, G., Takmaz, E. K., Hohenberger, A., & Corballis, M. (2018). The cognitive ontogeny of tool making in children: The role of inhibition and hierarchical structuring. Journal of Experimental Child Psychology, 173, 222–238.

  39. Gredebäck, G., Johnson, S., & von Hofsten, C. (2010). Eye tracking in infancy research. Developmental Neuropsychology, 35(1), 1–19.

  40. Greenberg, J. H. (1963). Some universals of grammar with particular reference to the order of meaningful elements. In J. H. Greenberg (Ed.), Universals of language. Cambridge, MA: MIT Press.

  41. Hari, R. (2006). Action-perception connection and the cortical mu rhythm. Progress in Brain Research, 159, 253–60.

  42. Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: what is it, who has it, and how did it evolve? Science (New York, N.Y.), 298(5598), 1569–79.

  43. Hunnius, S., & Bekkering, H. (2010). The early development of object knowledge: A study of infants’ visual anticipations during action observation. Developmental Psychology, 46(2), 446–454.

  44. Jeannerod, M. (2001). Neural simulation of action: A unifying mechanism for motor cognition. NeuroImage, 14(1), S103–S109.

  45. Johnson, M. H., & de Haan, M. (2015). Developmental cognitive neuroscience: an introduction.

  46. Jusczyk, P. W. (1997). Finding and Remembering Words. Current Directions in Psychological Science, 6(6), 170–174.

  47. Kaduk, K., Bakker, M., Juvrud, J., Gredebäck, G., Westermann, G., Lunn, J., & Reid, V. M. (2016). Semantic processing of actions at 9 months is linked to language proficiency at 9 and 18 months. Journal of Experimental Child Psychology, 151, 96–108.

  48. Kim, A., & Osterhout, L. (2005). The independence of combinatory semantic processing: Evidence from event-related potentials. Journal of Memory and Language, 52(2), 205–225.

  49. Knudsen, B., Henning, A., Wunsch, K., Weigelt, M., & Aschersleben, G. (2012). The End-State Comfort Effect in 3- to 8-Year-Old Children in Two Object Manipulation Tasks. Frontiers in Psychology, 3, 445.

  50. Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621–647.

  51. Kutas, M., & Hillyard, S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307(5947), 161–163.

  52. Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior: The Hixon Symposium (pp. 112–147). New York: Wiley.

  53. Leroy-Gourhan, A. (1964). Le Geste et la Parole. Paris.

  54. Leshinskaya, A., & Caramazza, A. (2016). For a cognitive neuroscience of concepts: Moving beyond the grounding issue. Psychonomic Bulletin & Review, 23(4), 991–1001.

  55. Lieven, E., Behrens, H., Speares, J., & Tomasello, M. (2003). Early syntactic creativity: a usage-based approach. Journal of Child Language, 30(2), 333–70.

  56. Maffongelli, L., Antognini, K., & Daum, M. M. (2018). Syntactical regularities of action sequences in the infant brain: When structure matters. Developmental Science, (in press).

  57. Maffongelli, L., Bartoli, E., Sammler, D., Kölsch, S., Campus, C., Olivier, E., D’Ausilio, A., et al. (2015). Distinct brain signatures of content and structure violation during action observation. Neuropsychologia, 75.

  58. Magill, R. A. (2001). Augmented feedback in motor skill acquisition. In C. M. Singer, N. R. Hausenbas, & H. A. Janelle (Eds.). New York: John Wiley and Sons.

  59. Mandel, D. R., Nelson, D. G. K., & Jusczyk, P. W. (1996). Infants remember the order of words in a spoken sentence. Cognitive Development, 11(2), 181–196.

  60. Mandler, J. M. (1988). How to build a baby: On the development of an accessible representational system. Cognitive Development, 3(2), 113–136.

  61. Marshall, P. J., & Meltzoff, A. N. (2011). Neural mirroring systems: Exploring the EEG mu rhythm in human infancy. Developmental Cognitive Neuroscience, 1(2), 110–123.

  62. Martins, M. D., & Fitch, W. T. (2015). Do we represent intentional action as recursively embedded? The answer must be empirical. A comment on Vicari and Adenzato (2014). Consciousness and Cognition, 38, 16–21.

  63. Meteyard, L., Cuadrado, S. R., Bahrami, B., & Vigliocco, G. (2012). Coming of age: A review of embodiment and the neuroscience of semantics. Cortex, 48(7), 788–804.

  64. Meyer, M., Braukmann, R., Stapel, J. C., Bekkering, H., & Hunnius, S. (2016). Monitoring others’ errors: The role of the motor system in early childhood and adulthood. British Journal of Developmental Psychology, 34(1), 66–85.

  65. Moro, A. (2014). Response to Pulvermüller: the syntax of actions and other metaphors. Trends in Cognitive Sciences, 18(5), 221.

  66. Ní Choisdealbha, Á., & Reid, V. (2014). The developmental cognitive neuroscience of action: semantics, motor resonance and social processing. Experimental Brain Research, 232(6), 1585–1597.

  67. Pastra, K., & Aloimonos, Y. (2012). The minimalist grammar of action. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1585), 103–117.

  68. Paulus, M. (2011). How infants relate looker and object: evidence for a perceptual learning account of gaze following in infancy. Developmental Science, 14(6), 1301–1310.

  69. Paulus, M. (2016). The development of action planning in a joint action context. Developmental Psychology, 52(7), 1052–1063.

  70. Paulus, M., Hunnius, S., & Bekkering, H. (2011). Can 14- to 20-month-old children learn that a tool serves multiple purposes? A developmental study on children’s action goal prediction. Vision Research, 51(8), 955–960.

  71. Pazzaglia, M., Smania, N., Corato, E., & Aglioti, S. M. (2008). Neural underpinnings of gesture discrimination in patients with limb apraxia. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 28(12), 3030–41.

  72. Perani, D., Cappa, S. F., Schnur, T., Tettamanti, M., Collina, S., Rosa, M. M., & Fazio, F. (1999). The neural correlates of verb and noun processing: A PET study. Brain, 122(12), 2337–2344.

  73. Pezzulo, G., Donnarumma, F., Dindo, H., D’Ausilio, A., Konvalinka, I., & Castelfranchi, C. (2018). The body talks: Sensorimotor communication and its brain and kinematic signatures. Physics of Life Reviews.

  74. Phillips, C., Kazanina, N., & Abada, S. H. (2005). ERP effects of the processing of syntactic long-distance dependencies. Cognitive Brain Research, 22(3), 407–428.

  75. Porter, G., Troscianko, T., & Gilchrist, I. D. (2007). Effort during visual search and counting: Insights from pupillometry. The Quarterly Journal of Experimental Psychology, 60(2), 211–229.

  76. Pulvermüller, F. (2014). The syntax of action. Trends in Cognitive Sciences, 18(5), 219–20.

  77. Pulvermüller, F., & Fadiga, L. (2010). Active perception: Sensorimotor circuits as a cortical basis for language. Nature Reviews. Neuroscience, 11(5), 351–60.

  78. Reid, V. M., Csibra, G., Belsky, J., & Johnson, M. H. (2007). Neural correlates of the perception of goal-directed action in infants. Acta Psychologica, 124(1), 129–138.

  79. Reid, V. M., Hoehl, S., Grigutsch, M., Groendahl, A., Parise, E., & Striano, T. (2009). The neural correlates of infant and adult goal prediction: Evidence for semantic processing systems. Developmental Psychology, 45(3), 620–629.

  80. Reid, V. M., & Striano, T. (2008). N400 involvement in the processing of action sequences. Neuroscience Letters, 433(2), 93–97.

  81. Repetto, C., Colombo, B., & Riva, G. (2012). The link between action and language: recent findings and future perspectives. Biolinguistics, 6(3–4), 462–474.

  82. Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21(5), 188–94.

  83. Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–92.

  84. Saylor, M. M., Baldwin, D. A., Baird, J. A., & LaBounty, J. (2007). Infants’ on-line segmentation of dynamic human action. Journal of Cognition and Development, 8(1), 113–128.

  85. Schipke, C. S., Knoll, L. J., Friederici, A., & Oberecker, R. (2012). Preschool children’s interpretation of object-initial sentences: Neural correlates of their behavioral performance. Developmental Science, 15(6), 762–774.

  86. Sirigu, A., Cohen, L., Zalla, T., Eeckhout, P., Van Grafman, J., Agid, Y., Neuroscience, C., et al. (1998). Distinct frontal regions for processing sentence syntax and story grammar. Cortex, 1, 771–778.

  87. Stapel, J. C., Hunnius, S., van Elk, M., & Bekkering, H. (2010). Motor activation during observation of unusual versus ordinary actions in infancy. Social Neuroscience, 5(5–6), 451–460.

  88. Tabor, W., Galantucci, B., & Richardson, D. (2004). Effects of merely local syntactic coherence on sentence processing. Journal of Memory and Language, 50(4), 355–370.

  89. Tettamanti, M., Alkadhi, H., Moro, A., Perani, D., Kollias, S., & Weniger, D. (2002). Neural correlates for the acquisition of natural language syntax. NeuroImage, 17(2), 700–9.

  90. van Schie, H. T., Toni, I., & Bekkering, H. (2006). Comparable mechanisms for action and language: Neural systems behind intentions, goals, and means. Cortex, 42(4), 495–498.

  91. Vicari, G., & Adenzato, M. (2014). Is recursion language-specific? Evidence of recursive mechanisms in the structure of intentional action. Consciousness and Cognition, 26, 169–88.

  92. Want, S. C., & Harris, P. L. (2001). Learning from other people’s mistakes: causal understanding in learning to use a tool. Child Development, 72(2), 431–43.

  93. Warneken, F., Steinwender, J., Hamann, K., & Tomasello, M. (2014). Young children’s planning in a collaborative problem-solving task. Cognitive Development, 31, 48–58.

  94. Westermann, G., Mareschal, D., Johnson, M. H., Sirois, S., Spratling, M. W., & Thomas, M. S. C. (2007). Neuroconstructivism. Developmental Science, 10(1), 75–83.

  95. Whiten, A., Flynn, E., Brown, K., & Lee, T. (2006). Imitation of hierarchical action structure by young children. Developmental Science, 9(6), 574–82.

  96. Zacks, J. M., Tversky, B., & Iyer, G. (2001). Perceiving, remembering, and communicating structure in events. Journal of Experimental Psychology. General, 130(1), 29–58.

Peer Review Comments

The author(s) of this paper chose the Open Review option, and the peer review comments are available at: