Joint Attention Without Recursive Mindreading: On the Role of Second-Person Engagement

León, Felipe

Published in: Philosophical Psychology

DOI: 10.1080/09515089.2021.1917533

Publication date: 2021

Document version: Peer reviewed version

Citation for published version (APA): León, F. (2021). Joint Attention Without Recursive Mindreading: On the Role of Second-Person Engagement. Philosophical Psychology, 34(4), 550-580. https://doi.org/10.1080/09515089.2021.1917533

Download date: 23. sep. 2021

Felipe León1

1) Center for Subjectivity Research, Department of Communication, University of Copenhagen, Karen Blixens Plads 8, DK-2300 Copenhagen, Denmark. E-mail: [email protected]

Abstract: On a widely held characterization, triadic joint attention is the capacity to perceptually attend to an object or event together with another subject. In the last four decades, research in developmental psychology has provided increasing evidence of the crucial role that this capacity plays in socio-cognitive development, early language acquisition, and the development of perspective-taking. Yet, there is a striking discrepancy between the general agreement that joint attention is critical in various domains, and the lack of theoretical consensus on how to account for it. This paper pursues three interrelated aims: (i) it examines the contrast between reductive and non-reductive views of (triadic) joint attention, by bringing into focus the notion of recursive mindreading; (ii) it assembles, advances, and discusses a number of arguments against reductive views; (iii) finally, in dialogue with some prominent non-reductive views, it concludes by outlining the case for a non-reductive view that gives pride of place to the idea that co-attenders relate to one another as a ‘you’.

Keywords: joint attention; recursive mindreading; second-person engagement; phenomenology; developmental psychology

1. Introduction

On a widely held characterization, triadic joint attention is the capacity to perceptually attend to an object or event together with another subject (Carpenter, Nagell, Tomasello, Butterworth, & Moore, 1998; Hobson, 2005; Tomasello, 2014; Campbell, 2005; Peacocke, 2005). In the last four decades, psychological research has provided mounting evidence of the critical role that joint attention plays in a variety of domains, including socio-cognitive development (Trevarthen & Hubley, 1978), early language acquisition (Dunham et al., 1993; Tomasello & Farrar, 1986), and the development of perspective-taking (Moll & Tomasello, 2007; Moll & Meltzoff, 2011a, 2011b). Joint attention has also been given center stage in research on autism spectrum condition (Loveland & Landry, 1986; R. P. Hobson, 2002; Mundy, 2016), in comparative and evolutionary psychology (Carpenter & Call, 2013; Tomasello, 2014), in joint action research (Böckler & Sebanz, 2013), and, more recently, in the literature on collective intentionality (Rakoczy, 2018; León et al., 2019). Yet there is a striking contrast between the widespread agreement that joint attention is critical in a variety of domains and the lack of consensus on how to account for it. Although properly accounting for joint attention is of obvious relevance for better understanding its nature and significance, and has the potential to guide empirical research on it, there has been over the years a pronounced mismatch between the massive amount of empirical research on joint attention and the relatively scarce theoretical engagement with this topic. Some of the key and still unresolved questions in the domain of joint attention research are: What is the relationship between individual and joint attention? Can the latter be analyzed reductively in terms of the former? What kind of understanding of other minds does joint attention involve?

This paper has three main goals. First, it aims to clarify the theoretical landscape of research on joint attention by charting and discussing two broad families of approaches to it: reductive and non-reductive views. Second, the paper gathers, advances, and discusses a number of arguments against reductionism about joint attention, by bringing into focus the notion of recursive mindreading. I suggest that, taken together, these arguments provide a compelling case in favor of non-reductionism. Finally, the paper explores the prospects of non-reductionism about joint attention.
In critical dialogue with two prominent non-reductive views (Campbell, 2005, 2011, 2018; Eilan, ms), and focusing on the emergence of joint attention in ontogeny, I outline the case for a non-reductive view of joint attention in which the idea that co-attenders relate to one another as a ‘you’ —rather than as a ‘he’ or ‘she’— plays an important role. To anticipate, the overarching line of argument pursued in the paper has the structure of an argument by elimination, which can be summarized as follows: if joint attention is to be accounted for either reductively or non-reductively, and if there are convincing and general arguments against reductionism, such arguments will also speak in favor of non-reductionism. The follow-up question of how to develop a compelling non-reductive view of joint attention will be addressed in the final sections of the paper.

In order to set the stage for the discussion, it will be helpful to cover some preliminary issues, by saying something more about (i) what precisely is meant by the expression ‘joint attention’, (ii) how to frame an investigation about it, and (iii) which aspect(s) of joint attention will be under focus in what follows. I address each of these in turn.

First, the capacity to attend to a common perceptual target together with another person can be best illustrated by considering two types of paradigmatic situations in which it may arise (Carpenter & Liebal, 2011). In the first type of situation, one subject actively draws the attention of another towards a target of attention. Suppose, for example, that a person standing next to you establishes eye contact with you, and then points to the moon (there is a full moon in the sky). You follow the pointing gesture, make eye contact again, and as a result of this interaction you are both aware of attending to the moon together. In the second kind of situation, a salient environmental stimulus attracts the attention of the involved subjects without any intervening goal-directed behavior on their part to attend together to the target. For example, you are sitting at a meeting, and a loud fire alarm unexpectedly goes off (for a similar example, see Eilan, ms, p. 14). You and the person sitting across from you make eye contact, and you both become aware of attending to the fire alarm together. The first example illustrates what has been called “top-down” joint attention, whereas the second case exemplifies “bottom-up” joint attention (Tomasello, 2008; Carpenter & Liebal, 2011, pp. 170–171; Kaplan & Hafner, 2006).1 Although the two cases differ in how joint attention is generated, both situations involve two subjects who are aware of attending together to a relevant object or event that becomes a common target of their attention.2

A second preliminary issue concerns how to frame an investigation into joint attention.
One challenge here is that the phenomenon of joint attention lies at the intersection of a number of topics that, in and of themselves, have been extensively investigated in philosophy and psychology. Joint perceptual attention concerns our perceptual contact with the world, and our knowledge of other minds, not to mention the status of joint attention as a possible instantiation of collective or we-intentionality.3 Consider that the phenomenon that most researchers refer to when talking about joint attention could also be characterized as ‘joint perception’ or ‘joint perceptual experience’. Yet, framing the investigation of joint attention as a sub-topic within the philosophy and psychology of perception immediately raises the question of what theory of perception should underlie the analysis of joint attention. While there is no denying the relevance of this question, I suggest that progress in joint attention research should not primarily depend on progress in ‘the problem of perception’ (Crane & French, 2017).

A similar difficulty arises if one chooses to frame the investigation of joint attention from the outset as a sub-topic within the field of collective intentionality, in accordance with the proposal that joint attention is “a primordial form of perceptual we-intentionality” (Rakoczy, 2018, p. 410; cf. Engelmann & Tomasello, 2018). Framing an investigation on joint attention from the outset as a problem within the philosophy of perception or the philosophy of collective intentionality makes the prospects for making progress in understanding joint attention quite bleak. The reason is that each one of these ways of framing the investigation relies on highly contested notions and open questions. I suggest that a more natural and promising strategy for circumventing these difficulties is to focus on the context in which the very notion of joint attention emerged and was coined as a distinct research topic, i.e. developmental psychology. In fact, I think that one of the reasons why joint attention has proven to be particularly challenging for philosophical analysis is that it was not originally coined as a philosophical notion. At the same time, however, it raises some fundamental philosophical questions. The notion of joint attention emerged in the context of empirical investigations on early language acquisition, and on how infant and caregiver manage joint reference in pre-speech communication (Bruner, 1974; Scaife & Bruner, 1975; Bruner, 1977). In the context of developmental psychology, joint attention is generally understood as a triadic perceptual relation in which infant and adult are attending to an object, by means of gaze coordination and alternation, and in which their attending to the object is ‘out in the open’ or ‘mutually manifest’ between them, i.e. they are both aware of attending jointly or together to the relevant object.4 Thus characterized, joint attention differs from two individuals looking at the same thing at the same time (parallel attention). It is also different from one individual’s following another’s gaze to discover what the second is attending to (gaze following), and it differs from social referencing, in which one individual tracks the affective reactions of another to a target of attention (Walden & Ogan, 1988). Rather, what singles out joint attention is the ‘openness’ or ‘mutual manifestation’ of the common object of attention. This characterization has been widely influential in both philosophy and psychology (Eilan, 2005, p. 1; Campbell, 2011, p. 417, 2018; Peacocke, 2005, p. 303; Carpenter & Liebal, 2011; Moll & Meltzoff, 2011b, p. 290; Schilbach, 2015, p. 132; Eilan, ms), and it is the one that will be under focus in the rest of this paper.5

The third preliminary issue is whether the notions of ‘openness’ and ‘mutual manifestation’ can serve the purpose of providing a sufficiently firm grasp on a target explanandum that a theory of joint attention can be concerned with. Going back to the examples mentioned above, in both top-down and bottom-up joint attention there is a ‘mutual manifestation’ or ‘openness’ of the target of attention that makes these cases quite different from cases of parallel attention and gaze following. But the ‘openness’ of joint attention has proven to be notoriously difficult to conceptualize. As Rakoczy writes:

It is not sufficient that each of them [i.e. the co-attenders] looks at the same target, nor that, asymmetrically, one sees the other looking somewhere and follows her gaze to the same target. It is not even sufficient, more symmetrically, that each looks at the same target while knowing that the other does so as well […] Rather, in some intuitive sense that conceptually proves notoriously difficult to spell out, both have to attend to the same target in joint and coordinated ways. (Rakoczy, 2018, p. 409)6

It is significant, however, that in spite of the difficulties in conceptualizing joint attention, different theorists, including those of very different persuasions, do after all appear to have a common grasp of what the openness of joint attention is about. I don’t think this should come as a surprise. It is part of the pre-theoretical understanding of the world of perception that objects and events in one’s surroundings have a public character, and allow for being attended to together with others. For the purpose of this paper, the following cautionary remark will be important. It can be easily assumed that the only way in which the notion of perceptual openness can be made theoretically tractable is in terms of ‘common knowledge’ (or some cognate notion), construed along the seminal contributions of Schiffer (1972) and Lewis (2002; see also Vanderschraaf & Sillari, 2014). Understood in terms of common knowledge, the perceptual openness of a target (X) of triadic joint attention, for subjects A and B, can be spelled out as follows: ‘A perceives X, B perceives X, A knows that B perceives X, B knows that A perceives X, A knows that B knows that A perceives X, B knows that A knows that B perceives X’, and so forth (cf. Schiffer, 1972). One immediate reaction to this analysis is that it can’t be applied to joint attention as previously characterized, because an analysis like Schiffer’s appears to lead to an infinite regress of mental states, whereas a common target of attention is intuitively there for the co-attenders who are jointly aware of it. Although Schiffer’s topic was not joint attention in particular, he didn’t take the possibility of such an infinite regress to be a major problem in his analysis of mutual knowledge. According to him, having a disposition to believe the infinite chain of epistemic states is sufficient to block the regress (see Wilby, 2010). However, this is not an option for a ‘rich’ understanding of joint attention, according to which the openness of joint attention is an experiential, and not a dispositional phenomenon. As Peacocke remarks, that openness “is not merely something which exists: it also seems to be present to the consciousness of the participants” (Peacocke, 2005, p. 301).7

But quite independently of the regress issue, one might say that an analysis along Schiffer’s lines does after all capture something crucial about joint attention, an element that any promising theory of joint attention would have to retain. This is the idea that joint attentional interactions constitutively involve nested psychological states that each co-attender would have about each other’s states. The capacity to attribute mental states to another subject which are themselves about some of the attributor’s mental states is the hallmark of recursive mindreading. The proposal that a compelling account of openness has to retain the feature of recursive mindreading can be found in Michael Tomasello’s ground-breaking and influential work on joint attention. As he writes,

[t]he basic cognitive skill of shared intentionality is recursive mindreading. When employed in certain social interactions, it generates joint goals and joint attention, which provide the common conceptual ground within which human communication most naturally occurs. (Tomasello, 2008, p. 321).

Leaving aside the scope of applicability of recursive mindreading beyond joint attention, it is remarkable that elsewhere in his work Tomasello concedes that there is a prominent lacuna in how to characterize recursive mindreading:

No one is certain how best to characterize this potentially infinite loop of me monitoring the other, who is monitoring my monitoring of her, and so forth (called recursive mindreading by Tomasello, 2008), but it seems to be part of infants’ experience—in some nascent form—from before the first birthday (Tomasello, 2011, pp. 34–35, see also 2009, p. 69).

I suggest that this lacuna points to a real difficulty, one that shouldn’t be overlooked, and that should motivate a cautious examination of the putative role of recursive mindreading in joint attention. In particular, for the purposes of this paper, understanding the richness or openness of joint attention from the beginning in terms of recursive mindreading is not going to provide us with a theory-neutral characterization of joint attention, one that will allow us to contrast and assess different theoretical approaches to it. In fact, the general characterization that I started with in this section, and the examples that I have mentioned, do not require for their intelligibility any appeal to the notion of recursive mindreading. They are comprehensible if one has a grip on the idea that co-attenders are aware of attending together to the same thing or event. In the rest of this paper, I will assume that we have an intuitive and sufficiently firm grasp on this notion of openness as a characterization of a core component of paradigmatic cases of joint attention. Such a notion of openness is neutral between different theoretical approaches to joint attention, in particular, between reductive and non-reductive views of it.

2. Two Families of Approaches to Joint Attention: Reductive vs. Non-reductive Views

According to reductive views, the perceptual openness of triadic joint attention can be accounted for in terms of suitably interrelated individualistic states and properties. Non-reductive views deny this claim. Representative theories of both types of views can be found in philosophy and psychology. Although Schiffer does not discuss joint attention as such, his discussion of an example of seeing a candle light together (1972, p. 31), which he analyses in terms of “mutual knowledge”, anticipates reductionism about joint attention. Peacocke (2005) offers a detailed reductive view, based on the notion of “mutual open-ended perceptual availability”, and Stueber provides a simulation-based reductive account of joint attention (Stueber, 2011). A further example of a reductive view is Tomasello’s, according to which in joint attention “[t]he infant is attending not only to the adult’s attention to the object, but also to the adult’s attention to her attention to the object, and to the adult’s attention to her attention to the adult’s attention to the object, and so on” (Tomasello, 2019, p. 56, see also 1995, 2008, 2014).8

Turning now to non-reductionism, one of its main proponents is John Campbell (2002, 2005, 2011, 2018), who argues that joint attention should be appraised as a primitive three-place relation (see also Seemann, 2011b). Margaret Gilbert suggests a non-reductive analysis of joint attention that appeals to the non-individualistic concept of joint commitment, and proposes that joint attention involves “a joint commitment to attend as a body to some particular in the environment of the parties” (2013, p. 337). Naomi Eilan puts forward the claim that an account of joint attention ought to appeal to a primitive notion of “communication-as-connection” (Eilan, ms). Referencing Searle’s non-reductionist concept of we-intentionality, Fiebich and Gallagher suggest a non-reductionist view of “intentional joint attention”, according to which the latter qualifies as a joint action involving we-intentionality towards a common goal, common knowledge, and cooperative behavior (Fiebich & Gallagher, 2013, p. 581). It is worth noting that non-reductive views need not be committed to the idea that there is anything ‘spooky’ about joint attention. In particular, they are not committed to the idea that joint attention involves any supra-individual conscious entity, nor do they require in any way that co-attenders somehow merge or fuse into a single individual, nor do they deny that joint attention is metaphysically realized in the distinct organisms of the co-attenders.9

While reductionism and non-reductionism are families of views, I propose that it is possible to abstract from the specificities of the different theories, and sketch a general line of argument against reductive views. This depends on identifying a common denominator of (most) reductive views. I submit that this common denominator is the appeal, in one way or another, to recursive mindreading as a necessary precondition for joint attention. In the past decades, there has been a vast amount of research about (first-order) mindreading or theory of mind, understood as an individual’s capacity to ascribe mental states about the world to other individuals (see Premack & Woodruff, 1978; Goldman, 2012). Less has been said about higher-order or recursive mindreading, which concerns the attribution of mental states which are themselves about mental states (either the attributor’s or a third party’s) (Liddle & Nettle, 2006, p. 233). I will characterize recursive mindreading as the capacity to attribute mental states to another subject which are themselves about some other subject’s mental states. In the case at hand, recursive mindreading concerns the capacity to attribute perceptual and attentional states which are themselves about some of the attributor’s own perceptual and attentional states.
Reductive views of joint attention concur in recognizing recursive mindreading as a necessary precondition for the coordination of attention constitutive of joint attention and its distinctive openness. While recursive mindreading plausibly plays an important role in a variety of socio-cognitive phenomena (O’Grady et al., 2015), in the rest of this section I will present four arguments against the necessity of recursive mindreading for the openness of triadic joint attention.

2.1. The Argument from Phenomenality

The argument from phenomenality is an analogue of the “simple phenomenological argument” in debates about direct social perception, concerning the claim that some aspects of other people’s psychological states are experientially accessible in a direct manner (Gallagher, 2007, pp. 65–66; Zahavi, 2008, 2014, p. 170; León, 2013). In a nutshell, the gist of that argument is that if we examine from a first-person perspective the experience of, say, seeing someone writhing in pain, we won’t find consciously performed inferences or simulations in our attribution of pain to the other person. An analogy with that argument is helpful in the present context, because just as the direct perception of some mental states is supposed to be an experiential matter, so too is the openness of joint attention. Now, consider again the example of bottom-up joint attention mentioned in Section 1, the example of hearing the fire alarm together with someone. According to the argument from phenomenality, it is inaccurate to describe the experience of co-attending to the fire alarm as involving consciously performed recursive mindreading. To put it differently, claiming that such forms of interpersonal interaction involve, experientially, recursive mindreading is a misdescription of the phenomenal aspect of that experience. Whatever else is part of the description of that experience, when your eyes lock onto those of the person sitting across from you, it is implausible to claim that the experience you are having is an experience of nested states of awareness. The key observation here is that there is a certain immediacy to the joint attention situation, and that there is a discrepancy between the experiential immediacy of joint attention, and the complexity of recursive mindreading. It is important to be careful here, though.
After all, in debates about direct social perception, some authors have argued that claims concerning the experiential immediacy in the direct perception of mental states ought to be kept clearly separate from the question of how to characterize the sub-personal processes that enable the attribution of such states, given that those processes don’t have to be consciously available (Spaulding, 2010, p. 131; Jacob, 2011, p. 526; Carruthers, 2015, p. 499). One could sketch a parallel line of response to the argument from phenomenality, and claim that accounts of joint attention that appeal to recursive mindreading as an explanation of its openness need not be committed to the idea that recursive processing is consciously available. My reply is that this line of response to the argument from phenomenality leads to the following difficulty. If one claims that recursive mindreading is a fully sub-personal process, while at the same time endorsing the rich characterization of joint attention that I identified in the previous section, then one would have to say more about how sub-personal recursive mindreading would generate the personal-level openness and immediacy of joint attention. But it isn’t easy to see how this can be done. In fact, one potential risk that arises from taking this route is to generate a puzzling gap between sub-personal recursive-mindreading processing and the personal-level experience of joint attentional openness. Though there is no denying the relevance of investigating the sub-personal (including both cognitive and neural) infrastructure of joint attention, we should be wary of generating a gap that could hinder instead of facilitate research on the latter.10

A different reaction to the argument from phenomenality is that the complexity of recursive mindreading, even if consciously performed, should not be overemphasized. On closer inspection, maybe a few levels of recursive mindreading (i.e. second- or third-order mindreading) might after all be necessary for the openness of joint attention? There are reasons to be skeptical about this, though, since one can construct situations involving a few levels of recursive mindreading, and which for all one can say differ markedly from joint attention and its characteristic openness. Consider the following example: Bert and Benny are two professional spies who are extremely good at spying on each other, in particular at tracking each other’s attention to (let us suppose) the same object, which in the present example is a bird called Birdie. Not only this. Bert and Benny are also extremely good at covertly tracking the other’s attention towards their own attention towards Birdie. Bert and Benny, we might say, are professional recursive mindreaders. Now, one key observation here is that for any n-level of the recursive mindreading that is supposed to be necessary for the openness of joint attention, one can construct an analogue case with the spies. For example, suppose one says that second-order perceptual mindreading is necessary for joint attention.
This is problematic, because we can construct an analogue case with Bert, Benny, and Birdie: ‘Bert perceives Birdie / Benny perceives Birdie’ (First-order intentionality); ‘Bert perceives that Benny perceives Birdie / Benny perceives that Bert perceives Birdie’ (First-order mindreading); ‘Bert perceives that Benny perceives that Bert perceives Birdie / Benny perceives that Bert perceives that Benny perceives Birdie’ (Second-order mindreading). The critical issue is that there is a clear and intuitive contrast between the openness of joint attention, in which the target is mutually manifest to you and your co-attender, and the type of social interaction that the two spies are engaged in. Intuitively, you and your co-attender can attend together to a target in a rather effortless and immediate manner, without engaging in anything similar to what the spies are doing.
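
The claim that an analogue case can be constructed for any level of nesting can be made concrete with a short sketch. The Python snippet below is an illustration only and is not drawn from the literature discussed here: the function names `nested_attribution` and `spies_case` are hypothetical, and the sentence schema is simply the one used in the Bert, Benny, and Birdie example. For a chosen order n, it generates the symmetric pair of nested perceptual attributions that the two spies would satisfy.

```python
def nested_attribution(first, second, target, order):
    """Build one nested perceptual attribution, read from `first`'s side.

    order 0 is plain perception of the target (first-order intentionality),
    order 1 is first-order mindreading, order 2 is second-order mindreading,
    and so on. Perceivers alternate, starting with `first`.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    # Alternate the two subjects: first, second, first, ...
    perceivers = [first if i % 2 == 0 else second for i in range(order + 1)]
    return " perceives that ".join(perceivers) + f" perceives {target}"


def spies_case(a, b, target, max_order):
    """Return, for each order from 0 to max_order, the symmetric pair of statements."""
    return [
        (nested_attribution(a, b, target, n), nested_attribution(b, a, target, n))
        for n in range(max_order + 1)
    ]


if __name__ == "__main__":
    # Reproduce the three levels listed in the example above.
    for n, (left, right) in enumerate(spies_case("Bert", "Benny", "Birdie", 2)):
        print(f"order {n}: {left} / {right}")
```

Since the schema extends mechanically to any order, raising the level of nesting does not by itself, on the argument above, turn the spies' covert tracking into openly shared attention.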

2.2. The Argument from Cognitive Demandingness

The plausibility of appealing to recursive mindreading in accounting for joint attention increases the more one pushes such recursive structures to underlying cognitive machinery. If reductive analyses are right, joint attention requires a capacity for (most likely unconscious) recursive mindreading. But is this convincing? As mentioned above, there is good evidence that joint attention usually appears around 9-12 months of age (Carpenter et al., 1998). The problem, however, is that there is no evidence that infants at that age are capable of recursive mindreading (Bohn & Köymen, 2018; Carpenter & Liebal, 2011, pp. 165–166).11 But if joint attention is to be analyzed reductively, that is precisely what would be required. To put it differently, if reductive analyses were right, the capacity for recursive mindreading would have to be in place quite early in life, around 9-12 months of age. Recall, however, Tomasello’s observation: “No one is certain how best to characterize this potentially infinite loop of me monitoring the other, who is monitoring my monitoring of her, and so forth […], but it seems to be part of infants’ experience—in some nascent form—from before the first birthday.” (Tomasello, 2011, pp. 34–35) I suggest that this should motivate a cautious attitude towards the alleged role of recursive mindreading in development, rather than a commitment to it. There is strong and convincing evidence that infants are capable of joint attention, but the available evidence leaves open whether they can actually comply with the processing demands of recursive mindreading.

Now, the idea that recursive mindreading might be too demanding, in particular for young infants, has of course been considered by reductionists. Referring to the coordination of attention characteristic of joint attention, Tomasello writes: “Underlying this coordination is, once again, some notion of common ground, in which each individual —at least potentially— can attend to his partner’s attention, his partner’s attention to his attention, and so forth” (2014, p. 44).
On Tomasello’s view, the appeal to the notion of common ground is supposed to be an alternative to the classical notion of common or mutual knowledge introduced by Schiffer and Lewis, and to the infinite iterations that that notion involves. Common ground is presented as a more “realistic” notion, sufficient, for example, for making joint decisions toward joint goals (2014, p. 38). According to Tomasello, mature individuals tend to explain perturbations of common ground by reasoning in terms of “‘he thinks that I think he thinks’ […], suggesting an underlying recursive structure” (2014, p. 38). But even if this were so, it is important not to conflate two different issues. It is one thing to say that perturbations of common ground may be eventually fixed via recursive mindreading. This is compatible with recursive mindreading being a compensatory strategy employed to fix disruptions of common ground. It is another thing to say that recursive mindreading is a necessary precondition for the establishment of common ground. One can accept the former claim without endorsing the latter, so additional arguments would have to be in place for accepting the second claim.

2.3. The Argument from Perspective-taking

Recursive mindreading is problematic not only because it doesn’t square easily with the phenomenological immediacy of joint attention, and because of the processing demands that it imposes on co-attenders, but also because it poses constraints in terms of what potential co-attenders ought to understand about each other’s perspectives in order to engage in joint attention. To see this, consider again a situation of recursive perceptual mindreading involving two subjects (A and B) and the same target of attention (X). Recall that the situation can be described as follows: ‘A perceives X / B perceives X’ (First-order intentionality); ‘A perceives that B perceives X / B perceives that A perceives X’ (First-order mindreading); ‘A perceives that B perceives that A perceives X / B perceives that A perceives that B perceives X’ (Second-order mindreading). To get the whole process going, A and B must be able to identify and track what the other is perceiving. Put differently, A and B must be, at least, Level 1 perspective-takers (Flavell et al., 1981). As adults, we simply take for granted that other adult subjects are bearers of perspectives on the world. However, developmental research on perspective-taking has greatly contributed to shedding light on the trajectory of this capacity. More specifically, it has been argued that infants acquire an understanding of perspectives on the basis of joint attentional engagements, and that perspective-taking is consequently founded on joint attention (Moll & Meltzoff, 2011b, 2011a; Moll & Kadipasaoglu, 2013). Now, if it is a precondition for the exercise of recursive mindreading about a perceived object that each mindreader has an understanding of the other’s distinct world-directed perspective (to put it differently, if recursive mindreaders have to be perspective-takers in the first place), and if joint attention is the foundation of perspective-taking, the latter cannot be a necessary condition for joint attention to occur.
It follows that, developmentally, joint attention cannot be founded on perspective-taking capacities, but the latter must instead be founded on the former. Thus, developmental research supports the claim that infants can partake in joint attention without having the capacities for perspective-taking required by recursive mindreading. There is a further, more speculative point that I would like to mention. Not only is it implausible that infants at 9-12 months of age could be cognitively sophisticated enough for something like recursive mindreading; one might also consider the possibility that joint attention actually plays a role in the acquisition of recursive mindreading, and that recursive mindreading is an outcome, and not a precondition, of the process of socialization. This would be congenial with Vygotsky’s suggestion that

[e]very function in the child’s cultural development appears twice: first, on the social level, and later, on the individual level; first, between people (interpsychological) and then inside the child (intrapsychological). [...] All the higher functions originate as actual relations between human individuals. (Vygotskij, 1978, p. 57).
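
As a purely illustrative aside, the recursive structure recalled at the beginning of this section can be spelled out mechanically. The following is a minimal Python sketch and not part of any of the cited accounts: the function name and its parameters are hypothetical, and order 0 is taken to be plain perception, order 1 first-order mindreading, and order 2 second-order mindreading.

```python
def mindreading_statement(subject: str, other: str, target: str, order: int) -> str:
    """Build the perceptual state ascribed to `subject` at the given order.

    Hypothetical helper for illustration only: order 0 is plain perception,
    order 1 is first-order mindreading, order 2 is second-order mindreading.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    # Perceivers alternate, starting with `subject`: A, B, A, ...
    perceivers = [subject if i % 2 == 0 else other for i in range(order + 1)]
    return " perceives that ".join(perceivers) + f" perceives {target}"


if __name__ == "__main__":
    for n in range(3):
        print(mindreading_statement("A", "B", "X", n), "/",
              mindreading_statement("B", "A", "X", n))
```

Every statement of order 1 or higher embeds a claim about what the other subject perceives, which is why even the lowest rung of the hierarchy already presupposes Level 1 perspective-taking.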

2.4. The Argument from Normative Force

One of the main arguments against the alleged role of recursive mindreading in joint attention is based on the idea that joint attention situations have a specific normative force, in the sense that they give co-attenders grounds for taking one course of action rather than another. Although this argument pursues a somewhat different direction than the previous ones, which were more focused on developmental issues, it is worth mentioning, since it is John Campbell’s master argument against reductionism about joint attention. The point that Campbell presses is that there are forms of rational coordinated behavior grounded in joint attention which cannot be accounted for from a reductionist perspective. The example that he elaborates on is the so-called coordinated attack scenario. Consider a situation in which two agents have the pre-defined task of attacking a target they are both attending to. For example, while playing a war game, they have the task of coordinating an attack on a target displayed on a screen in front of them (Campbell, 2002, p. 165), or they have the task of coordinating an attack against a tiger that is approaching them (Campbell, 2018, p. 117). Importantly, the task has a pre-assigned payoff structure: if both agents coordinate their attack on the target, a substantial but limited payoff is guaranteed, but if only one of them attacks, a disaster follows (anything would be better than this scenario). The coordinated attack scenario generates the puzzle of coordinated attack. On the one hand, coordinated attack under conditions of joint attention is intuitively rational for the two agents. When two subjects are both aware of attending together to the target to be attacked, it will be rational for them to attack it. As Campbell puts it, “[i]n a joint attention case like this, where you and I are perceptually attending to the same thing, what we’re attending to can be sufficiently “out in the open” for it to be perfectly straightforward what we ought to do” (2018, p. 118). On the other hand, though, the rationality of coordinating attack under conditions of joint attention seems hard to account for in terms of mutual knowledge. The critical issue, as Campbell emphasizes, is that in the coordinated attack scenario the identification of the target must be ‘out in the open’ in a very strong sense. However, mutual knowledge of the sort: ‘I know which object is the target, you know which object is the target, I know that you know which object is the target, you know that I know which object is the target’, and so on (cf. Schiffer, 1972), will inevitably fall short of explaining why it is rational for the agents to coordinate attack. The reason is that, for any finite level of this sequence of iterations of mutual knowledge, that level will not be enough to make the coordinated attack rational: it could always be possible that the next level up in the hierarchy is not secured, and therefore that the identification of the target as a common target of attack is not secured either. The puzzle of coordinated attack motivates the argument from coordinated attack against reductionism about joint attention (Campbell, 2002, 2005, 2011, 2018). Insofar as reductive approaches to joint attention attempt to solve the puzzle of coordinated attack by appeal to individualistic states of attention supplemented by mutual knowledge, and insofar as such a strategy will fail to account for the normative force of joint attention in coordinated attack, reductive approaches to joint attention are unconvincing. They fail to incorporate in a persuasive way a critical dimension of joint attention, i.e. its specific normative force.12 One might have reservations about how the argument from normative force relates to the arguments I have been previously discussing. This argument appears to involve a demanding notion of rationality, whereas the second argument concerns precisely the high demands posed by recursive analyses. 
But consider that the way in which rationality comes into the picture is not by saying that the jointness of joint attention is established by means of reasoning. The perceptual openness of joint attention, the sense in which the target is ‘out in the open’ for co-attenders, doesn’t really differ from joint attention situations other than the coordinated attack scenario. The further point is simply that for agents with a grasp of rationality, that openness is going to ground a certain course of action.
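
The payoff structure of the coordinated attack scenario can be given a rough game-theoretic gloss. The sketch below is an illustration under stated assumptions, not Campbell’s own formulation: the numerical payoffs are hypothetical, the payoff for neither agent attacking is simply set to zero, and a lone attack is assumed to be equally disastrous for both agents. It lists the action pairs from which neither agent would gain by deviating alone.

```python
from itertools import product

ACTIONS = ("attack", "hold")


def payoff(own, other, joint=10, disaster=-1000, neither=0):
    """Payoff to one agent (hypothetical numbers); the game is symmetric."""
    if own == "attack" and other == "attack":
        return joint      # substantial but limited payoff
    if own == "hold" and other == "hold":
        return neither    # assumed baseline: nothing happens
    return disaster       # exactly one agent attacks


def pure_equilibria(**payoffs):
    """Action pairs from which neither agent gains by deviating alone."""
    stable = []
    for a, b in product(ACTIONS, repeat=2):
        a_ok = all(payoff(a, b, **payoffs) >= payoff(alt, b, **payoffs)
                   for alt in ACTIONS)
        b_ok = all(payoff(b, a, **payoffs) >= payoff(alt, a, **payoffs)
                   for alt in ACTIONS)
        if a_ok and b_ok:
            stable.append((a, b))
    return stable


if __name__ == "__main__":
    print(pure_equilibria())
```

On these assumptions both joint attack and joint restraint are stable, so the payoff structure by itself does not tell either agent what to do. Attacking is the better choice only if each agent can count on the other attacking too, which is where the question of how the target is ‘out in the open’ enters.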

3. Non-reductionism about Joint Attention and the Role of Interpersonal Engagement

Taken together, the preceding arguments provide a compelling case against reductionism about joint attention, according to which recursive mindreading is a necessary precondition for joint attention to occur. Indirectly, on the plausible assumption that the distinction between reductionism and non-reductionism is exhaustive of theories of joint attention, arguments against reductionism also count as arguments in favor of non-reductionism. Although I believe that assembling these arguments contributes to advancing the theoretical debate on joint attention, I don’t take the considerations presented so far to settle the debate between reductionism and non-reductionism. The disagreement is deeper, and as Campbell suggests, it points to how ‘rich’ our starting point is when thinking about joint attention. For reductionists, one might say, it is not-so-rich. The challenge is then to explain how we can get the richness of joint attention from that basis. The opposing view, expressed by Campbell, is that “here as elsewhere in epistemology, it is important to recognize the richness of the starting points we have, and not generate spurious problems by supposing that our epistemic base is far thinner than it is in fact” (Campbell, 2018, p. 121). As a way of taking the debate one step further, examining the prospects of non-reductionism about joint attention becomes a critical task. Since non-reductionism is a family of rather disparate views, where to go from here? In this section, I will discuss two prominent non-reductive theories of joint attention, due to Campbell and Eilan, by bringing into focus a central challenge for non-reductive approaches: how to specify the interpersonal component of joint attention, i.e. the way that co-attenders understand and relate to one another qua co-attenders, while upholding non-reductionism. This is a challenge that Campbell does not address, and that Eilan proposes to meet by appealing to a primitive notion of “communication-as-connection” (Eilan, ms). Starting with Campbell, his proposal is that

we should regard ‘X and Y are jointly attending to Z’ as a primitive, not to be explained in terms of the knowledge or beliefs that each of X and Y have individually. We should regard joint attention as a fundamental type of conscious state that can explain other cognitive achievements of the subjects who are jointly attending, but that is not itself susceptible to explanation in terms of individualistic knowledge or beliefs of the two participants. (Campbell, 2018, p. 120, cf. 2011, 2005).

On the relational approach to joint attention defended by Campbell, a description of the state of each subject engaged in joint attention necessarily involves the fact that she is co-attending with someone else. That would be the bedrock of the explanation. The three-place joint attention relation is primitive at least in the sense that it cannot be broken down into simpler explanatory constituents. At the same time, though, it is significant that Campbell remarks that in joint attention the other co-attender and the object attended to are not registered in the same way: “The object attended to, and the other person with whom you are jointly attending to that object, will enter your experience in quite different ways” (Campbell, 2005, p. 288, see also 2011, p. 419, 2002, p. 162). Although Campbell acknowledges that co-attenders enter into each other’s experience differently from the way the object does, he does not elaborate on this idea. This by itself raises the question of how this lacuna could possibly be filled in.13 But there is another, more pressing issue that I would like to focus on, which goes beyond the observation that an emphasis on the primitiveness of the joint attention relation, together with the notion of “co-consciousness” of a target (Campbell, 2018, pp. 122, 124), is not particularly illuminating. The point is that, as already mentioned, there is a widespread consensus in developmental research that triadic joint attention is a capacity that appears around 9-12 months of age (Carpenter et al., 1998). Now, admittedly, Campbell’s claim that joint attention is a primitive relation is not advanced as a developmental claim. Nonetheless, a question that remains open for Campbell’s approach is how to square his distinctive emphasis on the primitiveness of the joint attention relation with the documented developmental trajectory that leads to triadic joint attention. This is all the more relevant given Campbell’s appeal to developmental psychology in support of his theory (Campbell, 2011, p. 425, 2018, p. 125). Eilan develops a version of non-reductionism that is more congenial with developmental research. 
She rejects the assumption that joint attention is a purely perceptual phenomenon (an assumption she attributes to Campbell), “sandwiched in between” the communicative engagements that make it possible (such as pointing), and the communicative interactions that it allows for (such as commenting about the object of attention). Instead, she proposes that joint attention is itself an essentially communicative phenomenon, under a specific understanding of the notion of communication. On her proposal, the relevant notion of communication is not transmission of information, but what she calls “communication-as-connection”. “Communication-as-connection”, Eilan suggests, should be appraised as a primitive notion:

we should treat ‘communication-as-connection’ as a basic psychological concept, which cannot be reductively analyzed -- one of the concepts, along with those of perception, belief and the like, that we should take as basic when explaining our engagement with the world, in this case the world of other persons. (Eilan, ms, p. 13)

On this view, co-attenders relate to one another in an essentially communicative way, which involves adopting attitudes of mutual address toward each other (ms, p. 12). Co-attenders are aware of each other as a you, in a mutually interdependent way. But this way of relatedness, together with the feeling of connection with the other (ms, p. 13), is not all there is to say about the communicative character of joint attention. Eilan’s notion of communication-as-connection retains the idea that in joint attention there is a communicated content, and a “message”: “You are letting each other know, in some way, that you both see the same thing” (ms, p. 14). This situation would be aptly verbalized by saying, for example, ‘we hear this’. This, according to Eilan, is not a matter of each co-attender telling the other something, i.e. that a specific content is shared, if that telling is spelled out as a “mutually known” content. It is rather a matter of having “the same thought or experience” (ms, p. 15). Moreover, having the same thought or experience is not something that co-attenders passively register: “the jointness of the experience […] isn’t a feature of the world with which we are presented, but something we establish, something we make happen” (ms, p. 15).14 Eilan’s notion of communication-as-connection suggests a way of understanding the interpersonal engagement between co-attenders that potentially avoids two problems. On the one hand, it avoids the difficulties that analyses appealing to recursive mindreading run into. On the other hand, it goes beyond Campbell’s somewhat uninformative emphasis on the primitive character of joint attention. However, there are a couple of noteworthy challenges that remain open for Eilan’s proposal. First, the jointness of joint attention, as a shared experience, is strongly modelled on the sharing of thoughts. In fact, Eilan suggests that joint attention involves an ‘I-you’ thought (Eilan, ms, p. 15). 
To take this as the ground level of joint attention seems problematic, though, because part of the attractiveness of the appeal to joint attention as a platform for the development of linguistic communication is to show how the sharing of thoughts about the world in verbal communication can get off the ground from the sharing of experiences about the world. Thoughts and experiences are best dealt with separately, and while the idea of sharing thought has been argued for on independent grounds (see Longworth, 2013), the idea of a numerically identical experience being shared by a plurality of subjects is problematic for various reasons15 (see León et al., 2019). So, one challenge for Eilan’s proposal is to say how shared thoughts would relate to shared experiences, and how to get from the latter to the former in the context of joint attention. Secondly, characterizing joint attention as an essentially communicative phenomenon raises the question of what experiential preconditions, if any, joint attention may have. In related work, Eilan has suggested that communication goes all the way down to the foundations of our understanding of other minds (Eilan, 2020). But does communication-as-connection have any experiential preconditions? Eilan suggests that the jointness of joint attention is something that we ‘establish’ and ‘make happen’, and this observation motivates a broader question of whether the capacity to do things together with others is a precondition for joint attention to occur. My main point in the next section will be that appreciation of the developmental context of joint attention provides us with further tools to illuminate how interpersonal engagement comes into the picture of joint attention, while at the same time allowing us to uphold non-reductionism about it.

4. Second-person Engagement and the Significance of Reciprocity

I start by noting that we should distinguish between a developmental and a non-developmental version of the challenge of how to specify the interpersonal component of joint attention. The reason for this distinction is that proper acknowledgement of the asymmetries between adult-adult and infant-adult cases of triadic joint attention is relevant for accommodating the documented role of joint attention in the process of socialization. Typically, adult-adult joint attention involves individuals who already have competence in understanding other subjects as bearers of perspectives on the world, who master linguistic capacities, and who have a grasp of environmental objects as potential targets of co-reference with others. As adults, one might say, we tend to take for granted the public character of the world of perception. We shouldn’t overlook, though, the extent to which the capacity to refer to a publicly available world is a developmental achievement. If joint attention is to play a critical role in enabling the understanding of others’ perspectives (Moll & Meltzoff, 2011a, 2011b) and in language acquisition (Tomasello & Farrar, 1986), the kind of capacities that infants should be credited with when joint attention makes its onset must in some way prepare the ground for the joint attentional skills displayed by adults. In this section, I will focus on the developmental question of how to understand the interpersonal aspect of joint attention.

One reason why, developmentally, triadic joint attention is thought to be important is that it introduces a significant change in the infant’s engagement with the world. One way to frame this change is in terms of the distinction between primary and secondary intersubjectivity. These are concepts introduced by Trevarthen and colleagues to capture the early-developing intersubjective competences of young infants. Primary intersubjectivity refers to the distinctive sensitivity that infants have to the mindedness of others, and the natural responsiveness to them that signals an awareness of the differentiation between self and other, as well as patterns of connectedness between them (Trevarthen & Aitken, 2001, p. 6). From around 2 to 3 months of age, infants engage in mutual exchanges of gazes and smiles with caregivers. It has been shown that such exchanges or “proto-conversations” (Bateson, 1979) are cross-cultural and have a structured character, involving turn-taking and specific timing (Trevarthen, 1998, p. 23). The notion of secondary intersubjectivity refers to triadic social interactions, including social referencing and triadic joint attention, that typically appear around one year of age, and in which external objects and the context of interaction are involved (see Zahavi & Rochat, 2015 for a review). Developmental research indicates that when joint attention makes its onset, the infant is already partaking in a richly structured social life. The significant transition between primary and secondary intersubjectivity should not make one overlook that patterns of reciprocal engagements are present prior to triadic joint attention, and the latter builds upon them (Striano & Rochat, 1999a). Reciprocity is important in early ontogeny because it provides the ground for exchanges of emotions between ‘I’ and ‘you’, which in turn presupposes that reciprocating partners are in a position to address and be addressed by one another. 
I suggest that later, when infant and adult engage in triadic joint attention, for example via the exchange of ‘sharing looks’, a basic situation is reciprocally ascertained between them: you and I are on the same ground with respect to a particular target of attention, you and I partake in a commonly available world. As argued by Moll and Meltzoff, this ‘ground’ shouldn’t be narrowly conceived in purely perceptual or visuo-spatial terms. It is, in the first place, a pragmatic ground, a ground of familiarity with respect to an object of engagement. Young infants “start out with an understanding of ‘engagement’ holistically conceived” (Moll & Meltzoff, 2011a, p. 400), that becomes progressively refined and differentiated, until eventually delivering an understanding of others’ visual perspectives. Following the proposal that reciprocity is a distinctive feature of second-person relations (Gomez, 1996; de Bruin et al., 2012; Fuchs, 2012; Zahavi, 2014), I will refer to such engagements between infant and caregiver as second-person engagements.16

There are two key aspects of second-person engagements that I will focus on in what follows, because of their potential for shedding light on the interpersonal component of joint attention. The first one is that second-person engagements build on the recognition of others as like oneself. The second one is their interactive aspect. In the first place, turn-taking sequences of addressing and responding to another require each partner to relate to the other as a co-partner, someone who can respond to one’s addressive engagements, and who, in doing so, addresses oneself in turn. Bruner captures this feature with his concept of “interaction format”, by which he means “a contingent interaction between at least two acting parties, contingent in the sense that the responses of each member can be shown to be dependent on a prior response of the other” (Bruner, 1983, p. 132). The relevant point here, as Bruner indicates, is that it appears to be a primitive of socio-cognitive development that “‘other minds’ are treated as if they were like our own minds” (Bruner, 1983, p. 122). In this sense, second-person engagements are plausibly construed as a basic predicament of human sociality (Moll & Meltzoff, 2011b, p. 294; Striano & Rochat, 1999b, p. 8; Reddy, 2018). Experiencing and relating to the other as if the other were like oneself prepares the infant’s and caregiver’s momentous appreciation of standing on equal ground with respect to a commonly available world. This proposal finds support in Meltzoff’s contention that “the ‘like me’ nature of others is the starting point of social cognition, not its culmination” (Meltzoff, 2007, p. 126), which he elaborates by appealing to developmental research concerning neonatal imitation (Meltzoff & Moore, 1977; Meltzoff & Brooks, 2001; Meltzoff et al., 2018). 
The point here is that neonatal imitation presupposes the newborn’s registration of its differentiation with respect to the imitated adult, as well as “a responsiveness to the fact that the other is of the same sort as oneself” (Gallagher & Zahavi, 2012, p. 209).17 Secondly, second-person engagements have a distinctive practical and interactive dimension. They are less a matter of two subjects entertaining perceptual states and beliefs about one another, and more a matter of their treating and acting towards one another as reciprocating partners (see Gallagher, 2012, p. 198). But why, one might ask, is this relevant for joint attention? It is relevant because the idea that infant and caregiver build upon the capacity to treat one another as reciprocating partners provides some ground for reconsidering the widely held assumption that the openness of joint attention is primarily a purely cognitive phenomenon, to be explained by focusing on relevant beliefs or perceptual states that each co-attender would have. Instead, a focus on the developmental context of joint attention suggests that the latter is better conceptualized as a motor and skill-like phenomenon than as a perception- and belief-like phenomenon. The developmental evidence concerning primary intersubjectivity and neonatal imitation suggests the view that the infant is equipped with primitive, i.e. non-reductively analyzable, motor propensities to reciprocate and treat the caregiver as a co-partner, and to engage with her in experientially salient coordinated activities. Whereas at the stage of primary intersubjectivity such skills for second-person engagement are limited to turn-taking sequences and exchanges of emotions, triadic joint attention emerges out of this rich social infrastructure as coordinated attentional activity with respect to a common target of engagement. If pragmatic engagement has primacy over detached ways of relating to the environment, the idea of attending together in insulation from doing things together becomes hardly sustainable from a developmental perspective (see Moll & Kadipasaoglu, 2013).18 If this analysis is on the right track, one reason why approaches to joint attention that rely on recursive mindreading run into difficulties is that they tend to miss the active dimension of early joint attention. They tend to construe joint attention primarily as a belief-like and perception-like phenomenon, and thereby tend to overlook the extent to which joint attention in ontogeny draws on action-oriented capacities involved in the joint activities performed by infant and caregiver. An alternative to analyzing joint attention in terms of recursive mindreading is to understand it as an expansion of the infant’s propensity to reciprocate with the caregiver, and thereby engage in a new range of joint activities concerning a common world. 
In a nutshell, I propose that in joint attention infant and caregiver are actively involved in a social interaction that may be aptly described, from the perspective of each co-attender, in terms of ‘I am like you, and you are like me with respect to this target of engagement’, instead of ‘I see object X, I see that you see X, I see that you see that I see X’, and so on. One reservation one might have about this proposal is that, while it emphasizes the interconnection between joint attention and early-developing joint activities, there isn’t any consensus about how to account for such activities.19 In fact, one might discern in the relevant literature a distinction between reductive (Tollefsen, 2005; Butterfill, 2012; Brownell, 2011; Pacherie, 2013) and non-reductive (Tollefsen & Dale, 2012, p. 400; Schmid, 2014, p. 23) views of such activities, which somewhat resembles the distinction between reductive and non-reductive approaches to joint attention that has been in focus in the present paper. If so, it seems, the question of how to understand the openness of joint attention would depend on clarifying the jointness of early-developing joint activities. This is certainly a point that deserves clarification and elaboration, even though a detailed discussion goes beyond the scope of this paper. One way to make headway is by considering the kind of joint activities that some developmentalists have taken to pre-date the emergence of joint attention, and in the context of which the latter appears. Relevant activities before the onset of joint attention include a caregiver picking up an infant who adjusts her bodily posture to make the pick-up successful (Reddy, 2015, p. 32), and joint activities in the context of which joint attention emerges include games with a simple joint goal, such as rolling a ball back and forth and stacking blocks together (Tomasello, 2009, p. 69). One feature of many of these activities is that they can be quite spontaneous, and don’t appear to require much prior planning. Accounting for these activities in terms of prior planning would be misplaced. Likewise, it seems unnecessary to account for them in terms of other sophisticated socio-cognitive capacities, including recursive mindreading. One way to avoid concerns regarding the cognitive sophistication required by these activities is by considering the independent proposal that two agents may have, at a basic level, a shared grasp of what they are doing together (of what their joint goal is), a kind of understanding in terms of which their individual contributions to the activity become intelligible for them (see Haase, 2012, pp. 249–251). Placing this shared understanding of what ‘we’ (‘you’ and ‘I’) are doing together as a constitutive factor of some early-developing joint activities, and as a feature that is experientially salient for the involved agents, puts pressure on the need to appeal to recursive mindreading. 
This is because each agent wouldn’t have to ensure that her own individualistic states are properly interlocked with those of the other, but would rather act on the presumption that both are acting together as a plural subject or ‘we’, and playing their respective parts in it.20 If a shared understanding of what ‘we’ are doing together is built into a significant range of early-developing joint activities, and particularly into those in the context of which joint attention emerges, the openness of joint attention may itself be understood in terms of the openness and shared understanding of those joint activities. While, admittedly, the present proposal could be further developed, it fulfils two modest desiderata. First, it is more informative than Campbell’s notion of “co-consciousness” (Campbell, 2018, pp. 122, 124). Secondly, it captures a plausible precondition of Eilan’s emphasis on the communicative character of joint attention, which would require mutually aware communicators to treat and act towards one another, in some minimal sense, as co-contributors and co-operators in joint activities, particularly in communicative joint activities. Moreover, the present proposal converges with and supplements enactive and embedded accounts of social cognition, which in different ways have emphasized the primacy of interaction in early-developing capacities for joint attention and social cognition (Gallagher, 2001, 2010b; Reddy, 2015). However, while joint attention has been investigated in the enactive tradition (Gallagher, 2010a, 2010b, 2011; Fiebich & Gallagher, 2013; Hutto, 2011), few, if any, enactive approaches to joint attention focus on its distinctive openness, which plays a critical role in the overarching discussion carried out in this paper.21

5. Concluding Remarks

Joint attention is a type of social interaction that is essentially anchored in the common world shared by co-attenders. This convergence or crossing of perspectives on a common and public world is not merely a matter of a probabilistic belief or expectation that the other is attending to the same particular item of the world as oneself. Rather, it discloses the publicity and overtness of the world of perception. Scaife and Bruner started out their 1975 Nature paper with the observation that “[l]ittle is known about how visual attention of the mother-infant pair is directed jointly to objects and events in the visual surround during the first year of the child’s life” (Scaife & Bruner, 1975, p. 265). More than four decades later, empirical research on joint attention has bloomed, whereas theoretical investigations of it have lagged behind. In this paper, I have addressed, and attempted to partly remedy, this discrepancy. After discussing a number of arguments against reductionism about joint attention, which target the notion of recursive mindreading, I have further inquired into the prospects of non-reductionism about it. I have argued that progress in understanding joint attention depends on distinguishing between the developmental and the non-developmental challenge of specifying how co-attenders relate to one another, and, concerning the first challenge, on giving center stage to the rich, second-personal social context in which joint attention is embedded in ontogeny.

Acknowledgements

Thanks to Dan Zahavi, Malinda Carpenter, Henrike Moll, and three anonymous reviewers for helpful comments on earlier versions of this paper. The author acknowledges funding from the Independent Research Fund Denmark (grant ID: DFF-7013-00032) and from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 832940).

References

Bateson, M. C. (1979). “The epigenesis of conversational interaction”: A personal account of

research development. In M. M. Bullowa (Ed.), Before Speech: The Beginning of

Interpersonal Communication (pp. 63–77). Cambridge University Press.

Battich, L., & Geurts, B. (2020). Joint attention and perceptual experience. Synthese.

https://doi.org/10.1007/s11229-020-02602-6

Böckler, A., & Sebanz, N. (2013). Linking Joint Attention and Joint Action. In Janet Metcalfe

& H. S. Terrace (Eds.), Agency and Joint Attention (pp. 206–215). New York: Oxford

University Press. https://doi.org/10.1093/acprof:oso/9780199988341.003.0013

Bohn, M., & Köymen, B. (2018). Common Ground and Development.

Perspectives, 12(2), 104–108. https://doi.org/10.1111/cdep.12269

Botero, M. (2016). Tactless scientists: Ignoring touch in the study of joint attention.

Philosophical Psychology, 29(8), 1200–1214.

https://doi.org/10.1080/09515089.2016.1225293

Brownell, C. A. (2011). Early Developments in Joint Action. Review of Philosophy and

Psychology, 2(2), 193–211. https://doi.org/10.1007/s13164-011-0056-1

Bruner, J. (1974). From communication to language—A psychological perspective. Cognition,

3(3), 255–287. https://doi.org/10.1016/0010-0277(74)90012-2

Bruner, J. (1977). Early social interaction and language acquisition. In H. R. Schaffer (Ed.),

Studies in Mother-Infant Interaction (pp. 271–289). Academic Press.

Bruner, J. (1983). Child’s talk: Learning to use language. New York: Oxford University

Press.

Butterfill, S. (2012). Joint Action and Development. The Philosophical Quarterly, 62(246),

23–47. https://doi.org/10.1111/j.1467-9213.2011.00005.x

Butterworth, G. (1995). Origins of mind in perception and action. In C. Moore & P. J.

Dunham (Eds.), Joint attention: Its origins and role in development (pp. 29–40).

Lawrence Erlbaum Associates.

Campbell, J. (2002). Reference and consciousness. New York: Oxford University Press.

Campbell, J. (2005). Joint Attention and Common Knowledge. In N. Eilan, C. Hoerl, T.

McCormack, & J. Roessler (Eds.), Joint attention: Communication and other minds:

Issues in philosophy and psychology. New York: Oxford University Press.

Campbell, J. (2011). An Object-Dependent Perspective on Joint Attention. In A. Seemann

(Ed.), Joint Attention: New Developments in Psychology, Philosophy of Mind, and

Social Neuroscience (pp. 415-430). Cambridge MA: The MIT Press.

Campbell, J. (2018). Joint Attention. In M. Janković & K. Ludwig (Eds.), The Routledge

Handbook of Collective Intentionality (pp. 115–129). New York: Routledge.

Carpenter, M., & Call, J. (2013). How Joint Is the Joint Attention of Apes and Human

Infants? In J. Metcalfe & H. S. Terrace (Eds.), Agency and Joint Attention (pp. 49–61).

New York: Oxford University Press.

https://doi.org/10.1093/acprof:oso/9780199988341.003.0003

Carpenter, M., & Liebal, K. (2011). Joint Attention, Communication, and Knowing Together

in Infancy. In A. Seemann (Ed.), Joint attention: New developments in psychology,

philosophy of mind, and social neuroscience (pp. 159–181). The MIT Press.

Carpenter, M., Nagell, K., Tomasello, M., Butterworth, G., & Moore, C. (1998). Social

Cognition, Joint Attention, and Communicative Competence from 9 to 15 Months of

Age. Monographs of the Society for Research in Child Development, 63(4), i.

https://doi.org/10.2307/1166214

Carruthers, P. (2015). Perceiving mental states. Consciousness and Cognition, 36, 498–507.

https://doi.org/10.1016/j.concog.2015.04.009

Crane, T., & French, C. (2017). The Problem of Perception. In The Stanford Encyclopedia of

Philosophy. URL =

problem/>.

Darwall, S. (2006). The second-person standpoint: Morality, respect, and accountability.

Cambridge MA: Harvard University Press.

de Bruin, L., van Elk, M., & Newen, A. (2012). Reconceptualizing Second-person Interaction.

Frontiers in Human Neuroscience, 6, 1–14. https://doi.org/10.3389/fnhum.2012.00151

Dunham, P. J., Dunham, F., & Curwin, A. (1993). Joint-attentional states and lexical

acquisition at 18 months. Developmental Psychology, 29(5), 827–831.

https://doi.org/10.1037/0012-1649.29.5.827

Eilan, N. (2005). Joint Attention, Communication, and Mind. In N. Eilan, C. Hoerl, T.

McCormack, & J. Roessler (Eds.), Joint attention: Communication and other minds:

Issues in philosophy and psychology. New York: Oxford University Press.

Eilan, N. (2007). Consciousness, Self-Consciousness and Communication. In T. Baldwin

(Ed.), Reading Merleau-Ponty: On Phenomenology of Perception. New York:

Routledge.

Eilan, N. (2014). The You Turn. Philosophical Explorations, 17(3), 265–278.

https://doi.org/10.1080/13869795.2014.941910

Eilan, N. (ms). Joint Attention and the Second Person.

https://warwick.ac.uk/fac/soc/philosophy/people/eilan/jaspup.pdf (Retrieved on April

11, 2021)

Eilan, N. (2020). Other I’s, communication, and the second person. Inquiry, 1–23.

https://doi.org/10.1080/0020174X.2020.1788987

Engelmann, J. M., & Tomasello, M. (2018). The Middle Step: Joint Intentionality as a

Human-Unique Form of Second-Personal Engagement. In M. Janković & K. Ludwig

(Eds.), The Routledge Handbook of Collective Intentionality (pp. 433–446). New

York: Routledge.

Fiebich, A., & Gallagher, S. (2013). Joint attention in joint action. Philosophical Psychology,

26(4), 571–587. https://doi.org/10.1080/09515089.2012.690176

Flavell, J. H., Everett, B. A., Croft, K., & Flavell, E. R. (1981). Young children’s knowledge

about visual perception: Further evidence for the Level 1-Level 2 distinction.

Developmental Psychology, 17(1), 99–103. https://doi.org/10.1037/0012-1649.17.1.99

Fuchs, T. (2012). The Phenomenology and Development of Social Perspectives.

Phenomenology and the Cognitive Sciences. https://doi.org/10.1007/s11097-012-9267-x

Gallagher, S. (2001). The practice of mind: Theory, simulation, or interaction? Journal of

Consciousness Studies, 8, 83–107.

Gallagher, S. (2007). Logical and Phenomenological Arguments Against Simulation Theory.

In D. Hutto & M. Ratcliffe (Eds.), Folk Psychology Re-Assessed (pp. 63–78).

Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-1-4020-5558-4_4

Gallagher, S. (2008). Inference or interaction: Social cognition without precursors.

Philosophical Explorations, 11(3), 163–174.

https://doi.org/10.1080/13869790802239227

Gallagher, S. (2010a). Joint attention, joint action, and participatory sense-making. Alter.

Revue de Phénoménologie, 18, 111–123.

Gallagher, S. (2010b). Movement and Emotion in Joint Attention. In S. Flach, J. Söffner, & D.

Margulies (Eds.), Emotion and motion (pp. 41–54). Bern: Peter Lang.

Gallagher, S. (2011). Interactive Coordination in Joint Attention. In A. Seemann (Ed.), Joint

Attention: New Developments in Psychology, Philosophy of Mind, and Social

Neuroscience (pp. 293–305). Cambridge MA: The MIT Press.

Gallagher, S. (2012). In Defense of Phenomenological Approaches to Social Cognition:

Interacting with the Critics. Review of Philosophy and Psychology, 3(2), 187–212.

https://doi.org/10.1007/s13164-011-0080-1

Gallagher, S. (2020). Action and interaction. New York: Oxford University Press.

Gallagher, S., & Zahavi, D. (2012). The Phenomenological Mind: An Introduction to

Philosophy of Mind and Cognitive Science. New York: Routledge.

Gilbert, M. (2013). Joint Commitment: How We Make the Social World. New York: Oxford

University Press. https://doi.org/10.1093/acprof:oso/9780199970148.001.0001

Goldman, A. I. (2012). Theory of Mind. In E. Margolis, R. Samuels, & S. Stich (Eds.), The

Oxford Handbook of Philosophy and the Cognitive Sciences (pp. 402–424). New

York: Oxford University Press.

Gomez, J. C. (1996). Second person intentional relations and the evolution of social

understanding. Behavioral and Brain Sciences, 19(1), 129–130.

Haase, M. (2012). Three Forms of the First Person Plural. In G. Abel & J. Conant (Eds.),

Rethinking Epistemology: Volume 2 (pp. 229–256). De Gruyter.

https://doi.org/10.1515/9783110277944

Haase, M. (2014). Am I You? Philosophical Explorations, 17(3), 358–371.

https://doi.org/10.1080/13869795.2014.949065

Heidegger, M. (2001). Einleitung in die Philosophie. Frankfurt: Vittorio Klostermann.

Hobson, J., & Hobson, R. P. (2007). Identification: The missing link between joint attention

and imitation? Development and Psychopathology, 19(02), 411–431.

Hobson, R. P. (2002). The Cradle of Thought. London: Macmillan.

Hobson, R. P. (2005). What Puts the Jointness into Joint Attention? In N. Eilan, C. Hoerl, T.

McCormack, & J. Roessler (Eds.), Joint attention: Communication and other minds:

Issues in philosophy and psychology (pp. 185–204). New York: Oxford University

Press.

Hobson, R. P. (2011). Autism and the Self. In S. Gallagher (Ed.), The Oxford Handbook of

the Self. New York: Oxford University Press.

Hutto, D. (2011). Elementary Mind Minding, Enactivist-Style. In A. Seemann (Ed.), Joint

Attention: New Developments in Psychology, Philosophy of Mind, and Social

Neuroscience (pp. 307–341). Cambridge MA: The MIT Press.

Jacob, P. (2011). The Direct-Perception Model of Empathy: A Critique. Review of Philosophy

and Psychology, 2(3), 519–540. https://doi.org/10.1007/s13164-011-0065-0

Kaplan, F., & Hafner, V. (2006). The challenges of joint attention. Interaction Studies, 7(2),

135–169. https://doi.org/10.1075/is.7.2.04kap

León, F. (2013). Experiential Other-Directness: To What does it Amount? Tidsskrift for

Medier, Erkendelse Og Formidling, 1(1).

León, F. (forthcoming). Attention in Joint Attention: From Selection to Prioritization. In M.

Wehrle, D. D'Angelo, & E. Solomonova (Eds.), Access and Mediation. A New

Approach to Attention. De Gruyter. Age of Access? Grundfragen der

Informationsgesellschaft.

León, F., Szanto, T., & Zahavi, D. (2019). Emotional sharing and the extended mind.

Synthese, 196, 4847–4867. https://doi.org/10.1007/s11229-017-1351-x

Lewis, D. (2002). Convention: A philosophical study. Oxford: Blackwell.

Liddle, B., & Nettle, D. (2006). Higher-order theory of mind and social competence in

school-age children. Journal of Cultural and Evolutionary Psychology, 4(3), 231–244.

https://doi.org/10.1556/JCEP.4.2006.3-4.3

Longworth, G. (2013). Sharing Thoughts About Oneself. Proceedings of the Aristotelian

Society, 113(1), 57–81. https://doi.org/10.1111/j.1467-9264.2013.00345.x

Loveland, K. A., & Landry, S. H. (1986). Joint attention and language in autism and

developmental language delay. Journal of Autism and Developmental Disorders,

16(3), 335–349. https://doi.org/10.1007/BF01531663

Meltzoff, A. N. (2007). ‘Like me’: A foundation for social cognition. Developmental Science,

10(1), 126–134. https://doi.org/10.1111/j.1467-7687.2007.00574.x

Meltzoff, A. N., & Brooks, R. (2001). ‘Like me’ as a building block for understanding other

minds: Bodily acts, attention, and intention. In B. F. Malle, L. J. Moses, & D. A.

Baldwin (Eds.), Intentions and Intentionality: Foundations of Social Cognition (pp.

171–191). MIT Press.

Meltzoff, A. N., & Moore, M. K. (1977). Imitation of Facial and Manual Gestures by Human

Neonates. Science, 198(4312), 75–78. https://doi.org/10.1126/science.198.4312.75

Meltzoff, A. N., Murray, L., Simpson, E., Heimann, M., Nagy, E., Nadel, J., Pedersen, E. J.,

Brooks, R., Messinger, D. S., Pascalis, L., Subiaul, F., Paukner, A., & Ferrari, P. F.

(2018). Re-examination of Oostenbroek et al. (2016): Evidence for neonatal imitation

of tongue protrusion. Developmental Science, 21(4), e12609.

https://doi.org/10.1111/desc.12609

Merleau-Ponty, M. (2012). Phenomenology of Perception (D. A. Landes, Trans.). New York:

Routledge.

Merleau-Ponty, M. (1964). The Primacy of Perception. Evanston: Northwestern University

Press.

Moll, H., & Kadipasaoglu, D. (2013). The primacy of social over visual perspective-taking.

Frontiers in Human Neuroscience, 7. https://doi.org/10.3389/fnhum.2013.00558

Moll, H., & Meltzoff, A. N. (2011a). Joint Attention as the Fundamental Basis of

Understanding Perspectives. In A. Seemann (Ed.), Joint Attention: New Developments

in Psychology, Philosophy of Mind, and Social Neuroscience (pp. 393–413).

Cambridge MA: The MIT Press.

Moll, H., & Meltzoff, A. N. (2011b). Perspective-Taking and its Foundation in Joint attention.

In J. Roessler, H. Lerman, & N. Eilan (Eds.), Perception, Causation, and Objectivity

(pp. 286–304). New York: Oxford University Press.

Moll, H., & Tomasello, M. (2007). How 14- and 18-month-olds know what others have

experienced. Developmental Psychology, 43(2), 309–317.

https://doi.org/10.1037/0012-1649.43.2.309

Mundy, P. (2016). Autism and joint attention: Development, neuroscience, and clinical

fundamentals. The Guilford Press.

Núñez, M. (2014). Joint attention in deafblind children: A multisensory path towards a

shared sense of the world.

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahU

KEwjc_evN9PfvAhXoct8KHb3HAXoQFjABegQIBRAD&url=https%3A%2F%2Fw

ww.sense.org.uk%2Fumbraco%2Fsurface%2Fdownload%2Fdownload%3Ffilepath%

3D%2Fmedia%2F1303%2Fresearch-joint-attention-deafblind-in-

children.pdf&usg=AOvVaw3YGGeK3mY7SedhNDzKSOki. (Retrieved on April 11,

2021).

O’Grady, C., Kliesch, C., Smith, K., & Scott-Phillips, T. C. (2015). The ease and extent of

recursive mindreading, across implicit and explicit tasks. Evolution and Human

Behavior, 36(4), 313–322. https://doi.org/10.1016/j.evolhumbehav.2015.01.004

Onishi, K. H., & Baillargeon, R. (2005). Do 15-Month-Old Infants Understand False Beliefs?

Science, 308(5719), 255–258. https://doi.org/10.2307/3841358

Pacherie, E. (2013). Intentional joint agency: Shared intention lite. Synthese, 190(10), 1817–

1839. https://doi.org/10.1007/s11229-013-0263-7

Peacocke, C. (2005). Joint Attention: Its Nature, Reflexivity, and Relation to Common

Knowledge. In N. Eilan, C. Hoerl, T. McCormack, & J. Roessler (Eds.), Joint

attention: Communication and other minds: Issues in philosophy and psychology.

New York: Oxford University Press.

Premack, D., & Woodruff, G. (1978). Does the Chimpanzee Have a Theory of Mind?

Behavioral and Brain Sciences, 1(04), 515.

https://doi.org/10.1017/S0140525X00076512

Rakoczy, H. (2018). Development of Collective Intentionality. In M. Janković & K. Ludwig

(Eds.), The Routledge Handbook of Collective Intentionality (1st ed., pp. 407–419).

Routledge/Taylor & Francis Group.

Reddy, V. (1996). Omitting the second person in social understanding. Behavioral and Brain

Sciences, 19(1).

Reddy, V. (2008). How Infants Know Minds. Cambridge MA: Harvard University Press.

Reddy, V. (2015). Joining Intentions in Infancy. Journal of Consciousness Studies, 22(1–2),

24–44.

Reddy, V. (2018). Why Engagement? A Second-Person Take on Social Cognition. In A.

Newen, L. De Bruin, & S. Gallagher (Eds.), Oxford Handbook of 4e Cognition (pp.

433–452). New York: Oxford University Press.

Scaife, M., & Bruner, J. (1975). The capacity for joint visual attention in the infant. Nature,

253, 265–266.

Schiffer, S. (1972). Meaning. New York: Oxford University Press.

Schilbach, L. (2010). A second-person approach to other minds. Nature Reviews

Neuroscience, 11(6), 449–449. https://doi.org/10.1038/nrn2805-c1

Schilbach, L. (2015). Eye to eye, face to face and brain to brain: Novel approaches to study

the behavioral dynamics and neural mechanisms of social interactions. Current

Opinion in Behavioral Sciences, 3, 130–135.

https://doi.org/10.1016/j.cobeha.2015.03.006

Schilbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T., & Vogeley, K.

(2013). Toward a Second-Person Neuroscience. Behavioral and Brain Sciences,

36(04), 393–414. https://doi.org/10.1017/S0140525X12000660

Schmid, H. B. (2014). Plural self-awareness. Phenomenology and the Cognitive Sciences,

13(1), 7–24. https://doi.org/10.1007/s11097-013-9317-z

Schweikard, D., & Schmid, H. B. (2013). Collective intentionality. In Stanford Encyclopedia

of Philosophy. https://plato.stanford.edu/entries/collective-intentionality/ (Retrieved

on April 11, 2021)

Seemann, A. (2011). Joint Attention: Toward a Relational Account. In A. Seemann (Ed.),

Joint Attention: New Developments in Psychology, Philosophy of Mind, and Social

Neuroscience (pp. 183–202). Cambridge MA: The MIT Press.

Spaulding, S. (2010). Embodied Cognition and Mindreading. Mind & Language, 25(1), 119–

140.

Striano, T., & Rochat, P. (1999a). Developmental link between dyadic and triadic social

competence in infancy. British Journal of Developmental Psychology, 17(4), 551–562.

https://doi.org/10.1348/026151099165474

Striano, T., & Rochat, P. (1999b). Social-Cognitive Development in the First Year. In P.

Rochat (Ed.), Early Social Cognition: Understanding Others in the First Months of

Life (pp. 3–34). Erlbaum.

Stueber, K. R. (2011). Social Cognition and the Allure of the Second-Person Perspective: In

Defense of Empathy and Simulation. In A. Seemann (Ed.), Joint attention: New

developments in psychology, philosophy of mind, and social neuroscience (pp. 265–

292). Cambridge MA: The MIT Press.

Tollefsen, D. (2005). Let’s Pretend!: Children and Joint Action. Philosophy of the Social

Sciences, 35(1), 75–97. https://doi.org/10.1177/0048393104271925

Tollefsen, D., & Dale, R. (2012). Naturalizing joint action: A process-based approach.

Philosophical Psychology, 25(3), 385–407.

https://doi.org/10.1080/09515089.2011.579418

Tomasello, M. (1995). Joint Attention as Social Cognition. In C. Moore & P. J. Dunham

(Eds.), Joint attention: Its origins and role in development. Lawrence Erlbaum

Associates.

Tomasello, M. (2008). Origins of human communication. Cambridge MA: The MIT Press.

Tomasello, M. (2009). Why we cooperate. Cambridge MA: The MIT Press.

Tomasello, M. (2011). Human culture in evolutionary perspective. In M. J. Gelfand, C. Chiu,

& Y. Hong (Eds.), Advances in Culture and Psychology (pp. 5–51). Oxford University

Press. https://doi.org/10.1093/acprof:oso/9780195380392.001.0001

Tomasello, M. (2014). A natural history of human thinking. Cambridge MA: Harvard

University Press.

Tomasello, M. (2019). Becoming human: A theory of ontogeny. Cambridge MA: Harvard

University Press.

Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and

sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences,

28(05), 675–691.

Tomasello, M., & Farrar, M. J. (1986). Joint Attention and Early Language. Child

Development, 57(6), 1454. https://doi.org/10.2307/1130423

Trevarthen, C. (1998). The concept and foundations of infant intersubjectivity. In S. Bråten

(Ed.), Intersubjective communication and emotion in early ontogeny. Cambridge

University Press; Editions de la Maison des sciences de l’homme.

Trevarthen, C., & Aitken, K. J. (2001). Infant Intersubjectivity: Research, Theory, and

Clinical Applications. Journal of Child Psychology and Psychiatry, 42(1), 3–48.

https://doi.org/10.1111/1469-7610.00701

Trevarthen, C., & Hubley, P. (1978). Secondary Intersubjectivity: Confidence, Confiding and

Acts of Meaning in the first Year. In A. Lock (Ed.), Action, Gesture and Symbol: The

Emergence of Language (pp. 183–229). Academic Press.

Vanderschraaf, P., & Sillari, G. (2014). Common Knowledge.

https://plato.stanford.edu/entries/common-knowledge/ (Retrieved on April 11, 2021)

Vygotskij, L. S. (1978). Mind in society: The development of higher psychological processes.

Cambridge MA: Harvard University Press.

Walden, T. A., & Ogan, T. A. (1988). The Development of Social Referencing. Child

Development, 59(5), 1230. https://doi.org/10.2307/1130486

Watzl, S. (2012). Review of Seemann, A. (2011) Joint Attention: New Developments in

Psychology, Philosophy of Mind, and Social Neuroscience. Notre Dame

Philosophical Reviews. https://ndpr.nd.edu/news/joint-attention-new-developments-in-

psychology-philosophy-of-mind-and-social-neuroscience/ (Retrieved on April 11,

2021)

Wilby, M. (2010). The simplicity of mutual knowledge. Philosophical Explorations, 13(2),

83–100. https://doi.org/10.1080/13869791003759963

Zahavi, D. (2008). Simulation, Projection and Empathy. Consciousness and Cognition, 17(2),

514–522. https://doi.org/10.1016/j.concog.2008.03.010

Zahavi, D. (2014). Self and other: Exploring subjectivity, empathy, and shame (First edition).

New York: Oxford University Press.

Zahavi, D., & Rochat, P. (2015). Empathy ≠ sharing: Perspectives from phenomenology and

developmental psychology. Consciousness and Cognition, 36, 543–553.

https://doi.org/10.1016/j.concog.2015.05.008

1 This distinction maps onto the distinction between endogenously driven and exogenously driven joint attention (Campbell, 2018).

2 I leave aside here the interesting topic of the distribution of joint attention across sensory modalities, which would merit a more focused treatment (see Botero, 2016; Núñez, 2014). Note, however, that the mentioned example of bottom-up joint attention involves hearing and vision.

3 Somewhat curiously, the literatures on attention and joint attention have remained almost completely disconnected from each other. For discussion, see León (forthcoming).

4 The phenomenon identified by Bruner and colleagues in the mid-1970s hadn’t been completely ignored before that point. Consider the following quote from Merleau-Ponty’s Phenomenology of Perception: “My friend Paul and I point to certain details of the landscape, and Paul’s finger, which is pointing out the steeple to me, is not a finger-for-me that I conceive as oriented toward a steeple-for-me; rather, it is Paul’s finger that itself shows me the steeple that Paul sees. […] Paul and I see the landscape “together,” we are co-present before it, and it is the same for the two of us not merely as an intelligible signification, but also as a certain accent of the world’s style, reaching all the way to its haecceity” (Merleau-Ponty, 2012, p. 428, see also 1964, p. 17; Heidegger, 2001, pp. 89, 97). Merleau-Ponty doesn’t employ the label ‘joint attention’, and his example appears to involve an adult-adult interaction.

5 It is worth noting that not everyone agrees with this characterization of joint attention. According to the ‘rich’ characterization of joint attention that I adopt, joint attention involves an awareness of attending together to a relevant target. For theorists who adopt a leaner characterization, gaze following and focusing on the same object might be sufficient conditions for joint attention (see Butterworth, 1995).

6 It has been proposed that reliable signatures of joint attention are “sharing looks” (J. Hobson & Hobson, 2007) and “knowing smiles” (Moll & Meltzoff, 2011b, p. 289).

7 It is a merit of some key philosophical discussions of joint attention to distinguish the issues of common knowledge and joint attentional openness (Campbell, 2005, p. 295; Peacocke, 2005, pp. 298–299). However, the implications of this move have not always been appreciated, particularly in the psychological literature on joint attention.

8 One can consistently be a reductionist about joint attention, in the sense in which I have introduced the distinction between reductionism and non-reductionism, and a non-reductionist concerning related issues, for example concerning the presence of human-unique capacities for cooperation and shared intentionality, which, it has been argued, could not be accounted for in terms of the cognitive capacities of nonhuman primates (Tomasello et al., 2005; Tomasello, 2014). When Tomasello refers approvingly to understanding joint attention as an “irreducibly social” (2014, p. 152) phenomenon, what he means by irreducibility is merely that “joint attention only exists when two or more individuals are interacting” (2014, p. 152). Significantly, he then goes on to write that acknowledging that joint attention is irreducible in that sense should not prevent us from investigating “what does the individual bring to the interaction that enables her to engage in joint attention […] for us this means that something like recursive mind-reading or inferring […] has to be part of the story of shared intentionality” (2014, p. 152). As I explain below, this reference to recursive mindreading justifies the attribution to Tomasello of a reductionist view of joint attention. Thanks to an anonymous reviewer for requesting a clarification of this point. One reviewer points out that Tomasello is best understood as holding that analyses are clarificatory, and not reductive-explanatory. I find this interpretation of Tomasello’s work unsupported (see Tomasello, 2019, p. 32).

9 A detailed analysis of the different positions goes beyond the scope of this contribution. Let me note, however, that some approaches to joint attention might not be straightforwardly amenable to the classificatory scheme that I am proposing. I suggest, however, that instead of being a shortcoming of the classificatory scheme between reductive and non-reductive views (fairly standard in other philosophical domains, such as the investigation of collective or shared intentions), this raises the interesting challenge for these approaches of how they would spell out in full their theoretical commitments. For example, Carpenter and Liebal appear to distance themselves from non-reductionism when they write that “a common criticism of Campbell’s and especially Searle’s accounts is that they do not really spell out how this “we-ness” is achieved (see, e.g., Pacherie, 2007; Peacocke, 2005). We thus need another approach, one that solves all of these problems at once […]” (Carpenter & Liebal, 2011, pp. 166–167). While this could be taken to indicate some proximity with reductionism, they don’t provide a reductive explanation of the notions, critical on their account, of “communicative” or “sharing looks” (Carpenter & Liebal, 2011, p. 170; see Watzl, 2012). Moreover, if Eilan is right that Carpenter and Liebal’s notion of a sharing look can be spelled out in terms of Eilan’s own non-reductive notion of “communication-as-connection” (Eilan, 2018, p. 10 footnote), Carpenter and Liebal’s account might well be compatible with non-reductionism. Other approaches that lean towards non-reductionism are those of Moll and Meltzoff (2011a, b) and of Hobson, who refers favourably to Campbell’s work on joint attention (R. P. Hobson, 2011, p. 586).

10 For a related cautionary remark, in the context of debates about social cognition, see Zahavi (2014, p. 187).

11 Note that non-verbal false-belief paradigms such as that of Onishi and Baillargeon (2005) target infants’ capacity for first-order mindreading, and not recursive mindreading. This research has supported the view that infants as young as 15 months can attribute false beliefs to another agent.

12 Note that, as presented here, the argument from normative force aims at a negative conclusion against reductionism about joint attention. The argument doesn’t by itself support Campbell’s relational view of joint attention, nor does it depend on Campbell’s account of perceptual experience. Criticisms of Campbell’s account of joint attention that fail to engage with the argument from normative force (Battich & Geurts, 2020) arguably miss a crucial aspect of that account.

13 I leave aside the question of whether Campbell’s relational theory of perceptual experiences provides appropriate tools to do this. For critical discussion, see Eilan (ms).

14 Incidentally, this raises the question of whether, on Eilan’s account, joint attention presupposes some sense of we-ness to get off the ground.

15 In a previous publication (Eilan, 2007), Eilan has endorsed what has been called elsewhere the “token identity view” of experiential sharing (see León et al., 2019).

16 In spite of the increasing interest in the topic of the second-person perspective, a cursory look at some of the ongoing debates will reveal a large disagreement about what is distinctive about a second-person relation. Apart from Eilan’s proposal, according to which a second-person relation requires and is secured by the adoption of a communicative stance towards someone (Eilan, 2014, 2015), it has been proposed that the critical mark of a second-person relation is action (not necessarily communicative) towards someone, in contrast to passive observation (Schilbach, 2010; Gallagher, 2012, 2001, 2008). Further proposals are that marks of the second-personal are affective engagement with someone, in contrast to affective detachment (Schilbach et al., 2013, p. 396; Reddy, 2008, 1996, p. 140), reciprocal interpersonal understanding (Gomez, 1996; de Bruin, van Elk, & Newen, 2012; Fuchs, 2012; Zahavi, 2014), or the adoption of an ethical stance towards someone (Darwall, 2006; Haase, 2014). To be sure, these proposals are not necessarily incompatible with each other. In the following, I primarily focus on a conceptualization of the second-person in terms of reciprocity and action-oriented social understanding.

17 One might worry that second-person engagements, as I have characterized them, appear to square badly with the evident socio-cognitive asymmetries between infant and caregiver. The worry can be dealt with by noting that the proposed sense in which infant and adult treat one another as co-partners or equals is a fairly minimal sense, i.e. as reciprocating partners. The recognition of the other as like oneself can explain that, in spite of the asymmetries between adult and infant, the process of socialization can get off the ground.

18 If so, there might be a relevant difference between how joint attention relates to joint activities in ontogeny and in adulthood. Whereas it is plausible to hold that, in adulthood, joint attention often scaffolds and enables joint activities (suppose that upon jointly attending to the fire alarm, you and your co-attender rush together to help a person who is in shock), or shared emotional responses to targets of concern (León et al., 2019), embedding joint attention within early-developing joint activities raises the question of how and in what sense the latter can enable the former.

19 Thanks to an anonymous reviewer for pressing this point.

20 On the notion of plural subject, see Schweikard and Schmid (2013, §3.3).

21 Enactive approaches to joint attention have highlighted at least two key issues: (1) joint attention involves motoric and affective coordination, instead of coordination of mental states (Gallagher, 2010a, p. 117, 2020, pp. 107–113); (2) basic forms of joint attention may be accounted for in non-representational terms, appealing to an elementary capacity of “mind minding” which only requires “being able to target and track another’s intentional attitudes” (Hutto, 2011, p. 325).