BRIAN MACWHINNEY: In a way that's sort of a segue right here. I guess I had this little fantasy in my mind that we were going to spend two days talking about sharing and standards and, as Tim has just pointed out, we haven't really gotten to it. But, you know, the real world is the real world. Right? I mean, you know, I know Tim's data rather intimately and I know how there are these places where, you know, the way he codes, he's not happy with the way I reformat it and this kind of thing. And there really are, you know, details that have to be worked out in terms of linking up the gesture line. I think, you know, there's the problem of should the gesture be in a footnote or should it be in a column or should it be in a row and how should it be displayed. Right? I mean just to begin, you know, one of millions of issues. So, you know, that would have been a great discussion, but we didn't have the time I guess to do all that, you know. So I want to talk a little bit about these issues but, you know, the depth I think will not all be there. This is on sharing standards and this work has been funded by NSF and NIH.

Actually, MacArthur, I forgot them, long ago also funded this. NSF has funded the TalkBank Project, the DIVER Project of my colleague here to the right, and the SCOTUS Project. SCOTUS, by the way, can anyone guess what SCOTUS means?

AUDIENCE MEMBER: Duns Scotus?

BRIAN MACWHINNEY: Very good. No. Supreme Court of the United States. Okay? How about that? Isn't that cute? And it's Duns Scotus which is also -- Duns Scotus was a poet though, not a lawyer. Right? Logician. Right. Thank you. Okay. That's the Oxford there. Okay.

And there's CHILDES funding from NIH and also now, we're getting funding for a project called PhonBank which is to look at the phonological level, and we're building a project called AphasiaBank which is also known as Can't Talk Bank, but don't tell anyone I said that. Okay. It wasn't said.

Well, I'll tell you we've only done -- why video. I told you my answer about why video earlier that I thought time scales was a crucial reason and that video actually captures the interaction of the time scales in the moment.

But we've sort of said that. I think another important thing, I don't see any IES people here, Institute of Education Sciences, you know, everything has to be evidence-based analysis. I think that videos can fit in very well with evidence-based analysis as long as you have a research design sort of on the side there. You know, it could be micro and these are some of the eight standard methods that everyone would approve in research design. Video just, you know, fits in well with all of these. In fact, some of these could only be done well with video. It's just that sometimes you will need, say, a design like, I've forgotten who it was, same consultant, different communities. Who was that? Whose paper was that? Okay. Rogers. Thank you. You had some more. Actually, on that very slide you had a few more and you just didn't have time for the question. I wanted to know about them. But there's no reason that -- another one is diffusion analysis. This is great stuff. The museum study, too, right, was a diffusion analysis in a sense, and you can use video for all of these standard research methods. I just really, really think that's important. Not to think that video is just this blah thing you do and you can't prove anything, blah, you know.

So standards. Now, we have developed standards and actually, Tim, occasionally, for example, and other people have had input to them. And my approach to standards is that XML is actually a standards-neutral system of notation and that within our XML standard our goal is to try to be sort of Hindu-like, that is to take in the many different religions and make them all merge under one. So we spend time after time trying to take different people's standards and actually crunching their data. And it's a lot of work. It's programs that can reformat one standard into the other, and eventually they all become inter-translatable. But you have to then also do a lot of work to verify that you went from one standard to the other and came back and you had identical data. It's called round tripping. Okay. And, of course, we have analytic tools, transcription tools, linkage to media. You know, all of this technology I think is now getting fairly mature, and I don't think the technology really is in any way the barrier. And actually, I don't really, really think the standards are a barrier either. I think we need to have a lot of communication and talk about them, and there are some rough edges. We've recently tried to pull in ELAN gestural data, and we found that our CHAT coding system has to be hacked in certain ways. And so there are certain places where, you know, things have to be done, but they can all be done. Codec standardization, as Roy and others have said, that's a problem that's going to go away. And another thing we've done is a streaming media server which is locally deployable.

So we have streaming media from the CHILDES servers at CMU, but we also, I'll have another slide on this later, can deploy that so that if you don't really, really want your data to be public you can run all the software locally and still get the same effects within your local group.
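The round-tripping check mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two formats (a simplified speaker-tier line format and a toy XML layout) and the function names are hypothetical, and they are not the actual CHAT format or the TalkBank XML schema. The point is only the shape of the test: convert one way, convert back, and require identical data.

```python
import xml.etree.ElementTree as ET


def lines_to_xml(text: str) -> str:
    """Convert hypothetical '*SPK:<tab>utterance' lines into a toy XML transcript."""
    root = ET.Element("transcript")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        speaker, utterance = line.split(":\t", 1)
        u = ET.SubElement(root, "u", who=speaker.lstrip("*"))
        u.text = utterance
    return ET.tostring(root, encoding="unicode")


def xml_to_lines(xml_text: str) -> str:
    """Convert the toy XML transcript back into speaker-tier lines."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"*{u.get('who')}:\t{u.text or ''}" for u in root.iter("u"))


def round_trips(text: str) -> bool:
    """True only if converting to XML and back reproduces the input exactly."""
    return xml_to_lines(lines_to_xml(text)) == text


sample = "*DAD:\twhen you turn this crank, you're making electricity .\n*GAB:\twhat ?"
print(round_trips(sample))         # the clean sample survives the round trip
print(round_trips(sample + "\n"))  # a trailing newline is silently lost on the way
```

The second call shows why the verification step matters: a converter can look correct and still drop a detail, here a trailing newline, that only an exact comparison after the return trip will catch.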

Then, of course, the metadata issue. Not all groups subscribe to OLAC, the Open Language Archives Community, and put all their data there. There's the Open Archives Initiative and so on. And we also put ISBN numbers on everybody's corpus and so on and so forth. So there's a lot of metadata. And that's an open-ended thing. Metadata can go on I think till the end of the field, and that's good. But we have frameworks for distributing it and there are repositories and all that stuff. So the technology's really there. The standards are available; the programs are available, stable, and tested. Streaming is solved, and now, by the way, Bennett Bertenthal over here is not talking, but he should be in a sense a major player in this. Bennett? Yes. And he's, you know, developing a project with NSF support called Super Lab. Is that the right name for it or do you have another name?

ROY PEA: Social Informatics.

BRIAN MACWHINNEY: Okay. And Super Lab is the program inside? Yeah, Social Informatics Data Grid. So it's not just SID. Right, okay. Yeah, right, okay. And taking a lot of this and making it on the grid is the emphasis there. And then, of course, collaborative commentary as Roy talked about and I guess Ken talked about. In fact, everybody talked about it. Ricki talked about it, too. Right. So really, Orion and WebDIVER and Project Pad and Talk (Inaudible) Viewer and CLAN Web Data and all these things are allowing us to do that. So technology's really there.

Data sharing is a bit more questionable. And I sort of -- my view of data sharing in this community has slightly changed over the last two days. What I see here is data sharing within labs. I think this -- and then maybe that will change in the future, and that would be great. But the reality is that I see people who, when they talk about data sharing, mean sharing with my students. Which, you know, is still maybe a useful thing.

Within CHILDES we take a very different approach which is data sharing within the community and that the shared database becomes a definition of the community. So now, I wrote this before I sort of started to have a change of feeling on this regarding this field. But in any case, it's certainly true that data sharing's not important for an established researcher. They don't have to share their data. It isn't going to change their career in any way. It's more a question of what's important for the field. And I think for the field it's pretty important. You might say, for example, would Google have happened if there hadn't been open access to data? You know, just think of that. The other way is that raw data is really infinitely rich. And you're not really going to ever get scooped. Okay. I've never seen a case. You might not get cited, and that we have to do something about. That's happened about three times in the 25 years of CHILDES where people didn't get cited for the work they did. But really they never, ever got scooped. Okay. I believe tenured faculty have a responsibility to share data. That's what tenure is about. Right? But I know that guidelines are crucial and we, you know, we've got to work within that. And that's crucial. I also think federal agencies have a responsibility here. I see some federal people here.

Okay. Okay. So anyway, transcripts (inaudible), we know about that. (Inaudible) some of the groups that have organized, obviously, Child Language with both CHILDES and PhonBank. Conversation Analysis is sort of moving along. There's a lot of data there in different stages of readiness. Second language has become really big with huge corpora now coming in. And, of course, SCOTUS, the Supreme Court, we now have 30 years of oral argumentation at the Supreme Court. You know the number of hours that is? I mean those guys talk a lot, you know. Each court case is two -- we have about 100 cases per year. It's just enormous. And we have all the transcripts and all the audio. All we have to do is link the audio to the transcripts. And we can do that in roughly real time, but it's still thousands and thousands of hours. AphasiaBank is being built. Of course, that will be password protected, but there's a very enthusiastic community behind that, and they're really keen on data sharing. Classroom discourse, of course, we're here. And Linguistic Exploration, that's the sharing of data on endangered languages. Gesture would be probably the work, and the Social Informatics data is a trial balloon trying to get a real data community going for sharing of gesture data. Social and so on and so forth are other areas.

The CHILDES model, I'll just go quickly. There are about 2,000, really over 2,000 published articles based on CHILDES data and groups and, you know, we just do lots of stuff. Okay. Okay. Classroom database. There are now about 18 pieces of classroom data in the CHILDES/TalkBank database. TIMSS has just recently been put in. Actually, I think it's more than six countries, maybe seven. Problem-based learning from Tim and Curtis LeBaron, there's about three cases, really beautiful data. Gravity discussions from Turk. Science museum stuff from Irene Ram, from Kevin Crowley. Some data, classrooms and second language, several classrooms in second language acquisition from Dresden with English, French, and Czech. And, of course, the Grimshaw Oral Defense that we mentioned yesterday from back in the seventies. Some data from Jim Greeno on the garden plot, sort of math learning. A numerical display work from Anna Slar, Kim Clane, Pocabre. Rich Lehrer has given us, you know, hundreds of hours of video from Carmen Curtis and quilt patterns in geometry. I have data on my lectures, on which very detailed audio -- video analysis, I'm sorry, gesture analysis has been done. Michael Roth, Rich Stevens and so on and so forth, Moscovich, Horiwich and so on, a different database. Also tutorial interactions. This could greatly be expanded. I think there's very little -- there are typically no big IRB problems here. We just need to get a little bit more of these tutorial interaction data.

Okay. Now, TalkBank Light, there's a couple of versions of TalkBank Light. One is that you have no data sharing. You keep data locally. And we're doing that, for example, in trial areas like at the Medical College at the University of Southern Denmark. They've, you know, taken all of our software, all of our devices, all of our guidelines and simply have set up a TalkBank internally. Also at the University of Antwerp, University of Pittsburgh, perhaps Arizona. Another way of looking at TalkBank Light is to say, well, we're not really going to do transcripts. We're just going to do the video. I call it naked video. And we have several of these where we are digitizing all the video even though there are no transcripts. Yes, a little lewd, huh? Naked video. I know. Okay. So and, of course, you know, you have hundreds of hours and you just don't have time to transcribe it all, and we're going to have signposts in the video about some interesting things here and there.

Okay. Now, I want to finish up by talking about some secondary analysis. Rogers asked me yesterday whether there has been evidence of some secondary analysis. Well, of course, there wasn't any data out there so they couldn't really have any secondary analysis. But now that we do have it out there, it really does offer that possibility. And Jim Greeno, Carla van de Sande and I are, you know, this is something that actually interests me to really do. So I'm really into this. Taking a look at these data sets, trying to examine a particular claim that I've been working on in terms of my theory of perspective taking. And the idea is that learning is the construction of mental models that basically explain things as device representations and that we represent devices through perspectival embodiment and spatial imagery, actually a combination of enactive and depictive representation. So actually here is, for example, if I could get this little baby to play, oh, wait. I don't have audio. Where's audio? No audio plug?

(Setting up the presentation.)

BRIAN MACWHINNEY: Well, the next one's better anyway. Gabriel's cranking at the science museum and he's learning that the trolley moves back and forth by cranking one way and cranking the other. And so he has a physical causal model of what's going on. It's a fairly limited one, but it's actually working fine for him. It's fairly coherent. I turn this way, it goes that way. I turn this way, it goes that way. I'm the agent. I'm the cause of it. Okay? Now, the dad says, hey, now, let's see.

(Playing the video.)

DAD: -- is it useful in the cities. Electricity can be generated far away from where it is needed and transported over the power lines. Oh, Gabriel, you know what this means? Do you know what's going on here?

GABRIEL: What?

DAD: When you turn this crank, you're making electricity. And it goes, let's see, it goes through some wires which I think (inaudible) all the way across to (inaudible). Where are the wires?

BRIAN MACWHINNEY: Okay. So where's the wires? Right. So he, Gabriel, this was the father. He's interacting with Gabriel, and he has a more elaborated device model about this wire that goes, although he couldn't quite find the wire and he couldn't quite find what was driving it. So they're building bigger and bigger device models. And actually, that's what we found, Jim and I, is that every single one of the instructional interactions can be interpreted as the construction of these devices with more intricacy. And, of course, in the end, none of us really know what electricity is. Right? But we have this more and more mechanistic idea rather than I turned it and something magic happened up there, and I don't know how. Now, this one I found really quite interesting.

(Playing the videotape.)

MALE SPEAKER: First, the transmitter is proportional to the difference in how massive they are which is I guess why they always accelerate at the same speed when you drop them which I think is so cosmically weird that I don't -- just assuming that --

FEMALE SPEAKER: Okay. But I am still stuck cause yesterday I thought I understood this. When we stayed after and we were talking, and I said that's why, I don't know, that's why, because of the difference in force depending on mass. That's why they drop the same speed.

MALE SPEAKER: Speed. I have an idea. If --

FEMALE SPEAKER: But I don't get it.

MALE SPEAKER: This is something that --

FEMALE SPEAKER: I can't explain it.

MALE SPEAKER: Let me see. Let me try that. You say this ball here, let's say you have another ball that's ten times more massive where as you were trying to say this is a ten ball, another ball that's --

FEMALE SPEAKER: Ten pounds, right.

MALE SPEAKER: Okay. If you see this larger ball has ten small ball like that, they're all being pulled next to each other. Boom.

FEMALE SPEAKER: Great.

MALE SPEAKER: That's cool.

FEMALE SPEAKER: That is so good. That's great. I love it.

FEMALE SPEAKER: Wow.

FEMALE SPEAKER: And it's so cool.

FEMALE SPEAKER: It's nice. It's (inaudible) to see it right there on the same (inaudible).

MALE SPEAKER: Look at each of these ten balls and (inaudible).

FEMALE SPEAKER: Yes. There is (inaudible) as ten (inaudible) or something cause you can't --

MALE SPEAKER: That's interesting.

FEMALE SPEAKER: That is great. And so --

MALE SPEAKER: So it's just like you have 11 balls the same size, weight.

FEMALE SPEAKER: Right.

MALE SPEAKER: And so what was the --

BRIAN MACWHINNEY: Okay. Right. So, you know, now, you can go on and on with that. You can also take this whole device notion and use it for math and you understand how people -- here it's much more of an I feel the border of an area. I walk around it. My hands cross and so it's a -- and the embodiment here is not necessarily balls dropping or cranks turning, but it's the embodied representation of your own physical movement, a lot of which becomes proceduralized. You know, we understand how to balance equations because we did that maybe back in third grade, you know, and this sort of standard AI notion becomes chunked and proceduralized, and we're working out all these details. But that's the idea. And we are formulating -- so we're formulating essentially diagrams based on propositional representations for all this. The idea is, in the end, well, often learner models are fragmentary; working devices must have full perspectival linkage; and we want to annotate all this so that teachers can facilitate link formation, we hope and believe, by retracing these perspectival links, building up missing links and so on.

Is this cherry picking? So, you know, here's a complete set of everything that has been made available to us so far. We can look at all that. But, of course, it's just an accidental sampling by the people who were willing to share their data. We obviously need much more, but still it's not total cherry picking. We're trying to do a comprehensive look at what we have so far. Obviously, coding has -- I mean coding of this stuff is enormously difficult. And we're going to try to, you know, amplify this by collecting more from physics and chemistry in the context of the NSF Pittsburgh Science of Learning Center. So that's just to give you an example of how I think secondary analysis can work if, you know, if you know what you're, you know, basically looking for. Yeah, again, well, it's just basic (inaudible).

So the conclusion is that the infrastructure here is ready, that we really need more data, more awareness perhaps of the data we have, and data sharing. Data sharing can provide resources for students, teachers, and parents, too. It is not crucial for individual researchers, you know. If you ask yourself do I need to share data? No, I don't have to, you know. But it is crucial for the coherence of this field. Okay. Thank you.