
4/07/12

Practising Metadata: The Text Encoding Initiative and the Architecture of Meaning

Dr Paul Scifleet, Charles Sturt University, Sydney, AU, [email protected]

With Professor Susan P. Williams, University of Koblenz, Koblenz, DE

… research focuses on the organisation of knowledge in virtual spaces. He is interested in the design and management of information resources in networked environments, and in particular, the challenges individuals and organisations face in practice. He is concerned with the ways digital networks and information flows shape (and are shaped by) social and organisational communication and the changing dimensions of this for documenting society.

The social organisation of knowledge: the exploration of how our documentary practices are [socially] constituted

Background and

› Markup languages (MLs) have been available for over 20 years

› There is widespread and growing agreement among professional communities and standards developers about the type of information that must be supported within each domain

› However, the literature indicates that the community vision faces challenges… and that many goals may not be realised (cf. Debreceny & Gray (2003); Wrightson (2007); Sperberg-MacQueen & Burnard (2004))

› Arguments for and against markup languages continue today in social media, where programmers debate whether including metadata in short-message communications (XMPP vs. Atom vs. JSON, and combinations of these) is worth the time and effort

The research challenge

[Diagram labels: XML transformation; XML validation; XMLification; Shared data store; Document input]

› The process and practices of documenting are black-boxed and/or treated as unproblematic and routine

› The articulation work involved in documenting and documentation is largely invisible

How are the definitions of content and the design of encoded documents being determined in practice?

The aim of my work has been to extend existing understandings by providing an in-depth investigation of documentary practices

The Study

› A survey focusing on how MLs are being used in practice
- 2008, 32 respondents, 12 countries: Australia [1], Denmark [1], France [1], Japan [1], Taiwan [1], Canada [2], Italy [2], Netherlands [2], Norway [2], Slovenia [2], United Kingdom [4], USA [13]
- Text Encoding Initiative (TEI): covers all kinds of texts, with a focus on literary and linguistic works; widely adopted
- Chosen for its maturity. The TEI has been very influential in the design of other semantically rich vocabularies, e.g. Extensible Business Reporting Language (XBRL), Health Markup (HL7) and Legal Markup (LegalML)
- The research design comprised 3 data collection instruments: (1) the completion of a questionnaire booklet, (2) the contribution of encoded texts to support the study's analysis, and (3) interviews conducted by the researcher

Opening up the black box

TEI fragment

Criticism and concern

› Paul Caton (2001) has described this type of encoding as “…a signifying practice strongly implicated in a politically conservative human ideology.”

› … because a normative encoding practice favours a mechanistic view of content designation that tends to hide the political and performative aspects of the activity of encoding.

› Geoffrey Nunberg (1996) has described digitisation as the “morselization” of text into uniform, structured, quantifiable components.

› Both share the belief that this involves a changing conception of information, one that calls for closer attention to the practices surrounding the design of the material documentary form if we are to ensure that the move to digital structures and processes does not result in the loss of substantial, meaningful and material human knowledge.

TEI fragment

Research lens: Practice Theory

1. Prioritises the array of practices that converge on the research phenomenon as a field of practice
2. Explores more than the routine and normative elements of the activity
3. Presents a field of practice as a structuring space, where interactions among the elements of the field are determined by the resources available to the field, by their interactions with the habitus (background, knowledge and feel for the game) of the practitioner, and by the interrelationships of these things with each other
4. Is allied with a social constructionist epistemology that emphasises the relational idiom between people and artefacts and their interaction
5. Is an interpretive approach to research that presents practice as theory

Conceptual framework for the Field of Practice

Mixed method study

› 3 interrelated data collection techniques (questionnaire, markup analysis, in-depth interviews) to account for both sides of the encoding relationship (the documenter's relationship to the document)
› Questionnaire survey of 32 TEI markup projects (quant. and qual. questions)
› In-depth automated markup analysis (28 projects submitted 630 text files)
› In-depth interviews with participants representing 15 projects
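The automated markup analysis profiled tag use across the submitted files (the kind of profile shown in Figure 6.12). The study's own tooling is not described in these slides, so this is only a minimal Python sketch, under stated assumptions, of how a batch tag-use profile could be computed. It assumes each file is well-formed XML, ignores namespaces by comparing local tag names, and uses hypothetical function names (`tag_counts`, `batch_profile`) and a made-up two-document batch.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix, e.g. '{http://www.tei-c.org/ns/1.0}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1]


def tag_counts(xml_text: str) -> Counter:
    """Count every element in one encoded document by its local tag name."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    return Counter(local_name(el.tag) for el in root.iter())


def batch_profile(documents: Iterable[str]) -> Dict[str, float]:
    """For a batch, return the share of documents that use each tag at least once."""
    docs = list(documents)
    if not docs:
        return {}
    used: Counter = Counter()
    for text in docs:
        used.update(set(tag_counts(text)))  # count each tag once per document
    return {tag: n / len(docs) for tag, n in sorted(used.items())}


# Hypothetical two-document batch, for illustration only.
batch = [
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    "<p>One</p><p>Two</p></body></text></TEI>",
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    "<div><p>Three</p></div></body></text></TEI>",
]

print(tag_counts(batch[0])["p"])  # 2
print(batch_profile(batch))
# {'TEI': 1.0, 'body': 1.0, 'div': 0.5, 'p': 1.0, 'text': 1.0}
```

Counting each tag once per document, rather than summing raw occurrences, keeps a few very long files from dominating the batch profile; raw per-file counts remain available from `tag_counts`.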

Managing findings

Themes in documentary practice

Conceptual framework for the Field of Practice

Themes in documentary practice

Key findings: ML as Standard

Themes in documentary practice

Key findings: Mission

› Rate the degree of autonomy your unit has in deciding which text encoding projects to proceed with.

› While more than 78% of scholar-practitioners working on digital encoding projects rated their unit as highly autonomous or more, all library respondents rated their autonomy between somewhat autonomous and not at all autonomous

This raised two issues that we explored further in the study…

Questions of Professional Difference

“The TEI’s adoption as a model in projects raised some interesting issues about the whole philosophy of the TEI, which had been designed mostly by scholars who wanted to be as flexible as possible… A rather different philosophy prevails in library and information science where standards are defined and then followed closely – this to ensure that readers can find books easily.” (Susan Hockey, discussing the history of the TEI, 2004)

› Are there different encoding practices emerging within different professional areas of responsibility in the governing institutions, most particularly between academic scholarship and librarianship?

› This study found no evidence of differences between the encoded texts that could be attributed to scholars versus librarians

› A sense of autonomy may play out in practice, and there are different encoding models in play, but they are not attributable to distinctions between scholars and librarians

Example of markup analysis

Figure 6.12: Profile of encoded document batches showing tag use

Paul Scifleet Documentary Practice 181

The Perception of Autonomy

› The study identified significant organisational influences on encoders' decision making, affecting both what is encoded and how it is encoded (depth and detail)

› All participants acknowledged:
- The service orientation of their work (production for a sometimes unknown user)
- Conformance to collaborative projects, or production for database distribution and subscription services

› However, particularly in a scholarly environment, the influence of organisational arrangements on encoding choices, and the institutional influence on documentary practice, is often subtle and evident only in the general context of organisational doxa (factors such as in-kind support and limited resources play a role but are often not recognised for what they are): “…under these conditions many constraints are not noticed until they are breached.”

Key findings

1. An emergent documentary practice
- The study presents a previously unexplored profile of documentary practice
2. The document
- The profiles of encoded documents show no discernible differences based on pre-existing professional domains of practice (neither traditional humanities scholarship nor librarianship)
3. The documentary task
- The categorised list of professionals involved in documentary practice presents an unexpectedly complex picture of collaborative document production
- Project directors; IT specialists; content, metadata and markup specialists; digital collections and preservation specialists; editors; financial management and other specialised advisory roles, including legal
4. A design process
- The data show a consistent design pattern: document analysis, similar encoding procedures, end-user analysis and post-implementation review

Implications and directions

› The conceptual framework of practice informs the logic of practice, but it is not a logical model for document description
- It is arguable that such models are still needed

› The findings present a description of the processes and procedures for working with markup languages
- Helping to achieve a better articulation of the dimensions of the problem space
- Improving confidence, education and training

› Variables and significant relationships for further investigation are now known
- The use of batch text analysis could be extended
- The study (theory and methodology) could be applied to other MLs

› Characteristics of practice inform our understanding (practitioner and academic) of a rich and generative social practice

Social IA:

Investigating the information architecture of social media to support an understanding of the social construction of knowledge (through Web 2.0)

• Perceptions of privacy and consumer information management in Web 2.0
• Enterprise social networks (genres of communication in EMB ~ the Yammer project)
• Social media, data journalism and (the GNIP project)

Picture: AAP Image/Lukas Coch, 7 March, 2012

Implications and directions

› How can information managers work with the information architecture of social media (acquiring, managing, disseminating)?
› How can that information architecture support qualitative research in social studies?
› QMA
› Genre analysis
› Communicative practice (Jürgen Habermas and speech act theory)

Some references made in this presentation

› Caton, P.: Towards a Politics of Text Encoding. Association for Computers and the Humanities/Association for Literary and Linguistic Computing Annual Conference, New York (2001)
› Debreceny, R., Gray, G.L.: The production and use of semantically rich accounting reports on the Internet: XML and XBRL. International Journal of Accounting Information Systems 2, 47-74 (2003)
› Hockey, S.: History of Humanities Computing. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities, pp. 3-19. Blackwell Publishing, Malden, MA (2004)
› Nunberg, G.: Farewell to the Information Age. In: Nunberg, G. (ed.) The Future of the Book, pp. 103-138. University of California Press, Berkeley and Los Angeles (1996)
› Sperberg-MacQueen, C.M., Burnard, L.: Guidelines for Electronic Text Encoding and Interchange. The TEI Consortium (2004)
› Wrightson, A.: Is it Possible to be Simple Without Being Stupid? Exploring the Semantics of Model-driven XML. Extreme Markup Languages (2007)

Questions?

[email protected]