Digital Content Annotation and Transcoding for a Listing of Recent Titles in the Artech House Digital Audio and Video Library, Turn to the Back of This Book
Total Page:16
File Type:pdf, Size:1020Kb
Digital Content Annotation and Transcoding For a listing of recent titles in the Artech House Digital Audio and Video Library, turn to the back of this book. Digital Content Annotation and Transcoding Katashi Nagao Artech House Boston * London www.artechhouse.com Library of Congress Cataloging-in-Publication Data Nagao, Katashi. Digital content annotation and transcoding / Katashi Nagao. p. cm. — (Artech House digital audio and video library) Includes bibliographical references and index. ISBN 1-58053-337-X (alk. paper) 1. File processing (Computer Science). 2. Data structures (Computer Science). 3. Information storage and retrieval systems. 4. Digital media. 5. File conversion (Computer Science). I. Title. II. Series. QA76.9.F53N34 2003 005.74’1—dc21 2002043691 British Library Cataloguing in Publication Data Nagao, Katashi Digital content annotation and transcoding. — (Artech House digital audio and video library) 1. Digital media 2. Multimedia systems I. Title 006.7 ISBN 1-58053-337-X Cover design by Christina Stone q 2003 ARTECH HOUSE, INC. 685 Canton Street Norwood, MA 02062 All rights reserved. Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system without permission in writing from the publisher. All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark. International Standard Book Number: 1-58053-337-X Library of Congress Catalog Card Number: 2002043691 10987654321 To Kazue Contents Preface xiii Acknowledgments xv 1 Introduction: Survival in Information Deluge 1 1.1 Digital Content Technology 2 1.2 Problems of On-Line Content 3 1.3 Extension of Digital Content 4 1.4 Organization of This Book 9 1.4.1 Chapter 2—Transcoding: A Technique to Transform Digital Content 9 1.4.2 Chapter 3—Annotation: A Technique to Extend Digital Content 9 1.4.3 Chapter 4—Semantic Annotation and Transcoding: Towards Semantically Sharable Digital Content 9 1.4.4 Chapter 5—Future Directions of Digital Content Technology 10 References 10 2 Transcoding: A Technique to Transform Digital Content 11 2.1 Representation of Digital Content 13 2.1.1 HTML 13 2.1.2 XHTML 15 2.1.3 WML 19 2.1.4 VoiceXML 21 2.1.5 SMIL 23 vii viii Digital Content Annotation and Transcoding 2.2 Content Adaptation 24 2.2.1 XSLTs 25 2.3 Transcoding by Proxy Servers 28 2.3.1 HTML Simplification 28 2.3.2 XSLT Style Sheet Selection and Application 31 2.3.3 Transformation of HTML into WML 32 2.3.4 Image Transcoding 34 2.4 Content Personalization 35 2.4.1 Transcoding for Accessibility 35 2.5 Framework for Transcoding 43 2.5.1 Transcoding Framework in Action 46 2.5.2 Limitations of the Transcoding Framework 49 2.5.3 An Implementation of Transcoding Proxies 50 2.5.4 Technical Issues of Transcoding 54 2.5.5 Deployment Models of Transcoding 55 2.6 More Advanced Transcoding 58 References 59 3 Annotation: A Technique to Extend Digital Content 61 3.1 Frameworks of Metadata 62 3.1.1 Dublin Core 62 3.1.2 Warwick Framework 65 3.1.3 Resource Description Framework 67 3.1.4 MPEG-7 75 3.2 Creation of Annotations 81 3.2.1 Annotea 82 3.2.2 Web Site–Wide Annotation for Accessibility 92 Contents ix 3.3 Applications of Annotation 101 3.3.1 Multimedia Content Transcoding and Distribution 101 3.3.2 Content Access Control 105 3.4 More on Annotation 116 3.4.1 Annotation by Digital Watermarking 116 3.4.2 Semantic Interoperability by Ontology Annotation 117 3.4.3 Resource Linking and Meta-Annotation 122 References 128 4 Semantic Annotation and Transcoding: Towards Semantically Sharable Digital Content 131 4.1 Semantics and Grounding 133 4.1.1 Ontology 133 4.1.2 Information Grounding 135 4.2 Content Analysis Techniques 138 4.2.1 Natural Language Analysis 138 4.2.2 Speech Analysis 149 4.2.3 Video Analysis 152 4.3 Semantic Annotation 155 4.3.1 Annotation Environment 157 4.3.2 Annotation Editor 158 4.3.3 Annotation Server 158 4.3.4 Linguistic Annotation 160 4.3.5 Commentary Annotation 164 4.3.6 Multimedia Annotation 167 4.3.7 Multimedia Annotation Editor 168 4.4 Semantic Transcoding 175 4.4.1 Transcoding Proxy 176 4.5 Text Transcoding 178 4.5.1 Text Summarization 178 x Digital Content Annotation and Transcoding 4.5.2 Language Translation 181 4.5.3 Dictionary-Based Text Paraphrasing 181 4.6 Image Transcoding 188 4.7 Voice Transcoding 189 4.8 Multimedia Transcoding 189 4.8.1 Multimodal Document 191 4.8.2 Video Summarization 191 4.8.3 Video Translation 193 4.8.4 Multimedia Content Adaptation for Mobile Devices 193 4.9 More Advanced Applications 194 4.9.1 Scalable Platform of Semantic Annotations 194 4.9.2 Knowledge Discovery 196 4.9.3 Multimedia Restoration 198 4.10 Concluding Remarks 203 References 204 5 Future Directions of Digital Content Technology 209 5.1 Content Management and Distribution for Media Businesses 210 5.1.1 Rights Management 211 5.1.2 Portal Services on Transcoding Proxies 213 5.1.3 Secure Distribution Infrastructure 214 5.2 Intelligent Content with Agent Technology 214 5.2.1 Agent Augmented Reality 215 5.2.2 Community Support System 217 5.3 Situated Content with Ubiquitous Computing 218 5.3.1 Situation Awareness 219 5.3.2 Augmented Memory 220 5.3.3 Situated Content in Action 220 Contents xi 5.4 Towards Social Information Infrastructures 222 5.4.1 Content Management and Preservation 225 5.4.2 Security Mechanisms 225 5.4.3 Human Interfaces 226 5.4.4 Scalable Systems 227 5.5 Final Remarks 228 References 228 About the Author 231 Index 233 Preface Today digital content is no longer limited to exclusive people. In addition to World Wide Web pages, which are the epitome of digital content, there are many more familiar examples such as i-mode for cell phones, photographs taken by digital cameras, movies captured by digital camcoders, compact discs, music clips downloaded from the Internet, and digital broadcasting. Audiences do not have to care if content is digital or not; however, whether they are produced digitally or not means a great deal. Digitization has made it easier for us to duplicate these products, and as a result, information remains stable. In short, digitization has facilitated the storage of information. Information storage, however, is not the only advantage of digitization. It is more important to recognize that manipulation (like editing) has become automatic to some degree. This is not to say that information was unable to be stored and edited at all before. Long ago, for instance, some could change the information on a paper, such as a Buddhist scripture, into another form, such as inscription on a stone. In modern times, some painters imitate paintings and add their own originality, and even some ordinary people record TV programs to edit to extract their favorite scenes and reproduce their own programs. Digital content enables more because it is programmable to manipulate information. Suppose a person consistently records a favorite program. Its first 15 minutes is boring, but the person wants to see the opening. So the person has to record the whole program, from the beginning to the end. If the structure of the TV program is the same, it should be possible to automatically extract the unwanted first 15 minutes by programming. In another example, it would be difficult to extract a specific part from printed material that contains more than one million pages; it would be very easy, however, to extract the necessary part with a program for extracting topics whose content is digitized. Thus, digital content has largely changed the way of dealing with information. This book describes some experiments of further developing xiii xiv Digital Content Annotation and Transcoding digital content to make it possible for people to customize and enjoy it more interestingly. This book focuses on two technologies: transcoding and annotation. Transcoding is a technology to program the transformation of digital content. This enables content to be changed to the most preferable form for audiences. Annotation is a technology that provides additional information to extend content so as to facilitate the programming. Annotation can also play the role of hyperlink, which connects several on-line resources. The purpose of this book is not only to introduce these two technologies precisely but to propose the system that integrates these two and to prospect the future directions. I have great expectations for the future technology of digital content for I believe that the digital content produced by a number of folks can motivate people to better understand both themselves and world events so that society at large will change for the better. I also believe that it is the principal responsibility for researchers to produce a technology that maximizes the utilization of the digital content. It might be outrageous to intend to change social systems; however, on the condition that these technologies can facilitate the voluntary actions of many people and can turn them to the right direction, people will come to notice the nature of the issue. As a result, these technologies will put the things that may at first seem impossible into practice.