Yao, K. "Algorithms and Architectures for Multimedia and Beamforming Communications"

The VLSI Handbook. Ed. Wai-Kai Chen. Boca Raton: CRC Press LLC, 2000.

© 2000 by CRC PRESS LLC

79 Algorithms and Architectures for Multimedia and Beamforming in Communications

Flavio Lorenzelli
ST Microelectronics, Inc.

Kung Yao
University of California at Los Angeles

79.1 Introduction
79.2 Multimedia Support for General Purpose Computers
    Extended Instruction Set and Multimedia Support for General Purpose Processors • Multimedia Processors and Accelerators • Architectures and Concepts for Multimedia Processors
79.3 Beamforming Array Processing and Architecture
    Interference Rejecting Beamforming Antenna Arrays • Smart Antenna Beamforming Arrays

79.1 Introduction

In recent years, we have experienced explosive growth in various technologies for information processing, transmission, distribution, and storage. In this chapter, we address two distinct but equally important sets of algorithms and architectures in communication systems: multimedia processing and beamforming array processing. The first problem is motivated by the need for efficient, real-time presentation of still images, live-action video, and audio information, commonly called "multimedia." To most users, a multimedia presentation is easier and more natural to comprehend than the traditional textual form of presentation on paper. The second problem is motivated by the desire to transfer tremendous amounts of information from one location to another under limited frequency-polarization-space-time channel constraints. Beamforming array processing technologies are used to coherently transmit or receive information under these channel constraints and can significantly improve the performance of a single transmit or receive antenna system. Since both of these problems are of fundamental interest in VLSI and practical implementations of modern communication systems, we consider in detail their basic signal processing algorithmic and architectural limitations. In Section 79.2, we first consider the intense computational requirements for displaying images and video on a general-purpose PC. The pros and cons of using a separate processor or accelerator to enhance the main CPU are also introduced. Then, "Extended Instruction Set and Multimedia Support for General Purpose Processors" discusses issues related to the extended instruction set for media support


for a general-purpose PC, while "Multimedia Processors and Accelerators" considers media processors and accelerators in some detail. "Architectures and Concepts for Multimedia Processors" deals with the architectures and concepts for multimedia processors. The very long instruction word (VLIW) architecture and SIMD with its associated subword parallelism are presented and compared. Four tables are used in Section 79.2 to summarize some of the basic operations involved. In Section 79.3, we discuss the use of the beamforming operation, originally motivated by various military and aerospace applications but more recently found in many civilian applications. In "Interference Rejecting Beamforming Antenna Arrays", we consider some early, simple sidelobe-canceller beamforming arrays based on the LMS adaptive criterion; then we discuss in some detail various aspects of a recursive least-squares criterion, QR decomposition-based beamforming array. In "Smart Antenna Beamforming Arrays", we consider the motivation and evolution of the smart antenna system and its implications.

79.2 Multimedia Support for General Purpose Computers

In any computer store, it is fairly common nowadays to see rows of computer screens running video and audio clips. Voices, sounds, and moving pictures are attracting large numbers of users who now expect features such as voice mail and videoconferencing as standard parts of their software applications. Advertisements for home computers virtually never miss the buzzwords "multimedia," "interactive," and "Internet." Over time, audio and video, as well as network connectivity, have become integral parts of many user applications. This change in users' expectations has been the cause of a major shift in the operations required of a general-purpose computer. The focus of computer designers and vendors has moved from plain word processing and spreadsheet applications to highly demanding tasks, such as real-time compression and decompression of audio and video streams. CPUs are now required to process large amounts of data fast and to allow different processes to run simultaneously. For instance, one might have several windows open at the same time, one running a streaming video, another playing an audio file, another showing an open Internet connection, etc. In addition, PCs are now expected to provide high-quality graphics, 3-D moving pictures, good-quality sound, and full-motion support (e.g., MPEG-2 decoders). Most computer vendors and processor designers are therefore making huge efforts to alter or enrich their products so as to be able to provide these new multimedia services. In order to better envision the complexity required by some multimedia tasks, consider what is involved in operations such as 3-D image rotation and zooming. Each 3-D object consists of hundreds or even thousands of polygons, usually triangles. The smaller the polygons (and consequently the more numerous), the more detailed the object.
When an object moves on the screen (rotates or moves forward/backward), the program must recalculate every vertex of every polygon by performing a number of matrix computations. A typical 3-D object might have 1000 polygons, which means that each time it moves, at least 84,000 multiplies and adds are executed. The new MPEG-4 audio/video standard (release date November 1998) addressed, among other things, the need for highly interactive functionality, i.e., interactive access to and manipulation of audio-visual data, and content-based interactivity. The standard provides an object-layered bit-stream; each object can be separately decoded and manipulated (scaled, rotated, translated). Even 300-MHz Pentium IIs cannot supply coordinates for more than a million polygons per second. Accelerator cards or graphics chips are often required for high-end, graphics-intensive applications to render polygons faster than the CPU alone can. One of the major drivers of multimedia PCs has been Microsoft's APIs, particularly the recent DirectX interfaces. With DirectX, it is easier to replace audio and video hardware while retaining software compatibility. However, as long as older applications remain alive, multimedia PCs will have to provide compatibility with existing register-level standards, e.g., SuperVGA and SoundBlaster. Register-level interfaces were not designed to support more advanced audio/video features, e.g., 3-D graphics. Most game programs avoid old APIs because of their poor performance; they run directly under DOS and access audio and video chips directly. Microsoft created DirectX, a new set of APIs, in order to avoid having to modify existing programs to support new graphics devices as they appear on the market.
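To make the arithmetic above concrete, here is a minimal C sketch (ours, not the chapter's) of the per-vertex work: each vertex is multiplied by a 4×4 homogeneous transform matrix, which costs 16 multiplies and 12 adds, so a 1000-triangle object (3000 vertices) needs 3000 × (16 + 12) = 84,000 multiplies and adds each time it moves.

```c
/* Sketch of per-frame vertex recalculation: one 4x4 homogeneous
   matrix-vector product per vertex costs 16 multiplies + 12 adds. */

typedef struct { float x, y, z, w; } Vec4;

static Vec4 transform(float m[4][4], Vec4 v) {
    Vec4 r;
    r.x = m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3]*v.w;
    r.y = m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3]*v.w;
    r.z = m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3]*v.w;
    r.w = m[3][0]*v.x + m[3][1]*v.y + m[3][2]*v.z + m[3][3]*v.w;
    return r;
}

/* multiplies and adds needed to move an n-triangle object once:
   3 vertices per triangle, (16 + 12) operations per vertex */
static long ops_per_move(long n_triangles) {
    return n_triangles * 3 * (16 + 12);
}
```

For 1000 triangles, `ops_per_move` reproduces the chapter's figure of 84,000 operations per object move.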


Multimedia is likely to reach the consumer electronics market also in ways that are still unclear, for instance, in what has been dubbed home entertainment. Vendors envision a gamut of devices which combine the functions of a PC with traditional TV sets. PCs with dual-purpose monitors and set-top boxes will soon offer functionalities ranging from DVD to videophone and interactive 3-D gaming. The PC might become the basis for a living-room entertainment center which includes a TV tuner, DVD, Dolby audio, 3-D graphics, etc. Different companies have chosen different approaches to multimedia and made widely different decisions. Some companies, among them Intel and Sun, have added circuitry to their new-generation processors and enhanced their instruction sets in order to handle multimedia data. Other companies, e.g., Philips and Chromatic, have preferred to build whole new processors from the ground up, solely in charge of audio, video, and network connectivity, which can work either independently or in alliance with another processor. There are also those who have been thinking of thoroughly new architectures and concepts, which could open entirely different perspectives and computation paradigms. In the following, we provide a (necessarily incomplete and temporary) snapshot of the situation as it appears at the end of the 1990s, when the authors are writing. The solutions pursued by the various companies display tradeoffs. Which approach will be the most successful is open to debate and will probably be understood only in hindsight. Nonetheless, some considerations can be made now. The main drawback of external multimedia accelerators is undoubtedly the cost.15 The home PC market has been very sensitive to cost, and most vendors are unwilling to add expensive features which could deter possible customers and therefore reduce their market penetration.
On the other hand, software developers are unlikely to generate applications tailored for multimedia processors and accelerators unless there is a sufficiently large installed base. The alternative to a separate processor or accelerator is to enhance the main CPU to allow it to process multimedia data.23 As of today, few CPUs have enough "horsepower" to simultaneously handle the operating system, all standard applications, as well as audio decoding and mixing, video decoding, 2-D and 3-D graphics, and modem functions. An unenhanced general-purpose CPU such as a Pentium is simply unable to keep up with digitized audio samples and pixel data. Moreover, the multimedia additions translate into larger chips with lower yield. On the other hand, if the CPU is enhanced for multimedia, no optional hardware is required to perform the multimedia tasks, and the installed base can grow naturally, so to speak. Software developers will be more willing to write applications for established platforms (e.g., Pentium), whereas software vendors can target these systems with higher sales expectations. The additional hardware required for multimedia enhancement is usually only a small fraction of chip space, with minimal added cost to the overall system. As processors become faster and smaller, this is the solution that most foresee eventually prevailing, at least in low- to mid-range systems. Already, existing multimedia-enhanced systems show great promise, especially when compared with the cost and performance of add-in chips and boards. What can be said in favor of separate graphics subsystems is that the same money spent on a better graphics accelerator often buys a much larger effective performance improvement (on multimedia applications) than one would get by upgrading the CPU. Users may be more willing to spend money on a system with greater multimedia performance than on a faster number-crunching CPU.
In fact, graphics subsystems have accounted for an increasing percentage of the total cost of multimedia PCs, often at the expense of the central processor. Faster CPUs are still unavoidable where FP geometry calculations and rendering performance are crucial, but geometry calculations are being included in graphics chips as well.

Extended Instruction Set and Multimedia Support for General Purpose Processors

The approach of processing the multimedia data on the CPU itself has been named native signal processing (NSP). This is the approach favored by Intel, Sun, Motorola, Hewlett-Packard, and the makers of the PowerPC line. The rationale is that processors will soon be sufficiently powerful to perform tasks


such as real-time video compression and decompression, video teleconferencing, etc., without the aid of any add-in chips or cards. Tomorrow's host CPUs will be able to execute in software most, if not all, of the tasks that today are handled by media processors and accelerators. Certainly, media processors' performance will increase over time just as much as that of regular CPUs. The question, though, is what is required of a media processor? Multimedia is by definition directly related to human perception. Once a multimedia algorithm reaches the limit of human perception, there is little reason to make further improvements. Of course, users will always be asking for more features, more simultaneously running real-time applications, etc., but it seems reasonable that eventually the drive for performance increases will level off. Both audio (including 3-D sound, up- and down-sampling, 32-channel mixing, Dolby AC-3) and 2-D graphics (GUI acceleration) have already reached the limits of human perception on any high-end Pentium system. Algorithms that still need a good deal of improvement are 3-D graphics and video compression (although for the latter, bandwidth is the key, not human perception). CPUs may soon catch up where multimedia algorithms have not yet reached their limits. Intel's 233-MHz Klamath is able to execute full DVD decoding, including Dolby AC-3 audio and MPEG-2 video, and future processors will only do better. In the past, markets for hardware accelerators have been destroyed by software decoders that run on any fast CPU, e.g., MPEG-1 decoders. NSP requires the expansion of the instruction set architecture (ISA), as well as possible modification of the existing CPU. The additional circuitry for multimedia data processing takes advantage of the fact that most multimedia applications require calculations in lower precision (16- or 32-bit) than provided by the ALUs used for standard integer and floating point (FP) data.
This allows for the generation of two to eight results per clock cycle, thereby offering sufficient data rates for intensive multimedia applications. Fast computation may require additional optimized hardware: faster and possibly split buses, parallel computational units or superpipelines, larger L1 caches, small misprediction penalties, faster CPUs at lower supply voltages, etc. As an example, Cyrix has offered a chip, the MediaGX,16 that delivers Pentium-class performance and software compatibility while adding an integrated graphics accelerator and a PCI interface. The MediaGX improves performance by taking advantage of the tight coupling between CPU and system logic. The effects of high bandwidth demands are minimized by use of advanced compression techniques, which reduce the size of the frame buffer when stored in memory. With this technique, up to 95% of the frame-buffer reads from memory can be eliminated. The MediaGX also provides full compatibility with VGA graphics and SoundBlaster audio. Later generations of this chip also include MMX support and acceleration features for MPEG-1 video, such as pixel scaling and color space conversion.

Intel's MMX

Intel improved its architecture's multimedia performance by extending the instruction set.13 The 57 new instructions, see Table 79.1, are referred to as MMX and are devised to accelerate calculations common in audio, 2-D and 3-D graphics, video, speech synthesis and recognition, and data communication. The overall performance improvement is estimated to be 50 to 100% for most multimedia applications (e.g., MPEG-1 decoding, pixel manipulation). Other companies, such as AMD and Cyrix, have incorporated Intel's MMX. MMX has no impact on the operating system. Applications can take advantage of MMX in either of two ways: by calling MMX-enabled drivers, or by adding MMX instructions to critical routines.
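The chapter does not spell out the MediaGX's actual compression scheme, so as a hedged stand-in for the general idea, the sketch below run-length encodes a frame-buffer scanline: on the largely uniform screens typical of GUI work, long runs of identical pixels collapse into a few (value, count) pairs, so most per-pixel memory reads can be skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative run-length encoder for one 8-bit scanline.
   (Our example only; the MediaGX's real scheme is unspecified here.) */

typedef struct { uint8_t value; size_t count; } Run;

/* Encode n pixels into (value, count) runs; returns number of runs.
   out must have room for n runs in the worst case. */
static size_t rle_encode(const uint8_t *px, size_t n, Run *out) {
    size_t nruns = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && px[j] == px[i]) j++;  /* extend the run */
        out[nruns].value = px[i];
        out[nruns].count = j - i;
        nruns++;
        i = j;
    }
    return nruns;
}
```

A mostly blank scanline of thousands of pixels reduces to a handful of runs, which is the kind of traffic reduction the text describes.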
One advantage of relying on drivers is that an application can automatically take advantage of a hardware accelerator for 3-D graphics, sound, or MPEG decoding if one is installed. MMX instructions can be used in any processor mode and at any privilege level. They generate no new interrupts or exceptions. The eight new MMX registers are mapped onto the existing floating-point registers, so that programs cannot use MMX and FP instructions simultaneously in the same routines. For 3-D graphics, MMX is typically used for 3-D rendering routines, while geometry calculations remain in FP. The new instructions use a SIMD model, i.e., the same instruction operates on many values (eight bytes, four words, or two double words). On a two-way superscalar machine such as the P55C,19 MMX


instructions can be paired with integer instructions or other MMX instructions as long as they do not use the same functional units. One drawback of MMX is the lack of a multiply or multiply-add for 32-bit operands; this choice is probably due to real-estate considerations. This drawback prevents routines such as 3-D geometry calculations and wavetable sound from taking advantage of MMX. The problem of managing separate versions of each application for MMX and non-MMX systems can be solved by checking bit 23 of the CPUID feature flags to determine whether a processor implements MMX. The check can be performed in software. While MMX can boost the multimedia performance of low-end systems with no accelerators, it will probably eliminate media processors and accelerator chips from low- and mid-range systems as processor speeds increase. Intel has announced another extension, KNI, which will consist of 70 new instructions that emphasize parallel FP computations. In the meantime, AMD, Cyrix, and IDT have formally introduced a new x86 instruction set extension for speeding up 3-D graphics applications. Their new set is dubbed 3DNow!4 and consists of instructions that pack two 32-bit FP values into a single MMX register and compute two results in parallel. 3DNow! provides basic add, subtract, divide, multiply, and square-root operations. The 3DNow! single-precision FP format is IEEE-754 compatible, but results computed by 3DNow! do not comply with the IEEE standard; in particular, not all four rounding modes dictated by the IEEE standard are supported. When KNI becomes available in Intel's chips, 3DNow! is likely to be soon forgotten, since no software developer can justify supporting 3DNow! instead of KNI.

Other IS Extensions

Companies other than Intel have proposed instruction set extensions of their own. Sun proposed VIS for its UltraSparc. Similar to MMX, VIS includes 8-, 16-, and 32-bit parallel operations, uses FP registers, and performs saturating and unsaturating arithmetic.
VIS adds some specialized instructions, which accelerate MPEG motion estimation and video compression algorithms (MPEG-1). Some instructions are included to accelerate the discrete cosine transform (DCT), pixel masking, and 3-D rendering. VIS instructions are encoded in three-operand form and operate on 32 registers. The PowerPC ISA is extended by the AltiVec instructions, which in many respects go far beyond their 3DNow! counterparts. AltiVec offers more specialized instructions, such as 2-to-the-power and base-2 logarithm estimation, data permutation, and various multiply-accumulate operations. Overflow and other events are handled more faithfully to the IEEE standards than in 3DNow!, a feature that helps meet the requirements of advanced audio applications. The number of dedicated registers is 32.
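The subword SIMD model shared by MMX, VIS, and AltiVec can be illustrated by emulating an MMX-style packed saturating add (in the spirit of PADDSW) in plain C: four 16-bit words packed in one 64-bit quantity are added lane by lane, each lane saturating independently. This is an illustrative emulation, not vendor code.

```c
#include <stdint.h>

/* Clamp a 32-bit sum into the signed 16-bit range (signed saturation). */
static int16_t sat16(int32_t v) {
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Emulate a packed saturating add: four independent 16-bit lanes
   inside a 64-bit "register", as in MMX's SIMD subword model. */
static uint64_t paddsw_emulated(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int16_t wa = (int16_t)(a >> (16 * i));
        int16_t wb = (int16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)sat16((int32_t)wa + (int32_t)wb) << (16 * i);
    }
    return r;
}
```

On real hardware all four lane additions happen in one instruction, which is how the extensions deliver two to eight results per clock on low-precision media data.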

Multimedia Processors and Accelerators

Media processors are whole new processors that can handle multimedia data efficiently.9,12,14 By definition, they are programmable processors which can simultaneously accelerate the processing of multiple data types, including digital video, digital audio, computer animation, text, and graphics. They are required to offer high performance on these tasks, while the CPU is allowed to concentrate, uninterrupted, on other applications, be they spreadsheets or word processors. Media processors combine the advantages of hardwired solutions, such as low cost and high performance, with the flexibility of software implementations. They also eliminate the need for a number of different audio/video subsystems (3-D acceleration, MPEG playback, wavetable audio, etc.). There is no unique way to build a media processor.26 The first ones to reach the market (NVidia NV1 and Philips TriMedia TM-1) combined powerful DSPs for 2-D and 3-D acceleration, audio (channels of CD-quality sound), possibly MPEG decoding, and broadband demodulation. The Philips TM-1 also has units for video preprocessing (variable-length decoding, image scaling, and color space conversion). Other makers, notably Chromatic with its MPact Media Engine, preferred a single powerful processor that performs audio, video, and telephony tasks.20 The MPact2 also adds DVD support,


TABLE 79.1 The MMX Opcodes

Group                        Mnemonic              Description

Data Transfer, Pack, Unpack  MOV[D,Q]              Move [double/quad] to/from MMX reg
                             PACKUSWB              Pack W → B with unsigned sat
                             PACKSS[WB,DW]         Pack W → B, D → W w/signed sat
                             PUNPCKH[BW,DW,DQ]     Unpack high-order [B,W,D] from MMX reg
                             PUNPCKL[BW,DW,DQ]     Unpack low-order [B,W,D] from MMX reg
Arithmetic                   PADD[B,W,D]           Packed add on [B,W,D]
                             PADDS[B,W]            Saturating add on [B,W]
                             PADDUS[B,W]           Unsigned saturating add on [B,W]
                             PSUB[B,W,D]           Packed subtraction on [B,W,D]
                             PSUBS[B,W]            Saturating sub. on [B,W]
                             PSUBUS[B,W]           Unsigned sat. sub. on [B,W]
                             PMULHW                Multiply packed words, get high bits
                             PMULLW                Multiply packed words, get low bits
                             PMADDWD               Multiply packed words, add pairs of products
Shift                        PSLL[W,D,Q]           Packed shift, left, logical
                             PSRL[W,D,Q]           Packed shift, right, logical
                             PSRA[W,D]             Packed shift, right, arithmetical
Logical                      PAND                  Bitwise AND
                             PANDN                 Bitwise AND NOT
                             POR                   Bitwise OR
                             PXOR                  Bitwise XOR
Compare                      PCMPEQ[B,W,D]         Packed compare if equal
                             PCMPGT[B,W,D]         Packed compare if greater than
Misc                         EMMS                  Empty MMX state

Brackets indicate options. B = byte. W = word. D = double word. Q = quad word.

videophone operation, as well as MMX support to reduce host-processor overhead.25 The specifics regarding clock speed, core performance, etc., are given in Tables 79.2 and 79.3. Most media processors use VLIW and SIMD techniques to achieve high performance. These techniques are summarized in a later section. Among the issues that designers have to face are the following:

• The load balance between the media processor and the host CPU. Some media processors require significant preprocessing on the host CPU, which can cause a noticeable slowing of the system. For instance, MPEG-1 decoding with a high-resolution display can consume as much as half the processing power of a 100-MHz Pentium when Chromatic's MPact1 is used.
• Compatibility with legacy code, such as SoundBlaster and VGA emulation. Some makers guarantee some kind of compatibility, while others (e.g., Philips) do not.
• Software flexibility. Some designs are reprogrammable and thus have the advantage of being upgradable as audio, video, and graphics standards change and improve.
• Connections to outside devices. Media processors should also provide connections to PCI, an external DAC, and an audio codec.

Software development for the different media processors usually depends on both makers and outside vendors (with the notable exception of Chromatic, which did not make it possible for anyone else to develop software for MPact). Makers usually offer compilers and code libraries for the most common multimedia algorithms, while outside partners and OEMs develop custom software. All vendors of media processors are more or less at Microsoft's mercy for the APIs needed to run their processors under Windows.


Not all media accelerators are so ambitious as to target the complete multimedia market. Indeed, there has recently been a trend for media processors to specialize, and many companies have narrowed their scope to specific tasks. To name a few: C-Cube's Video RISC processor performs variable-length coding, data compression, and motion estimation for use in MPEG-1 and MPEG-2 encoding by digital satellite TV broadcasters; Mitsubishi's D30V includes audio and video circuitry and a variable-length decoder; Fujitsu's multimedia assist (MMA) integrates a graphics controller and audio interfaces and can be applied to DVD; Rendition's Vérité V1000 is a 32-bit RISC processor acting as a programmable 3-D accelerator, used for antialiasing algorithms and special effects; NVidia's NV3, succeeding the unsuccessful NV1, gives up audio support and programmability in favor of 3-D performance. Many makers have included special DVD playback support (Chromatic's MPact2, Digital's SA-1500, as well as Mitsubishi's D30V). Philips has focused on videoconferencing systems, which include echo cancellation, voice-tracking camera control, a Web server, and GUI programs.

TABLE 79.2 Comparison of Some Media Processors

                          NVidia     Chromatic       Chromatic
                          NV1        MPact R/3000    MPact 2/6000

Clockspeed (MHz)          50         62.5            125
Peak Int. Perf. (GOPS)    0.35       3.0             6.0
Peak FP Perf. (GFLOPS)    n/a        n/a             0.5
Memory Bandwidth (MB/s)   100-200    500             1200
3-D Geometry (Mpolys/s)   n/a        n/a             1.0
3-D Setup (Mpolys/s)      n/a        n/a             1.2
3-D Fill Rate (Mpel/s)    n/a        5               42
2-D Acceleration          Yes        Yes             Yes
MPEG-1 decode             No         Yes (S/W)       Yes (S/W)
MPEG-1 encode             No         Yes (H/W)       Yes (H/W)
MPEG-2 decode             No         Yes (S/W)       Yes (S/W)
Videoconferencing         No         Yes             Yes
Telephony                 No         Yes             Yes

                          Philips    Philips    Samsung
                          TM-1       TM-PC      SMP-1G

Clockspeed (MHz)          100        100        160
Peak Int. Perf. (GOPS)    3.8        3.8        10.2
Peak FP Perf. (GFLOPS)    0.5        0.5        1.6
Memory Bandwidth (MB/s)   400        400        1280
3-D Geometry (Mpolys/s)   0.75       0.75       0.75
3-D Setup (Mpolys/s)      1.0        1.0        0.75
3-D Fill Rate (Mpel/s)    n/a        n/a        n/a
2-D Acceleration          No         Yes        Yes
MPEG-1 decode             Yes (S/W)  Yes (S/W)  Yes (S/W)
MPEG-1 encode             Yes (S/W)  Yes (S/W)  Yes (S/W)
MPEG-2 decode             Yes (S/W)  Yes (S/W)  Yes (S/W)
Videoconferencing         Yes (S/W)  Yes (S/W)  Yes (S/W)
Telephony                 Yes        Yes        Yes

Recently, a number of companies have announced 3-D chips, among others 3DLabs, ATI, Number Nine, NVidia, S3.10,11 3-D chips have to face two main issues to provide the required performance levels: computational resources and memory bandwidth. The calculations necessary for 3-D consist of scene management (preparation of a 3-D object database along with information on light source and virtual camera), geometry calculations (transforms and setup), and rendering (shading and texturing applied to the pixels) (see Table 79.4). The host processor is usually in charge of scene definition, whereas rendering is handled by 3-D graphics hardware. As for geometry calculations, they are usually split equally


TABLE 79.3 Comparison of Some Media Processors (continued)

                          Fujitsu    Mitsubishi   Rendition
                          MMA        D30V         Vérité

Clockspeed (MHz)          180        250          50
Peak Int. Perf. (GOPS)    1.08       1.0          0.1
Peak FP Perf. (GFLOPS)    n/a        n/a          n/a
Memory Bandwidth (MB/s)   720        n/a          400
3-D Geometry (Mpolys/s)   n/a        n/a          n/a
3-D Setup (Mpolys/s)      n/a        n/a          0.15
3-D Fill Rate (Mpel/s)    n/a        n/a          25
2-D Acceleration          n/a        n/a          Yes (H/W)
MPEG-1 decode             Yes (S/W)  Yes (S/W)    No
MPEG-1 encode             No         No           No
MPEG-2 decode             Yes (S/W)  Yes (S/W)    No
Videoconf./Teleph.        n/a        n/a          n/a

between the host processor (transforms) and the graphics chips (setup). Thanks to the evolution of 3-D chips, the host processor is often relieved of many of these tasks. The result is higher polygon throughput and better visual quality. The performance goal that all makers have tried to reach, and exceed, is a million polygons per second. The increased bandwidth demands are met by on-chip texture caches and faster, wider local memory arrays in the graphics subsystems. Memory bandwidth demands can be reduced by applying texture compression, usually based on vector quantization algorithms for still images. The bandwidth problem will also require larger internal SRAM arrays for texture caches and embedded DRAM for on-chip frame buffers. Off-chip memory will still be necessary for larger frame buffers. An on-chip setup engine and motion compensation logic may be added, as in ATI's RagePro design, to accelerate MPEG-2 decoding and DVD playback.
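The vector-quantization texture compression mentioned above can be sketched as follows (our example; actual codecs are vendor-specific): each small block of texels is replaced by the index of its nearest codebook vector, so a 4-byte block of texels shrinks to a 1-byte index.

```c
#include <stddef.h>

/* Illustrative vector-quantization encoder for texture blocks.
   BLOCK texels form one vector; the encoder emits only the index
   of the nearest codebook vector (squared Euclidean distance). */

#define BLOCK 4  /* texels per vector (hypothetical block size) */

static int nearest_codeword(const unsigned char block[BLOCK],
                            unsigned char (*codebook)[BLOCK],
                            size_t n_codewords) {
    size_t best = 0;
    long best_d = -1;
    for (size_t i = 0; i < n_codewords; i++) {
        long d = 0;
        for (int j = 0; j < BLOCK; j++) {
            long diff = (long)block[j] - codebook[i][j];
            d += diff * diff;  /* accumulate squared distance */
        }
        if (best_d < 0 || d < best_d) { best_d = d; best = i; }
    }
    return (int)best;
}
```

Decompression is just a table lookup into the codebook, which is why VQ suits hardware texture units: the expensive search happens once, offline, when the still image is compressed.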

TABLE 79.4 Basic 3-D Pipeline: Scene → Geometry → Rendering

Scene                       Geometry           Rendering

Database of 3-D Objects     Projection         Shading
Location of Light Source    Clipping           Texturing
Position of Virtual Camera  Slope Calculation  Remove Invisible Pixels
                                               Rasterizing

Architectures and Concepts for Multimedia Processors

Many, if not all, multimedia applications must perform very demanding computations. Custom architectures that support digital signal processing can satisfy the computational needs of these applications, but they usually lack the flexibility required in an environment where standards and algorithms change continuously. Programmable devices, such as traditional DSPs, in turn lack the computing power and bandwidth necessary to perform more than one multimedia task at a time. Media processors narrow the gap between DSPs and general-purpose processors by extending the instruction-level parallelism (ILP) of traditional DSPs to a general, very long instruction word (VLIW) architecture.24 Whereas superscalar processors rely on hardware dispatch units to dynamically schedule operations and evaluate data dependences, VLIW architectures rely on the compiler to statically schedule instructions at compile time. Most media processors also take advantage of SIMD concepts in order to improve computational throughput. Both VLIW and SIMD concepts are reviewed in the following.17

The VLIW Architecture

Very long instruction word (VLIW) architectures3,5,7,8 are one of two categories of multiple-issue processors, the other being referred to as superscalar architecture. Superscalar architectures issue more


than one instruction per clock; instructions can be either statically scheduled at compile time or dynamically scheduled using various scoreboarding techniques or Tomasulo's algorithm. VLIWs, in contrast, are statically scheduled by the compiler, and fixed-size instruction packets are issued at each clock cycle. In a superscalar processor,18 the functional units are structured so that a number of instructions, typically one to eight, can be scheduled simultaneously per clock tick. An example is an architecture where one integer and one floating-point instruction can be issued together. A sufficient number of read and write ports has to be provided to avoid structural hazards. Of course, simultaneously issued instructions cannot be interdependent. Moreover, no more than one memory reference can be issued per clock. Other restrictions come from latencies inherent in memory accesses or branches; e.g., it may be required that the result of a load instruction not be used before a number of clock cycles have elapsed. When dynamic scheduling is employed, a superscalar processor requires dedicated hardware to provide for hazard detection, reservation tables, queues for loads and stores, and possibly an out-of-order execution scheme. Whenever branch behavior is unknown or unpredictable, the architecture has to allow for conditional or predicated instructions, which can eliminate the need for certain branches altogether, and for the use of speculation, whereby an instruction can be issued before it is known whether the instruction should really execute. A number of additional registers is typically required to support these mechanisms. Additionally, bits may have to be added to registers and instructions to indicate whether an instruction is speculative. In summary, a superscalar architecture may require a significant amount of additional hardware with respect to a single-issue processor, with the consequent cost of a larger die area and complex circuitry.
The advantage is a low CPI (clocks per instruction), obtained with good efficiency (for instance, hardware-based branch prediction is usually superior to software-based prediction done at compile time). Another great advantage of superscalar architectures is compatibility with legacy code: unscheduled programs, or those compiled for single-issue processors, can be run on a superscalar processor. VLIWs use multiple independent functional units. Instructions are packed into a very long instruction word. The burden of ordering the instructions and packing them together falls on the compiler, and no hardware is needed to make dynamic, run-time scheduling decisions. In order to keep all functional units as busy as possible, there must be enough parallelism in the application. Not all inherent parallelism is immediately available; some has to be extracted by various software techniques, including loop unrolling and software pipelining.1,2,21 Instructions may have to be moved even across branch boundaries, always maintaining the correct dependences among data. Some additional hardware is also required of a VLIW processor due to the increase in memory bandwidth and register-file bandwidth. As the number of data ports increases, so does the complexity of the memory system. The VLIW compiler is required to perform very aggressive transformations on the original code, and it generates output code that is usually much larger than that of traditional processors, e.g., due to loop unrolling. Memory size can be reduced by the use of compression techniques in main memory and/or in cache. Object code compatibility is obviously the main drawback. For many types of applications, one cannot guarantee that all functional units will be used efficiently, depending on the available instruction-level parallelism (ILP) of the application itself. Moreover, branch mispredictions can cause significant performance loss.27 Multimedia applications have characteristics which make them more suitable for VLIW processors.
DSP applications, as well as multimedia applications, display abundant amounts of available ILP, thanks to the highly regular structure of the algorithms implemented, i.e., matrix-vector manipulations or discrete cosine transforms. Not surprisingly, DSP architectures were among the first programmable devices to rely on long instruction words (LIWs) to improve parallelism.22 Signal processing applications are characterized by

• frequently recurring combinations of instructions, e.g., the instruction pair multiply-and-accumulate (very commonly used in filtering and linear transform algorithms)
• low-overhead counted loops, which can be supported by simple decrement-and-branch instructions
• multiple accesses to memory in a single cycle; memory which supports multiple accesses is usually partitioned or interleaved.
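The multiply-accumulate pattern at the heart of these workloads is easy to see in a direct-form FIR filter. The following pure-Python sketch uses made-up taps and samples; on a DSP, each inner-loop iteration would map to a single fused multiply-accumulate, and the counted loops to decrement-and-branch instructions.

```python
# Direct-form FIR filter: each inner-loop step is one multiply-accumulate
# (MAC), the instruction pair that DSPs execute in a single cycle.
def fir(h, x):
    """Convolve input samples x with filter taps h (zero initial state)."""
    y = []
    for n in range(len(x)):               # counted outer loop
        acc = 0.0                         # accumulator register
        for k in range(len(h)):           # counted inner loop
            if n - k >= 0:
                acc += h[k] * x[n - k]    # multiply-accumulate
        y.append(acc)
    return y

# 2-tap averaging filter over a short ramp
print(fir([0.5, 0.5], [2.0, 4.0, 6.0]))   # → [1.0, 3.0, 5.0]
```

A compiler targeting a VLIW or DSP would unroll the inner loop and schedule several of these MACs per cycle across the available functional units.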

© 2000 by CRC Press LLC

Typical DSP and multimedia applications include parallel and independent computations, e.g., computations of different pixel values, which allow one to extract considerable ILP. Among the most common algorithms encountered in these applications are filtering (FIR, IIR), frequency transforms (Fourier, DCT), and pixel computations. DSP hardware has traditionally been hard to use, programmed through libraries and hand-tailored code. The principle, for DSPs then as for VLIWs now, is that the processor provides some parallelism and the programmer (or compiler) must provide code that matches the CPU's ILP as closely as possible. The first VLIWs were built with the intention of bridging the gap between general-purpose processors and DSPs, keeping in mind that compiler technology has to be as efficient as possible in uncovering the application's ILP. VLIWs now combine the cost/performance ratio of DSP processors with the reprogrammability of general-purpose processors. The hardware can provide parallelism in any of the following ways:

• by allowing operations to execute simultaneously on all functional units; sufficient memory and register-file bandwidth are required to reduce potential structural hazards
• by replicating functional units, e.g., by having two integer units
• by pipelining longer operations, e.g., FP operations, so that one can be initiated at every clock cycle

The application's ILP can be uncovered by using any or all of the following techniques:

• Trace scheduling. The major barrier to exposing ILP was long represented by the so-called "basic blocks," i.e., the single straight-line segments of code between branches or jumps.6 Compilers used to schedule operations within basic blocks and to stop any scheduling upon encountering a jump or branch. The most ILP that can be gained by basic block scheduling is approximately a factor of two. More ILP can be exposed if instructions are allowed to move across basic block boundaries.
Trace scheduling is based on loop-free sequences of basic blocks, i.e., paths through the program that could conceivably be taken by some set of input data. Traces are selected and scheduled according to their frequency of execution, i.e., according to how likely they are to be taken. In this scenario, the compiler can freely move operations between basic blocks, irrespective of branches. Scheduling proceeds from the most to the least frequently executed traces, until the whole program is scheduled. Instructions can be moved only if all data dependences are guaranteed to remain unchanged. In general, it may not be possible to determine all data dependences at compile time (it is actually an NP-complete problem), so the compiler, with no other indications from the programmer, has to be conservative in its decisions. Trace scheduling can be successfully applied in conjunction with software pipelining and loop unrolling.1
• Loop unrolling. This is done by simply replicating the loop body a given number of times, erasing useless intermediate checks, and adjusting the loop termination code. Instructions from different loop iterations can now be scheduled together.
• Software pipelining. This technique reorganizes loops so that each iteration in the software-pipelined code is chosen from different iterations of the original loop. In a sense, instructions from different iterations are interleaved with one another. The loop thus runs at maximum overlap, or parallelism, among instructions, except for start-up and wind-down overheads. Often it is necessary to combine software pipelining and loop unrolling, depending on the specific application.

SIMD and Subword Parallelism

Most media processors combine the VLIW architecture with another form of parallelism, usually referred to as single-instruction, multiple-data (SIMD), or subword, parallelism. SIMD instructions affect multiple pieces of data, compacted in a longer word.
A single SIMD instruction has the same effect as many identical instructions on as many data items. ILP can be exploited by packing many lower-precision data items into a large container. The ALU is set up to perform identical operations on every subword of the input. The additional hardware required to implement SIMD operations is

• more control logic for the ALU to perform parallel operations
• more opcodes and the corresponding decoding logic
• packing, unpacking, and alignment circuitry

SIMD uses most of the existing ALU architecture to execute multiple instructions in parallel. Moreover, it uses the same register file port to read and write several distinct values. On the other hand, packing and unpacking penalties have to be paid when single items need to be accessed individually. In addition, SIMD instructions require rewriting of the application to identify the ILP and exploit it with appropriate library functions that use SIMD instructions. Many media processors try to take advantage of both approaches by implementing a VLIW architecture with some functional units supporting packed (SIMD) arithmetic.

Architecture Issues

A VLIW/SIMD processor has specific architectural characteristics, different from traditional processors.
• A larger number of registers helps when large amounts of ILP are exposed. Many registers reduce memory traffic. Moreover, the following techniques take advantage of a higher number of registers:
– Speculative execution: speculative results must be kept until committed or discarded.
– Loop unrolling and software pipelining: N iterations of the same loop, unrolled or interleaved, require more registers.
– Predication: values computed along both paths of a branch must be kept until the branch is resolved.
These effects are mitigated by SIMD, where a single register contains many smaller-precision values.
• VLIW requires a higher number of data ports and a higher bandwidth. Data ports are costly additions because
– the area of the register file grows quadratically with the number of ports, and
– the read access time of a register file grows linearly with the number of ports.
Again, a SIMD approach reduces the need for ports, since a single register may contain several data items.
• The register file has to be appropriately partitioned into separate register banks. Hardware must provide for moving data from bank to bank, when needed.
• SIMD operations must be supported in the following ways:
– The hardware must support addressing individual subwords of a register: parts of a register may be read or written without affecting the rest.
– The hardware must support executing lower-precision operations on subwords.
– Parts of a register may have to be shifted and realigned.
• The memory architecture has to be organized in such a way as to reduce latencies to a minimum. General-purpose processors make large use of data caches, which are very effective for spatially or temporally localized data whose access pattern and size are unpredictable. Data memory accesses fall into two categories:
– Memory accesses with spatial locality, but little temporal locality, e.g., regular accesses to data structures, matrices, and vectors, which are usually traversed only once, sequentially.
– Memory accesses with temporal locality, but little spatial locality, e.g., look-up tables, interpolation tables, etc., which are repeatedly accessed in a data-dependent, nonsequential fashion.
Caches are usually characterized by miss penalties at least one order of magnitude higher than hit times. DSP applications display widely different data locality characteristics. Many data streams, such as audio and video, have a high degree of spatial locality, whereas other data accesses have temporal locality. Typically, large data structures with only spatial locality tend to evict the data items that could benefit from temporal locality. For DSP applications, the large mismatch between miss and hit penalty makes the worst-case performance unacceptable for most applications, especially if
there are real-time requirements. For these reasons, VLIW processors, analogously to DSPs, usually make use of local memories. Local memories are mapped to an address space that is separate from main memory. The use of local memories requires explicit compiler support or programmer intervention in order to guarantee known and constant access times. The programmer may help the compiler manage memory accesses by use of either of the following techniques:
– platform-specific language extensions, not supported by the native language, or
– compiler annotations, i.e., annotations that can be safely ignored by the compiler but, if used, help identify which data can be accessed in local memory. These annotations can be used in high-level languages and do not affect code compatibility.
Local memories are of little use when large data structures are used a few times at most and are traversed sequentially. For these cases, it is best to use
– fast-access, off-chip memory, either expensive SRAM or slower DRAM; or
– a pre-fetch buffer, which works very well when the access pattern is highly predictable.
• Most DSP algorithms spend a large amount of time in simple loops, which are usually repeated a constant and predictable number of times and do not require complex control logic. Traditionally, DSPs implement very efficient and simple ways of handling the control flow of simple loops.
• VLIW code can be quite large because of the following:
– Compilation techniques make use of loop unrolling, software pipelining, etc., which cause instructions to be replicated to allow code motion across basic block boundaries.
– Not all VLIW instructions can be filled with useful operations, because of data dependences and because not all functional units can be continuously busy. This means that many of the VLIW instruction fields are bound to be no-operations (NOPs).
For these reasons, it is common to compress the VLIW code in order to reduce the storage space.
Due to the sparse nature of VLIW instructions, there usually is room for compression. Code is kept compressed in main memory and is decompressed before execution. There are two main approaches to compression:
– The instruction cache contains uncompressed code. In this case, decompression is not on the critical path and matters only during cache misses. The cache is less effective, since it must contain uncompressed instructions. Moreover, the cache cannot have instruction addresses synchronized with main memory, because main memory stores variable-length instructions while the I-cache stores fixed-size instructions. The decoding scheme must therefore be more complex.
– Main memory and the I-cache are both compressed. The cache utilization is optimal, and the address translation mechanism from main memory is simple. The disadvantage is having decompression on the critical path, which affects the hardware cost.
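The subword (SIMD) arithmetic described earlier in this section can be emulated in software, which makes the hardware mechanism explicit. The sketch below adds four 8-bit lanes held in one 32-bit word in a single pass, using masking to keep carries from leaking across lane boundaries — the same isolation a packed-add ALU provides directly. All names are illustrative, and Python integers stand in for 32-bit registers.

```python
# Subword (SWAR) addition: four 8-bit lanes packed in one 32-bit word,
# added without any carry crossing a lane boundary.
MASK_HI = 0x80808080          # the top bit of every 8-bit lane
MASK_LO = 0x7F7F7F7F          # the low 7 bits of every lane

def paddb(a, b):
    """Lane-wise 8-bit add (wraparound); a, b are packed 32-bit words."""
    low = (a & MASK_LO) + (b & MASK_LO)   # low 7 bits; carries stop at bit 7
    return low ^ ((a ^ b) & MASK_HI)      # fold the top bits back in, mod 2

def pack(bytes4):
    """Pack four byte values into one 32-bit word, big-endian."""
    w = 0
    for v in bytes4:
        w = (w << 8) | (v & 0xFF)
    return w

def unpack(w):
    """Split a 32-bit word back into its four byte lanes."""
    return [(w >> s) & 0xFF for s in (24, 16, 8, 0)]

print(unpack(paddb(pack([10, 200, 255, 1]), pack([5, 100, 2, 3]))))
# → [15, 44, 1, 4]   (note 255 + 2 wraps around to 1 within its lane)
```

A hardware packed-add instruction performs exactly this lane-isolated addition in one ALU pass, with no masking overhead.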

79.3 Beamforming Array Processing and Architecture

Beamforming array concepts were originally formulated over 40 years ago to tackle demanding defense-oriented aerospace and underwater sonar, radar, and communication system problems. These problems combine signal processing with antenna technologies. They involve the use of phase-coherent transmission or reception, utilizing array sensors selectively to enhance the gain in some space-polarization-time-frequency domain and to reject intentional or unintentional jamming. In recent years, there has been an explosion in civilian mobile communication usage, mainly in the form of a large number of cellular telephone users competing over a limited number of frequency bands. Proper use of beamforming arrays can increase the number of users as well as locate a user's cell phone under an emergency 911 condition. The various proposed array processing systems depend on the availability of modern digital signal processors, which range from high-end programmable DSPs and FPGAs to custom single-chip and wafer-scale-integration VLSI chips. As the cost of these processors decreases rapidly while their capability increases dramatically, “smart antenna” beamforming array techniques are under serious consideration for practical communication system implementations.

Interference Rejecting Beamforming Antenna Arrays

In order to understand the rationale for beamforming antenna arrays, we briefly consider the various possible types of antennas. An omnidirectional (also called isotropic) antenna has equal gain in all directions. A directional antenna has more gain in certain directions relative to others; thus, it must be moved mechanically to point toward the direction of interest for either transmission or reception. A phased antenna array combines the signals from simple (and possibly omnidirectional) antennas appropriately to achieve a high gain in some desired direction. The direction of maximum gain is adjustable by controlling the phases among the antennas so that the voltages add coherently, in phase. An adaptive antenna array is a phased antenna array in which the gain and the phase of each antenna may be changed as a function of time, depending on external conditions. As an example, a receiving adaptive antenna array not only directs a high gain toward a desired transmitter, but may also adaptively place spatial nulls (or low gains) toward moving co-channel interfering sources operating in the same frequency band. An antenna array is said to be optimal if it adjusts the gains and phases of the antenna elements to optimize the array performance. Typical performance measures of interest are the signal-to-noise ratio (SNR) and the signal-to-interference-and-noise ratio (SINR) of the array. A plot of the gain vs. the angle of an antenna array is called the beam pattern. The process in which signals from different antennas in an array are added coherently is called beamforming. The steering of the direction of maximum gain of an array can be achieved by mechanical means or by changing the gains and phases of the antennas, to achieve electronic beam steering.
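The beam pattern and electronic steering just described can be sketched numerically. The example below assumes an 8-element, half-wavelength-spaced uniform linear array (the function name, element count, and angles are illustrative choices): phase-only weights align the element responses toward a chosen steering angle, and the normalized gain is evaluated at a trial arrival angle.

```python
# Beam pattern of an N-element uniform linear array with half-wavelength
# spacing, electronically steered by phase weights alone (narrow band).
import cmath
import math

def array_gain(theta_deg, steer_deg, n=8):
    """Normalized gain |sum of phase-aligned element responses| / n."""
    u = math.sin(math.radians(theta_deg))    # arrival direction
    u0 = math.sin(math.radians(steer_deg))   # steered direction
    # element k sees phase k*pi*u; the weight conjugates the steered phase
    s = sum(cmath.exp(1j * math.pi * k * (u - u0)) for k in range(n))
    return abs(s) / n

print(round(array_gain(30.0, 30.0), 3))   # on the steered beam: gain 1.0
print(round(array_gain(-10.0, 30.0), 3))  # off the beam: much smaller
```

Changing only `steer_deg` moves the direction of maximum gain with no mechanical motion, which is exactly the electronic beam steering described above.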
Beam steering for a narrow band array needs only phase shifters, but for a wide band array, both the gain and the phase of each antenna element (typically implemented in the form of an FIR filter) need to be controlled. For a narrow band array, coaxial cables of different lengths can be used for phase shifting; by switching among different cable lengths, beam switching can be realized. The earliest adaptive antenna array system is the sidelobe canceller (SLC), motivated by practical radar-jamming problems. This system consists of a high-gain mainbeam dish antenna surrounded by a few auxiliary low-gain omnidirectional antennas. A strong jammer appears on all the auxiliary and mainbeam antennas. By using appropriate complex weights on all the antennas, the jamming signal at the output of the auxiliary antennas is coherently subtracted from the sidelobe of the main antenna, thus permitting the weak desired signal in the mainbeam to be detected. This scheme, often called the Howells-Applebaum algorithm,28 was first proposed in the 1960s and was implemented using analog components. It turns out that the Widrow-Hoff algorithm29 for adaptive filtering, conceived independently at about the same time, is essentially equivalent but has broader applications. It is called the least-mean-square (LMS) algorithm and is an approximate steepest-descent gradient search algorithm. As such, its convergence may be slow if the system eigenvalue spread is large. However, due to its relative simplicity, most simple adaptive antenna systems are still based on variations of the LMS algorithm. A more complex but faster-converging adaptive antenna array approach is to use the least-squares (LS) method to solve for the weights W of the linear system of equations XW ≈ d. Here, X is an M × N matrix whose columns hold the inputs of the N antenna elements over M time samples, d is an M × 1 known reference (desired-response) vector, and W is the N × 1 vector of unknown weights needed in the adaptive system.
Various QR decomposition (QRD) methods, such as the Gram-Schmidt, Modified Gram-Schmidt, Givens, and Householder algorithms, can all be used to solve the LS problem. In practical complex antenna systems, N can be in the low hundreds, and M can be in the thousands. Clearly, a direct brute-force block LS solution is not feasible. A simple form of parallel processing called systolic processing, using only a small number of processing elements (PEs) with nearest-neighbor connections and operating in a synchronous manner, has been proposed to solve the QRD approach to the recursive least-squares (RLS) problem. This approach was originated by McWhirter30 using the Givens algorithm, and variations have been proposed by Ling-Proakis,31 Kalson-Yao,32 Liu-Yao,33 and others. Issues related to the complexity of the algorithms and the practical VLSI implementation of such arrays have been considered by Rader,34 Bolstad-Neeld,35 and Lightbody et al.36

Smart Antenna Beamforming Arrays

The explosion of cellular telephony and wireless communication services has motivated the consideration of various technologies to increase their efficiency. These systems, which must support a large number of users with increasing data-rate demands while operating in difficult propagation conditions, have found “smart antennas” to be highly useful. The term “smart antenna” commonly denotes the use of an RF antenna array system with advanced signal processing techniques to perform space-time operations. While smart antennas bear some similarity to the advanced adaptive beamforming antenna systems considered earlier for radar applications, there are also many differences. Basic issues of beamforming for signal enhancement, noise reduction, and interference rejection are still of interest. While the earlier radar systems operate in the higher X frequency band, the wireless systems operate in the lower 800-MHz and 2.4-GHz bands, with significantly more multipath and fading problems. Perhaps most important is the fact that we have to deal with both uplink and downlink communications. Various advanced modulation and coding operations not needed in radar systems are present now. With code-division multiple-access (CDMA) communication, the bandwidths encountered are wider than in previously encountered systems. Furthermore, hand-held communication devices have limited power, volume, and dimensions for the placement of several antennas. All these harsh conditions impose demanding requirements on the smart antenna. A smart antenna may possess full adaptive beamforming capability, or it may only be able to switch among a small number of fixed beams. The switched-beam system offers some of the advantages of a more elaborate smart antenna with reduced complexity and cost. However, for greater flexibility and performance, a smart antenna must utilize adaptivity.
In an adaptive system, the weight vector is adjusted adaptively to minimize some criterion. In the minimum mean-square error (MMSE) criterion, statistical averaging of the observed quantities is used, while in the LS criterion, the observed time samples are used directly. Variations of the stochastic gradient method are used to solve the MMSE problem. For both of the above approaches, in order to know the desired output of the spatial filter, training sequences known to both the transmitter and the receiver are often used. An alternative approach is to use decision-directed adaptation, where the desired signal sample is estimated from the (generally correctly) decided symbols. Still another approach uses a blind adaptive algorithm, which does not require training sequences; this approach exploits various properties of the transmitted signals. Two other approaches have been used for an adaptive antenna array system. The maximum-SNR criterion seeks to maximize the ratio of the power of the desired signal to the noise power at the output of the array. The linearly constrained minimum variance (LCMV) criterion minimizes the variance of the output of the array subject to linear constraints, such as constraints in the directions of the interferences. Both of these methods must know the direction-of-arrival (DOA) of the signal of interest, and both result in the solution of a linear system of equations commonly termed the normal equations. Direct solution as well as various LS approximation methods and RLS methods are possible. One important application of a smart antenna is finding the DOA, and possibly the location, of a source. The DOA problem has been an ongoing problem of interest in aerospace and avionics for many years. However, due to the recent FCC mandate requiring a cellular telephone system to locate each user to 125-meter accuracy, there is increasing interest in this area.
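The training-sequence adaptation described above, driven by a stochastic-gradient (LMS) update, can be sketched as follows. This is a toy real-valued two-element example with made-up snapshots: the training sequence is taken to be the element-0 signal, so the weights should converge toward passing element 0 and nulling element 1.

```python
# LMS weight adaptation against a known training sequence: at each
# snapshot, form the array output, compare with the training symbol,
# and take a gradient step on the weights.
def lms(snapshots, training, mu=0.2):
    """Adapt weights w so that w . x[t] tracks the training symbol d[t]."""
    w = [0.0] * len(snapshots[0])
    for x, d in zip(snapshots, training):
        y = sum(wi * xi for wi, xi in zip(w, x))         # array output
        e = d - y                                        # error vs training
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]   # gradient step
    return w

# two-element array; train toward the element-0 signal
xs = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]] * 15
ds = [x[0] for x in xs]
w = lms(xs, ds)
print([round(wi, 3) for wi in w])   # converges toward [1.0, 0.0]
```

In a real system the snapshots and weights are complex-valued, and the step size mu trades convergence speed against misadjustment, with the eigenvalue spread of the input covariance limiting how fast the slowest mode converges.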
Many techniques have been proposed to tackle the DOA and localization problem. Conventional methods use the beamforming and nulling capabilities of an array to determine the DOA of a source in the presence of noise and interference. While these techniques have been used in many practical systems, there are fundamental resolution limitations, which correspond closely to the frequency resolution limitations of the classical DFT/FFT method in spectral analysis. Just as there are various modern parametric techniques in spectral analysis,
so too are there many parametric techniques for DOA and localization. Most of the modern techniques are based on signal subspace methods. All subspace methods exploit some signal/noise structure in the modeling of the problem and use the eigenvalue decomposition (EVD) or singular value decomposition (SVD) as computational tools. The earliest and most well-known subspace method is the MUltiple SIgnal Classification (MUSIC) technique. The MUSIC method exploits the narrow band assumption of the input sources and uses an EVD of the input covariance matrix to determine the DOAs of the sources. Variations and extensions of the MUSIC method have been proposed. There are also many maximum-likelihood (ML) methods that have been considered to address the DOA and localization problem; these methods are all computationally costly. Most recently, various joint space-time array processing techniques based on advanced linear algebraic methods have been proposed. The goals of these techniques are to perform blind deconvolution, system identification, source separation, and equalization of the communication channels. While many technically interesting results have been discovered, there is still much work to be done before these methods can be used in practical smart antenna systems. Relevant references in these areas can be found in Refs. 37–42. We also note that many of the techniques used for smart antennas are also relevant to acoustic and seismic sensor DOA and localization problems in multimedia, industrial, and military applications.43
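For contrast with the subspace methods, the conventional beam-scan DOA estimate mentioned above can be sketched directly: steer a beam across candidate angles and pick the angle of maximum output power. The array size, source angle, and noise-free snapshot below are illustrative assumptions (a half-wavelength-spaced uniform linear array with a single narrow-band source).

```python
# Conventional (beam-scan) DOA estimation: the classical approach whose
# resolution the subspace techniques (e.g., MUSIC) improve upon.
import cmath
import math

N = 8                                    # array elements
TRUE_DOA = 20.0                          # degrees (simulated source)

def snapshot(theta_deg):
    """Noise-free array response to a unit source arriving from theta."""
    u = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * k * u) for k in range(N)]

def scan(x, angles):
    """Return the candidate angle whose steered beam captures most power."""
    best, best_p = None, -1.0
    for a in angles:
        w = snapshot(a)                  # steering vector toward angle a
        y = sum(wk.conjugate() * xk for wk, xk in zip(w, x))
        if abs(y) > best_p:
            best, best_p = a, abs(y)
    return best

x = snapshot(TRUE_DOA)
print(scan(x, [a * 0.5 for a in range(-180, 181)]))   # → 20.0
```

The achievable resolution here is set by the array's beamwidth, which is the DFT-like limitation noted above; subspace methods such as MUSIC resolve sources well inside one beamwidth by exploiting the eigenstructure of the covariance matrix instead.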

Acknowledgment

This work is partially supported by NASA under Grant NCC 2-374.

References

1. Allan, V. H., Jones, R., Lee, R., and Allan, S. J., Software pipelining, ACM Comput. Surv., Sept. 1995.
2. Callahan, D., Kennedy, K., and Porterfield, A., Software prefetching, Proc. 4th Int. Conf. Arch. Supp. Prog. Lang. and O.S.'s, Apr. 1991.
3. Capitanio, A., Dutt, N., and Nicolau, A., Partitioned register files for VLIWs: a preliminary analysis of tradeoffs, Proc. 25th Annu. Int. Symp. Microarch., Dec. 1992.
4. Case, B., 3DNow boosts non-Intel 3D performance, Rep., June 1998.
5. Faraboschi, P., Desoli, G., and Fisher, J. A., The latest word in digital and media processing, IEEE Signal Processing Mag., March 1998.
6. Fisher, J. A., Trace scheduling: a technique for global compaction, IEEE Trans. Comput., 30(7), 478, 1981.
7. Fisher, J. A., Ellis, J. R., Ruttenberg, J. C., and Nicolau, A., Parallel processing: a smart compiler and a dumb processor, Proc. SIGPLAN Conf. Compiler Construct., June 1984.
8. Fisher, J. A. and Rau, B. R., J. Supercomput., Jan. 1993.
9. Glaskowsky, P., First media processors reach the market, Microprocessor Rep., Jan. 1997.
10. Glaskowsky, P., 3D chips break megatriangle barrier, Microprocessor Rep., June 1997.
11. Glaskowsky, P., 3D chips take large bite of PC budget, Microprocessor Rep., July 1997.
12. Gwennap, L., Multimedia boom affects CPU design, Microprocessor Rep., Dec. 1994.
13. Gwennap, L., Intel’s MMX speeds multimedia, Microprocessor Rep., March 1996.
14. Gwennap, L., Media processors may have short reign, Microprocessor Rep., Oct. 1996.
15. Gwennap, L., New multimedia chips to enter the fray, Microprocessor Rep., Oct. 1996.
16. Gwennap, L., MediaGX targets low-cost PCs, Microprocessor Rep., March 1997.
17. Hennessy, J. L. and Patterson, D. A., Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 1996.
18. Johnson, M., Superscalar Microprocessor Design, Prentice-Hall, Englewood Cliffs, NJ, 1991.
19. Kagan, M., The P55C: the first implementation of MMX technology, Hot-Chips 8, Aug. 1996.
20. Kalapathy, P., Hardware/software interaction on the MPact media processor, Hot-Chips 8, Aug. 1996.
21. Lam, M., Software pipelining: an effective scheduling technique for VLIW machines, Proc. SIGPLAN Conf. Prog. Lang. Design Impl., June 1988.
22. Lapsley, P., Bier, J., Shoham, A., and Lee, E. A., DSP Design Fundamentals — Architectures and Features, Berkeley Design Technology, Inc., 1996.
23. Lee, R., Accelerating multimedia with enhanced microprocessors, IEEE Micro, 15(2), May 1993.
24. Rau, B. R. and Fisher, J. A., Instruction-level parallelism, J. Supercomput., May 1993.
25. Song, P., Media processors begin to specialize, Microprocessor Rep., Jan. 1998.
26. Turley, J., Multimedia chips complicate choices, Microprocessor Rep., Feb. 1996.
27. Wall, D. W., Limits of instruction-level parallelism, Proc. 4th Conf. Arch. Supp. Prog. Lang. O.S.'s, Apr. 1991.
28. Howells, P. W., Intermediate frequency sidelobe canceller, U.S. patent 3,202,990, Aug. 1965.
29. Widrow, B., Adaptive filters I: fundamentals, Rept. SEL-66-126, Stanford Electronics Laboratory, Dec. 1966.
30. McWhirter, J. G., Recursive least-squares minimization using a systolic array, Proc. SPIE, 431, 1983.
31. Ling, F. and Proakis, J. G., A generalized multichannel least-squares lattice with sequential processing stages, IEEE Trans. Acoustics, Speech, Signal Proc., 32, Apr. 1984.
32. Kalson, S. Z. and Yao, K., A class of least-squares filtering and identification algorithms with systolic array architecture, IEEE Trans. Inf. Theory, Jan. 1991.
33. Liu, K. J. R., Hsieh, S. F., and Yao, K., Systolic block Householder transformation for RLS algorithm with two-level pipelined implementation, IEEE Trans. Signal Proc., Apr. 1992.
34. Rader, C. M., VLSI systolic arrays for adaptive nulling, IEEE Signal Proc. Mag., 13, 29, 1996.
35. Bolstad, G. D. and Neeld, K. B., CORDIC-based digital signal processing (DSP) element for adaptive signal processing, Digital Signal Processing Technology, Papamichalis, P. and Kerwin, R., Eds., SPIE Optical Engineering Press, 291, 1995.
36. Lightbody, G., Woods, R., Walke, R., Hu, Y., Trainor, D., and McCanny, J., Rapid design of a single chip adaptive beamformer, Proc. 1998 Workshop Signal Proc., Manolakos, E. S. et al., Eds., 285, 1998.
37. Rappaport, T. S., Smart Antennas — Adaptive Arrays, Algorithms, & Wireless Position Location, IEEE, 1998.
38. Liberti, J. C. and Rappaport, T. S., Smart Antennas for Wireless Communications, Prentice-Hall, Englewood Cliffs, NJ, 1999.
39. Godara, L. C., Applications of antenna arrays to mobile communications, part I: performance improvement, feasibility, and system considerations, Proc. IEEE, 85, 1301, 1997.
40. Godara, L. C., Applications of antenna arrays to mobile communications, part II: beam-forming and direction-of-arrival considerations, Proc. IEEE, 85, 1195, 1997.
41. Paulraj, A. J. and Papadias, C. B., Space-time processing for wireless communications, IEEE Personal Commun., 14, 49, 1997.
42. Moulines, E., Duhamel, P., Cardoso, J., and Mayrargue, S., Subspace methods for the blind identification of multichannel FIR filters, IEEE Trans. Signal Proc., 43, 516, 1995.
43. Yao, K., Hudson, R. E., Reed, C. W., Chen, D., and Lorenzelli, F., Blind beamforming on a randomly distributed sensor array system, IEEE J. Selected Areas Commun., 16, 1555, 1998.
