A PACKETIZED DISPLAY PROTOCOL ARCHITECTURE FOR

INFRARED SCENE PROJECTION SYSTEMS

by

Aaron Myles Landwehr

A dissertation submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical & Computer Engineering

Fall 2020

© 2020 Aaron Myles Landwehr
All Rights Reserved

Approved: Jamie D. Phillips, Ph.D. Chair of the Department of Electrical and Computer Engineering

Approved: Levi T. Thompson, Ph.D. Dean of the College of Engineering

Approved: Louis F. Rossi, Ph.D. Vice Provost for Graduate and Professional Education and Dean of the Graduate College

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.

Signed: Fouad E. Kiamilev, Ph.D. Professor in charge of dissertation

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.

Signed: Chase J. Cotton, Ph.D. Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.

Signed: Xiaoming Li, Ph.D. Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.

Signed: Stéphane Zuckerman, Ph.D. Member of dissertation committee

ACKNOWLEDGMENTS

I thank my committee for agreeing to be on my committee. I thank my colleagues who helped with the endeavor of realizing PDP on actual hardware: Andrea, Chris, Daniel, and Tyler. I thank the rest of my colleagues past and current who helped in any way with PDP through their actions: Alex, Alexis, Andrew, Ben, Casey, Garret, Hamzah, Jaclyn, Jake, Jeff, Johnny, Jon, Josh, Kassem, Katie, Matt, Matt2, Mateo, Michelle, Miguel, Mike, Peyman, Rebekah, Rodney, Spencer, Tianne, Zack. I thank my friends who supported me over the years: Angela, Diego, Laura, Jose. I thank my family who supported me over the years: Joshua and my mom. I thank my therapist who supported me these past two years: Marcus. Finally, I thank these animals: Aurora, Chowder, Europa, Hal, Hazel, Kiddles, Kosmo, Mewist, Molly, Muffin, Pumpkin, Snickers, Tachi, and these unnamed animals: Birds, Fish, Furbies, Kittens, Mama Cat, Puppies, Random Animals, Tamagotchis.

The work discussed within this dissertation was partially funded by (a) Air Force STTR Program AF18A-T017 ‘Next Generation Infrared Scene Projectors for Testing MWIR Systems’ (Contract FA8650-19-C-1948), and (b) the Test Resource Management Center (TRMC) Test and Evaluation/Science & Technology (T&E/S&T) Program through the US Army Program Executive Office for Simulation, Training, and Instrumentation (PEO STRI) under Contract No. W900KK-17-C-0012. I thank ON Semiconductor for fabricating silicon arrays, Firefly Photonics and the University of Iowa for fabricating infrared LED arrays, and Teledyne Scientific for hybridizing silicon and LED arrays. Their fabrication effort enabled us to build and test the projector system(s) described in this dissertation.

TABLE OF CONTENTS

LIST OF TABLES ...... viii
LIST OF FIGURES ...... ix
ABSTRACT ...... xiii

Chapter

1 INTRODUCTION ...... 1

2 BACKGROUND ...... 5

2.1 IRLED Scene Projector History ...... 5
2.2 IRLED Projection Process ...... 8

3 PROBLEM FORMULATION ...... 11

3.1 Display Protocol Limitations ...... 11
3.2 High-speed IRLED Scene Projector Systems ...... 12

3.2.1 Hardware Limitations ...... 13
3.2.2 Software Limitations ...... 13

3.3 Problem Statement ...... 14
3.4 Problem Solution ...... 14

4 SYSTEM OVERVIEW ...... 18

4.1 Close Support Electronics ...... 18
4.2 Communication Flow ...... 21

4.2.1 Internal CSE Communication ...... 23
4.2.2 External System Communication ...... 24

5 ARRAY WRITE PROCESS ...... 30

5.1 Array Interleaved Write Process ...... 30
5.2 Data Ordering ...... 37

6 DISPLAY PROTOCOLS ...... 45

6.1 Conventional Display Protocols ...... 45
6.2 Display Protocols within IRSP Technology ...... 51

7 PACKETIZED DISPLAY PROTOCOL ...... 53

7.1 Design Methodology ...... 53
7.2 Comparison ...... 55
7.3 Packet Format ...... 59
7.4 Packet Types ...... 60
7.5 PDP Stream Decoding ...... 62
7.6 Overhead ...... 65
7.7 Multi-frame Rate Performance ...... 70

8 MACHINE MODEL ...... 74

8.1 Hardware Mapping ...... 74
8.2 Compositing ...... 77

9 IMPLEMENTATION ...... 82

9.1 HDMI Transport Layer ...... 82
9.2 Abstract Architecture ...... 86
9.3 Frontend Architecture ...... 87
9.4 Overall Backend Architecture ...... 88
9.5 Synchronized Circular Buffer ...... 89

9.5.1 Controllers ...... 89
9.5.2 Routing ...... 90
9.5.3 Internal Buffer and Memory Synchronizer ...... 91

9.6 Array Emitter ...... 96
9.7 State Machines ...... 97
9.8 Write Buffer ...... 99

10 EXPERIMENTAL RESULTS ...... 101

10.1 Memory Synchronizer ...... 101

10.1.1 Simulation ...... 101
10.1.2 Experimental Testing ...... 104

10.2 Firmware ...... 106

10.2.1 Simulation ...... 106
10.2.2 Characterization ...... 107

10.2.2.1 Array Maps ...... 108
10.2.2.2 Fractional Difference Maps ...... 109
10.2.2.3 Non-uniformity Corrected Imagery ...... 114
10.2.2.4 Analog Bandwidth ...... 115

10.3 Packetized Operation ...... 116

10.3.1 Normal-speed ...... 117
10.3.2 High-speed ...... 118
10.3.3 Multi-frame Rate ...... 119

10.4 Summary ...... 120

11 CONCLUSION ...... 122

REFERENCES ...... 125

LIST OF TABLES

3.1 Bandwidth requirements of a conventional display protocol ..... 12

6.1 VESA Coordinated Video Timing (CVT) Modeline ...... 50

7.1 Modeline Overhead ...... 57

7.2 List of PDP Packets ...... 61

7.3 PDP Maximum Packet Overhead ...... 66

7.4 Multi-frame Rate Bandwidth Savings ...... 72

9.1 PDP Select Communication APIs ...... 88

9.2 Full/Empty Memory Synchronizer State Transitions ...... 94

LIST OF FIGURES

2.1 IRLED Scene Projector System ...... 6

2.2 IRLED Scene Projector Technology Development Timeline Overview ...... 7

2.3 SLEDs Array Pixel Ratios ...... 8

2.4 Typical IRLED Projection Process ...... 9

3.1 Dynamic frame rate display with multiple regions updating at different frame rates ...... 16

4.1 SLEDS System Block Diagram ...... 20

4.2 Example Hybrid Round Boards ...... 21

4.3 CSE Internals ...... 22

4.4 CSE Externals and Empty Chassis ...... 22

4.5 CSE Internal Communication Block Diagram ...... 24

4.6 CSE External Direct Communication Block Diagram ...... 26

4.7 CSE External Indirect API Communication Block Diagram .... 27

4.8 CSE External Indirect API and Data Communication Block Diagram 28

5.1 TCSA, NSLEDS, and HDILED Array Quadrant Layouts ...... 32

5.2 NSLEDS/HDILED Array Super Pixel Layout ...... 34

5.3 NSLEDS/HDILED Super Pixel Grid Layout ...... 34

5.4 NSLEDS/HDILED Array Interleaved Pixel Mapping Per Write .. 36

5.5 Bit-packing Format ...... 38

5.6 Image Encoding: Input Reordering ...... 39

5.7 Image Encoding: Quadrant Reordering ...... 39

5.8 Image Encoding: Quadrant Reordering with Color Overlay ..... 39

5.9 Image Encoding: Data Bit-Packing ...... 41

5.10 Image Encoding: Data Reorder ...... 41

5.11 Image Encoding: Color Example 1 ...... 43

5.12 Image Encoding: Color Example 2 ...... 43

5.13 Image Encoding: Color Example 3 ...... 43

5.14 Image Encoding: IR Example 1 ...... 44

5.15 Image Encoding: IR Example 2 ...... 44

6.1 Display Protocol Timing Overview ...... 47

6.2 Display Protocol Horizontal Signal Cross Section Timing ...... 48

6.3 Display Protocol Full Signal Cross Section Timing ...... 48

6.4 Custom Synchronization Solution ...... 52

7.1 Display Port Framing ...... 59

7.2 Example PDP Stream ...... 63

8.1 Hardware Mapping of an IRLED Projection Process ...... 75

8.2 Abstract Machine Model of the PDP architecture with 1-to-N relationships between components ...... 76

8.3 Compositing Process Example ...... 78

8.4 Compositing Process Overlayed ...... 79

8.5 Average Intensity Map of PDP Regions for Composited Frames .. 80

9.1 Normal Frame with Display Data ...... 83

9.2 Draw Region Packet ...... 84

9.3 Embedded PDP Frame ...... 84

9.4 Embedded PDP Frame to Array Mapping ...... 85

9.5 Abstract PDP Firmware Backend Architecture ...... 87

9.6 Overall PDP Backend Architecture ...... 89

9.7 Synchronized Circular Buffer Architecture ...... 90

9.8 Synchronized Internal Buffer Architecture ...... 91

9.9 Synchronizer Double Handshake ...... 92

9.10 Full/Empty Memory Synchronizer Circuit ...... 93

9.11 Two Flip-flop Synchronizer ...... 93

9.12 Array Emitter Architecture ...... 97

9.13 PDP State Machine ...... 98

9.14 Write Buffer Architecture ...... 100

10.1 Behavioral Simulation of a Single Pixel Being Buffered Through the Full/Empty Memory Synchronizer ...... 102

10.2 Behavioral Simulation of Multiple Pixels Being Buffered Through the Full/Empty Memory Synchronizer ...... 104

10.3 ZYBO Test Setup ...... 105

10.4 Checkerboard Input Image ...... 105

10.5 CRC-like 16-pixel Stream ...... 106

10.6 CRC-like Input Image ...... 106

10.7 Simulation of Single HDMI Input ...... 107

10.8 Simulation of PDP Output ...... 108

10.9 Pixel Sweep Grid ...... 109

10.10 Array Map of PDP Firmware Output ...... 110

10.11 Array Map of SNAP Firmware Output ...... 111

10.12 Fractional Difference Map of PDP Firmware Output ...... 112

10.13 Fractional Difference Map of SNAP Firmware Output ...... 113

10.14 Still Image Capture of a Non-uniformity Corrected Image from the PDP Firmware Operating at 100Hz ...... 114

10.15 Still Image Capture of a Grid from the PDP Firmware Operating at 100Hz ...... 115

10.16 Comparison of Imagery Captured from the PDP Firmware Operating at 100Hz and 400Hz ...... 116

10.17 Still Image Capture of Counting Numbers from the PDP Firmware Operating at 60Hz ...... 117

10.18 Still Image Capture of a Moving Object from the PDP Firmware Operating at 1Khz ...... 118

10.19 Still Image Captures of a Counting Number from the PDP Firmware Operating at 2Khz ...... 119

10.20 Still Images Captures of a Non-uniformity Corrected Rotating Object from the PDP Firmware Operating at 2Khz ...... 119

10.21 Still Image Captures of a Bouncing Number, Small Background Numbers, and a Large Number from the PDP Firmware Operating at 800Hz, 100Hz, and 2Hz, Respectively ...... 120

ABSTRACT

Current fixed frame rate display technology, such as DVI, HDMI, and DisplayPort, is commonly utilized for high-speed IR display systems. This technology, designed for relatively low-speed operation, incorporates a number of design decisions that limit its ability to meet the increasing requirements of larger resolutions and faster frame rates needed within IR display systems. Firstly, it requires custom-designed synchronization solutions and hardware when utilized within environments where multiple components need to be synchronized. This is because it is not designed to handle system-level synchronization. Secondly, the fixed frame rate nature of the technology imposes a static frame rate across all displayed frames. This unnecessarily increases bandwidth demands by requiring the same amount of data to be sent for every frame regardless of what data changes. As a result, maximum frame rate unnecessarily becomes a function of limited hardware bandwidth and image resolution.

This dissertation introduces a generalizable, dynamic, and scalable packetized display protocol (PDP) architecture. It incorporates dynamic frame rates and high-speed capabilities to bridge the performance gaps within existing display solutions for current IR display systems. The PDP architecture eschews many assumptions found in conventional display protocol technology. In doing so, it provides scalability, reduces bandwidth requirements, increases performance, and eases the synchronization burden, while providing a desirable set of features for current and future IRLED scene projection systems. These features include dynamic sub-window (intra-frame) refresh rates, dynamic bandwidth utilization, and dynamic inter-frame refresh rates. Furthermore, this dissertation contributes a protocol specification and implementation on real hardware, coupled with a demonstration of the benefits of this type of technology for use within high-speed IR display systems.

Chapter 1

INTRODUCTION

Infrared Scene Projection Systems (IRSPs) using infrared LED emitters are emerging as a novel technology for the testing and development of infrared (IR) based sensor technology and real-time IR simulations. They provide a compelling alternative to the older, entrenched technology of resistor-array based IR scene projector systems [1, 2] due to various improvements over the competing technology. These improvements include, but are not limited to, better maximum apparent temperature¹ (above 1400 Kelvin), better dynamic range, higher pixel density (24 micrometers and lower), and substantially faster emission in the target spectra, with optical rise times in the nanoseconds. Additionally, they are relatively difficult to damage thermally and have the potential to provide emission in multiple spectra with multi-color pixel designs. Current fixed frame rate display technology, such as DVI [3], HDMI [4], and DisplayPort [5], is commonly utilized for high-speed IRSPs. It provides a standardized method to transmit digital scene data, which is then translated into analog signaling for display on IR arrays. However, within IRLED IRSP systems, which are inherently fast and primarily limited by the driving electronics, it has become limiting for high-speed display [6]. This technology, designed for relatively low-speed operation, incorporates a number of design decisions that limit its ability to meet the increasing requirements of larger resolutions and faster frame rates needed within IR display systems. Firstly, it requires custom-designed synchronization solutions and hardware when utilized within environments where multiple components need to be synchronized. This

¹ The temperature a black body giving the same radiance would be.

is because it is not designed to handle system-level synchronization. For example, a display wall may utilize Sync Cards [7] to provide synchronization across multiple monitors; however, this only guarantees coarse-grain synchronization between displays. Secondly, the fixed frame rate nature of the technology imposes a static frame rate across all displayed frames. This unnecessarily increases bandwidth demands by requiring the same amount of data to be sent for every frame regardless of what data changes. As a result, maximum frame rate unnecessarily becomes a function of limited hardware bandwidth and image resolution. This relationship between frame size and bandwidth is discussed in more detail in Chapter 3.

This dissertation proposes an alternative to conventional display technology: a packetized display protocol (PDP) architecture capable of providing a synergy with IRSP technology to bridge the performance gaps within existing display solutions for current IR display systems. Sensor technology can operate at rates above one kilohertz, an order of magnitude beyond the current target speeds of fixed-rate display technology. The PDP architecture eschews many assumptions found in conventional display protocol technology. In doing so, it provides scalability, reduces bandwidth requirements, increases performance, and eases the synchronization burden, while providing a desirable set of features for current and future IRLED scene projection systems. These features include dynamic sub-window (intra-frame) refresh rates, dynamic bandwidth utilization, and dynamic inter-frame refresh rates. Furthermore, this dissertation contributes a protocol specification and implementation on real hardware, coupled with a demonstration of this type of technology within real IRSP systems, which shows that with the proper set of control and features, high-speed operation can be achieved even with limited physical bandwidth.
The protocol architecture draws inspiration from the video processing field, where encoding schemes for video streaming represent a body of research that at- tempts to tackle a similar but more limited challenge [8]. Some of these encoding schemes attempt to provide a variable frame rate for segments of the incoming stream through differencing algorithms, but also rely on compression [9] which reduces quality

2 and introduces artifacts. In contrast, the case of IRSPs requires lossless quality; and thus, lossy protocols cannot be utilized for this purpose. Instead, the proposed pro- tocol architecture seeks to craft a lossless solution for the IRLED projector field that incorporates similar variable frame rate features to reduce bandwidth consumption as well as allow bandwidth to be used more intelligently. More specifically, it is envisioned that available bandwidth be apportioned to regions of a scene that necessarily need to be updated frequently. In IR scenes, this generally includes regions that transition from dark to light or light to dark quickly, as well as higher temperature regions. Regions which do not change temperature quickly, generally do not need to be updated as often due to the LED driving circuits holding capacitance for milliseconds at a time2. The contributions of this dissertation are as follows: firstly, it provides the architecture of a physical layer agnostic packetized display protocol with the follow- ing features: (1) intelligent dynamic per-frame bandwidth utilization, (2) fine-grained control over frame transmission and synchronization, (3) dynamically changing intra- frame rates, and (4) a realized implementation of the protocol for use on array emitter technology. Within it discusses relevant details of the initial design, methodology, and implementation of the said protocol. Secondly, it provides a sufficiently abstract ma- chine model to indicate a path to utilize the protocol within current and future systems. Thirdly, it demonstrates the use of the protocol within real IRSP systems as well as provides the current results and a comparison with fixed-rate technology. Fourthly, it discusses various use cases for the technology to provide the reader with a more complete understanding of where this technology could be utilized in future systems. 
The rest of this dissertation is divided into the following sections: (1) back- ground; which discusses the various aspects of current IRSP systems that are relevant to understanding the IRLED scene projector history and projection process, (2) prob- lem formulation; which examines the problem of high-speed projection in detail, (3) system overview; which discusses the supporting electronics and communication flow

² The general time of discharge depends on the design of the LEDs and the amount of charge currently held within. However, test setups have measured >1 millisecond.

within IRLED projector systems; (4) array write process, which discusses how IR arrays are written and the associated data ordering; (5) display protocols, which discusses conventional display protocols and their use within IRSP technology; (6) packetized display protocol, which discusses the design methodology for the PDP, packet details, overhead, and performance, and provides a comparison to conventional display protocols; (7) machine model, which discusses the use of the PDP in general systems; (8) implementation, which discusses an implementation of the PDP on an FPGA system; (9) experimental results, which provides details on the implementation and testing process as well as a discussion of performance; and (10) conclusion, which discusses the future of the PDP and potential avenues of further research.

Chapter 2

BACKGROUND

This chapter discusses relevant background information toward the goal of implementing a packetized display protocol (PDP) architecture for Infrared Scene Projector systems (IRSPs) by providing a general overview of the technology.

2.1 IRLED Scene Projector History

IRLED based IRSPs are made up of light emitting diodes (LEDs) that emit light in the IR spectrum [10]. Since their inception, they have been utilized in various fields such as medical sensing [11, 12, 13, 14, 15], tracking and localization [16, 17, 18, 19, 20], and communication [21, 22, 23, 24, 25]. Modern IRLED projectors are an emerging technology [26, 27, 28, 29, 30] with various applications within the IR sensor testing community.

A complete IRLED based IRSP system consists of various technologies and processes, as shown in Figure 2.1 [31]. One denotes the scene generation and two the non-uniformity correction (NUC) process. These are where pixel data representing IR scenes is fed to a system for display. The NUC process corrects for physical and thermal non-uniformity [32] in an IRLED array [33]. Three denotes the close support electronics (CSE) [34], which are responsible for converting a digital representation of a scene into analog signaling that goes directly to an array. Four indicates the dewar [35, 36] or vacuum chamber, which houses an IRLED array and is utilized to keep it below ambient temperatures or at cryogenic temperature ranges. Five indicates an IRLED hybrid, which consists of a Read-in Integrated Circuit (RIIC) [37] used to address an IRLED array. Analog signals coming from the CSE are passed into the dewar, and then are mapped using the RIIC, which


Figure 2.1: IRLED Scene Projector System

results in specific IRLEDs within an array being driven. Six indicates an IR recording apparatus of some sort utilized to record IR data from an array. Synchronization between source generation, display, and recording is maintained through explicit synchronization signaling. Often, Camera Link serial communication [38, 39] is used to capture imagery. The setup shown in Figure 2.1 uses FLIR Systems high-speed IR cameras [40, 41, 42]. The red dotted lines show what is provided within an IRSP system, whereas scene generation is usually application specific and thus excluded. In the image, scene generation and NUC are performed within the same machine.

Figure 2.2 shows the development timeline for IRLED IRSP technology. In 2008, the world's first IRLED array, called the Superlattice Light Emitting Diodes (SLEDs) array, was completed [43]. This device was a hybridized combination of a 68 by 68 IRLED wafer bonded to a RIIC wafer [44] providing the electronics to drive the array. Initial testing was done by hand prior to the design and implementation of the overall drive system, which was completed in 2011. Following this, a larger 512 by 512 IRLED wafer was fabricated in 2014 [45]. The combination of the array with

Figure 2.2: IRLED Scene Projector Technology Development Timeline Overview

the drive system resulted in SLEDS (Superlattice Light Emitting Diode System), the first functioning IRLED IRSP system in the world. Further efforts culminated in two IRLED IRSPs in 2016. The first, TCSA (Two Color SLEDS Array), a 512 by 512 sized array [46, 47, 48, 49, 50], included support for driving the LEDs at two separate wavelength bands, denoted as 2-colors in Figure 2.2. The second, NSLEDS (N³ Superlattice Infrared Light Emitting Diode System), a 1024 by 1024 sized array [51, 52], doubled the total number of pixels supported. Additionally, these systems demonstrated the beginnings of a modular IR Scene Projection (IRSP) platform [53]. A further increase in size and efficiency occurred in 2018 with the world's first 2048 by 2048 pixel array, HDILED (High Definition Infrared LED) [54]. A visual representation of the pixel ratios is shown in Figure 2.3.

TCSA represented a novel step forward in terms of IRLED array technology. It incorporated a multiple color pixel design within a 1.3 in² package, consisting of two overlayed LEDs per pixel, to enable emission in multiple wavelength bands as well

³ The N originally stood for Nightglow, before the device was retargeted to Midwave-IR.


Figure 2.3: SLEDs Array Pixel Ratios

as an increase in the number of analog channels from 4 to 16, to allow for more pixels to be driven at a time. It targeted 1 kilohertz operation. NSLEDS used the same size wafer as TCSA, but decreased the pixel pitch from 48 to 24 microns and incorporated a single-color pixel design instead of the multiple color design of the original. These changes allowed the pixel resolution to be doubled, paving the way toward larger format IRLED IRSPs. Additionally, NSLEDS targeted 500 hertz operation. HDILED increased the package size to 2.3 in² and doubled the resolution while utilizing a similar RIIC architecture to the prior two arrays. It targeted 250 hertz operation. The interleaved write process of each array is discussed in detail in Section 5.1.

2.2 IRLED Projection Process

Figure 2.4 shows a typical projection process utilized within an IRLED IRSP. Each step operates at a static frame rate. A scene projector typically performs scene generation by utilizing a GPU. Following this, imagery undergoes non-uniformity correction by utilizing a NUC table created by analyzing the non-uniformity of a given array. After this, image data is reordered for sending to an array. When data is sent,


Figure 2.4: Typical IRLED Projection Process

a digital to analog conversion occurs for each value displayed on a projector.

The NUC process compensates for any physical defects that may cause non-linear variation in light emission from different diodes on a given array, thus allowing for uniform emission across a given spectrum. There are a number of different ways this linearization may be performed [55, 56, 57, 58, 59], the details of which are beyond the scope of this work. After non-uniformity correction, imagery is converted pixel by pixel from digital to analog to drive the physical pixels at a given intensity on the array.

The design of the digital to analog chain of an IRLED IRSP is another important challenge, as the analog bandwidth⁴ plays an important role in determining the maximum speed a system can operate at, as well as the bit resolution⁵ of an array [60]. Additionally, timing variance and potential differences in performance between analog channels need to be analyzed and minimized through design and tuning. While outside the scope of this work, it is worth noting that a poorly behaving analog chain can introduce undesirable non-linear distortion in analog signaling, resulting in non-linear projection [61, 62, 63].

Similarly, the internal analog timings of an array's RIIC play a critical role in this as well, one that may be considered even more crucial given that once an array is fabricated and bonded, it cannot be later modified. In contrast, a faster and more precise analog chain could be implemented at a later point in time for an existing array. Decisions made on RIIC architecture can have lasting long-term impact. The following chapter moves to a discussion of the central problems with display protocol technology and the proposed solution.

⁴ The rise and fall time of digital-to-analog (DAC) conversion and amplification.
⁵ A measurement of to what degree changes to DAC inputs can produce consistent measurable light output differences.
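Although the specific correction methods used in these systems are beyond the scope of this chapter, the flavor of per-pixel linearization can be sketched with a simple two-point (gain/offset) correction, a common textbook NUC scheme. The sketch below uses hypothetical function names and calibration values; it is illustrative only and is not the method used by the systems described here.

```python
import numpy as np

def build_two_point_nuc(dark, bright, target_dark, target_bright):
    """Derive per-pixel gain/offset tables from two uniform-drive calibration frames."""
    gain = (target_bright - target_dark) / (bright - dark)
    offset = target_dark - gain * dark
    return gain, offset

def apply_nuc(raw, gain, offset, max_drive=65535.0):
    """Linearize a raw frame pixel by pixel, clamped to the drive range."""
    return np.clip(raw * gain + offset, 0.0, max_drive)

# Hypothetical 4x4 array whose pixels respond with slightly different
# slopes and offsets to the same two uniform drive levels.
rng = np.random.default_rng(0)
dark_response = rng.uniform(90.0, 110.0, (4, 4))
bright_response = rng.uniform(3900.0, 4100.0, (4, 4))

gain, offset = build_two_point_nuc(dark_response, bright_response, 100.0, 4000.0)

# After correction, both calibration frames map to uniform target levels.
print(np.allclose(apply_nuc(dark_response, gain, offset), 100.0))    # -> True
print(np.allclose(apply_nuc(bright_response, gain, offset), 4000.0)) # -> True
```

A real table would be built from many drive levels per pixel, but the principle is the same: each pixel carries its own correction so that equal digital inputs yield equal optical output.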

Chapter 3

PROBLEM FORMULATION

This chapter discusses in detail the limitations of current display protocol technology that were alluded to in Chapter 1. From there, it focuses on how these limitations adversely affect IRLED projector technology. Finally, it provides a problem solution.

3.1 Display Protocol Limitations

Current display technologies, such as HDMI, assume a fixed frame rate display, which places a hard limit on frame timing and synchronization. In detail, display protocols operate in a best-effort fashion where a buffer-swap-initiated transfer of frame data occurs at a static predetermined interval. If a new frame is unavailable to be transmitted at an interval due to any delay, such as processing delay, the previous frame is retransmitted. This necessarily makes correct synchronization challenging, because modern computation systems do not generally provide real-time guarantees due to variability in system operation, including frame generation, CPU scheduling, and Input/Output (I/O) delays. Generally, while these challenges can often be addressed to some degree, they require custom hardware solutions on top of existing display protocols, because end-to-end system synchronization is out of the scope of typical display standards, which are designed to push relatively low frame rates over single hardware links. These solutions also tend to lack abstraction layers, which can greatly hinder upgradeability.

Additionally, because of the static nature of the transmission interval (e.g. 100 hertz), the frame rate cannot be dynamically controlled or changed after initialization. Instead, these protocols have static bandwidth requirements for a given resolution

bandwidth = resolution × bits × fps

bandwidth : bandwidth requirement in bits per second.
resolution : number of pixels, including porches.
bits : bits per pixel.
fps : frames per second.

Table 3.1: Bandwidth requirements of a conventional display protocol

and frame rate of the form found in Table 3.1, which shows that, for a fixed link bandwidth, resolution is inversely proportional to the maximum frame rate the display can operate at. Additionally, these protocols only support sending the entire frame at a given interval, even if only a small portion of the frame has changed. This is a non-optimal use of bandwidth that does not allow for fine-grained control over the frame rate in cases where a user might wish to dynamically change the frame rate to match the processing rate. In high-speed display scenarios, this inevitably causes dropped frames. This issue is further compounded by the fact that conventional display protocols utilize proprietary drivers and hardware such that frame drops become effectively silent. Similar to the synchronization solutions discussed above, this tends to lead to upgradeability issues due to systems being tailored for specific display hardware. This contributes to scalability issues as well, since changing the display hardware often requires a complete system redesign.
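To make the relationship in Table 3.1 concrete, the sketch below evaluates the formula for a hypothetical 2048 by 2048 array at 16 bits per pixel, ignoring porch overhead for simplicity. The frame rates and bit depth are illustrative assumptions, not measured requirements of any particular link.

```python
def required_bandwidth_gbps(width, height, bits_per_pixel, fps):
    """bandwidth = resolution x bits x fps, reported in gigabits per second."""
    return width * height * bits_per_pixel * fps / 1e9

# A fixed-rate protocol must resend every pixel of every frame, so the
# required bandwidth grows linearly with both resolution and frame rate.
for fps in (100, 500, 1000):
    gbps = required_bandwidth_gbps(2048, 2048, 16, fps)
    print(f"2048x2048 @ {fps:4d} fps, 16 bpp: {gbps:6.1f} Gbit/s")
```

At 1000 fps this hypothetical configuration already demands roughly 67 Gbit/s before any blanking overhead, well beyond what a single conventional display link carries, which is the bandwidth wall motivating the rest of this chapter.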

3.2 High-speed IRLED Scene Projector Systems

Conventional display protocols, which are designed for driving relatively low-speed consumer electronics as indicated in Chapter 6, are often utilized in high-speed IRLED IRSP systems. This results in a combination of unnecessary hardware and software limitations being incorporated into systems. This section discusses these limitations and the general methodology toward alleviating them. Hardware limitations are discussed first, then software limitations.

3.2.1 Hardware Limitations

There are several hardware limitations, including physical bandwidth, latency, IC clock rates, and analog chain settling times. Often the hardware that data is transferred over within these systems has inherent bandwidth and latency limitations that cannot be overcome, either due to the nature of physics or because faster hardware is not available. One solution toward working around bandwidth issues is to introduce more parallel links into a system. A similar issue arises with respect to the clock rates of components within a system, such as projector driver firmware and the scene generator. Ideally, these components operate as fast as possible, but the reality is that the large fan-out of I/O signaling needed by arrays can cause critical timing closure issues within FPGA based firmware implementations. Careful routing and planning can help alleviate these issues, as can parallelizing projector drivers such that fewer signals are driven from a single FPGA component. The speed of the analog chain in these systems is dictated by the digital to analog converter (DAC) and amplifier settling times, which can be alleviated by increasing the number of converters utilized within a system if supported by an array. This could increase write speeds by allowing more DACs and amplifiers to drive smaller portions of an array in parallel. All the solutions discussed in this section involve increasing the number of parallelized components operating within a system in order to reduce physical bottlenecks.

3.2.2 Software Limitations
In terms of software, computational and algorithmic limitations exist in the generation of scenery for display, in correcting for pixel and array imperfections, and in putting data into the correct format to be understood by the driving firmware of a projector. Rendering individual frames of a scene can be particularly computationally expensive. This can be further complicated by poorly optimized software drivers or algorithms. Additionally, in these systems, tight control over timing is important because frame drops are not tolerable, meaning that fixed-rate display technology is a hindrance in terms of user-level control. One solution is to utilize technology that

allows for dynamic frame rates and frame segmentation, which requires protocols with dynamic control built in.

3.3 Problem Statement
IRSP systems need to be capable of operating at high speeds with large resolutions to meet the needs of the sensor testing community. Typically, this can be on the order of 2048 by 2048 pixels with frame rates up to 1 kHz and sub-frame latency. These goals represent a challenging problem when utilizing typical display technology, which is further hindered by the limitations discussed above. Given these constraints, the question arises of what an appropriate solution to the aforementioned issues would be. The central question we could ask ourselves is: how does one architect a generalizable, dynamic, and scalable IRSP system capable of high-performance operation, sub-frame latency, dynamic frame rates, and dynamic bandwidth utilization?
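To give a sense of scale, the raw pixel bandwidth implied by these targets can be estimated with simple arithmetic. The sketch below assumes 16 bits per pixel and ignores all protocol overhead; the actual per-pixel word size is hardware dependent, so this is illustrative only:

```python
# Back-of-the-envelope bandwidth for the stated IRSP target:
# 2048 x 2048 pixels at a 1 kHz frame rate.
# Assumes 16 bits per pixel and no protocol overhead (illustrative only).

width, height = 2048, 2048
frame_rate_hz = 1000
bits_per_pixel = 16

bits_per_frame = width * height * bits_per_pixel
raw_gbps = bits_per_frame * frame_rate_hz / 1e9

print(f"{raw_gbps:.1f} Gbit/s raw")  # prints "67.1 Gbit/s raw"
```

At roughly 67 Gbit/s of raw pixel data, a single conventional display link is far from sufficient, which motivates the parallelization strategies discussed above.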

3.4 Problem Solution
Within current display protocol technology, any effort to address the aforementioned issues requires system developers to have complete control over an entire end-to-end system for a given solution to operate correctly. This means that, in practice, customized hardware and software that deviates from standard display protocol behavior is required. Any solution lacking these requirements would fail to completely address operational issues with performance. In short, truly addressing issues within a conventional display protocol would require creating a new system-specific protocol tied to specific custom hardware. Additionally, with this type of solution, it is still not possible to achieve true sub-frame latencies due to display protocols requiring transmission of entire frames of data at a given interval or frame rate. In practice, IR projector systems are often forced to introduce intermediate buffers within a system in order to ensure correct operation.

I believe that any effort to address the aforementioned issues and answer our problem statement requires developing a hardware-agnostic alternative display protocol designed specifically for high-speed, low-latency frame transmission. This would allow the protocol to be retrofitted within existing systems as well as allow for upgradeability and support for future hardware. In order to achieve sub-frame latencies, this protocol would need to support transmission of sub-frames of data, i.e., transmission of partial frames. Allowing for partial frame transmission enables a host of other desirable features, such as the ability to control bandwidth utilization and the ability to incorporate dynamic frame rates. A good method of supporting partial frame transfer is to utilize headered packets of data to provide a context for what the data represents and where to draw said data on a display. This method of transferring display data is called a packetized display protocol (PDP). As discussed briefly in Chapter 1, this PDP architecture eschews assumptions found in conventional display technologies to provide a robust feature set. In contrast to a normal display protocol, the proposed architecture is designed to utilize a dynamic source-driven refresh rate through the coordination of both source (scene generator) and sink (display). In this architecture, frames are segmented into pieces and sent to the display based upon how often these segments need to update. An example of this is shown in Figure 3.1, where different regions of the display operate at different frame rates. By utilizing dynamic frame rate control at sub-frame resolutions, substantial bandwidth reductions can occur. This is discussed in further detail in Chapter 7. The underlying PDP itself is designed to allow for fine-grained control over when and what data is transmitted as well as to incorporate mechanisms to synchronize displays.
Furthermore, the protocol architecture is abstracted in such a way that the physical interconnect layers are transparent, enabling it to operate over a wide spectrum of hardware and allowing the protocol to be extended and used within future hardware. For system upgradeability, this offers a risk reduction because it allows for a simpler migration path as new hardware becomes

15 (0,0) (170,0) (171,0) (340,0) (341,0) (426,0) (427,0) (512,0) Legend:

(341,85) (426,85) (427,85) (512,85) (341,86) (426,86) (427,86) (512,86) 500Hz

250Hz (0,170) (170,170) (171,170) (340,170) (341,170) (426,170) (427,170) (512,170) (0,171) (170,171) (171,171) (340,171) (341,171) (512,171) 150Hz

100Hz

(0,340) (170,340) (171,340) (340,340) (341,340) (512,340) (0,341) (340,341) (341,341) (426,341) (427,341) (512,341)

(341,426) (426,426) (427,426) (512,426) (341,427) (426,427) (427,427) (512,427)

(0,512) (340,512) (341,512) (426,512) (427,512) (512,512)

Figure 3.1: Dynamic frame rate display with multiple regions updating at different frame rates

available. For example, a hardware system implementing this protocol could switch or upgrade physical components and still utilize the same protocol within the software stack given an appropriately compatible physical layer. To further facilitate this, a packetized protocol structure capable of transmitting pixel data in a generalized way has been chosen. These details are discussed in Chapters 7 and 8. The remainder of this work is devoted to discussing the details of the environment in which the PDP is designed to operate, the protocol itself, and its implementation. The following chapter provides an overview of system operation.
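The general idea of a headered sub-frame packet can be illustrated with a minimal sketch. The field names and widths below are hypothetical, chosen only to show the concept of pairing a destination rectangle with a pixel payload; they are not the actual PDP wire format:

```python
# Minimal sketch of a headered sub-frame packet: a header carries the
# destination rectangle so the sink knows where to draw the payload.
# Field names and sizes are hypothetical, not the real PDP format.
import struct

HEADER_FMT = ">HHHH"  # x, y, width, height as big-endian 16-bit fields

def make_packet(x, y, w, h, pixels):
    """Prefix a pixel payload with its destination rectangle."""
    assert len(pixels) == w * h
    return struct.pack(HEADER_FMT, x, y, w, h) + bytes(pixels)

def parse_packet(packet):
    """Split a packet back into its header rectangle and pixel payload."""
    x, y, w, h = struct.unpack_from(HEADER_FMT, packet)
    payload = packet[struct.calcsize(HEADER_FMT):]
    return (x, y, w, h), payload

# A 4x2 region at (100, 200) is updated without touching the rest of the frame.
hdr, body = parse_packet(make_packet(100, 200, 4, 2, [0] * 8))
print(hdr)  # prints "(100, 200, 4, 2)"
```

Because each packet is self-describing, a source can update only the regions that changed, at whatever rate each region requires.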

Chapter 4

SYSTEM OVERVIEW

This chapter provides the reader with an overview of IRLED electronics. Firstly, it discusses the Close Support Electronics (CSE) [34, 64, 65] hardware used to drive IRSPs to give the reader an understanding of the internal flow of data from entry into a CSE to output on an external array. Following this, it discusses the communication within a CSE at a block level to provide the reader with an understanding of how communication flow works within a CSE. Finally, external communication of an overall IRSP system is discussed at a block level to provide the reader with a high-level understanding of the flow of communication from input (scene generation) to output (array) in different system configurations.

4.1 Close Support Electronics
As mentioned briefly in Chapter 2, a CSE is needed to drive IRLED arrays. Conceptually, a CSE is an interface that converts digital display data to an array-specific format in order to produce IR imagery. From there, it converts formatted data from the digital domain to analog and amplifies the analog signaling to drive an array by charging the internal array cells that make up an array. It further provides power for an array and regulates the current to safeguard arrays from physical damage due to misconfiguration, heating, or hardware faults. The TCSA, NSLEDS, and HDILED arrays discussed in Chapter 2 can all be driven using the same electronics, with the only difference being in the boards directly attached to the hybrid array. Figure 4.1 shows the internal components of the CSE architecture, called Nessie, broken out in green. In typical configurations, a display system drives a CSE using dual HDMI inputs to increase the system bandwidth and

achieve higher frame rates. Typically, these each carry half of a frame, segmented either vertically or horizontally. Each input decodes the video signals in parallel and then outputs individual pixels that are routed into the main FPGA board, which houses a Xilinx Virtex 6 FPGA [66]. From there, pixels are routed into an internal buffer directly or decoded (in the case of the PDP) and then buffered. Other interfaces may be utilized in place of HDMI as discussed later in this section. Due to the two input cards, the firmware within the main FPGA board is responsible for multiplexing between both inputs to draw data to an array without allowing the internal firmware buffers to overflow. In practice, current firmware does this by buffering the minimal amount of data needed for a write from both inputs in parallel and context switching between the inputs every other write or every two writes depending on the configuration. As long as the buffers are sized large enough to accommodate the time needed for the array write process to finish for an individual write, overflow should not occur. Additional logic is provided to query and detect internal buffer overflows in cases of misconfiguration or signal integrity faults. In practice, if an overflow fault occurs, it is also visually obvious in recorded IR data. Once enough data is buffered for one of the inputs, the firmware controls the write process to drive the 8 DAC cards, which each house 2 16-bit DAC integrated circuits, with each circuit consisting of 2 channels. This yields 32 parallel channels with 512 total signals. Once the DAC process is done, the analog output of the 32 channels is routed to 8 amplifier cards which contain 4 amplifiers each. Following this, the amplified signals are routed through 2 interface boards which contain ribbon cables that attach to an array hybrid.
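The alternation between the two inputs described above can be sketched abstractly. The chunk size and scheduling policy below are illustrative only, not the actual firmware parameters:

```python
from collections import deque

# Sketch of the firmware's alternation between two inputs: each input
# feeds its own small buffer, and the writer context switches between
# inputs, consuming one write-sized chunk per turn when available.
# WRITE_SIZE and the scheduling policy are illustrative only.

WRITE_SIZE = 32  # pixels consumed per array write in this sketch

def service_inputs(input_a, input_b):
    """Alternate writes between two input streams, one WRITE_SIZE chunk each."""
    buffers = [deque(input_a), deque(input_b)]
    writes = []
    turn = 0
    while any(len(b) >= WRITE_SIZE for b in buffers):
        if len(buffers[turn]) >= WRITE_SIZE:
            chunk = [buffers[turn].popleft() for _ in range(WRITE_SIZE)]
            writes.append((turn, chunk))
        turn ^= 1  # context switch to the other input
    return writes

writes = service_inputs(range(64), range(64, 128))
print([w[0] for w in writes])  # prints "[0, 1, 0, 1]" -- inputs serviced alternately
```

Because neither input is ever allowed to accumulate more than a small bounded amount of data before being serviced, buffer sizing stays small and overflow is avoided.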
The ribbon cabling carries both the amplified signals as well as other control signals from the firmware that are routed directly from the main FPGA board to one of the interface boards. Figure 4.2 shows an example of a circuit board to which ribbon cables would be attached via receptacles (not shown). Four ribbon cables are attached to bring signaling to an array. Figure 4.3 shows the components installed within a CSE chassis. Multicolored

[Figure: block diagram of a SLEDS system. A scene generator feeds dual HDMI inputs on the CSE control box / NUC system; within the Close Support Electronics, the FPGA main board with 10 FMC slots drives 8x DAC cards, 8x amp cards, and 2x interface boards, which connect to the hybrid array in the dewar (RIIC) system.]

Figure 4.1: SLEDS System Block Diagram

Figure 4.2: Example Hybrid Round Boards

ribbon cables are shown in the top right. Mounted to the bottom is the main FPGA board. The two large boards at the top are the 2 interface boards, with 8 amplifier cards plugged into them along the bottom. The boards in between the FPGA and amplifier cards are the DAC cards. The power supply is on the left. The additional control signals provided by the firmware and routed to the RIIC through these cables are discussed in Chapter 5.1. The specifics of the PDP firmware architecture's write process are discussed in Chapter 9. Figure 4.4 provides an external view of a newer CSE chassis, which is colored green to follow the Nessie architecture name, as well as a view of the inside of the chassis with the majority of components not installed.

4.2 Communication Flow
In this section, a discussion of internal CSE communication is provided, followed by a discussion of external communication within an IRSP system as a whole.

Figure 4.3: CSE Internals

Figure 4.4: CSE Externals and Empty Chassis

4.2.1 Internal CSE Communication
Figure 4.5 shows the internals of CSE communication. Communication for controlling the behavior of a CSE is done through a daisy-chained set of UART devices utilizing a reliable, blocking, two-way communication protocol called the CVORG protocol6. Without loss of generality, the protocol itself consists of commands to control various aspects of operation, such as tripping an array, setting voltage limits, and configuring firmware operation. It also allows for information to be retrieved about the current system configuration as well as operational errors. The destination of an operation is encoded as part of each command. Thus, commands not meant for a given component are forwarded along the chain. When commands are issued by a system, they are encoded for transport using the CVORG protocol and sent over UART. The system then waits for an acknowledgment potentially containing payload data. When a command has finished executing within a CSE, the firmware sends the acknowledgment over UART to the command initiator with the requested payload data or as a receipt indicating that an action has been successfully executed. The underlying implementation details of the CVORG protocol itself are beyond the scope of this work and are not discussed here. Memory-mapped I/O between the frontend and backend firmware is controlled by a MicroBlaze soft processor and used to control the underlying PDP firmware registers as well as program an array using the Serial Peripheral Interface (SPI) (not shown). The details of command operations are discussed in Chapter 9.3. Additionally, SPI communication is also used to send data for LCD readout. Typically, this includes voltage and current information as well as the results of power-on sanity checks. Finally, the details of the processing performed on HDMI display data sent directly to the Backend Firmware within the PDP are discussed in Chapter 9.4.
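The destination-encoded forwarding behavior described above can be sketched as follows. The command layout, device addresses, and responses here are hypothetical; the actual CVORG implementation is outside the scope of this work:

```python
# Sketch of destination-encoded command forwarding along a daisy chain:
# each device handles commands addressed to it and forwards the rest
# down the chain; the initiator blocks until an acknowledgment returns.
# The command structure and device behavior are hypothetical.

class ChainDevice:
    def __init__(self, address, next_device=None):
        self.address = address
        self.next_device = next_device

    def submit(self, command):
        """Handle the command if addressed here, else forward it down the chain."""
        if command["dest"] == self.address:
            return {"ack": True, "from": self.address,
                    "payload": command.get("query")}
        if self.next_device is None:
            return {"ack": False, "error": "no such destination"}
        return self.next_device.submit(command)

# Three daisy-chained devices; submit() models the blocking round trip.
chain = ChainDevice(0, ChainDevice(1, ChainDevice(2)))
ack = chain.submit({"dest": 2, "query": "voltage_limits"})
print(ack["from"])  # prints "2"
```

The key property shown is that the initiator does not need to know the chain topology: the destination address in each command is enough to route it.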
Earlier non-PDP firmware implementations used on the TCSA, NSLEDS, and HDILED arrays

6 Named after my research group at the University of Delaware.

[Figure: CSE internal communication links — UART TX/RX with the CVORG protocol (C code), memory-mapped I/O / SLV registers (C code / VHDL), SPI (C code), and HDMI — connecting the NUC PC, frontend firmware (MicroBlaze and master/slave microcontrollers on the SuperFPGA and interface boards), the backend firmware on the FPGA, and the LCD.]

Figure 4.5: CSE Internal Communication Block Diagram

also utilized HDMI but contained a different, less robust implementation. These implementations are outside the scope of this dissertation and are not discussed in detail here.

4.2.2 External System Communication
Figures 4.6, 4.7, and 4.8 show the details of external communication to a CSE in various system configurations. These provide a representation of some general types of setups that an IRLED array may be placed in. In general, a scene generator of some form would be utilized in all cases to provide IR scene imagery for an array to display. However, data and communication paths may differ between setups. In all the configurations, a Low Pin Count (LPC) FPGA Mezzanine Card (FMC) connector provides the ability for various interfaces to be utilized to send data to the CSE, such as serial protocols or display-based protocols over different types of hardware links. The FMC interface cards are responsible for retrieving the data over the link and formatting it in a manner that the internal CSE FPGA can decode.

In practice, in display protocol setups, 24-bit pixel words and a data enable pin are utilized, but there is no hardware limitation, and other word sizes could be used in setups that utilize other types of interfaces. A vertical sync signal can also be utilized to reset pixels every display frame in display protocol setups. As mentioned earlier in this section, current CSE setups utilize two HDMI FMC cards for input, where the input for the top half of an array is delivered over one cable and the bottom over the other. For example, in the NSLEDS array, input would be split into two 512 by 1024 display streams operating in parallel. API communication, on the other hand, utilizes UART and the CVORG protocol and is largely the same process for all arrays. Generally speaking, on start, frontend software would configure the firmware to output for the correct array size and program the array. After this, auxiliary functionality such as tripping and untripping the array would typically be the only strictly necessary communication done over UART. High-speed serial interfaces are currently in development to provide more control over the timing of data sent to CSEs within systems. These would allow for the blanking data inherent in display-based protocols to be removed altogether7 as well as provide users with a means to send data only when required, as opposed to at a strictly static interval controlled by vendor drivers, as is the case with GPUs. Figure 4.6 depicts direct communication in which formatted scene data is sent directly to a CSE and system configuration is done directly by a scene generator. In this type of setup, the scene generator can monitor CSE operation directly as well as operate in either a closed or open loop type setup [67, 68]. A direct communication setup is desirable for minimizing end-to-end latency within a system for use cases where performance is paramount.
For example, closed loop scenarios may feed recorded output imagery from an array back into the scene generator for in-the-loop analysis, or in some cases subsequent frames may depend on the

7 Blanking is discussed in Chapter6.

[Figure: direct communication. The scene generator issues API communication and CSE operation signaling directly to the CSE, and sends scene data over LPC FMC links (DVI, HDMI, DP, or ARINC 818; fiber in development) to drive the array.]

Figure 4.6: CSE External Direct Communication Block Diagram

recorded results from prior frames. This means that in many cases sub-frame latency is desirable, in that individual components in a system should not require buffering entire frames anywhere in the system, as this would introduce added latency of a frame or more from generation to capture. This would necessarily result in the system needing delayed feedback control [69], which represents a complex problem to solve in practice. It becomes even more difficult if the frame delays are unpredictable and dynamic from frame to frame, forcing the scene generators to compensate in some manner, such as by sending off frames8 between imagery to characterize delay and detect unexpected behavior as well as provide a means of resynchronization between a scene generator and camera or sensor. Figure 4.7 depicts indirect API communication in which system configuration is done through client APIs, and scene data is sent directly to a CSE. This type of setup is useful for situations where control over a CSE is needed but where API operation cannot be tightly coupled with a scene generator due to development costs or practical reasons. Similar to the direct setup, end-to-end latency is minimized by directly driving an array.

8 An off frame is an empty frame that can be analyzed to check if frames are arriving at the expected time or if a frame slip or unexpected delay has occurred.

[Figure: indirect API communication. The scene generator issues client API calls over a serial communication link to a CSE operations box, which performs CSE operation signaling, while scene data is sent directly to the CSE over LPC FMC links (DVI, HDMI, DP, or ARINC 818; fiber in development) to drive the array.]

Figure 4.7: CSE External Indirect API Communication Block Diagram

In an indirect API communication setup, thin client API shims are provided to execute commands using remote procedure calls (RPC), which are then executed within a CSE operations box to communicate with the CSE [70]. These API shims provide the same interface and level of control as direct API communication but are issued through some indirect layer such as Ethernet or InfiniBand. When a shim command is issued, it is encoded and transmitted to the CSE operations box. The CSE operations box then maps the shim command into a direct command call (as if it were being executed directly by a scene generator) and sends it to a CSE. The CSE operations box then waits for a response from the CSE. Once a response has arrived from the CSE, it encodes it and transmits it back to the scene generator, which is analogous to how the CVORG protocol works during direct operation but with a middleman in between. Figure 4.8 depicts an indirect setup in which both API communication and scene data are sent to an intermediate CSE operations box. Similarly to the indirect API communication setup, client API shims are provided to execute commands using remote procedure calls (RPC), which are then executed within a CSE operations box to communicate with the CSE. An indirect data and API communication setup is utilized in the event that data
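The shim round trip described above can be sketched as follows. All names, the JSON encoding, and the stand-in direct call are hypothetical, chosen only to show the encode, map-to-direct-call, and respond pattern:

```python
# Sketch of the indirect API path: a thin client shim encodes a command,
# the operations box maps it onto a direct call against the CSE, and the
# response travels back the same way. All names and the JSON wire
# encoding here are hypothetical.
import json

def direct_cse_call(command, args):
    # Stand-in for the direct command path to the CSE.
    return {"status": "ok", "command": command, "result": args}

class OperationsBox:
    def handle(self, wire_bytes):
        """Decode a shim request, execute it directly, and encode the reply."""
        request = json.loads(wire_bytes.decode())
        response = direct_cse_call(request["command"], request["args"])
        return json.dumps(response).encode()

class ClientShim:
    """Presents the same interface as the direct API, over a transport."""
    def __init__(self, ops_box):
        self.ops_box = ops_box

    def call(self, command, **args):
        wire = json.dumps({"command": command, "args": args}).encode()
        return json.loads(self.ops_box.handle(wire).decode())

shim = ClientShim(OperationsBox())
reply = shim.call("set_voltage_limit", volts=3.3)
print(reply["status"])  # prints "ok"
```

From the caller's point of view, `shim.call` behaves like the direct API; only the transport in the middle differs.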

[Figure: indirect API and data communication. The scene generator sends client API calls and scene data to a CSE operations box over a serial communication link and LPC FMC links (DVI, HDMI, DP, or ARINC 818 pre-formatted; fiber in development); the operations box performs CSE operation signaling and forwards scene data to the CSE and array.]

Figure 4.8: CSE External Indirect API and Data Communication Block Diagram

cannot be formatted directly for display on an array within a scene generator. It may also be utilized in cases where non-uniformity correction is performed externally from scene generation as shown in Figure 4.1. However, this would result in an additional latency cost that could complicate synchronization in closed loop setups due to delayed feedback control being required. A third scenario where this type of setup may be used is without a scene generator, where a CSE operations box could be used as a test bed to characterize an array directly as well as to test and troubleshoot operations. Imagery itself could also be displayed directly from the CSE operations box. This may be desirable in some open loop setups where the recording and processing of data is performed separately, as this means no additional infrastructure would need to be developed by users to interface with a CSE. Instead, users could utilize the provided infrastructure with little to no development cost. In closing, IRSPs can be utilized in many different types of setups depending on user application and requirements. A well-designed IRSP system eases the process of incorporating a new projector within an environment through versatility while minimizing performance impact, as well as providing users with a clear picture of the tradeoffs associated with each type of setup. While this chapter covered the flow of communication inside and outside a CSE

as well as different system setups to provide the reader with an understanding of the challenges and complexities of utilizing and driving IRLED arrays, Chapter 5 shifts focus to discuss the hardware details of IRLED arrays' write process and the formatting of data sent to arrays to provide context on some of the challenges of writing to an IRLED array.

Chapter 5

ARRAY WRITE PROCESS

This chapter provides the reader with an overview of the data formatting required to drive an IRLED array and hardware-specific details of the underlying array write process to facilitate an understanding of the context in which the PDP is implemented, as well as highlight some of the challenges of high-speed operation. Firstly, it discusses the interleaved write process required to directly drive LEDs on an IRLED IRSP. Then, it discusses the data re-ordering done to optimize the write process. Protocol-level details of how external data is received are discussed in Chapter 6.

5.1 Array Interleaved Write Process
Although all current IRSP arrays utilize display protocol technology that is decoded pixel by pixel to drive the emitters of an array, arrays may utilize different internal drawing mechanisms for driving pixels. This section discusses the details of those mechanisms within the TCSA, NSLEDS, and HDILED arrays for conceptual purposes; while the details may differ for other arrays, the overall raster write process is generalizable in the sense that not all pixels are driven at once. Instead, some subset of pixels is driven depending on how many analog channels can be utilized at the same time. The number of channels is largely dependent on the array design; however, in some cases the circuitry an array is mounted to allows for some configuration9. For example, an HDILED array has 32 channels in typical configurations, meaning that it can drive 32 pixels at once. However, each quadrant can be operated independently with certain board setups, yielding up to a 128-channel maximum. Similarly, it can

9 Figure 4.2 in Chapter 4.1 shows an example of supporting circuitry.

also be configured to utilize only 16, 8, 4, or 2 channels. The details of these configurations are discussed later in this section. Arrays which operate in a snapshot mode also exhibit similar operational behavior. Snapshot mode is a type of array operation in high-speed projector systems in which light output does not occur until every pixel for a frame is written. It works by only transferring charge to embedded emitters once all pixel data is written [71]. This has the benefit of reducing heat variation across an array. However, the actual write process is very similar to non-snapshot operation, which is often referred to as a rolling-update mode of operation within the literature. As with rolling-update operation, each pixel charges either individually or in segments. The only difference is that the emitters themselves are enabled later. The write order and number of pixel segments driven at the same time are array dependent, though. The TCSA, NSLEDS, and HDILED arrays are organized into four quadrants as shown in Figure 5.1. Each quadrant is organized into a given number of pixels, with TCSA housing 256 by 256, NSLEDS housing 512 by 512, and HDILED housing 1024 by 1024 per quadrant. There are a number of input signals necessary to drive an array. These are a 4-bit quadrant write enable, X address, Y address, LOAD bit, 16 Strong/Weak drive strength bits, array reset bit, and the analog signaling lines which charge the pixels being addressed. The quadrant write enable, X address, Y address, and LOAD signals are all utilized for addressing the array. The quadrant write enable selects which quadrant to drive. It is worth noting that each quadrant has separate internal signaling, which allows each quadrant to operate independently in parallel when mounted in a package that provides independent external signals, as noted earlier.
However, to date most of the fabricated IRLED arrays are mounted in packages that allow for only one quadrant to be drawn at a time. At the time of writing, only HDILED has been tested in this type of setup [72, 73, 74, 75] due to it having the largest resolution per quadrant. During independent quadrant operation, multiple independent CSEs must be utilized. In a two CSE setup, half of the quadrant

[Figure: quadrant layouts. TCSA is 512 by 512 with four 256 by 256 quadrants; NSLEDS is 1024 by 1024 with four 512 by 512 quadrants; HDILED is 2048 by 2048 with four 1024 by 1024 quadrants (Q1-Q4 each).]

Figure 5.1: TCSA, NSLEDS, and HDILED Array Quadrant Layoutsa

a Not drawn to scale

bits would be controlled by one CSE and half by another. This would yield a total of 64 channels being written in parallel for double the analog performance. In a four CSE setup, a single quadrant would be controlled by a single CSE. This would yield a total of 128 channels being written in parallel for quadruple the analog performance. Irrespective of the number of CSEs used to drive an array, the internal RIIC signal lines would be driven in precisely the same manner within each quadrant, with the only change being that they operate asynchronously with respect to the others. The X address, Y address, and LOAD are used to select which pixel or group of pixels to write within a quadrant, depending on the mode of operation. Though these lines are effectively shared by quadrants in a single CSE setup, within the RIIC architecture they can be driven independently to enable the multiple CSE operation mentioned previously. Internally, each array (or quadrant in multi-CSE setups) can write up to 32 pixels (or channels) of data at a given time. As noted previously, the mode of operation dictates whether 2, 4, 8, 16, or 32 channels are used. It also affects the number of address bits utilized in practice, with lower modes of operation using more bits. Additionally, the number of address bits differs by array due to the differences in sizes. NSLEDS utilizes 7 bits for the X address and 7 bits for the Y address, yielding a total of 256 by 256 addresses per quadrant. The LOAD bit is used

to select between even and odd rows, yielding an effective address space of 256 by 512 per quadrant. Because the smallest mode of operation writes 2 pixels at a time, this is sufficient to fully address the array. Similarly, HDILED utilizes 8 bits for the X address and 8 bits for the Y address, yielding a total of 512 by 512 addresses per quadrant. Again, as with NSLEDS, the LOAD bit is used to select between even and odd rows, yielding an effective address space of 512 by 1024 per quadrant. This is again sufficient to completely address the array since it can also write a minimum of 2 pixels at a time. Structurally, NSLEDS and HDILED consist of super pixels as shown in Figure 5.2. Each super pixel is made up of a grid of 4 pixels spanning two rows and columns. As discussed above, the LOAD line selects between the top two pixels (even rows) and bottom two pixels (odd rows) of each super pixel in the quadrant, and the two selected pixels are both written at the same time. Additionally, these share a drive strength as noted in the diagram. The emission spectrum is controlled using the Strong/Weak drive strength bit, which dictates whether to provide a strong or weaker light emission for the given pair of pixels. Super pixels are laid out across the array in a grid structure, with NSLEDS consisting of 256 by 256 and HDILED consisting of 512 by 512. The general layout is shown in Figure 5.3. The 32 analog signaling lines control the emission intensity of driven pixels. These 32 channels are controlled by digital-to-analog converters within the CSE that are driven by the firmware. Internally, a CSE has 8 DAC cards with 2 DAC integrated circuits per card, with each DAC circuit consisting of 2 DAC channels, as discussed in Chapter 4.1. Each channel is used to drive a single pixel, giving the ability to drive 32 pixels at once or some subset as mentioned previously.
In practice, it is preferable to utilize all channels at once because this allows more pixels to be driven in a shorter amount of time; each lower mode of operation halves the analog bandwidth as the channel count is halved. Figure 5.4 shows the pixel mapping per write for 32 physical pixels on an array in the highest mode of operation. 2 by 32 columns of pixels are shown segmented into super pixels. The Y address denotes the 16 super pixels that are selected per address.

[Figure: super pixel layout. Each super pixel is a 2 by 2 grid: LOAD=0 selects the top pixel pair (p|ch1, n|ch0) and LOAD=1 the bottom pair (p|ch1, n|ch0); each pair shares a drive strength.]

Figure 5.2: NSLEDS/HDILED Array Super Pixel Layout

[Figure: super pixels tiled in a rows-by-columns grid across the array.]

Figure 5.3: NSLEDS/HDILED Super Pixel Grid Layout

If the Y address is incremented by 1, then the next 16 rows of super pixels would be selected. The X address (not shown) selects the next column of super pixels. DAC Card denotes which DAC card drives the given super pixel. L denotes which value of LOAD selects which rows within each super pixel. When LOAD is low, as shown in the middle segment, the top pixels of a super pixel are selected, as indicated in cyan. When LOAD is high, as shown in the right segment, the bottom pixels of a super pixel are selected. The overall process for writing 64 pixels is a two-step process. First, 32 values for the even rows are loaded in and written to the array, followed by 32 values from the odd rows being loaded in and written to the array. At a data level, it is ideal to interleave the data such that it is available at the optimal time to reduce latency and buffering requirements within the firmware. This is discussed in detail in Chapter 7. Writing additional segments of pixels is a matter of buffering more data and repeating the same write process while asserting the correct address lines for each segment. Given that arrays have no inherent hardware-required write order other than what has been discussed above, the exact order of writing independent segments of 32 pixels can change depending on a number of factors. In a single CSE setup, under most circumstances, the data for the top quadrants is carried by a single HDMI input into a CSE, and the data for the bottom quadrants is carried by the other HDMI input10. In this case, the CSE firmware swaps between writing segments of 32 or 64 pixels to the top and bottom halves of an array: the former if it is desirable to write a minimal amount of data before servicing data from another input, the latter if it is desirable to complete an entire chunk of pixels. In a two CSE setup, data over the HDMI links could be segmented either horizontally or vertically, meaning that each HDMI link would carry data for an entire quadrant.
In a four CSE setup, each link would carry half of the data for a quadrant. As the reader may imagine, the order of writes

10 As discussed in Chapter 9, when utilizing the PDP to drive an array, the location where data is written on the array is agnostic to the HDMI input on which the data is transported.

Figure 5.4: NSLEDS/HDILED Array Interleaved Pixel Mapping Per Write

could be configured in many different ways under these scenarios. Though it is not discussed in detail within this dissertation, it is worth noting for posterity that the order of writes on IRLED arrays can and does affect the thermal load on an array, which ultimately affects pixel brightness [32, 76, 77]; thus, controlling the order of writes can be an important factor for designers and users of a system. As mentioned previously, many of these effects can be alleviated through snapshot mode operation.
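The two-step interleaved write described in this section can be sketched as a simple ordering function. This is an illustrative model only; the function name and the 64-row segment size are assumptions, and real firmware would also assert the X/Y address and LOAD lines per step.

```python
# Illustrative model of the two-step interleaved write: for a 64-pixel
# segment, the 32 even rows are loaded and written first, followed by
# the 32 odd rows. Row indices here are hypothetical.

def interleaved_write_order(segment_rows=64):
    """Return row indices in the order their values are written."""
    even = [r for r in range(segment_rows) if r % 2 == 0]  # first write pass
    odd = [r for r in range(segment_rows) if r % 2 == 1]   # second write pass
    return even + odd

order = interleaved_write_order()
assert order[:3] == [0, 2, 4] and order[32:35] == [1, 3, 5]
```

Interleaving the incoming data to match this order is what lets the firmware avoid buffering rows it cannot yet write.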

5.2 Data Ordering

Due to the interleaved write process described in Chapter 5.1, data sent to an array is reordered in a manner that simplifies firmware development and minimizes buffering requirements. Though the details of the PDP itself are discussed in later chapters, data ordering is discussed here to illustrate some performance considerations of array operation; it can also be considered independently of the PDP, in that the same data ordering could be implemented in non-PDP firmware.

Figure 5.5 shows the data bit-packing utilized within the system when the PDP is in use. It is designed to map to the super pixel layout shown in Figure 5.2. Shown is a 24-bit pixel word, which within display protocols normally represents a value in RGB color space, with 8 bits reserved for each of the red, green, and blue channels. Below this is the mapping of each bit value for an NSLEDS or HDILED array, where S indicates the drive strength of the super pixel, L indicates the value at which to drive the left side of the super pixel, and R indicates the value at which to drive the right side. Only 11 bits are currently used to transmit data per pixel due to bit resolution limitations in the DAC and amplifier boards used within a CSE. Future CSEs may have higher bit resolutions, resulting in the need to transmit 16 bits per pixel; in that event, bit-packing would not be used. There has also been some work on improving the DAC architecture [60].

Figure 5.6 shows the input reordering performed on data sent to a CSE. Each cable sends half of the data as noted in Chapter 4.1. The input example is segmented

Legend (RGB): R = Red, G = Green, B = Blue. Legend (Mapping): S = Drive Strength, L = Left, R = Right, U = Unused.

Bit:     23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
RGB:      R  R  R  R  R  R  R  R  G  G  G  G  G  G  G  G  B  B  B  B  B  B  B  B
Mapping:  S  L  L  L  L  L  L  L  L  L  L  L  U  R  R  R  R  R  R  R  R  R  R  R

Figure 5.5: Bit-packing Format
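A minimal sketch of this packing, assuming the bit positions read off Figure 5.5 (S at bit 23, an 11-bit left value at bits 22 through 12, bit 11 unused, and an 11-bit right value at bits 10 through 0). The helper names are hypothetical, not part of the PDP itself:

```python
# Hedged sketch of the Figure 5.5 bit-packing: a 1-bit drive strength (S),
# an 11-bit left value (L), one unused bit, and an 11-bit right value (R)
# packed into a single 24-bit pixel word.

def pack_pixel_word(strength, left, right):
    assert strength in (0, 1) and 0 <= left < 2048 and 0 <= right < 2048
    return (strength << 23) | (left << 12) | right  # bit 11 stays unused

def unpack_pixel_word(word):
    return (word >> 23) & 0x1, (word >> 12) & 0x7FF, word & 0x7FF

assert unpack_pixel_word(pack_pixel_word(1, 0x555, 0x2AA)) == (1, 0x555, 0x2AA)
```

Packing two 11-bit drive values into one 24-bit word is what allows a single RGB pixel slot to carry both halves of a super pixel.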

into top, middle, and bottom to indicate which part goes over which input and the different reordering steps. The portions denoted as Top are transmitted over CSE input 1. The portions denoted as Bottom are transmitted over CSE input 2. The portions marked Middle are split evenly over both inputs. The first step of data reordering is to bit-pack into 11-bit words as shown in Figure 5.5. Next, even/odd reordering is performed to reduce latency and buffering constraints, as is shown in more detail in subsequent figures. Note the pattern introduced on the words in the diagram due to even/odd reordering. Finally, data is transposed before being sent to the array to accommodate the column write order of the array shown in Figure 5.4. If data were not transposed, then multiple lines of data would need to be buffered to draw 32 pixels for a single write. In fact, the first generation of firmware utilized on the TCSA and NSLEDS arrays required full image buffering before displaying a single pixel on an array, which resulted in an entire frame of latency during operation. The second generation of firmware [78] utilized on TCSA, NSLEDS, and HDILED drastically reduced buffering and latency requirements through data reordering, but required double the amount of buffering relative to the current implementation of the PDP because it did not have an

Figure 5.6: Image Encoding: Input Reordering

Figure 5.7: Image Encoding: Quadrant Reordering

Figure 5.8: Image Encoding: Quadrant Reordering with Color Overlay (pipeline stages shown: Original or NUC Image, Bit-pack, Even/Odd Row Reorder, Transpose, Array)

even/odd row reorder step during encoding. This meant that 2 by 64 pixels of even and odd rows of data were required to be buffered before a single write could occur, even though only 2 by 32 pixels are needed by the hardware for a single write. It is worth noting that, as mentioned in Chapter 5.1, different arrays could have different rasterization processes, and in that event the transformations described here would need to be changed to minimize buffering for those scenarios.

Figure 5.7 shows how the same image data maps to each quadrant on an array. Similarly to the previous image, the separation of top, middle, and bottom by input cable holds here. Additionally, it shows that quadrants one and two are transmitted over the first input and quadrants three and four over the second input. Note also that the top-left of the image corresponds to quadrant one, the top-right corresponds to quadrant two, the bottom-left corresponds to quadrant three, and the bottom-right corresponds to quadrant four. This relationship holds for all subsequent images. Figure 5.8 shows the same details in a color-overlaid chart.

Figure 5.9 shows the bit-packing process for a false color image to aid with understanding. The blown-up sections show single columns of data and the corresponding bit-packed version of the data, where two columns of input with different colors of data per column are merged into a single column of 24-bit data with one color. This results in the example input image having two solid colors after bit-packing. In the actual implementation of bit-packing, real data is in the IR spectrum and is not averaged in this way but clamped and scaled instead. While not required, IR data is normally 16 bits per pixel, which corresponds to current higher-class IR detectors having a dynamic range of 14 bits per pixel [40, 41, 42]. Cheaper detectors may have lower dynamic ranges, resulting in a reduced ability to differentiate light output. Even/odd reordering and the transpose do not result in any noticeable data changes for this example.

Figure 5.10 shows a false color input image designed to highlight the even/odd row reordering applied to imagery. The blown-up sections show 64 labeled rows of

Figure 5.9: Image Encoding: Data Bit-Packing

Figure 5.10: Image Encoding: Data Reorder

input data where the even rows are one color and the odd rows another color for every 32 rows. Additionally, for every 32 rows, the even and odd rows are de-interlaced into 16 even rows of data followed by 16 odd rows of data. In the example input image, this results in every block of 16 de-interlaced rows having a different color. This is due to the interleaved write process for individual 32-pixel writes discussed in Chapter 5.1. Reordering data allows for only 16 bit-packed pixels of data to be buffered per write. As discussed previously, without this reordering, double the number of pixels would need to be buffered per array write, increasing both latency and implementation complexity.

Figures 5.11, 5.12, and 5.13 show some examples of what false colored images would look like if processed by the reordering kernels, to give the reader a better understanding of how different types of data would look during the intermediate processes. Note the characteristic jagged pattern due to even/odd row reordering present in each image. Also note that each image is transposed and segmented in half for transfer over separate inputs to a CSE.

Figures 5.14 and 5.15 show IR imagery moving through the process of reordering. The image shown in Figure 5.14 is commonly used to focus IR cameras and for testing IR array behavior with various shapes and numbers. The image shown in Figure 5.15 is test imagery from one of the projects of my lab. Similarly to the false color images, the images are transposed and sent in halves over the CSE inputs.

While this chapter discussed the internal details of how imagery received by the array is formatted and written to an array, Chapter 6 shifts to a discussion of how imagery is sent to an array at the protocol level within an IRSP system, while providing details of the limitations and challenges present there.
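To recap the encoding step central to this chapter, the per-32-row even/odd de-interlace can be sketched as follows. This is an illustrative helper operating on already bit-packed rows, not the CSE firmware itself:

```python
# Within each block of 32 rows, emit the 16 even rows first, then the
# 16 odd rows, matching the interleaved write order so that only 16
# bit-packed pixels need to be buffered per array write.

def even_odd_reorder(rows, block=32):
    out = []
    for base in range(0, len(rows), block):
        chunk = rows[base:base + block]
        out.extend(chunk[0::2])  # the 16 even rows of this block
        out.extend(chunk[1::2])  # the 16 odd rows of this block
    return out

reordered = even_odd_reorder(list(range(64)))
assert reordered[:2] == [0, 2] and reordered[16] == 1 and reordered[32] == 32
```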

Figure 5.11: Image Encoding: Color Example 1

Figure 5.12: Image Encoding: Color Example 2

Figure 5.13: Image Encoding: Color Example 3

Figure 5.14: Image Encoding: IR Example 1

Figure 5.15: Image Encoding: IR Example 2

Chapter 6

DISPLAY PROTOCOLS

This chapter discusses the details of display protocols. Firstly, it provides a general discussion of how common display protocols work to send pixel data to a display system (e.g. a television). Then, it discusses how these protocols are utilized within IRSP technology.

6.1 Conventional Display Protocols

Display specifications such as DSI [79], DVI [3], HDMI [4], and DisplayPort [80] are the backbone of consumer electronic display devices11. They are utilized in a plethora of devices ranging from televisions, monitors, laptops, and smart phones to embedded devices such as point of sale (POS) terminals. They are also increasingly utilized in the growing smart display market for applications such as registration, product menus, smart watches, etc. These specifications generally provide a standardized feature-set, or display protocol, that is rooted in classical analog video specifications (e.g. VGA, Composite) [81] that utilize scan lines [82]. Scan lines are used to provide video timing information to synchronize a display to a given refresh rate. Each scan line consists of an active video region followed by a horizontal blanking period. After all active video scan lines are displayed, a vertical synchronization region is used to indicate the end of a frame. An overview of this is shown in Figure 6.1. The region shown in green is the pixel data for the active video region of the display. It is of size ha · va, which represents the

11 Some newer additions such as Variable Refresh Rate (VRR) and the framing of DisplayPort are discussed in Chapter 7 to allow for a direct comparison with the PDP.

number of pixels to display; for example, 1920 by 1080 for an HDTV high-definition video mode [83]. The blanking time regions denote pixel data that is sent but not displayed12.

A scan line consists of pixels made up of the ha + hfp + hsp + hbp regions. These are the horizontal active size, the horizontal front porch, the horizontal sync pulse, and the horizontal back porch, respectively. va, the vertical active size, indicates the number of scanlines that make up the active region of the display. The vertical blanking period makes up multiple scanlines and consists of vfp + vsp + vbp scanlines. These are the vertical front porch before the vsync pulse, the vertical sync pulse, and the vertical back porch, respectively. Sync pulses are generally active low, meaning that during active display a sync signal is high as shown in the diagram. Note, this terminology is consistent with the VESA Coordinated Video Timings (CVT) Standard [84]. Figure 6.2 shows a closeup view of signal lines during the active region of display for two scan lines13. A data enable signal denoted by enable is high during the active region shown in green. Following this, it transitions low for a period of time denoted by the hfp + hsp + hbp regions. The horizontal sync signal transitions low only in the region shown in yellow between the front porch and back porches. This process repeats for all scan lines. Once the last active region pixel is drawn, the enable signal stops transitioning high during the vertical synchronization period. Figure 6.3 shows a closeup view of signal lines during the transition into the

vertical synchronization period14. The region denoted by va indicates the end of the active video region of the display, which occurs toward the end of a frame. After the active video region, all data has been drawn to a display. The region denoted by vfp + vsp + vbp is the vertical blanking or vsync period during which no active video

12 Typically data lines are held low during this period, but sometimes they are used for out-of-band communication to send other information such as audio encoding. 13 The active pixel count is proportionally smaller relative to the blanking regions than in real modes for illustration purposes. 14 The blanking regions consist of fewer scanlines than in real modes for illustration purposes.

Figure 6.1: Display Protocol Timing Overview (regions labeled: hsync and vsync signals; ha, hfp, hsp, and hbp per scan line; va active video; vfp, vsp, and vbp blanking time)

data is sent; therefore, data enable denoted by enable is always low during this period.

Before the vertical sync pulse period denoted by vsp occurs, a vertical front porch period denoted by vfp occurs. After the vertical sync pulse, a vertical back porch region vbp occurs. The next frame then begins after the vbp region. Equations (6.1) through (6.5) show the relationship between the different regions of a display and the frequency or refresh rate. In Equation (6.1), lh denotes the scan line size of a display, or total horizontal width, which is made up of the horizontal active and horizontal porch region pixels of a display. In Equation (6.2), lv denotes

Figure 6.2: Display Protocol Horizontal Signal Cross Section Timing (enable and hsync signals across the ha, hfp, hsp, and hbp regions)

Figure 6.3: Display Protocol Full Signal Cross Section Timing (enable, hsync, and vsync signals across the ha, hfp, hsp, hbp and va, vfp, vsp, vbp regions)

the total vertical width of a display, which is made up of the vertical active and vertical porch regions of a display. In Equation (6.3), each pixel is sent at a rate denoted by fp, the pixel frequency (also called the pixel clock), where the result ff denotes the frame frequency or frame rate of a display. This is the pixel frequency over the total number of pixels (video active and porches) of a display. In Equation (6.4), pt denotes the time period a single pixel takes to send. In Equation (6.5), ft denotes the time period for an entire frame.

lh = ha + hfp + hsp + hbp (6.1)

lv = va + vfp + vsp + vbp (6.2)

ff = fp / (lh · lv) (6.3)

pt = 1 / fp (6.4)

ft = 1 / ff (6.5)

To illustrate, let us look at the display modeline generated using the VESA Coordinated Video Timing (CVT) standard shown in Table 6.1. This modeline operates at a total frame frequency of approximately 30 Hz. The pixel clock, 79.75, denoted in red, is specified in megahertz. The horizontal pixels, denoted in blue, are the horizontal display width, the horizontal sync start, the horizontal sync end, and the horizontal total pixels, respectively. The vertical pixels (measured in lines), denoted in green, are the vertical display height, the vertical sync start, the vertical sync end, and the vertical total lines, respectively. The sync pulse polarities, denoted in yellow, indicate whether a given sync pulse is active low or active high: a minus symbol indicates active low and a plus symbol indicates active high. The terminology for these modeline parameters comes from the X Window System [85], a commonly utilized windowing system in the Linux family of operating systems, where the parameters, while equivalent, are specified in a different format from the VESA standards. Equations (6.6) and (6.7) show the relationship between the X Window System parameters and the VESA parameters.

Name                fp (MHz)   hD    hSS   hSE   hT     vD    vSS   vSE   vT     Polarity
"1920x1080 30.00"   79.75      1920  1976  2168  2416   1080  1083  1088  1102   -hsync +vsync

Horizontal parameters (hD, hSS, hSE, hT) are measured in pixels; vertical parameters (vD, vSS, vSE, vT) are measured in lines.

Table 6.1: VESA Coordinated Video Timing (CVT) Modeline

ha = hD hfp = hSS − ha hsp = hSE − hSS hbp = hT − hSE

ha = 1920 hfp = 1976 − 1920 hsp = 2168 − 1976 hbp = 2416 − 2168 (6.6)

ha = 1920 hfp = 56 hsp = 192 hbp = 248

va = vD vfp = vSS − va vsp = vSE − vSS vbp = vT − vSE

va = 1080 vfp = 1083 − 1080 vsp = 1088 − 1083 vbp = 1102 − 1088 (6.7)

va = 1080 vfp = 3 vsp = 5 vbp = 14

If the parameters for the modeline in Table 6.1 are placed into the formulas shown in Equations (6.1) through

(6.12) are yielded. The astute reader should note that lh and lv are the same as the total width and height for the given modeline. The pixel period is ∼12.53 ns, meaning that each pixel is drawn for that amount of time. The frame period is ∼33.38 ms, meaning that each frame is displayed for that amount of time.

lh = hT = ha + hfp + hsp + hbp

lh = hT = 1920 + 56 + 192 + 248 (6.8)

lh = hT = 2416

lv = vT = va + vfp + vsp + vbp

lv = vT = 1080 + 3 + 5 + 14 (6.9)

lv = vT = 1102

ff = fp / (lh · lv)

ff = 79.75e6 / (2416 · 1102) (6.10)

ff = 29.95

pt = 1 / fp ≈ 12.53 ns (6.11)

ft = 1 / ff ≈ 33.38 ms (6.12)
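The worked example above can be checked numerically. The short script below recomputes Equations (6.1) through (6.12) from the Table 6.1 parameters:

```python
# Recompute the modeline quantities from the Table 6.1 CVT parameters.
fp = 79.75e6                                   # pixel clock (Hz)
ha, hSS, hSE, hT = 1920, 1976, 2168, 2416      # horizontal parameters (pixels)
va, vSS, vSE, vT = 1080, 1083, 1088, 1102      # vertical parameters (lines)

hfp, hsp, hbp = hSS - ha, hSE - hSS, hT - hSE  # Eq. (6.6)
vfp, vsp, vbp = vSS - va, vSE - vSS, vT - vSE  # Eq. (6.7)

lh = ha + hfp + hsp + hbp                      # Eq. (6.1): total horizontal width
lv = va + vfp + vsp + vbp                      # Eq. (6.2): total vertical width
ff = fp / (lh * lv)                            # Eq. (6.3): ~29.95 Hz
pt = 1 / fp                                    # Eq. (6.4): ~12.53 ns
ft = 1 / ff                                    # Eq. (6.5): ~33.38 ms

assert lh == hT == 2416 and lv == vT == 1102
assert abs(ff - 29.95) < 0.01
```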

6.2 Display Protocols within IRSP Technology

IRSP technology typically utilizes conventional display protocol technology to drive IR arrays. In the most basic form, a scene generator provides imagery that is encoded utilizing a display protocol and sent to some form of close support electronics, which then decodes the stream pixel by pixel to drive an array, as discussed in Chapter 5. For scenarios that involve unsynchronized operation where dropped frames are not an issue, these protocols can largely be used without modification. However, scenarios that require synchronization in either open loop or closed loop setups present a challenge. Often, non-standard modifications must be used to compensate for jitter among different system processes and overall system latency. Figure 6.4 shows where in a system setup a custom solution would need to be inserted in the case of a scene generator connected directly to a CSE. In this diagram, the scene generation, NUC process, and the camera would need to be end-to-end synchronized with built-in compensation for frame latencies. Solutions can range from utilizing off-the-shelf components such as Nvidia Quadro Sync cards [7] to developing additional hardware pipelines capable of buffering and delaying emission of frame data, such as an intermediate buffer card. This presents a particular challenge because the user typically does not have direct control over frame

Figure 6.4: Custom Synchronization Solution (Scene Gen. → NUC → Custom Sync Solution → CSE → Dewar containing the SLEDS hybrid of RIIC and IRLED array, observed by a camera)

buffers, frame emission, or the software drivers within a system when utilizing display protocol-based technology. Moreover, encoders and decoders expect the protocols to work in a defined way that modifications for enabling synchronization could run afoul of, resulting in the need for non-standard encoder and decoder implementations. The PDP, on the other hand, eases synchronization due to its packetized nature, which allows controlled data to be sent when directed without the need for a custom synchronization solution. With a PDP-based solution, a scene generator could synchronize to a camera (if necessary) and send frame data when required under its own direct control. Moreover, hardware links could be replaced in the future with faster technology without the need to design a new hardware-specific solution for synchronization; the only requirement is that the protocol be utilized across the new link. Now that the background of display protocols has been discussed, Chapter 7 shifts focus to a discussion of the design of the PDP itself, followed by a protocol specification. Some of the more protocol-specific details left out here, such as variable refresh rate (VRR), are discussed there to provide a comparison with PDP features.

Chapter 7

PACKETIZED DISPLAY PROTOCOL

This chapter discusses the underlying communication details of the packetized display protocol (PDP) architecture. Firstly, it gives an overview of the protocol design methodology. Secondly, it provides a comparison with conventional display technology. Thirdly, it discusses requirements for packet formatting. Fourthly, it discusses individual packet types. Fifthly, it delves into the overhead of utilizing PDP packets. Sixthly, it discusses multi-frame rate performance within the PDP through an example of dual frame rate operation. The central purpose of this chapter is to provide the reader with an understanding of the reasoning behind design decisions as well as to provide a detailed specification of the protocol and its performance characteristics when utilized in different scenarios.

7.1 Design Methodology

This section discusses the design methodology for the PDP, which began with a number of critical design goals [86]. How these goals are addressed is not presented all in one place but is discussed throughout, as it is a complex topic with many interwoven aspects. The design goals are as follows:

1. To design a scalable display system that is distributable and hardware agnostic. The display protocols and interfaces utilized within projector systems have systematic issues with remaining up to date with current technology, in that old standards (such as DVI) continue to be utilized due to the inability of custom synchronization solutions to work with newer hardware, as well as due to the costs and time associated with implementing newer solutions utilizing newer standards. This is touched upon throughout this chapter and discussed more extensively in Chapter 8.

2. To provide a protocol that is relatively simple to implement without unnecessary complexity, to ease the encoding and decoding process. A low overhead and fast decode process is crucial to ensuring latency is sub-frame15 across an end-to-end system. Additionally, simplifying these processes reduces potential hardware implementation mistakes as well as inefficiencies that could lead to reduced performance. This is discussed in Chapters 7.3, 7.4, and 7.5.

3. To provide dynamic intra-frame variable refresh rate (VRR) in order to enable better bandwidth utilization. In particular, to allow regions of a display to be intelligently updated at different rates when driven by a scene generator. This is discussed in Chapters 7.7 and 8.2.

4. To provide a path to utilize conventional display protocol streams such as HDMI as a backwards compatible transport layer for the PDP without introducing overhead. This allows for interoperability where necessary when utilizing conventional display sources, and eases migration to a PDP-based system. This is touched upon in Chapter 7.5 and discussed more extensively in Chapter 9.

For methodology, each goal was considered when making decisions about what should and should not be part of the PDP, and where the boundaries of the protocol should lie. This included such decisions as how to incorporate hardware-specific features, the best method to support VRR without constraining the methods by which users can implement support for it within scene generators16, and how to ease implementation in current and future hardware. Hardware-specific optimizations can be important for performance reasons even in hardware-agnostic protocols. Therefore, it is paramount to make hardware-agnostic protocols general enough to support various configurations. For example, tuning within TCP [87] is a well-known important consideration in the field of networks for maximizing throughput due to differing hardware and network topologies. In the PDP, one example of a hardware-specific feature that

15 This refers to latency below the time it would take to buffer an entire frame. 16 VRR implemented within HDMI and DisplayPort is driver controlled, and thus the user has no ability to control it.

needs consideration is the support for different array write processes17, because efficient ordering of data is important for low-latency operation and minimizing hardware complexity.

7.2 Comparison

To better understand how to design the PDP, the constraints of conventional display protocols were investigated to discern which features within these protocols conflict with the design goals of the PDP. In addition, it was necessary to investigate whether these protocols could provide a path forward in the design and development of the PDP. Many versions of these protocols (DVI [3], HDMI [4], DisplayPort [80], etc.) provide similar feature-sets to end-users, with the major focus being on increasing refresh rates and resolutions with each new specification. However, as discussed in Chapter 6.1, their basis is rooted in classical analog video specifications that utilize scan lines [82]. This means that signal timing utilizes vertical and horizontal blanking periods that consist of front porches, sync pulses, and back porches, in addition to the active video data to be displayed. For early analog display devices, these signals enabled operators to manually adjust horizontal and vertical hold times relative to the sync pulses to correct for the imprecise timing of early display hardware; they provide little benefit on modern hardware other than as an embedded method to support sending frames at a static interval, and to enable tearless buffer swapping in either double buffering [88] or triple buffering [89] schemes utilized within GPUs18. In digital display technology, embedded blanking periods represent an anachronism that impedes the goal of maximizing bandwidth utilization when driving a display by requiring the transmission of unnecessary data over digital protocols. For example,

17 See Chapter 5 for the interleaved array write process of the NSLEDS and HDILED arrays. 18 This is performed by swapping buffers during the vertical sync (vsync) interval of a frame [90, 91]. See Chapter 6 for details about vsync intervals.

a commonly utilized 1920 by 1080-pixel mode operating at 60 hertz [83] on a modern display has a 16 percent blanking period overhead due to the specification of vertical and horizontal sync periods. Other examples can be seen in Table 7.1. Of note, modes with 512 by 512 and 512 by 256 visible pixels are non-standard and were tested on existing hardware to minimize the overhead of blanking. Additionally, these have been utilized with NSLEDS and TCSA during actual array operation. What one sees is that as modelines shrink and data rates decrease, the percentage of blanking relative to displayed pixels significantly increases. This is due to the inability of common implementations of video decoders to operate correctly when blanking is minimized. An additional issue is that when non-standard visible resolutions are utilized, many decoders do not operate at all. These protocols also internally utilize a mode-based display of data that requires the specification of the absolute width and height of the display as well as a pixel clock, which, when used in conjunction with the vertical blanking information, provides a total refresh rate as described in Chapter 6.1. This means that the bandwidth requirements for a given mode are inherently static across all frames. In addition, this constrains the refresh rate for a display to be static in terms of both the intra-frame regions of the display and between frames, effectively increasing the burden of synchronization and impeding the introduction of dynamism into the display process. In recent years, work has been done to implement a limited form of variable refresh rate (VRR) display between frames for use with newer protocols [92, 93]. In essence, it allows entire frames to be sent for display immediately once the rendering process has completed. A downside is that historically this has generally required specialized hardware support outside the scope of protocol specifications.
A recent update of the HDMI 2.1 specification [4] has integrated a speed-limited form of VRR directly into the specification; it requires full frames of data to be transmitted at a statically specified resolution and target frame rate. DisplayPort provides a similar form of VRR [94] with a similar set of limitations, also requiring full frames of data to be transmitted at a specified resolution and target frame rate.

56 Modeline Overhead Resolution Refresh Visible Total Pixels Overhead Rate (Hz) Pixels 1920x1080 60 2073600 2475000 16.2% 1600x1200 60 1920000 2700000 28.9% 1280x1024 60 1310720 1799408 27.2% 1280x960 60 1228800 1800000 31.7% 1280x800 60 1024000 1391040 26.4% 1024x768 60 786432 1083264 27.4% 512x512 500 262144 296100 11.5% 512x512 400 262144 296100 11.5% 512x512 300 262144 357500 26.7% 512x512 100 262144 357500 26.7% 512x512 60 262144 357500 26.7% 512x512 50 262144 364000 28.0% 512x512 30 262144 520000 50.0% 512x256 1000 131072 149460 12.3% 512x256 500 131072 256000 48.8% 512x256 200 131072 320000 59.0% 512x256 100 131072 320000 59.0% 512x256 60 131072 320000 59.0%

Table 7.1: Modeline overhead for various resolutions and refresh rates [83]. Computed using active pixel area over total pixel area. 512x512 and 512x256 are typical modeline resolutions used on IRLED arrays.

57 DisplayPort differs from older display standards in that data streams themselves are framed [95, 96], though the standard itself refers to this framing as packetization it differs from the normal sense of packets in that arbitrary packets of data with dy- namic meanings and decoding cannot be sent. An example of the framing is shown in Figure 7.1. Once per frame in between pixel data, a blanking start symbol is inserted into the data stream to indicate the start of vertical blanking. Then, a Main Stream Attribute (MSA) packet is sent that contains the total number of horizontal pixels per line, the total number lines, the start of active video pixels relative to hsync, the start of active lines relative to vsync, and the pixel formatting. After which, a blanking end symbol is inserted to indicate the end of vertical blanking. Following this, pixel data conforming to the video specification within the MSA packet is sent along with stuffing symbols that are framed with fill start and end symbols. These can be of different lengths and are used to represent space between actual data. After all the data for a frame is sent, blanking symbols for the next frame occur, and the process repeats. In essence, what DisplayPort provides is a framed method of sending video formatting information per frame instead of embedding these signals in a separate synchronized stream. Display port also provides a secondary stream to send audio or other infor- mation (not shown) during the blanking interval similar to how blanking intervals are sometimes used as a side channel for extra data in earlier protocols. This differs from the PDP in that the PDP allows for completely arbitrary packets to be sent and decoded with differing packet sizes which gives the PDP the ability to support dynamic sub-frame frame rates, and to be extendable to future packet types. 
Additionally, the PDP carries no internal notion of horizontal or vertical blanking intervals which allows the PDP to operate as fast as possible and give scene generators direct control over frame rates at the user level. For example, the transport upon which the PDP is implemented can operate at the maximum possible data rate allowable in hardware; then, the user in turn can send packets over the data link at the desired rate while inserting empty space where necessary. In the backend, a PDP decoder would decode the data as quickly as possible and display it. From a higher

Figure 7.1: DisplayPort Framing (two frames shown; legend: Blanking Start/End symbols, Main Stream Attribute (MSA) packet, Pixel Data, Filling Start/End symbols)

level, this could be viewed as the user issuing commands to send data packets, similar to how TCP-based programs send data over a link, with a remote environment executing the actual commands. It is up to the user where and when they want data displayed, which enables the PDP to fit a larger set of use cases and to be simpler to integrate within projector systems.

7.3 Packet Format

Given the goals discussed in Chapter 7.1, it is important to discuss different considerations for packet formatting within the PDP. At a minimum, the PDP needs a method to write to different regions of a display in isolation from other regions. For a region write, the minimal set of data that needs to be encoded is an x start address, a y start address, an x end address, a y end address, and the pixel data. Additionally, these fields need to be efficiently encodable and decodable, and must minimize the necessary intermediate operations required for manipulating the data in both backend architectures (close to an array) and front-end architectures (close to a scene generator). Moreover, the bit-width is important both in terms of

encoding/decoding efficiency and rollover, the latter due to the maximum sizes arrays could reach in terms of resolution. Given that the majority of off-the-shelf computer hardware architectures are at least 8-bit aligned in terms of instruction set architecture (ISA) and data widths, it is prudent to consider data alignment. This is primarily because unaligned data requires additional operations, such as bit shifting and masking, in order to manipulate it on the majority of modern architectures. For example, multiplying 6-bit data may require storing it in an 8-bit word and masking the two upper bits to zero before the multiplication is performed. The coordinate system utilized within the PDP is another important consideration, because sub-optimal choices could lead to unnecessary translations that increase operation costs and computational complexity for manipulating data. For example, if one were to store the beginning X coordinate within a packet with a coordinate offset for the range instead of the end coordinate itself, then computing the ending coordinate for the range would require an addition on the backend, and potentially $2N^2$ additional mathematical operations to allow comparing region boundaries if the PDP were being composited, where N is the number of independent regions to draw.

7.4 Packet Types

Table 7.2 shows the basic packets used for communication within the PDP. These are strictly for data transfer and synchronization of system operations, and do not include other aspects such as system setup or enumeration.¹⁹ These packets are organized into type-specific fields of some set word size. The exact size of word fields is left abstract to allow an optimal implementation to be used in practice. For example, a system may utilize a 24-bit word size if an array has a native 24-bit pixel size, or a 32-bit word size if the hardware transport layer has a specific optimal word size.

19 System setup and enumeration are typically system-specific operations and are outside of the current PDP design, but may be incorporated in the future.

Name           Type ID   Type-Specific Fields
No Operation   0x0       —
Draw Region    0x1       X Start, X End, Y Start, Y End, Data…
Array Reset    0x2       Quad
Trigger        0x3       Action

Table 7.2: List of PDP Packets

Typically, a multiple-of-8-bit word size would be utilized in practice, as most hardware architectures (such as x86) utilize some multiple of this size [97]. In any given implementation, the word size of all fields must match in order to simplify decoding operations. This allows for fixed-size decoding of incoming data, which simplifies processing and firmware implementation, eases timing constraints, and enforces non-variability in the decoding time of incoming packets of data. In general, PDP packets are designed to send a minimal amount of header data to lower overhead and to allow for pixel orderings that minimize buffering requirements in order to enable real-time processing. In terms of the protocol itself, the PDP uses a single global coordinate system to refer to pixel locations on a display array. For example, a 512 by 512-pixel array would have coordinates from 0 to 511 in both the horizontal and vertical directions. All packets referencing sub-regions of this display would utilize coordinates that map to some rectangular sub-region of the display. Any overlapping regions of data would be composited during system operation, with the compositor giving priority to data segments that need to be displayed at higher frame rates. PDP packets are divided into four types: a no-operation packet, a draw region packet, an array reset packet, and a trigger packet. All packets contain a Type ID field of word size. The type ID is used by the decoder to determine the packet type. The first packet type, No Operation, is reserved to indicate that command IDs of 0x0 are ignored. The second packet type, Draw Region, is used to send a rectangular sub-region of pixel data in global array coordinates to minimize computations and translations. It

has fields for the start and stop horizontal and vertical coordinates (defined inclusively), followed by individual pixel data. For example, suppose a scene generator were to send a packet of data covering array region 10 to 19 along the X axis and 20 to 29 along the Y axis; a total of 100 pixels of data would follow the packet coordinates, given that the packet specifies a 100-pixel region. The third packet type, Array Reset, is utilized to indicate that quadrants on a given array should be cleared. This causes a quadrant to stop displaying its current contents. It consists of an array-specific quadrant bitmask used to indicate which quadrants to reset. Any unused bits are reserved. This type of packet would be utilized exclusively to control the clearing of quadrants, which may be necessary on certain types of array architectures. The fourth packet type, Trigger, could be used to implement trigger-based synchronization within the PDP. It consists of a system-specific action bitmask used to indicate the type of operation to trigger. In IRLED array systems, the coordinator of synchronization is dependent on the array itself and the different components within the system. In some systems, a sensor may be used as the source of synchronization; in other systems, another component may be utilized. Other aspects of system operation may even be triggered outside of the system synchronization interval based on other events. For this reason, the PDP has opted for a trigger-based approach to synchronization. This approach allows synchronization, data transfer, and computation to be custom tailored for specific use cases. For example, the action mask could be used to trigger the generation of the next frame to be displayed when needed, the source of which is defined by the system itself. Another example would be to utilize the action mask to indicate that further computations (such as scene generation) stall until otherwise indicated.
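To make the Draw Region layout concrete, the following sketch encodes the example region above as a flat list of words. This is an illustrative model only: the function name `encode_draw_region` and the use of Python lists are hypothetical, and each list entry stands in for one abstract PDP word (e.g. a 24-bit value on an HDMI transport).

```python
# Illustrative sketch of a PDP Draw Region packet as a flat list of words.
# encode_draw_region is a hypothetical helper, not part of the PDP itself.

DRAW_REGION = 0x1  # Type ID from Table 7.2

def encode_draw_region(x_start, x_end, y_start, y_end, pixels):
    """Build a Draw Region packet: five header words followed by pixel data.

    Coordinates are inclusive, so a region from (x_start, y_start) to
    (x_end, y_end) must carry exactly width * height pixels.
    """
    width = x_end - x_start + 1
    height = y_end - y_start + 1
    if len(pixels) != width * height:
        raise ValueError(f"expected {width * height} pixels, got {len(pixels)}")
    return [DRAW_REGION, x_start, x_end, y_start, y_end] + list(pixels)

# The example from the text: region 10..19 along X and 20..29 along Y
# carries exactly 100 pixels of data after the five header words.
packet = encode_draw_region(10, 19, 20, 29, [0] * 100)
assert len(packet) == 5 + 100
```

Note the five-word header; this fixed per-packet cost is the $O_{pkt} = 5$ overhead used in the analysis of Chapter 7.6.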

7.5 PDP Stream Decoding

In this section, a discussion of how the PDP might operate on actual data is provided to aid the reader in understanding how packet decoding is envisioned to

Cycle   Input Stream (decoding →)               PDP State          Pixel Buffer
0       1 XS XE YS YE D D D D D D D D D …       (empty)
1       XS XE YS YE D D D D D D D D D …         DR
2       XE YS YE D D D D D D D D D …            DR XS
3       YS YE D D D D D D D D D …               DR XS XE
4       YE D D D D D D D D D …                  DR XS XE YS
5       D D D D D D D D D …                     DR XS XE YS YE
6       D D D D D D D D …                       DR XS XE YS YE     D0
7       D D D D D D D …                         DR XS XE YS YE     D0 D1
8       D D D D D D …                           DR XS XE YS YE     D0 D1 D2
9       D D D D D …                             DR XS XE YS YE     D0 D1 D2 D3 → Display
10      D D D D …                               DR XS XE YS+n YE   D4

Figure 7.2: Example PDP Stream

work. An actual implementation of the PDP on a real system is discussed in Chapter 9, with experimental results discussed in Chapter 10. Figure 7.2 demonstrates a PDP stream for an architecture with a minimum write size of four pixels and a column-first write order. Data streams to the left. Cycles are represented moving downwards, with the internal PDP state and pixel buffer data changing as each word is streamed in. The value 1 is the command ID for a draw region packet. XS stands for x start, XE for x end, YS for y start, and YE for y end. These are the header words for a PDP draw region packet. D stands for data, which represents each pixel word to draw to an array. Initially, the PDP state is empty. Once the command ID is decoded, the PDP firmware enters the draw region state, indicated by DR. Then each word for the region boundaries is read in until the entire header has been parsed. Next, pixel data is streamed in starting with D0. Once all the words

necessary for a single array draw are buffered in, the buffered data is then clocked out of the buffer for display. Following this, YS is incremented to move to the next address on the array²⁰, and D4 is buffered in. Note that an implementation could both display pixels and buffer new pixels for display simultaneously. Not shown in the figure, the buffering and writing process would then continue until the end of the column, when YS equals YE. Then XS would be incremented and YS reset to the start of the region. Each subsequent column would be buffered and written in the same manner as the first until YS equals YE and XS equals XE, which indicates the end of the frame. After this, a new packet would be streamed in and decoded in the same manner. One important note about the decoding process discussed here is that it would be relatively simple to implement by utilizing a state machine within hardware that encapsulates the different PDP state variables. A major factor in this is the fixed-width word size utilized within the PDP. Fixed-width decoding means that partial words do not need to be buffered, and dynamic buffers and routing need not be utilized within a decoder. As discussed in Chapter 7.1, one of the design goals of the PDP is to ease implementation complexity. This simplification of the hardware implementation allows resources to be focused on optimally routing data in and out of an FPGA, ASIC, or other implementation while minimizing timing closure issues in an environment where a major challenge is the plethora of signals that need to be routed. For example, within NSLEDS and HDILED based systems, it is necessary to route 512 data lines due to the 32 16-bit DAC channels used. Future systems could double that number to 1024 by utilizing 64 16-bit DAC channels.
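The decoding walk-through above can be sketched as a small state machine. This is a hypothetical software model, not the hardware firmware: the class name `PDPDecoder` and method `feed_word` are invented for illustration, and it assumes the column-first write order and four-pixel minimum write size used in Figure 7.2.

```python
# Hypothetical software model of the fixed-width PDP decoding state machine.

DRAW_REGION = 0x1
MIN_WRITE = 4  # minimum write size of four pixels, as in Figure 7.2

class PDPDecoder:
    def __init__(self):
        self.state = "IDLE"
        self.header = []   # XS, XE, YS, YE as they stream in
        self.buffer = []   # pixel buffer awaiting a full array write
        self.writes = []   # (x, y_start, pixels) writes clocked out for display

    def feed_word(self, word):
        if self.state == "IDLE":
            if word == DRAW_REGION:
                self.state = "DR"        # enter the draw region state
            # a 0x0 (No Operation) word is simply ignored
        elif len(self.header) < 4:
            self.header.append(word)     # read XS, XE, YS, YE in order
            if len(self.header) == 4:
                self.xs, self.xe, self.ys, self.ye = self.header
                self.x, self.y = self.xs, self.ys
        else:
            self.buffer.append(word)
            if len(self.buffer) == MIN_WRITE:   # full write buffered: display it
                self.writes.append((self.x, self.y, self.buffer))
                self.buffer = []
                self.y += MIN_WRITE             # column-first: YS advances by n
                if self.y > self.ye:            # end of column: next column
                    self.y = self.ys
                    self.x += 1
                    if self.x > self.xe:        # end of region: await new packet
                        self.state, self.header = "IDLE", []

dec = PDPDecoder()
# A 1-wide, 8-tall region: two four-pixel writes at (5, 0) and (5, 4).
for w in [DRAW_REGION, 5, 5, 0, 7] + list(range(8)):
    dec.feed_word(w)
assert [w[:2] for w in dec.writes] == [(5, 0), (5, 4)]
```

Because every field is one fixed-width word, the decoder never buffers partial words, mirroring the timing-closure argument made above.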
As one of the goals of the PDP is to provide backwards compatibility, an important consideration is how this decoding process might be implemented within a display protocol transport layer. Most of the details regarding this are discussed in Chapter 9.1.

20 Array write processes can differ, as indicated in Chapter 5.1; due to this, the index n in Figure 7.2 is an array-specific value that indicates the amount by which to increment YS. In architectures with a row-first write order, XS may be incremented instead.

For now, it is worth noting that for an implementation that is backwards compatible with HDMI, draw packet headers could be transmitted over a secondary data channel while leaving the main stream channel for pixel data. This would allow pixel data to be transmitted exactly as normal while utilizing normal display protocol modelines, thus allowing users to utilize their array technology transparently, as-is, while using an array that internally supports the PDP via its close support electronics.

7.6 Overhead

As mentioned previously, the PDP internally has no notion of blanking periods or porches for providing synchronization, and therefore does not encapsulate the inefficiencies inherent in the aforementioned protocols. Instead, synchronization and frame rates within the PDP are controlled by the source through timing when data is sent. This means that high overheads due to blanking can be mitigated. Table 7.3 shows the maximum packet overhead due to packet encapsulation when the PDP is utilized for the same resolutions and frequencies listed in Table 7.1. These are computed using active pixel area over total pixel area. The original modeline overheads are shown in the Modeline Overhead column. The Overhead Reduction columns show the percent reduction relative to the original modeline for both bit-packed and unpacked data. Note that unpacked data requires buffering double the number of pixels in order to write the same amount of data to an array as bit-packed data, so one would need to double the modeline size in practice. 512 by 512 and 512 by 256 are typical modeline resolutions used on IRLED arrays. These numbers represent a worst-case scenario when the PDP is implemented within a transport layer that does not require blanking intervals. In actual usage, larger packets consisting of more pixels would be utilized; therefore, the overhead would be lower on average. However, even in worst-case scenarios the PDP vastly reduces bandwidth requirements on average, which means that faster refresh rates can be utilized in bandwidth-limited situations. Additionally, higher integration times could

PDP Maximum Packet Overhead

                                      Bit-packed (13.5% Overhead)   Unpacked (7.2% Overhead)
Resolution   Frame rate   Modeline    Total       Overhead          Total       Overhead
             (Hz)         Overhead    Pixels      Reduction         Pixels      Reduction
                          (%)                     (%)                           (%)
1920x1080    60           16.2        2397600     2.7               2235600     9.0
1600x1200    60           28.9        2220000     15.4              2070000     21.7
1280x1024    60           27.2        1515520     13.7              1413120     20.0
1280x960     60           31.7        1420800     18.2              1324800     24.5
1280x800     60           26.4        1184000     12.9              1104000     19.2
1024x768     60           27.4        909312      13.9              847872      20.2
512x512      500          11.5        303104      -2.0              282624      4.3
512x512      400          11.5        303104      -2.0              282624      4.3
512x512      300          26.7        303104      13.2              282624      19.5
512x512      100          26.7        303104      13.2              282624      19.5
512x512      60           26.7        303104      13.2              282624      19.5
512x512      50           28.0        303104      14.5              282624      20.8
512x512      30           50.0        303104      36.5              282624      42.8
512x256      1000         12.3        151552      -1.2              141312      5.1
512x256      500          48.8        151552      35.3              141312      41.6
512x256      200          59.0        151552      45.5              141312      51.8
512x256      100          59.0        151552      45.5              141312      51.8
512x256      60           59.0        151552      45.5              141312      51.8

Table 7.3: PDP Maximum Packet Overhead

be utilized within IR cameras and sensors due to less time being needed for writing the same visible data to an IR array. For example, with the 512 by 256 modeline operating at 200 hertz, integration time could be increased by up to 45.5 percent over a traditional modeline for bit-packed data. The numbers in Table 7.3 are computed by taking the visible resolution and dividing it by the minimum pixel write size supported on an array. The minimum pixel write size is also synonymous with the minimum possible PDP packet payload size excluding headers, and thus is denoted $S_{pkt}$. This yields the maximum possible packet count, $C_{pkt}$, for a given resolution, as shown in Equation (7.1). The visible resolution parameters are denoted as horizontal active, $h_a$, and vertical active, $v_a$, respectively, to keep consistent with the naming conventions used in Chapter 6. Within the NSLEDS and HDILED arrays, the minimal packet size for packed data, $S_{pktP}$, is 2 by 16 pixels due to the interleaved array write process and bit-packing format discussed in

Chapter 5. The minimal packet size for unpacked data, $S_{pktU}$, is 2 by 32 pixels due to unpacked data needing double the amount of pixel data per write.

\[
C_{pkt} = \frac{h_a \cdot v_a}{S_{pkt}}, \qquad
C_{pktP} = \frac{h_a \cdot v_a}{2 \cdot 16}, \qquad
C_{pktU} = \frac{h_a \cdot v_a}{2 \cdot 32}
\tag{7.1}
\]

The total pixel overhead, $O_{px}$, is computed by multiplying the total packet count, $C_{pkt}$, by the packet overhead, $O_{pkt}$, as shown in Equation (7.2). The overhead is determined by looking at the total number of additional words added for PDP header data. Header details are discussed in Chapter 7.3.

\[
O_{px} = O_{pkt} \cdot C_{pkt} = 5 \cdot C_{pkt}
\tag{7.2}
\]

The total pixel count, $T_{px}$, is determined by adding the total pixel overhead, $O_{px}$ (the result of Equation (7.2)), to the visible pixel size, as shown in Equation (7.3).

\[
T_{px} = O_{pkt} \cdot C_{pkt} + h_a \cdot v_a
       = O_{px} + h_a \cdot v_a
       = 5 \cdot C_{pkt} + h_a \cdot v_a
\tag{7.3}
\]

Putting the input variables together and simplifying yields Equation (7.4), where $T_{px}$ is the total pixel count.

\[
T_{px} = O_{pkt} \cdot \frac{h_a \cdot v_a}{S_{pkt}} + h_a \cdot v_a
       = h_a \cdot v_a \cdot \left(\frac{O_{pkt}}{S_{pkt}} + 1\right)
\]
\[
T_{pxP} = h_a \cdot v_a \cdot \left(\frac{5}{2 \cdot 16} + 1\right), \qquad
T_{pxU} = h_a \cdot v_a \cdot \left(\frac{5}{2 \cdot 32} + 1\right)
\tag{7.4}
\]

Equation (7.5) shows that computing the overhead percentage, $O_{pdp}$, is a matter of dividing the total pixel overhead, $O_{px}$, by the total pixel count, $T_{px}$.

\[
O_{pdp} = \frac{O_{pkt} \cdot C_{pkt}}{T_{px}} \cdot 100
        = \frac{O_{px}}{T_{px}} \cdot 100
\tag{7.5}
\]

Due to the total pixel overhead being proportional to the visible pixel size for each resolution, the overhead percentage under worst-case conditions cancels to a constant involving only the minimum packet size and the packet overhead. Substituting Equations (7.1) and (7.4) into Equation (7.5) yields Equation (7.6), which shows this cancellation process and the resulting overhead percentages. The overhead for packed data is 13.5% and the overhead for unpacked data is 7.2%.

\[
O_{pdp} = \frac{O_{pkt} \cdot \frac{h_a \cdot v_a}{S_{pkt}}}
               {O_{pkt} \cdot \frac{h_a \cdot v_a}{S_{pkt}} + h_a \cdot v_a} \cdot 100
        = \frac{\frac{O_{pkt}}{S_{pkt}}}{\frac{O_{pkt}}{S_{pkt}} + 1} \cdot 100
        = \frac{O_{pkt}}{O_{pkt} + S_{pkt}} \cdot 100
\tag{7.6}
\]
\[
O_{pdpP} = \frac{5}{5 + 2 \cdot 16} \cdot 100 \approx 13.5, \qquad
O_{pdpU} = \frac{5}{5 + 2 \cdot 32} \cdot 100 \approx 7.2
\]
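The closed-form results above can be checked numerically. The sketch below assumes the 5-word Draw Region header ($O_{pkt} = 5$) and the 2-by-16 and 2-by-32 minimum packet sizes from the text; the function names are hypothetical.

```python
# Numeric check of Equations (7.4) and (7.6) for worst-case PDP overhead.
# Assumes a 5-word Draw Region header; overhead_percent and total_pixels
# are illustrative helpers, not part of the PDP specification.

O_PKT = 5  # header words per Draw Region packet

def overhead_percent(s_pkt):
    """Worst-case overhead, Eq. (7.6): O_pkt / (O_pkt + S_pkt) * 100."""
    return O_PKT / (O_PKT + s_pkt) * 100

def total_pixels(h_a, v_a, s_pkt):
    """Total pixel count, Eq. (7.4): h_a * v_a * (O_pkt / S_pkt + 1)."""
    return int(h_a * v_a * (O_PKT / s_pkt + 1))

# Packed (2x16 = 32) and unpacked (2x32 = 64) payloads give the constants
# quoted in the text.
assert round(overhead_percent(2 * 16), 1) == 13.5
assert round(overhead_percent(2 * 32), 1) == 7.2

# Table 7.3 rows: 512x512 yields 303104 packed and 282624 unpacked pixels.
assert total_pixels(512, 512, 2 * 16) == 303104
assert total_pixels(512, 512, 2 * 32) == 282624
```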

The overhead reduction in Table 7.3 is computed by subtracting the PDP over- head from the overheads listed in Table 7.1. In most cases, there is an overhead reduction. It is worth restating for posterity that these numbers represent worst-case scenarios for the PDP when implemented on top of a transport layer that does not require blanking intervals. If the underlying transport layer upon which the PDP is implemented cannot function without additional overhead, then the performance would be worse in practice as is the case if the PDP is utilized with a display protocol-based

transport layer. However, in the case of display protocols, overhead can be minimized relative to normal non-PDP operation by utilizing the fastest frequency the hardware allows and minimizing modeline porches. This would enable the PDP to use much smaller porches than normal modelines. Additionally, horizontal porch overhead can be mitigated by timing any inherent analog settling delays to occur during porches. This can be done by ensuring that there is enough data for a write prior to a horizontal blanking period.

7.7 Multi-frame Rate Performance

This section discusses the benefits of utilizing a PDP-based approach for reaching high frame rates. In high-performance systems, data movement [98], bandwidth [99], and latency [100] are of primary concern because these typically impede performance requirements. In many cases, minimizing data movement can vastly improve the performance of systems [101, 102, 103]. In real-time systems [104] such as IR projector systems, where end-to-end latency is important for driving arrays and the observation of real-time IR behavior is important for characterizing temperature changes [105], minimizing data movement and choosing appropriate hardware with high bandwidth as well as low latency is extremely important. As detailed throughout this dissertation, the PDP allows control over which individual segments of an array are written to. This means that large amounts of bandwidth can be saved by updating parts of a frame at slower frame rates than other sections of a frame. In a simplified dual frame rate setup, Equation (7.7) and the subsequent simplification show how to compute the percentage of saved bandwidth, $b_s$, when operating some proportion of pixels at a slower rate than a given faster rate. The proportion of pixels operating at a slow frame rate is $p$, the fast frame rate is $r_f$, and the slow frame rate is $r_s$. The final simplified form is the percentage of pixels operating

at a slow rate multiplied by the difference between the rates over the fast rate.

\[
b_s = \left(1 - \frac{(1 - p) \cdot r_f + p \cdot r_s}{r_f}\right) \cdot 100
\]
\[
b_s \cdot r_f = \left(r_f - r_f \cdot \frac{(1 - p) \cdot r_f + p \cdot r_s}{r_f}\right) \cdot 100
\]
\[
b_s \cdot r_f = \left(r_f - (1 - p) \cdot r_f - p \cdot r_s\right) \cdot 100
              = \left(r_f - r_f + p \cdot r_f - p \cdot r_s\right) \cdot 100
\tag{7.7}
\]
\[
b_s \cdot r_f = p \cdot (r_f - r_s) \cdot 100
\]
\[
b_s = \frac{p \cdot (r_f - r_s)}{r_f} \cdot 100
\]

Table 7.4 shows various examples of the bandwidth saved, relative to operating all pixels at a given fast frame rate, when two frame rates are utilized within the PDP. The results are computed using Equation (7.7) to give the reader an understanding of the kinds of savings that are possible with this method of operation. Of note, driving just half of the pixels at a slow rate saves nearly 50 percent of the bandwidth in these scenarios. This is of particular importance because many types of IR test scenarios used in practice contain sparse objects against low-intensity backgrounds. In these types of scenarios, the bandwidth reduction would translate into the ability to display fast imagery at much higher rates than is possible with conventional display technology. Indeed, display of small objects could see bandwidth reductions of over 95 percent. While these equations demonstrate a simplified use case for the PDP, they provide a powerful argument for why the intra-frame variable refresh rate technology within the PDP has the potential to provide orders of magnitude better performance in bandwidth-starved environments. In practice, many current projector systems struggle to achieve frame rates above 200 hertz due to the inability to deliver data at higher rates when utilizing conventional display technology.

Multi-frame Rate Bandwidth Savings

Slow Frame   Fast Frame   Proportion of      Bandwidth
Rate (Hz)    Rate (Hz)    Slow Pixels (p)    Saved (%)
10           100          0.1                9.0
10           100          0.2                18.0
10           100          0.3                27.0
10           100          0.4                36.0
10           100          0.5                45.0
10           100          0.6                54.0
10           100          0.7                63.0
10           100          0.8                72.0
10           100          0.9                81.0
10           100          1.0                90.0
10           1000         0.1                9.9
10           1000         0.2                19.8
10           1000         0.3                29.7
10           1000         0.4                39.6
10           1000         0.5                49.5
10           1000         0.6                59.4
10           1000         0.7                69.3
10           1000         0.8                79.2
10           1000         0.9                89.1
10           1000         1.0                99.0

Table 7.4: Multi-frame Rate Bandwidth Savings
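A short sketch (hypothetical function name) of Equation (7.7) reproduces rows of Table 7.4:

```python
# Illustrative implementation of Equation (7.7): percentage of bandwidth
# saved when a proportion p of pixels is driven at slow rate r_s instead of
# fast rate r_f. bandwidth_saved is a hypothetical helper name.

def bandwidth_saved(p, r_s, r_f):
    """b_s = p * (r_f - r_s) / r_f * 100, per Equation (7.7)."""
    return p * (r_f - r_s) / r_f * 100

# Rows from Table 7.4: half the pixels slow against 100 Hz and 1000 Hz fast rates.
assert round(bandwidth_saved(0.5, 10, 100), 1) == 45.0
assert round(bandwidth_saved(0.5, 10, 1000), 1) == 49.5
assert round(bandwidth_saved(0.9, 10, 1000), 1) == 89.1
```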

Now that the PDP design methodology, underlying protocol details, performance characteristics, and potential operation within a system have been explored, discussion shifts to the machine model upon which the PDP operates, to show the generalized type of environment within which the PDP is envisioned to be utilized in the future.

Chapter 8

MACHINE MODEL

This chapter describes the machine model upon which the PDP operates to motivate current and future use-cases. Each component is discussed in detail with a focus on how these components map to the operations within traditional IRSPs. Finally, it provides an example of how utilizing compositing and PDP in tandem could provide performance benefits.

8.1 Hardware Mapping

This section discusses the different processes of an IRSP system in order to show how these processes can be mapped to an abstract PDP architecture that could be implemented with different kinds of physical hardware. The purpose is to show the reader that, with the PDP, the traditional IRSP system can be scaled to a multitude of different parallel setups. Traditional IRSP systems are composed of four distinct processes: scene generation, non-uniformity correction (with data reordering), digital-to-analog conversion, and projection, as indicated in Figure 2.4 in Chapter 2. Scene generation and non-uniformity correction may or may not operate within the same hardware, but from an implementation perspective are segmented into a pipeline operation irrespective of the physical hardware, as shown in Figure 8.1. Digital-to-analog conversion, on the other hand, is always performed with separate electronics located close to an array, as indicated in Chapter 4.1. Given that these processes are separate, they can be mapped readily to a machine model which represents an abstraction of an IRSP system utilizing the PDP as

[Figure 8.1 depicts the IRLED projection pipeline (scene generation → non-uniformity correction / reordering → digital-to-analog conversion → projection) mapped onto hardware in two ways: with scene generation and NUC on separate hardware feeding the CSE and array, and with both on the same scene generation hardware.]

Figure 8.1: Hardware Mapping of an IRLED Projection Process

shown in Figure 8.2. The Abstract Machine Model (AMM) separates the operation of an IRSP system into three main components: scene generation, compositing, and display. Scene generation consists of image generation just as in traditional IRSPs; however, more than one scene generator can be utilized to generate partial imagery. Compositing is the process of combining partial images into full PDP frames as well as performing non-uniformity correction and data reordering. More than one compositor is allowed for the purposes of scaling. An IRLED Array Tile represents both an array tile and its supporting electronics. Between the components exist abstract links which operate using the PDP. Compositor blocks and IRLED Array Tiles have a one-to-one relationship. The multiple-CSE setup discussed in Chapter 5.1 is an example of this type of setup. There is prior work on parallel scene generation and compositing, specifically for hardware-in-the-loop scene generation [106, 107, 108]. The relationship between these components remains abstracted in such a way that hardware components may be scaled to fit demand. At its most basic, a single scene generator, compositor, and IRLED array tile may be used just as in a traditional setup. For higher speed requirements, hardware components may be mapped as needed. As noted above, the links between components in the system utilize the PDP

[Figure 8.2 depicts scene generators, compositors, and IRLED array tiles connected by PDP links, with many scene generators feeding each compositor and each compositor driving a single IRLED array tile.]

Figure 8.2: Abstract Machine Model of the PDP architecture with 1-to-N relationships between components

for communication, data transfer, and synchronization. The compositing component differs from a traditional IRSP system in that it is responsible for taking imagery from many sources, possibly at different frame rates, and combining it into a single image for transmission to IRLED array tiles. This process is discussed in more detail in the following section.
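The component relationships of the AMM can be sketched as a small object model. The class names below are hypothetical and capture only the many-to-one scene-generator-to-compositor fan-in and the one-to-one compositor-to-tile pairing described above.

```python
# Hypothetical object model of the AMM's component relationships.
# Class names are illustrative only; links between components would carry
# PDP packets in a real system.

from dataclasses import dataclass, field

@dataclass
class IRLEDArrayTile:
    name: str

@dataclass
class Compositor:
    tile: IRLEDArrayTile                          # one-to-one with its tile
    sources: list = field(default_factory=list)   # many scene generators

@dataclass
class SceneGenerator:
    name: str
    def attach(self, compositor):
        compositor.sources.append(self)

# A minimally scaled system, as in Figure 8.2: two generators share one
# compositor/tile pair alongside a second single-generator pipeline.
c0 = Compositor(IRLEDArrayTile("tile0"))
c1 = Compositor(IRLEDArrayTile("tile1"))
for gen in (SceneGenerator("sg0"), SceneGenerator("sg1")):
    gen.attach(c0)
SceneGenerator("sg2").attach(c1)
assert len(c0.sources) == 2 and len(c1.sources) == 1
```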

8.2 Compositing

This section discusses a compositing layer and analysis that could be implemented in a system utilizing the PDP. As noted previously, compositing is the process of combining two or more images into a single picture. In high-speed systems, imagery could be generated in individual pieces and combined by a compositor layer. This process might be done by a single system or with separate systems. A general example of how compositing might look is shown in Figure 8.3. During the compositing process, frame segments are ranked to determine which to send at high speeds and which to send at low speeds for intelligent bandwidth utilization. Once segmented and ranked into non-overlapping speed classes, frame segments are transmitted at the necessary rate. Figure 8.4 shows the source segments overlaid on the image, where one can see how each region maps to the image. Consider the above scenario in the case of a single scene generator. In this case, the compositor receives data from a single source. As frame data is received, a differencing algorithm can be employed to determine how to segment the overall frame for optimal data transfer based on the rate of change of individual portions of the frame relative to the prior frame. Portions that change rapidly are sent more often than portions that change slowly in order to maximize bandwidth for high-speed display. This consequently also has the effect of improving the performance of the analog chain by allowing devices to reserve more time to drive rapidly changing portions of a display over slowly changing portions. This is discussed in more detail shortly.

[Figure 8.3 depicts source frame data divided into segments classified as high, medium, or low speed, flowing through the compositing process to the display.]

Figure 8.3: Compositing Process Example

[Figure 8.4 overlays the high, medium, and low speed segment classes on the composited image.]

Figure 8.4: Compositing Process Overlayed

In order to demonstrate how a differencing algorithm might be employed, consider Figure 8.5, which shows a series of composited frames with segmented regions that could be sent as individual PDP draw region packets to an array. Each region is a representation of the average intensity of all pixels contained within that region. Darker regions indicate higher intensity and lighter regions lower intensity. During the compositing process for each frame, the average intensity of light output for each segment could be computed. The outlined segments for a given frame indicate a large change in intensity from the previous frame. The compositor could then assign a weight to indicate which regions have a large absolute change in intensity from frame to frame. This information could then be utilized for deciding which segments of an image to send in high-speed operation or low-speed operation. Segments with larger changes in intensity would be higher priority than segments with lower changes. The PDP could be utilized to send these regions at a much higher frame rate than the more slowly changing segments to reflect the fast changes in intensity. In practice, an implementation would likely utilize a rolling weighted score where regions with higher weighted scores would be sent more frequently than regions with lower scores; with the

[Figure 8.5 depicts four composited frames (Frame 0 through Frame 3), each divided into regions shaded by average intensity, with outlined regions marking large intensity changes from the previous frame.]

Figure 8.5: Average Intensity Map of PDP Regions for Composited Frames

remaining bandwidth devoted to lower-scoring regions. Regions that could not be updated in a given frame would then have their priority increased for the next frame. In a real system, frames would likely be segmented more finely than in this example, allowing small segments to be dynamically transmitted as necessary. This would give fast-changing data the ability to update at rates far greater than a static fixed-frame-rate display would be capable of under the same hardware constraints. An added benefit is that by prioritizing segments and decreasing the rates of slowly changing segments, more analog bandwidth can be utilized for drawing the high-speed portions of a display, since the close support electronics no longer need

to devote DAC and amplifier resources toward drawing low-speed regions as often. Ultimately, this would result in improved image fidelity by giving additional settling time to rapidly changing regions of an array. Though only touched upon in passing in Chapter 5.1, analog performance is of primary concern in IRSPs because a higher-performance analog chain results in more consistent thermal output and an overall more thermally accurate image. Low analog performance can result in the same intensity data producing different results over multiple captured frames due to DACs, amplifiers, and emitters not having time to completely settle in high-speed operation. Now that all of the protocol-level and abstract architectural details of the PDP have been discussed, the following chapter moves toward a discussion of an implementation of the PDP on real hardware.
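Before moving on, the differencing scheme described in this section can be illustrated with a minimal sketch. The segmentation into fixed square blocks and the function names (`segment_means`, `rank_segments`) are hypothetical simplifications, not the dissertation's implementation.

```python
# Hypothetical sketch of differencing-based segment ranking: segments are
# scored by the absolute change in average intensity between consecutive
# frames, and the highest-scoring segments would be sent at high rates.

def segment_means(frame, seg):
    """Average intensity of each seg x seg block of a 2-D frame (list of rows)."""
    h, w = len(frame), len(frame[0])
    means = {}
    for y in range(0, h, seg):
        for x in range(0, w, seg):
            block = [frame[j][i] for j in range(y, y + seg)
                                 for i in range(x, x + seg)]
            means[(x, y)] = sum(block) / len(block)
    return means

def rank_segments(prev, curr, seg):
    """Segment origins ordered by decreasing |change| in average intensity."""
    pm, cm = segment_means(prev, seg), segment_means(curr, seg)
    return sorted(pm, key=lambda k: abs(cm[k] - pm[k]), reverse=True)

# 4x4 frames with 2x2 segments: only the top-left segment changes sharply,
# so it is ranked first for high-speed transmission.
prev = [[0, 0, 5, 5], [0, 0, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]]
curr = [[9, 9, 5, 5], [9, 9, 5, 5], [5, 5, 6, 6], [5, 5, 6, 6]]
assert rank_segments(prev, curr, 2)[0] == (0, 0)
```

A real compositor would fold these scores into the rolling weighted priority described above rather than re-ranking from scratch each frame.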

Chapter 9

IMPLEMENTATION

This chapter describes an FPGA implementation of the PDP architecture [109, 110, 111]. For purposes of this discussion, only relevant details are included to ease the reader's understanding. First, the chapter discusses the purpose of the implemented architecture. Following this, it discusses the abstract high-level architecture of the firmware. Finally, it discusses the frontend and backend implementations of the firmware.

9.1 HDMI Transport Layer This section discusses utilizing HDMI as a transport layer for the PDP. HDMI was chosen for the first implementation of the PDP for several reasons. Firstly, current IRSP technology typically utilizes display protocols and the corresponding links, so from the perspective of hardware compatibility a display-based protocol layer is ideal. Secondly, providing a backwards compatible implementation of the PDP is a design goal as discussed in Chapter 7.1. Thirdly, utilizing existing hardware and systems is much cheaper and quicker than designing new hardware. Fourthly, minimizing barriers to entry for users of the protocol is important. Fifthly, it allows for a direct comparison with the same hardware. Sixthly, the systems in my lab utilize HDMI directly. Recall that HDMI is a display protocol with blanking regions as discussed extensively in Chapter 6.1. Display protocols have an Active Video region as shown in Figure 6.1. This region is made up of scanlines that represent rows of pixels. In normal HDMI operation, the active video region is a representation of all visible pixels. At the start of a frame, the first pixel is streamed to an HDMI decoder and translated to an analog voltage to physically drive the first pixel of a display starting from the top left


Figure 9.1: Normal Frame with Display Data

corner. Following this, subsequent pixels are streamed in and drive each pixel moving toward the right. When an hsync pulse occurs, the display moves to the next row and begins again. This is a typical rasterization process. Traditional IRSP technology uses display protocols quite similarly but reorders data depending on how the rasterization process differs for the projector hardware. Recall the array write process discussed in Chapter 5.1 which includes a transpose and data reordering. For purposes of this discussion let us put aside any data reordering, bit-packing, and blanking regions. Without loss of generality, drawing to an IRSP looks like Figure 9.1 where D represents an individual 24-bit pixel. Each D pixel would be decoded one by one as the data is streamed over HDMI to a decoder and mapped to physical pixels on an array. Now let us review the details of the PDP Draw Region packet shown in Figure 9.2. Recall that it consists of an ID number, x start address, x end address, y start address, y end address, and trailing pixel data. The header data is denoted in yellow and the pixel data in green. Recall additionally that the PDP allows its internal word size to be implementation specific; as such, for HDMI it is ideal to use a 24-bit word size in order to allow for fixed-width pixel decoder logic. Additionally, this allows each PDP word to be directly embedded within an HDMI display stream with a one-to-one mapping between an HDMI pixel and a PDP word.
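To make the word-by-word decoding concrete, the following Python sketch models how Draw Region packets embedded in a 24-bit pixel stream map to array locations. This is a software illustration only, not firmware code; the packet ID value of 1 follows Figure 9.2, while the function names, inclusive region bounds, and row-major pixel ordering within a region are assumptions made for demonstration.

```python
# Software model of Draw Region decoding; illustrative only, not firmware code.
# Assumes the Draw Region packet ID is 1 (per Figure 9.2), inclusive region
# bounds, and row-major pixel ordering within a region.

DRAW_REGION_ID = 1

def decode_stream(words):
    """Consume PDP words one by one; return drawn pixels as (x, y) -> value."""
    pixels = {}
    it = iter(words)
    for word in it:
        if word != DRAW_REGION_ID:
            continue  # discard unknown data, as the valid controller would
        # Four header words follow the ID: XS, XE, YS, YE.
        xs, xe, ys, ye = (next(it) for _ in range(4))
        for y in range(ys, ye + 1):
            for x in range(xs, xe + 1):
                pixels[(x, y)] = next(it)  # trailing pixel data, in order
    return pixels

# A 2x2 region at (0,0), then a second packet redrawing pixel (1,1).
stream = [1, 0, 1, 0, 1, 10, 11, 12, 13,
          1, 1, 1, 1, 1, 99]
out = decode_stream(stream)
```

Note how the second packet overwrites an overlapping location; this is the mechanism that lets regions be redrawn faster than the base frame rate.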

1 | XS | XE | YS | YE | D | D | D | …

Figure 9.2: Draw Region Packet

1 XS XE YS YE D D D D D | 1 XS XE YS YE D D D D D D D D | 1 XS XE YS YE D D D D D D D D D D | 1 XS XE YS YE …

Figure 9.3: Embedded PDP Frame

Extrapolating, PDP region packets can be embedded into an HDMI data stream as shown in Figure 9.3 where each PDP word would be represented as a 24-bit pixel in the HDMI data stream. However, the meaning of the pixels differs from a normal frame in that only some pixels are displayed to an array, and the location of displayed pixels is header dependent. As the pixel words are streamed in, they are decoded one by one and examined for context. For example, at the start of the example stream, an ID word would be streamed and decoded, signaling to the firmware that a Draw Region packet is being streamed in. From there, the four header words representing the region size would be streamed in and stored for determining where to draw the pixel data that follows. As the pixel data is streamed in, the firmware would actively draw pixels in the corresponding locations on the array as indicated within the header. This process would continue until the end of a packet, and then the subsequent packet would undergo the same process. This effectively separates the location a pixel is drawn at from the location in which it exists within an HDMI stream, which allows for pixels to be driven independently of the HDMI stream resolution. Moreover, pixels can be driven multiple times


Figure 9.4: Embedded PDP Frame to Array Mapping

within an embedded PDP frame by specifying another packet with the same overlapping locations. This allows for pixels to be driven at a faster rate than the base HDMI resolution. In fact, with the separation of pixel draw location from stream location, a specific HDMI resolution is not even necessary. Smaller, faster resolutions can be utilized to dynamically draw to arrays with larger resolutions as shown in Figure 9.4. Ultimately, this allows custom-tailored modelines with different resolutions and base frame rates to be utilizable with the PDP over HDMI. Moreover, these benefits would hold for other display protocols if used as a PDP transport layer, such as DisplayPort, and would allow the PDP to be utilized by replacing the HDMI decoder hardware with other off-the-shelf decoder components without requiring changes to the PDP decoding logic or implementation, thus easing hardware requirements and reducing vendor lock-in for PDP-based implementations. For backwards compatible operation with normal HDMI, the reader should note that a normal HDMI frame could be expressed as a single draw region packet with a header at the start of a frame. For purposes of backwards compatibility, it is not necessary to send the header over HDMI itself. Instead, the header information could be sent as configuration information to the PDP firmware and the HDMI data stream treated as normal display data. This allows for the same decoding and array

write logic to be used both for embedded PDP frames and normal HDMI frames without requiring a separate implementation. Thus, users can operate the firmware in either a backwards compatible mode or normal PDP mode of operation. The next section moves to a general discussion of the PDP firmware layout.

9.2 Abstract Architecture There are many important considerations when implementing a PDP-based firmware for IRSPs. These range from pulling data into the FPGA over physical pins to clock domain crossing and timing issues. HDMI itself operates with a given pixel clock that is specified on the source side of the connection (at the machine generating the HDMI stream), meaning that the PDP firmware needs to support configurable HDMI speeds. The decoding of the HDMI stream itself is performed by off-the-shelf HDMI cards as noted in Chapter 4.1; therefore, the PDP firmware need not directly handle the protocol. Instead, the firmware need only handle the individual pixels arriving over the two inputs simultaneously. The array itself requires a specific timing for signaling as well as enough time to completely settle any analog data lines; otherwise, pixels flicker due to metastability issues with the signaling. This means that the HDMI clock cannot be directly utilized for timing array writes without introducing severe complications for correct timing. In fact, older non-PDP firmware implementations operated this way, which resulted in image quality being negatively impacted at high speeds. In order to correct for these issues and guarantee stable operation at high speed, the PDP firmware implementation instead opted to isolate firmware operation into separate clock domains as shown in Figure 9.5. Ideally, there would only be a single write and a single read clock domain where the write clock domain uses the input HDMI pixel clock, and pixel data is buffered over an asynchronous boundary to the read clock domain. The read clock domain would operate at a static predefined speed that is faster than the write clock domain in order to ensure that under normal operation data


Figure 9.5: Abstract PDP Firmware Backend Architecture

cannot be lost. It would be responsible for decoding HDMI pixel data into headers and array data, in order to drive the signaling of an array directly. In practice, the write clock domain is separated into two separate clock domains for each HDMI input so that the HDMI input clocks do not need to be perfectly synchronized. As discussed in Chapter 10.1.2, a correct and performant asynchronous boundary proved challenging in practice due to metastability issues and variability in FPGA synthesis from build to build.

9.3 Frontend Architecture This section discusses the PDP frontend architecture. This is the portion of the firmware that is controllable and allows for user configuration. Recall from Figure 4.5 in Chapter 4.2 that CSE communication utilizes the CVORG protocol which is brought into a MicroBlaze soft-processor. The PDP firmware opted to retrofit this layer for its own configuration. Essentially, the frontend architecture works by decoding CVORG protocol commands which in turn set internal PDP registers which are read by the backend architecture for configuration. These include such things as controlling the length of time an individual array write occurs for, delays in the signaling to an array, and whether to operate the firmware in backwards compatibility mode where header data is read from internal PDP registers (called AXI mode) or PDP stream mode where header data is embedded into HDMI streams. Select APIs are shown in Table 9.1.

API | Description
get axi dimensions | Gets AXI mode dimensions as 32-bit unsigned integers.
get axi mode | Gets AXI mode. 0 is stream mode. 1 is backwards compat mode.
get post write ticks | Gets the post write duration for PDP.
get state | Gets current firmware state information.
get write ticks | Gets the delay and duration for pixel writes in PDP.
set axi dimensions | Sets AXI dimensions.
set axi mode | Sets AXI mode. 0 is stream mode. 1 is backwards compat mode.
set post write ticks | Sets the post write duration for PDP.
set write ticks | Sets the delay and duration for pixel writes in PDP.

Table 9.1: PDP Select Communication APIs
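As a rough illustration of the frontend flow (commands setting internal registers that the backend later reads for configuration), consider the following Python sketch. The register names echo Table 9.1, but the storage model and method signatures are invented for demonstration and do not reflect the actual MicroBlaze or CVORG command format.

```python
# Illustrative model of the frontend register flow: configuration commands set
# internal PDP registers that the backend reads. Register names follow
# Table 9.1; the class, storage, and method signatures are invented here.

class PdpFrontend:
    def __init__(self):
        # axi_mode: 0 is PDP stream mode, 1 is backwards compat (AXI) mode.
        self.regs = {"axi_mode": 0, "axi_width": 0, "axi_height": 0,
                     "write_ticks": 0, "post_write_ticks": 0}

    def set_axi_mode(self, mode):
        assert mode in (0, 1)
        self.regs["axi_mode"] = mode

    def set_axi_dimensions(self, width, height):
        self.regs["axi_width"], self.regs["axi_height"] = width, height

    def set_write_ticks(self, ticks):
        self.regs["write_ticks"] = ticks

    def get_axi_mode(self):
        return self.regs["axi_mode"]

# Configure backwards-compatible operation: header information comes from
# registers instead of being embedded in the HDMI stream.
fe = PdpFrontend()
fe.set_axi_mode(1)
fe.set_axi_dimensions(512, 512)
```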

9.4 Overall Backend Architecture The implemented backend architecture consists of the portion of the AMM that drives an IRLED tile (or emitter array) directly from data packets sent by a compositor. As such, it is responsible for receiving PDP packets, decoding, validating, and drawing them to an array. This is shown in Figure 9.6. In the current implementation, packets are sent using an underlying HDMI protocol layer. The incoming data is synchronized across two distinct clock domains utilizing a synchronized circular buffer (SCB). The input side consists of two separate HDMI inputs to meet system bandwidth requirements. Each input is assumed to contain clock skew relative to the other, so separate SCBs are used to synchronize these to the system domain. At a high level, individual 24-bit data words come in each HDMI clock cycle. These are transitioned to the system domain and stored for retrieval by the array emitter module. The array emitter module is responsible for bringing in each 24-bit word value and emptying the corresponding SCB slot. As it brings in each word, it begins to decode them into PDP commands. Once enough data is buffered for a command, the data is sent to the write buffer module which then drives an emitter directly through I/O lines.

Figure 9.6: Overall PDP Backend Architecture

9.5 Synchronized Circular Buffer Figure 9.7 shows the submodule details of the synchronized circular buffer utilized within the implementation. It is used to handle clock domain crossing for data sent from the HDMI domain to the System domain. Internally, it consists of two controllers, two data routers, and the actual internal buffer storage with a built-in synchronizer circuit. The details of each are explained in the following sections.

9.5.1 Controllers The write controller is used to coordinate which internal buffer to write pixel data to. At a given clock cycle, when a pixel is streamed in, the write enable is used as a write trigger, which causes the write controller to immediately set the current buffer to full and select the next buffer. On the same clock cycle, the incoming pixel data is stored in the current buffer. In practice, this process repeats for each HDMI clock cycle when write enable is high. The write router performs the actual data redirection based on the buffer selected by the write controller.


Figure 9.7: Synchronized Circular Buffer Architecture

The read controller behaves similarly to the write controller but is used to coordinate the reading of stored data from the internal SCB buffers. A valid signal is used to indicate whether a buffer is filled. If valid is low, then the output of the data signal is undefined. In operation, an array emitter watches for a buffer to be filled; once it is, the array emitter decodes the data and sends an empty trigger, causing the read controller to set the selected buffer to empty and select the next buffer or slot. The array emitter then polls the valid signal for the next buffer and the process repeats. The read router performs the actual data redirection based on the select signal controlled by the read controller.

9.5.2 Routing Figure 9.8 shows the internals of the synchronized buffers used by the SCB with attached data routers. While only four buffers are shown for demonstration purposes, the number of buffers is compile-time definable in the implementation. In the system, both the writer and reader have independent views of the SCB state and their own individual buffer selection. As mentioned previously, data is mapped based on the select signals. These are used to control to which buffer external lines are mapped. The empty flag is used to determine if a buffer is empty. The full flag is mapped as a valid flag to the array emitter which uses it to determine if a buffer is full. The set full


Figure 9.8: Synchronized Internal Buffer Architecture

and set empty signals are used to change the state of internal buffers. The hold signals are used to hold any shift register contents when a buffer is not selected. Any unselected shift registers automatically hold their contents. A selected shift register conditionally holds its contents if a write trigger is not occurring and updates its contents otherwise. Data into the SCB is selectively routed to each shift register to allow for the data to be stored and data out of the SCB is selectively routed to an array emitter for decoding.
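The controller and routing behavior just described can be summarized with a small software model. The sketch below treats slot handoff as atomic and omits the clock-domain synchronizer entirely (that circuit is the subject of the next section); the class and signal names are illustrative, not the firmware's.

```python
# Simplified software model of the synchronized circular buffer: writer and
# reader keep independent slot selections, and each slot carries a full/empty
# flag. Clock-domain metastability is NOT modeled; handoff is atomic here.

class SCB:
    def __init__(self, num_buffers=4):
        self.data = [None] * num_buffers
        self.full = [False] * num_buffers   # full flag per slot
        self.wr_sel = 0                     # write controller selection
        self.rd_sel = 0                     # read controller selection

    def write(self, pixel):
        """Write trigger: store, mark full, select the next slot."""
        assert not self.full[self.wr_sel], "buffer overrun"
        self.data[self.wr_sel] = pixel
        self.full[self.wr_sel] = True
        self.wr_sel = (self.wr_sel + 1) % len(self.data)

    def valid(self):
        return self.full[self.rd_sel]       # valid signal to the array emitter

    def read(self):
        """Empty trigger: return data, mark empty, select the next slot."""
        assert self.valid()
        pixel = self.data[self.rd_sel]
        self.full[self.rd_sel] = False
        self.rd_sel = (self.rd_sel + 1) % len(self.data)
        return pixel

scb = SCB()
for p in (7, 8, 9):
    scb.write(p)
drained = [scb.read() for _ in range(3)]    # FIFO order is preserved
```

Once the last slot is written or read, the modulo wrap selects the first slot again, matching the circular behavior described above.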

9.5.3 Internal Buffer and Memory Synchronizer At a high level, internally, a double request-acknowledgment handshake is used to ensure the buffer state has transitioned correctly across clock domains and is available on the other side. This is shown in Figure 9.9. Once data becomes available, the read router outputs the data lines as well as a valid signal to indicate that the data can be read and cleared. The read controller clears it once an empty trigger is sent from the array emitter. The SCB then selects the next buffer. Once the last buffer is written or read by either controller, the first buffer is selected again. Figure 9.10 shows a register-transfer level (RTL) design of the circuit. Set full and set empty are the control signals for the circuit to allow it to transition from empty to full and full to empty, respectively. The set full signal is controlled by the write domain and the set empty signal is controlled by the reader domain. For correct operation, set full can only be toggled for a single write cycle when the empty flag is


Figure 9.9: Synchronizer Double Handshake

high and set empty can only be toggled for a single read cycle when the full flag is high. If the set operations are performed during other times, then the circuit behavior is undefined. Additionally, the read clock must be faster than the write clock in order to guarantee that buffer overrun cannot occur. To eliminate metastability issues with clock domain crossing, the circuit is built upon the two flip-flop synchronizer shown in Figure 9.11. This is a commonly utilized technique in hardware description language (HDL) based designs because it behaves correctly as long as a design is able to meet timing constraints [112, 113]. Other techniques that involve utilizing combinational logic with clocks directly are prone to failure due to variability in synthesis from run to run. The writer and reader employ the two flip-flop synchronizer to transition state across clock domains for the request and acknowledgement bits, respectively. Data within the shift register is guaranteed to have stabilized by the time the request is seen in the read domain since it takes at least two


Figure 9.10: Full/Empty Memory Synchronizer Circuit

Figure 9.11: Two Flip-flop Synchronizer

write clock cycles for the request bit to transition. Table 9.2 shows the state transitions of the circuit with relative time moving from left to right. Note, the table does not show per cycle transitions but instead the relative order over time as bits transition across clock domains. For a per cycle view of the synchronizer’s behavior, see Figure 10.1 in Chapter 10. Green denotes stable output that does not change until the circuit is transitioned through set full or set empty line toggles. Orange denotes transitioning internal state bits. The green ellipses indicate an unknown amount of time between stable states due to no changes to the set full and set empty lines. The dark orange ellipses indicate an unpredictable amount of time for metastable transition of bits across unsynchronized clock domains to internal registers on the other side. When a buffer is filled by the writer and the set full signal is toggled, a request is initiated by raising the req signal. This request causes the empty flag to lower. The req signal then transitions from the write clock domain to the read clock domain. Once it arrives, the full flag is raised to indicate data is available.

Signal | relative time →
set_full | 0…0 1 0…0 0…0 0 0…0 0 0…0 0 0…0 0…0
set_empty | 0…0 0 0…0 0…0 1 0…0 0 0…0 0 0…0 0…0

req latch | 0…0 0 1…1 1…1 1 1…1 1 0…0 0 0…0 0…0
req int reg | 0…0 0 0…1 1…1 1 1…1 1 1…0 0 0…0 0…0
req reg | 0…0 0 0…0 1…1 1 1…1 1 1…1 0 0…0 0…0

ack latch | 0…0 0 0…0 0…0 0 1…1 1 1…1 1 0…0 0…0
ack int reg | 0…0 0 0…0 0…0 0 0…1 1 1…1 1 1…0 0…0
ack reg | 0…0 0 0…0 0…0 0 0…0 1 1 1 1 1…1 0…0

empty flag | 1…1 1 0…0 0…0 0 0…0 0 0…0 0 0…0 1…1
full flag | 0…0 0 0…0 1…1 1 0…0 0 0…0 0 0…0 0…0

Table 9.2: Full/Empty Memory Synchronizer State Transitions

When the data is read by the reader domain and the set empty signal is toggled, an acknowledgement is initiated by raising the ack signal. This acknowledgement causes the full flag to lower. The ack signal then transitions from the read clock domain to the write clock domain. Once it arrives, the req signal is lowered to indicate the acknowledgement has been received. The req signal then transitions from the write clock domain to the read clock domain. Once it arrives, the ack signal is lowered to indicate the lowering of the request has been received. The ack signal then transitions from the read clock domain to the write clock domain. Once it arrives, the empty flag is raised to indicate that the buffer is now empty. The following HDL code listing is provided to supplement the discussion above:

library IEEE;
use IEEE.std_logic_1164.all;
library UNISIM;
use UNISIM.VComponents.all;

entity memory_synchronizer is
  port(
    -- wr domain in:
    wr_reset:   in  std_ulogic;        -- global reset
    wr_clk:     in  std_ulogic;        -- write clock
    set_full:   in  std_ulogic;        -- write set_full trigger
    -- wr domain out:
    empty_flag: out std_ulogic := '1'; -- empty flag
    -- rd domain in:
    rd_reset:   in  std_ulogic;        -- global reset
    rd_clk:     in  std_ulogic;        -- read clock
    set_empty:  in  std_ulogic;        -- read set_empty trigger
    -- rd domain out:
    full_flag:  out std_ulogic := '0'  -- full flag
  );
end entity;

architecture Behavioral of memory_synchronizer is
  signal req:     std_ulogic := '0';   -- request write domain
  signal req_int: std_ulogic := '0';   -- request buffer
  signal req_reg: std_ulogic := '0';   -- request read domain
  signal ack:     std_ulogic := '0';   -- acknowledgement read domain
  signal ack_int: std_ulogic := '0';   -- acknowledgement buffer
  signal ack_reg: std_ulogic := '0';   -- acknowledgement write domain
begin
  empty_flag <= req nor ack_reg;
  full_flag  <= req_reg and not ack;

  process (wr_clk)
  begin
    if (rising_edge(wr_clk)) then
      -- transition acknowledgement to write domain.
      ack_int <= ack;
      ack_reg <= ack_int;
      if wr_reset = '1' then
        req     <= '0';
        ack_int <= '0';
        ack_reg <= '0';
      else
        if set_full = '1' then  -- set full to signal request.
          if req = '0' then
            req <= '1';
          end if;
        end if;
        if ack_reg = '1' then   -- if reader transitioned ack high
          req <= '0';           -- then transition request low.
        end if;
      end if;
    end if;
  end process;

  process (rd_clk)
  begin
    if (rising_edge(rd_clk)) then
      -- transition request to reader domain.
      req_int <= req;
      req_reg <= req_int;
      if rd_reset = '1' then
        ack     <= '0';
        req_int <= '0';
        req_reg <= '0';
      else
        if set_empty = '1' then -- set empty to signal ack.
          ack <= '1';
        end if;
        if req_reg = '0' then   -- if writer transitioned request low
          ack <= '0';           -- then transition ack low.
        end if;
      end if;
    end if;
  end process;
end architecture;
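As a supplement to the VHDL, the following Python model replays the same double handshake edge by edge (resets omitted). It is a software sanity check of the ordering in Table 9.2, not a substitute for HDL simulation; the per-edge method interface is invented for illustration, and each next-state value is computed from pre-edge values to mirror VHDL signal assignment semantics.

```python
# Cycle-level software model of the full/empty memory synchronizer above.
# wr_cycle()/rd_cycle() each model one rising clock edge in that domain;
# req and ack cross domains through two-register (two flip-flop) paths.
# Resets are omitted for brevity. Illustrative model, not firmware code.

class MemSync:
    def __init__(self):
        self.req = 0; self.req_int = 0; self.req_reg = 0
        self.ack = 0; self.ack_int = 0; self.ack_reg = 0

    @property
    def empty_flag(self):   # empty_flag <= req nor ack_reg
        return int(not (self.req or self.ack_reg))

    @property
    def full_flag(self):    # full_flag <= req_reg and not ack
        return int(self.req_reg and not self.ack)

    def wr_cycle(self, set_full=0):
        # Compute next state from pre-edge values (VHDL signal semantics).
        ack_reg_next = self.ack_int          # two flip-flop path for ack
        ack_int_next = self.ack
        req_next = self.req
        if set_full and not self.req:
            req_next = 1                     # set full raises the request
        if self.ack_reg:
            req_next = 0                     # reader acked: drop the request
        self.ack_int, self.ack_reg, self.req = ack_int_next, ack_reg_next, req_next

    def rd_cycle(self, set_empty=0):
        req_reg_next = self.req_int          # two flip-flop path for req
        req_int_next = self.req
        ack_next = self.ack
        if set_empty:
            ack_next = 1                     # set empty raises the ack
        if not self.req_reg:
            ack_next = 0                     # request dropped: drop the ack
        self.req_int, self.req_reg, self.ack = req_int_next, req_reg_next, ack_next

s = MemSync()
s.wr_cycle(set_full=1)                       # req raised; empty flag falls
s.rd_cycle(); s.rd_cycle()                   # req crosses: full flag rises
full_seen = s.full_flag
s.rd_cycle(set_empty=1)                      # ack raised; full flag falls
s.wr_cycle(); s.wr_cycle(); s.wr_cycle()     # ack crosses: req dropped
s.rd_cycle(); s.rd_cycle(); s.rd_cycle()     # req-low crosses: ack dropped
s.wr_cycle(); s.wr_cycle()                   # ack-low crosses: empty flag rises
```

Running the sequence shows the full flag rising two read edges after the request and the circuit returning to its initial empty state once the handshake completes, consistent with Table 9.2.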

9.6 Array Emitter Figure 9.12 shows the details of the array emitter module used in the implementation. The logic for handling the PDP is encapsulated completely here. An array emitter module is responsible for decoding the words of PDP packets and sending pixel data or array reset requests to the write buffer which either draws to the array or resets the array depending on the requested operation. Internally, the array emitter consists of individual state machine controllers that each handle a particular packet type. The current implementation includes a write enable controller, which controls decoding of draw region packets; a reset controller, which controls the array reset process; and a valid controller, which performs validation checks on the input to verify the correctness of packets, discarding any unknown data. Each controller takes in similar input signals and produces similar output signals with some exceptions depending on the actual function of the individual controller module. On the input side, each controller takes in a valid and data line corresponding to an individual word of a PDP packet. Additionally, an active line is brought into each controller to indicate whether another controller module is active or not. This is used to ensure that other modules do not become active and attempt to decode data not meant for them while another is currently processing a packet. During an idle phase, the write enable and reset controllers wait for a corresponding packet ID to come in


Figure 9.12: Array Emitter Architecture

to begin operation. The details of the controllers’ state machines are discussed in the following section.

9.7 State Machines Figure 9.13 shows a block diagram of the PDP state machines for the draw region and array reset controllers. Each cycle of the system clock, an array emitter attempts to read a word of data stored within an SCB. For simplicity, an example draw region packet is shown. When a packet ID matching an operation handled by a given controller arrives, the corresponding controller switches states and then waits for the rest of the incoming packet data to arrive. For example, if a draw region packet ID arrives, then the draw region state machine waits for the X start address, X end address, Y start address, and Y end address to arrive word by word by performing a check for valid data each system clock cycle. Finally, after data arrives and the header is decoded, the state machine


Figure 9.13: PDP State Machine

loads the necessary pixel data for a write and moves to the write data state once all data has been buffered. In this state, it sends the data to the write buffer. If more data needs to be written for the PDP packet, it then continues buffering data, and waits for the write buffer to be idle to send the next set of data. This continues until all data is written for the packet, finally proceeding to the idle state. For more insight, recall that Figure 7.2 in Chapter 7.4 shows an example PDP stream being decoded cycle by cycle. This state machine is an implemented version of the decoding process shown there. The reset controller contains similar logic. When a packet ID matching a reset packet arrives, the reset controller begins the array reset process. Recall that array resets are a system specific process. For NSLEDS, an array reset requires enabling

all quadrant bits, driving the load signal low, and driving the array reset signal low. Following this, it requires enabling all quadrant bits, driving the load signal high, and driving the array reset signal high. The reset controller does this by sending a request for both operations to the write buffer and transitions back to an idle state afterwards. For more information about these signals see Chapter 5.1. The valid controller (not shown) is used to ensure that incorrect or corrupt packet data is cleared. If an invalid packet command arrives during an array emitter idle phase, the valid controller empties the corresponding SCB slot.
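The draw-region controller flow described above can be sketched as a small generator in software. The state names, event tuples, and the burst size of each write-buffer handoff are invented for illustration; the firmware realizes this flow as an HDL state machine.

```python
# Software sketch of the draw-region controller: wait in idle for the packet
# ID, load the four header words, then buffer pixel data and hand it to the
# write buffer in bursts. Names and burst size are illustrative only.

def draw_region_fsm(words, burst=2):
    """Yield (state, payload) events as PDP words arrive."""
    it = iter(words)
    for word in it:
        if word != 1:                           # idle: wait for Draw Region ID
            yield ("idle_discard", word)
            continue
        header = [next(it) for _ in range(4)]   # XS, XE, YS, YE load states
        yield ("header", tuple(header))
        xs, xe, ys, ye = header
        remaining = (xe - xs + 1) * (ye - ys + 1)
        while remaining > 0:                    # load pixel data / write data
            chunk = [next(it) for _ in range(min(burst, remaining))]
            remaining -= len(chunk)
            yield ("write_data", chunk)         # handoff to the write buffer

# A 2x1 region at (0,0) drawn from a short stream.
events = list(draw_region_fsm([1, 0, 1, 0, 0, 5, 6]))
```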

9.8 Write Buffer Figure 9.14 shows the details of the write buffer’s inputs and outputs along with important signaling to other modules. The write buffer is responsible for driving the I/O pins that are passed to the array. At a high level, the write buffer is responsible for coordinating writes between the two array emitters shown. Generally, this is done by handing off write buffer privileges every other write through the select line in order to ensure starvation does not occur for either HDMI input. The busy line is used to indicate when the write buffer is in the process of writing to the array in order to ensure that array emitter modules do not attempt to send new data to the write buffer while it is currently active. For input, the write buffer takes bits representing the signaling I/O to the RIIC. This includes all of the output signals discussed in Chapter 5.1. Configuration information is sent to indicate for how long a write or reset should occur as well as when during the write process specific signaling should transition high and back to low. These signals are configurable through the frontend APIs discussed in Chapter 9.3. It is worth noting that in practice, this timing is array specific but must be long enough for the analog signaling to settle. The following chapter describes the experimental results from implementing, testing, and running this PDP based firmware on NSLEDS and HDILED arrays.

Figure 9.14: Write Buffer Architecture

Chapter 10

EXPERIMENTAL RESULTS

This chapter discusses a number of experimental results obtained using the implemented PDP firmware discussed in Chapter 9. Only select results are included. These were chosen because they provide direct evidence that the protocol and PDP firmware solve the challenges put forth in Chapter 3 and meet the design goals discussed in Chapter 7.1. Firstly, memory synchronizer operation is discussed and evaluated because correct clock domain crossing is an essential requirement for array operation. Secondly, overall firmware behavior is evaluated in both simulation and practice. Thirdly, packetized operation is evaluated. Finally, overall results are discussed to provide a summary of lessons learned during operational testing.

10.1 Memory Synchronizer This section describes the various tests performed on the full/empty memory synchronizer used within the PDP firmware to synchronize data transitioning across clock domains. First, it discusses simulation results obtained through a full behavioral simulation of the PDP firmware. Secondly, it discusses using the memory synchronizer in a test setup and the pitfalls that occurred with earlier implementations.

10.1.1 Simulation Figure 10.1 shows the simulated results of the full/empty memory synchronizer when transitioning to full and back to empty for a writer and reader operating at 35.75 megahertz and 200 megahertz, respectively. For discussion about each signal and

Figure 10.1: Behavioral Simulation of a Single Pixel Being Buffered Through the Full/Empty Memory Synchronizer

expected internal behavior see Chapter 9.5.3. Each state transition in the simulation should look similar to the state transitions shown in Table 9.2. The circuit holds a steady state until the set full signal is toggled for a single write clock cycle. On the next write clock cycle, the req signal transitions high and the empty flag transitions low. Then the data is moved to the read clock domain using the two flip-flop synchronizer scheme discussed in the implementation details. At the next rising edge of the read clock, the data is latched into req internal, which is an internal metastable register. Due to the difference between the write and read clock speeds, the data latches into the register quickly. Following this, a cycle later, the data is clocked into req reg which completes the read clock domain transition, causing the full flag to transition high. Once any corresponding buffer data is handled by the PDP firmware, the set empty signal is toggled for a single read clock cycle. This causes the ack signal to transition high. Then the data is moved to the write clock domain using the two flip-flop synchronizer scheme discussed in the implementation details. At the next rising edge of the write clock, the data is latched into ack internal. Following this, a cycle later, the data is clocked into ack reg which completes the write clock domain transition. Notice that it takes much longer to transition data from the read clock domain to the

write clock domain than vice versa. This is due to the slow write clock. A write clock cycle later, req transitions low. Subsequently, it transitions to the read clock domain just as before. Following this, the ack line is lowered and transitions to the write clock domain just as before. At the same time, the empty flag transitions high, and the circuit is in the state it began in. This follows the expected state transitions from Table 9.2 with the only notable distinction being that transitions across clock domains take different amounts of time depending on the source's clock speed. In practice, a complete double handshake takes many cycles to complete which could lead to undesired overflow behavior. In PDP, overflow issues are mitigated by allocating enough internal buffers to allow for there to always be a buffer available for a writer to fill. Additionally, the use of the two flip-flop synchronizer means that it takes more than a single read clock cycle for the full flag to raise; however, given that the pixels of an IRLED array take many read clock cycles to charge, this does not pose a problem. Moreover, in practice, the full flag transitions in between two to three cycles of the reader clock depending on where the rising edge of the reader clock falls relative to the transitioning of the req signal. Figure 10.2 shows the simulated results of the full/empty memory synchronizer with multiple pixels being buffered through it for a writer and reader operating at 35.75 megahertz and 200 megahertz, respectively. The large gaps of time between transitions to full are due to there being multiple buffers available as discussed above. This indicates that at the simulated rates no overflow is occurring.
In practice, it would be possible for overflow to occur if the write clock speed were close to the read clock speed; however, because the maximum possible write speed for HDMI is only 148.5 megahertz and clock domain crossing latency is hidden through the use of multiple buffers, a 200 megahertz input clock is sufficient for correct behavior.
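The handshake sequence and its clock domain crossing latency can be illustrated in software. Below is a minimal cycle-level Python sketch, not the actual RTL: signal names such as req_internal and req_reg follow the text, metastability is abstracted into two-stage shift registers, and the reader consumes a single item.

```python
# Cycle-level sketch of the double handshake used by the full/empty memory
# synchronizer. Real metastability is abstracted into the two-stage shift
# registers of the synchronizers; periods approximate 35.75 MHz and 200 MHz.

class TwoFlopSync:
    """Two flip-flop synchronizer: a value on the input is only trusted
    after two destination-domain clock edges (internal -> reg)."""
    def __init__(self):
        self.internal = 0  # first, potentially metastable register
        self.reg = 0       # second register, safe to use in this domain

    def clock(self, d):
        self.internal, self.reg = d, self.internal
        return self.reg

def simulate(write_period=28, read_period=5, total=600):
    """Writer at ~35.75 MHz (~28 ns period) and reader at 200 MHz (5 ns)."""
    req = ack = 0                    # handshake lines (writer/reader driven)
    full, empty = 0, 1               # flags seen by reader/writer logic
    req_sync = TwoFlopSync()         # req crossing into the read domain
    ack_sync = TwoFlopSync()         # ack crossing into the write domain
    events = []
    consume_at = None                # when the reader pulses set_empty
    for t in range(1, total):
        if t % read_period == 0:                 # read clock edge
            req_reg = req_sync.clock(req)
            if req_reg and not full and not ack:
                full = 1
                events.append(('full', t))
                consume_at = t + read_period     # firmware handles buffer
            if consume_at is not None and t >= consume_at:
                ack, full, consume_at = 1, 0, None   # set_empty pulse
            if not req_reg and ack:
                ack = 0                          # read side of handshake done
        if t % write_period == 0:                # write clock edge
            ack_reg = ack_sync.clock(ack)
            if empty and not req and not events:  # single set_full pulse
                req, empty = 1, 0
                events.append(('req', t))
            if ack_reg and req:
                req = 0                          # ack seen: drop req
            if not ack_reg and not req and not empty:
                empty = 1                        # handshake fully complete
                events.append(('empty', t))
    return events

events = simulate()
print(events)   # -> [('req', 28), ('full', 35), ('empty', 140)]
```

For the 28 ns/5 ns periods above, the sketch reproduces the qualitative behavior described in the text: full rises within a few reader clock cycles of req, while the complete double handshake spans several slow writer cycles.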

Figure 10.2: Behavioral Simulation of Multiple Pixels Being Buffered Through the Full/Empty Memory Synchronizer

10.1.2 Experimental Testing Testing the memory synchronizer on real hardware was an important consideration because behavioral simulations do not capture timing behavior accurately. In real hardware, noise, clock jitter, and variations in synthesis introduce unpredictability that can lead designs with inherent metastability issues to fail unexpectedly. Testing was performed using a ZYBO development board [114] hooked up as shown in Figure 10.3. Image data consisting of 256 by 256 pixels operating at 250 hertz was sent from a test PC to a ZYBO board running an implementation of the synchronized circular buffer. VGA was used to provide loop-back for visual inspection. It is not possible to check for bit-level accuracy using a VGA device due to the analog nature of the interface, so a UART device was used as an alternative (slow) path to read out internal image data. It was also used to collect statistical information, such as the error bits for a given configuration and the amount of data that passed through the interface. For the first test, the synchronized circular buffer circuit was driven using the checkerboard input image shown in Figure 10.4. This pattern was routed to the loop-back device for visual inspection. It was manually retrieved over UART and diffed with the original image to check for bit-level accuracy.
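The checkerboard test above amounts to a bit-level diff between the sent and retrieved buffers. The following Python sketch illustrates the check; the tile size and pixel values are assumptions, as the text does not specify them.

```python
# Sketch of the first test: generate a 256x256 checkerboard input image,
# stand in for the UART readout with a copy of the buffer, and diff the two
# buffers at the bit level. Tile size and pixel values are assumptions.

WIDTH = HEIGHT = 256
TILE = 32  # assumed checkerboard tile size

def checkerboard(width, height, tile, lo=0x00, hi=0xFF):
    """One 8-bit value per pixel, alternating tiles of lo and hi."""
    return [hi if ((x // tile) + (y // tile)) % 2 else lo
            for y in range(height) for x in range(width)]

def bit_errors(sent, received):
    """Count differing bits between two equal-length pixel buffers."""
    return sum(bin(a ^ b).count('1') for a, b in zip(sent, received))

sent = checkerboard(WIDTH, HEIGHT, TILE)
received = list(sent)                  # stand-in for the UART readout
print(bit_errors(sent, received))      # -> 0 for a bit-exact loop-back

received[1000] ^= 0x04                 # a single flipped bit is detected
print(bit_errors(sent, received))      # -> 1
```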

Figure 10.3: ZYBO Test Setup (test PC connected to the ZYBO board via HDMI, with a VGA loop-back and UART TX/RX)

Figure 10.4: Checkerboard Input Image

For the second test, a dynamic CRC-like pattern was used to verify data accuracy directly. 4096 segments of 8 consecutively ordered, randomly generated pixel values were inserted into the HDMI stream twice, as shown in Figure 10.5. The pattern dynamically changed for every input frame. Figure 10.6 shows an example pattern. Internally, the test circuit would compare the results of the segments and log errors. UART was used to retrieve the error count. The initial synchronizer circuit employed within the design was able to pass behavioral simulation but exhibited bit-level errors during the experimental testing phase. Upon careful analysis, it was discovered that the design contained metastability issues due to how the synthesizer routes signals during synthesis. The early design employed a clock switching strategy within the memory synchronizer circuit. Essentially, the clock of the circuit was switched depending on whether the circuit was full or empty. In practice, this would randomly result in bit corruption depending on how the circuit synthesized due to variability in the routing process. As a consequence,

Figure 10.5: CRC-like 16-pixel Stream (segment R0–R7 inserted twice)

Figure 10.6: CRC-like Input Image

the timing delays between components would change unpredictably between different synthesis runs and ultimately led to a metastable clock handoff which caused the circuit to enter an inconsistent state. The final design discussed in this dissertation removes the clock switching circuitry and instead employs the double handshake design discussed in Chapter 9.5.3 in order to guarantee correct operation. The test patterns discussed above were able to verify this.
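The duplicated-segment check from the second test can be sketched as follows. This is an illustrative Python model of the pattern generator and comparison logic, not the actual test circuit; the per-frame seed stands in for the dynamically changing pattern.

```python
# Sketch of the second test: 4096 segments of 8 randomly generated pixel
# values, each segment inserted into the stream twice (R0..R7 R0..R7),
# with the two copies compared to count errors.

import random

SEGMENTS, SEG_LEN = 4096, 8

def make_pattern(seed):
    """Build the test stream for one frame; the seed stands in for the
    per-frame pattern change."""
    rng = random.Random(seed)
    stream = []
    for _ in range(SEGMENTS):
        seg = [rng.randrange(256) for _ in range(SEG_LEN)]
        stream += seg + seg            # each segment appears twice
    return stream

def segment_errors(stream):
    """Compare the two copies of every segment, as the test circuit does."""
    errors = 0
    for i in range(0, len(stream), 2 * SEG_LEN):
        first = stream[i:i + SEG_LEN]
        second = stream[i + SEG_LEN:i + 2 * SEG_LEN]
        errors += sum(a != b for a, b in zip(first, second))
    return errors

stream = make_pattern(seed=1)
print(segment_errors(stream))          # -> 0 on a clean stream

stream[37] ^= 0x01                     # inject a corrupted pixel
print(segment_errors(stream))          # -> 1
```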

10.2 Firmware This section discusses behavioral simulations of the PDP firmware implementation and experimental results obtained from running the firmware on actual IRLED arrays. The simulations serve to verify correct operational logic, and the experimental results serve to validate the timing behavior of the implemented circuits.

10.2.1 Simulation This section provides captures of a few simulation inputs and outputs to show how packets arrive and are processed by the PDP firmware architecture. Other isolated module and integrated simulations (not shown here) were performed to assess

Figure 10.7: Simulation of Single HDMI Input

behavioral operation within the PDP firmware circuitry. Additional module-level unit testing was performed as well. In Figure 10.7, simulated HDMI is shown. When video data enable (write enable) transitions high, words of data representing PDP packets start to stream in. These are indicated by packet ID, X start, X end, Y start, Y end, and Packet Data. Each word would be stored in an SCB slot as indicated in the previous section. The final piece of data indicated is a reset packet ID. Figure 10.8 shows the final output, which would be passed from the write buffer to an array in a real system. Highlighted in red is data for drawing a segment on the array. Note that all values out are up-shifted by 5 bits to be received by the DACs in the system. Additionally, the values are shown in reverse order from the input diagram. For example, 992 corresponds to the value of 31 on the input side. In purple, an array reset is shown with two stages of array writes. In the first stage, the load signal transitions low. In the second stage, the load signal transitions high. The states shown are the transitioning stages of the write buffer.
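The output-side shift described above is a simple fixed up-shift; a small Python sketch makes the mapping concrete (the helper names are illustrative):

```python
# The 5-bit up-shift on the output side: a pixel value of 31 leaves the
# write buffer as 992 on the DAC bus. Helper names are illustrative.

DAC_SHIFT = 5

def to_dac_word(pixel):
    """Up-shift a pixel value into the bit position expected by the DACs."""
    return pixel << DAC_SHIFT

def from_dac_word(word):
    """Inverse mapping, useful when checking simulation output."""
    return word >> DAC_SHIFT

print(to_dac_word(31))     # -> 992, matching the example in the text
print(from_dac_word(992))  # -> 31
```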

10.2.2 Characterization This section shows select results of running the PDP firmware on an NSLEDS array to characterize its behavior, namely, whether it correctly draws to an array and whether it performs better than earlier firmware incarnations. Some non-uniformity corrected imagery is also shown, which indicates the PDP firmware is capable of drawing

Figure 10.8: Simulation of PDP Output

full imagery correctly. Though only a few results are shown here, the PDP firmware has been run on multiple NSLEDS and HDILED arrays to collect data and characterize the arrays themselves. As of this writing, it has been used in daily operations on my research group's IRLED arrays for over a year and is the primary firmware in use today.

10.2.2.1 Array Maps In order to characterize whether the behavior of the PDP firmware is consistent with older firmware (called SNAP) and has reproducible results between runs, a number of different array characterization tests were performed on an NSLEDS array. In one test, an array map is collected by performing a sweep of array pixels: light is output to each pixel separately over multiple frames and the result is captured. Typically, this is done using grids to speed up data collection. An example grid for a partial frame is shown in Figure 10.9. Defective pixels on an array have low light output or inconsistent brightness from run to run. Similar symptoms could also appear if a firmware contained internal timing issues. Comparing a test firmware with a known good reference firmware can rule out firmware issues in the event of this type of poor array behavior. Figure 10.10 shows an array map collected from the PDP firmware and Figure 10.11 shows an array map collected from the older SNAP firmware. In each figure, array data collected using an IR camera is shown on the left with a camera count scale

on the right. Camera counts are an uncalibrated measure of IR intensity that can be mapped to apparent temperature when calibrated. At a glance, the two images appear to be the same; however, the PDP firmware has a slightly higher light output. The dark regions of each map are physically dead pixels on the array. More than half of this particular array is nonfunctional. The repeating patterns of horizontal lines are the result of slow channels on the array. Every 32 columns there are 2 channels which take longer to settle, and this can be seen in these results, as those channels have not fully settled. This is due to a design flaw on early arrays which resulted in higher capacitance on those lines. Overall, the PDP firmware results are as expected and indicate correct operation.

Figure 10.9: Pixel Sweep Grid
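The grid-based sweep can be sketched as follows. This Python model shows one way to generate sweep frames in which a sparse grid of lit pixels advances until every pixel has been driven exactly once; the grid pitch is an assumption, as the text does not specify the spacing used.

```python
# Sketch of the grid-based pixel sweep: rather than lighting one pixel per
# frame, each frame lights a sparse grid of pixels and the grid offset
# advances until every pixel has been driven exactly once. The pitch value
# is an assumption.

def sweep_frames(width, height, pitch):
    """Yield pitch*pitch frames that together cover every pixel once."""
    for oy in range(pitch):
        for ox in range(pitch):
            yield [[1 if (x % pitch == ox and y % pitch == oy) else 0
                    for x in range(width)] for y in range(height)]

# Verify full, single coverage on a small array:
W = H = 16
coverage = [[0] * W for _ in range(H)]
for frame in sweep_frames(W, H, pitch=4):
    for y in range(H):
        for x in range(W):
            coverage[y][x] += frame[y][x]
print(all(c == 1 for row in coverage for c in row))  # -> True
```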

10.2.2.2 Fractional Difference Maps Another test is to collect multiple array maps and take the fractional difference between them to see the percent difference in light output between runs. This allows the consistency of light output to be characterized. A large difference could indicate problems with the pixels of an array, the analog chain, or firmware behavior. Small differences indicate that the pixels of an array are more consistent between runs. Differences in behavior between firmware implementations (or firmware revisions) would be indicative of variance in

Figure 10.10: Array Map of PDP Firmware Output

signal timing. In this case, a larger variance would indicate less precise timing within the firmware. Some variance is expected due to camera focus, noise, and physical vibration. The particular IR camera used to collect data in this setup utilizes a motorized reverse-Stirling cryo-cooler. These are often utilized for cooling detectors within thermal imaging systems [115]. They induce a physical vibration which can result in noise within collected camera imagery. Figure 10.12 shows a fractional difference map for the PDP firmware and Figure 10.13 shows a fractional difference map for the older SNAP firmware. The dark

Figure 10.11: Array Map of SNAP Firmware Output

blue areas indicating zero difference are dead regions of the array, similar to the earlier results. The red areas of high variance are due to those pixels being located on marginal portions of the array, specifically along non-functioning edges. In practice, for displaying imagery, these areas would be marked as defective and not driven. The average variance for both the SNAP and PDP firmware is less than 4 percent. However, the PDP firmware has less variance in per-pixel light output per run, as indicated by the darker blue colors in the working portions of the array compared to the SNAP firmware. This is likely due to the static 200 megahertz system clock used

Figure 10.12: Fractional Difference Map of PDP Firmware Output

for driving an array's pixels, combined with the two flip-flop synchronization strategy utilized within the PDP firmware that is discussed in Chapter 9.5.3. The earlier SNAP firmware used an HDMI clock-signaling based approach to time the triggering of events on the system side. Given that the clock is an input from an external HDMI card, it most likely has some noise. Another factor that may contribute to the larger variance is the fact that most functionality within SNAP used only the first HDMI card clock and ignored the second HDMI card clock. The crystal oscillators used for generating clock signals within devices typically have finite precision due to various contributing

Figure 10.13: Fractional Difference Map of SNAP Firmware Output

factors such as thermal noise, power supply variations, and aging [116]. This means that oscillators within different physical devices may vary, resulting in clock skew. Any inherent skew in oscillation could be further compounded by the fact that each HDMI card's data lines travel physically different trace paths to arrive within the FPGA. The PDP firmware, on the other hand, makes no assumptions about input clock skew. The memory synchronizer utilized within it explicitly synchronizes the data of each HDMI card separately across clock domains to the system clock domain. Then it explicitly pulls data from each synchronized circular buffer independently. Overall, the behavioral

results are promising for the PDP firmware.

Figure 10.14: Still Image Capture of a Non-uniformity Corrected Image from the PDP Firmware Operating at 100Hz

10.2.2.3 Non-uniformity Corrected Imagery Figure 10.14 shows an example of non-uniformity corrected IR imagery generated using the PDP firmware on an NSLEDS array. This particular array has much better yield than the one used for some of the earlier results presented in this chapter and is capable of displaying full images at a 1024 by 1024 resolution. Figure 10.15 shows a grid being displayed on the same array. The lower right-hand corner of the array has physical defects, but the majority of the array operates correctly. Interspersed among the working pixels are a couple of dead pixels here and there. These look worse in a static grid map than in practice, as the brightness of neighboring pixels effectively hides individually defective pixels within an array. This holds as long as multiple pixels within a small region of an array are not defective. This is why Figure 10.14 shows no obvious dead pixels even though individual pixels are defective within the region that the image is displayed on.

Figure 10.15: Still Image Capture of a Grid from the PDP Firmware Operating at 100Hz

10.2.2.4 Analog Bandwidth Figure 10.16 shows the same IR image drawn at two different frame rates using the PDP firmware operating in backwards compatibility mode. In this mode, normal HDMI data is sent to an array at full resolution. At 100 hertz operation, the image quality is much more uniform across pixels despite having low dynamic range when not non-uniformity corrected. At 400 hertz operation, there are distinct banding patterns that are not visible at lower frame rates. These banding patterns are the result of limited analog bandwidth due to the time it takes for the analog portions of the system to settle, which includes DAC settling time, amplifier settling time, RIIC internal timings, and emitter charge time. Under normal HDMI operation, it is difficult to correct for these issues at high speed without introducing complicated compensation schemes such

Figure 10.16: Comparison of Imagery Captured from the PDP Firmware Operating at 100Hz and 400Hz

as analog or digital pre-emphasis filtering21. Utilizing a packetized approach to display can help mitigate issues with high-speed display by providing more time for changing regions of a display to settle. Consider a 1024 by 1024 display operating at high speed. When packetized operation is utilized, ideally only subregions of the display are updated at high speed, and the portions of the array operating at low speed are addressed at a much slower rate and thus use much less analog bandwidth. The exact amount of analog bandwidth saved depends on how much of the imagery needs to run at high speed; however, the saving could be on the order of a 75 percent reduction in cases of low-speed backgrounds with small high-speed objects. This means that portions updating at high speed can be driven for longer amounts of time to allow for complete settling.
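The bandwidth argument above can be checked with back-of-the-envelope arithmetic. The Python sketch below compares pixel updates per second for a conventional full-frame display against a packetized split; the region size and rates are illustrative assumptions, not measured values.

```python
# Back-of-the-envelope check: pixel updates per second for a conventional
# full-frame display versus a packetized split into a low-speed background
# plus a small high-speed region. Region size and rates are assumptions.

def updates_per_second(width, height, rate_hz):
    return width * height * rate_hz

full_frame = updates_per_second(1024, 1024, 400)   # everything at 400 Hz

background = updates_per_second(1024, 1024, 100)   # background at 100 Hz
hot_region = updates_per_second(128, 128, 400)     # small object at 400 Hz
packetized = background + hot_region

saving = 1 - packetized / full_frame
print(f"{saving:.0%}")   # -> 73% fewer pixel updates for these numbers
```

Different assumed region sizes and background rates move the figure up or down, which is consistent with the text's statement that the saving depends on how much of the imagery must run at high speed.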

10.3 Packetized Operation This section shows various examples of the PDP firmware running at low-speed and high-speed operation to demonstrate the capabilities and versatility of a packetized

21 Analog and digital pre-emphasis is used in many types of applications [117, 118, 119, 120, 121].

Figure 10.17: Still Image Capture of Counting Numbers from the PDP Firmware Operating at 60Hz

display protocol. Additionally, multiple frame rate operation is demonstrated at high speeds to show the capabilities of packetized operation.

10.3.1 Normal-speed Figure 10.17 shows a series of counting numbers operating at low speeds. These were part of some of the initial demos used for testing packetized operation. This demo, in particular, ran at a number of different rates ranging from 60 hertz to 240 hertz. It was used as an initial verification of the firmware's packetized mode of operation. Other imagery included objects moving across the array at low speed and moving, changing numbers. Of particular interest was analyzing the boundaries of each packet for artifacts in overlapping packet regions and verifying that moving objects were not exhibiting partial display of previous packet data, as these behaviors may indicate incorrect firmware operation. In initial testing, some implementation challenges included ensuring that the correct packet data was being sent from the driving computer to the CSE and verifying that the correct pixels were being addressed from within the firmware. The former was addressed by attaching the HDMI output links to loop-back capture devices and meticulously analyzing the output to ensure that both the data ordering described in Chapter 5.2 was being abided by and that packet headers were being correctly specified. The latter was addressed by using an oscilloscope to

Figure 10.18: Still Image Capture of a Moving Object from the PDP Firmware Operating at 1kHz

analyze the address lines routed directly to the array.

10.3.2 High-speed There have been a number of different high-speed demos used to test the PDP firmware at rates above 250 hertz operation on an NSLEDS array. Due to limitations in IR camera speed, these were obtained by windowing the camera resolution down to allow for faster capture. Under full resolution, the IR cameras used can only reach 565 hertz or 132 hertz depending on the exact model. Figure 10.18 shows a still image for a demo of an object that moves along both the x and y dimensions of the array at 1 kilohertz. When an edge is reached, the object bounces similar to the electronic game of Pong [122]. This object was drawn using multiple PDP draw region packets embedded into a larger HDMI frame. For every packet, the x start and y start locations were modified. Drawing at this rate with normal full-frame HDMI is impossible. At best, 240 hertz may be used, with sacrifices in image fidelity due to limited analog bandwidth as discussed in Chapter 10.2.2.4. Figure 10.19 shows a series of images for a number-counting demo running at 2 kilohertz operation. This demo consisted of both a static configuration, in which a predefined location was used to display the number, and a moving configuration, in which the number would move around the array in both the x and y dimensions. In both configurations, the number would increment by one at the given frame rate. The object itself was drawn using individual PDP packets for each numerical digit. As with the previous example, drawing at this rate with normal full-frame HDMI is impossible,

Figure 10.19: Still Image Captures of a Counting Number from the PDP Firmware Operating at 2kHz

Figure 10.20: Still Image Captures of a Non-uniformity Corrected Rotating Object from the PDP Firmware Operating at 2kHz

which demonstrates that packetized operation can achieve rates well beyond normal HDMI with the same hardware. Figure 10.20 shows a series of frames for a non-uniformity corrected rotating three-dimensional object running at 2 kilohertz operation. The object itself was drawn using individual PDP packets for each rotation. Notably, even with the limited resolution of the IR camera when operating at high speed, the object is clear and distinct during each rotation. This demonstrates that packetized operation can indeed mitigate the analog performance issues discussed in Chapter 10.2.2.4. A small object such as this could be displayed with a slow-speed background without sacrificing image quality, which is representative of some types of IR imagery used in practice.
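A demo along the lines of the bouncing object above can be sketched as a stream of draw region packets whose x start and y start fields change each frame. The Python model below is illustrative only; the packet field names mirror those shown in the simulation figures, and the object size and per-frame step are assumptions.

```python
# Sketch of generating Pong-style draw region packets for a 1024x1024
# array: one packet per 1 kHz frame, with x/y start coordinates that
# bounce off the array edges. Object size and step are assumptions.

from dataclasses import dataclass

@dataclass
class DrawRegionPacket:
    packet_id: int
    x_start: int
    x_end: int
    y_start: int
    y_end: int

def bounce_packets(frames, array=1024, size=32, step=8):
    """Yield one draw region packet per frame for a bouncing object."""
    x = y = 0
    dx = dy = step
    for pid in range(frames):
        yield DrawRegionPacket(pid, x, x + size - 1, y, y + size - 1)
        if not 0 <= x + dx <= array - size:
            dx = -dx                  # bounce off a vertical edge
        if not 0 <= y + dy <= array - size:
            dy = -dy                  # bounce off a horizontal edge
        x += dx
        y += dy

packets = list(bounce_packets(5000))   # five seconds at 1 kHz
print(len(packets))                    # -> 5000
```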

10.3.3 Multi-frame Rate Figure 10.21 shows a series of frames for a multi-frame rate display of numbers using the PDP firmware. The small bright number was updated at 800 hertz, the small background numbers were updated at 100 hertz, and the large number was updated at 2 hertz. Each number was drawn using individual PDP region packets. Similar to other

Figure 10.21: Still Image Captures of a Bouncing Number, Small Background Numbers, and a Large Number from the PDP Firmware Operating at 800Hz, 100Hz, and 2Hz, Respectively

demos discussed in this chapter, each number was incremented at its corresponding frame rate. This demonstrates that multiple frame rate display operates correctly within the firmware and is able to achieve rates much higher than are possible with the same hardware utilizing conventional display technology.
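The multi-frame rate behavior above can be sketched as a simple scheduler that decides, on each tick of a master clock, which regions are due for an update. This Python model is illustrative; the region names and the choice of an 800 hertz master tick are assumptions.

```python
# Sketch of multi-frame rate region scheduling: three regions refresh at
# 800 Hz, 100 Hz, and 2 Hz, and on each master tick only the regions due
# for an update would emit packets. Names and master rate are assumptions.

MASTER_HZ = 800   # tick at the fastest region rate (assumed)
regions = {'bright_number': 800, 'background_numbers': 100, 'large_number': 2}

def due_regions(tick):
    """Regions whose update period divides the current master tick."""
    return [name for name, hz in regions.items()
            if tick % (MASTER_HZ // hz) == 0]

# Over one second of master ticks, each region updates at its own rate:
counts = {name: 0 for name in regions}
for tick in range(MASTER_HZ):
    for name in due_regions(tick):
        counts[name] += 1
print(counts)
# -> {'bright_number': 800, 'background_numbers': 100, 'large_number': 2}
```

Note how the slow regions consume almost none of the update budget, which is the source of the analog and digital bandwidth savings discussed earlier in the chapter.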

10.4 Summary The experimental results provided within this chapter make a strong argument for the benefit of packetized display technology for use in future high-speed IRSP systems. A wide breadth of experiments was shown, including normal backwards

compatible HDMI operation, low-speed packetized display, high-speed packetized display, and high-speed multiple frame rate operation. Taken together, these demonstrate that packetized operation is capable of vastly improving performance in bandwidth-starved environments. The improvements in digital bandwidth utilization show that substantially higher frame rates can be achieved. Additionally, multiple frame rate operation, in particular, enables better analog performance by allowing analog bandwidth to be reserved for high-speed updates of pixel regions, which in turn yields improved overall image quality. In contrast, conventional display technology requires that analog bandwidth be apportioned equally for all pixels, which results in considerably inferior image quality at high speeds. The subsequent and final chapter of this dissertation is devoted to an overall discussion of the journey through the architectural development and implementation of a Packetized Display Protocol (PDP) architecture for Infrared Scene Projection Systems (IRSPs).

Chapter 11

CONCLUSION

This dissertation presented a Packetized Display Protocol (PDP) architecture for Infrared Scene Projection Systems (IRSPs) with the goals of providing a scalable display system that is both distributable and hardware agnostic, capable of being implemented without unnecessary complexity, capable of providing dynamic intra-frame variable refresh rates, and capable of providing backwards-compatible support for older display technology. Firstly, the dissertation discussed the history of IRLED scene projectors and the projection process. Secondly, it provided detailed hardware and software limitations within the current technology used to drive arrays. Thirdly, it provided a solution to address the issues with current technology through the use of a physical layer agnostic packetized display protocol. Fourthly, it delved into the details of using a projector system to drive arrays from end to end, including communication between components, the interleaved array write process, and the data ordering. Fifthly, it provided an explanation of the internals of display protocols, diving into how they operate at a base level and the challenges associated with utilizing them for high-speed operation. Sixthly, it provided a specification for a packetized display protocol for use with high-speed arrays. Seventhly, it provided an abstract machine model to map the protocol to current and future IRSP hardware. Eighthly, it provided a robust implementation of the protocol on real hardware. And ninthly, it demonstrated that the use of a packetized display protocol can provide substantial benefits for IRSPs. The PDP architecture detailed in this work has been demonstrated operating at rates of up to 2 kilohertz while conventional technology on the very same system hardware struggles to produce imagery at rates of only 400 hertz. Furthermore, it

has demonstrated the ability to provide fine-grained control over frame transmission and the ability to dynamically control subregion frame rates ranging from 2 hertz to 800 hertz, while conventional technology struggles to provide support for much simpler per-frame variable refresh rates. In point of fact, the high-speed frame rates detailed within this dissertation have never been achieved on high-speed IRSP technology before. And to the best of my knowledge, no other attempt to packetize display technology for high-speed multi-frame rate operation has been made before. Indeed, in researching this topic, I found very little related research on the subject. At the start of this work, I set out with the intention of finding a potential solution to bridge the performance gap within current IRSP display systems. As I conclude this work, I fully believe that this goal has been achieved. Furthermore, I believe this work provides a solid foundation for furthering research into display protocol technology. This work focused primarily on the protocol-level details of IRSPs, but there remains much research that can be done to improve the front-end side of these systems. A dynamic compositor layer could ease the transition and serve as a general solution to packetizing display data dynamically. Parallel scene generators must also be developed if the field is to continue to improve performance. Many additional optimizations to the packetized display protocol itself and its associated implementation can be made to improve frame rates further and lower overhead. Newer high-speed transport layers could be utilized to lower latency and remove unnecessary porch-related bandwidth usage altogether. Synchronization of the protocol within a parallel system needs to be investigated further. For high-speed operation, the protocol could be utilized with multiple parallel drivers to double or quadruple performance with the right hardware.
While not explored within this work, improvements to the analog bandwidth of a system could yield further improvements in PDP performance by allowing larger high-speed imagery to be used.

Outside the field of IRSPs, it is entirely feasible that multi-frame rate technology could be used to drive future monitors, video game systems, and virtual reality systems to achieve higher frame rates. In energy-efficient systems, data movement often costs power. This technology could decrease data movement, resulting in better power efficiency. Furthermore, in mobile devices, keeping the screen powered on requires high amounts of energy. This type of technology may be able to reduce power consumption requirements by driving pixels less often, resulting in less overall battery load. As for my own future work, I plan to continue to support and improve upon this technology for the foreseeable future with the hope that it will one day see standard adoption within the field of IRSPs. A focus on dynamic frame segmentation will be one of my avenues of future research. Another avenue will be investigating how this technology can be used to characterize array behavior and ease the non-uniformity detection process.

REFERENCES

[1] Alan P Pritchard, Mark D Balmond, Stephen Paul Lake, David W Gough, Mark A Venables, Ian M Sturland, Michael C Hebbron, and Lucy A Brimecombe. Design and fabrication progress in BAe's high-complexity resistor-array IR scene projector devices. In Technologies for Synthetic Environments: Hardware-in-the-Loop Testing III, volume 3368, pages 71–77. International Society for Optics and Photonics, 1998.

[2] Owen M Williams, George C Goldsmith II, and Robert G Stockbridge. History of resistor array infrared projectors: hindsight is always 100% operability. In Technologies for Synthetic Environments: Hardware-in-the-Loop Testing X, volume 5785, pages 208–224. International Society for Optics and Photonics, 2005.

[3] Digital Display Working Group. Digital visual interface. http://www.cs.unc.edu/~stc/FAQs/Video/dvi_spec-V1_0.pdf, April 1999.

[4] HDMI Forum. HDMI forum releases version 2.1 of the HDMI specification. https://hdmiforum.org/wp-content/uploads/2017/11/Press-Release-HDMI-2.1-20171128-EN.pdf, November 2017.

[5] Achintya K. Bhowmik, Sylvia J. Downing, George R. Hayek, Srikanth Kambhatla, Maximino Vasquez, and Nickolas Willow. 45.2: DisplayPort: The emerging converged digital display interface technology and implementation in mobile computing platforms. SID Symposium Digest of Technical Papers, 39(1):673–676, 2012.

[6] Joe LaVeigne, Greg Franks, and Marcus Prewarski. A two-color 1024x1024 dynamic infrared scene projection system. In Technologies for Synthetic Environments: Hardware-in-the-Loop XVIII, volume 8707, 2013.

[7] NVIDIA. NVIDIA Quadro Sync. https://www.nvidia.com/en-us/design-visualization/solutions/quadro-sync/, February 2020.

[8] G. Bakar, R. A. Kirmizioglu, and A. M. Tekalp. Motion-based adaptive streaming in WebRTC using spatio-temporal scalable VP9 video coding. In GLOBECOM 2017 - 2017 IEEE Global Communications Conference, pages 1–6, Dec 2017.

[9] E. V. Castillo, C. S. Cárdenas, and M. R. Jara. An efficient hardware architecture of the H.264/AVC half and quarter-pixel motion estimation for real-time high-definition video streams. In 2012 IEEE 3rd Latin American Symposium on Circuits and Systems (LASCAS), pages 1–4, Feb 2012.

[10] James R Biard and Gary E Pittman. Semiconductor radiant diode, December 20 1966. US Patent 3,293,513.

[11] Akio Yamanishi and Kenji Hamaguri. Respiration diagnosis apparatus, January 31 1995. US Patent 5,385,144.

[12] Sanford L Meeks, Francis J Bova, William A Friedman, John M Buatti, Russell D Moore, and William M Mendenhall. IRLED-based patient localization for linac radiosurgery. International Journal of Radiation Oncology*Biology*Physics, 41(2):433–439, 1998.

[13] Neil Sadick. A study to determine the effect of combination blue (415 nm) and near-infrared (830 nm) light-emitting diode (LED) therapy for moderate acne vulgaris. Journal of Cosmetic and Laser Therapy, 11(2):125–128, 2009.

[14] Juliana Santos de Carvalho Monteiro, Susana Carla Pires Sampaio de Oliveira, Maria de Fátima Ferreira Lima, José Augusto Cardoso Sousa, Antônio Luiz Barbosa Pinheiro, and Jean Nunes dos Santos. Effect of LED red and IR photobiomodulation in tongue mast cells in wistar rats: Histological study. Photomedicine and Laser Surgery, 29(11):767–771, 2011. PMID: 21790272.

[15] Mohammad Ashrafzadeh Takhtfooladi, Mehran Shahzamani, Hamed Ashrafzadeh Takhtfooladi, Fariborz Moayer, and Amin Allahverdi. Effects of light-emitting diode (LED) therapy on skeletal muscle ischemia reperfusion in rats. Lasers in medical science, 30(1):311–316, 2015.

[16] Kimon Roufas, Ying Zhang, Dave Duff, and Mark Yim. Six degree of freedom sensing for docking using IR LED emitters and receivers. In Daniela Rus and Sanjiv Singh, editors, Experimental Robotics VII, pages 91–100, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.

[17] I Zeylikovich, W Wang, F Zeng, J Ali, BL Yu, V Benischek, and RR Alfano. Mid-IR transmission window for corrosion detection beneath paint. Electronics Letters, 39(1):1, 2003.

[18] Ioan Plotog and Marian Vladescu. Power LED efficiency in relation to operating temperature. In Ionica Cristea, Marian Vladescu, and Razvan Tamas, editors, Advanced Topics in Optoelectronics, Microelectronics, and Nanotechnologies VII, volume 9258, pages 652–657. International Society for Optics and Photonics, SPIE, 2015.

[19] L. Scholz, A. Ortiz Perez, S. Knobelspies, J. Wöllenstein, and S. Palzer. MID-IR LED-based, photoacoustic CO2 sensor. Procedia Engineering, 120:1233–1236, 2015. Eurosensors 2015.

[20] E. Walsh, W. Daems, and J. Steckel. An optical head-pose tracking sensor for pointing devices using IR-LED based markers and a low-cost camera. In 2015 IEEE SENSORS, pages 1–4, Nov 2015.

[21] C. J. Georgopoulos and A. K. Kormakopoulos. A 1-Mb/s IR LED array driver for office wireless communication. IEEE Journal of Solid-State Circuits, 21(4):582–584, Aug 1986.

[22] Marcus Escobosa, Thomas M Salsman, and William L Brown. IR receiver using IR transmitting diode, March 2 2004. US Patent 6,701,091.

[23] Byungkon Sohn, Jaeyeong Lee, Heesung Chae, and Wonpil Yu. Localization system for mobile robot using wireless communication with IR landmark. In Proceedings of the 1st international conference on Robot communication and coordination, pages 1–6. Citeseer, 2007.

[24] Yunseon Jang, Kyungmook Choi, F. Rawshan, Seungrok Dan, MinChul Ju, and Youngil Park. Bi-directional visible light communication using performance-based selection of IR-LEDs in upstream transmission. In 2012 Fourth International Conference on Ubiquitous and Future Networks (ICUFN), pages 8–9, July 2012.

[25] Giulio Cossu, Raffaele Corsini, Amir M. Khalid, and Ernesto Ciaramella. Bi-directional 400 Mbit/s LED-based optical wireless communication for non-directed line-of-sight transmission. In Optical Fiber Communication Conference, page Th1F.2. Optical Society of America, 2014.

[26] Hamzah Ahmed, Joshua Marks, Fouad Kiamilev, Jacob Benedict, Rodney McGee, Garrett Ejzak, Nicholas Waite, Kassem Nabha, Jonathan Dickason, Russell J. Ricker, Aaron Muhowski, Robert Heise, Sydney Provence, and John Prineas. Technology roadmap for IRLED scene projectors. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2018.
[27] Kassem Nabha, Cassandra Bogh, Deep Dumka, John Prineas, Thomas Boggess, Sydney Provence, and Fouad Kiamilev. Demonstration of 48-micron pitch mid-wave infrared LED emitter arrays integrated with GaAs driver circuits by wafer bonding. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2018.

[28] M. Hernandez, K. Nabha, E. Koerperick, J. Prineas, P. Barakhshan, F. Kiamilev, and G. Ejzak. Improving density and efficiency of infrared projectors. In 2018 IEEE Research and Applications of Photonics In Defense Conference (RAPID), pages 1–4, 2018.

[29] M. Hernandez, J. Marks, E. Koerperick, P. Barakhshan, G. Ejzak, K. Nabha, J. Prineas, and F. Kiamilev. Improving density and efficiency of infrared projectors. IEEE Photonics Journal, 11(3):1–10, 2019.

[30] A. Deputy, F. Kiamilev, P. Barakhshan, and A. Landwehr. Longitudinal study to evaluate reliability, repeatability, and reproducibility of infrared LED scene projectors. In 2019 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–4, 2019.

[31] R. Houser, H. Ahmed, K. Nabha, and F. Kiamilev. Modular system architecture as a foundation for rapid IRSP development. In 2018 IEEE Research and Applications of Photonics In Defense Conference (RAPID), pages 1–4, 2018.

[32] Peyman Barakhshan, Garrett Ejzak, Kassem Nabha, John Lawler, and Fouad Kiamilev. Thermal performance characterization of a 512x512 MWIR SLEDS projector. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2017.

[33] P. Barakhshan, J. Volz, R. J. Ricker, M. Hernandez, A. Landwehr, S. Provence, K. Nabha, R. Houser, J. P. Prineas, C. Campbell, F. Kiamilev, and T. F. Boggess. End to end testing of IRLED projectors. In 2018 IEEE Research and Applications of Photonics In Defense Conference (RAPID), pages 1–4, Aug 2018.

[34] G. A. Ejzak, J. Dickason, J. Benedict, H. Ahmed, R. McGee, K. Nabha, J. Marks, N. Waite, M. Hernandez, S. Cockerill, and F. Kiamilev. Scalable and modular architecture for close support electronics of an infrared scene projector. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2015.

[35] Corey Lange, Rodney McGee, Nicholas Waite, Robert Haislip, and Fouad E Kiamilev. System for driving 2D infrared emitter arrays at cryogenic temperatures. Proc. SPIE, 8015:801507, 2011.

[36] Zachary Marks, Andrea Waite, Joshua Marks, Jonathan Dickason, Rodney McGee, and Fouad Kiamilev. Advanced packaging for infrared scene projectors. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2017.

[37] Miguel Hernandez, Jonathan Dickason, Peyman Barakhshan, Josh Marks, Garrett Ejzak, Alexis Deputy, Andrea Waite, Rodney McGee, and Fouad Kiamilev. 512x512, two-color infrared scene projector. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2017.

[38] Basler, Cognex, DALSA, Data Translation, Datacube, EPIX, Euresys, Foresight Imaging, Integral Technologies, National Instruments, and PULNiX America. Camera link: Specifications of the camera link interface standard for digital cameras and frame grabbers. http://www.imagelabs.com/wp-content/uploads/2010/10/CameraLink5.pdf, October 2000.

[39] Qi-dan Zhu, Jin-ye Liu, and Ling Kang. The design of hardware circuit for camera link interface. Applied Science and Technology, 35(8):57–60, 2008.

[40] FLIR Systems, Inc. FLIR SC6000 Series: MWIR science grade camera. http://www.flirmedia.com/MMC/THG/Brochures/RND_016/RND_016_US.pdf, 2014.

[41] FLIR Systems, Inc. FLIR SC8000 Series: High speed MWIR megapixel science-grade infrared cameras. http://www.flirmedia.com/MMC/THG/Brochures/RND_018/RND_018_US.pdf, 2014.

[42] FLIR Systems, Inc. FLIR X6900sc: High speed MWIR science-grade infrared camera. http://www.flirmedia.com/MMC/THG/Brochures/RND_065/RND_065_US.pdf, 2016.

[43] H. Ahmed, R. McGee, J. Marks, A. Waite, A. Landwehr, C. Jackson, G. Ejzak, T. Browning, P. Barakhshan, M. Hernandez, A. Deputy, T. Lassitter, C. Campbell, F. Kiamilev, J. Prineas, and E. Koerperick. Fabrication, evaluation, and improvements of 1K×1K and 2K×2K infrared LED scene projector systems. In 2019 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–1, Aug 2019.

[44] N C Das, M Taysing-Lara, K A Olver, F Kiamilev, J P Prineas, J T Olesberg, E J Koerperick, L M Murray, and T F Boggess. Flip chip bonding of 68×68 MWIR LED arrays. IEEE Transactions on Electronics Packaging Manufacturing, 32(1):9–13, Jan 2009.

[45] D T Norton, J T Olesberg, R T McGee, N A Waite, J Dickason, K W Goossen, J Lawler, G Sullivan, A Ikhlassi, F Kiamilev, E J Koerperick, L M Murray, J P Prineas, and T F Boggess. 512×512 individually addressable MWIR LED arrays based on type-II InAs/GaSb superlattices. IEEE Journal of Quantum Electronics, 49(9):753–759, Sep 2013.

[46] Rodney McGee, Fouad Kiamilev, Nicholas Waite, Joshua Marks, Kassem Nabha, Garrett Ejzak, Jon Dickason, Jake Benedict, and Miguel Hernandez. 512x512, 100 Hz mid-wave infrared LED scene projector. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2015.

[47] Garrett A. Ejzak, Jonathan Dickason, Joshua A. Marks, Kassem Nabha, Rodney T. McGee, Nicholas A. Waite, Jake T. Benedict, Miguel A. Hernandez, Sydney R. Provence, Dennis T. Norton, John P. Prineas, Keith W. Goossen, Fouad E. Kiamilev, and Thomas F. Boggess. 512 × 512, 100 Hz mid-wave infrared LED microdisplay system. J. Display Technol., 12(10):1139–1144, Oct 2016.

[48] G. A. Ejzak, J. Dickason, J. A. Marks, K. Nabha, R. T. McGee, N. A. Waite, J. T. Benedict, M. A. Hernandez, S. R. Provence, D. T. Norton, J. P. Prineas, K. W. Goossen, F. E. Kiamilev, and T. F. Boggess. 512 × 512, 100 Hz mid-wave infrared LED microdisplay system. Journal of Display Technology, 12(10):1139–1144, Oct 2016.

[49] Garrett Ejzak, Jonathan Dickason, Joshua Marks, Jacob Benedict, Rodney McGee, Kassem Nabha, Andrea Waite, Hamzah Ahmed, Miguel Hernandez, Peyman Barakhshan, Tyler Browning, Jeffrey Volz, Fouad Kiamilev, and Thomas Boggess. 512x512, two-color infrared scene projector. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2017.

[50] Russell J. Ricker, Sydney Provence, Lee M. Murray, Dennis T. Norton, Jonathon T. Olesberg, John P. Prineas, and Thomas F. Boggess. 512x512 array of dual-color InAs/GaSb superlattice light-emitting diodes. In Jong Kyu Kim, Michael R. Krames, Li-Wei Tu, and Martin Strassburg, editors, Light-Emitting Diodes: Materials, Devices, and Applications for Solid State Lighting XXI, volume 10124, pages 201–206. International Society for Optics and Photonics, SPIE, 2017.

[51] Jacob Benedict, Rodney McGee, Joshua Marks, Kassem Nabha, Nicholas Waite, Garrett Ejzak, Jonathan Dickason, Hamzah Ahmed, Miguel Hernandez, Peyman Barakhshan, Tyler Browning, Jeffrey Volz, Russell J. Ricker, Fouad Kiamilev, John Prineas, and Thomas Boggess. 1Kx1K resolution infrared LED scene projector at 24um pixel pitch. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2017.

[52] H. Ahmed, A. Landwehr, C. Jackson, T. Browning, P. Barakhshan, M. Hernandez, A. Deputy, T. Lassitter, C. Campbell, J. Singh, B. Steenkamer, R. McGee, A. Waite, F. Kiamilev, J. Prineas, M. Bellus, L. E. Koerperick, and L. Nichols. Testing, instrumentation, and results to make the world’s first usable 1Kx1K infrared LED scene projector systems. In 2020 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–2, 2020.

[53] T. Browning, C. Jackson, R. Houser, A. Landwehr, H. Ahmed, and F. Kiamilev. A modular platform for rapid IRSP development. IEEE Photonics Journal, 11(3):1–10, June 2019.

[54] Jacob Benedict, Rodney McGee, Hamzah Ahmed, Miguel Hernandez, Peyman Barakhshan, Rebekah Houser, Joshua Marks, Kassem Nabha, Garrett Ejzak, Nicholas Waite, Jonathan Dickason, Tyler Browning, Christopher Jackson, Russell J. Ricker, Aaron Muhowski, Robert Heise, Fouad Kiamilev, and John Prineas. 4-megapixel infrared scene projector based on superlattice light emitting diodes. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2018.

[55] Tyler Browning, Jeffrey Volz, Nicholas Waite, Rodney McGee, and Fouad Kiamilev. of non-uniformity correction for high-performance real-time infrared LED scene projectors. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2016.

[56] Aaron Landwehr, Nick Waite, Peyman Barakhshan, Jeff Volz, and Fouad Kiamilev. Non-uniformity correction (NUC) and 1Khz frame rate for IRLED scene projectors. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2017.

[57] Peyman Barakhshan, Garrett Ejzak, Miguel Hernandez, Aaron Landwehr, Nick Waite, Kassem Nabha, Fouad Kiamilev, Russell J. Ricker, Sydney Provence, John Prineas, and Thomas F. Boggess. Gamma correction and operability of IRLED scene projectors. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2018.

[58] Peyman Barakhshan, Miguel Hernandez, Joshua Marks, Nick Waite, and Fouad Kiamilev. Non-uniformity detection and correction of IRLED scene projectors. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2019.

[59] P. Barakhshan, C. Campbell, M. Hernandez, and F. Kiamilev. Evaluation of performance, non-uniformity and thermal limits for infrared LED scene projectors. In 2019 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–1, 2019.

[60] G. A. Ejzak, M. Hernandez, A. Landwehr, and F. E. Kiamilev. ’Hybrid DAC’ approach to increasing dynamic range and signal to noise in IRSP systems. In 2019 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–4, 2019.

[61] Dennis M Freeman. Slewing distortion in digital-to-analog conversion. Journal of the Audio Engineering Society, 25(4):178–183, 1977.

[62] B Gordon. Linear electronic analog/digital conversion architectures, their origins, parameters, limitations, and applications. IEEE Transactions on Circuits and Systems, 25(7):391–418, 1978.

[63] K. L. Chan, N. Rakuljic, and I. Galton. Segmented dynamic element matching for high-resolution digital-to-analog conversion. IEEE Transactions on Circuits and Systems I: Regular Papers, 55(11):3383–3392, Dec 2008.

[64] M. Hernandez, T. Lassiter, P. Barakhshan, A. Deputy, G. Ejzak, and F. Kiamilev. Small batch production and test of custom support electronics for infrared LED scene projectors. In 2019 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–1, 2019.

[65] J. Singh, F. E. Kiamilev, M. Hernandez, T. Lassiter, and A. Deputy. Small batch fabrication and weekly rotational testing of closed system electronics for infrared LED scene projectors. In 2020 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–2, 2020.

[66] Xilinx, Inc. Virtex-6 Family Overview. https://www.xilinx.com/support/documentation/data_sheets/ds150.pdf, 2015.

[67] I.J. Nagrath and M. Gopal. Control Systems (As Per Latest JNTU Syllabus). New Age International (P) Limited, 2009.

[68] S.A. Frank. Control Theory Tutorial: Basic Concepts Illustrated by Software Examples. SpringerBriefs in Applied Sciences and Technology. Springer International Publishing, 2018.

[69] H.Y. Hu and Z.H. Wang. Dynamics of Controlled Mechanical Systems with Delayed Feedback. Engineering online library. Springer Berlin Heidelberg, 2002.

[70] C. Campbell, P. Barakhshan, A. Landwehr, J. Volz, and F. Kiamilev. Advanced software for user interface, user control, data/image input, and test automation for infrared LED scene projectors. In 2019 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–3, 2019.

[71] Stephen W. McHugh, Jon A. Warner, Mike Pollack, Alan Irwin, Theodore R. Hoelter, William J. Parrish, and James T. Woolaway II. MIRAGE dynamic IR scene projector overview and status. In Robert Lee Murrer Jr., editor, Technologies for Synthetic Environments: Hardware-in-the-Loop Testing IV, volume 3697, pages 209–222. International Society for Optics and Photonics, SPIE, 1999.

[72] Tianne L. Lassiter. Scalable board architecture design & mechanical adaptations for infrared scene projector systems. Master’s thesis, University of Delaware, 2019.

[73] T. L. Lassiter, J. Dickason, G. A. Ejzak, Z. Marks, A. Waite, and F. E. Kiamilev. Modular carrier board and package for infrared LED arrays. In 2018 IEEE Research and Applications of Photonics In Defense Conference (RAPID), pages 1–3, Aug 2018.

[74] T. L. Lassiter, J. Marks, J. Dickason, G. A. Ejzak, Z. Marks, A. Waite, and F. E. Kiamilev. Modular carrier board and package for infrared LED arrays. IEEE Photonics Journal, 11(4):1–6, Aug 2019.

[75] T. Lassiter, G. A. Ejzak, A. Waite, M. Hernandez, and F. E. Kiamilev. Electronic mechanical development of a multi-platform infrared LED scene projector system. In 2020 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–3, 2020.

[76] Joe LaVeigne and Breck Sieglinger. Design considerations for a high-temperature, high-dynamic range IRSP. In James A. Buford Jr., R. Lee Murrer Jr., and Gary H. Ballard, editors, Technologies for Synthetic Environments: Hardware-in-the-Loop XVII, volume 8356, pages 105–115. International Society for Optics and Photonics, SPIE, 2012.

[77] D. T. Norton. Type-II InAs/GaSb superlattice LEDs: applications for infrared scene projector systems. PhD thesis, University of Iowa, 2013.

[78] Rebekah Houser, Hamzah Ahmed, Kassem Nabha, Miguel Hernandez, Christopher Jackson, Tyler Browning, Jacob Benedict, Garrett Ejzak, Nick Waite, and Fouad Kiamilev. Scalable firmware for infrared scene projectors. Government Microcircuit Applications and Critical Technology GOMAC Tech, Mar 2018.

[79] MIPI Alliance, Inc. MIPI display serial interface (MIPI DSI). https://www.mipi.org/specifications/dsi, April 2017.

[80] Video Electronics Standards Association. VESA publishes DisplayPort™ standard version 1.4. https://vesa.org/featured-articles/vesa-publishes-displayport-standard-version-1-4/, March 2016.

[81] National Instruments. Analog video 101. http://www.ni.com/white-paper/4750/en/, May 2018.

[82] J. D. Neal. Hardware level VGA and SVGA video programming information page. http://www.osdever.net/FreeVGA/vga/crtcreg.htm, 1998.

[83] MythTV Community. MythTV modeline database. https://www.mythtv.org/wiki/Modeline_Database, October 2015.

[84] Video Electronics Standards Association. VESA coordinated video timings CVT standard 1.2. https://vesa.org/vesa-standards/, February 2013.

[85] The Open Group. The X Window System™. http://www.opengroup.org/tech/desktop/x-window-system/, October 2020.

[86] A. Landwehr, A. Waite, T. Browning, C. Jackson, R. Houser, H. Ahmed, and F. Kiamilev. Toward a packetized display protocol architecture for IRLED projector systems. In 2018 IEEE Research and Applications of Photonics In Defense Conference (RAPID), pages 1–4, Aug 2018.

[87] E. Weigle and Wu-chun Feng. A comparison of TCP automatic tuning techniques for distributed computing. In Proceedings 11th IEEE International Symposium on High Performance Distributed Computing, pages 265–272, 2002.

[88] Jeffrey Friedberg, Larry Seiler, and Jeff Vroom. Extending X for double-buffering, multi-buffering, and stereo. https://xcb.freedesktop.org/XcbNotes/buffer.pdf, January 1990.

[89] 3Dfx Interactive. Glide 2.2 programming guide, March 1997.

[90] 3Dfx Interactive. 3Dfx SST-1 (a.k.a. Voodoo Graphics) high performance graphics engine for 3D game acceleration. http://darwin-3dfx.sourceforge.net/voodoo_graphics.pdf, December 1999.

[91] 3Dfx Interactive. 3Dfx Voodoo2 Graphics high performance graphics engine for 3D game acceleration. http://darwin-3dfx.sourceforge.net/voodoo2.pdf, December 1999.

[92] AMD. FreeSync technology. https://www.amd.com/en/technologies/free-sync, February 2019.

[93] NVIDIA. GeForce G-SYNC. https://developer.nvidia.com/g-sync, February 2020.

[94] Video Electronics Standards Association. VESA adds ‘adaptive-sync’ to popular DisplayPort™ video standard. https://vesa.org/featured-articles/vesa-adds-adaptive-sync-to-popular-displayport-video-standard/, May 2014.

[95] Video Electronics Standards Association. DisplayPort technical overview. https://www.vesa.org/wp-content/uploads/2011/01/ICCE-Presentation-on-VESA-DisplayPort.pdf, January 2011.

[96] Craig Raymond Wiley. 40.1: Invited paper: DisplayPort® 1.2, embedded DisplayPort, and future trends. SID Symposium Digest of Technical Papers, 42(1):551–554, 2011.

[97] J.L. Hennessy, D.A. Patterson, and K. Asanović. Computer Architecture: A Quantitative Approach. Elsevier Science, 2012.

[98] KY Lee, W Abu-Sufah, and DJ Kuck. On modeling performance degradation due to data movement in vector machines. In Proc. 1984 Int. Conf. Parallel Proc., IEEE, pages 269–277, 1984.

[99] K. Lai and M. Baker. Measuring bandwidth. In IEEE INFOCOM ’99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320), volume 1, pages 235–245 vol.1, 1999.

[100] F. Zheng, T. Whitted, A. Lastra, P. Lincoln, A. State, A. Maimone, and H. Fuchs. Minimizing latency for augmented reality displays: Frames considered harmful. In 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 195–200, 2014.

[101] Seema Bandyopadhyay and Edward J. Coyle. Minimizing communication costs in hierarchically-clustered networks of wireless sensors. Computer Networks, 44(1):1–16, 2004.

[102] J. Li, A. Deshpande, and S. Khuller. Minimizing communication cost in distributed multi-query processing. In 2009 IEEE 25th International Conference on Data Engineering, pages 772–783, 2009.

[103] Mary Hall. High performance is all about minimizing data movement. In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’20, pages 3–4, New York, NY, USA, 2020. Association for Computing Machinery.

[104] Hermann Kopetz. Real-time systems: design principles for distributed embedded applications. Springer Science & Business Media, 2011.

[105] X. Zhou, J. He, L. S. Liao, M. Lu, X. M. Ding, X. Y. Hou, X. M. Zhang, X. Q. He, and S. T. Lee. Real-time observation of temperature rise and thermal breakdown processes in organic LEDs using an IR imaging and analysis system. Advanced Materials, 12(4):265–269, 2000.

[106] Dennis Crow, Charles Coker, and Wayne Keen. Fast line-of-sight imagery for target and exhaust-plume signatures (FLITES) scene generation program. In Robert Lee Murrer Jr., editor, Technologies for Synthetic Environments: Hardware-in-the-Loop Testing XI, volume 6208, pages 195–202. International Society for Optics and Photonics, SPIE, 2006.

[107] Hawjye Shyu, Thomas M. Taczak, Kevin Cox, Robert Gover, Carlos Maraviglia, and Colin Cahill. High-fidelity real-time maritime scene rendering. In Scott B. Mobley, editor, Technologies for Synthetic Environments: Hardware-in-the-Loop XVI, volume 8015, pages 165–175. International Society for Optics and Photonics, SPIE, 2011.

[108] Joseph W. Morris, Gary H. Ballard, Dennis H. Bunfield, Thomas E. Peddycoart, and Darian E. Trimble. The multispectral advanced volumetric real-time imaging compositor for real-time distributed scene generation. In Scott B. Mobley, editor, Technologies for Synthetic Environments: Hardware-in-the-Loop XVI, volume 8015, pages 186–195. International Society for Optics and Photonics, SPIE, 2011.

[109] A. Landwehr, T. Browning, C. Jackson, D. May, A. Waite, H. Ahmed, and F. Kiamilev. An implementation of a packetized display protocol architecture for IRLED projector systems. IEEE Photonics Journal, 11(2):1–10, 2019.

[110] C. Jackson, T. Browning, A. Landwehr, D. May, H. Ahmed, A. Waite, and F. Kiamilev. Demonstration of packetized display protocol (PDP) to overcome speed and resolution limitations of conventional display protocols. In 2019 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–1, 2019.

[111] T. Browning, C. Jackson, A. Landwehr, A. Waite, D. May, and F. Kiamilev. Architectural enhancements of a packetized display protocol for high-speed IRSP operation. In 2020 IEEE Research and Applications of Photonics in Defense Conference (RAPID), pages 1–2, 2020.

[112] Clifford E Cummings. Clock domain crossing (CDC) design & verification techniques using SystemVerilog. Synopsys User Group Meeting 2008, Boston, 2008.

[113] R. Ginosar. Metastability and synchronizers: A tutorial. IEEE Design & Test of Computers, 28(5):23–35, 2011.

[114] Digilent, Inc. ZYBO™ FPGA Board Reference Manual. https://reference.digilentinc.com/_media/zybo/zybo_rm.pdf, 2016.

[115] Allan J. Organ. The miniature, reversed Stirling cycle cryo-cooler: integrated simulation of performance. Cryogenics, 39(3):253–266, 1999.

[116] National Research Council of the National Academies. An Assessment of Precision Time and Time Interval Science and Technology. National Academies Press, 2002.

[117] J. F. Buckwalter, M. Meghelli, D. J. Friedman, and A. Hajimiri. Phase and amplitude pre-emphasis techniques for low-power serial links. IEEE Journal of Solid-State Circuits, 41(6):1391–1399, 2006.

[118] D. Rafique, T. Rahman, A. Napoli, and B. Spinnler. Digital pre-emphasis in optical communication systems: On the nonlinear performance. Journal of Lightwave Technology, 33(1):140–150, 2015.

[119] B. Hu, Y. Du, R. Huang, J. Lee, Y. Chen, and M. F. Chang. A capacitor-DAC-based technique for pre-emphasis-enabled multilevel transmitters. IEEE Transactions on Circuits and Systems II: Express Briefs, 64(9):1012–1016, 2017.

[120] Pham Quang Thai, Francois Rottenberg, Pham Tien Dat, and Shimamoto Shigeru. Increase data rate of OLED VLC system using pre-emphasis circuit and FBMC modulation. In Imaging and Applied Optics 2018 (3D, AO, AIO, COSI, DH, IS, LACSEA, LS&C, MATH, pcAOP), page SM2H.4. Optical Society of America, 2018.

[121] Z. Zhou, T. Odedeyi, B. Kelly, J. O’Carroll, R. Phelan, I. Darwazeh, and Z. Liu. Impact of analog and digital pre-emphasis on the signal-to-noise ratio of bandwidth-limited optical transceivers. IEEE Photonics Journal, 12(2):1–12, 2020.

[122] Alan Alcorn. Pong. Published by: Atari Inc, 1972.
