Per-Pixel Coded-Exposure CMOS Image Sensors

by

Navid Sarhangnejad

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto

© Copyright by Navid Sarhangnejad 2021

Abstract

Per-Pixel Coded-Exposure CMOS Image Sensors

Navid Sarhangnejad
Doctor of Philosophy
The Edward S. Rogers Sr. Department of Electrical and Computer Engineering
University of Toronto
2021

The ever-growing demand for camera applications necessitates research not only on improving the performance of image sensors but also on new image sensor architectures. One of the most recent image sensor architectures, based on coded-exposure pixels (CEP), allows the exposure time to be programmed at the pixel level and enables imaging in ways that were not previously possible.

In this thesis, first a comparison of different photo-detectors is presented to highlight their operating principles as well as their capabilities. Five photo-detector architectures are simulated to compare the most important specifications for CEP cameras, namely sensitivity and tap contrast.

Next, the first prototype, a CEP image sensor based on photogate (PG) pixels, is presented. The sensor has a total resolution of 180 × 160 pixels and is fabricated in a 0.35µm CMOS technology. Dual-tap pixels with per-tap conversion gain are proposed, in which the photogenerated charges are collected in one of the two taps based on the code stored in the pixel during each interval of the exposure.

The second prototype is an image sensor based on pinned-photodiode (PPD) pixels. The sensor is fabricated in a 0.11µm CMOS technology, with the main array consisting of 244 × 162 pixels. The dual-tap pixel proposed in this work has the same conversion gain for the two taps but provides per-tap adjustable gain in the readout. The array operates at a maximum subframe rate of 180Hz, which is equivalent to 4 subframes per frame at 25fps considering the overhead time of frame readout. The sensor is deployed in two different single-shot 3D computational imaging techniques.

Finally, an architecture based on global-shutter PPD pixels is presented, allowing the implementation of the smallest CEP pixels (7µm pitch) reported to date. The sensor is fabricated in a 0.11µm CMOS technology with a resolution of 312 × 320 pixels. In the proposed pixel, a pinned storage diode operates as a charge memory to pipeline the charge generation and charge sorting operations. At a subframe rate of 2.7kHz, a tap contrast of more than 90% is measured. Several computational imaging techniques demonstrated with this camera are also presented.

Acknowledgements

I want to extend my gratitude to Professor Roman Genov, my direct Ph.D. supervisor, for giving me the opportunity to be part of his team, and for his continuous support and insight. During my time in Roman's group, he provided the tools and resources needed for my research. His recruiting of post-docs and new grad students to help on different aspects of the project was a significant help in completing my program. Because of his approach, I learned a lot more than I initially expected from a Ph.D. program, through the opportunity to work with several grad students and to supervise many undergrad students on different aspects of the project.

I would also like to express my gratitude to Professor Kiriakos N. Kutulakos, my Ph.D. co-supervisor, for his priceless support, vision and contributions. The basis of my research stems from his group's extraordinary work on computational imaging. I consider myself very lucky to have had Kyros as a co-advisor. His patience in listening to my ideas, his detailed explanations and our brainstorming sessions were crucial in completing my research.

I would like to thank my defense committee members, Professor Ali Sheikholeslami and Professor Antonio Liscidini, for reviewing my thesis and providing constructive comments. I would also like to thank the external examiner, Professor Franco Zappa, for taking part in my defense and providing his valuable feedback.

I would also like to thank Dr. David Stoppa and his team for giving me the opportunity to join them at FBK IRIS for two months. It was a significant learning opportunity for me, especially with the support of Manuel Moreno Garcia.

My Ph.D. started with working alongside Matthew P. O'Toole and Hyunjoong Lee on a computational imaging project, and my research became part of this project. They were outstanding colleagues and I learned a lot from them; I want to thank them for their priceless help and their patience in educating me. I would also like to thank other teammates, and simultaneously great friends, who contributed to the project and made paramount contributions to what we achieved together. They are Nikola Katic (a.k.a. Johnny, the calm and wise guy I could talk to about any topic, and he would never let me down), Mian Wei (Mr. Finisher, who gets the demo done in a matter of days while educating us on how the demo actually works), Gairik Dutta (the energetic guy, an analog designer working 24/7 in a black leather jacket!), Nikita Gusev (the laser specialist, energetic and fun but unfortunately not that good at foosball!), Rahul Gulve (the mastermind, who knows and works on everything and gets issues fixed), Zhengfan Xia (the magician: we never understood when or how, but he was getting the Python/C++ code done in a very short time), and Harel Haim (the guide, answering my never-ending silly computational imaging questions). They are all great friends, and I learned a lot from them through several tape-outs, prototyping, measurements and demonstrations.

Out of the many people from BA5158, only Javid Musayev held the title of "my best friend". Although he tries to duplicate my name, I am not mad at him. He is a great friend in every possible way that one could ask for. I want to thank him for being such an incredible mate.

I would like to thank Hossein Kassiri Bidhendi, Nima Soltani and Arshya Feyzi for the first months and year after I joined the team. They were great friends who helped me settle in the city, as well as in the group. They also provided a lot of help with working with undergrad students, writing proposals and lab management.

I would also like to acknowledge my U of T colleagues, who made the Ph.D. experience a much more pleasant and memorable one for me. I would like to thank Enver Kilinc, Gerald O'Leary, Maged ElAnsary, Wilfred Cho, Mohammad Reza Pazhouhandeh, Camilo Tejeiro, Jose Sales Filho, Farhad Ramezankhani, Farrokh Etezadi, Saeed Reza Khosravirad, Navid Samavati, Foad Arvani, Amer Samarah, Chuanwei (Jason) Li, Keiming Kwong, Amirali Amirsoleimani, Jianxiong (Jay) Xu, Asish Abraham, Sadegh Dadash, Masumi Shibata, Ahmed Elian, Mahdi Marsousi, Samira Karimelahi, Hamed Sadeghi, Sevil Zeynep Lulec, Robert Baker, Daniel Rozhko, Mario Milicevic, Andrew Shorten, and Jin Hee Kim. I am also grateful to ECE staff members Jennifer Rodriguez, Jaro Pristupa, Darlene Gorzo, and Jayne Leake, who helped me in different ways.

It was a fortunate opportunity to mentor many bright graduate and undergraduate students who also contributed to my research. Hence, I would like to thank Kevin Lee, Hardik Patel, Chengzhi (Winston) Liu, Peter Li, Terrence Cole Millar, Nafis Ahbab, Hsin-Yu Lo, Anas Ahmed Jamil, Shakthi Sanjana Seerala, Hui Feng (Jackie) Ke, Hui Di (Wendy) Wang, Jinzhuo (Sarah) Tang, Gilead Posluns, and Shichen Lu.

I also want to thank the team at Huawei Technologies Canada for accepting me into their team long before I graduated from my Ph.D. program. I would like to thank Behzad Dehlaghi Jadid, Alireza Sharif-Bakhtiar, Mohammad Sadegh Jalali, Shayan Shahramian, Joshua Liang, Hossein Shakiba, Jingxuan Chen, Summer Zhu, Yingying Fu, Dustin Dunwell, and David Cassan, as well as all the other colleagues.

I want to thank my parents, brother and sisters for always supporting me and believing in my choices. I would also like to thank my in-laws for their kindness and support. I am grateful for my beautiful little daughter, Ava, who made life colourful and was patient while I finished my Ph.D. program. And finally, I am beyond grateful for all the support and kindness that my wife showed me during this time. Sheida made it possible for me to get through all the stress and hardships, and I want to dedicate this thesis to her and Ava.

Contents

Acknowledgements

Contents

List of Figures

List of Tables

1 Introduction
   1.1 CMOS image sensor background
      1.1.1 Imaging system pipeline
      1.1.2 Image sensor architectures
      1.1.3 Pixel operation
      1.1.4 Photosensing principle
      1.1.5 Performance metrics
   1.2 Thesis motivation: coded-exposure imagers
   1.3 Thesis objectives
   1.4 Thesis outline

2 Comparison of photodetectors for coded-exposure pixels
   2.1 Introduction
   2.2 Photodetector architectures
      2.2.1 Operational speed
      2.2.2 Architecture choices
      2.2.3 Dual-tap architectures
   2.3 Comparison of dual-tap pixels
      2.3.1 Cross-section of the pixels
      2.3.2 Simulated electrostatic potential diagrams
      2.3.3 Simulated sensitivity and contrast
      2.3.4 Comparisons in literature
   2.4 Considerations for in-pixel circuitry
      2.4.1 Capacitors
      2.4.2 Storage diodes
      2.4.3 In-pixel transistors
   2.5 Conclusions

3 Coded-exposure image sensor based on photogate
   3.1 Introduction
   3.2 Coded-Exposure Pixels
   3.3 Sensor Architecture
   3.4 Camera System Architecture
   3.5 Measurement Results
   3.6 Conclusion

4 PPD-based CEP imager with in-pixel code memory
   4.1 Introduction
   4.2 Coded-Exposure Pixel Architecture
      4.2.1 Computational photography applications
      4.2.2 Pixel
   4.3 CEP Sensor Architecture
   4.4 System implementation
   4.5 Experimental characterization
   4.6 Experimental demonstration
      4.6.1 Single-frame structured light imaging results
      4.6.2 Single-frame photometric stereo imaging results
   4.7 Discussion
   4.8 Conclusions

5 PPD-based CEP imager with in-pixel analog image memory
   5.1 Introduction
   5.2 Pixel Architecture
      5.2.1 Background
      5.2.2 Proposed operation
      5.2.3 Proposed pixel architecture
   5.3 Sensor Architecture
   5.4 System implementation
   5.5 Experimental results
      5.5.1 Electrical experiments
      5.5.2 Direct/Indirect imaging
      5.5.3 Multispectral imaging
      5.5.4 Range gating imaging
   5.6 Conclusion

6 Conclusion and future work
   6.1 Summary
   6.2 Future work

Bibliography

List of Figures

1.1 A conventional imaging system pipeline.
1.2 A computational imaging system pipeline.
1.3 Common image sensor building blocks.
1.4 (a) Schematic of a 3T active pixel in a column of pixels, and (b) conceptual output diagram of the pixel.
1.5 Simplified photo-gate device used as a photo-detector: (a) the cross-section and (b) the electrostatic potential diagram across z-z' in depletion mode.
1.6 A reverse-biased PN junction as a photodiode: (a) cross-section and (b) electrostatic potential diagram across z-z'.
1.7 Uncertainty in the final pixel output because of (a) reset-level error due to static (DSNU) or temporal (random) noise, and (b) error in the static response of the pixel (PRNU).
1.8 Simplified diagram of (a) single-sampling, (b) double-sampling, (c) correlated double-sampling and (d) correlated multiple-sampling readout schemes.
1.9 A simplified charge transfer path of a photo-generated electron.
1.10 Exposure time of conventional (a) and coded-exposure-pixel (b) image sensors.
1.11 Extending DR by using a CEP camera in a static scene and passive illumination [34].
1.12 Example of direct/indirect imaging, explained in more detail in Chapter 5.
1.13 Experimental results from [29]. First column: input coded-exposure images. Numbers in parentheses denote the camera integration time for the input image. Second column: close-ups illustrating the coded motion blur. Third to sixth columns: the reconstructions maintain high spatial resolution despite a significant gain in temporal resolution (9X − 18X).
1.14 An illustration of the electrical coded-exposure implementation: (a) coded-exposure imaging example of the arbitrary codes that could be applied over many different subframes, (b) pixel timing diagram of the related pixel signal integration process depending on the code value over multiple subframes.
2.1 The conceptual circuit of a coded-exposure pixel.
2.2 The cross-sections of the (a) photogate, (b) n-well/p-sub, and (c) pinned photodiode.
2.3 Dual-tap pixel choices. The cross-sections of the (a) PG, (b) enhanced-PG, (c) dual-TG n-well/p-sub, (d) n-well/p-sub with in-pixel CTIA, and (e) PPD.
2.4 Doping profiles of the simulated dual-tap structures: (a) PG, (b) enhanced-PG, (c) dual-TG n-well/p-sub, (d) n-well/p-sub with in-pixel CTIA, and (e) PPD.
2.5 Simulated electrostatic potential diagram of the dual-tap pixel structures: (a) PG, (b) enhanced-PG, (c) dual-TG n-well/p-sub, (d) n-well/p-sub with in-pixel CTIA, and (e) PPD.
2.6 Simulated optical generation for the PPD device.
2.7 Simulated sensitivity of the pixel structures.
2.8 Simulated optical intensity on PG and PPD structures for 400nm and 700nm light wavelengths.
2.9 Simulated tap-contrast of the pixel architectures for (a) 550nm and (b) 850nm light wavelengths.
2.10 Transient simulation for demodulation contrast of (a) n-well/p-sub and (b) PPD.
3.1 The proposed PDC pixel circuit diagram with level-sensitive latches.
3.2 The measured digital pixel output at uniform light versus the coded pixel exposure, expressed as the percentage of masks applied over the complete frame exposure.
3.3 A simplified waveform diagram of the mask deserialization and loading.
3.4 The proposed CEP CMOS image sensor architecture.
3.5 The pixel layout and the image sensor die micrograph showing the chip region dedicated to PDC.
3.6 The architecture of the camera system, including the custom-designed CEP image sensor.
3.7 Capturing a photo of Lena shown on the computer screen while changing the level of pixel masking. The number of masks equal to one starts from 5% on the left and increases all the way up to 95% on the right. The output image is shown for each of the buckets.
3.8 The complete sensing side of the imaging system with the camera system and the FPGA board used for the direct signal acquisition.
4.1 The imaging principle for (a) single-frame structured-light imaging, and (b) single-frame photometric stereo imaging [35].
4.2 The principle of operation of (a) a generic dual-tap coded-exposure pixel, and (b) the presented dual-tap code-memory pixel (CMP) [35]. N is the number of subframes.
4.3 (a) The pixel schematic of the CMP pixel and (b) its simplified timing diagram. H × V are the pixel array dimensions; h and v are the horizontal and vertical indices, respectively. N is the number of subframes.
4.4 (a) Floorplan of the pixel, and (b) the electrostatic potential diagram for both code 0 and code 1 [35].
4.5 VLSI architecture of the sensor. Blocks highlighted in grey are presented in more detail in the following sections.
4.6 Circuit and timing diagram of a 48-column slice of the column-parallel programmable-gain amplifier (PGA) and the sample-and-hold (S/H) circuit in Fig. 4.5.
4.7 Block diagram of the code deserializers and code loading circuit together with the pixel array (top), and the simplified circuitry of key internal blocks (bottom). Other auxiliary circuitry for various configurations of the sensor is not shown here for simplicity.
4.8 Micrograph of a prototype fabricated in a CIS 110nm process. The die size is 3mm × 4mm [35].
4.9 (a) Imaging system experimental setup for single-frame structured-light imaging, and (b) the camera module block diagram.
4.10 (a) Experimentally measured tap contrast map of the sensor (the on-line version is in color), and (b) the histogram of the tap contrast between 0.8 and 1 (note that the vertical axis is shown on a logarithmic scale).
4.11 Experimentally captured raw output images of the sensor for two uniform-lighting cases: for one (top) and for two (bottom) subframes per frame [35].
4.12 Experimentally measured single-frame structured-light imaging results: (a) the albedo and depth map reconstruction pipeline [35], and (b) albedo and disparity (i.e., inverse depth) maps from other scenes, for both static (left) and dynamic (right) scenes, all generated at 20fps.
4.13 Experimentally measured single-frame photometric stereo imaging results: normal maps and albedo maps generated at 20fps, for both static (left) and dynamic (right) scenes [35].
4.14 Cross-section of a PPD device with possible different n/p doping wells.
5.1 Architectures for rolling-shutter (top) and global-shutter (bottom) pixels.
5.2 The flow chart for the analog-memory pixel (AMP).
5.3 Schematic and the corresponding timing diagram of the analog-memory pixel (AMP). V is the number of rows, h is the column index and N is the number of subframes.
5.4 The layout of the pixel (top) and the corresponding potential diagrams during the global data sampling and charge sorting phases (bottom).
5.5 Architecture of the sensor.
5.6 Simplified code deserializer block diagram and the associated blocks. This block is used for every channel in Fig. 5.5.
5.7 Chip micrograph. 3mm × 4mm, CMOS 0.11µm.
5.8 Camera and projector configuration for both direct/indirect and range gating applications, and the camera module block diagram.
5.9 The simplified schematic of the pixel and the timing diagram for measuring the tap contrast.
5.10 Experimentally measured tap contrast.
5.11 The setup for direct/indirect imaging.
5.12 The sensor output images of (a) a highly scattering object (wax candle) demonstrating the direct/indirect light imaging concept and corresponding experimental results, and (b) direct/indirect light imaging experimental results at video rate (the image on top is captured by a conventional camera).
5.13 The setup for multispectral imaging.
5.14 Multispectral imaging experimental results at video rate. The image on top is captured by a conventional camera.
5.15 (a) The setup for range gating imaging, and (b) raw output images of the camera with a hand at different depths in the field of view of the camera.
6.1 A simple microbump map illustration for a 2×2 pixel set. For a microbump pitch of d, the pixel pitch is 2d.

List of Tables

2.1 Comparison table of ToF sensors.
4.1 Comparison table.
5.1 Comparison table.

Chapter 1

Introduction

Since View from the Window at Le Gras, the oldest surviving photograph, shot in 1826 or 1827, photography has seen many significant milestones, such as color photography in 1861, video in 1888 (the oldest surviving), and the first digital camera in 1975. In the past couple of decades, with the dominance of CMOS cameras in the market, the race for high-performance cameras has resulted in image sensor architectures with sub-electron read noise [1–3], native high dynamic range [4–6], high resolution [7] and high frame rate. Additionally, computational imaging has introduced new possibilities for imaging. One example, and the main focus of this thesis, is coded-exposure imaging, so far used in applications like indirect-light-insensitive 3D imaging, image de-blurring, high-speed imaging, transport-aware imaging (such as seeing through smoke or fog and looking around corners), and extended-dynamic-range applications. These new ways of imaging have required new hardware and have led to new challenges in the design of such camera systems. Some solutions have used bulky optics and complex systems, while others have opted for custom image sensors. In this chapter, the basics of image sensor design are briefly introduced before the proposed designs are presented in the following chapters. Additionally, the objectives and outline of the thesis are given at the end of this chapter.

Figure 1.1: A conventional imaging system pipeline.

1.1 CMOS image sensor background

In this section, the common architectures used for image sensors, principles of photo- detection in semiconductors, and the essential performance metrics of an image sensor are explained.

1.1.1 Imaging system pipeline

A conventional imaging system pipeline is illustrated in Fig. 1.1. An image of the scene is focused on a detector plane, known as the image sensor. The image sensor, which may include a color filter array or a microlens array, produces a two-dimensional electrical signal representation of the scene. Analog signal processing, such as automatic gain control (AGC), is applied to the signal before it is digitized by the analog-to-digital converters (ADC). After this step, further signal enhancement processing is applied to the image data, and finally the image data is stored or visualized. A camera module usually includes most of these operations. Additionally, there can be control loops for focus, exposure and other configurations of the camera for better image quality.

In comparison, computational cameras may use unconventional optics to optically encode the image focused on the image sensor, as shown in Fig. 1.2.

Figure 1.2: A computational imaging system pipeline.

The image from such a system may not be easily understood by direct visualization, and additional computation may be included to extract useful information from the image. Depending on the computational imaging technique incorporated by such a camera, some of the functionalities of the conventional imaging pipeline may be unnecessary, but for the sake of generality they are shown in the figure.

An example of computational imaging is when an image is coded to be sensitive only during parts of the exposure time. This coding can be a pseudo-random sequence for the purpose of temporally compressing the image [8]. Such a technique can be used to extract temporal information from the image, for instance to de-blur it. If the coding is controlled at the pixel level [9], then more information can be extracted, enough to reconstruct a video from a single image of the camera. In addition to programmability at the pixel level, it is also possible to have more than one output per pixel. The extra output from the pixel can provide more information from the same frame for processing. This can be used, for example, to perform 3D structured-light imaging in a single frame [10], or for high-dynamic-range imaging [11].

Coding methods used in computational cameras can be classified into six approaches [12]. Sensor-side coding is the method that relies on image sensor technologies and on what is feasible in such sensors. To understand what can be implemented in such sensors to enable computational imaging, background on conventional image sensors is provided in the rest of this chapter.

1.1.2 Image sensor architectures

The main component of an image sensor is the two-dimensional array of pixels that senses the incoming light, as depicted in Figure 1.3. The pixel array usually occupies the largest part of the image sensor. Every pixel in the array includes a photosensitive area (the shaded area inside the pixels) that detects the amount of light arriving at that pixel. The ratio of the photosensitive area to the pixel's total area is known as the fill factor (FF). A block of row decoders is used to control the readout phase and the pixels' exposure time, during which the photo-generated charges are collected. During the readout time, the row decoders enable a set of pixels located in a single row to connect to the readout circuitry, which processes their photo-generated charge or voltage and sends it off the image sensor chip.

In addition to the photosensitive area, pixels usually include a small number of devices that perform signal processing on the collected charges. This can be as little as a source-follower transistor that buffers a voltage level to the readout block, or as much as a complete analog-to-digital converter (ADC), as in some high-speed image sensors. This extra in-pixel circuitry is custom and application-dependent. For instance, global-shutter pixels include more gates and storage capacitors, and single-photon avalanche diode (SPAD) pixels may include quenching circuits or digital gates for data processing. The extra circuitry in the pixel results in larger pixel sizes or a lower FF. One solution to this compromise is to take advantage of 3D integration technology, where the extra in-pixel circuitry can be placed on a separate chip.

Figure 1.3: Common image sensor building blocks.

An image sensor's readout block can cover a wide range of features and capabilities based on the pixel architecture. This block's standard functionalities are signal processing operations such as amplification, analog-to-digital conversion, row-noise correction, black-sun correction, and scanning or multiplexing the sensed data to the output.

In addition to the above-mentioned building blocks, other miscellaneous circuits can be included on an image sensor chip. Such blocks can include a sequencer that controls the timing of the building blocks, bandgap and reference generation circuits, serial peripheral interface (SPI) registers for configuring the sensor in different modes, image compression engines, etc.

1.1.3 Pixel operation

A schematic of a 3T active pixel in a column of pixels is shown in Fig. 1.4(a). It is called an active pixel because, in addition to a photodetector, it consists of a source follower acting as an active buffering stage. In the readout phase, the pixels are connected to the readout circuitry through a SELECT switch, and the source follower becomes active for reading out that row of pixels.

Figure 1.4: (a) Schematic of a 3T active pixel in a column of pixels, and (b) conceptual output diagram of the pixel.

The diagram in Fig. 1.4(b) illustrates the integration of the photogenerated charges in the photodetector. The amount of photogenerated charge is represented by QPD, and the voltage across the diode is represented by VPD. At the start of the exposure time, the photodetector is reset to a known voltage. After the RESET signal goes low, the incident light on the photodetector generates charges. In most architectures, the electrons are then accumulated on the photodetector capacitance, CPD, and possible parasitic capacitances. As a result, the voltage of this node decreases, where the drop in voltage is a representation of the amount of collected charge. As shown in the diagram, the higher the light intensity, the faster the voltage drops. A very high light intensity can saturate the pixel, as shown by the red curve. There are many photodetector options and alternative pixel architectures; in the following sections, a few that are relevant to this thesis are explained.
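To make the integration behavior above concrete, the following minimal Python sketch (not from the thesis; the capacitance, current and voltage values are illustrative assumptions) models the photodiode node voltage of Fig. 1.4(b) discharging under two light levels.

    # Minimal behavioral sketch of the 3T pixel integration in Fig. 1.4(b).
    # C_PD, the photocurrents and the voltage levels are assumed values.
    C_PD = 2e-15        # photodiode capacitance [F] (assumed)
    V_RESET = 3.3       # reset voltage [V] (assumed)
    V_SAT = 0.5         # floor where the pixel saturates [V] (assumed)

    def v_pd_after_exposure(photocurrent_a, t_exp_s, steps=1000):
        """Integrate the photocurrent on C_PD; return V_PD at end of exposure."""
        dt = t_exp_s / steps
        v_pd = V_RESET
        for _ in range(steps):
            v_pd = max(V_SAT, v_pd - photocurrent_a * dt / C_PD)  # dV = -I*dt/C
        return v_pd

    print(v_pd_after_exposure(10e-15, 30e-3))   # low light: small drop, ~3.15 V
    print(v_pd_after_exposure(500e-15, 30e-3))  # high light: clips at V_SAT (red curve)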

1.1.4 Photosensing principle

Although the early solid-state photo-detectors were bipolar and MOS photodiodes [13], silicon-based imaging was dominated by charge-coupled device (CCD) architectures in the 1970s. With the advancement of CMOS technologies and the advent of sub-micron technology nodes in the 1990s, the efforts to develop CMOS-based image sensors progressed significantly. The use of an in-pixel amplifier (e.g., a source-follower stage) helped improve the signal-to-noise ratio (SNR) without sacrificing pixel size in CMOS image sensors. Additional customizations of the CMOS process for improving the sensitivity pushed the cost-performance trade-off in favor of CMOS image sensors. Other advantages, such as operation speed and compatibility with CMOS post-processing (like 3D integration), have made the dominance of CMOS image sensors even more significant in the past couple of decades.

A comprehensive introduction to various types of photo-detectors can be found in [14]. In this thesis, the focus is on CMOS image sensors and the associated photo-detectors. Hence, only a brief introduction to the related architectures is given here, in preparation for the comparison study in Chapter 2.

One of the early solid-state photo-detectors was the photo-gate. A polysilicon gate is deposited on a p-type substrate of the kind used for NMOS transistors (alternatively, on an n-well used for PMOS transistors). A thin silicon dioxide layer, called the gate oxide, isolates the polysilicon from the substrate. This device is effectively the MOS capacitor (MOS-C) used in integrated circuits and is photosensitive due to the polysilicon's partial transparency to light. A simplified diagram of such a structure on a p-type substrate is shown in Figure 1.5(a). The n+ diffusion area is where the collected charge is stored and read out. At the start of the photosensing process, this node is first charged to a known value through the φ switch, and then the switch is disconnected. When a positive bias voltage is applied to the gate of the device, the holes (the majority carriers in the p-type substrate) are pushed away from the silicon surface. The ionized acceptors in the p-sub create a space-charge region (SCR, a region where an electric field is induced) with a depth dependent on the applied bias voltage, the electric field drop across the gate oxide, and the acceptor concentration in the p-sub. The structure is out of equilibrium, and this condition is called depletion.

Figure 1.5: Simplified photo-gate device used as a photo-detector: (a) the cross-section and (b) the electrostatic potential diagram across z-z' in depletion mode.

A simplified energy-band diagram of the device in this condition is shown in Figure 1.5(b). Thermally generated or photogenerated electron-hole pairs in the SCR are separated by the electric field. The electrons are transferred to the silicon-oxide interface and eventually to the n+ diffusion region. At any point in time while the device is still in depletion mode, the potential of the n+ region represents the amount of collected charge, which is a function of the light intensity and the integration time. This process continues until the device reaches equilibrium, when enough electrons have been collected under the gate to compensate for the applied positive bias voltage. At this point, the device no longer collects photogenerated charges.

Among the photo-detectors used in today's imagers, the most common ones are based on the reverse-biased PN-junction photodiode, shown for an n-well/p-sub type in Figure 1.6(a). At the start of the photo-sensing process, a positive voltage is applied to the n+ region to reverse-bias the photodiode. Photogenerated electron-hole pairs in the SCR, the latter also referred to as the depletion region of the junction, create a drift current due to the electric field across this region. The electrostatic potential diagram of the device under this condition is depicted in Figure 1.6(b).

Figure 1.6: A reverse-biased PN junction as a photodiode: (a) cross-section and (b) electrostatic potential diagram across z-z'.

This current, also known as the photocurrent, flows from the n-side to the p-side. In passive pixels, the photocurrent is read out directly as a representation of the light intensity level. In active pixels, on the other hand, the photocurrent is integrated on a capacitor whose potential represents the amount of charge collected.

1.1.5 Performance metrics

In this section, a few of the many metrics used to characterize an image sensor's performance are briefly explained, to facilitate the flow of the next chapters.

Noise in image sensors

The noise in image sensors is categorized into two main types: fixed-pattern noise (FPN) and temporal noise. FPN is the variation in the pixels' response within the same image sensor array under uniform illumination conditions. It is caused by device mismatches in the pixel and readout circuitry or in the physical signal traces. FPN is divided into two parts: dark signal non-uniformity (DSNU) and photo response non-uniformity (PRNU).

Figure 1.7: Uncertainty in the final pixel output because of (a) reset-level error due to static (DSNU) or temporal (random) noise, and (b) error in the static response of the pixel (PRNU).

DSNU acts like an offset in the pixel output, as depicted in Fig. 1.7(a), independent of the signal level. In contrast, as shown in Fig. 1.7(b), PRNU represents a gain error in the response of the pixel. Hence, PRNU is proportional to the signal level. Depending on the source of the FPN, it can be pixel-level FPN (for instance, mismatch of the source-follower within pixels), column-wise FPN (for example, the mismatch in the current source of the source follower in the column) or other possible patterns based on the analog readout circuits. FPN is fixed among frames and can be corrected by post-processing, although the best practice is to mitigate it as much as possible by proper design and architecture choices.

Temporal noise, as its name implies, changes from frame to frame. It can be depicted as in Fig. 1.7(a), with the difference that the error changes from frame to frame, in contrast to the static value of the DSNU. Shot noise, pixel reset noise, random telegraph signal noise, readout thermal and flicker noise, and quantization noise are among the sources of temporal noise. Temporal noise sets the ultimate limit on the signal fidelity in image sensors, and there have been numerous efforts since the advent of CMOS image sensors to reduce it.

The total output noise depends on how the readout of the pixels is performed. Illustrated in Figure 1.8 are four general approaches to reading out the pixel's signal.

Figure 1.8: Simplified diagram of (a) single-sampling, (b) double-sampling, (c) correlated double-sampling and (d) correlated multiple-sampling readout schemes.

It should be noted that the waveforms shown here represent only APS pixel voltage outputs; other pixel architectures and readouts (such as digital pixels, single-photon avalanche diodes, and jots in quanta image sensors) are beyond the scope of this thesis and are not shown. Figure 1.8(a) shows the case where only the pixel's final value after the exposure phase is read out. This type of readout is used in ultra-high-speed or ultra-low-power sensors, where noise performance is compromised in favor of speed or power. Where more time and power can be afforded, the double-sampling (DS) readout shown in Figure 1.8(b) is performed, in which two samples, S1 and S2, are read. The final output representing the image is S2 − S1. With this subtraction, offset (DSNU) and low-frequency noise sources (such as the flicker noise of the readout circuits) are canceled out, and better signal quality is achieved. On the other hand, the parts of the temporal noise that are uncorrelated between the two samples, such as reset noise and readout thermal noise, are increased. If the pixel architecture allows (as in 4T pinned-photodiode pixels), the correlated double-sampling (CDS) readout shown in Figure 1.8(c) can be performed. In addition to the benefits that DS offers, the advantage of CDS is that the reset noise of the two samples is correlated and can be suppressed. Reset noise is usually one of the dominant noise sources in DS readout, and with CDS architectures, temporal noise as low as a few e− has been reported. The last readout method is correlated multiple-sampling (CMS), illustrated in Figure 1.8(d). In CMS, the reset and signal levels are each read out n times, and the final value is extracted from the average of the samples. As the readout temporal noise is independent among the readouts, averaging divides the noise power by n. Hence, for the same signal level, the readout noise is divided by √n. With this technique, sub-e− noise performance has been reported [15].
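As a consistency check on the noise arithmetic above, the short Monte Carlo sketch below compares the three schemes; the electron-count noise figures are assumed for illustration only and do not correspond to any sensor in this thesis.

    # Monte Carlo sketch of the readout schemes in Fig. 1.8 (assumed noise
    # numbers): CDS cancels the correlated reset noise, and CMS further
    # divides the read noise by sqrt(n).
    import random

    RESET_E = 10.0   # reset (kTC) noise, electrons rms (assumed)
    READ_E = 5.0     # read noise per sample, electrons rms (assumed)

    def std(xs):
        m = sum(xs) / len(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

    def ds_sample():
        # Double sampling: reset and signal reads come from independent resets.
        r = random.gauss(random.gauss(0, RESET_E), READ_E)
        s = random.gauss(random.gauss(0, RESET_E), READ_E)
        return s - r

    def cms_sample(n):
        # CDS (n=1) / CMS (n>1): both averaged reads share one reset level.
        lvl = random.gauss(0, RESET_E)
        r = sum(random.gauss(lvl, READ_E) for _ in range(n)) / n
        s = sum(random.gauss(lvl, READ_E) for _ in range(n)) / n
        return s - r

    T = 20000
    print("DS :", std([ds_sample() for _ in range(T)]))    # ~sqrt(2*(10^2+5^2)) ≈ 15.8 e-
    print("CDS:", std([cms_sample(1) for _ in range(T)]))  # ~sqrt(2)*5 ≈ 7.1 e-
    print("CMS:", std([cms_sample(16) for _ in range(T)])) # ~sqrt(2)*5/4 ≈ 1.8 e-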

Charge transfer in image sensors

Charge transfer in image sensors refers to the transport of the photo-generated charges from any point within the pixel area to a collection node (e.g., the floating diffusion in a standard 4T PPD-based pixel). This process is governed by several factors, such as the drift and diffusion currents in a pn-junction and induced electric fields. The subject is discussed in more detail in the next chapter, but a simplified charge transfer path in a standard 4T pixel is shown in Figure 1.9. During the exposure time, the photo-generated electrons are first transferred to the n-doped well of the photodiode.

Figure 1.9: A simplified charge transfer path of a photo-generated electron.

These charges are accumulated until the readout phase, when the transfer gate (TX) is toggled to conduct the electrons to the floating diffusion (FD), also known as a tap or bucket. Incomplete charge transfer can happen when charges are trapped by unwanted pockets or barriers in the transfer path, or when the toggling of the TX signal is too fast. This incomplete transfer can affect subsequent frames and is known as image lag. In multi-tap pixels, it degrades the contrast between the signals read out from the multiple tap outputs (in other terms, part of the signal leaks to the other taps); this contrast is known as the demodulation contrast, tap contrast, or extinction ratio.
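While exact definitions vary across the literature, a common way to quantify this, assuming a dual-tap pixel in which tap 1 is the intended destination, is the ratio C = (Q1 − Q2)/(Q1 + Q2), where Q1 and Q2 are the charges collected at the intended and unintended taps, respectively. C = 1 then corresponds to complete charge transfer, and C = 0 to an even split between the taps.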

1.2 Thesis motivation: coded-exposure imagers

The race to improve image quality has driven the image sensor pixel pitch to sub-µm [16] to achieve higher image resolution, and has yielded read noise levels of sub-e− [17] and a dynamic range of up to 132dB [18]. Besides application-specific solutions, there have been various process developments and novel circuit techniques to improve pixel performance to these astonishing levels. The future of the imaging industry will likely rely on what else a camera can offer. Already, cameras can refocus photos after they are captured (e.g., cameras by Lytro [19], Raytrix [20] and Pelican Imaging [21]); and they can even record three-dimensional RGB-D video, whose pixels contain depth information in addition to RGB color (e.g., cameras by [22], PrimeSense [23] and Microsoft [24]). These cameras fall into the computational imaging category, as they rely on tight integration of optical and computational processing to produce an image [12]. An image sensor that performs part of the computation or facilitates the novel optics of a computational camera is a computational image sensor. Cameras incorporating such image sensors can potentially benefit from smaller form factors, lower power consumption, and simpler system integration.

Computational photography techniques using coded exposure have enabled new imaging capabilities. Electronic repurposing of existing off-the-shelf camera modules for applications like transient imaging and demultiplexing of light sources is reported in [25–28]. Other works, adopting unconventional optics together with off-the-shelf cameras, have reported high-speed and transport-aware imaging by per-pixel coded exposure [9, 29, 30].

Figure 1.10: Exposure time of conventional (a) and coded-exposure-pixel (b) image sensors.

Moreover, the development of camera modules with custom-designed computational image sensors has been on the rise [11, 31–33]. Time-delay integration and spatiotemporal filtering through in-pixel ADC implementation have been addressed in [31]. Dual- and quad-bucket pixel architectures for several computational imaging techniques have been proposed in [11], and [32] has offered compressive sensing for high-speed imaging with a multi-aperture image sensor.

An emerging focus of computational image sensors is the programmability, or coding, of the camera exposure at the individual-pixel level, as illustrated in Figure 1.10 in comparison to conventional rolling-shutter and global-shutter cameras. Unlike global-shutter pixels, which are all exposed for the same time interval (the exposure time), the pixels in this emerging class of coded-exposure-pixel (CEP) cameras can be programmed to selectively detect only some of that light.

CEP cameras can be used in many different applications, which can be classified into four groups based on the combination of scene and illumination. The scene is considered static if there is zero or negligible change in the scene within one frame exposure time. In contrast, if the scene has so much movement within one frame exposure that the image from a conventional camera is distorted, or blurred, it is considered dynamic. In terms of illumination, imaging is considered passive if there is no controlled light source (for instance, daylight) or only constant uniform projection. When there is a temporally and/or spatially varying light projection, it is considered active-illumination imaging.

Figure 1.11: Extending DR by using a CEP camera in a static scene and passive illumination [34].

Static scene and passive illumination: In this case, a CEP camera can enhance the quality of the image. For example, the camera can be coded with a pattern that is a function of the light intensity on individual pixels to enhance the dynamic range [34]. An example can be seen in Fig. 1.11. The top four images show both high- and low-exposure images from a conventional camera and from a CEP camera imaging in a conventional manner. In the low-exposure mode, the details of the darker parts of the image are not visible, while in the high-exposure mode the details of the bright parts of the image are lost. The bottom row illustrates a scene-dependent code that is generated to improve the image quality, as well as the output image of the CEP camera when this code is applied. The details of the scene are visible in all parts of this image.
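As an illustration of how such a scene-dependent code might be generated, the hypothetical sketch below thresholds the previous frame to build per-pixel, per-subframe mask bits. The policy and all values are assumptions for illustration, not the actual algorithm of [34].

    # Hypothetical sketch of scene-dependent HDR coding: pixels that were
    # bright in the previous frame are enabled for fewer subframes.
    def hdr_codes(prev_frame, n_subframes=8, full_scale=255):
        """Return mask bits, indexed as codes[subframe][row][col], in {0, 1}."""
        codes = []
        for s in range(n_subframes):
            plane = []
            for row in prev_frame:
                bits = []
                for p in row:
                    # At least one subframe stays on, so no pixel is fully masked.
                    n_on = max(1, round(n_subframes * (1.0 - p / full_scale)))
                    bits.append(1 if s < n_on else 0)
                plane.append(bits)
            codes.append(plane)
        return codes

    prev = [[10, 240], [128, 64]]  # toy 2x2 intensities from the previous frame
    codes = hdr_codes(prev)
    on_counts = [[sum(codes[s][r][c] for s in range(8)) for c in range(2)] for r in range(2)]
    print(on_counts)  # [[8, 1], [4, 6]] -> effective per-pixel exposure, in subframes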

Static scene and active illumination: In this combination of scene and illumination, the light projection can vary in different ways, for example in its spatiotemporal pattern or in the wavelength of the light source. In the spatiotemporal case, multiple spatial patterns can be projected onto the scene during one frame exposure time, and a CEP camera can be programmed accordingly to extract new information from the scene. This information can be used for single-frame 3D imaging [10], transport-aware imaging [27], depth gating and other possible applications. One example is direct/indirect imaging, a sub-category of transport-aware imaging, which decomposes the light from the projector into two components. One is the direct light, the light rays that undergo only one reflection in the scene; the other is the indirect light, the light rays that undergo more than one reflection or refraction. An example image is shown in Fig. 1.12, where tap 1 of the camera represents the direct image and tap 2 represents the indirect image.

Dynamic scene and passive illumination: If the activity in the scene is high, a conventional camera's output image looks blurry. This is because, during one frame exposure time, any point of a moving object in the scene maps to multiple pixels rather than a single pixel in the camera. A CEP camera can be used for compressive sensing, where each pixel's image data is compressed with a sequence of pseudo-random codes [29, 33]. Multiple frames of data can then be decompressed from the single-frame readout. Examples of such images can be seen in Fig. 1.13.

Figure 1.12: Example of direct/indirect imaging, explained in more detail in Chapter 5.

Dynamic scene and active illumination: To the best of my knowledge, this scene and illumination combination has not been explored yet. One reason could be the limitations that currently exist for CEP cameras: having both a dynamic scene and active illumination may require more than two taps per pixel to resolve the final image, while so far only single- and dual-tap pixels have been available.

Figure 1.13: Experimental results from [29]. First column: input coded-exposure images; numbers in parentheses denote the camera integration time of the input image. Second column: close-ups illustrating the coded motion blur. Third to sixth columns: the reconstructions maintain high spatial resolution despite a significant gain in temporal resolution (9X − 18X).

The CEP concept is illustrated in more detail in Fig. 1.14. The pixels in the image array can be programmed arbitrarily during the exposure time of the camera. This is performed by submitting codes, or mask bits, to the pixel array. Depending on the information/application of interest, these masks can take very different shapes and need to change many times during a single image exposure time, as depicted in Fig. 1.14(a). In order to implement this functionality electrically, the complete frame exposure time is divided into many "subframes", and a single bit of data is sent to each individual pixel to define the value of the pixel mask during the corresponding subframe. In the example shown in Fig. 1.14(b), the pixel has two photogenerated-charge collection nodes (buckets, or taps). Depending on whether the pixel masking bit value is one or zero, the signal is integrated on the first or on the second bucket, respectively.
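The per-subframe sorting just described can be summarized in a few lines of Python. This is a behavioral sketch of a generic dual-tap CEP pixel (ignoring noise, readout and transfer non-idealities), not a model of any specific chip in this thesis.

    # Behavioral sketch of dual-tap coded exposure: in each subframe the
    # photo-generated charge goes to bucket 1 if the mask bit is 1, else
    # to bucket 2, so no light is discarded.
    def dual_tap_frame(subframe_charges, mask_bits):
        """subframe_charges: charge per subframe; mask_bits: one code bit each."""
        assert len(subframe_charges) == len(mask_bits)
        bucket1 = bucket2 = 0.0
        for q, bit in zip(subframe_charges, mask_bits):
            if bit == 1:
                bucket1 += q   # pixel "exposed": charge sorted to tap 1
            else:
                bucket2 += q   # pixel "masked": charge still kept, on tap 2
        return bucket1, bucket2

    # Example: 4 subframes, code 0101 -- both taps together see all the light.
    print(dual_tap_frame([100, 80, 120, 90], [0, 1, 0, 1]))  # (170.0, 220.0)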

Figure 1.14: An illustration of the electrical coded-exposure implementation: (a) a coded-exposure imaging example of the arbitrary codes that could be applied over many different subframes, and (b) a pixel timing diagram of the corresponding pixel signal integration process, depending on the code value, over multiple subframes.

Having two buckets allows the sensor to collect the signal even when the pixel is masked by the computational imaging scheme. It is especially useful in applications where multiple pieces of image information should be collected at each pixel. Examples are single-frame 3D imaging and direct/indirect imaging, where different image information is needed and multi-bucket (multi-tap) sensors make it possible to capture all of that data in one frame. This thesis focuses on the custom design of CMOS image sensors for such CEP cameras, with applications in the automotive, robotics and biomedical sectors, such as 3D imaging, depth gating, compressive imaging, high-speed imaging, looking around corners, extended dynamic range, imaging in difficult smoky conditions, hyperspectral imaging, seeing under the skin, etc.

1.3 Thesis objectives

The objectives of this thesis are as follows:

• Comparison study and device simulation of different types of photo-detectors suitable for CEP image sensors

• Exploration and design of CEP prototypes in CMOS image sensor technologies

• Proposal and design of coded-exposure pixels:

– with compact pixel pitch, suitable for high-resolution image sensors

– meeting speed requirements of the computational imaging applications

– with low compatibility risk for accessible image sensor technologies

• Design of custom camera modules, incorporating custom-designed image sensors, for computational imaging applications

• Development of camera modules synchronized with light sources for the demonstration of active-illumination computational imaging applications

1.4 Thesis outline

The rest of this thesis is organized as follows:

• Chapter 2 provides a comparison study and analysis of five different pixel architectures that can be used for CEP computational cameras. Device simulation results, obtained using Sentaurus TCAD tools, are presented to compare their sensitivity and tap contrast. A brief overview of relevant pixels presented in the literature is also provided for completeness.

• Chapter 3 outlines the design and implementation of the first CEP image sensor prototype, using a photo-gate pixel architecture. The 6 × 6mm2 chip, fabricated in a 0.35µm CMOS technology with a pixel pitch of 25µm, accommodates 180 × 160 pixels. With the proposed dual-tap architecture, light can be collected for both code values 0 and 1, making the sensor more light-efficient. The dual-tap pixel features different gains at each tap to improve performance in computational imaging techniques where one tap may collect a much weaker signal level than the other.

• Chapter 4 describes a code-memory pixel architecture in a 0.11µm CMOS image sensor technology. The pixel is PPD-based to improve the sensitivity and overall performance compared to the previous design. Its pixel pitch of 11.2µm was the smallest for a dual-tap CEP architecture reported at the time (and the second smallest if single-tap on-chip CEP architectures are considered as well). In contrast to the first design, the per-tap analog gain is implemented in the readout. In this chapter, the electrical functionality and performance of the chip are explained, together with two computational imaging techniques for which this camera was used.

• Chapter 5 is dedicated to the final pixel architecture proposed in this thesis, a data-memory pixel. Like the second prototype, this chip was also fabricated in a 0.11µm technology, but it uses pinned storage diodes to sample the image data before sorting it based on the per-pixel code. This pixel was implemented in a pitch of 7µm and with a 312 × 320 resolution – the smallest pixel and the highest image resolution reported for a CEP image sensor. This chapter discusses the details of the design and performance, and multiple computational imaging demonstrations that used this prototype.

• Chapter 6 concludes the thesis and provides some directions for future research in the area of CEP image sensor design.

Chapter 2

Comparison of photodetectors for coded-exposure pixels

This chapter focuses on the comparison of photodetectors for CEP applications with the following highlights:

• Brief explanation of the three suitable photodetector options for CEP applications and introduction of their dual-tap counterparts;

• Device simulation conditions, process profile choices and explanation of the photodetectors' operation principles;

• Sensitivity and contrast simulation results for comparison among the devices;

• Analysis, compatibility and trade-off discussion of using additional in-pixel devices.

Candidate photodetectors for use in multi-tap pixels are simulated and evaluated for computational imaging applications in this chapter. The photodetector architectures are picked based on their prior-art usage in multi-tap applications (such as computational imaging, time-of-flight (ToF) imaging, fluorescence lifetime microscopy, etc.) and their technology accessibility. These architectures are compared in terms of charge transfer speed and sensitivity. Additionally, the effects of in-pixel circuitry on photodetector performance are discussed.

2.1 Introduction

Custom-fabricated image sensors in complementary metal-oxide-semiconductor (CMOS) processes for computational imaging applications are gaining attention because of the new features and functionalities they offer. As mentioned in the previous chapter, computational cameras refer to the class of cameras in which a combination of novel optics and computation is used to implement a new way of imaging. In sensor-side coding, what can be offered by image sensor technologies, and pixel architectures in particular, plays an important role.

Cameras with multi-tap pixels have introduced new possibilities in the field of computer vision. Dual-tap ToF cameras have been adopted in transient imaging [25], illumination demultiplexing via nanosecond coding [26] and snapshot difference imaging [28]. Additionally, custom-designed sensors with dual- and quad-tap pixels are reported in [11] for applications in high-dynamic-range imaging and motion-corrected photography. Dual-tap pixels with per-pixel controllability have shown the possibility of single-frame structured-light imaging [10, 35].

Camera modules that implement CEP functionality with off-the-shelf image sensors and additional hardware, such as liquid crystal on silicon (LCoS) and digital micromirror devices (DMD), have been reported in the past decade. Examples are light transport probing [30, 36, 37], compressive video acquisition [9] and computational imaging on the electric grid [38]. The additional optics and components in these systems make them bulky, expensive and complex, and degrade the performance in terms of distortion and light efficiency.

In recent years, custom-designed image sensors have been reported to address these issues in compressive imaging [33], image de-blurring [39], primal-dual coding [40] and single-frame 3D imaging [10].

Figure 2.1: The conceptual circuit of a coded-exposure pixel.

Each of these image sensors uses a different photodetector type, and choosing the proper one for the application depends on factors such as light wavelength, operation speed, dynamic range, spatial resolution, etc. What all these applications have in common is the possibility of per-pixel programmability during the integration time of the sensor, which can be achieved by the circuit concept shown in Fig. 2.1. In this circuit, when CTRL is toggled high based on the code sent to the pixel, photo-generated charges are accumulated on CFD1. On the other hand, when CTRL is set low and, consequently, its complement CTRL is high, the charges are either sent to another output identical to CFD1 or drained away from the photodiode. At the end of the frame time, the pixels' output(s) are read out through the peripheral blocks.

The CEP architectures have in common that the camera’s exposure time is divided into multiple intervals of Tsubf , called subframes or sub-exposures, during which all the photo-generated charges should be transferred to either of the outputs based on a per- pixel code. The Tsubf in computational imaging applications defines the pixel’s charge Chapter 2. Comparison of photodetectors for coded-exposure pixels 25

transfer time. The subframe times reported in the relevant computational imaging applications range from tens of µs to 10 ms. This range is much longer than in ToF image sensors, which achieve almost-complete charge transfer within a few ns up to a few hundred ns [41–43].

To compare the photo-detection architectures used in [33, 39, 40], performance metrics that affect the image quality of multi-tap computational image sensors shall be studied. For this purpose, tap-contrast demonstrates how well the charges are transferred to the correct pixel tap at the desired speed, and the sensitivity of the pixels represents how much the output changes at a given light intensity. In the next section, we briefly review the operational speed requirements and the top pixel choices. Section 2.3 compares the pixel choices in terms of charge transfer speed, sensitivity, and tap-contrast. Section 2.4 discusses implementation considerations, followed by conclusions in Section 2.5.
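Throughout this comparison, tap-contrast follows the definition used later in this thesis: the transferred minus the residual charge, divided by their sum. Written out for a dual-tap pixel, with Q1 the charge collected by the intended tap and Q2 the charge leaking to the other tap:

contrast = (Q1 − Q2) / (Q1 + Q2)

A contrast of 1 (100%) therefore indicates perfect charge sorting, while 0 means the two taps are indistinguishable.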

2.2 Photodetector architectures

2.2.1 Operational speed

CEP computational imaging applications require a wide range of Tsubf. Up to 800 subframes per frame is reported in [37] (with Tsubf ≈ 42µs at 30 fps). On the other hand, the typical number of subframes used in [9] is 9–18 (with Tsubf = 1 ms at 55 fps), and 14 subframes are used in [33] (with Tsubf = 10 ms at 7.1 fps).

Regardless of the application, the minimum achievable Tsubf in an image sensor, and accordingly the maximum subframe rate, is determined by the time required for updating the codes in the array. The access time of one row in the array for submitting the code, Tacc, is a determining factor of the maximum subframe rate. It can be calculated by:

Figure 2.2: The cross-sections of the (a) photogate, (b) n-well/p-sub, and (c) pinned photodiode.

Tacc = Tsubf,min / Nrow

where Nrow is the number of rows updated at each subframe and Tsubf,min is the minimum subframe time. The maximum value of Nrow is the number of rows in the array for a row-based code update; it is smaller in cases where multiple rows are updated simultaneously,

or partial array updates are required. In the example of Tsubf = 42µs, if an image sensor with VGA resolution (640 × 480) is considered, the pixel row access time will be

Tacc = 87.5ns.
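The same arithmetic can be scripted for other resolutions and subframe times; a minimal sketch in Python, using the relation and the example values above (the function name is illustrative):

    def row_access_time(t_subf_min, n_rows):
        """Tacc = Tsubf,min / Nrow for a row-based code update."""
        return t_subf_min / n_rows

    # Example from the text: Tsubf = 42 us with a VGA-resolution (480-row) array.
    t_acc = row_access_time(42e-6, 480)
    print(f"Tacc = {t_acc * 1e9:.1f} ns")  # prints: Tacc = 87.5 ns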

In comparison to a conventional image sensor, there are two additional operations that have to be performed in a CEP sensor. One is shipping the code data to the individual pixels, and the other is sorting the photo-electrons based on this

code. There are two ways that the Tacc time can be allocated to these two tasks: (1) perform both tasks during this time and sort the photo-electrons while the code is shipped to the pixels, or (2) pipeline the operations by shipping and storing the code data in a memory cell in the pixel during Tacc and performing the sorting over a longer time (for example, during the consecutive subframe period). Each of these choices has its advantages and requires its own architecture.

2.2.2 Architecture choices

Since the wide adoption of CMOS active pixel sensors (APS) instead of charge-coupled devices (CCD) in digital photography, different architectures have been proposed. There are a number of insightful comparative studies on photodetector topologies, for instance [44], with the focus on single-tap applications. To address dual/multi-tap pixel topology comparison, three photodetector options are chosen in this chapter. They are chosen based on the literature on multi-tap applications, such as [33, 39, 40, 45], and technology accessibility. Other architectures may outperform these choices, but they may need customization or specialized technology processes, and thus are not included in this comparison. Only the dual-tap enhanced-photogate architecture that is simulated and discussed in later sections of this chapter requires process customization, and it is included only to show the potential of such approaches. These photodetectors are:

Photogate (PG): A polysilicon gate on a silicon substrate, separated by a thin gate-oxide layer as shown in Fig. 2.2(a), forms the structure of a photogate. When the photogate is positively biased (on a p-type substrate), it collects the photo-generated charges in a channel underneath it. The electrons are collected and accumulated for readout by n+ doped regions. The charge traps at the interface of the silicon and gate-oxide may slow the charge transfer and increase the dark current generation rate. The performance can be improved by adopting a buried channel using a shallow p-doped layer below the polysilicon. Adding an n-doped well can also improve the charge transfer efficiency. The polysilicon gate is not transparent at all visible wavelengths and lowers the device's quantum efficiency, especially at wavelengths shorter than 600nm. Using a multi-finger photogate can enhance the sensitivity [46].

n-well/p-sub diode: This diode is formed by n-well implantation in the p-doped substrate; a simplified cross-section is illustrated in Fig. 2.2(b). At the start of the exposure, the diode is reverse biased, resulting in a depletion region at the p-n junction. The width of the depletion region depends on the reverse bias potential and doping levels.

The higher the reverse bias, the larger the width. At the start of the exposure time, the n+ terminal is disconnected from other circuits and becomes a floating node, known as the floating diffusion. As the photons generate electron-hole pairs in the pixel (inside the depletion region or within its diffusion length), the electrons are attracted to the n+ area, lowering the voltage across the diode. At readout time, this voltage is proportional to the number of electrons collected during the exposure time, which represents the intensity of the light incident on the pixel. Although a larger diode can collect more photo-generated charges, its larger capacitance may limit the sensitivity improvement.

Pinned photodiode (PPD): The PPD is the most commonly used photodetector in consumer image sensors today. An n-well is sandwiched between the p-substrate and a p+ region at the silicon–silicon dioxide interface, as shown in Fig. 2.2(c). At the start of the exposure, the TX signal is asserted high, and the n-well is depleted of charges. With the TX signal going low, the new photo-generated charges are collected and stored in this area until readout time. At the readout phase, TX is again asserted high, and the charges are sent to the readout. Excellent dark current performance is achieved since the interface charge traps are isolated from the n-well area by the p+ doping. Noise performance can be excellent thanks to correlated double sampling (CDS). Additionally, good sensitivity and compactness have led to the dominance of this pixel architecture in the market.

2.2.3 Dual-tap architectures

Pixels in CEP image sensors have to selectively collect portions of the charges during the exposure time. For example, the pixel in [33] uses a one-tap PPD architecture where the unwanted charges are discarded through the supply. This process is controlled by the code sent to the pixels in each subframe interval. This architecture has the limitation of having only one programmable continuous interval in every frame, and the readout of the pixel is at the end of this interval. Although efficient for temporal compressive

Figure 2.3: Dual-tap pixel choices. The cross-sections of the (a) PG, (b) enhanced-PG, (c) dual-TG n-well/p-sub, (d) n-well/p-sub with in-pixel CTIA, and (e) PPD.

sensing, it cannot be used for most CEP applications.

A more light-efficient approach is to have two taps per pixel, one tap for each of the code values 1 and 0. This dual-tap approach has been used in [39, 40] with n-well/p-sub and PG photodetectors. These architectures are more attractive as they provide more information for processing and enable more computational imaging applications. In the following, five dual-tap pixel implementations based on the photodetectors explained in Section 2.2.2 are discussed.

Dual-tap PG: PG devices are reported in dual-tap image sensors such as those for ToF applications [47, 48]. Multiple implementations are possible, differing in their doping profiles, as shown for two cases in Fig. 2.3(a) and (b). The implementation in Fig. 2.3(a) suffers from a barrier between the VPG and TXx gates, which lies in the transfer path of the photo-generated charges. To overcome this shortcoming, a shallow p-doped layer can be used as shown in Fig. 2.3(b). In this study, we refer to this structure as the enhanced photogate (enhanced-PG).

Dual-tap n-well/p-sub: Two implementations have been reported based on the n-well/p-sub diode as shown in Figs. 2.3(c) and (d), namely using dual transfer gates (dual-TG) and using an in-pixel capacitive transimpedance amplifier (CTIA), respectively.

In the dual-TG case, the gates TX1 and TX2 are driven based on the code sent to the pixel, conducting the charges toward one tap or the other, as proposed in [45]. Using dual, or multiple, transfer gates is a standard solution for multi-tap pixels with different photodetector types. The other approach collects the photo-generated charges in an n-well/p-sub diode onto two capacitors of the in-pixel CTIA. The in-pixel CTIA is used to create a virtual ground node for depleting the charges from the photodetector. CTIA usage is not limited to n-well/p-sub photodiodes. Common uses include increasing the sensitivity of the pixel [49] or selectively integrating photo-generated charges on two or multiple capacitors [50]. One possible implementation of an n-well/p-sub CTIA pixel with two collection capacitors is illustrated in Fig. 2.3(d). Before the start of the integration time, the RST, S1 and S2 switches are closed to discharge the capacitors and the photodetector of any charges. At each subframe during the integration time, based on the code assigned to each pixel, one of the capacitors is placed in the amplifier's feedback. The charges are accumulated on the capacitors over several subframes and are read out once at the end of the frame. PMOS transistors are considered to be area expensive as their n-well requires extra spacing from other devices, especially photodiodes. Consequently, an NMOS-only architecture is preferred in the pixel, where the PMOS devices of the amplifier are shared by all pixels in the same column. In such a configuration, the reported reverse-bias voltages of the photodiode are usually in the 500 − 700mV range.

Dual-tap PPD: The PPD is also widely used in 2-tap ToF sensors, with demodulation frequencies up to tens of MHz [41]. In a 2-tap PPD pixel, if the floating diffusion (FD) is used as a memory node for the photo-charges, CDS readout cannot easily be performed, and the temporal noise performance will be compromised. Additionally, the dark current

associated with the FD node is added to the signal during the exposure. Customized processes can overcome this limitation by introducing a charge-mode memory in the signal path [11, 51].

2.3 Comparison of dual-tap pixels

The five dual-tap structures are simulated using Sentaurus TCAD tools. The cross-sections and doping profiles are shown in Fig. 2.4. The simulated depth of all the structures is 6µm, while the effective photo-sensitive width is 5µm.

It should be mentioned that these doping profiles are best-guess values based on published articles and do not reflect a specific technology process. It has to be noted that an optimized process for any of the structures, which sometimes can include additional doping profiles, yields better results than what is reported here. The intention of this analysis is a comparative study of the behavior of the pixels under similar process conditions.

The p-sub of all structures is considered to have a constant doping concentration of 2e+15 cm−3, with higher doping at the 6µm depth for the substrate contact. There are additionally p-wells with substrate contacts on the sides of the pixel structures, with peak doping levels of 2e+17 cm−3 at the surface.

2.3.1 Cross-section of the pixels

n-well/p-sub

The cross-sections of the n-well/p-sub pixel with two transfer gates and with in-pixel CTIA are shown in Figs. 2.4(a) and (b), respectively. The CTIA circuitry is not shown in this cross-section, but the simulation includes the SPICE models of the capacitors, ideal switches, and an amplifier with a voltage gain of 20 (26dB).

The n-well doping at the surface is 1e+17 cm−3 and has a Gaussian concentration with a depth of 0.5µm. This depth is similar to the practical n-well depth in a standard CMOS process. The deeper this well can be made, the higher the sensitivity, since the pixel will have a higher chance of collecting the photo-charges, especially for longer wavelengths that penetrate deeper into the silicon.

Figure 2.4: Doping profiles of the simulated dual-tap structures: (a) PG, (b) enhanced-PG, (c) dual-TG n-well/p-sub, (d) n-well/p-sub with in-pixel CTIA, and (e) PPD.

For the CTIA pixel, there is only one collection point from the photodiode, and it is placed in the center of the device as shown in Fig. 2.4(b). This choice makes the charge collection faster than having the n+ layer on one side of the device. The reason is that the electric field is higher (the same potential difference over a shorter range), and the maximum horizontal distance the charges travel is shorter. The disadvantage is that in the pixel's physical layout, a metal trace must be routed over the pixel, which may slightly compromise the quantum efficiency.

Photogate

The cross-section of the PG pixel is shown in Fig. 2.4(c), where the polysilicon gate in the center shall be positively biased to accumulate the photo-generated electrons underneath it. As mentioned in the previous section, a potential barrier between the transfer gates and the photogate in the center can cause a slow response and incomplete charge transfer. The enhanced-PG structure, shown in Fig. 2.4(d), can address these issues. In this study, the shallow p layer has an acceptor concentration of 5e+18 cm−3 with a depth of 0.1µm. The n-well has a Gaussian doping profile with a peak of 5e+16 cm−3 at the surface and a depth of 0.6µm. These profiles make it possible to completely deplete the photodiode toward the connected tap, similar to PPD operation. The doping characteristics of the shallow p layer and the n-well are essential for the full-well charge, charge transfer efficiency, and speed.

PPD

Fig. 2.4(e) shows the cross-section of a PPD pixel. The layers here are extracted from what can be found in the literature and do not reflect a specific technology. The main doping profiles are a p+ layer on top of the n-well, forming the pinned photodiode. Below the transfer gates, there is an additional shallow p layer used to avoid the possibility of a pocket, or trap, for the electrons in this region. The p+ layer has an acceptor concentration of 1e+20 cm−3 with a depth of 0.1µm; the n-well has a peak donor concentration of 1.1e+16 cm−3 with a depth of 1.5µm. Other values for this structure can be found in literature such as [52], where, for example, the depth of the n-well is 0.5 − 0.6µm. The doping and depth values are adjusted to reach a pinning voltage of around 1V.

2.3.2 Simulated electrostatic potential diagrams

Simulated electrostatic potential diagrams of the devices are shown in Fig. 2.5. The electrostatic potential simulations for the PG structures are shown in Figs. 2.5(a) and (b). The photogate in the center is biased at 1.65V, and the transfer gates are at 0 and 3.3V. A horizontal potential gradient can be seen in the diagram, which helps in achieving a good transfer to the desired tap. Pixels that use such voltages on polysilicon gates to induce an electric field are also known as lateral electric field modulators (LEFM) in some references [53].

In the case of the n-well/p-sub photodetector, the dual-TG n-well/p-sub shows a deeper depletion region than the CTIA pixel, as shown in Figs. 2.5(c) and (d), respectively. This is because of the assumption that the CTIA uses an NMOS input transistor and requires a biasing point close to 600mV. This value for the photodiode's reverse bias voltage results in a shallower depletion depth compared to the 3.3V of the dual-TG photodiode. In the dual-TG structure's potential diagram, the right transfer gate is at a high voltage, and the left one is turned low to conduct the charges to the right tap. The simulated horizontal potential gradient is not significant, meaning that the electric field

Figure 2.5: Simulated electrostatic potential diagram of the dual-tap pixel structures: (a) PG, (b) enhanced-PG, (c) dual-TG n-well/p-sub, (d) n-well/p-sub with in-pixel CTIA, and (e) PPD.

conducting the photocharges to the taps is not large. Thus, the charge-transfer time is not expected to be better than in the other structures, resulting in a poorer tap-contrast.

The PPD layers are adjusted such that the pinning voltage, i.e. the potential at which the diode is fully depleted of carriers, is approximately 1V. The higher this voltage, the more charge the diode can potentially store before it saturates. Another consideration is that the FD node should have a large enough capacitance to collect that much charge, and increasing this node's capacitance increases the temporal noise. On the other hand, a lower pinning voltage suggests a faster charge transfer, which is more suitable for two- or multi-tap pixels. However, a 1V pinning voltage is picked here to represent a standard PPD. Even with this pinning voltage, an excellent horizontal potential gradient is achieved.

2.3.3 Simulated sensitivity and contrast

The sensitivity of an image sensor is commonly defined as the gradient of its output signal versus the change in the input light intensity. To compare the devices' sensitivity, the output voltage change of all five structures is read out for the same light intensity, illuminated perpendicular to the devices' surfaces from −2.5µm to 2.5µm. The intensity of the incident light for the sensitivity simulations is set to 12W/cm2. The optical generation is shown for the simulated PPD device in Fig. 2.6 as an example, where the optical generation represents the rate at which electron-hole pairs are generated due to the absorbed photons.

All simulations are conducted in 2D mode using the Synopsys Sentaurus Device tool. The default assumption in the 2D simulations is that the device has a thickness of 1µm in the third dimension. In the simulations, a capacitance of 1fF is added to the output node. The output of each device is then measured as the final voltage read at these nodes. In order to compare the sensitivities, all the values are normalized to the maximum voltage change read among all pixels, and the results are presented in

Figure 2.6: Simulated optical generation for the PPD device.

Figure 2.7: Simulated sensitivity of the pixel structures.

Fig. 2.7. In this graph, a sensitivity of 1 represents 3.36V/µW. This number may be considered optimistic, as the metal parasitics and source follower loading effects are not accurately included. The dual-TG n-well/p-sub pixel has a low sensitivity compared to the other architectures. The reason is that the capacitance associated with the photodiode is large, and the charge sharing with the FD capacitor that happens in this architecture results in a relatively lower voltage swing. In order to improve the sensitivity, the junction capacitance should be lowered. For instance, a doping profile that ensures the photodiode is fully depleted at the operating condition can improve the sensitivity significantly. The CTIA pixel with 1fF feedback capacitance has the highest

Figure 2.8: Simulated optical intensity on the PG and PPD structures for 400nm and 700nm light wavelengths.

sensitivity, since the photocharges are integrated only on this capacitance, while the other architectures have the junction capacitance in parallel with 1fF of parasitic capacitance for integrating the charges. Thus, assuming a similar number of generated charges, a larger voltage swing is observed on the CTIA due to this smaller capacitance. The PG architecture shows a low sensitivity for two main reasons: the polysilicon gate blocking the shorter wavelengths, and the potential barrier between the transfer gates and the photogate hindering the charge transfer. The enhanced-PG shows poor sensitivity at lower wavelengths, and above 600nm it shows a performance similar to the PPD. The reason is, again, that the shorter wavelengths are blocked by the polysilicon, as shown in the simulations presented in Fig. 2.8. The PPD pixel shows the second-best sensitivity as it has a reasonable junction capacitance and no polysilicon gate blocking the light.
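To make the capacitance argument above concrete, here is a back-of-the-envelope sketch in Python. The 12W/cm2 intensity, the 5µm × 1µm illuminated area and the 1fF capacitance come from the simulation setup above; the electron count and the junction-capacitance values are illustrative assumptions, not TCAD results:

    Q_E = 1.602e-19  # electron charge [C]

    # Incident optical power on the simulated photosensitive area:
    # 12 W/cm^2 over 5 um x 1 um (the effective area of the 2D simulation).
    area_cm2 = (5e-4) * (1e-4)      # 5 um x 1 um expressed in cm^2
    power_uW = 12 * area_cm2 * 1e6  # -> 0.6 uW incident power

    # Voltage swing for the same photo-charge on different integration
    # capacitances (junction-capacitance values are assumed, for illustration).
    n_electrons = 1000
    caps_fF = {
        "CTIA (1 fF feedback only)": 1.0,
        "PPD (1 fF parasitic + small junction cap)": 1.5,
        "dual-TG n-well/p-sub (1 fF + large junction cap)": 5.0,
    }
    print(f"incident power ~ {power_uW:.2f} uW")
    for name, c_fF in caps_fF.items():
        dv_mV = n_electrons * Q_E / (c_fF * 1e-15) * 1e3
        print(f"{name}: dV ~ {dv_mV:.0f} mV")

Halving the integration capacitance doubles the swing for the same charge, which is why the CTIA curve sits at the top and the dual-TG n-well/p-sub curve at the bottom of Fig. 2.7.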

In the tap-contrast simulations, also known as demodulation contrast in ToF image sensors, the pixels are exposed to modulated light and the gates are toggled at the same modulation frequency. One of the taps is in phase with the light modulation, and the other

Figure 2.9: Simulated tap-contrast of the pixel architectures for light wavelengths of (a) 550nm and (b) 850nm.

is out of phase. The modulation frequency is swept from 1MHz to 50MHz, and the results are depicted in Fig. 2.9 for light wavelengths of (a) 550nm and (b) 850nm.
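Extracting the tap-contrast from such a transient simulation amounts to applying the definition from Section 2.1 to the two accumulated tap charges; a minimal sketch, with illustrative charge values rather than simulator output:

    def tap_contrast(q_inphase, q_outphase):
        """(Q_in-phase - Q_out-of-phase) / (Q_in-phase + Q_out-of-phase)."""
        return (q_inphase - q_outphase) / (q_inphase + q_outphase)

    # Illustrative example: 95% of the charge lands in the correct tap.
    print(f"contrast = {100 * tap_contrast(0.95, 0.05):.0f}%")  # -> 90%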

In the sub-10MHz range, the PPD and enhanced-PG show the best contrast, with the PPD reaching more than 90% demodulation contrast at both wavelengths. The reason for

the better contrast of the PPD at the longer wavelength is the better vertical potential gradient, which is due to the deeper n-layer doping chosen for the PPD.

At frequencies higher than 10MHz, the enhanced-PG shows the best performance. This can be explained by the lateral electric field induced by the device's gates, which enhances the transfer speed to the FD node. The better response of the CTIA compared to the PPD can be explained by the fact that the photo-generated charges have to travel through a transfer gate in the PPD pixel, while there are no gates in the CTIA pixel. Additionally, the FD is placed in the center of the device, which effectively shortens the charges' transfer length and makes the collection more efficient.

The n-well/p-sub and PG architectures do not provide good contrast. The PG has potential barriers in the transfer path to the FD node, as well as the possibility of trapping the charges below the photogate. In the n-well/p-sub pixel, the photodiode is not entirely depleted of charges even at the reset level. This means that the capacitance associated with the photodiode is connected sequentially to the two output floating diffusions, causing charge sharing between the photodiode capacitance and the FD capacitances. Consequently, both taps settle to similar voltages, as illustrated in Fig. 2.10(a) for a 40MHz modulation experiment. In comparison, the FD in the PPD depletes the charges from the photodiode. If the modulation frequency is high, this process finishes incomplete, and the remaining charges are then transferred to the other FD node. As illustrated in Fig. 2.10(b), the right FD node, shown in red, is out of phase with respect to the light, but its voltage is decaying over time. However, this decay is much smaller than the decay in the left FD, shown in green.
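The contrast collapse of the n-well/p-sub pixel can be reproduced with a toy model: a photodiode capacitance that is never fully depleted is alternately connected to the two floating diffusions, so each connection merely equalizes the voltages instead of transferring all the charge. A minimal sketch, where all capacitance, voltage and photo-charge values are illustrative assumptions:

    c_pd, c_fd = 5e-15, 1e-15    # assumed photodiode and FD capacitances [F]
    v_pd = 3.3                   # photodiode voltage after reset [V]
    v_fd = [3.3, 3.3]            # tap voltages after reset [V]
    dq = 2e-16                   # assumed photo-charge per half-period [C]

    for half_period in range(20):
        tap = half_period % 2    # taps toggle at the modulation frequency
        if tap == 0:
            v_pd -= dq / c_pd    # modulated light is in phase with tap 0
        # Charge sharing: the connected FD merely equalizes with the photodiode.
        v_eq = (c_pd * v_pd + c_fd * v_fd[tap]) / (c_pd + c_fd)
        v_pd = v_fd[tap] = v_eq

    print(f"tap voltages: {v_fd[0]:.2f} V / {v_fd[1]:.2f} V")  # nearly equal

Both taps end up tracking the large photodiode capacitance and settle to similar voltages, mirroring the behavior in Fig. 2.10(a).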

2.3.4 Comparisons in literature

Comparing the pixel architectures reported in the literature is not straightforward, as different customized processes and pixel sizes are reported, and demodulation contrast is provided only at specific frequencies. With these limitations in mind, a few of the

Figure 2.10: Transient simulation of the demodulation contrast of (a) n-well/p-sub and (b) PPD.

indirect time-of-flight sensors using different pixel architectures are summarized in Table 2.1.

All image sensors in the table use a customized CMOS process for optimized

Table 2.1: Comparison table of ToF sensors

                       | Sony '18 [43] | STM '17 [51] | Microsoft '18 [42] | Microsoft '15 [48] | Samsung '14 [47] | FBK '11 [54]
Technology [nm]        | 90            | 65           | NA                 | 130                | 130              | 180
Pixel type             | CAPD          | PPD          | Custom PG          | Custom PG          | Enhanced PG      | Enhanced PG
Contrast [%]           | 91@50MHz      | 87@200MHz    | 75@50MHz           | 68@50MHz           | >80@10MHz        | 60@10MHz
                       | 85@100MHz     | 78@320MHz    | 73@110MHz          | 57@130MHz          | >50@50MHz        | 29@50MHz
Light wavelength [nm]  | 850           | 860          | 930                | 850                | 850              | 850
Pixel size [µm]        | 10            | 3.5          | 6.2                | 10                 | 28               | 10
Temporal noise [e−]    | 87            | 3            | 3.2                | 12.3               | NA               | NA

performance. The current-assisted photonic demodulator (CAPD) reported by Sony [43] uses a thin substrate to improve the contrast significantly at high frequencies. The CAPD pixel architecture can be a promising choice only at low to medium resolutions: the pixels in this architecture require a bias current, which makes the architecture power hungry at high resolutions. Microsoft uses a customized PG architecture [42, 48] in order to achieve high contrast as well as CDS operation. ST Microelectronics has reported a 3-tap PPD-based pixel with a charge-mode memory to enable CDS for low read noise [51]. Samsung [47] has reported 4 different custom pixels with different doping profiles and has shown the better performance of the enhanced-PG type. Moreover, FBK [54] has reported an enhanced-PG pixel architecture, which is similar to a PPD with extended transfer gates. From these examples, it seems that structures similar to the enhanced-PG (with more customization) demonstrate higher tap-contrast.

Concluding from these references and other state-of-the-art multi-tap image sensors, different architectures may be suitable for different applications of CEP cameras. For instance, a PPD architecture or a customized PG pixel may provide a compact pixel with a relatively low temporal noise level. On the other hand, SPADs are capable of achieving sub-electron temporal noise, while they usually require more area. A proper pixel choice depends on a combination of technology accessibility (including how customizable the process is) and the target performance.

2.4 Considerations for in-pixel circuitry

Other in-pixel circuitry or components may be needed depending on the application. Such circuits can include capacitors, storage diodes, and extra transistors. The goal is to design a robust architecture with a combination of components that achieves the smallest pixel pitch for the required functionality. In the following, considerations for some of these items are discussed to highlight the trade-offs between the possibilities.

2.4.1 Capacitors

Capacitors are used in different architectures, such as in high-dynamic-range pixels for lowering the pixel's conversion gain [55, 56], in global-shutter pixels to sample the signal and reset values [57], or sometimes in ToF pixels for ambient light rejection circuits [24]. The type of capacitor required is decided based on the capacitance needed and whether access to both terminals is required. Metal-insulator-metal (MIM) and metal-oxide-metal (MOM) capacitors are used when access to both terminals is needed, while MIM capacitors provide lower parasitics on the top plate. Poly-poly capacitors can be used similarly if a 2-poly technology is used. In some cases MOS-C capacitors are used, where the capacitance density is higher than in the former types. Novel structures are also reported in the literature, for example high-density 3D integrated capacitors [58] used in a 3µm pixel pitch, but they are not accessible in all technologies.

2.4.2 Storage diodes

Nowadays, storage diodes are used extensively in global-shutter pixels [4, 59]. They behave as charge-mode storage where all the charge is transferred to the FD node at read time. This permits CDS operation to improve the read noise and dynamic range. The drawback is that the area required for these diodes is comparable to the photodiode area. Consequently, the maximum achievable fill factor is ∼50% if micro-lens or backside illumination technologies are not used.

2.4.3 In-pixel transistors

Most active pixels use NMOS transistors as switches and source followers. Additional transistors may be used for passive circuits (e.g., the switch for lateral overflow capacitors in HDR pixels [18, 56]), active circuits (e.g., CTIA [49]), or logic gates. These circuits require analysis of the signal propagation, the line RC time constant, and IR drops to ensure proper signal and power distribution. Additionally, it should be noted that the transistor models inside the pixel array may not be accurate for simulation, since the substrate/well dopant profiles are usually different in the pixel array. Hence the circuits have to be used with extra caution. When using PMOS transistors in a PPD pixel array, the following points shall be considered:

1. The PMOS n-well needs clearance from other devices, which makes it area expensive.

2. The interaction between the PMOS n-well and PPD n-well can lower the pixel’s quantum efficiency.

3. Additional possible process steps in the pixel (for instance, to improve the dark current performance) may affect the threshold voltage and speed of both NMOS and PMOS devices.

4. These additional steps may increase the resistivity of both p-well and n-well and consequently increase the possibility for latch-up.

2.5 Conclusions

Five different implementations using three types of photodetectors in CMOS technology have been studied. Based on simulations, the PPD shows good performance in the sub-10MHz range, namely Tsubf > 50ns. It provides both good sensitivity and tap-contrast over all wavelengths. Many mature technology processes exist with the PPD option, making this device a suitable choice for many computational imaging applications. For applications that require smaller Tsubf (e.g., CEP combined with ToF applications), the enhanced-PG shows good performance at the cost of lower sensitivity at shorter wavelengths. The CTIA architecture also presents good performance, especially with a higher sensitivity compared to the other choices, although it should be taken into account that the maximum achievable sensitivity depends on the smallest capacitance that can be implemented in the CTIA.

Specifications such as full-well capacity, dynamic range, and dark current also play an important role in the final application's performance. These specifications depend on parameters that are not considerably affected by using single- or multi-tap pixels. Hence, they are not covered in the simulations presented in this chapter.

Other pixel architectures can be used in computational imaging as well. For example, single-photon avalanche diodes (SPADs) are used for high sensitivity (at single-photon detection levels) as well as high temporal resolution detection (tens of ps). The top challenges of using SPADs are their large size and the requirement of a high-voltage supply (typically in the range of 10V and higher). Additionally, special care should be taken to achieve a reasonable dynamic range. Given their sensitivity and speed, they are suitable for specific computational imaging techniques, such as non-line-of-sight imaging [60]. Another example of a photodetector is the jots used in Quanta Image Sensors (QIS), which provide very low temporal noise levels suitable for electron-counting applications. QIS usually uses sophisticated technology (special pixel doping profiles, as well as 3D integration) and can be the right approach in the future for CEP applications. At the time of this comparison, the low number of subframes at video rate is the bottleneck.

Chapter 3

Coded-exposure image sensor based on photogate

This chapter focuses on the first prototype of a CEP image sensor with the following highlights:

• Proposing the first dual-tap CEP photogate-based pixel architecture

• Code (mask) rate of up to 2500 subframes/second

• Development of the CEP camera module capable of submitting 2500 arbitrary frame patterns to the CEP image sensor

3.1 Introduction

Primal-dual coding (PDC) refers to an active illumination technique where coding is applied to both the light emission at the source (the primal domain) and the light detection on the sensor side (the dual domain). To acquire an image, a sequence of 2D patterns is projected onto the scene, while another sequence of 2D mask patterns is applied in lockstep to control the sensor's exposure. Previous research in computational imaging has shown that PDC is a very effective tool in actively probing light transport and can

therefore provide many potentially interesting imaging capabilities such as separating direct and indirect light paths based on the number of reflections, eliminating scattered light, and improving the accuracy and/or power efficiency of structured light and other 3D imaging systems [27, 29, 30, 61]. However, performing light transport probing in the optical domain typically results in a large, unportable and expensive optical system [29]. Hence, to fully exploit the potential benefits of the scheme, an electrical implementation of the selective light sensing is required, which in turn needs a customized CEP image sensor architecture.

The first prototype sensor proposed in this thesis consists of 180 by 160 pixels with embedded latches, which are responsible for pre-loading and applying the exposure codes. The subsequent exposure code (mask) can therefore be loaded while the current mask is being used for exposure, resulting in a pipelined coding operation that does not interfere with the pixel exposure time. The mask loading is done serially via a vertical metal line (one line per column), making both the imager architecture and the pixel array scalable towards high pixel resolutions. In the subsequent sections of this chapter the pixel architecture is discussed in more detail, while the circuit-level details of the peripheral blocks (such as mask loading and readout) are left out, since they are functionally the same as the ones described in the next chapters.

3.2 Coded-Exposure Pixels

The sensor consists of different pixel types for testing different functionalities and performance. A 60 by 80 array of pixels is discussed here, for which the schematic is shown in Fig. 3.1. Switches SW1 and SW2 are used to steer the collected charge towards the appropriate bucket depending on the latched mask bit value. Moreover, since one of the switches is kept on during the readout, the channel capacitance of the switch modulates the overall floating diffusion capacitance. This allows the pixel to intrinsically have a

Figure 3.1: The proposed PDC pixel circuit diagram with level-sensitive latches.

Figure 3.2: The measured digital pixel output at uniform light versus the coded pixel exposure, expressed as the percentage of masks applied over the complete frame exposure.

different conversion gain in the two buckets. The different gain helps with direct and indirect light collection, since the direct light usually has a significantly higher power than the indirect light. The difference in conversion gains depending on the pixel exposure coding is shown in Fig. 3.2.
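The per-bucket conversion gain follows directly from the floating diffusion capacitance: each bucket converts charge to voltage with gain q/CFD, and the channel capacitance of the switch that is kept on during readout adds to that bucket's CFD. A minimal numeric sketch, where the capacitance values are illustrative assumptions rather than measured ones:

    Q_E = 1.602e-19  # electron charge [C]

    def conversion_gain_uV_per_e(c_fd_fF):
        """Conversion gain q / C_FD, in microvolts per electron."""
        return Q_E / (c_fd_fF * 1e-15) * 1e6

    # Assumed values: 2 fF floating diffusion, ~0.5 fF switch channel
    # capacitance added to the bucket whose switch stays on during readout.
    print(f"{conversion_gain_uV_per_e(2.0):.0f} uV/e-")        # bucket without switch cap
    print(f"{conversion_gain_uV_per_e(2.0 + 0.5):.0f} uV/e-")  # bucket with switch cap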

The pixel contains two level-sensitive latches. The first latch is activated row-wise to

Figure 3.3: A simplified waveform diagram of the mask deserialization and loading.

memorize the mask bit (the mask bit signal is routed vertically and is physically the same for a single column) when the corresponding LOAD ROW trigger signal arrives (the whole row of masks is loaded at the same time). The masks are loaded serially through 10 separate channels together with a clock of up to 100MHz. The bits are then deserialized into parallel data, i.e. 1 bit per individual column. Once all the masks are loaded for all 160 rows individually, the complete mask for the full frame is latched by the second latch. This mask loading process for a single subframe is illustrated in Fig. 3.3. The same process is then repeated for every subframe within a single frame. Two latches allow for the light exposure of the current subframe while the exposure mask for the next subframe is being loaded row by row. This pipelines the mask deserialization and loading with the regular pixel operation.
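A quick estimate ties these numbers together. With the full 180-column array served by 10 serial channels, each row takes 18 clock cycles, so at the maximum 100MHz mask clock a complete 160-row mask takes about 29µs, consistent with the roughly 30µs full-mask load time measured in Section 3.5:

    # Estimated full-frame mask upload time for the photogate CEP sensor.
    n_cols, n_rows = 180, 160      # full array dimensions
    n_channels = 10                # serial mask-loading channels
    f_clk = 100e6                  # maximum mask clock [Hz]

    clocks_per_row = n_cols / n_channels       # 18 cycles per row
    t_mask = n_rows * clocks_per_row / f_clk   # seconds per complete mask
    print(f"full mask upload ~ {t_mask * 1e6:.1f} us")  # -> ~28.8 us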

3.3 Sensor Architecture

The complete sensor architecture is shown in Fig. 3.4. The mask deserializing and timing circuit is placed on top of the pixel array. Pixel outputs are delta-double-sampled, amplified by the programmable-gain amplifiers and multiplexed over analog output pads. Counting the dummy pixels (2 on each side) and considering that each pixel has two outputs, 46 pixel outputs are multiplexed over each of the 8 pads. Analog-to-digital conversion is

Figure 3.4: The proposed CEP CMOS image sensor architecture.

Figure 3.5: The pixel layout and the image sensor die micrograph showing the chip region dedicated to PDC.

performed off-chip by 16-bit ADCs. The proposed pixel pitch is 25µm with a 20.5% fill factor in the 0.35µm image-sensor-optimized CMOS process used in this work. The part of the sensor that is discussed here has a resolution of 60 × 80 pixels (see Fig. 3.5).

Figure 3.6: The architecture of the camera system, including the custom-designed CEP image sensor.

3.4 Camera System Architecture

The architecture of the camera module is shown in Fig. 3.6. The module consists of two main boards: one carries the custom-designed image sensor, and the other is a development board including an FPGA and DDR memory. The sensor board, shown on the top side of the figure, includes regulators providing supplies or references to the sensor, and 16-bit ADCs for converting the analog output of the sensor to digital.

The FPGA board is responsible for providing the timing required to operate the image sensor, including the mask data shipment, exposure time controls, and readout timing.

Figure 3.7: Capturing a photo of Lena shown on a computer screen while changing the level of pixel masking. The number of masks equal to one starts from 5% on the left and increases up to 95% on the right. The output image is shown for each of the buckets.

At the start of the operation, the computer transfers the mask data for all subframes to the DDR memory through the FPGA board. The sensor's operation begins with the FPGA resetting the pixels and transferring the first subframe's mask data to the chip. In the next step, the integration of the photo-generated charges in the pixels based on the shipped mask is enabled. During each subframe's integration time, the next subframe's mask data is shipped to the sensor.
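The sequencing just described can be summarized in a short behavioral sketch; the helper functions are hypothetical placeholders standing in for FPGA firmware actions, not an actual firmware API:

    # Hedged sketch of the per-frame control flow run by the FPGA.
    def reset_pixels():          print("reset pixels")
    def upload_mask(i):          print(f"upload mask for subframe {i}")
    def integrate_subframe(i):   print(f"integrate subframe {i}")
    def read_out_frame():        print("read out both buckets")

    def capture_frame(n_subframes):
        reset_pixels()
        upload_mask(0)                    # first mask shipped before exposure begins
        for i in range(n_subframes):
            integrate_subframe(i)         # expose with the active mask...
            if i + 1 < n_subframes:
                upload_mask(i + 1)        # ...while the next mask is shipped (pipelined)
        read_out_frame()

    capture_frame(4)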

Additionally, the FPGA has control signals for operating a light source in sync with the image sensor. This is required in active-illumination computational imaging applications. The light source has also been used to characterize the tap-contrast of the custom-designed sensors.

Figure 3.8: The complete sensing side of the imaging system with the camera system and the FPGA board used for the direct signal acquisition.

3.5 Measurement Results

The results of coded-exposure imaging with the proposed design are shown in Fig. 3.7. The complete system with the sensor (see Fig. 3.8) is used to capture a photo of Lena shown on a computer screen. The sensor light exposure in Fig. 3.7 is modulated by the number of masks equal to one (or zero) during a single image frame exposure time, and the outputs of both buckets are shown. From left to right, the number of mask bits equal to one increases, showing how bucket 1 gets unblocked and starts collecting more charge. Conversely, bucket 2 collects the signal initially, but is blocked more and more as the number of mask bits equal to zero increases. Based on the measurement results, the loading of the complete mask can be performed in 30µs, allowing a relatively high number of masks to be applied during a single image frame.

3.6 Conclusion

A prototype CMOS image sensor for CEP applications is presented in this chapter. The sensor uses in-pixel memories to implement pipelined mask loading to the pixel and mask application to the captured image. The sensor consumes 7mW of power from a 3.3V power supply while operating at a 25 fps rate. The mask loading process is verified at 2500 subframes per second, and the results are demonstrated as the density of masks changes. This chip was designed and fabricated prior to completing the simulations presented in Chapter 2, and, as expected based on those simulations, the measured tap-contrast was low (below 0.5, even at a low mask rate). Due to this shortcoming, the sensor could not be used for the demonstration of most computational photography applications.

Chapter 4

PPD-based CEP imager with in-pixel code memory

This chapter provides a detailed explanation of a PPD-based CEP camera implementation with the following highlights:

• Proposing the first dual-tap CEP PPD-based pixel architecture

– with the smallest dual-tap pixel pitch of 11.2µm to date

– highest pixel array resolution of a CEP image sensor (effective 244×162 pixels)

– almost ideal tap-contrast of 99%

– pixel architecture capable of global code update on the full image plane

• Development of the camera module incorporating active illumination for active computational imaging demonstrations

• Analysis and discussion of the incompatibility of the pixel architecture choice and the image sensor technology


4.1 Introduction

For applications such as 3D sensing, gesture analysis and robotic navigation, CEP cameras were initially implemented using bulky, distortive, slow and expensive components [62], such as digital micromirror devices (DMD), together with off-the-shelf image sensors. In the last few years, the first prototypes of fully integrated CEP image sensors have emerged, including a one-tap imager for compressive sensing [33]. One-tap imagers lose light when the tap is off and cannot sort photo-generated charge. Low-resolution (80×60 or less) two-tap light-sorting image sensors for primal-dual coded imaging [40] and motion-deblurring [39] have recently been reported. By using two taps and ensuring one of the taps is always capturing light during the frame exposure, all incident light is utilized regardless of the applied codes. These imagers have large pixels (> 12µm) and low fill factors (< 34%). None of them use a pinned-photodiode-based (PPD) image sensor CMOS technology, and thus they suffer from low image quality and low tap-contrast (defined as the transferred minus the residual charge, divided by their sum), and as a result are prone to degraded photo-generated charge sorting. Additionally, the code submission to the array in [33] and [39] is operated in a random-access and row-wise manner, respectively. These architectures are not suitable where the code update at each subframe has to be done globally. For example, the active illumination in [62] changes over the full field of view all at once and requires the camera to operate similarly, imposing a global code update at the start of the subframe in the CEP image sensor.

In this chapter, we present a next-generation dual-tap CEP image sensor, which comprises a 244H × 160V main array of 11.2µm pixels in a 0.11µm CIS technology. The use of a CIS technology improves: (a) the dark current performance, by isolating the

Si − SiO2 interface from the depletion region of the photodiode, (b) the quantum efficiency, by having a deeper photodiode, and (c) the tap-contrast, due to the complete charge transfer of the PPD device. The previously mentioned limitations of existing architectures are addressed by this sensor, which features the combination of improved pixel performance and globally operated CEP functionality. The global operation is achieved by the pipelined dual-tap pixel architecture, which can apply arbitrary codes to the full pixel array. Additionally, the design improves upon the state of the art by 1) implementing a 2-tap pixel array at practical resolutions for demonstration, 2) a relatively small pixel pitch while maintaining a good fill factor, and 3) achieving the high tap-contrast required for the separation of charges from one subframe to another. In Section 4.2, we begin with a brief overview of two multi-exposure single-frame computational photography applications, and explain how these define the architecture of the pixel. In Section 4.3, we describe the sensor architecture which enables the operation of the presented CEP sensor. The system-level implementation is described in Section 4.4, followed by the experimental results and validation in the two computational photography applications in Sections 4.5 and 4.6, respectively.

4.2 Coded-Exposure Pixel Architecture

4.2.1 Computational photography applications

The architecture of the coded-exposure pixel is designed to be generic, targeting many computational photography applications. Two such applications [10], as illustrated in Fig. 4.1, are introduced here, with results demonstrated in Section 4.6. A novel aspect of the presented implementation of these imaging applications on our dual-tap CEP imager is that they are performed using only a single video frame, at the sensor's native resolution.

Single-frame structured-light imaging

Fig. 4.1(a) depicts the single-frame structured-light imaging implementation. It is performed using the following steps:

1. Four spatial sinusoidal patterns separated by a 90-degree phase shift are cyclically

Figure 4.1: The imaging principle for (a) single-frame structured-light imaging, and (b) single-frame photometric-stereo imaging [35].

projected onto the scene.

2. Synchronously with the projected patterns, four code matrices (one matrix per projector pattern) with time-multiplexed Bayer-like mosaic pattern, as shown, are cyclically submitted to the camera.

3. Next, the sorted photo-generated charges are accumulated across every four subframes and read out as two images per frame.

4. Off-chip image processing reconstructs all lighting conditions at full spatial resolution.

5. Finally, disparity or 3D depth and albedo maps are computed [10] (a minimal decoding sketch follows this list). Disparity encodes the difference in the horizontal coordinates of a point when viewed from the camera and projector viewpoints, which depends on the separation distance of the camera and projector. 3D depth can be computed as inversely proportional to the disparity. Albedo is a measure of how much light is reflected off a surface without being absorbed.
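A common way to decode four sinusoidal patterns at 90-degree phase shifts is standard four-step phase shifting. The per-pixel sketch below is a generic decoder under that assumption, not necessarily the exact reconstruction pipeline of [10]; it assumes the four per-pixel intensities I1..I4 have already been reconstructed from the two coded taps:

    import math

    def decode_four_step(i1, i2, i3, i4):
        """Standard 4-step phase shifting with 90-degree offsets.
        Returns (phase [rad], amplitude ~ albedo, ambient offset)."""
        phase = math.atan2(i2 - i4, i1 - i3)            # projector phase -> disparity
        amplitude = 0.5 * math.hypot(i2 - i4, i1 - i3)  # modulation, proportional to albedo
        offset = 0.25 * (i1 + i2 + i3 + i4)             # ambient / DC component
        return phase, amplitude, offset

    # Illustrative pixel: albedo 0.8, ambient 0.1, true phase 1.0 rad.
    vals = [0.1 + 0.8 * math.cos(1.0 - k * math.pi / 2) for k in range(4)]
    print(decode_four_step(*vals))  # recovers (1.0, 0.8, 0.1)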

Single-frame photometric-stereo imaging

Fig. 4.1(b) depicts the single-frame photometric-stereo implementation. It is implemented by the following steps:

1. Four LED light sources surrounding the camera are cyclically turned on and illuminate the scene, one at a time.

2. Synchronously with the LEDs, four code matrices (one matrix per LED) with time-multiplexed Bayer-like mosaic pattern, as shown, are cyclically submitted to the camera.

3. Next, the sorted photo-generated charges are accumulated across every four subframes and read out as two images per frame.

4. Off-chip image processing reconstructs all lighting conditions at full spatial resolution. As shown in the figure, the raw image from tap 2 of the pixels contains data on the S1, S2 and S3 light sources. By demosaicing, a technique used in RGB cameras, the image can be up-sampled to full resolution, and the S4 light-source image can be computed from the data.

5. Finally, surface normals and albedo maps are computed [10] (a minimal estimation sketch follows this list). A surface normal is a vector perpendicular to the tangent plane of the surface at a given point. A map of normals is widely used in computer vision to encode 3D information of a visual scene.
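Similarly, the classic Lambertian photometric-stereo estimate recovers an albedo-scaled normal by least squares from the per-light intensities. The sketch below is generic under that Lambertian assumption and may differ from the actual processing in [10]; the light directions are illustrative:

    import numpy as np

    # Assumed unit directions of the four LEDs surrounding the camera.
    L = np.array([[ 0.5,  0.0, 0.866],
                  [-0.5,  0.0, 0.866],
                  [ 0.0,  0.5, 0.866],
                  [ 0.0, -0.5, 0.866]])

    def photometric_stereo(intensities):
        """Lambertian model I = L @ (albedo * n); solve for n by least squares."""
        g, *_ = np.linalg.lstsq(L, np.asarray(intensities), rcond=None)
        albedo = np.linalg.norm(g)
        return g / albedo, albedo    # unit surface normal and albedo

    # Illustrative pixel: synthesize intensities from a known normal and albedo.
    n_true, rho = np.array([0.2, 0.1, 0.9747]), 0.7
    print(photometric_stereo(L @ (rho * n_true)))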

Figure 4.2: The principle of operation of (a) a generic dual-tap coded-exposure pixel, and (b) the presented dual-tap code-memory pixel (CMP) [35]. N is the number of subframes.

4.2.2 Pixel

Dual-tap and multi-tap pixels are commonly used in indirect time-of-flight (ToF) imaging applications to demodulate the received light and extract the phase information [51, 63, 64]. To enable the coded-exposure-pixel applications described in Section 4.2.1, a dual-tap pixel with the general operating principle described in the flow chart in Fig. 4.2(a) is needed. The photo-generated charge is stored on tap 1 or 2 for codes 0 and 1, respectively, during each coded subframe, and the results are accumulated over N subframes within one video frame.

We introduce the code-memory pixel (CMP) architecture that operates following the


Figure 4.3: (a) The pixel schematic of the CMP pixel and (b) its simplified timing diagram. H × V are the pixel array dimensions; h and v are the horizontal and vertical indices, respectively. N is the number of subframes.

aforementioned principle, as depicted in Fig. 4.2(b). In order to perform global coded-subframe exposure, it requires an in-pixel dual digital code memory. The code memory is pipelined: a code value is pre-loaded row-wise into each pixel during the previous subframe and is applied at the beginning of the current subframe. Photo-generated charge is collected based on the current code while the next subframe's code is being pre-loaded into the pixel. The pipelined operation ensures that the code update in the full array is done at once, at the beginning of each subframe. This is suitable for applications with active illumination and does not require overhead time for code upload.
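The pipelined double-latch behavior can be captured in a small behavioral model of one pixel over N subframes: a code is preloaded into the first latch during the previous subframe, copied to the second (active) latch by the global load at each subframe boundary, and the photo-charge is sorted by the active code. This is a toy model of the timing, not the circuit itself:

    # Behavioral toy model of one CMP pixel over N subframes.
    def run_pixel(codes, charge_per_subframe=100):
        taps = [0, 0]              # charge accumulated on tap 1 / tap 2
        preloaded = codes[0]       # code 0 is preloaded before exposure starts
        for n in range(len(codes)):
            active = preloaded             # LOAD_CODE_GLOBAL activates the code
            if n + 1 < len(codes):
                preloaded = codes[n + 1]   # LOAD_CODE_ROW preloads the next code
            taps[active] += charge_per_subframe  # charge sorted by the active code
        return taps

    print(run_pixel([0, 1, 1, 0]))  # -> [200, 200]: charge split per the code sequence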

Fig. 4.3(a) depicts the CMP architecture. The CMP includes two D-latches, one controlled by LOAD CODE ROW, to pre-load the code patterns row by row, and the other controlled by LOAD CODE GLOBAL, to activate this pattern globally. Based on the code in each pixel, one of the two transfer gates, TG1 or TG2, connects the pinned-

TG1 HIGH TG2 LOW 11.2μm

CODE = 0 A A’ POTENTIAL A A’ TG1 LOW TG2 HIGH

CODE = 1 POTENTIAL A A’ PHOTO-GENERATED CHARGES CROSS-SECTION FOR POTENTIAL DIAGRAMS DIRECTION OF THE COLLECTION OF THE CHARGES (a) (b)

Figure 4.4: (a) Floorplan of the pixel, and (b) the electrostatic potential diagram for both code 0 and code 1 [35].

photodiode (PPD) to the corresponding tap, FD1 or FD2, respectively. The pattern in each pixel is gated with EXPOSURE signal to stop any charge transfer during the read- out phase. EXPOSURE is also kept low during the global code updates to ensure that signal and supply glitches caused by digital switching in the array do not affect the tap contrast. The timing diagram corresponding to the CMP pixel is shown in Fig. 4.3(b). The subframe phase, in addition to code upload period, includes a programmable period (P RGEXP ) to increase the controllability of the exposure time. During the reset phase (not shown in detail), the EXPOSURE signal is toggled high and the reset signals of all pixels, RST 1 and RST 2, are asserted to reset both taps and the PPD through one of the taps. At the end of this phase, the EXPOSURE signal is first lowered and then the reset signals are set to the associated low voltage. During the readout phase (not shown in detail), the EXPOSURE is kept low while the pixels are accessed row-wise by ROW SELECT for column-parallel readout. In Fig. 4.3(b), h and v refer to a column and row indices, respectively, for a generic pixel array size of H × V .

Fig. 4.4 depicts the pixel layout floorplan and its cross-sectional electrostatic potential diagrams. The PMOS transistors are placed at the maximum distance from the PPD devices to ensure minimal interaction between the n-wells. A fill factor of 45.3% is achieved, with 27% of the area occupied by the latches and logic gates. As illustrated by the electrostatic potential in Fig. 4.4(b), the photo-generated charges are, at any instant, sorted based on the transfer-gate voltages determined by the 1-bit binary code stored in the pixel.

Figure 4.5: VLSI architecture of the sensor. Blocks highlighted in grey are presented in more detail in the following sections.

4.3 CEP Sensor Architecture

Fig. 4.5 depicts the 280×176-pixel sensor system architecture. Unlike conventional image sensors, during each subframe the CEP sensor receives pixel codes to control all pixels' exposure individually. The sensor has 18 digital input channels for streaming the codes row by row, 18 code deserializers with logic for arranging the codes, and vertically organized code-loading control circuits to ship the codes to their respective rows. Additional logic is included for sharing the codes among neighbouring columns or rows. By sharing the codes, less data shipment from an off-chip DRAM is required, which is beneficial for saving power or enhancing the subframe rate. The column-parallel outputs are amplified by switched-capacitor programmable-gain amplifiers (PGAs), which can apply different gains to each of the two taps (with a gain range of 0.5× to 8×). The output is then serialized and buffered over 6 analog output channels with a 48:1 multiplexing ratio.

Figure 4.6: Circuit and timing diagram of a 48-column slice of the column-parallel programmable-gain amplifier (PGA) and the sample-and-hold (S/H) circuit in Fig. 4.5.

The column-parallel PGAs and sample-and-hold (S/H) circuits are shown in Fig. 4.6. The PGA is a switched-capacitor integrator, where the integrating capacitor, CI, is configurable to two different values for a gain of 0.5× or 1×. By using an integrator instead of a fixed-gain amplifier, multiple sampling can be performed on the output of the pixels to amplify the signal level. This has the advantage of a programmable gain through timing adjustments (the number of integrations set by the φ1 and φ2 pulses, as shown in the timing diagram of Fig. 4.6), rather than through Serial Peripheral Interface (SPI) configuration of capacitor banks. The adjustable gain is obtained by first making SAMP_SIG high while the φ1 and φ2 pulses sample the pixel signal data on CS and integrate it over CI, respectively; the output is then sampled onto the CSIG capacitance of tap 1. Next, the pixel reset signal is sampled in a similar fashion onto the CRST capacitance. The sampled data of tap 1 is then multiplexed to the output by the READ_RST and READ_SIG signals while the tap 2 data is being processed by the PGA. The number of integrations of the CS charge over CI determines the additional gain control from 1× up to 8×.
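As a numerical illustration of this gain mechanism, the toy model below accumulates the sampled pixel voltage once per φ1/φ2 cycle; the capacitor values and names are assumptions for illustration, not the fabricated ones:

    def pga_output(v_pixel, n_int, cs=1.0, ci=1.0, v_ref=0.0):
        # Behavioral model of the switched-capacitor integrating PGA: each
        # phi1/phi2 cycle samples the pixel voltage on CS and dumps the charge
        # onto CI, so the overall gain is n_int * CS / CI.
        v_out = v_ref
        for _ in range(n_int):  # one integration per phi1/phi2 pulse pair
            v_out += (cs / ci) * (v_pixel - v_ref)
        return v_out

    # With CS = CI, eight integrations give the maximum 8x gain:
    print(pga_output(v_pixel=0.1, n_int=8))  # -> 0.8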

There is one PGA per pixel column, and the two taps are read out at different times. Two sampling capacitor banks are implemented, one for each of the taps. While one tap is being sampled by the PGA, the previously sampled tap is buffered to the output I/O pads, as shown in the timing diagram of Fig. 4.6. Each bank has two capacitors, one for the reset level of the pixel (RST) and one for the signal level (SIG). Double sampling (DS) is performed by first reading the signal level and then the reset level of the pixel. The subtraction of the two values is performed off-chip, after the ADC.

The simplified pixel code-deserializer and code-loading control circuits are shown in more detail in Fig. 4.7. The timing generator block ensures that the 18 deserializers register the received codes at every 16th clock cycle and send them to the pixel array. The DATA_EN signal is asserted high with the arrival of the first code bit and, after 16 CLK cycles, sets DES for one CLK period. The DES signal registers the serialized data in the 1:16 deserializer block. One clock cycle later, when the pixel array columns have settled to the correct code data, the PRELOAD signal enables the LOAD_CODE_ROW signal of the respective row in order to preload the codes into the first latches of the pixels in that row. The pulse generator block extends the PRELOAD signal width to 15 clock cycles to ensure the latches have the maximum time for registering the data within a 16-clock-cycle period. During these 16 CLK cycles, the code for the next row is deserialized. This is repeated row by row. After all the rows are pre-loaded with the code data, a global signal triggers the transfer of the codes to the second latch of the pixels.
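A back-of-envelope check of this row-pipelined upload, under the assumption that one code bit per channel is shifted in per CLK cycle (the 100 MHz clock below is an illustrative value consistent with the 1.8 Gb/s aggregate design rate mentioned in Section 4.7):

    channels, cols, rows = 18, 288, 176
    bits_per_ch_per_row = cols // channels    # 16 -> one word of the 1:16 deserializer
    clk_hz = 100e6                            # assumed CLK; 18 ch x 100 MHz = 1.8 Gb/s
    upload_s = rows * bits_per_ch_per_row / clk_hz
    print(bits_per_ch_per_row, upload_s)      # 16 bits/row, ~28.2 us per full-array code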


Figure 4.7: Block diagram of the code deserializers and code-loading circuit together with the pixel array (top), and the simplified circuitry of key internal blocks (bottom). Other auxiliary circuitry for various configurations of the sensor is not shown here for simplicity.


Figure 4.8: Micrograph of a prototype fabricated in a CIS 110nm process. The die size is 3mm × 4mm [35].

4.4 System implementation

Based on the pixel concept described in Section 4.2 and the sensor architecture in Section 4.3, we have designed and fabricated an image sensor in a 110nm CIS technology. A 1.2V supply is used for the digital peripheral circuits and 3.3V for the analog readout block. The micrograph of the image sensor is shown in Fig. 4.8.

The camera module and the projector used for structured-light imaging (for the demonstration in Section 4.6.1) are shown in Fig. 4.9. The camera consists of two boards. One is a commercially available board carrying a DDR memory for the pixel code data and an FPGA for controlling the system operation. The second board is a custom board including the designed image sensor and off-the-shelf ADC ICs for converting the analog image data of the sensor to the digital domain. At start-up, the FPGA sets the supply and reference levels through SPI. Then the PC uploads the pixel codes to the DDR through the FPGA and also uploads the projection patterns to the projector DDR. When the video streaming command is initiated by the PC, the FPGA ships the codes from the DDR to the sensor, controls the image sensor and ADC to receive the digitized data, and finally streams the data to the PC. The FPGA also controls the projector and makes sure that the projection of the patterns is synchronized with the codes sent to the sensor.

Figure 4.9: (a) Imaging system experimental setup for single-frame structured-light imaging, and (b) the camera module block diagram.

In the following sections, first the electrical testing and characterization of the sensor is described (Section 4.5), and then the deployment of the camera in two single-shot computational imaging applications is shown (Section 4.6).

4.5 Experimental characterization

The setup shown in Fig. 4.9(a) is used for the experimental characterization of the sensor. The most important specification of the camera is the tap contrast, which determines how much of the photo-generated charge in each pixel is directed to the tap corresponding to the code shipped to the pixel. As mentioned in Section 4.1, the tap contrast is defined as:

$$\chi = \frac{Q_1 - Q_2}{Q_1 + Q_2}$$

Figure 4.10: (a) Experimentally measured tap contrast map of the sensor (the on-line version is in color), and (b) the histogram of the tap contrast between 0.8 and 1 (note that the vertical axis is on a logarithmic scale).

where Q1 and Q2 are the amounts of charge stored on taps 1 and 2 of the pixel, respectively. For measuring Q1 and Q2 in this experiment, alternating codes 1 and 0 are sent to all pixels of the sensor and the light is projected only for code value 0. This means that tap 1 should collect all the photo-generated charges and tap 2 should not collect any. In this scenario, projection at codes 0 and 1 results in tap contrasts of 1 and -1, respectively, for an ideal pixel. The tap contrast map of the sensor is shown in Fig. 4.10(a). The measurement is done at 4 subframes per frame, at a 25fps frame rate. Considering the overhead times for the readout and the data transfer to the PC, the effective subframe rate is about 181.8Hz. The average tap contrast achieved is 0.99, and the histogram of the tap contrast between 0.8 and 1 is illustrated in Fig. 4.10(b). The small number of pixels in the two corners of Fig. 4.10(a), with tap contrast values of around -1, are pixels that do not register the correct pixel code and collect the charges on the wrong tap (due to resolvable technology limitations).
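The per-pixel contrast map can be computed directly from the two tap images; a minimal sketch (the epsilon guard is an implementation detail added here):

    import numpy as np

    def tap_contrast(q1, q2, eps=1e-12):
        # chi = (Q1 - Q2) / (Q1 + Q2): +1 if all charge lands on tap 1,
        # -1 if all of it lands on tap 2.
        return (q1 - q2) / (q1 + q2 + eps)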

Figure 4.11: Experimentally captured raw output images of the sensor for two uniform-lighting cases: for one (top) and for two (bottom) subframes per frame [35].

Fig. 4.11 depicts raw outputs of the sensor for four different camera configurations of the same scene, under uniform lighting conditions. The first two rows show images where only one subframe is recorded per frame. The image is collected on taps 1 and 2 for black and white codes, respectively. In the 3rd row, there are two subframes, with

tiled 2x2-pixel codes, as shown, sent to the pixel array. The projector projects all-white during the first subframe and all-black (no light) during the second one. The magnified inset for tap 1 shows that the image is black for 1 out of 4 pixels. The last row depicts the same conditions, but the projector projects all-white for both subframes. The magnified inset shows that the alternating rows are brighter because they have collected light for two subframes, while the darker rows have collected light for one subframe only.

A comparison to the state-of-the-art is given in Table 4.1, where CEP cameras are highlighted by light gray shading. The presented image sensor has the highest spatial resolution among the CEP architectures and, additionally, is the first dual-tap CEP sensor using PPD pixels. A competitive pitch of 11.2µm and a 45.3% fill factor are achieved with the proposed architecture. A very high tap contrast of 0.99 at a rate of 181.8 subframes per second is measured. The achieved power FoM is second best to [33], which is implemented for low-power compressive sensing applications by incorporating pipelined readout and exposure with a low-power successive-approximation-register (SAR) ADC architecture in a single-tap sensor. The subframe rate and the pixel-level code rate are second best to [39], which has a significantly smaller pixel array.

Table 4.1: Comparison table. Columns, left to right: THIS WORK | [39] UBC, TCAS 2018 | [40] Toronto, IISW 2017 | [33] JHU, OE 2016 | [32] Shizuoka, ISSCC 2015 | [11] Stanford, JSSC 2012 | [53] Shizuoka, ISSCC 2015.

CODED-EXPOSURE MODE: per-pixel (i.e., pixelwise spatial and temporal coding) for THIS WORK, [39], [40] and [33]; per 1/15 of the array for [32]; per full array (i.e., temporal coding only) for [11] and [53]
TECHNOLOGY [nm]: 110 CIS | 130 CMOS | 350 CMOS | 180 CIS | 110 CIS | 130 CIS | 110 CIS
PINNED PHOTODIODE: YES | NO (NW/P) | NO (PG) | YES | YES | YES | YES
PIXEL PITCH [µm]: 11.2 | 12.1x12.2 | 25 | 10 | 11.2x5.6 | 5 | 11.2x5.6
PIXEL FILL FACTOR [%]: 45.3 | 33.2 | 20.5 | 52 | -- | 42 | --
NUMBER OF TAPS: 2 | 2 | 2 | 1 | 1 | 2 | 2
TAP CONTRAST (2): 0.99 @ 181 sfps (1) | -- | LOW | N/A | N/A | -- | 0.94
PIXEL COUNT [HxV]: 244 x 162 | 10 x 10 | 80 x 60 | 127 x 90 | 64 x 108 | 640 x 576 | 256 x 512
FRAME RATE [fps]: 25 | 60 | 25 | 100 | 32 | N/A | 12
READ NOISE: 3.6mV / 5.2mV (12-bit ADC) | -- | 5.4 DN | -- | 5.5 e- | 1.75 e- | 5.3 DN
DYNAMIC RANGE [dB]: 50.5 | 52 | -- | 51.2 | -- | 85 / 103 (HDR mode) | --
CONVERSION GAIN [µV/e-]: 33.5 | -- | -- | -- | -- | 51 | 85
POWER [mW]: 34.4 | 0.012 / 1.23 | 7 | 1.3 | 1620 | N/A | 540
POWER FoM [nJ] (3): 34 | 2 / 205 | 58 | 1.14 (4) | 7324 (4) | N/A | 343 (4)
IN-PIXEL CODE MEMORY: IN-PIXEL (2 LATCHES) | IN-PIXEL (DRAM) | IN-PIXEL (2 LATCHES) | IN-PIXEL (SRAM) | OFF-PIXEL | N/A | N/A
SUBFRAME RATE [sfps (1)]: 181.8 | 600 / 300000 (5) | -- | 100 | -- | -- | --
PIXEL-CODE RATE [MHz]: 7.2 | 0.06 / 30 (5) | -- | 0.11 | N/A | -- | --
ARBITRARY CODE / ROI: YES/YES | YES/-- | YES/-- | NO/-- | -- | -- | --
SUBFRAME SHUTTER: GLOBAL | ROLLING | GLOBAL | ROLLING | -- | -- | --
IMAGING APPLICATIONS: 1. single-frame structured light, 2. single-frame photometric stereo, 3. other (compressive sensing, etc.) | deblurring | transport-aware imaging | spatiotemporal compressive sensing | ultra-high-speed imaging w/ compressive sampling | high dynamic range | fluorescence lifetime

Notes: (1) sfps: subframes per second; (2) also known as extinction ratio; (3) FoM = Power / (Number of pixels × Frame rate); (4) FoM includes the on-chip ADC power; (5) demonstration only shown up to 600 sfps. N/A: not applicable; --: not available. In the original table, bold font denotes the best performance.

Comparison to non-CEP image sensors (not highlighted in gray shading) that are suitable for computational imaging is also performed in the table. Ultra-high-speed imaging is done by the multi-aperture camera in [32], where temporal coding at the sub-array level is used. The dual-bucket pixel in [11] is designed for full-array-level coded-exposure applications, for example high-dynamic-range imaging, not for per-pixel exposure coding. Fluorescence lifetime imaging with a very high extinction ratio of 0.94 is achieved by the dual-tap globally-programmed pixels in [53].

Figure 4.12: Experimentally measured single-frame structured-light imaging results: (a) the albedo and depth map reconstruction pipeline [35], and (b) albedo and disparity (i.e., inverse depth) maps from other scenes, for both static (left) and dynamic (right) scenes, all generated at 20fps.

4.6 Experimental demonstration

In this section, two computer vision applications, briefly explained in Section 4.2.1 and in more detail in [10], are demonstrated using the proposed CEP image sensor.

4.6.1 Single-frame structured light imaging results

Structured-light imaging is used for 3D imaging by projecting patterns onto the scene and analyzing them from a viewpoint different from that of the projection. The depth and surface information can be calculated by finding the correspondence between the sent patterns and the received ones. This technique conventionally requires multiple frames to reconstruct the disparity (the inverse depth) and albedo maps of the scene. We show that, by using a dual-tap CEP camera, one can obtain the disparity (and consequently depth) and albedo maps at full pixel-array resolution in a single frame, as described in Section 4.2.1.

Fig. 4.12(a) depicts the albedo and depth map reconstruction pipeline for the single-frame structured-light 3D imaging setup, where four subframes are used. As discussed before for Fig. 4.1(a), at the end of a frame, each of the two tap images captures the visual scene sampled four times and coded by a four-pixel time-multiplexed Bayer-like mosaic pattern, corresponding to four 90-degree-shifted sinusoidal illumination patterns. The Bayer-coded raw images from taps 1 and 2 are separated out into four images per tap, each containing different structured-light sine-phase information. These images are then upsampled and processed to obtain four full-resolution images from which the resulting depth (or, alternatively, disparity) and albedo maps are computed [10].
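For reference, the standard four-step phase-shifting decode that such a pipeline builds on is shown below (a generic textbook sketch, not necessarily the exact estimator of [10]); the recovered phase maps to disparity/depth via the projector-camera correspondence, and the modulation amplitude acts as the albedo:

    import numpy as np

    def four_step_phase(i0, i90, i180, i270):
        # Images captured under sinusoidal patterns shifted by 0/90/180/270 deg:
        # I_k = A + B*cos(phase + k*90deg) per pixel.
        phase = np.arctan2(i270 - i90, i0 - i180)       # wrapped projector phase
        albedo = 0.5 * np.hypot(i270 - i90, i0 - i180)  # modulation amplitude B
        return phase, albedo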

4.6.2 Single-frame photometric stereo imaging results

Photometric stereo is an imaging technique for determining the 3D orientation of the scene surface at each image point [65]. This technique conventionally requires capturing multiple images, or deploying an image sensor with multiple taps (usually 3 or more [66]). We demonstrated that four images with different illumination directions can be acquired by a dual-tap CEP sensor within a single frame, as described in Section 4.2.1.

Figure 4.13: Experimentally measured single-frame photometric stereo imaging results: surface normal maps and albedo maps generated at 20fps, for both static (left) and dynamic (right) scenes [35].

Fig. 4.13 shows the results from a single-frame photometric stereo experiment. The top row depicts the surface normal maps of two different scenes captured at 20fps. The images are color-coded with the orientation of the surface normal at each pixel. The bottom row shows the albedo maps containing information on the reflectivity of the same scenes.
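A classic Lambertian photometric-stereo solve, for context (a textbook sketch following [65], not necessarily the exact method used to produce Fig. 4.13):

    import numpy as np

    def photometric_stereo(images, light_dirs):
        # images: (K, H, W) stack captured under K known illumination
        # directions light_dirs (K, 3). Solves I = albedo * (L @ n) per pixel
        # in the least-squares sense.
        K, H, W = images.shape
        I = images.reshape(K, -1)
        G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)  # G = albedo * n
        albedo = np.linalg.norm(G, axis=0)
        normals = G / np.maximum(albedo, 1e-12)
        return normals.reshape(3, H, W), albedo.reshape(H, W)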

4.7 Discussion

The proposed dual-tap-pixel camera has been demonstrated in two computational photography applications, and depth and surface normal maps have been successfully obtained using only a single frame. Alternative solutions for such applications using conventional image sensors require multiple frame acquisitions [67] or, in some cases, more than two taps per pixel [66]. Multiple-frame readout suffers from motion blur, consumes more power and typically suffers from higher read noise. The noise of multiple readouts is higher if it is dominated by the readout circuitry noise (pixel reset noise, source follower noise, column-parallel circuitry noise, etc.) rather than by the photon shot noise. Existing multi-tap pixels with more than two taps do not offer the versatility, flexibility and universality of a coded-exposure-pixel camera, which are some of the advantages of the presented design.

In the proposed architecture, the code deserializers were designed to perform at an aggregate rate of 1.8Gb/s, and, using typical transistor models in simulations, the latches in the pixels could successfully register the codes at an even higher rate. In the measurements, though, the maximum operational code rate had to be lowered to 7.2Mb/s and the supply voltage increased to assure proper code registration in the pixels. Even under these conditions, it can be seen in the first row of Fig. 4.11 that some of the pixels cannot change their code value (sparse pixels in the upper left half of the image that appear black on tap 2). The exact reasons for this issue could not be determined, due to unknown technology process conditions and unconventional pixel choices. However, based on the analysis of this pixel and of other image sensors that are not presented here, it is attributed to the use of PMOS transistors in the pixels and their incompatibility with the PPD architecture. Different issues can arise from using PMOS transistors in a PPD-based pixel, and they are discussed here to the best of our knowledge.

• The doping profiles of the wells in the pixel array are expected to be different from those in the periphery of the pixel array, including the p-well used for the NMOS devices. This is because the p-well doping profile near the transfer gate of a PPD can significantly affect the transfer efficiency. Additionally, the combination of the p-well doping and the PPD n-well determines the charge transfer speed and the dark current performance. Additional doping profiles may even be included, for instance to improve the dark current performance. Hence, the regular transistor models are not an accurate representation of the devices in the pixel, and a different performance should be expected. It is worth mentioning that some technologies may provide models for in-pixel devices, but that has not been the case for our technology choice.

Figure 4.14: Cross-section of a PPD device with the possible different n/p doping wells.

• Due to the different doping profiles, the latch-up conditions are different from those of the peripheral circuits. Extra care must be taken to avoid latch-up from happening.

• The self-aligned silicide (salicide) process, which is usually used in the active regions of microelectronic circuits to improve the connectivity between the interconnects and the devices, is blocked in photodetectors. The reason is that this process creates a reflective surface that lowers the quantum efficiency of the pixels. The drawback is that the resistivity of the connections between the diffusion/gate areas and the interconnects increases by as much as an order of magnitude. This extra resistance can make the devices slower (compared to the peripheral models) and additionally increases the risk of latch-up.

• The extra n-well required for the PMOS can drain some of the photogenerated electrons. This may cause a significant issue if the wells are too close or shorted; otherwise it may only result in a lower quantum efficiency. The cross-section of a PPD pixel, with the possible wells used for improving the photodetector performance as well as the NMOS/PMOS wells, is shown in Fig. 4.14.

• Deep isolation wells (both p-doped and n-doped) may be used in some technologies for improving the pixel dark current performance. The doping profiles of these deep wells are made such that they are connected to the pixel-array guard-ring wells. Such doping profiles may affect the dopant concentration of the NMOS/PMOS wells and affect their performance, as discussed earlier.

4.8 Conclusions

A method of camera re-programmability at the pixel level, multiple times during a single frame exposure, is presented. This enables many new imaging applications in computational photography, including applications that previously required bulky and distortive optics used together with off-the-shelf cameras. The presented coded-exposure-pixel camera has been demonstrated in two such applications using only a single image sensor. These 3D imaging techniques, structured-light imaging and photometric stereo imaging, were both implemented within just a single frame. Compared to an equivalent implementation using a high-frame-rate camera operating at the same frame rate as the subframe rate of the CEP camera, the presented camera does not suffer from added read noise in each subframe, offers real-time operation without the constraint of a prohibitively high output data rate, and does not require excessive output buffer memory. Much lower power and complexity are also important advantages.

Chapter 5

PPD-based CEP imager with in-pixel analog image memory

This chapter provides a detailed explanation of a PPD-based CEP camera implementation with a pinned storage diode, with the following highlights:

• Proposing the first NMOS-only dual-tap CEP PPD-based pixel architecture

– with the smallest dual-tap pixel pitch of 7µm to date

– highest pixel array resolution of a CEP image sensor (effective 312×320 pixels)

– a high measured tap-contrast of 90%

– pixel architecture capable of global code update on the image plane

• Development of the camera module consisting of active illumination for active computational imaging demonstrations

5.1 Introduction

As highlighted in the previous chapters, a key challenge in the design of CEP image sensors is the area and time overhead due to the in-pixel exposure control circuits. All existing CEP image sensors [33–35, 39, 40] employ an in-pixel memory to store the exposure code, at the cost of a larger pixel.

In the design detailed in this chapter, a two-tap pixel 2.5× smaller than the state of the art [68] (presented in chapter 4) is achieved by introducing an analog-memory pixel (AMP) architecture that eliminates the need for in-pixel storage of the exposure code and enables an NMOS-only architecture without capacitors. A charge-sorting transfer time of 1µs yields 2700 exposures per second at a 312 × 320 sensor resolution, comparable to the pixel-code rate of the state of the art [69]. An image sensor based on this pixel is fabricated and is deployed in three computational imaging applications.

5.2 Pixel Architecture

5.2.1 Background

The two mainstream pixel architectures widely used in CMOS image sensors are the rolling-shutter and the global-shutter pixels, as shown in Fig. 5.1 top and bottom, respectively. In the rolling-shutter pixel, as depicted by the timing diagram on the right, the exposure of each row of pixels is shifted by one row readout time. This configuration allows for pipelined readout and exposure among different rows to improve the frame rate. The global-shutter pixel, on the other hand, introduces a data storage diode in order to perform the pipelined operation, and allows for an identical exposure time interval among all pixels.

Such pixel architectures are modified and used in different computational imaging applications; for example, a dual- and quad-bucket image sensor with CDS-enabled capability is proposed in [11], and an in-pixel coded-exposure capability is introduced in [4]. Such designs require the control of the pixels to be global, while the control of individual pixels necessary for CEP image sensors requires a new pixel architecture as well as new peripheral circuitry. Proposed solutions for CEP pixels so far have used an in-pixel memory for storing the pixel code, with charge sorting based on this code. These memories are either SRAM-based as in [33], latch-based as in [68], which require PMOS transistors, or DRAM-based as proposed in [69], where regular code refreshes are needed to keep the in-pixel data valid. A new pixel architecture that does not require a code memory inside the pixel can address these issues.

Figure 5.1: Architectures of the rolling-shutter (top) and global-shutter (bottom) pixels.

5.2.2 Proposed operation

In code-memory pixels (CMPs), dynamic memories require a small area at the cost of fast refresh rates (due to the light sensitivity of the memory), while area-expensive static memories do not require refreshing. If an area-efficient and light-insensitive memory can be used to store the output of the PPD, then both issues are addressed. The operation of this pixel differs from CMP architectures such as [69] and [35]. As depicted in Fig. 5.2, after a global exposure in a given subframe, the photogenerated charge is first transferred to an intermediate charge-storage node, here referred to as the "analog memory". During the next subframe's exposure, in a pipelined fashion, the current subframe's pixel codes are applied from an external memory row by row to sort the stored photogenerated charge by shifting it to tap 1 or 2 for codes 0 and 1, respectively. This is repeated N times per frame. As a result, the photogenerated charges across all subframes of a frame are selectively integrated over the two taps according to the per-pixel code sequence and are read out once at the end of the frame as two images.

Figure 5.2: The flow chart of the analog-memory pixel (AMP).
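A behavioral sketch of this pipeline (illustrative Python, hypothetical names): subframe n's charge is parked in the analog memory and sorted during subframe n+1's exposure:

    import numpy as np

    def amp_frame(photons, codes):
        # photons, codes: (N, H, W). Charge generated in subframe n is held in
        # the storage diode and sorted to tap 1 (code 0) or tap 2 (code 1)
        # while subframe n+1 is being exposed.
        N = photons.shape[0]
        tap1 = np.zeros(photons.shape[1:])
        tap2 = np.zeros(photons.shape[1:])
        memory = None
        for n in range(N + 1):
            if memory is not None:              # sort the previous subframe's charge
                tap1 += np.where(codes[n - 1] == 0, memory, 0)
                tap2 += np.where(codes[n - 1] == 1, memory, 0)
            memory = photons[n] if n < N else None   # global PPD-to-SD transfer
        return tap1, tap2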

5.2.3 Proposed pixel architecture

The AMP schematic and timing diagrams are shown in Fig. 5.3. The photogenerated charge at the pinned photodiode (PPD) in each subframe is transferred by the globally controlled TG_GLOBAL gate to the charge memory, comprised of a pinned storage diode (SD). During the next subframe, the rows are accessed one by one by ROW_LOAD and the exposure code is applied. The charge is then transferred to tap 1 or 2 for codes 0 and 1, respectively. The compact 7µm pixel layout and its potential diagrams are shown in Fig. 5.4. The top potential diagram shows the transfer of the photo-generated charges to the SD, and the lower diagram presents the pipelined operation of photo-charge generation and charge sorting based on the code pattern sent to the pixel.

Figure 5.3: Schematic and the corresponding timing diagram of the analog-memory pixel (AMP). V is the number of rows, h is the column index and N is the number of subframes.

A pixel pitch of 7µm with a 38.5% fill factor (FF) has been achieved by using the NMOS-only architecture. The SD must have an area comparable to that of the PPD for good charge transfer efficiency; in this design, an SD-to-PPD area ratio of about 39% is chosen. Depending on the choice of pixel pitch, the two readouts and two multiplexers per pixel reduce the FF. Additional improvement of the effective FF can be achieved using techniques such as incorporating microlenses and light-guide structures [4] or backside illumination [70].

Figure 5.4: The layout of the pixel (top) and the corresponding potential diagrams during the global data sampling and charge sorting phases (bottom).


Figure 5.5: Architecture of the sensor.

5.3 Sensor Architecture

The top-level architecture of the sensor is depicted in Fig. 5.5. The 4Gbps-rate input codes are deserialized and sent to 312 columns of pixels, row by row. The outputs of the pixels are digitized by a bank of 10-bit column-parallel 2nd-order incremental ADCs and are multiplexed to 9 output channels. A row logic block on the right side of the array controls the upload of the codes row by row. The code pattern upload and readout are time-multiplexed, thus the row logic is shared for both operations.

Figure 5.6: Simplified code-deserializer block diagram and the associated blocks. This block is used for every channel in Fig. 5.5.

The code deserializers are shown in more detail in Fig. 5.6. One clock input to the chip is used for deserializing the 10 input channels. The clock is distributed by a clock tree to minimize the skew between the different channels. The clock and data of each channel go through signal-conditioning circuits to assure proper timing for registering the data. After the deserialized data for 32 pixel columns is registered, level shifters and drivers ship the code data to the pixel array. The ENABLE signal is used to keep both CODE[1:32] and its complement low during the transition of the codes between different rows. This assures that the code of one row does not affect the charge sorting in another row. The maximum input clock rate is 200MHz, resulting in 400Mbps per channel and a minimum row-pattern upload time of 80ns (for a total of 320 codes through all 10 channels). A column-sharing block enables the code to be duplicated for neighbouring columns. When it is activated, the deserialization factor can be lowered from 32 to 16 or 8, and the CLK generation block adjusts the clock division used for the 32-bit registers.
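A quick sanity check of these numbers (the 2× factor assumes double-data-rate signaling, consistent with the 4Gbps aggregate rate mentioned in Section 5.3):

    channels = 10
    clk_hz = 200e6
    bps_per_ch = 2 * clk_hz                     # 400 Mbps per channel (DDR assumed)
    row_bits = 320                              # codes per row across all channels
    row_time = row_bits / (channels * bps_per_ch)
    print(row_time)                             # 8e-08 s = 80 ns per row

    # With the ~1 us per-row charge-sorting transfer (Section 5.1), sorting
    # rather than code upload limits the subframe rate:
    print(1 / (320 * 1e-6))                     # ~3125 subframes/s, cf. 2700 measured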

The image sensor is designed and fabricated in a 0.11µm CIS technology, based on the pixel architecture described earlier. The chip area is 3mm × 4mm, with an active pixel area of 2.184mm × 2.24mm. Supply voltages of 1.2V and 3.3V are used for the digital peripheral circuits and the readout, respectively. The micrograph of the chip with the main building blocks is shown in Fig. 5.7.


Figure 5.7: Chip micrograph (3mm × 4mm, 0.11µm CMOS).

5.4 System implementation

As required in active computational imaging applications, in this work a projector is used together with the camera for the experimental measurements. The projector is used both for characterizing the pixel performance and for the applications demonstrated in the next section. The setup is shown in Fig. 5.8, together with a simplified block diagram of the camera. A development board and a custom-designed PCB carrying the image sensor are used. On the custom board, in addition to the image sensor, are the regulators that generate the necessary biasing and references for the sensor, as well as the port for connection to the projector for synchronized operation. The FPGA on the development board generates all the timing for the sensor and initializes the board. At the start of the camera operation, first all the necessary code patterns are sent from the PC to a DDR memory through the FPGA. During the frame operation, the patterns are sent one by one to the sensor on their respective subframes. Also, at every subframe, the FPGA sends a trigger to the projector in order to project the corresponding pattern.

Figure 5.8: Camera and projector configuration for both direct/indirect and range-gating applications, and the camera module block diagram.

5.5 Experimental results

The camera is electrically characterized and deployed in three computational imaging applications: transport-aware (direct/indirect) imaging, multispectral imaging and range-gating imaging. The details of each are given in the following sections.

5.5.1 Electrical experiments

In order to characterize the efficiency of the charge transfers from the PPD to taps 1 or 2, a tap-contrast measure similar to the one introduced in [68] is used here. The abstract pixel circuit and timing diagram for the measurement are shown in Fig. 5.9. The light is a pulsed source synchronized with the negative edge of the TGG control signal. At the end of every interval, the TGG signal is toggled high to transfer the photo-generated charges from the PPD to the storage diode (SD). After every TGG toggle, the transfer to C1 or C2 is done by asserting TG1 or TG2. In the illustrated timing diagram all the photo-generated charges are expected to be collected in C1, but due to non-idealities, such as limited charge transfer speed and charge transfer efficiency, some part of the charges is collected in C2. Fig. 5.10 depicts the average tap contrast as a function of the subframe rate, as well as the tap contrast histogram at a rate of 2700 subframes/sec, where an average tap contrast of 90% is measured.

Figure 5.9: The simplified schematic of the pixel and the timing diagram for measuring the tap contrast.

Figure 5.10: Experimentally measured tap contrast.

5.5.2 Direct/Indirect imaging

The AMP sensor enables single-shot (i.e., single-frame) imaging capability in challenging computational photography applications, such as direct/indirect light imaging [62], multispectral imaging [71] and range-gating imaging. In direct/indirect imaging, the photogenerated charges due to direct and indirect incident light are sorted into the two taps and read out as two images. The projected light that reflects from the object once and travels back to the camera is called direct light. Indirect light is the light that reaches the camera after multiple reflections or scattering. Fig. 5.11 illustrates this imaging concept by showing a direct light ray (through projector pixel p) and an indirect light ray (projected by pixel q of the projector). The path of indirect light is arbitrary, but the direct light follows the epipolar constraint, i.e. it travels through a pair of corresponding (i.e., mirrored) lines, called epipolar lines, from the projector to the camera [7]. This constraint can easily be exploited in the AMP sensor by projecting geometrical patterns, such as lines of various orientations, over multiple (e.g., N=30) subframes to sufficiently interrogate the scene. In synchrony with the projector, the imager is programmed to accumulate on tap 1 for all pixels within the camera epipolar lines, and on tap 2 for all other pixels, in order to integrate the charge due to direct and indirect light, respectively (due to its unconstrained nature, some indirect light contributes to tap 1 rather than tap 2; this contribution is then subtracted out, but that is ignored here for the sake of simplicity).

Figure 5.11: The setup for direct/indirect imaging.

Figure 5.12: The sensor output images of (a) a highly scattering object (wax candle) demonstrating the direct/indirect light imaging concept and the corresponding experimental results, and (b) direct/indirect light imaging experimental results at video rate (the image on top is captured by a conventional camera).
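As a toy illustration of the camera-side coding (assuming a rectified camera-projector pair so that epipolar lines are image rows; the band geometry is hypothetical):

    import numpy as np

    def epipolar_code_masks(h, w, n_sub=30):
        # In each subframe, the band of rows mirroring the projected line
        # pattern is coded 0 (accumulate on tap 1, direct light); all other
        # pixels are coded 1 (accumulate on tap 2, indirect light).
        codes = np.ones((n_sub, h, w), dtype=np.uint8)
        band = max(1, h // n_sub)
        for n in range(n_sub):
            codes[n, n * band:(n + 1) * band, :] = 0
        return codes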

Fig. 5.12(a) illustrates the outputs of the sensor for different projector and camera code patterns. In Fig. 5.12(a-top), the scene is uniformly illuminated and tap 1, coded to be on for all pixels, captures all light, both direct and indirect. In Fig. 5.12(a-middle), mirrored (and thus epipolar) line patterns are uploaded to the projector and the camera in a single subframe. As a result, for the illuminated regions of the scene, tap 1 collects the charge corresponding to the direct light, and tap 2 the charge corresponding to the indirect light. If various such epipolar patterns are submitted to the camera and projector over multiple subframes, such as the 30 subframes in Fig. 5.12(a-bottom), all regions of the scene are uniformly captured, and images including all direct/indirect light for the entire scene can be obtained, as depicted. The sensor's output images for a more complex scene are shown in Fig. 5.12(b).

For both Figs. 5.12(a) and (b), in the direct light image, the light that is reflected directly from the surface of the highly scattering object is weak, making it appear darker than the non-scattering background. In contrast, in the indirect light image, the object appears brighter than the surroundings. This is due to its highly scattering nature, where light penetrates the object, scatters, and emerges from a different point that is no longer on the camera epipolar line.

Figure 5.13: The setup for multispectral imaging.

5.5.3 Multispectral imaging

Multispectral imaging is another computational imaging application that can utilize the proposed CEP camera. Fig. 5.13 illustrates how the AMP sensor performs single-frame multispectral imaging without using on-chip color filters. In this example, 5 LEDs of different wavelengths, λ1 to λ5, are sequentially turned on during the 5 respective subframes. The LEDs provide flood illumination, without spatial patterns. Synchronously with the LEDs, 5 code matrices, organized in 2×2-pixel tiles, are submitted to the AMP sensor. The sorted photogenerated charges are accumulated during the 5 subframes and are read out once at the end of the frame as two images. Of the 8 buckets of each 2×2-pixel tile, 4 images from buckets 2 contain images from LEDs λ1 to λ4, as shown in Fig. 5.13. The images from buckets 1 can be used to extract the last LED image, namely the one at the λ5 wavelength. The extracted 5 images at the 5 wavelengths can be upsampled to full resolution [10]. Finally, spectral analysis or color imaging can be performed on these images.

Figure 5.14: Multispectral imaging experimental results at video rate. The image on top is captured by a conventional camera.
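A sketch of how such code matrices can be generated (the tile-to-LED assignment below is an assumption for illustration; the measured prototype's exact tiling may differ):

    import numpy as np

    def multispectral_codes(h, w, n_leds=5):
        # During subframe k (k = 0..3), exactly one position in every 2x2 tile
        # is coded 1 so that its bucket 2 captures LED k alone; the 5th
        # subframe is all zeros, so the last LED is captured only on buckets 1
        # and is recovered from them afterwards.
        codes = np.zeros((n_leds, h, w), dtype=np.uint8)
        slots = [(0, 0), (0, 1), (1, 0), (1, 1)]
        for k, (r, c) in enumerate(slots):
            codes[k, r::2, c::2] = 1
        return codes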

Fig. 5.14 depicts the output images of the camera for 5 different wavelengths demultiplexed from the two tap images of a single frame. A reconstructed color image is also depicted.


Figure 5.15: (a) The setup for range-gating imaging, and (b) raw output images of the camera with a hand at different depths in the field of view of the camera.

5.5.4 Range gating imaging

The range-gating imaging setup is depicted in Fig. 5.15(a), where the camera captures images of objects based on their depth in the field of view. As illustrated in the figure, the camera and projector are arranged to work in epipolar planes, where an epipolar plane is defined as a plane that contains both the projected light ray and the reflected light ray that is imaged by the camera. Under this condition, the camera and projector can be programmed such that, for pixel p of the projector, the depth of interest maps to a range of pixels in the imaging plane of the camera bounded by pixels i and j. In the case of a raster-scan laser projector, a CEP camera can be programmed to image only a line of pixels (which can be a row, a diagonal line, or a curved line of pixels) corresponding to the epipolar plane at any point in time [27]. Using a dual-tap pixel architecture, the light reflected from depths outside of the range of interest can be captured on the secondary tap of the pixels outside of the [i : j] range. Output images of the camera from different frames captured at 30fps are shown in Fig. 5.15(b). In this example, when the hand is further than depth B or closer than depth A from the camera, it appears on the tap 1 images. Otherwise, within the depth range between A and B, the hand image is captured on tap 2.
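A toy sketch of the per-line coding, under the same rectified-geometry assumption as before (the indices i and j would come from calibration; they are placeholders here):

    import numpy as np

    def range_gate_codes(h, w, i, j):
        # One possible assignment, matching Fig. 5.15(b): pixels in columns
        # i..j (imaging the depth range of interest for this epipolar line)
        # are coded 1 and collect on tap 2; out-of-range depths fall on
        # pixels coded 0 and collect on tap 1.
        codes = np.zeros((h, w), dtype=np.uint8)
        codes[:, i:j + 1] = 1
        return codes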

5.6 Conclusion

In this work we proposed an NMOS-only CEP pixel architecture. This architecture has enabled us to shrink the pixel pitch to 7µm, the smallest CEP pixel reported to date.

An image sensor with a 312H × 320V pixel array resolution is fabricated in a 0.11µm CIS technology. A tap contrast of 90% is achieved at a subframe rate of 2700 sfps. The achieved performance is suitable for many computational imaging techniques, though only three computational applications were demonstrated: direct/indirect imaging, multispectral imaging and range-gating imaging. Table 5.1 summarizes the performance of the proposed image sensor in comparison to the state-of-the-art image sensors, including two non-CEP image sensors [11, 32].

Table 5.1: Comparison table.

Columns, left to right: THIS WORK | [69] UBC, OE 2019 | [34] Toronto, ISSCC 2019 | [39] UBC, TCAS 2018 | [33] JHU, OE 2016 | [32] Shizuoka, ISSCC 2015 | [11] Stanford, JSSC 2012.

CODED-EXPOSURE MODE: per-pixel (i.e., pixelwise spatial and temporal coding) for THIS WORK, [69], [34], [39] and [33]; per 1/15 of the array for [32]; per full array for [11]
TECHNOLOGY [nm]: 110 CIS | 130 CMOS | 110 CIS | 130 CMOS | 180 CIS | 110 CIS | 130 CIS
PINNED PHOTODIODE: YES | -- | YES | NO (NW/P) | YES | YES | YES
PIXEL PITCH [µm]: 7 | 10.2 | 11.2 | 12.1x12.2 | 10 | 11.2x5.6 | 5
PIXEL FILL FACTOR [%]: 38.5 | 41.5 | 45.3 | 33.2 | 52 | -- | 42
NUMBER OF TAPS: 2 | 1 | 2 | 2 | 1 | 1 | 2
TAP CONTRAST (1): 90 | N/A | 99.5 | -- | N/A | -- | --
PIXEL COUNT [HxV]: 312 x 320 | 128 x 128 | 244 x 162 | 10 x 10 | 127 x 90 | 64 x 108 | 640 x 576
FRAME RATE [fps]: 30 | 10 | 25 | 60 | 100 | 32 | N/A
POWER [mW]: 54 | 0.094 / 0.18 / 32.85 | 34.4 (5) | 0.012 / 1.23 (5) | 1.3 | 1620 | N/A
POWER FoM [nJ] (2): 18 | 0.6 / 1.11 / 200 | 34 | 2 / 205 | 1.14 | 7324 | N/A
IN-PIXEL CODE MEMORY: NO | YES (DRAM) | YES (2 LATCHES) | YES (DRAM) | YES (SRAM) | -- | --
IN-PIXEL DATA MEMORY: YES (CHARGE) | NO | NO | NO | NO | -- | --
FRAME-CODE RATE [sfps (3)]: 2700 | 60 / 1280 / 300000 (6) | 180 | 600 / 300000 (7) | 100 | -- | N/A
PIXEL-CODE RATE [MHz]: 270 | 0.98 / 20.9 / 4910 (6) | 7.1 | 0.06 / 30 (7) | 0.11 | -- | --
ARBITRARY CODE / ROI (4): YES/YES | YES/YES | YES/YES | YES/-- | NO/-- | -- | --
FRAME-CODE SHUTTER: GLOBAL/ROLLING | ROLLING | GLOBAL | ROLLING | ROLLING | -- | --
IMAGING APPLICATIONS: 1. direct/indirect light imaging, 2. multispectral imaging | high frame-rate video reconstruction | 1. structured-light imaging, 2. photometric stereo imaging | deblurring, compressive sensing | spatio-temporal compressive sensing | ultra-high-speed imaging w/ compressive sensing | high-dynamic-range imaging

Notes: (1) also known as extinction ratio; (2) FoM = Power / (Number of pixels × Frame rate); (3) sfps: subframes per second; (4) ROI: region of interest; (5) without ADC power; (6) demonstration done up to 1280 sfps; (7) demonstration done up to 600 sfps. N/A: not applicable; --: not available. In the original table, bold font denotes the best performance among the shaded columns.

Future work for CEP cameras includes improving the subframe rate, introducing pixel architectures compatible with correlated double sampling (CDS) for good noise performance, and shrinking the pixel pitch even further.

Chapter 6

Conclusion and future work

6.1 Summary

CEP image sensors are a new class of sensors that bring new capabilities and a higher degree of programmability at the pixel level to a camera module. This thesis explores the technologies alongside the pixel architectures for these image sensors.

Different types of photo-detectors have been studied and compared in chapter 2. The chapter's focus has not been to find the best photo-detector performance, but to introduce how different photo-detectors perform and how CEP sensors may benefit from them. Finding the best performance for each of the photo-detectors requires the optimization of the process doping profiles and of the geometries of the pixel's layers. This task is outside the scope of this thesis, as the goal has been to implement such a sensor, not to find the best-performing pixel. In parallel to this study, I conducted an analysis of the effect of the doping gradient and the pixel shape on the charge transfer speed in PPD pixels, and the results are published in [72].

The first generation of the CEP camera module, with a custom-designed image sensor using photogate-based pixels, is explained in chapter 3. The concept of a dual-tap pixel using in-pixel D-latches for storing the codes applied to the array at the pixel level is introduced. In this work, the two taps have different conversion gains to ensure a better dynamic range in applications like direct/indirect imaging, where one tap is expected to capture a smaller signal value. Due to the use of a photogate structure in a technology that was not optimized for this choice, poor tap-contrast and dark current performance was achieved. This made it difficult to demonstrate the camera in a proper computer vision application, but the electrical functionality of the new camera module was reported in [40].

Then, in chapter 4, the code-memory pixel is proposed based on PPD pixels. Like the previous architecture, this pixel also uses D-latches inside the pixel to store the codes applied to the pixel array. The proposed pixel has been fabricated in a 110nm CMOS image sensor technology and had the smallest pixel pitch for a dual-tap CEP architecture at the time of publication. A very high tap contrast of 99% was achieved by the PPD pixel but, due to the behavior of the in-pixel PMOS transistors, the speed of the code application was more than 100 times slower than expected. The camera module based on the proposed image sensor was used to demonstrate single-shot 3D imaging using computer vision techniques. The camera results and the applications have been published in [10, 34, 35, 68].

Finally, in chapter 5, the analog-memory pixel is proposed based on PPD pixels with storage diodes, which are usually used for global-shutter image sensors. The new architecture uses an in-pixel storage diode to implement a pipelined operation of the photo-charge collection and its sorting based on the pixel code. This solution enables an NMOS-only pixel design that shrinks the pixel pitch to 7µm. Compared to the code-memory pixel, it has improved the frame-code and pixel-code rates by factors of 15 and 38, respectively. Different computer vision applications, such as direct/indirect, depth-gating, and multispectral imaging, have been demonstrated using the new camera module. The results of this work will be published in the near future.

6.2 Future work

In this thesis, two new architectures for dual-tap CEP image sensors have been proposed, and several computer vision applications are demonstrated using the new custom-designed camera modules. Very few groups, with different approaches, have worked on this topic so far, and plenty of work and research remains in the field. In brief, the directions are itemized here:

• The charge transfer times reported in the literature are as low as 1ns in PPD pixels, using specialized process steps or pixel shapes. The charge transfer time of the pixels reported in this thesis is on the order of µs, and improving the transfer speed can proportionally improve the frame-code and pixel-code rates. This enables high frame-code rates at higher pixel array resolutions, improving the performance of the current methods or enabling further computer vision applications.

• There are imaging techniques that have not benefited from CEP architectures so far. For instance, ToF cameras can benefit from these sensors when combined with direct/indirect imaging. This combination can make ToF insensitive to reflections or to interference from other ToF cameras in the environment. Such a design requires each pixel to have fast charge transfers (as low as 1ns) and possibly four or more taps to meet the video frame-rate requirement.

• In this work, single readout and DS readout have been used. Thus, the pixel's reset noise, which is the dominant noise source of the pixel at low light levels, is not suppressed. To improve the noise performance, new pixel architectures that enable CDS should be proposed. Depending on the size of the pixel's floating diffusion and the parasitic capacitance associated with it, the temporal noise can be as low as a few electrons when CDS is used.
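As a rough illustration of what CDS removes, the sketch below estimates the kTC reset noise in electrons for a few assumed floating-diffusion capacitances (illustrative values only, not measurements from the sensors in this thesis); CDS cancels this component by subtracting the sampled reset level from the signal level:

    import math

    # kTC (reset) noise of a floating diffusion, in electrons:
    # q_n = sqrt(k*T*C) / q.
    K_B = 1.380649e-23   # Boltzmann constant, J/K
    Q_E = 1.602177e-19   # elementary charge, C

    def ktc_noise_electrons(c_fd_farads, temp_k=300.0):
        """RMS reset noise referred to the floating diffusion, in electrons."""
        return math.sqrt(K_B * temp_k * c_fd_farads) / Q_E

    for c_ff in (1.0, 2.0, 5.0):  # hypothetical FD capacitances, in fF
        print(f"C_FD = {c_ff} fF -> {ktc_noise_electrons(c_ff * 1e-15):.1f} e- rms")
    # C_FD = 1 fF already gives ~12.7 e- rms, which dominates at low light
    # unless it is suppressed by CDS.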

• Going to higher image sensor resolutions and faster frame-code rates requires increasing the code transfer rate proportionally. In the image sensors proposed here, the aggregate code transfer rate has been as high as 4 Gb/s, using double-data-rate CMOS-level signaling. For a 1MPixel image sensor at 40fps, this data rate translates to around 100 subframes per frame, which can be a limiting factor for applications such as direct/indirect imaging (which may require several hundred to a few thousand). One brute-force solution to this issue is to use high-speed links for the code transfer, but depending on the application, on-chip code-generation engines can be a more power-efficient approach.
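A back-of-the-envelope check of these numbers (a sketch under the stated assumptions of one code bit per pixel per subframe and no protocol or readout overhead, not a model of the actual chip):

    # Code-bandwidth budget for a hypothetical scaled-up CEP sensor.
    code_rate_bps = 4e9   # aggregate code transfer rate: 4 Gb/s (this work)
    pixels        = 1e6   # 1 MPixel target resolution
    frame_rate    = 40    # frames per second

    bits_per_subframe   = pixels * 1.0                       # 1 bit/pixel
    subframes_per_sec   = code_rate_bps / bits_per_subframe  # 4000 sf/s ceiling
    subframes_per_frame = subframes_per_sec / frame_rate     # ~100 per frame

    print(f"{subframes_per_sec:.0f} subframes/s -> "
          f"{subframes_per_frame:.0f} subframes per frame")
    # Direct/indirect imaging may need several hundred to a few thousand code
    # patterns per frame, so 4 Gb/s becomes the bottleneck at this resolution.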

• All the sensors in this thesis used standard 2D technologies with front-side illumination. Incorporating back-side illumination technology allows more metal routing to be used to transfer the codes to the pixels without compromising the pixels' aperture. This parallel code transfer to the pixel array can improve the frame-code rates, which is crucial for higher image resolutions.

• Additionally, moving to state-of-the-art technologies with the possibility of 3D integration enables transferring the codes directly to each pixel from a second chip, enabling the ultimate frame-code rates. This may not shrink the pixel size, as the limitation becomes the bonding pitch. For instance, a microbump map for a 2 × 2 pixel set is shown in Fig. 6.1, with the assumption that 3 signals per pixel (two complementary codes and one pixel output) need to be connected to the second chip. In this example, the pixel pitch needs to be twice the microbump pitch, i.e., 2d. This means smaller pixels can be achieved using 3D integration only if the bonding pitch becomes smaller than 3.5µm. Other benefits of 3D integration are the possibility of a higher fill-factor or CDS-enabled pixels.

Figure 6.1: A simple microbump map illustration for a 2 × 2 pixel set. For a microbump pitch of d, the pixel pitch is 2d.
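The pitch constraint of Fig. 6.1 generalizes in a straightforward way. The sketch below (a hypothetical helper, assuming a square bump grid and the 3 signals per pixel from the figure, not a layout rule from any specific process) computes the minimum pixel pitch a given bump technology supports:

    import math

    # A pixel needs ceil(sqrt(n_signals)) bump rows and columns, so
    # pixel_pitch >= ceil(sqrt(n_signals)) * bump_pitch.
    def min_pixel_pitch_um(n_signals_per_pixel, bump_pitch_um):
        return math.ceil(math.sqrt(n_signals_per_pixel)) * bump_pitch_um

    # Fig. 6.1: 3 signals/pixel -> a 2x2 bump grid per pixel, i.e. pitch = 2d.
    for d in (5.0, 3.5, 2.0):  # candidate microbump pitches, in um
        print(f"bump pitch {d} um -> min pixel pitch "
              f"{min_pixel_pitch_um(3, d)} um")
    # Beating the 7 um pixel of chapter 5 therefore requires a bonding pitch
    # below 3.5 um.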

Bibliography

[1] M.-W. Seo, S. Kawahito, K. Kagawa, and K. Yasutomi, “A 0.27 e-rms read noise 220-µV/e- conversion gain reset-gate-less CMOS image sensor with 0.11-µm CIS process,” IEEE Electron Device Letters, vol. 36, no. 12, pp. 1344–1347, 2015. 1

[2] S. Masoodian, J. Ma, D. Starkey, Y. Yamashita, and E. R. Fossum, “A 1Mjot 1040fps 0.22 e-rms Stacked BSI Quanta Image Sensor with Cluster-Parallel Readout,” in Proceedings of the 2017 International Image Sensor Workshop, Hiroshima, Japan, 2017, pp. 230–233. 1

[3] X. Ge and A. Theuwissen, “0.5 e-rms Temporal-Noise CMOS Image Sensor with Charge-Domain CDS and Period-Controlled Variable Conversion Gain,” in Proceedings of the International Image Sensor Society Workshop, Hiroshima, Japan, 2017, pp. 290–293. 1

[4] M. Kobayashi, Y. Onuki, K. Kawabata, H. Sekine, T. Tsuboi, T. Muto, T. Akiyama, Y. Matsuno, H. Takahashi, T. Koizumi, K. Sakurai, H. Yuzurihara, S. Inoue, and T. Ichikawa, “A 1.8e-rms Temporal Noise Over 110-dB-Dynamic Range 3.4 µm Pixel Pitch Global-Shutter CMOS Image Sensor With Dual-Gain Amplifiers SS-ADC, Light Guide Structure, and Multiple-Accumulation Shutter,” IEEE Journal of Solid-State Circuits, vol. 53, no. 1, pp. 219–228, Jan 2018. 1, 43, 79, 83

[5] F. Lalanne, P. Malinge, D. Hérault, and C. Jamin-Mornet, “A native HDR 115 dB 3.2 µm BSI pixel using electron and hole collection,” Proceedings of the International Image Sensor Workshop, Hiroshima, Japan, vol. 30. 1

[6] S. Velichko, S. Johnson, D. Pates, C. Silsby, C. Hoekstra, R. Mentzer, and J. Beck, “140 dB Dynamic Range Sub-electron Noise Floor Image Sensor,” Proceedings of the International Image Sensor Workshop, Hiroshima, Japan, vol. 30, 2017. 1

[7] J. Bogaerts, R. Lafaille, M. Borremans, J. Guo, B. Ceulemans, G. Meynants, N. Sarhangnejad, G. Arsinte, V. Statescu, and S. van der Groen, “105 × 65mm2 391Mpixel CMOS image sensor with > 78dB dynamic range for airborne mapping applications,” in 2016 IEEE International Solid-State Circuits Conference (ISSCC), Jan 2016, pp. 114–115. 1

[8] R. Raskar, A. Agrawal, and J. Tumblin, “Coded Exposure Photography: Motion Deblurring using Fluttered Shutter,” ACM Transactions on Graphics (TOG), vol. 25, no. 3, pp. 795–804, 2006. 3

[9] Y. Hitomi, J. Gu, M. Gupta, T. Mitsunaga, and S. K. Nayar, “Video from a single coded exposure photograph using a learned over-complete dictionary,” in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 287–294. 3, 14, 23, 25

[10] M. Wei, N. Sarhangnejad, Z. Xia, N. Gusev, N. Katic, R. Genov, and K. N. Kutulakos, “Coded Two-Bucket Cameras for Computer Vision,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 54–71. 3, 16, 23, 24, 57, 59, 73, 93, 98

[11] G. Wan, X. Li, G. Agranov, M. Levoy, and M. Horowitz, “CMOS image sensors with multi-bucket pixels for computational photography,” IEEE Journal of Solid-State Circuits, vol. 47, no. 4, pp. 1031–1042, 2012. 3, 14, 23, 31, 71, 79, 96

[12] C. Zhou and S. K. Nayar, “Computational cameras: convergence of optics and processing,” Image Processing, IEEE Transactions on, vol. 20, no. 12, pp. 3322–3340, 2011. 4, 13

[13] G. P. Weckler, “Operation of p-n Junction Photodetectors in a Photon Flux Integrating Mode,” IEEE Journal of Solid-State Circuits, vol. 2, no. 3, pp. 65–73, 1967. 6

[14] D. Durini and D. Arutinov, “Operational principles of silicon image sensors,” in High Performance Silicon Imaging. Elsevier, 2020, pp. 25–73. 7

[15] Y. Chen, Y. Xu, Y. Chae, A. Mierop, X. Wang, and A. Theuwissen, “A 0.7e-rms-temporal-readout-noise CMOS image sensor for low-light-level imaging,” in 2012 IEEE International Solid-State Circuits Conference, 2012, pp. 384–386. 12

[16] H. Kim, J. Park, I. Joe, D. Kwon, J. H. Kim, D. Cho, T. Lee, C. Lee, H. Park, S. Hong, C. Chang, J. Kim, H. Lim, Y. Oh, Y. Kim, S. Nah, S. Jung, J. Lee, J. Ahn, H. Hong, K. Lee, and H. Kang, “A 1/2.65in 44Mpixel CMOS Image Sensor with 0.7µm Pixels Fabricated in Advanced Full-Depth Deep-Trench Isolation Technology,” in 2020 IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 104–106. 13

[17] M. Sato, Y. Yorikado, Y. Matsumura, H. Naganuma, E. Kato, T. Toyofuku, A. Kato, and Y. Oike, “A 0.50e-rms Noise 1.45µm-Pitch CMOS Image Sensor with Reference-Shared In-Pixel Differential Amplifier at 8.3Mpixel 35fps,” in 2020 IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 108–110. 13

[18] Y. Sakano, T. Toyoshima, R. Nakamura, T. Asatsuma, Y. Hattori, T. Yamanaka, R. Yoshikawa, N. Kawazu, T. Matsuura, T. Iinuma et al., “A 132dB single-exposure-dynamic-range CMOS image sensor with high temperature tolerance,” in 2020 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2020, pp. 106–108. 13, 44

[19] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Computer Science Technical Report CSTR, vol. 2, no. 11, 2005. 13

[20] C. Perwass and L. Wietzke, “Single lens 3D-camera with extended depth-of-field,” in IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2012, pp. 829108–829108. 13

[21] K. Venkataraman, D. Lelescu, J. Duparré, A. McMahon, G. Molina, P. Chatterjee, R. Mullis, and S. Nayar, “PiCam: an ultra-thin high performance monolithic camera array,” ACM Transactions on Graphics (TOG), vol. 32, no. 6, p. 166, 2013. 13

[22] C. Bamji and E. Charbon, “CMOS-compatible three-dimensional image sensing using reduced peak energy,” Jul. 1, 2003, US Patent 6,587,186. 13

[23] B. Freedman, A. Shpunt, M. Machline, and Y. Arieli, “Depth mapping using projected patterns,” Jul. 23, 2013, US Patent 8,493,496. 13

[24] A. Payne, A. Daniel, A. Mehta, B. Thompson, C. S. Bamji, D. Snow, H. Oshima, L. Prather, M. Fenton, L. Kordus et al., “A 512 × 424 CMOS 3D Time-of-Flight image sensor with multi-frequency photo-demodulation up to 130MHz and 2GS/s ADC,” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International. IEEE, 2014, pp. 134–135. 13, 43

[25] F. Heide, M. B. Hullin, J. Gregson, and W. Heidrich, “Low-budget transient imaging using photonic devices,” ACM Transactions on Graphics (ToG), vol. 32, no. 4, p. 45, 2013. 14, 23

[26] A. Kadambi, A. Bhandari, R. Whyte, A. Dorrington, and R. Raskar, “Demultiplexing illumination via low cost sensing and nanosecond coding,” in 2014 IEEE International Conference on Computational Photography (ICCP). IEEE, 2014, pp. 1–10. 14, 23

[27] M. O’Toole, S. Achar, S. G. Narasimhan, and K. N. Kutulakos, “Homogeneous codes for energy-efficient illumination and imaging,” ACM Transactions on Graphics (ToG), vol. 34, no. 4, p. 35, 2015. 16, 47, 95

[28] C. Callenberg, F. Heide, G. Wetzstein, and M. Hullin, “Snapshot difference imaging using time-of-flight sensors,” arXiv preprint arXiv:1705.07108, 2017. 14, 23

[29] D. Liu, J. Gu, Y. Hitomi, M. Gupta, T. Mitsunaga, and S. K. Nayar, “Efficient space-time sampling with pixel-wise coded exposure for high-speed imaging,” IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 2, pp. 248–260, 2014. xi, 14, 16, 18, 47

[30] M. O’Toole, R. Raskar, and K. N. Kutulakos, “Primal-dual coding to probe light transport.” ACM Trans. Graph., vol. 31, no. 4, pp. 39–1, 2012. 14, 23, 47

[31] B. Tyrrell, K. Anderson, J. Baker, R. Berger, M. Brown, C. Colonero, J. Costa, B. Holford, M. Kelly, E. Ringdahl et al., “Time delay integration and in-pixel spatiotemporal filtering using a nanoscale digital CMOS focal plane readout,” IEEE Transactions on Electron Devices, vol. 56, no. 11, pp. 2516–2523, 2009. 14

[32] F. Mochizuki, K. Kagawa, S. I. Okihara, M. W. Seo, B. Zhang, T. Takasawa, K. Yasutomi, and S. Kawahito, “Single-shot 200Mfps 5 × 3-aperture compressive CMOS imager,” Digest of Technical Papers - IEEE International Solid-State Circuits Conference, vol. 58, pp. 116–117, 2015. 14, 71, 96

[33] J. Zhang, T. Xiong, T. Tran, S. Chin, and R. Etienne-Cummings, “Compact all-CMOS spatiotemporal compressive sensing video camera with pixel-wise coded exposure,” Optics Express, vol. 24, no. 8, pp. 9013–9024, 2016. 14, 16, 23, 25, 27, 28, 56, 71, 79, 80

[34] H. Ke, N. Sarhangnejad, R. Gulve, Z. Xia, N. Gusev, N. Katic, K. N. Kutulakos, and R. Genov, “Extending Image Sensor Dynamic Range by Scene-aware Pixelwise-adaptive Coded Exposure,” in Proc. Int. Image Sensor Workshop, 2019. x, 15, 79, 98

[35] N. Sarhangnejad, N. Katic, Z. Xia, M. Wei, N. Gusev, G. Dutta, R. Gulve, H. Haim, M. M. Garcia, D. Stoppa, K. N. Kutulakos, and R. Genov, “Dual-Tap Pipelined-Code-Memory Coded-Exposure-Pixel CMOS Image Sensor for Multi-Exposure Single-Frame Computational Imaging,” 2019 IEEE International Solid-State Circuits Conference (ISSCC), pp. 102–104, Feb 2019. xii, xiii, xiv, 23, 58, 60, 62, 67, 70, 72, 74, 79, 81, 98

[36] M. O’Toole, F. Heide, L. Xiao, M. B. Hullin, W. Heidrich, and K. N. Kutulakos, “Temporal frequency probing for 5D transient analysis of global light transport,” ACM Transactions on Graphics (ToG), vol. 33, no. 4, p. 87, 2014. 23

[37] M. O’Toole, J. Mather, and K. N. Kutulakos, “3D shape and indirect appearance by structured light transport,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3246–3253. 23, 25

[38] M. Sheinin, Y. Y. Schechner, and K. N. Kutulakos, “Computational imaging on the electric grid,” in IEEE CVPR, vol. 2, 2017. 23

[39] Y. Luo, D. Ho, and S. Mirabbasi, “Exposure-Programmable CMOS Pixel With Selective Charge Storage and Code Memory for Computational Imaging,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 5, pp. 1555–1566, 2018. 23, 25, 27, 29, 56, 71, 79

[40] N. Sarhangnejad, H. Lee, N. Katic, M. O'Toole, K. Kutulakos, and R. Genov, “CMOS Image Sensor Architecture for Primal-Dual Coding,” in Proc. Int. Image Sensor Workshop, 2017. 23, 25, 27, 29, 56, 79, 98

[41] F. Acerbi, M. M. Garcia, G. Köklü, B. Büttgen, R. Gancarz, A. Biber, D. Furrer, and D. Stoppa, “Transfer-Gate Region Optimization and Pinned-Photodiode Shaping for High-Speed TOF Applications,” in International Image Sensor Workshop (IISW-2017), 2017, pp. 145–148. 25, 30

[42] C. S. Bamji, S. Mehta, B. Thompson, T. Elkhatib, S. Wurster, O. Akkaya, A. Payne, J. Godbaz, M. Fenton, V. Rajasekaran et al., “1Mpixel 65nm BSI 320MHz demodulated TOF Image sensor with 3µm global shutter pixels and analog binning,” in Solid-State Circuits Conference (ISSCC), 2018 IEEE International. IEEE, 2018, pp. 94–96. 25, 42

[43] Y. Kato, T. Sano, Y. Moriyama, S. Maeda, T. Yamazaki, A. Nose, K. Shiina, Y. Yasu, W. van der Tempel, A. Ercan et al., “320 × 240 Back-Illuminated 10-µm CAPD Pixels for High-Speed Modulation Time-of-Flight CMOS Image Sensor,” IEEE Journal of Solid-State Circuits, vol. 53, no. 4, pp. 1071–1078, 2018. 25, 42

[44] K. Murari, R. Etienne-Cummings, N. Thakor, and G. Cauwenberghs, “Which photodiode to use: a comparison of CMOS-compatible structures,” IEEE Sensors Journal, vol. 9, no. 7, pp. 752–760, 2009. 27

[45] Y. Luo and S. Mirabbasi, “A CMOS pixel design with binary space-time exposure encoding for computational imaging,” in Custom Integrated Circuits Conference (CICC), 2017 IEEE. IEEE, 2017, pp. 1–4. 27, 30

[46] P. V. R. H. Kalyanam, “Device modeling and advanced 2-D TCAD simulation of multifinger photogate APS for enhanced sensitivity,” Master's thesis, Simon Fraser University, School of Engineering Science, 2011. 27

[47] T. Y. Lee, Y. J. Lee, D. K. Min, S. H. Lee, W. H. Kim, J. K. Jung, I. Ovsian- nikov, Y. G. Jin, Y. Park, and E. R. Fossum, “A time-of-flight 3-D image sensor with concentric-photogates demodulation pixels,” IEEE Transactions on Electron Devices, vol. 61, no. 3, pp. 870–877, 2014. 29, 42

[48] C. S. Bamji, P. O’Connor, T. Elkhatib, S. Mehta, B. Thompson, L. A. Prather, D. Snow, O. C. Akkaya, A. Daniel, A. D. Payne, T. Perry, M. Fenton, and V. Chan, “A 0.13 µm CMOS System-on-Chip for a 512 × 424 Time-of-Flight Image Sensor With Multi-Frequency Photo-Demodulation up to 130 MHz and 2 GS/s ADC,” IEEE Journal of Solid-State Circuits, vol. 50, no. 1, pp. 303–319, 2015. 29, 42

[49] K. Murari, R. Etienne-Cummings, N. V. Thakor, and G. Cauwenberghs, “A CMOS in-pixel CTIA high-sensitivity fluorescence imager,” IEEE Transactions on Biomedical Circuits and Systems, vol. 5, no. 5, pp. 449–458, 2011. 30, 44

[50] D. Stoppa, L. Viarani, A. Simoni, L. Gonzo, M. Malfatti, and G. Pedretti, “A 50×30-pixel CMOS Sensor for TOF-based Real Time 3D Imaging,” in IEEE Workshop CCD&AIS, 2005, pp. 230–233. 30

[51] B. Rodrigues, M. Guillon, N. Billon-Pierron, J.-B. Mancini, O. Saxod, B. Giffard, Y. Cazaux, P. Malinge, P. Waltz, A. Ngoua et al., “Indirect ToF Pixel integrating fast buried-channel transfer gates and gradual epitaxy, and enabling CDS,” Proc. IISW, pp. 266–269, 2017. 31, 42, 60

[52] E. R. Fossum and D. B. Hondongwa, “A Review of the Pinned Photodiode for CCD and CMOS Image Sensors,” IEEE Journal of the Electron Devices Society, vol. 2, no. 3, pp. 33–43, May 2014. 34

[53] M.-W. Seo, K. Kagawa, K. Yasutomi, T. Takasawa, Y. Kawata, N. Teranishi, Z. Li, I. A. Halin, and S. Kawahito, “A 10.8 ps-time-resolution 256 × 512 image sensor with 2-Tap true-CDS lock-in pixels for fluorescence lifetime imaging,” in 2015 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers. IEEE, 2015, pp. 1–3. 34, 72

[54] D. Stoppa, N. Massari, L. Pancheri, M. Malfatti, M. Perenzoni, and L. Gonzo, “A Range Image Sensor Based on 10-µm Lock-In Pixels in 0.18-µm CMOS Imaging Technology,” IEEE journal of solid-state circuits, vol. 46, no. 1, pp. 248–258, 2011. 42

[55] T. Geurts, B. Cremers, M. Innocent, W. Vroom, C. Esquenet, T. Cools, J. Compiet, B. Okcan, G. Chapinal, C. Luypaert et al., “A 98 dB linear dynamic range, high speed CMOS image sensor,” Proceedings of the International Image Sensor Workshop, Hiroshima, Japan, vol. 30, 2017. 43

[56] M. Murata, R. Kuroda, Y. Fujihara, Y. Otsuka, H. Shibata, T. Shibaguchi, Y. Kamata, N. Miura, N. Kuriyama, and S. Sugawa, “A high near-infrared sensitivity over 70-dB SNR CMOS image sensor with lateral overflow integration trench capacitor,” IEEE Transactions on Electron Devices, vol. 67, no. 4, pp. 1653–1659, 2020. 43, 44

[57] G. Meynants, B. Wolfs, J. Bogaerts, P. Li, Z. Li, Y. Li, Y. Creten, K. Ruythooren, P. Francis, R. Lafaille et al., “A 47 MPixel 36.4 × 27.6 mm2 30 fps Global Shutter Image Sensor,” in Proc. Int. Image Sensor Workshop, 2017, pp. 410–413. 43

[58] M. Takase, S. Isono, Y. Tomekawa, T. Koyanagi, T. Tokuhara, M. Harada, and Y. Inoue, “An over 120 dB wide-dynamic-range 3.0 µm pixel image sensor with in-pixel capacitor of 41.7 fF/µm2 and high reliability enabled by BEOL 3D capacitor process,” in 2018 IEEE Symposium on VLSI Technology, June 2018, pp. 71–72. 43

[59] I. Mizuno, M. Tsutsui, T. Yokoyama, T. Hirata, Y. Nishi, D. Veinger, A. Birman, and A. Lahav, “A high-performance 2.5 µm charge domain global shutter pixel and near infrared enhancement with light pipe technology,” Sensors, vol. 20, no. 1, p. 307, 2020. 43

[60] M. O'Toole, D. B. Lindell, and G. Wetzstein, “Confocal non-line-of-sight imaging based on the light-cone transform,” Nature, vol. 555, no. 7696, pp. 338–341, 2018. 45

[61] K. N. Kutulakos and M. O’Toole, “Transport-aware imaging,” in Emerging Digital Micromirror Device Based Systems and Applications VII, vol. 9376. International Society for Optics and Photonics, 2015, p. 937606. 47

[62] M. O’Toole, R. Raskar, and K. N. Kutulakos, “Primal-dual coding to probe light transport,” ACM Transactions on Graphics, vol. 31, no. 4, pp. 1–11, 2012. [Online]. Available: http://dl.acm.org/citation.cfm?doid=2185520.2185535 56, 88

[63] B. Büttgen, F. Lustenberger, and P. Seitz, “Demodulation pixel based on static drift fields,” IEEE Transactions on Electron Devices, vol. 53, no. 11, pp. 2741–2747, 2006. 60

[64] D. Stoppa, N. Massari, L. Pancheri, M. Malfatti, M. Perenzoni, and L. Gonzo, “An 80 × 60 range image sensor based on 10µm 50MHz lock-in pixels in 0.18 µm CMOS,” in 2010 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2010, pp. 406–407. 60

[65] R. J. Woodham, “Photometric method for determining surface orientation from multiple images,” Optical engineering, vol. 19, no. 1, p. 191139, 1980. 73

[66] T. Yoda, H. Nagahara, R.-i. Taniguchi, K. Kagawa, K. Yasutomi, and S. Kawahito, “The dynamic photometric stereo method using a multi-tap CMOS image sensor,” Sensors, vol. 18, no. 3, p. 786, 2018. 74

[67] Y. Quéau, R. Mecca, J.-D. Durou, and X. Descombes, “Photometric stereo with only two images: A theoretical study and numerical resolution,” Image and Vision Computing, vol. 57, pp. 175–191, 2017. 74

[68] N. Sarhangnejad, N. Katic, Z. Xia, M. Wei, N. Gusev, G. Dutta, R. Gulve, P. Z. X. Li, H. F. Ke, H. Haim, M. Moreno-García, D. Stoppa, K. N. Kutulakos, and R. Genov, “Dual-Tap Computational Photography Image Sensor With Per-Pixel Pipelined Digital Memory for Intra-Frame Coded Multi-Exposure,” IEEE Journal of Solid-State Circuits, vol. 54, no. 11, pp. 3191–3202, 2019. 79, 80, 87, 98

[69] Y. Luo, J. Jiang, M. Cai, and S. Mirabbasi, “CMOS computational camera with a two-tap coded exposure image sensor for single-shot spatial-temporal compressive sensing,” Opt. Express, vol. 27, no. 22, pp. 31475–31489, Oct 2019. 79, 80, 81

[70] M. Sakakibara, K. Ogawa, S. Sakai, Y. Tochigi, K. Honda, H. Kikuchi, T. Wada, Y. Kamikubo, T. Miura, M. Nakamizo et al., “A 6.9-µm Pixel-Pitch Back-Illuminated Global Shutter CMOS Image Sensor With Pixel-Parallel 14-Bit Subthreshold ADC,” IEEE Journal of Solid-State Circuits, no. 99, pp. 1–9, 2018. 83

[71] Y. Zhao, H. Guo, Z. Ma, X. Cao, T. Yue, and X. Hu, “Hyperspectral Imaging With Random Printed Mask,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 10149–10157. 88

[72] T. C. Millar, N. Sarhangnejad, N. Katic, K. Kutulakos, and R. Genov, “The Effect of Pinned Photodiode Shape on Time-of-Flight Demodulation Contrast,” IEEE Transactions on Electron Devices, vol. 64, no. 5, pp. 2244–2250, 2017. 97