Employing Fpga Dsp Blocks for Time-To-Digital Conversion
Total Page:16
File Type:pdf, Size:1020Kb
Metrol. Meas. Syst., Vol. 26 (2019) No. 4, pp. 631–643 DOI: 10.24425/mms.2019.130570 METROLOGY AND MEASUREMENT SYSTEMS Index 330930, ISSN 0860-8229 www.metrology.pg.gda.pl EMPLOYING FPGA DSP BLOCKS FOR TIME-TO-DIGITAL CONVERSION Paweł Kwiatkowski Military University of Technology, Faculty of Electronics, Gen. S. Kaliskiego 2, 00-908 Warsaw, Poland (B [email protected], +48 261 839 602) Abstract The paper presents a novel implementation of a time-to-digital converter (TDC) in field-programmable gate array (FPGA) devices. The design employs FPGA digital signal processing (DSP) blocks and gives more than two-fold improvement in mean resolution in comparison with the common conversion method (carry chain-based time coding line). Two TDCs are presented and tested depending on DSP configuration. The converters were implemented in a Kintex-7 FPGA device manufactured by Xilinx in 28 nm CMOS process. The tests performed show possibilities to obtain mean resolution of 4.2 ps but measurement precision is limited to at most 15 ps mainly due to high conversion nonlinearities. The presented solution saves FPGA programmable logic blocks and has an advantage of a wider operation range when compared with a carry chain-based time coding line. Keywords: time-to-digital converter, time coding line, time interval counter, digital signal processing, field- programmable gate array. © 2019 Polish Academy of Sciences. All rights reserved 1. Introduction Precise time-to-digital conversion is widely used in many different areas. Particle and nu- clear physics [1, 2], laser range finding [3], ultrasonic flow measurement [4], positron emission tomography [5] or time and frequency distribution [6] are just merely selected examples where it is applicable. Electronic circuits that perform this task are called time-to-digital converters (TDCs). The TDCs are widely implemented with the use of field-programmable gate array (FPGA) technology. The main advantages of this technology are lower cost and shorter time-to- market comparing with the use of application-specific integrated circuits (ASICs). In the past the FPGA was defined as a device that is built from three main types of elements, i:e. (1) I/O blocks, (2) a matrix of programmable logic blocks and (3) routing resources placed between them. In gen- eral, it is still true but nowadays there can be found in it lots of additional dedicated functional blocks, such as phase or delay locked loops (PLL or DLL, respectively), serializers/deserializers, digital signal processing (DSP) units, hard processors, memory blocks, etc. Some of them greatly facilitate the implementation of TDCs, e:g. PLL/DLL can easily generate a multiphase clock [7], Copyright © 2019. The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution- NonCommercial-NoDerivatives License (CC BY-NC-ND 3.0 https://creativecommons.org/licenses/by-nc-nd/3.0/), which permits use, dis- tribution, and reproduction in any medium, provided that the article is properly cited, the use is non-commercial, and no modifications or adaptations are made. P. Kwiatkowski: EMPLOYING FPGA DSP BLOCKS FOR TIME-TO-DIGITAL CONVERSION a block memory is often used to store calibration data [8] and a hard processor can process raw measurement data as well as provides common communication interfaces [9]. In the literature a vast amount of works related to FPGA TDC implementation can be found in recent years [5, 8–25]. The vital element of most high-performance TDCs are picosecond- resolution tapped delay lines [26]. A time coding line (TCL) directly converts a single delay buffer propagation time into measurement resolution. This design is commonly implemented in FPGAs with the use of carry chain elements that are often the core part of a programmable logic block [27]. Several methods were proposed to deal with carry chain-based TCL conversion nonlinearities as well as to improve resolution beyond buffer propagation time, e:g. a tuned delay line [11, 22], multiple TCLs [10, 13] or a wave union [12, 24, 28]. A Vernier method employs two delay lines with slightly different propagation times and can be implemented using, i:a., FPGA routing resources [29, 30]. A variant of the method is the one using two oscillators of different frequencies [15]. The difference between low-to-high and high-to-low delay element propagation times is used in the pulse shrinking method [21, 31]. The mentioned methods are vulnerable to process, voltage and temperature (PVT) variations, but can provide outstandingly low mea- surement uncertainty. When PVT resistance or logic resource saving is expected more than the picosecond resolution and measurement uncertainty, the PLL can be used to create a multiphase clock. The multiphase clock is then used in a user-defined circuit implemented in programmable logic to perform time-to-digital conversion [7, 20, 32]. Similar functionality is provided also by another dedicated block, the serializer/deserializer, that is commonly implemented as a part of FPGA I/O buffers [33–36]. Regardless a vast amount of FPGA-based TDC circuits the use of DSP blocks in this context is a new concept. The DSP blocks contain, i:a., fast adders whose output can be directly connected to registers. Thus, this structure seems to be suitable to implement a high-resolution TDC. First works regarding this topic have been published very recently [37, 38] and this paper presents the results obtained in independently and simultaneously performed research. 2. DSP slice configuration A typical DSP operation is called multiply-and-accumulate (MAC). Such an operation com- putes the product of two numbers and adds that product to the accumulator. In hardware it is implemented as a multiplier followed by an adder and a register. In an FPGA device the MAC operation can be performed using programmable logic blocks. However, this solution either con- sumes a lot of logic resources or provides low processing speed. The response of the main FPGA device manufacturers (mainly Xilinx and Intel, formerly Altera) to fast signal processing was to place dedicated DSP blocks in a chip. Modern FPGA devices can contain even thousands of DSP blocks that can be either connected in series to perform operations on wide data buses or work in parallel to speed up the processing. While the detailed DSP block structure may differ between manufacturers and chip generations, the core elements such as the multiplier, adder and register are rather common. The typical digital way of TDC implementation employs a tapped delay line and a register and is known as the time coding line (TCL). One signal (start) propagates along the delay line and the other (stop) causes latching of the register flip-flops states. The first signal reaches subsequent register flip-flop data inputs with the delay equal to a single delay line buffer propagation time. Thus, the number of flip-flop outputs that store logic states “1” is proportional to the time difference between active edges of start and stop signals. The result is then represented in the unary code (thermometer code). However, in modern TCL solutions, especially implemented in 632 Metrol. Meas. Syst.,Vol. 26 (2019), No. 4, pp. 631–643 DOI: 10.24425/mms.2019.130570 FPGA devices, it is hard to achieve strictly this code [11, 13]. Looking carefully at different tapped delay line architectures used in TDCs [25] one may conclude that it is important to obtain a possibly high number of output codes whose occurrence is evenly spread in the converter measurement range. Such reasoning has led to the creation of, e:g., the BOUNCE architecture [39]. Therefore, a delay line can be implemented as any kind of one-input multiple-output combinational circuit that directly transmits an input signal to outputs but with different propagation times for each output tap. Following this reasoning, a binary adder with one data input set to the maximum value (logic states “1. 1”) and another one connected to the start signal (one bit that changes from logic state “0” to “1”) is also applicable as a delay line. Thus, the fast DSP adders that can be directly connected to the registers look perfect for the TCL implementation. In Kintex-7 FPGA, manufactured by Xilinx, the basic DSP element is called the DSP48E1 slice [40]. Its simplified structure along with configuration as a TDC is presented in Fig. 1. Each DSP slice has four data inputs (A, B, C and D) and one data output (P), a 25-bit pre-adder, a 25×18 multiplier and a 48-bit arithmetic logic unit (ALU). To speed up DSP operation pipeline registers are placed after each combinational circuit as well as at the data inputs. Several DSP slices can be cascaded using dedicated connection paths. While there are two data-paths and three DSP slice outputs that provide the cascade capability, only one path (ACIN-ACOUT) enables to connect several DSP slice adders or ALUs and bypass pipeline registers (the other connections are not shown in Fig. 1. for simplicity). Thus, this path is suitable to implement a long-range delay line for the start pulse. The DSP slice configuration is supported by software tools, such as Xilinx LogicCORE DSP48 Macro, making TDC implementation convenient. a) b) Fig. 1. A simplified DSP48E1 slice [40] configuration as a TDC with the use of ALU (a) or a pre-adder (b). In this paper two DSP slice configurations are considered as TDC. In both configurations the start signal is connected to the first bit of data port A and then propagates along the ACIN-ACOUT path to neighbouring DSP slices. In the first configuration (Fig. 1a) the ALU operates as a 48-bit adder and all nets of data port C are set to logic high (“1”).