GPU Day 2023

Name: GPU Day 2023
Start: 2023-05-15T08:20:00+02:00
End: 2023-05-16T23:50:00+02:00
Location: Wigner Datacenter - Wigner Research Centre for Physics

Europe/Budapest timezone

Minimal Path Delay Leading Zero Counters on Xilinx FPGAs

16 May 2023, 09:25

25m

Wigner Datacenter - Wigner Research Centre for Physics

Wigner RCP 1121 Budapest, Konkoly-Thege Miklós rd 29-33, Hungary

Lecture Session V

Speaker: Gregory Morse (Eötvös Loránd University)

We present a more efficient Leading Zero Counter (LZC) for Xilinx FPGAs that improves the path delay while maintaining resource usage, and we generalize the scheme to inputs of any size. We also show how the newly introduced logic elements of the Ultrascale architecture allow better Intellectual Property (IP) solutions for certain forms of this circuit. In addition, we present a detailed framework that could serve as the basis for a methodology for measuring the results of small-scale circuit designs synthesized with high-level synthesis tools. Our results show that very high frequencies are achievable with our design, especially at the sizes where common applications such as floating point addition require them. For 16-, 32- and 64-bit inputs, our real-world build results show path delay improvements of 6%, 14% and 19% respectively, enough to make it possible for large-scale designs to operate close to the maximum frequency supported by the FPGA.

Leading Zero Counters (LZCs) [1] are important for various bit-level tasks, most notably floating point addition and subtraction [2, 3], such as those defined in the IEEE-754 standard. In fact, a classic trick uses the addition/subtraction unit of a floating point unit (FPU) to count leading zeros, exploiting the normalization that follows subtraction together with custom byte-packing [4].
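To make the normalization trick from [4] concrete, here is a minimal Python sketch of one variant of it, based on the "log base 2 of an integer using a 64-bit IEEE float" idea. The function name `lzc32_via_fpu` is hypothetical. It assumes IEEE-754 binary64 doubles (which Python's `float` uses on all common platforms) and handles the all-zero input separately, because zero has no leading one for the FPU to normalize.

```python
import struct

TWO_52 = 4503599627370496.0  # 2**52


def lzc32_via_fpu(v: int) -> int:
    """Hypothetical helper: leading zeros of a 32-bit unsigned int via FP normalization."""
    if not 0 <= v < 1 << 32:
        raise ValueError("v must be a 32-bit unsigned integer")
    # Exponent field 0x433 = 1023 + 52, so this double equals 2**52 + v exactly.
    bits = (0x433 << 52) | v
    d = struct.unpack("<d", struct.pack("<Q", bits))[0]
    # Subtracting 2**52 leaves exactly v; the FPU renormalizes the result,
    # so the biased exponent becomes floor(log2(v)) + 1023.
    d -= TWO_52
    if d == 0.0:
        return 32  # all-zero input: nothing to normalize
    exp = (struct.unpack("<Q", struct.pack("<d", d))[0] >> 52) & 0x7FF
    return 31 - (exp - 1023)
```

For example, `lzc32_via_fpu(1)` yields 31 and `lzc32_via_fpu(0x80000000)` yields 0. On an FPGA this route is rarely attractive, since a full floating point adder is far larger than a dedicated LZC, but it shows why LZC logic sits inside every FP adder's normalization stage.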

More formally, for a bit-vector $X_{n..1}$ we define LZC-$n$ as the ordered pair $(V, C)$, where
$ V = \bigwedge\limits_{k=n}^{1} \overline{X_k} = \overline{X_n} \wedge \overline{X_{n-1}} \wedge \dots \wedge \overline{X_1} $
is the all-zero signal and, with
$ z(i, j)=\left(\bigvee\limits_{k=n-2^{i+j}}^{n-2^{i+j+1}} X_k\right) \vee \left(\bigwedge\limits_{k=n-2^{i+j+1}}^{n-2^{i+j+2}} \overline{X_k} \wedge z(i, j+2) \right) $,
$ C=\parallel_{i=0}^{\lceil\log_2 n\rceil-1} \left( V \vee \left(\bigwedge\limits_{k=n}^{n-2^i} \overline{X_k} \wedge z(i, 0) \right) \right) $
represents the leading zero count as a bit-string, built with the concatenation operator $\parallel$ and defined in Boolean algebra by an unbounded recursive relation. In our notation, $\vee$ and $\wedge$ denote logical OR and logical AND respectively, a bar above a variable denotes logical negation, and $\lceil x \rceil$ is the ceiling function, which rounds $x$ up to the nearest integer.
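A behavioral model helps when checking a gate-level LZC against this definition. The Python sketch below (the names `lzc_reference` and `lzc_groups` are hypothetical) computes $(V, C)$ for an $n$-bit integer whose most significant bit is $X_n$, using $\lceil\log_2 n\rceil$ count bits (at least one). As the $V \vee \dots$ term in $C$ implies, every bit of $C$ is set when the input is all zeros. `lzc_groups` builds each count bit from the same two ingredients as the expression for $C$, an AND of negated leading bits and an OR over a group of bits: bit $i$ is set when the leading one lies in an odd-numbered group of $2^i$ bits (groups counted from the MSB, starting at zero) and every bit before that group is zero. This grouping is an illustrative reading, not a literal transcription of the $z(i, j)$ indices.

```python
def lzc_reference(x: int, n: int) -> tuple[int, int]:
    """Behavioral LZC-n: (V, C) for an n-bit vector x, where bit n-1 of x is X_n."""
    m = max(1, (n - 1).bit_length())  # ceil(log2(n)) count bits, at least one
    v = int(x == 0)
    # V is ORed into every bit of C, so an all-zero input gives C = all ones.
    c = (1 << m) - 1 if v else n - x.bit_length()
    return v, c


def lzc_groups(x: int, n: int) -> tuple[int, int]:
    """Same result, built bit by bit from 2**i-wide groups of input bits."""
    m = max(1, (n - 1).bit_length())
    bits = [(x >> (n - 1 - t)) & 1 for t in range(n)]  # bits[0] is X_n (the MSB)
    v = int(not any(bits))
    c = 0
    for i in range(m):
        w = 1 << i
        bit = 0
        # Odd-numbered groups of width w start at w, 3w, 5w, ...
        for start in range(w, n, 2 * w):
            prefix_zero = not any(bits[:start])        # AND of negated leading bits
            group_has_one = any(bits[start:start + w])  # OR over the group
            bit |= prefix_zero and group_has_one
        c |= (v | bit) << i
    return v, c
```

For instance, `lzc_reference(0b00010000, 8)` gives `(0, 3)` and `lzc_reference(0, 8)` gives `(1, 7)`; exhaustively comparing both functions for small $n$ is a cheap sanity check before writing HDL.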

Although the literature traditionally focuses on power, we have chosen to prioritize performance first, then area, and to minimize power only where doing so does not affect performance or area. Our motivation for high-performance computing (HPC) comes from work on quantum simulation, where more available area allows more concurrency and thus more performance. An investigation into the latest HPC offerings of Ultrascale and Vivado is therefore forthcoming.

Our contribution is thus a more general framework that relies on careful and precise integration of the units of a modular design, which yields a better result. Unsurprisingly, expert re-synthesis of the integrated units of a modular design can improve on the state of the art. The specific ideas and optimizations used are also relevant to a broader range of circuits.

Specifically, we improve on the method introduced by Zahir et al. [5], which uses as the primitive for larger LZC units an LZC-8 built from three LUTs cascading into a LUT6_2. This cascaded LUT structure is necessary for a standalone 7- or 8-bit LZC, but in a larger design its logic can be expanded and reduced into the 16-bit layer; removing the cascade in this way significantly improves the path delay.
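Larger LZC units are commonly assembled hierarchically from smaller ones. As an illustration of that general idea only (a standard two-way merge, not Zahir et al.'s LUT mapping nor the optimized design presented here), the Python sketch below composes power-of-two LZC-$n$ results from LZC-2 leaves; `lzc_merge` and `lzc_tree` are hypothetical names. It keeps the $(V, C)$ convention above, where $C$ is all ones when the input is all zeros.

```python
def lzc_merge(hi: tuple[int, int], lo: tuple[int, int], m: int) -> tuple[int, int]:
    """Merge (V, C) of the upper and lower halves; each half has an m-bit count."""
    v_hi, c_hi = hi
    v_lo, c_lo = lo
    v = v_hi & v_lo
    if v_hi:
        # Upper half is all zeros: count = half width (2**m) + lower count.
        c = (1 << m) | c_lo
    else:
        # Leading one is in the upper half: the new MSB of the count is 0.
        c = c_hi
    return v, c


def lzc_tree(x: int, n: int) -> tuple[int, int]:
    """Hierarchical LZC-n for n a power of two, n >= 2."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    if n == 2:
        a, b = (x >> 1) & 1, x & 1  # a = X_2, b = X_1
        # LZC-2: V = not a and not b; C = not a (also 1 when V = 1).
        return int(not (a or b)), 1 - a
    half = n // 2
    hi = lzc_tree(x >> half, half)
    lo = lzc_tree(x & ((1 << half) - 1), half)
    return lzc_merge(hi, lo, (half - 1).bit_length())
```

Each merge level adds logic depth to the critical path, which is why the way levels are packed into LUTs matters so much for path delay.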

[1] Giorgos Dimitrakopoulos, Kostas Galanopoulos, Christos Mavrokefalidis, and Dimitris Nikolos. Low-power leading-zero counting and anticipation logic for high-speed floating point units. IEEE Trans. Very Large Scale Integr. Syst., 16(7):837–850, July 2008.
[2] H. Suzuki, H. Morinaka, H. Makino, Y. Nakase, K. Mashiko, and T. Sumi. Leading-zero anticipatory logic for high-speed floating point addition. IEEE Journal of Solid-State Circuits, 31(8):1157–1164, 1996.
[3] Pallavi Srivastava, Edwin Chung, and Stepan Ozana. Asynchronous floating-point adders and communication protocols: A survey. Electronics, 9(10), 2020.
[4] Sean Eron Anderson. Bit twiddling hacks. URL: http://graphics.stanford.edu/~seander/bithacks.html, 2005.
[5] Ali Zahir, Anees Ullah, Pedro Reviriego, and Syed Riaz Ul Hassnain. Efficient leading zero count (LZC) implementations for Xilinx FPGAs. IEEE Embedded Systems Letters, 14(1):35–38, 2022.

Primary authors: Gregory Morse (Eötvös Loránd University), Peter Rakyta (Department of Physics of Complex Systems, Eötvös Loránd University), Dr Tamás Kozsik (Eötvös Loránd University)

Presentation materials: MinimalPathDelayLZCPresentation.pdf