# Stefan Hadjis<sup>1</sup>, Andrew Canis<sup>1</sup>, Jason Anderson<sup>1</sup>, Jongsok Choi<sup>1</sup>, Kevin Nam<sup>1</sup>, Stephen Brown<sup>1</sup>, and Tomasz Czajkowski<sup>‡</sup>

<sup>1</sup>ECE Department, University of Toronto, Toronto, ON, Canada, <sup>‡</sup> Altera Toronto Technology Centre, Toronto, ON, Canada

# Introduction

- **Resource sharing** is an area optimization in high-level synthesis (HLS) in which a single hardware functional unit is shared by multiple operations
- How should resource sharing be adapted for different logic element architectures?
- Evaluated with the LegUp open-source HLS tool, targeting Cyclone II (4-LUTs) vs. Stratix IV (ALUTs)

## www.legup.org

# **Example: 4-bit Adder**

A sample C program performs two additions. Two hardware implementations exist:
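As a rough behavioral model of the two implementations, the sketch below compares an unshared datapath (two adders) with a shared one (a single adder whose inputs are selected by 2-to-1 MUXes over two control steps). The operand names `a`, `b`, `c`, `d` and the function names are assumptions made for illustration; they are not taken from the poster's sample program.

```python
MASK4 = 0xF  # 4-bit datapath: results wrap modulo 16


def add4(x, y):
    """Model of one 4-bit adder functional unit."""
    return (x + y) & MASK4


def unshared(a, b, c, d):
    """Two adders: both sums are computed by separate hardware."""
    return add4(a, b), add4(c, d)


def shared(a, b, c, d):
    """One adder: a 2-to-1 MUX on each input picks the operands per step."""
    results = []
    for sel in (0, 1):            # sel models the control step (FSM state)
        x = c if sel else a       # MUX on the first adder input
        y = d if sel else b       # MUX on the second adder input
        results.append(add4(x, y))
    return tuple(results)


# Both implementations compute the same values; they differ only in area
# (one adder plus MUXes versus two adders) and in when each sum is ready.
example_unshared = unshared(3, 4, 9, 8)
example_shared = shared(3, 4, 9, 8)
```

Whether the shared version is actually smaller depends on how the adder and its MUXes pack into the target device's LUTs, which is the question the poster investigates.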



How does Quartus map these to LUTs on Cyclone II and Stratix IV?







# Impact of FPGA Architecture on Resource Sharing in High-Level Synthesis

# **Single Operator Sharing Results**

Which operators reduce area when shared?

## Cyclone II

- Dividers
- Modulus
- Multipliers
- **Barrel Shifters**

# Pattern Sharing Algorithm

- Main area reduction comes from sharing patterns of smaller operators
- Patterns are **Data Flow Graphs** with a single root "output" node

## (1) **Discover Computational Patterns** in the Software Program

Accomplished by walking LegUp's DFG
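A minimal sketch of what such a walk could look like, assuming the DFG is available as a plain dictionary mapping each operator node to its operation and operand names. This representation, the `max_ops` size limit, and the function names are hypothetical stand-ins, not LegUp's actual data structures. Patterns are encoded as nested tuples with `"in"` marking an external input. The sketch also ignores fan-out from nodes inside a pattern, which a real implementation would need to check.

```python
from itertools import product


def patterns_at(dfg, node, max_ops):
    """Yield (pattern, n_ops) for every pattern rooted at `node`
    that uses at most `max_ops` operators."""
    op, operands = dfg[node]

    def options(operand, budget):
        # An operand can always be cut off as an external input ...
        opts = [("in", 0)]
        # ... or, if it is itself an operator, absorbed into the pattern.
        if operand in dfg and budget > 0:
            opts.extend(patterns_at(dfg, operand, budget))
        return opts

    choices = [options(o, max_ops - 1) for o in operands]
    for combo in product(*choices):
        n_ops = 1 + sum(count for _, count in combo)
        if n_ops <= max_ops:
            yield (op,) + tuple(pat for pat, _ in combo), n_ops


def discover(dfg, max_ops=2):
    """Map each discovered pattern to the list of root nodes where it occurs."""
    found = {}
    for node in dfg:
        for pattern, _ in patterns_at(dfg, node, max_ops):
            found.setdefault(pattern, []).append(node)
    return found


# Hypothetical DFG: t2 = (a + b) - c and t4 = (d + e) - f
dfg = {
    "t1": ("add", ["a", "b"]),
    "t2": ("sub", ["t1", "c"]),
    "t3": ("add", ["d", "e"]),
    "t4": ("sub", ["t3", "f"]),
}
found = discover(dfg, max_ops=2)
```

In this example the two-operator pattern "add feeding the first input of sub" is found at both `t2` and `t4`, making those two occurrences candidates for sharing.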



## (2) Group together equivalent patterns

- Patterns are grouped by graph isomorphism, so that patterns differing only in the operand order of commutative operators are treated as equivalent
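One way to sketch this grouping is to rewrite every pattern into a canonical form in which the operands of commutative operators are sorted, then bucket patterns by that form. The nested-tuple encoding (operator name followed by operands, with `"in"` for an external input), the `COMMUTATIVE` set, and the function names below are assumptions made for illustration; the poster does not specify how LegUp tests for isomorphism.

```python
# Operators whose operands may be swapped without changing the result
# (an assumed set, for illustration only).
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def canonical(pattern):
    """Return a canonical form: operands of commutative ops are sorted."""
    if pattern == "in":                      # external input leaf
        return pattern
    op, *children = pattern
    kids = [canonical(child) for child in children]
    if op in COMMUTATIVE:
        kids.sort(key=repr)                  # any fixed total order works
    return (op, *kids)


def group(patterns):
    """Bucket named patterns by their canonical form."""
    groups = {}
    for name, pattern in patterns.items():
        groups.setdefault(canonical(pattern), []).append(name)
    return groups


patterns = {
    "p1": ("add", ("sub", "in", "in"), "in"),   # (x - y) + z
    "p2": ("add", "in", ("sub", "in", "in")),   # z + (x - y): same as p1
    "p3": ("sub", ("add", "in", "in"), "in"),   # (x + y) - z
    "p4": ("sub", "in", ("add", "in", "in")),   # z - (x + y): differs from p3
}
groups = group(patterns)
```

Here `p1` and `p2` land in the same group because addition is commutative, while `p3` and `p4` stay separate because subtraction is not.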

## (3) Pairing Patterns for Sharing

- Pair patterns to be implemented using the same hardware. Consider:
  1. Operator bitwidths
  2. Variable lifetimes
  3. Shared input variables
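The three criteria above could be combined into a pairing heuristic along the following lines. This is only a sketch under stated assumptions: the `Instance` record, the scoring weights, and the greedy strategy are all hypothetical and are not the cost function used in LegUp. It assumes every instance passed in already belongs to the same group of equivalent patterns.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Instance:
    name: str
    width: int            # operator bitwidth
    lifetime: tuple       # (first_cycle, last_cycle) the output is live
    inputs: frozenset     # names of the input variables


def overlaps(a, b):
    """True if the two output lifetimes share at least one cycle."""
    return a.lifetime[0] <= b.lifetime[1] and b.lifetime[0] <= a.lifetime[1]


def score(a, b):
    """Higher is better. The weights are arbitrary placeholders."""
    s = -abs(a.width - b.width)          # 1. prefer similar bitwidths
    if not overlaps(a, b):
        s += 1                           # 2. prefer independent lifetimes
    s += 2 * len(a.inputs & b.inputs)    # 3. prefer shared input variables
    return s


def pair_up(instances):
    """Greedily pick the best-scoring disjoint pairs."""
    ranked = sorted(combinations(instances, 2),
                    key=lambda pair: score(*pair), reverse=True)
    used, pairs = set(), []
    for a, b in ranked:
        if a.name not in used and b.name not in used:
            pairs.append((a.name, b.name))
            used.update((a.name, b.name))
    return pairs


instances = [
    Instance("p1", 32, (0, 2), frozenset({"a", "b"})),
    Instance("p2", 32, (3, 5), frozenset({"a", "c"})),
    Instance("p3", 8, (1, 4), frozenset({"d", "e"})),
    Instance("p4", 10, (6, 7), frozenset({"d", "f"})),
]
pairs = pair_up(instances)
```

With these made-up instances, `p1` and `p2` pair first (equal widths, a shared input, disjoint lifetimes), followed by `p3` and `p4`.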



Shared Input Variables (Reduces Area)

| Stratix IV             |
|------------------------|
| Dividers               |
| Modulus                |
| Multipliers            |
| Barrel Shifters        |
| Add/Subtract           |
| Bitwise (OR, XOR, AND) |







## Two equivalent graphs due to commutativity



Independent Variable Lifetimes

# **Pattern Sharing Results**

- Area results for 13 benchmarks using three increasing levels of sharing:
  - 3% reduction (geomean across all benchmarks) from sharing div/mod
  - 4.2% additional reduction due to pattern sharing
  - 16% reduction using LUT-based multipliers
- While sharing larger patterns produces greater area reduction, the major factor is LUT underutilization
  - Allows MUXes to be incorporated into the same LUTs as the operator
  - **Registers** prevent an efficient mapping of operators into LUTs
- Geomean reduction in Fmax due to Pattern Sharing is 4% across all benchmarks

## Summary

- Logic element architecture significantly impacts resource sharing
  - >10% area reduction in some circuits
- Future work: altering the scheduling phase of HLS to favor the creation of patterns, providing further sharing opportunities