Full Record

Author | Campbell, Keith A |

Title | Low-cost error detection through high-level synthesis |

URL | http://hdl.handle.net/2142/89068 |

Publication Date | 2015 |

Date Accessioned | 2016-03-02 19:34:35 |

Degree | MS |

Discipline/Department | Electrical & Computer Engineering |

Degree Level | thesis |

University/Publisher | University of Illinois – Urbana-Champaign |

Abstract | System-on-chip design is becoming increasingly complex as technology scaling enables more and more functionality on a chip. This scaling and complexity has resulted in a variety of reliability and validation challenges including logic bugs, hot spots, wear-out, and soft errors. To make matters worse, as we reach the limits of Dennard scaling, efforts to improve system performance and energy efficiency have resulted in the integration of a wide variety of complex hardware accelerators in SoCs. Thus the challenge is to design complex, custom hardware that is efficient, but also correct and reliable. High-level synthesis shows promise to address the problem of complex hardware design by providing a bridge from the high-productivity software domain to the hardware design process. Much research has been done on high-level synthesis efficiency optimizations. This thesis shows that high-level synthesis also has the power to address validation and reliability challenges through two solutions. One solution for circuit reliability is modulo-3 shadow datapaths: performing lightweight shadow computations in modulo-3 space for each main computation. We leverage the binding and scheduling flexibility of high-level synthesis to detect control errors through diverse binding and minimize area cost through intelligent checkpoint scheduling and modulo-3 reducer sharing. We introduce logic and dataflow optimizations to further reduce cost. We evaluated our technique with 12 high-level synthesis benchmarks from the arithmetic-oriented PolyBench benchmark suite using FPGA emulated netlist-level error injection. We observe coverages of 99.1% for stuck-at faults, 99.5% for soft errors, and 99.6% for timing errors with a 25.7% area cost and negligible performance impact. 
Leveraging a mean error detection latency of 12.75 cycles (4150x faster than end result check) for soft errors, we also explore a rollback recovery method with an additional area cost of 28.0%, observing a 175x increase in reliability against soft errors. Another solution for rapid post-silicon validation of accelerator designs is Hybrid Quick Error Detection (H-QED): inserting signature generation logic in a hardware design to create a heavily compressed signature stream that captures the internal behavior of the design at a fine temporal and spatial granularity for comparison with a reference set of signatures generated by high-level simulation to detect bugs. Using H-QED, we demonstrate an improvement in error detection latency (time elapsed from when a bug is activated to when it manifests as an observable failure) of two orders of magnitude and a threefold improvement in bug coverage compared to traditional post-silicon validation techniques. H-QED also uncovered previously unknown bugs in the CHStone benchmark suite, which is widely used by the HLS community. H-QED incurs less than 10% area overhead for the accelerator it validates with negligible performance impact, and we also introduce techniques to minimize any possible intrusiveness introduced by H-QED. |

Subjects/Keywords | High-level synthesis; Automation; error detection; scheduling; binding; compiler transformation; compiler optimization; pipelining; modulo arithmetic; logic optimization; state machine; datapath; control logic; shadow logic; low cost; high performance; electrical bugs; Aliasing; stuck-at faults; soft errors; timing errors; checkpointing; rollback; recovery; post-silicon validation; Accelerators; system on a chip; signature generation; execution signatures; execution hashing; logic bugs; nondeterministic bugs; masked errors; circuit reliability; hot spots; wear out; silent data corruption; observability; detection latency; mixed datapath; diversity; checkpoint corruption; error injection; error removal; Quick Error Detection (QED); Hybrid Quick Error Detection (H-QED); hybrid hardware/software; execution tracing; address conversion; undefined behavior; High-Level Synthesis (HLS) engine bugs; detection coverage |

Contributors | Chen, Deming (advisor) |

Language | en |

Rights | Copyright 2015 Keith A. Campbell |

Country of Publication | us |

Record ID | handle:2142/89068 |

Repository | uiuc |

Date Indexed | 2020-03-09 |

Grantor | University of Illinois at Urbana-Champaign |

Issued Date | 2015-12-08 00:00:00 |

Sample Search Hits

…source code in this paper) as well as bugs in the implementation caused by
the HLS tool.
CHAPTER 3
ERROR DETECTION THROUGH
MODULO-3 SHADOW DATAPATHS
In this chapter, I propose creating a redundant, but smaller “shadow” datapath based on modulo…

…arithmetic to detect reliability problems in an HLS
design’s main datapath. I automate the creation of this “shadow” datapath
through a series of modulo-3 shadow datapath HLS transformations. Our
main innovations are:
1. Intelligent scheduling of intermediate…

…Section 3.2 discusses our experimental setup and results.
3.1 Method
Our approach to protecting a hardware design is a series of modulo-3 shadow
datapath HLS transformations. An overview of how these transformations
fit into the HLS process is…

…[Figure residue: HLS flow diagram. Labels as they appear: Scheduler (LegUp); Scheduled CDFG; Modulo-3 Transform; Scheduled CDFG + Shadow Datapath; Optimization Passes; Shadow Functional Units; %3; Register Checkers; Binder (in-house); Verilog RTL; error.]…

…mod-3 transform. The original datapath is colored black/white and
the shadow datapath is in blue.
Figure 3.1b provides an overview of our basic modulo-3 shadow datapath
transformation. For each input port, we add a mod-3 reducer to compute
the input…

…value mod-3 residue, effectively creating a shadow mod-3 input.
For each arithmetic functional unit (e.g. add, subtract, multiply), we add
a corresponding shadow mod-3 functional unit. For each datapath flip-flop,
we add a corresponding 2…
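
The residue arithmetic behind this transformation can be sketched in Python. This is an illustrative software model, not the thesis's hardware or tool flow: the names `mod3_reduce` and `run_with_shadow`, the 32-bit input width, and the `fault_mask` parameter are assumptions made for the example, and fixed-width overflow is deliberately ignored.

```python
import operator

def mod3_reduce(value: int, width: int = 32) -> int:
    """Residue mod 3 of an unsigned `width`-bit value (hypothetical reducer model).

    Because 4 is congruent to 1 mod 3, the sum of the base-4 digits
    has the same residue as the value itself.
    """
    total = 0
    for shift in range(0, width, 2):
        total += (value >> shift) & 0b11  # one 2-bit digit
    return total % 3

# Main functional units operate on full-width values.
MAIN_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

# Shadow functional units operate only on 2-bit residues.
SHADOW_OPS = {
    "add": lambda ra, rb: (ra + rb) % 3,
    "sub": lambda ra, rb: (ra - rb) % 3,
    "mul": lambda ra, rb: (ra * rb) % 3,
}

def run_with_shadow(op: str, a: int, b: int, fault_mask: int = 0):
    """Run one operation on both datapaths and compare residues.

    `fault_mask` (an assumption of this sketch) flips bits of the main
    result to model an injected error. Returns (main_result, error_flag).
    """
    main = MAIN_OPS[op](a, b) ^ fault_mask
    shadow = SHADOW_OPS[op](mod3_reduce(a), mod3_reduce(b))
    # In hardware the main result would pass through another reducer;
    # Python's % is used here so negative results are handled too.
    error = (main % 3) != shadow
    return main, error

print(run_with_shadow("mul", 7, 5))                   # fault-free
print(run_with_shadow("add", 7, 5, fault_mask=0b01))  # single-bit flip
print(run_with_shadow("add", 7, 5, fault_mask=0b11))  # aliased error
```

In this model a single-bit flip changes the result by a power of two, which is never a multiple of 3, so the check always fires. The last call flips two bits and shifts the result from 12 to 15, a difference of 3, so the residues still agree and the error goes undetected. That is the aliasing case listed among the record's keywords.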

…transformations, as illustrated in Figure 3.1 on page 20, consist
of a core mod-3 transform that generates the shadow datapath as well as
some dataflow-level optimization passes on the generated mod-3 logic. Our
transformations operate on a scheduled control/data…

…mixed arithmetic-nonarithmetic datapaths, the
scheduling of intermediate register consistency checks for maximum coverage
with optimized sharing, pipelining for deferred shadow datapath scheduling
to eliminate clock period overhead and lower area cost…