High-level synthesis for low-power design

General purpose DSPs typically used to implement speech coders in digital cellular phones do not allow enough exploitation of the speech coding algorithm itself for power reduction. In this paper, high-level design synthesis of a low power, VLIW (very long instruction word) processor dedicated to implementing the IS-54 VSELP speech encoding algorithm is presented. Significant power reduction is achieved through algorithm dependent techniques, including application specific hardware design, supply voltage reduction through highly parallel execution, and exploitation of data correlation inherent to the algorithm. Preliminary estimates indicate that the design could result in a 5.35 mm² processor that executes in real-time with an average power dissipation of about 28 mW.

The high-level synthesis flow (Source: Forte Design Systems)

High-level design synthesis of a low power, VLIW processor for the IS-54 VSELP speech encoder

David Pursley is Director of Product Marketing for . He previously held various positions as a field applications engineer, technical marketing engineer, marketing manager, and product line manager in the fields of electronic design automation and embedded computer technology. David would welcome readers' comments on this and other articles in the high-level synthesis for low power series and on HLS generally. You can contact him at dpursley AT ForteDS DOT com.

This article continues our series on high-level synthesis (HLS) for low power design. Part Two details how HLS helps you make and evaluate architectural decisions.

Engineers know that the decisions with the most impact on power are those taken early in the design process, at the architectural level. They also likely know that high-level synthesis (HLS) allows them to quickly generate hardware from an architectural description, and so to rapidly assess the impact of architectural and algorithmic changes on power, performance, and area at that early stage. However, this is only the tip of the iceberg when it comes to exploiting HLS for low power design.

HLS tools can optimize a design for power in ways that are at best difficult when writing register transfer level (RTL) code by hand. HLS incorporates knowledge of both a design's specific algorithms and their hardware implementations. As a result, it helps a design team make many of the best power decisions when creating the RTL code. Some optimizations enhance the effectiveness of strategies typically used in a low power flow, such as clock gating and multiple threshold voltages. Others, such as power-aware sharing and mapping, further reduce power consumption by changing the microarchitecture altogether.

Combined, these optimizations can have a major impact across a broad range of designs. Figure 1 shows some examples, all from real commercial designs. The numbers are design- and test vector-dependent, but nevertheless show how HLS delivers very strong results.

The following section describes the core HLS flow and how it can be leveraged for low power design. Future articles will go into more detail on leveraging HLS when creating and optimizing a design for low power. Specifically, they will address the following aspects of low power design with HLS.

It is important for designers to understand the behavior of the hardware when loop merging does and does not take place, so that there are no unexpected results. Example 4-22 shows the case where two sequential loops are used to separately accumulate two four-element arrays.

Example 4-22. Independent Sequential Loops

High-level synthesis can automatically merge these loops because there are no dependencies between the loops and the indexing of the arrays is based solely on the loop iterators. With loops left rolled and automatically merged, and the main loop pipelined with II=1, the resulting schedule looks like that shown in Figure 4-35.

Figure 4-35. Schedule of Merged Sequential Loops

The schedule shown above indicates that the loop iterations in each of the ACCUM loops can be run at the same time, resulting in a design that has two accumulators and runs in four clock cycles (Figure 4-36). If this kind of performance and increase in area is not required, automatic loop merging can be disabled during synthesis, allowing the loops to execute sequentially. This is discussed in the next section.
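The merging described above can be sketched in plain C++. This is only an illustration under stated assumptions: it is not the original Example 4-22 listing, and every name in it (`kNumElems`, `accumulate_sequential`, `accumulate_merged`, `check_merge_is_equivalent`) is hypothetical. The first function holds two independent loops of the kind the text describes; the second is a hand-written equivalent of the merged form, with both accumulators updated in the same iteration.

```cpp
#include <cassert>

// All names are hypothetical; this is not the book's Example 4-22 listing.
constexpr int kNumElems = 4;

// Two independent sequential loops, one accumulator per array.
// Neither loop reads anything the other writes, and each array is
// indexed only by its own loop iterator.
void accumulate_sequential(const int a[kNumElems], const int b[kNumElems],
                           int &sum_a, int &sum_b) {
    sum_a = 0;
    sum_b = 0;
    for (int i = 0; i < kNumElems; ++i) {  // first ACCUM loop
        sum_a += a[i];
    }
    for (int j = 0; j < kNumElems; ++j) {  // second ACCUM loop
        sum_b += b[j];
    }
}

// Hand-merged equivalent: a single loop in which both accumulators
// advance in the same iteration, mirroring the two-accumulator hardware.
void accumulate_merged(const int a[kNumElems], const int b[kNumElems],
                       int &sum_a, int &sum_b) {
    sum_a = 0;
    sum_b = 0;
    for (int i = 0; i < kNumElems; ++i) {
        sum_a += a[i];
        sum_b += b[i];
    }
}

// Checks that both forms produce identical sums for one sample input.
void check_merge_is_equivalent() {
    const int a[kNumElems] = {1, 2, 3, 4};
    const int b[kNumElems] = {10, 20, 30, 40};
    int seq_a = 0, seq_b = 0, mrg_a = 0, mrg_b = 0;
    accumulate_sequential(a, b, seq_a, seq_b);
    accumulate_merged(a, b, mrg_a, mrg_b);
    assert(seq_a == mrg_a && seq_b == mrg_b);
}
```

The merged form performs four loop iterations instead of eight, which is where the shorter schedule comes from; the price is that both additions must be available in the same iteration, hence the second accumulator.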

Figure 4-36. Hardware of Merged Sequential Loops

In some instances sequential loops are not automatically merged. This can occur intentionally, because the design does not require the extra performance that merging provides (usually at the cost of higher area), or because there are dependencies between the loops that break loop merging optimizations. Other operations, such as conditional index assignment for reading or writing an array, can also prevent loop merging optimizations. In either case the result is a design with both longer latency and a longer throughput interval.
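A dependency that blocks merging can be illustrated with a small C++ sketch. The example and all of its names (`kLen`, `scale_sequential`, `scale_naively_merged`, `check_dependency_blocks_merge`) are assumptions made for illustration, not taken from the source. Here the second loop needs the final total from the first loop, so fusing the two loop bodies by hand changes the result.

```cpp
#include <cassert>

// Hypothetical example; names and data are illustrative only.
constexpr int kLen = 4;

// Correct sequential form: the second loop uses the FINAL value of
// total, which is only known after the first loop has finished.
void scale_sequential(const int a[kLen], const int b[kLen], int out[kLen]) {
    int total = 0;
    for (int i = 0; i < kLen; ++i) {
        total += a[i];
    }
    for (int j = 0; j < kLen; ++j) {
        out[j] = b[j] * total;
    }
}

// Naive merge of the two bodies: each iteration sees only a partial
// total, so the outputs differ from the sequential form.
void scale_naively_merged(const int a[kLen], const int b[kLen], int out[kLen]) {
    int total = 0;
    for (int i = 0; i < kLen; ++i) {
        total += a[i];
        out[i] = b[i] * total;  // reads a running sum, not the final one
    }
}

// Demonstrates that the dependency makes the naive merge incorrect.
void check_dependency_blocks_merge() {
    const int a[kLen] = {1, 2, 3, 4};
    const int b[kLen] = {1, 1, 1, 1};
    int seq[kLen] = {0, 0, 0, 0};
    int mrg[kLen] = {0, 0, 0, 0};
    scale_sequential(a, b, seq);
    scale_naively_merged(a, b, mrg);
    assert(seq[0] == 10);  // final total applied to every element
    assert(mrg[0] == 1);   // partial total leaks into the output
}
```

Because the two forms disagree, a tool that preserves the source behavior has to leave loops like these sequential, which is one way the longer schedule described above arises.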

We have given an overview of the high-level synthesis flow and described specifically what it offers during power optimization. The next installment provides a detailed example of how to use HLS to optimize and quantify architectural decisions, the decisions that will have the most impact. Later installments will outline how HLS can increase the value engineers get from their existing low power tools, and discuss optimizations that could only be implemented via HLS.

At the highest level, high-level synthesis is used to synthesize an algorithmic description into an RTL implementation. In a real production flow, engineers don't want to create just "an" RTL implementation. Instead, they need an implementation optimized for their architecture and design constraints. Figure 2 shows an abstracted version of the HLS flow and how it takes an untimed architectural description of the system, together with design constraints, and creates an optimized, verified RTL implementation. The graphic describes the flow in terms of the HLS tool from Forte Design Systems, but is essentially generic. Note that the HLS flow sits on top of the existing RTL flow. It does not replace the RTL flow. It does mean, however, that an engineer spends more time designing and less time writing blocks, sensitivity lists, and finite state machines. It also means that the days of debugging handshakes are over. That, though, is a discussion for another day. Having HLS on the front-end of the RTL flow entails a few things, but three in particular.