My beautiful design is failing to meet timing, what can I do to make it happy?

First of all, welcome to the real world! Sometimes FPGA engineers think their job is simply to make timing happy, and while there is an element of truth to this, the good news is that it's not a case of mission impossible or trial and error. (Okay, there might be a little bit of this, but I promise there are techniques that raise the probability of a design passing timing, where the improvement isn't achieved by hitting the implement design button again with fingers and toes crossed.)

So rather than jumping straight into the solutions, let's first understand the all-important why, so we can appreciate how timing can fail.

Why does timing fail?

To answer this question we need to go back to the fundamentals: what elements exist within a typical FPGA design? If we consider the simplest sequential design then we will have a clock, LUTs and FFs. In other words, we have our classical Register Transfer Level (RTL) diagram, as below:

A simple design featuring a 2-input LUT and a FF

Shown above is a simple digital circuit consisting of two FFs driving a 2-input LUT, which then drives a final FF. How could we fail to meet timing in the above circuit? The answer lies in the limitations of a practical FF vs an ideal FF.

If we were to open up the D input of a FF we would see logic gates connected in a specific way to achieve the D flip flop behaviour, where the output signal, Q, only changes state on a defined edge transition, a rising edge for example. Shown below is a VERY simplified representation. In reality there are many more components, such as distributed inductance and additional transistors in the D FF, but these can safely be ignored for understanding this concept. What we have is a 2-input NAND logic gate consisting of R1, C1, Q1, R2, C2, Q2, R3 and C3, which then drives the D flip flop, which we have modelled as a single transistor Q3, input capacitor C4, output resistance R5 and output capacitance C5. (Please note that this is not accurate at all, only helpful for visualising the propagation flow. A real D flip flop has many more transistors; we can look at this in another post.)

To develop intuition, let's assume all capacitors are discharged and the input is going to drive the output from a '0' to a '1' on the next clock edge. To achieve this, signals propagate through input A and input B, charging capacitors C1 and C2. Once these are charged to the Vbe threshold voltage of the transistors, we have a signal flowing down through Q1 and Q2, which then flows through R3, C3, R4 and C4. This begins to charge C4 until Q3 is turned on, which then causes a signal to flow through R5 and C5, finally charging C5 until we measure a '1' at the output. That's a lot of charging! It is hopefully now clear where each of these delays is coming from :)

The key part to understand here is that none of this is instantaneous. The electromagnetic wave itself, while very fast, must still propagate through all of the interconnects and transistors, charging each of the capacitances along the way to turn on the transistors and achieve the desired logic level at the output.
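To put rough numbers on "a lot of charging", here is a minimal Python sketch that treats each node as a first-order RC circuit and adds up the time each one takes to reach the switching threshold. The function name, the three lumped stages and every resistor, capacitor and voltage value are assumptions made up for illustration. They are not taken from the figure or from any real device.

```python
import math


def rc_delay_to_threshold(r_ohms, c_farads, v_supply, v_threshold):
    """Seconds for an RC node charging from 0 V towards v_supply to reach v_threshold."""
    if not 0.0 < v_threshold < v_supply:
        raise ValueError("v_threshold must lie strictly between 0 V and v_supply")
    # First-order RC charging: v(t) = v_supply * (1 - exp(-t / (R * C)))
    return -r_ohms * c_farads * math.log(1.0 - v_threshold / v_supply)


# Hypothetical values, chosen only for illustration (not from any datasheet).
V_SUPPLY = 1.0     # volts
V_THRESHOLD = 0.6  # volts needed to switch the next transistor
STAGES = [
    # (name, resistance in ohms, capacitance in farads)
    ("NAND input node", 1_000.0, 5e-15),
    ("flip flop input node", 2_000.0, 10e-15),
    ("flip flop output node", 1_500.0, 8e-15),
]

stage_delays = [
    (name, rc_delay_to_threshold(r, c, V_SUPPLY, V_THRESHOLD))
    for name, r, c in STAGES
]
# The stages charge one after another, so their delays add up.
total_delay = sum(delay for _, delay in stage_delays)

for name, delay in stage_delays:
    print(f"{name}: {delay * 1e12:.2f} ps")
print(f"total: {total_delay * 1e12:.2f} ps")
```

Each stage only costs picoseconds here, but the delays are cumulative, which is why long chains of logic and routing eat into the clock period.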

In addition to the above we have our old friend clock jitter, where the clock edge does not arrive at exactly the same time each period due to non-linearities and noise sources. So the reality is that we have all of these propagation delays as well as clock jitter to deal with, amongst a few other things such as the environment this circuit is operating in. The result is that we need to allow time for the signal to propagate through each of the interconnects and gates. What would happen if we don't allow enough time for the signal to propagate, and what is considered to be enough? These are key questions for understanding why timing fails, and if we consider an edge case it becomes more apparent.

Suppose the propagation delay through the 2-input LUT shown above is so large that it takes longer than a clock period for the signal on the D input of the FF on the RHS to settle to the desired state. What would the FF see? It would see a voltage which has not settled to the target state, somewhere between a logical '0' and a logical '1', and the FF can end up in the dreaded state of metastability. So what would the FF do? Well, the answer is we can't be certain. It depends on many factors such as the implementation detail, the environment, the interconnect etc., so it may "see" a logical '0' or a logical '1'. If it happens to see the previous state, then this circuit could drive the next stage incorrectly and ultimately deliver an undesired output.
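The "does the signal settle within a clock period" question can be written as a simple budget, usually called setup slack. The sketch below is a simplified version of that sum, under assumptions: the function name and all the delay figures are hypothetical, and it ignores clock skew and the other corrections a real static timing analysis tool applies.

```python
def setup_slack_ns(clock_period_ns, clk_to_q_ns, logic_delay_ns,
                   routing_delay_ns, setup_time_ns, clock_uncertainty_ns=0.0):
    """Return the setup slack in ns; a negative value means the path fails timing."""
    # When the data actually reaches the D input of the capturing FF.
    data_arrival = clk_to_q_ns + logic_delay_ns + routing_delay_ns
    # The latest moment it is allowed to arrive, leaving room for setup time
    # and for clock uncertainty (jitter).
    data_required = clock_period_ns - setup_time_ns - clock_uncertainty_ns
    return data_required - data_arrival


# Hypothetical paths on a 100 MHz clock (10 ns period); all numbers are made up.
comfortable = setup_slack_ns(10.0, 0.5, 3.0, 2.0, 0.3, 0.2)
failing = setup_slack_ns(10.0, 0.5, 7.5, 2.5, 0.3, 0.2)

for name, slack in (("comfortable path", comfortable), ("failing path", failing)):
    verdict = "meets timing" if slack >= 0 else "FAILS timing"
    print(f"{name}: slack {slack:+.2f} ns, {verdict}")
```

Positive slack means the data settles with time to spare. Negative slack is the edge case described above, where the FF samples an input that is still on the move.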

So, to answer the question of why timing fails succinctly: we did not leave enough time for the signal on the input of a FF to settle. We have two distinct time periods to consider, the time before the rising edge and the time after, known as setup time and hold time respectively. To pass timing with confidence we need to keep the input stable for at least the setup time before the clock edge and at least the hold time after it, to prevent metastability.

Setup and hold time for a flip flop
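One way to picture the setup and hold requirement is as a "keep out" window around the clock edge in which the D input must not change. Here is a minimal sketch of that check. The function name and the timing numbers are hypothetical, picked only to show one clean case and one violating case.

```python
def transitions_in_window(data_transition_times_ns, clock_edge_ns,
                          setup_time_ns, hold_time_ns):
    """Return the data transitions that fall inside the setup/hold window."""
    window_start = clock_edge_ns - setup_time_ns  # data must be stable from here...
    window_end = clock_edge_ns + hold_time_ns     # ...until here
    return [t for t in data_transition_times_ns if window_start < t < window_end]


# Hypothetical numbers: clock edge at 10 ns, 0.3 ns setup time, 0.1 ns hold time.
CLOCK_EDGE_NS = 10.0
SETUP_NS = 0.3
HOLD_NS = 0.1

clean = transitions_in_window([4.0, 12.0], CLOCK_EDGE_NS, SETUP_NS, HOLD_NS)
violating = transitions_in_window([4.0, 9.9, 10.05, 12.0],
                                  CLOCK_EDGE_NS, SETUP_NS, HOLD_NS)

print("clean input, offending transitions:", clean)
print("late input, offending transitions:", violating)
```

A transition just before the edge (9.9 ns here) is a setup violation and one just after it (10.05 ns) is a hold violation. Either one leaves the FF sampling an input that is not stable.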

What can we do to prevent timing failures?

Armed with our knowledge from the above, the key lies in using techniques which ensure the input is stable for at least the setup time before the clock edge and the hold time after it. A few solutions which come to mind are:

1. Reduce the amount of logic between the flip flops. (By doing this we have less propagation delay.)

2. Add flip flops between the logic, commonly known as pipelining. (By doing this each flip flop has a smaller propagation delay to handle, but note that this comes at the cost of adding more flip flops and adding latency to the output.)

3. Change the timing optimisation strategy. (A different algorithm may achieve timing, but this should only be considered as a last resort.)
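To see why options 1 and 2 help, here is a rough sketch comparing one long combinational path against the same logic split into two pipeline stages. The function name, the 8 ns of logic and the 1 ns register overhead (clock-to-Q plus setup time) are all assumed values for illustration, not figures from any particular FPGA.

```python
def max_clock_mhz(stage_logic_delays_ns, register_overhead_ns):
    """Highest clock frequency (MHz) at which the slowest stage fits in one period."""
    # The clock can only run as fast as the slowest register-to-register stage.
    worst_stage_ns = max(stage_logic_delays_ns) + register_overhead_ns
    return 1000.0 / worst_stage_ns


REGISTER_OVERHEAD_NS = 1.0  # hypothetical clock-to-Q plus setup time

unpipelined = [8.0]     # all 8 ns of logic between one pair of flip flops
pipelined = [4.0, 4.0]  # same logic, split by one extra rank of flip flops

for name, stages in (("unpipelined", unpipelined), ("pipelined", pipelined)):
    fmax = max_clock_mhz(stages, REGISTER_OVERHEAD_NS)
    print(f"{name}: fmax about {fmax:.1f} MHz, latency {len(stages)} cycle(s)")
```

The pipelined version runs at a noticeably higher clock rate, but the result now appears one cycle later, which is the latency cost mentioned in option 2.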

Armed now with this knowledge I empower you to go forward and make the timing gods happy!

We will take a much deeper dive into each of the above techniques as part of the FPGA and Digital Signal Processing course.