Foundations of Metastability-Containing Computation
This talk covers theoretical and practical aspects of designing and analyzing reliable, metastability-containing (MC) circuits for multi-chip systems with multiple clock domains. To allow upset-free communication between the chips, we maintain synchrony between them *without* global machinery that is expensive and power-consuming, e.g., a clock tree. Furthermore, to minimize the dead time of the control loops that regulate the clock speeds, we do not use synchronizers.
To maintain such synchrony, we apply techniques from the theory of distributed systems (TDS). Algorithms of this kind usually involve sampling other chips’ local (analog) oscillators, making a discrete decision, and applying an analog correction to the local oscillator. Unfortunately, such an algorithm cannot be implemented in hardware without causing metastable upsets, which arise from the inability to make meaningful discrete decisions based on analog inputs. A metastable bit is neither 0 nor 1, and thus breaks the Boolean abstraction. Metastable bits may in turn infect subsequent computations, degrading the accuracy of the analog corrections. Hence, we require that metastability be contained, so that it does not infect the entire circuit. This requirement gives rise to a new paradigm for designing such circuits, whose common theme is preserving accuracy when returning to the analog world.
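To make the notion of containment concrete, one common way to model a metastable bit is as a third value M in Kleene's three-valued logic, where a gate outputs M only if its stable inputs do not already determine the result. The sketch below rests on that modeling assumption and is an illustration, not the construction presented in the talk; the names `mux_naive`, `mux_mc`, and `closure` are hypothetical. It compares a textbook multiplexer with a variant that adds the consensus term `a AND b`, and checks both against the "closure" of the Boolean multiplexer, i.e., the output that is stable exactly when every way of resolving the metastable inputs to 0 or 1 gives the same result.

```python
from itertools import product

M = "M"  # assumed third value standing for a metastable bit


def and3(a, b):
    # A stable 0 on either input forces the output to 0.
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return M


def or3(a, b):
    # A stable 1 on either input forces the output to 1.
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return M


def not3(a):
    return M if a == M else 1 - a


def mux_naive(a, b, s):
    # Textbook multiplexer: (a AND NOT s) OR (b AND s).
    return or3(and3(a, not3(s)), and3(b, s))


def mux_mc(a, b, s):
    # Same multiplexer plus the consensus term (a AND b).
    return or3(mux_naive(a, b, s), and3(a, b))


def closure(f, bits):
    # Evaluate the Boolean function f on every resolution of the M inputs;
    # the result is stable only if all resolutions agree.
    options = [(0, 1) if bit == M else (bit,) for bit in bits]
    outputs = {f(*resolved) for resolved in product(*options)}
    return outputs.pop() if len(outputs) == 1 else M


def boolean_mux(a, b, s):
    return b if s else a
```

With both data inputs at 1 and a metastable select, `mux_naive(1, 1, M)` yields M even though either choice would output 1, whereas `mux_mc(1, 1, M)` yields 1. In this model the variant with the consensus term agrees with `closure(boolean_mux, ...)` on all 27 input combinations.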
In the first part of the talk, I will outline a rather generic technique that, given a Boolean function specification, outputs an MC (accuracy-preserving) circuit. In the second part, I will present an MC digital controller for a FIFO link and use (an extension of) it to implement a clock synchronization algorithm. I will compare this implementation to a clock tree distribution network and show that it outperforms the clock tree both in theory and in practice. The clock synchronization algorithm guarantees a local skew of less than a single clock cycle, creating the illusion that the entire chip is a single clock domain and thereby meeting our initial goal of synchrony. All designs in this talk come with mathematically provable guarantees and matching simulated performance.