# A Better-Than-Worst-Case Robustness Measure Stefan Frehse Görschwin Fey Rolf Drechsler Institute of Computer Science University of Bremen 28359 Bremen, Germany {sfrehse,fey,drechsle}@informatik.uni-bremen.de Abstract—In presence of increasing soft error rates due to shrinking feature sizes, design tools are required to analyze fault tolerance and robustness of circuits. Here, we propose a new measure that identifies hot-spots in the design. On the one hand the measure is more accurate than a "worst-case analysis" that ignores excitation probabilities. On the other hand the computation of the new measure is more efficient than a "probabilistic analysis" that considers excitation probabilities at the cost of a higher computational complexity. Both of these extremes can be embedded in the new measure. Experimental results on circuits with protection against soft errors show that the new measure can be calculated effectively. #### I. Introduction Continuously shrinking feature sizes of digital circuits allow for the integration of more and more components in a single chip. Shrinking feature sizes have several positive side effects, e.g., low-power circuitry operating at high frequency. But there are substantial drawbacks as well. Manufacturing failures and transient faults may increasingly tamper the functionality, consequently the *Soft Error Rate* (SER) increases. Precautions against soft errors are taken at different levels, e.g., architectural level, algorithmic level, or layout level [1]. But the implementation of these techniques has to be verified. During implementation bugs may be introduced. Thus, checking the fault tolerance of a given implementation early in the design flow becomes an important step in the design process. Several formal and non-formal verification methods have been applied for this purpose. Non-formal methods, like simulation or emulation [2] perform fault injection to check a limited number of simulation traces whether those lead to faulty behavior. Due to the simulation-based nature, this is a fast but not a complete method with respect to all potential scenarios or all faults. Formal methods can cover all inputs and any fault covered by the given fault model. Various methods have been proposed based on symbolic methods using *Binary Decision Diagrams* (BDDs) [3]–[6] or *Boolean Satisfiability* (SAT) [7]–[9]. The approaches of [6], [8], [9] are most tightly related to our approach. Based on the *Stuck-At Fault Model* (SAFM) the method in [6] computes the probability for transient faults to be propagated to primary outputs of combinational circuits. The given theoretical extension to sequential circuits is too complex to be applied in practice as the analysis relies on BDDs. Internally, all test patterns are calculated for each fault to be considered. By this, the probability of applying a testpattern for a certain This work has been funded in part by DFG grants DR 287/19-1 and FE 797/5-1. fault can be calculated. In the following we refer to [6] as a *probabilistic approach*. In [10] the authors proposed an alternative propabilistic approach to approximate the signal probabilities for combinational circuits. The work in [9] analyzes the *self-checking fault secureness* [11] for combinational circuits and provides a robustness measure with a lower and an upper bound for the SAFM. This method analyzes a circuit model by using ATPG-algorithms. For the analysis assumptions on the environment and the checker functionality of the circuit are required. In [8] a model similar to *Bounded Model Checking* (BMC) has been proposed for the analysis of soft errors. As well as [9] this method provides a lower and an upper bound for the robustness. Both methods can be seen as a worst-case analysis: a component is classified non-robust if there exists at least a single testpattern, such that a fault of this component may change the output behavior. We class this classification *worst-case analysis*, i.e. the probability for excitation and propagation is ignored by the robustness measure. In contrast the work in [6] considers all test patterns. But this probabilistic approach relies on BDDs for the analysis restricting the application to very small circuits. In this work we propose a robustness measure that constitutes a trade-off between the worst-case analysis and the consideration of all test patterns. Thus, a limited number of test patterns that show the faulty computation is considered. Therewith a more detailed view of the robustness is given, which can be computed in a feasible run time. We use the model of [8] which considers soft-errors in sequential circuits. Technically, a sequential ATPG-engine is used to consider a bounded time window for the calculation. When calculating only a single testpattern for each soft error, worst-case analysis is performed. By calculating all test patterns the probabilistic analysis is performed at high computational costs. Our approach finds up to a predefined number of test patterns. As a result our approach identifies hot-spots in the circuit, that are easily sensitized to propagate soft errors. This knowledge guides the designer to increase the robustness of the circuit. The previous measures, worst-case analysis and usage of all input stimuli, can be embedded into the new measure. The remainder of this paper is structured as follows: Section II reviews the preliminaries. The underlying circuit model as well as the approach of robustness computation based on [8] are described. Section III introduces the new measure in detail. An algorithm is presented in Section IV that computes the test patterns. Experimental results are reported in Section V. Finally, conclusions are stated in Section VI. Fig. 1. Sequential model with fault injection logic ### II. PRELIMINARIES # A. Boolean Satisfiability The Boolean Satisfiability (SAT) problem is a well-known $\mathcal{NP}$ -complete decision problem [12] that asks whether a Boolean formula $f: \mathbb{B}^n \to \mathbb{B}$ is satisfiable or not (unsatisfiable). If a formula f is satisfiable, a satisfying value assignment of the variables can be found. Typically, the problem is given in Conjunctive Normal Form (CNF), whereas every Boolean formula can be converted into a CNF. Nowadays, large formulas related to practical problems with millions of clauses and variables can usually be solved in feasible time by state-of-the-art SAT-solvers [13], [14]. In this work the satisfying value assignments are of interest. These assignments can be computed by performing subsequent calls to the SAT-solver and adding *blocking clauses*. A blocking clause forbids a (previously computed) solution. #### B. Circuit Model We consider sequential and combinational circuits C with $Primary\ Inputs\ PI(C)$ , $Primary\ Outputs\ PO(C)$ and $State\ Elements\ S(C)$ . For combinational circuits $S(C)=\emptyset$ holds. Furthermore a circuit consists of various components. For example, such components can be primitive gates $(AND,\ OR,\ etc.)$ , modules $(Adder,\ Multiplier,\ etc.)$ or statements of a hardware description language $(if\ (\ldots)\ then\ \ldots\ endif;\ etc.)$ . The number of components of a circuit C is denoted by |C|. A circuit C can be converted into CNF in time and space linear to |C| [15]. # C. Classification of Components As a basis for the robustness computation we use the following classification of each component $g \in C$ as presented in [8]. A soft error in component g is either a bit-flip from 0 to 1 or from 1 to 0 on one or on multiple output signals of $g^1$ . A component $g \in C$ belongs to one of three disjoint classes [8]: - 1) non-robust A soft error of component g leads to an abnormal output behavior under at least one input trace $\tau^t$ for $\operatorname{PI}(C)$ at a certain time frame t, i.e., the primary outputs differ from the fault-free computation. Additionally, if the circuit is equipped with a fault detection signal flt, this signal does not report a fault, i.e., flt = 0. The input trace $\tau^t$ is also called a *testpattern* for the soft error. - non-classified Similar to the non-robust classification, with the exception that the states differ from the faultfree states but the primary outputs are equal. This case shows Silent Data Corruption (SDC). - 3) robust That means, a soft error on component g is reported by the fault detection signal (flt = 1) or is corrected by the internal logic. Consequently, the primary outputs are correct with respect to the fault-free computation under every possible input. All components of the circuit are partitioned into the set $\mathbb T$ of robust components, the set $\mathbb S$ of non-robust components and the set $\mathbb U$ of non-classified components, i.e., $C=\mathbb S\cup\mathbb T\cup\mathbb U$ . For combinational circuits the set $\mathbb U$ is empty, because without state elements the second case above is not applicable. # D. Sequential Modeling In [8] the sets $\mathbb{T}$ , $\mathbb{S}$ and $\mathbb{U}$ were computed by using a SAT-solver. Basically, a *Sequential Equivalence Check* (SEC) of a circuit C and a circuit C' with additional logic to model soft errors is performed. The circuit is unrolled up to a given time limit $t^d$ . The schematic view of the model is shown in Figure 1. A soft error is assumed to be corrected or detected (i.e., signalized by a fault signal flt) within a short period of time or $^{1}$ For simplicity we consider both cases as the same fault, i.e. a soft error at g. If a finer differentiation is required this can be easily integrated into the algorithms the fault remains undetected. Consequently, the analysis can be bounded by a certain time limit $t^d$ . Combinational circuits can be easily embedded in the sequential model. For the approach in [8] it is sufficient to compute at least one testpattern as mentioned in Section II-C for a component g, to classify g as non-robust. Given the sets $\mathbb{T}, \mathbb{S}$ and $\mathbb{U}$ at a certain time frame $\mathbf{t}$ the robustness is defined as follows: $$\begin{split} R_{\mathrm{lb}}^{\mathbf{t}} &= \frac{|\mathbb{T}^{\mathbf{t}}|}{|C|} = 1 - \frac{|\mathbb{S}^{\mathbf{t}}| + |\mathbb{U}^{\mathbf{t}}|}{|C|} \\ R_{\mathrm{ub}}^{\mathbf{t}} &= \frac{|\mathbb{U}^{\mathbf{t}}| + |\mathbb{T}^{\mathbf{t}}|}{|C|} = 1 - \frac{|\mathbb{S}^{\mathbf{t}}|}{|C|} \end{split}$$ For combinational circuits the measure yields $R_{\rm lb}^{\bf t}=R_{\rm ub}^{\bf t}$ , since $|\mathbb{U}|=0$ and $\mathbf{t}=0$ . For sequential circuits the lower and upper bound may differ even if an unlimited number of time frames would be considered. In particular, if SDC occurs and the faulty system state is not corrected, the divergence between faulty system and fault free system may persist. As mentioned before for this robustness-measure it is sufficient to find one testpattern to classify g as non-robust, or to prove that no testpattern exists – the component is robust. This robustness measure considers a $Single\ Testpattern\ (ST)$ . In the following we denote this by $R_{\rm ST}^{\bf t}$ as well as $R_{\rm ST,lb}^{\bf t}$ and $R_{\rm ST,ub}^{\bf t}$ for lower bound and upper bound at time frame ${\bf t}$ , respectively. #### III. ANALYSIS USING MULTIPLE TESTPATTERNS In the following we illustrate drawbacks of the measure introduced in Section II using an example. Our new measure using an analysis based on *Multiple Testpatterns* (MT) that overcomes those limitations is introduced afterwards. Then, the relation to previously defined measures is analyzed. # A. Motivating Example The robustness measure discussed in Section II can be considered as a "worst-case analysis": a component is considered non-robust as soon as there is a single testpattern that shows faulty behavior of this component at least at one primary output. The probability to apply such a pattern, i.e., the excitation probability for the fault, is ignored. Consider the following example. Example 1: Consider a combinational circuit C with four primary inputs, i.e., $|\operatorname{PI}(C)|=4$ . Furthermore, let $a,b\in C$ be two non-robust and $c,d,e\in C$ be robust components. The worst-case analysis yields $R_{\mathrm{ST}}=3/5=60\%$ . Further assume, that there are only two test patterns that excite a fault in a, denoted by $\psi(a)=2$ . Given the total number of $2^{|\operatorname{PI}(C)|}=2^4=16$ input traces, the probability to excite the fault in a is only 2/16=12.5%. Moreover, let any input trace be a testpattern for a fault at b, i.e., $\psi(b) = 16$ and the excitation probability at b is 100%. The worst-case analysis does not differentiate the two components. Both are simply classified as non-robust, even though b can be considered as a hot-spot while a is relatively save. The exact computation of excitation probabilities along the lines of [6] would overcome this limitation. But in that case all test patterns – potentially a number exponential in the number of primary inputs – have to be found. In [6] a BDD-based symbolic analysis was used for this purpose. But a BDD-based analysis is typically limited to small circuits. ## B. New Robustness Measure Instead of considering only a single testpattern per component, the new robustness measure takes *Multiple Testpatterns* (MT) into account. By this, components that only have a few test patterns can be differentiated from those components having many test patterns. As a result a grading of the non-robust components is achieved which can be utilized to identify hot-spots in the circuit. In the following the combinational case is considered first, i.e. lower bound and upper bound for the robustness are identical. Let $\psi(g)$ denote the number of test patterns at component g. Then, the quotient between the number of test patterns $\psi(g)$ and all input traces $\Psi=2^{|\operatorname{PI}(C)|}$ yields the excitation probability for a soft error on g: $$e(g) = \frac{\psi(g)}{\Psi}$$ Or alternatively, as a measure for robustness, the probability that a soft error is not observable, is given by: $$f(g) = 1 - e(g) = 1 - \frac{\psi(g)}{\Psi}$$ As explained, calculating $\psi(g)$ is often too expensive. Instead we limit the number of test patterns by a user defined parameter $0<\lambda\leq 1$ . If the portion of input traces that are test patterns exceeds $\lambda$ , a component is considered nonrobust. For such hot-spots no further differentiation is required. Only below this percentage a more fine grained resolution is determined. Then, the robustness of a component g is determined by $$r(g,\lambda) = 1 - \frac{\min\{\psi(g), \lceil \lambda \Psi \rceil\}}{\lceil \lambda \Psi \rceil}.$$ (1) Consequently, the robustness of the circuit is measured by $$R_{\rm MT}(\lambda) = \frac{1}{|C|} \sum_{g \in C} r(g, \lambda). \tag{2}$$ Example 2: Consider again Example 1 and let $\lambda=0.5$ such that up to 50% of all input traces are considered. Again let $\psi(a)=2, \psi(b)=16$ and $\psi(c)=0, \psi(d)=0, \psi(e)=0$ . Then, $r(a,0.5)=3/4, \, r(b,0.5)=0$ and $$R_{\rm MT}(0.5) = \frac{1}{5}(3/4 + 0 + 3) = 75\%.$$ The overall value of the new measure increases compared to the worst-case analysis. More important is the observation that components a and b can be differentiated. The presented measure can be extended to the sequential case. In this case test patterns span multiple time frames and up to t time frames are considered in the analysis. The number of all input traces is given by $\Psi=2^{|\operatorname{PI}(C)|\cdot t}$ . Also a component may cause SDC and is considered non-classified in this case. Such components are collected in a set $\mathbb U$ . Then a lower bound and an upper bound for the robustness are determined by assuming that all non-classified components Fig. 2. Sequential-ATPG model may turn out to be non-robust or robust, respectively. For non-classified components no testpattern can be found that shows a soft error at one of the primary outputs, i.e. $\psi(g)=0$ holds for $g\in\mathbb{U}$ . We retrieve the following bounds: $$R_{\mathrm{MT,lb}}^{\mathbf{t}}(\lambda) = \frac{1}{|C|} \sum_{g \in C} r(g, \lambda) - |\mathbb{U}|$$ $$R_{\mathrm{MT,ub}}^{\mathbf{t}}(\lambda) = \frac{1}{|C|} \sum_{g \in C} r(g, \lambda)$$ ### C. Embedding Previous Measures First, the new measure is compared to the worst-case analysis previously proposed in [8] and described in Section II. Assume that $\lambda'$ is close to zero. Then, $\lceil \lambda \Psi \rceil$ as used above becomes 1. In this case the robustness of a component g as defined in Equation (1) becomes: $$r(g, \lambda') = 1 - \min\{\psi(g), 1\}$$ If there exists at least one testpattern, the robustness of a component becomes 0. If no testpattern exists, the component is considered robust. Consequently, for $\lambda$ close to zero the new measure converges to the one of [8]. The probabilistic approach of [6] considers exact excitation probabilities to determine the robustness of components<sup>2</sup>. This is achieved using our measure by setting $\lambda$ to 1, i.e., Equation (1) becomes $$r(g,1) = 1 - \frac{\min\{\psi(g), \Psi\}}{\Psi} = 1 - \frac{\psi(g)}{\Psi} = f(g).$$ This is the probability of a soft error to remain undetected. # IV. COMPUTATION In this section we present an algorithm using a SAT-based sequential ATPG engine to calculate the new robustness measure and discuss potential extensions. ### **Algorithm 1**: COMPUTETESTPATTERNS ``` Input: circuit C, component g, current time frame t, bounded time frame t, solver object s, accuracy-parameter \lambda Output: \psi(g) begin createOrAppendSATInstance(s, C, g, t); 2 3 Cone_{\mathrm{PI}} = \mathrm{PI}^{\mathbf{t}}(C) \cap \mathrm{computeTransitiveInputCone}(g, \mathbf{t}); 4 \Psi' = \lceil \lambda \cdot 2^{|Cone_{\text{PI}}|} \rceil; 5 while s.solve() = SATISFIABLE \wedge \psi(g) < \Psi' do 6 7 \psi(q)++: (\bigvee_{\text{pi} \in \textit{Cone}_{PI}} \neg s.model(\text{pi})); s.addClause( 8 9 if s.solve() = UNSATISFIABLE \wedge \psi(g) < \Psi' then 10 11 return \psi(g) + computeInputTraces(C, g, t + 1, \mathbf{t}, s) 12 13 return \psi(q) 14 15 end 16 end ``` ## A. Algorithm The model for sequential SAT-based ATPG is shown in Figure 2. Given are a component g and a limit $\mathbf{t}$ for the number of time frames to be considered. For time frame 0 the fan-out cone of g is modeled. For all outputs in this fan-out cone, the transitive fan-in of these outputs is also modeled until the primary inputs or a state element are reached. In time frame 0 the state elements remain unconstrained during the analysis or they are constrained as proposed in [8]. For time frame 1, the same copy is created. But the traversal does not stop when reaching state elements in time frame 1. Instead the model also includes the driving circuitry of time frame 0 to adequately model the sequential behavior. This process is repeated until time frame $\mathbf{t}$ is reached. In principle this is a standard procedure for sequential ATPG. In our case faults are only injected in time frame 0, because soft errors are modeled and all states are covered by leaving the initial state unconstrained or allowing all reachabable states, respectively. Moreover, the problem instance for analyzing t time frames is reused and extended to analyze t+1 time frames. This improves the efficiency as learned information is reused by the SAT-solver [16]. Over time the number of primary inputs contained in the model for a fixed time frame may change, as the transitive fan-in cone increases. This has to be taken into account when calculating the number of test patterns. The pseudocode to determine the test patterns for a soft error in a component g is given in Algorithm 1. A circuit C, a considered component g, the current time t, the maximum considered time frame $\mathbf{t}$ , the SAT-solver object s as well as the parameter $\lambda$ are given as input to the algorithm. The algorithm creates a SAT-instance for the component g at time frame t. If t>0 the existing problem instance is extended as explained above and as shown in Figure 2. Only the primary inputs that are contained in the cone of g are considered. The maximum number of test patterns to be considered is configured by $\lambda$ and then stored in $\Psi'$ in line 5. From line 6 to line 9, the test patterns are computed. While a satisfying assignment exists and the maximum number of test patterns $\Psi'$ is not exceeded, <sup>&</sup>lt;sup>2</sup>Indeed a different fault model has been used in [6], but the fault models can be aligned. TABLE I RESULTS FOR COMBINATIONAL CIRCUITS | (a | ) Measures | $R_{cm}$ | and | $R_{\rm MM}$ | ( <b>\</b> ) | ١ | |----|-------------|----------|-----|--------------|---------------------------|---| | (a | ) Wicasuics | TUST | anu | IUMT | $\langle \Lambda \rangle$ | , | | CIRCUIT | | | "WORST-CASE" | | | NEW MEASURE (100 TESTS) | | | | |-------------|----------------------------|------|--------------|-----|--------|-------------------------|-------------------|------------------|---------| | NAME | $ \operatorname{PI}(C) $ | C | $R_{\rm ST}$ | S | TIME | $\lambda$ | $R_{\mathrm{MT}}$ | $> \lambda \Psi$ | TIME | | par_5xp1 | 7 | 391 | 88.44% | 49 | 0.65s | 78.12% | 94.51% | 7 | 2.71s | | par_9sym | 9 | 655 | 98.69% | 9 | 0.26s | 19.53% | 98.69% | 9 | 2.17s | | par_alu4 | 14 | 5931 | 95.40% | 275 | 73.58s | 0.61% | 96.06% | 197 | 547.52s | | par_apex7 | 49 | 720 | 77.48% | 204 | 6.21s | < 0.01% | 77.48% | 204 | 72.64s | | par_cm42a | 4 | 81 | 85.71% | 15 | 0.10s | 100.00% | 92.02% | 0 | 0.09s | | par_cm82a | 5 | 62 | 73.17% | 22 | 0.07s | 100.00% | 86.97% | 0 | 0.11s | | par_cmb | 16 | 136 | 63.16% | 70 | 0.29s | 0.15% | 80.84% | 14 | 1.77s | | par_comp | 32 | 385 | 41.36% | 285 | 2.69s | < 0.01% | 41.36% | 285 | 35.51s | | par_con1 | 7 | 65 | 81.11% | 17 | 0.06s | 78.12% | 93.61% | 0 | 0.17s | | par_cordic | 23 | 2866 | 97.65% | 69 | 10.93s | < 0.01% | 97.65% | 69 | 95.84s | | par_cu | 14 | 166 | 76.92% | 51 | 0.29s | 0.61% | 78.15% | 47 | 2.45s | | par_duke2 | 22 | 976 | 76.23% | 255 | 7.98s | < 0.01% | 76.27% | 254 | 77.39s | | par_e64 | 65 | 1409 | 88.45% | 193 | 20.27s | < 0.01% | 89.20% | 177 | 116.45s | | par_f51m | 8 | 318 | 91.76% | 29 | 0.36s | 39.06% | 92.35% | 17 | 2.37s | | par_frg1 | 28 | 393 | 92.74% | 35 | 0.55s | < 0.01% | 92.74% | 35 | 4.73s | | par_rd84 | 8 | 1157 | 82.81% | 204 | 6.23s | 39.06% | 91.22% | 16 | 43.07s | | par_rot | 135 | 1932 | 76.17% | 583 | 72.30s | < 0.01% | 76.17% | 583 | 893.19s | | par_sao2 | 10 | 502 | 82.16% | 96 | 1.24s | 9.77% | 91.38% | 31 | 6.09s | | par_sqrt8ml | 8 | 447 | 60.80% | 187 | 2.29s | 39.06% | 81.70% | 30 | 11.42s | | par_squar5 | 5 | 273 | 80.87% | 57 | 0.44s | 100.00% | 91.04% | 0 | 0.94s | | par_t481 | 16 | 1752 | 99.11% | 16 | 1.29s | 0.15% | 99.11% | 16 | 11.92s | | par_table5 | 17 | 1455 | 70.58% | 448 | 20.66s | 0.08% | 72.63% | 406 | 187.73s | Number of testpatterns a new testpattern is extracted from the SAT-model and the number of test patterns is incremented. To compute another testpattern, the currently computed satisfying assignment is excluded by adding a blocking clause that contains values of primary inputs and state elements in time frame t=0. Afterwards the computation continues. The computation is finished, if 1) the number of considered time frames is exceeded, 2) the SAT-instance becomes unsatisfiable, i.e, there are no more test patterns, or 3) the number of test patterns exceeds $\Psi'$ . Finally, the number of computed test patterns is returned. ## B. Discussion In [8] an algorithm to perform the worst-case analysis was proposed. That algorithm was based on SEC. All potential faults are modeled in a single problem instance and learned information can potentially be reused for all other faults. Preliminary experiments have shown that this is typically more efficient than using a large number of ATPG calls. But, here we also calculate multiple test patterns for each fault. Therefore we chose an ATPG engine to keep the size of the problem instances smaller. Compared to [6] we could use an All Solution SAT solver to compute all test patterns without BDDs. By this an exact solution would be achieved, but still the computational cost would be extremely high. Instead lightweight methods to explore a larger number of test patterns are of interest. Instead of only extracting a single satisfying solution a simulation based relaxation can extract a sufficient set of assignments (e.g. [17]). Using this relaxation multiple test patterns are extracted after each call to the SAT-solver and results that are even closer to a probabilistic analysis can be extracted. # V. EXPERIMENTAL RESULTS This section presents the evaluation of the new robustness measure. Combinational circuits were taken from the LGsynth93 benchmark suite and sequential circuits from the ITC'99 benchmark suite, respectively. For every circuit a parity checker was implemented and optimized using SIS [18]. The parity is checked at the primary outputs and on the state elements. A wrong parity is signalized by setting the fault signal flt to 1. The robustness of the parity circuits is less than 100%: fault masking may occur, i.e., a single soft error flips an even number of outputs and state elements. All experiments were carried out on an AMD Dual-Core Opteron Processor with 32GB main memory under Linux. The algorithm is implemented in C++. As underlying SAT-solver MiniSat v2.0 [13] with a feature to allow incremental satisfiability is used. To have the run times comparable we adjusted the parameter $\lambda$ for each benchmark such that up to 100 test patterns were calculated. #### A. Combinational Circuits Table I(a) shows the results for the combinational circuits. The worst-case analysis $R_{\rm ST}$ as well as the newly introduced measure $R_{\rm MT}$ have been evaluated. The first three columns describe properties of the circuit: the name, the number of primary inputs and the number of gates in the circuit. The results of the worst-case analysis are shown in column "WORST-CASE". Since, combinational circuits are considered the upper and lower bound are equal. The robustness measure $R_{\rm ST}$ is given in column 4. The resulting number of non-robust components is denoted by $|\mathbb{S}|.$ The results for computing the measure $R_{\rm MT}$ proposed in this work are given in the columns titled NEW MEASURE. The parameter $\lambda$ , the computed robustness $R_{\rm MT},$ the number of components that exceeded the maximum number of 100 test patterns to be considered (column $> \lambda \Psi$ ), and the run times are given. As expected the value of $R_{MT}$ is larger or equal to $R_{ST}$ . For $R_{MT}$ non-robust components are not considered "100% TABLE II $R_{\mathrm{ST}}$ – and $R_{\mathrm{MT}}(\lambda)$ –robustness for sequential circuits | CIRCUIT | | | "WORST-CASE" | | | NEW MEASURE (100 TESTS) | | | | | |---------|----|------|--------------|----------------|----------|-------------------------|--------------|------------------|----------|--| | NAME | PI | C | $R_{\rm ST}$ | $ \mathbb{S} $ | TIME | λ | $R_{\rm MT}$ | $> \lambda \Psi$ | TIME | | | par_b01 | 2 | 145 | 88.57% | 20 | 0.26s | 100.00% | 88.57% | 7 | 0.61s | | | par_b02 | 1 | 76 | 87.76% | 12 | 0.10s | 100.00% | 88.14% | 4 | 0.10s | | | par_b03 | 4 | 408 | 85.14% | 81 | 8.23s | < 0.01% | 85.14% | 74 | 21.99s | | | par_b04 | 11 | 2130 | 78.95% | 513 | 131.5s | < 0.01% | 78.95% | 513 | 290.88s | | | par_b05 | 1 | 2145 | 68.56% | 730 | 114.6s | 100.00% | 68.58% | 0 | 130.07s | | | par_b06 | 2 | 152 | 80.60% | 39 | 0.43s | 100.00% | 81.84% | 0 | 0.76s | | | par_b07 | 1 | 1143 | 76.33% | 320 | 80.89s | 78.12% | 84.25% | 14 | 88.45s | | | par_b08 | 9 | 457 | 74.91% | 144 | 20.33s | < 0.01% | 74.91% | 144 | 32.73s | | | par_b09 | 1 | 454 | 80.59% | 111 | 6.32s | 19.53% | 80.59% | 3 | 11.99s | | | par_b10 | 11 | 571 | 81.32% | 127 | 12.61s | < 0.01% | 81.32% | 127 | 23.06s | | | par_b11 | 7 | 1633 | 78.72% | 380 | 1023.44s | < 0.01% | 78.72% | 380 | 1136.27s | | non-robust", but may have a certain robustness. Here column "> $\lambda\Psi$ " gives further insight. If this is zero, there was no nonrobust component with more than 100 test patterns. In these cases the analysis is identical to a probabilistic analysis. This typically occurs for circuits with a small number of primary inputs (e.g. for par\_cm42a, par\_squar5). The opposite happens when more than 100 test patterns are found for all components that were classified as non-robust by the worst-case analysis. In this case $R_{\rm ST}$ and $R_{\rm MT}$ produce the same outcome. The run times are expected to increase for the new measure as more than one solution must be determined. This is also true for our experiments, but the increase in run time is moderate in all cases. Once the SAT-solver found a testpattern, finding the next is often easier because previously learned information is reused. # B. Sequential Circuits Furthermore, sequential circuits have been considered. The results obtained are shown in Table II. Up to 10 time frames are considered. All circuits are fully classified using the worst-case analysis, i.e, upper and lower bound meet each other. Therefore, only one value is given for the robustness. Similarly, also the bounds for the new measure are identical. Here, also up to 100 test patterns extending over 10 time frames have been considered. Again for circuits with a small number of inputs, this number is sufficient to take all test patterns into account (e.g., par\_b05). For circuits with a larger number of inputs, also some differentiation may be achieved (e.g., par b03). But typically, the number of test patterns exceeds 100. As a consequence, a more effective algorithm is required being able to explore a larger number of test patterns. # C. Detailed Example A more detailed evaluation for the combinational circuit par\_cmb is shown by the histogram in Figure I(b). The x-axis depicts the number of test patterns. The y-axis gives the number of components that had a certain number of test patterns. In total there are $2^{16} = 65536$ input traces. The number of considered test patterns was limited to 1000. Figure I(b) shows, that most of the 136 non-robust components have less than 1000 non-robust test patterns. As a result non-robust components can be differentiated by the new measure and hot-spots can be identified. The run times are moderate and simple improvements will provide an even better performance as well as differentiation. #### VI. CONCLUSION In this paper we proposed a new robustness measure. This measure constitutes a trade-off between worst-case analysis and a probabilistic approach. The worst-case analysis as well as the probabilistic approach can be embedded in the proposed measure. With the new measure a more detailed view of the robustness can be achieved in a feasible run time. Furthermore, the measure identifies hot-spots in the design, i.e., easily sensitizable components. A sequential-ATPG engine considering a bounded time window was used to compute the test patterns efficiently. #### VII. ACKNOWLEDGMENT We would like to thank Andre Sülflow for helpful discus- #### REFERENCES - [1] C. Zhao and S. Dey, "Improving transient error tolerance of digital VLSI circuits using RObustness COmpiler (ROCO)," in Int'l Symp. on Quality Electronic Design, 2006, pp. 133-140. - A. Pellegrini, K. Constantinides, D. Zhang, S. Sudhakar, V. Bertacco, and T. Austin, "CrashTest: A fast high-fidelity FPGA-based resiliency analysis framework," in Int'l Conf. on Comp. Design, 2008. - U. Krautz, M. Pflanz, C. Jacobi, H. W. Tast, K. Weber, and H. T. Vierhaus, "Evaluating coverage of error detection logic for soft errors using formal methods," in *Design, Automation and Test in Europe*, 2006, pp. 176–181. - [4] M. Miskov-Zivanov and D. Marculescu, "Circuit reliability analysis using symbolic techniques," IEEE Trans. on CAD, vol. 25, no. 12, pp. 2638-2649, 2006. - [5] M. Bozzano, A. Cimatti, and F. Tapparo, "Symbolic fault tree analysis for reactive systems," in Automated Technology for Verification and Analysis, ser. LNCS, vol. 4762, 2007, pp. 162-176. - J. Hayes, I. Polian, and B. Becker, "An analysis framework for transienterror tolerance," in VLSI Test Symp., 2007, pp. 249-255. - [7] G. Fey and R. Drechsler, "A basis for formal robustness checking," in - Int'l Symp. on Quality Electronic Design, 2008, pp. 784–789. G. Fey, A. Sülflow, and R. Drechsler, "Computing bounds for fault tolerance using formal techniques," in Design Automation Conf., 2009, pp. 190-195. - M. Hunger, S. Hellebrand, A. Czutro, I. Polian, and B. Becker, "ATPG-Based grading of strong fault-secureness," in IEEE International On-Line Testing Symposium, 2009. - [10] I. Polian, S. M. Reddy, and B. Becker, "Scalable calculation of logical masking effects for selective hardening against soft errors," in IEEE Annual Symposium on VLSI, 2008, pp. 257-262. - [11] J. Smith and G. Metze, "Strongly fault secure logic networks," IEEE Trans. on CAD, vol. 27, no. 6, pp. 491-499, 1978. - [12] S. Cook, "The complexity of theorem proving procedures," in 3. ACM Symposium on Theory of Computing, 1971, pp. 151–158. - [13] N. Eén and N. Sörensson, "An extensible SAT solver," in SAT 2003, ser. LNCS, vol. 2919, 2004, pp. 502–518. [14] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik, "Chaff: - Engineering an efficient SAT solver," in Design Automation Conf., 2001, pp. 530-535 - [15] G. Tseitin, "On the complexity of derivation in propositional calculus," in Studies in Constructive Mathematics and Mathematical Logic, Part 2, 1968, pp. 115-125, (Reprinted in: J. Siekmann, G. Wrightson (Ed.), Automation of Reasoning, Vol. 2, Springer, Berlin, 1983, pp. 466-483.). - [16] J. Whittemore, J. Kim, and K. Sakallah, "SATIRE: A new incremental satisfiability engine," in *Design Automation Conf.*, 2001, pp. 542–545. [17] S. Eggersglüß and R. Drechsler, "Improving test pattern compactness in SAT-based ATPG," in *Asian Test Symp.*, 2007, pp. 445–452. - E. Sentovich, K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. Stephan, R. Brayton, and A. Sangiovanni-Vincentelli, "SIS: A system for sequential circuit synthesis," University of Berkeley, Tech. Rep., 1992.