Bellcore / Telcordia - Reliability Prediction Procedure

Reliability engineering is an essential part of developing modern electronic systems, and it requires a quantitative baseline: a reliability prediction analysis. The Bellcore procedure was developed for the telecommunications industry and its applications, but it has also become widely used for industrial and commercial electronic equipment. A Bellcore reliability prediction calculates Failure Rate and Mean Time Between Failures (MTBF) figures for the individual components, the equipment and the overall system. The final prediction results are based on the roll-up, or summation, of all the individual component failure rates.

Originally published by Bell Communications Research, Inc. (Bellcore), the standard is now maintained and updated by Telcordia Technologies, Inc. The procedure was developed so that Bellcore Client Companies (BCCs) and other buyers of electronic equipment could establish and maintain consistent, uniform methods for estimating the reliability of electronic equipment and systems. The Bellcore procedure defines reliability as a measure of the frequency of equipment failures as a function of time.

Equipment
The Bellcore reliability prediction procedure refers to electronic systems as hierarchical assemblies constructed from device and unit level hardware.

  • Device - A basic component such as a capacitor, integrated circuit, etc.
  • Unit - Any customer-replaceable assembly of devices, such as circuit packs, modules, plug-in units, racks, power supplies and ancillary equipment. A unit is usually the lowest-level replaceable assembly. Units are considered non-repairable. The Bellcore procedure is aimed primarily at predicting the reliability of units.
  • System - The procedure addresses serial systems, that is, any assembly of units in which the failure of any single unit causes a system failure.
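
Because a serial system fails whenever any one of its units fails, its failure rate is the sum of the unit failure rates. The Python sketch below illustrates why, assuming constant (exponential) unit failure rates and independent failures; the function names and unit rates are hypothetical, chosen only for illustration.

```python
import math


def serial_reliability(failure_rates_per_hour, hours):
    """Probability that a serial system survives `hours`.

    Assumes each unit has a constant failure rate (exponential lifetime)
    and that units fail independently of one another.
    """
    # The system survives only if every unit survives, so unit reliabilities multiply.
    r = 1.0
    for lam in failure_rates_per_hour:
        r *= math.exp(-lam * hours)
    return r


def serial_failure_rate(failure_rates_per_hour):
    # Multiplying exponentials adds their exponents, so the system
    # failure rate is simply the sum of the unit failure rates.
    return sum(failure_rates_per_hour)


units = [2e-6, 5e-6, 1e-6]  # hypothetical unit failure rates (failures per hour)
t = 8760.0                  # one year of operation, in hours
print(serial_reliability(units, t))
print(math.exp(-serial_failure_rate(units) * t))  # same value, computed from the summed rate
```

This is the reasoning behind the roll-up: under these assumptions, summing failure rates is equivalent to multiplying unit reliabilities.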

The Bellcore procedure focuses on predicting failure rates for electronic equipment, much like the MIL-217 standard. It does not address other aspects of reliability such as downtime, repairable equipment, availability, redundancy and the impact of failures.

Part Stress Analysis
The Bellcore procedure uses a part stress analysis similar to that of the MIL-217 standard. Component stresses are the actual operating conditions applied to a part, such as environment, temperature, voltage, current and power levels. The Bellcore standard groups components (devices) into major categories, each with subgroups; for example, "Fixed, Tantalum, Solid" is a subgroup of the "Capacitor" category. Each category and subgroup has its own base failure rate, which is used in the model that calculates the failure rate. The model also accounts for the various part stresses by applying pi factors.
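
The category/subgroup organization maps naturally onto a nested lookup table. The sketch below is hypothetical: the table name, the function name and every numeric value are placeholders for illustration and are not taken from the Bellcore/Telcordia tables.

```python
# Hypothetical base failure rates in FITs, keyed by category and then subgroup.
# These numbers are placeholders for illustration, NOT Bellcore/Telcordia table values.
BASE_FAILURE_RATES = {
    "Capacitor": {
        "Fixed, Tantalum, Solid": 1.0,
        "Fixed, Ceramic": 0.5,
    },
    "Resistor": {
        "Fixed, Film": 0.3,
    },
}


def base_failure_rate(category, subgroup):
    """Look up the base (generic) failure rate for a device category and subgroup."""
    try:
        return BASE_FAILURE_RATES[category][subgroup]
    except KeyError:
        # Fail loudly rather than silently defaulting to some rate.
        raise KeyError(f"no base failure rate for {category!r} / {subgroup!r}") from None


print(base_failure_rate("Capacitor", "Fixed, Tantalum, Solid"))
```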

Failure Rate and pi Factors
The failure rate model described above starts from a base, or generic, steady-state failure rate, λG, for the selected category and subgroup. The base failure rates apply to components operating under normal environmental conditions, with power applied, performing their intended function(s), at the base component quality level and at the design stress levels. The standard then applies a number of pi factors (multiplying factors) to the base failure rates to account for the actual operating conditions, environment and stress levels described above. Each pi factor multiplies the base failure rate within the equation or model provided for each component category: a factor below 1.0 lowers the predicted failure rate, and a factor above 1.0 raises it.
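
As a minimal sketch, assume the device model takes the multiplicative form λSS = λG × πQ × πS × πT, where λG is the base failure rate and the pi factors cover quality, electrical stress and temperature. The function name and the input values below are hypothetical; real factor values come from the procedure's tables for each component category.

```python
def steady_state_failure_rate(base_fits, pi_q=1.0, pi_s=1.0, pi_t=1.0):
    """Device steady-state failure rate in FITs.

    Assumes lambda_SS = lambda_G * piQ * piS * piT (quality, stress, temperature).
    A factor of 1.0 leaves the base rate unchanged; factors below 1.0 lower it
    and factors above 1.0 raise it.
    """
    for name, value in (("pi_q", pi_q), ("pi_s", pi_s), ("pi_t", pi_t)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return base_fits * pi_q * pi_s * pi_t


# Hypothetical inputs, not table values: reduced stress, elevated temperature.
print(steady_state_failure_rate(10.0, pi_q=1.0, pi_s=0.8, pi_t=1.5))
```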

Bellcore Prediction Methods
The above procedure calculates the predicted failure rate at the actual operating conditions for each component in the project. The Bellcore procedure provides three methods for predicting device and unit reliability. The method used is normally determined by the information that is available.

  • Method I - The "parts count" or "black box" method is very similar to, and was modeled on, the MIL-217 standard. This method assumes that no reliability data is available for the devices and units in the system; predictions are based on the generic reliability parameters discussed above. It is the simplest of the three methods and can be applied to devices or units.
  • Method II - Use this method if laboratory failure rate data is available for some or all of the devices or units. Method II combines the laboratory data with the generic data from Method I, so the resulting prediction lies somewhere between the generic Method I estimate and the failure rate indicated by the laboratory data alone.
  • Method III - Use this method if field tracking reliability data is available for some or all of the devices or units. As with Method II, the actual field data is combined with the generic data to obtain predictions. This is generally the most accurate of the three methods, but it requires a considerable amount of field data.
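
To illustrate how observed data can pull a generic estimate toward what was actually measured, the sketch below uses a generic gamma-Poisson (Bayesian) weighting. This is a swapped-in technique, not the Telcordia Method II or III formula: the generic rate is treated as prior information worth a hypothetical `prior_weight` of 2 failures, and the function name is also hypothetical.

```python
def blended_failure_rate(generic_fits, failures, device_hours, prior_weight=2.0):
    """Blend a generic failure rate (FITs) with observed lab or field data.

    Gamma-Poisson update: the generic rate acts as a prior worth `prior_weight`
    pseudo-failures. prior_weight=2.0 is an illustrative assumption.
    Returns the blended rate in FITs.
    """
    if generic_fits <= 0:
        raise ValueError("generic failure rate must be positive")
    generic_per_hour = generic_fits * 1e-9
    # Pseudo-exposure implied by the generic rate and its weight.
    prior_hours = prior_weight / generic_per_hour
    blended_per_hour = (prior_weight + failures) / (prior_hours + device_hours)
    return blended_per_hour * 1e9  # convert back to FITs


# Generic rate 100 FITs; data show 1 failure in 1e8 device-hours (10 FITs observed).
print(blended_failure_rate(100.0, failures=1, device_hours=1e8))
```

With no data the result equals the generic rate, and as device-hours accumulate it moves toward the observed rate, which matches the "somewhere between" behavior described above.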

The three methods are further broken down into three "cases" based on the stress levels and on whether the unit or system has a "burn-in" period.

First-Year-Multiplier
Bellcore places particular emphasis on the early life of electronic components and systems and on the use of "burn-in". The Bellcore model applies a "First-Year-Multiplier" in the failure rate prediction to account for early-life failures and for any burn-in period. Bellcore defines this multiplier as the average failure rate during the first year of operation (8760 hours), expressed as a multiple of the steady-state failure rate. Together, the First-Year-Multiplier and the steady-state failure rate predict the expected number of failures in the first year of operation.
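
Since the multiplier is the first-year average failure rate divided by the steady-state rate, the expected first-year failures follow directly from the definition. A minimal sketch, assuming rates in FITs (failures per 10^9 hours); the function name and example numbers are hypothetical.

```python
HOURS_PER_YEAR = 8760  # first year of operation


def expected_first_year_failures(steady_state_fits, first_year_multiplier, quantity=1):
    """Expected failures during the first year for `quantity` identical items.

    The first-year average failure rate is the multiplier times the
    steady-state rate; FITs are failures per 1e9 hours.
    """
    first_year_fits = first_year_multiplier * steady_state_fits
    return quantity * first_year_fits * HOURS_PER_YEAR * 1e-9


# Hypothetical: 1000 units, each 1000 FITs steady state, multiplier 2.0.
print(expected_first_year_failures(1000.0, 2.0, quantity=1000))
```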

Calculation Flow
The overall system or equipment failure rate is determined by summing, or rolling up, the individually calculated failure rates of the devices and then of the units. The prediction flows up the hierarchy.

  • Device predictions are made first, using one of the three methods.
  • Unit-level calculations use the steady-state failure rates and First-Year-Multipliers of all devices contained in the unit as inputs to Method I, II or III.
  • System-level calculations use the steady-state failure rates and First-Year-Multipliers of all units contained in the system as inputs to Method I, II or III.
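
The roll-up can be sketched as a recursive walk over the device/unit/system tree. This sketch shows only the summation: it ignores the method-specific details and any factors applied at the unit or system level. Because rates add in a serial assembly, the assembly's first-year multiplier is taken as the failure-rate-weighted average of its children's multipliers, which follows from the multiplier's definition. The `Item` type and `roll_up` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A device (leaf) or an assembly such as a unit or system (has children)."""
    name: str
    steady_state_fits: float = 0.0      # used only for leaf devices
    first_year_multiplier: float = 1.0  # used only for leaf devices
    children: list = field(default_factory=list)


def roll_up(item):
    """Return (steady_state_fits, first_year_multiplier) for an item.

    Assemblies are treated as serial: steady-state rates add, and the
    multiplier is chosen so that multiplier * total rate equals the sum
    of the children's first-year rates.
    """
    if not item.children:
        return item.steady_state_fits, item.first_year_multiplier
    total_rate = 0.0
    total_first_year_rate = 0.0
    for child in item.children:
        rate, fym = roll_up(child)
        total_rate += rate
        total_first_year_rate += fym * rate
    fym = total_first_year_rate / total_rate if total_rate > 0 else 1.0
    return total_rate, fym


# Hypothetical hierarchy: devices -> units -> system.
unit_a = Item("unit A", children=[Item("IC", 100.0, 4.0), Item("capacitor bank", 300.0, 1.0)])
unit_b = Item("unit B", children=[Item("power module", 200.0, 2.0)])
system = Item("system", children=[unit_a, unit_b])
rate, fym = roll_up(system)
print(rate, fym)
```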

Predictions made at the lower levels of the hierarchy therefore feed the predictions at the higher unit and system levels. Bellcore failure rates are intended to be conservative and use 90% Upper Confidence Level estimates: there is a 90% chance that a device's actual generic failure rate is lower than the value provided by Bellcore.
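
To show what a 90% upper confidence level means for a failure rate, the sketch below computes the standard one-sided upper bound for a constant failure rate estimated from `failures` observed over `device_hours` of a time-terminated test (equivalent to the chi-square bound χ²(0.90, 2f+2)/(2T)). This is a generic statistical illustration; it is an assumption that it reflects how Bellcore derived its tables, which the procedure does not state here. Function names are hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean). Adequate for small k and mean below ~700."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def upper_confidence_rate(failures, device_hours, confidence=0.90):
    """One-sided upper confidence bound on a constant failure rate (per hour).

    Finds the rate lam at which seeing <= `failures` in `device_hours` has
    probability 1 - confidence.
    """
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the CDF falls to alpha or below.
    while poisson_cdf(failures, hi * device_hours) > alpha:
        hi *= 2.0
    # Bisection: the CDF decreases as the rate increases.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid * device_hours) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


# 1 failure in 1e6 device-hours: point estimate 1e-6/h, 90% upper bound is higher.
print(upper_confidence_rate(1, 1e6))
```

The upper bound is always above the simple point estimate (failures divided by hours), which is what makes a 90% UCL value conservative.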

The Bellcore procedure expresses failure rate results in FITs (Failures In Time). One FIT equals one failure per 10^9 (one billion) device-hours.
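
Converting between FITs, failures per hour and MTBF is simple arithmetic. A minimal sketch, assuming a constant (steady-state) failure rate so that MTBF = 1/λ; the function names are hypothetical.

```python
FIT_HOURS = 1e9  # one FIT = one failure per 1e9 device-hours


def fits_to_per_hour(fits):
    """Convert a failure rate in FITs to failures per hour."""
    return fits / FIT_HOURS


def mtbf_hours(total_fits):
    """MTBF in hours for a constant total failure rate given in FITs."""
    if total_fits <= 0:
        raise ValueError("failure rate must be positive")
    return FIT_HOURS / total_fits


print(fits_to_per_hour(1000))  # failures per hour
print(mtbf_hours(2000))        # hours
```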

Printed Circuit Boards (PCB)
Most manufacturers of electronic equipment mount the majority of components on printed circuit boards (PCBs) of various types or include them in a hybrid construction. The failure rate of the PCB or hybrid device is determined by summing the failure rates of its many components, connectors and other construction elements. There are a few differences between Bellcore and MIL-217 worth noting. Bellcore does not calculate and sum failure rates for individual device connections (solder joints, etc.); their effect is assumed to be included in the generic device failure rate data. Bellcore states that when unit failure rates are being predicted, wire, cable, solder connections, wire wrap connections and printed circuit boards may be excluded.
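
The board-level summation, with the optional exclusions listed above, can be sketched as follows. The `(item_type, quantity, fits_each)` input format, the exclusion set name and all numbers are hypothetical.

```python
# Item types that may be left out of a unit prediction, per the exclusion described above.
OPTIONAL_EXCLUSIONS = {
    "wire", "cable", "solder connection", "wire wrap connection", "printed circuit board",
}


def board_failure_rate(parts, exclude_interconnects=True):
    """Sum part failure rates (FITs) for a PCB or hybrid assembly.

    `parts` is a list of (item_type, quantity, fits_each) tuples (hypothetical format).
    """
    total = 0.0
    for item_type, quantity, fits_each in parts:
        if exclude_interconnects and item_type in OPTIONAL_EXCLUSIONS:
            continue  # effect assumed to be covered by the generic device rates
        total += quantity * fits_each
    return total


parts = [
    ("capacitor", 10, 2.0),
    ("integrated circuit", 2, 15.0),
    ("solder connection", 200, 0.01),
]
print(board_failure_rate(parts))                               # exclusions applied
print(board_failure_rate(parts, exclude_interconnects=False))  # everything summed
```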

Bellcore also does not provide separate models for other PCB construction technologies such as plated-through-hole (PTH) and surface mount technology (SMT); the procedure treats these constructions the same as a standard PCB. It does, however, include a model for hybrid devices.

Component Quality
The design quality, or "as purchased" quality, of a component directly affects the part failure rate and appears in the models as the pi factor πQ. Many of the components covered by the Bellcore procedure are available in several quality levels, each with an associated πQ value defined by Bellcore.

Environment
Environmental stress is a major concern in establishing the failure rates of the components and parts in a system under the Bellcore model. Environmental stresses can differ greatly from one application to another, ranging from a controlled environment with constant temperature and humidity to one with rapid temperature changes, high humidity, high vibration and high acceleration. The environmental designations defined in Bellcore enter the formulas as the factor πE.
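
A minimal sketch of how an environmental factor changes a prediction, assuming πE multiplies the summed steady-state failure rates of the devices in a unit. The function name and both πE values are hypothetical placeholders, not Bellcore environment table values.

```python
def unit_failure_rate(device_rates, pi_e):
    """Unit steady-state failure rate in FITs.

    device_rates: iterable of (quantity, steady_state_fits) pairs.
    pi_e: environmental factor, assumed here to scale the whole unit.
    """
    return pi_e * sum(quantity * fits for quantity, fits in device_rates)


devices = [(4, 10.0), (1, 50.0)]                # hypothetical device counts and rates
benign = unit_failure_rate(devices, pi_e=1.0)   # hypothetical controlled environment
harsh = unit_failure_rate(devices, pi_e=3.0)    # hypothetical harsher environment
print(benign, harsh)
```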

Thermal Environment
Ambient and operating temperatures have a major impact on the predicted failure rates of electronic equipment, especially equipment containing semiconductors and integrated circuits. The Bellcore procedure requires ambient temperatures as inputs, so a thermal analysis should be part of the design and reliability analysis process for electronic equipment. For the overall equipment, use the ambient temperature immediately around or between the pieces of equipment involved. For individual components or devices, use the operating ambient temperature inside the equipment where they reside. For components located in hot spots, adjust the ambient temperature upward to reflect the higher local temperature.
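
To see why hot-spot adjustments matter, the sketch below assumes an Arrhenius-type temperature factor, πT = exp[(Ea/k)(1/Tref - 1/Top)] with temperatures in kelvin, equal to 1.0 at an assumed 40 °C reference ambient. The activation energy of 0.4 eV, the reference temperature and the function names are assumptions for illustration; the procedure's own temperature tables govern real predictions.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant, eV/K


def temperature_factor(ambient_c, activation_energy_ev, reference_c=40.0):
    """Arrhenius-style temperature factor (assumed form).

    Equals 1.0 at the reference ambient and rises above 1.0 as the
    operating ambient gets hotter.
    """
    t_op = ambient_c + 273.15
    t_ref = reference_c + 273.15
    return math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_ref - 1.0 / t_op))


def local_ambient(equipment_internal_c, hot_spot_rise_c=0.0):
    """Ambient seen by a component: internal equipment ambient plus any hot-spot rise."""
    return equipment_internal_c + hot_spot_rise_c


# Hypothetical: 40 C inside the enclosure, with a 15 C hot spot near one device.
print(temperature_factor(local_ambient(40.0), 0.4))
print(temperature_factor(local_ambient(40.0, hot_spot_rise_c=15.0), 0.4))
```

Under these assumptions, a 15 °C hot-spot rise roughly doubles the temperature factor for that device, which is why the local ambient, rather than the enclosure average, should be used.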