Availability

High availability of the telecom network and associated services is the single most important operator and subscriber requirement. Normal requirements on maximum unavailability are in the order of one or a few minutes of subscriber unavailability per year. This includes downtime due to faults in the exchange and in the transmission equipment and software, but also unavailability due to planned software upgrades, and often also accidents outside the control of the vendor, such as fires, damaged transmission cables, and incorrect operation of the exchange. Several methods are used to increase the availability of the exchange to the subscriber:

• Redundancy, including fault-tolerant processors

• Segmentation

• Diagnostics of hardware and software faults

• Recovery after failure

• Handling of overload

• Disturbance-free upgrades and corrections

• Robustness to operator errors

Each of these is treated briefly below.

Redundancy. In order to cope with hardware faults, redundant hardware is used for those parts of the switch that are critical for traffic execution. Requirements are in the order of 1000 years for the mean time between system failures (MTBSF).

Specifically, current technology requires that a redundant processor is available in synchronized execution (hot standby), ready for a transparent takeover if a single hardware fault occurs in one processor. An intelligent fault analysis algorithm is used to decide which processor is faulty. In a multiprocessor system, n + 1 redundancy is normally used, where each processor can be made of single, double, or triple hardware. When one processor fails, its tasks are moved to the idle (cool standby) processor. A similar redundancy method is based on load sharing, where the tasks of the failed processor are taken over by several of the other processors that are not overloaded themselves.

The group switch hardware is also normally duplicated or triplicated, because it is so vital to the exchange functions. The less central hardware devices, such as trunk devices, voice machines, transceivers, signal terminals, and code receivers, are normally pooled, so that a faulty device is blocked from use and all users can instead access the remaining devices until the faulty device is repaired.
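The pooling scheme for the less central devices can be sketched as follows. This is a minimal illustration of the blocking policy described above, assuming a simple set-based allocator; the class and method names are invented for the example and do not reflect any vendor's implementation.

```python
class DevicePool:
    """Pool of interchangeable devices (e.g., code receivers).

    A faulty device is blocked from seizure; traffic continues on the
    remaining devices until the blocked device is repaired.
    """

    def __init__(self, device_ids):
        self.available = set(device_ids)
        self.blocked = set()

    def seize(self):
        """Allocate any healthy, idle device; None means congestion."""
        if not self.available:
            return None  # caller must reject or queue the call
        return self.available.pop()

    def release(self, device_id):
        """Return a device to the pool after the call, unless blocked."""
        if device_id not in self.blocked:
            self.available.add(device_id)

    def block(self, device_id):
        """Mark a device faulty; withhold it from traffic until repaired."""
        self.available.discard(device_id)
        self.blocked.add(device_id)

    def repair(self, device_id):
        """Bring a repaired device back into service."""
        self.blocked.discard(device_id)
        self.available.add(device_id)
```

With, say, three code receivers, blocking receiver 2 leaves seizures drawing only on receivers 0 and 1 until `repair(2)` is called.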

Segmentation. To avoid system failure a fault must be kept isolated within a small area of the exchange. This is done by segmentation of hardware, with error supervision at the interfaces. In software, the segmentation is made by partitioning of and restricted access to data structures; only owners of data can change the data, where the owner can be a call process or a function.
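The software-segmentation rule — only the owner of a data structure may change it — can be illustrated with a guarded record. This is a simplified sketch under the assumption of per-object ownership checks; real exchange software enforces the partitioning in the runtime system rather than per object.

```python
class OwnedRecord:
    """A data partition writable only by its owning process.

    The owner can be a call process or a function; other processes may
    read the data but any write attempt by a non-owner is rejected,
    keeping a fault isolated within the owner's segment.
    """

    def __init__(self, owner_id, value):
        self._owner_id = owner_id
        self._value = value

    def read(self):
        return self._value  # reads are unrestricted

    def write(self, requester_id, value):
        if requester_id != self._owner_id:
            raise PermissionError("only the owning process may change this data")
        self._value = value
```

A faulty process that tries to write another process's data is stopped at the interface, so the error cannot propagate beyond its own segment.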

Diagnostics of Faults. After the occurrence of a fault in hardware or software, the fault must be identified and localized, its effect restricted, and the exchange moved back to its normal state of execution. For this to work, the diagnostics must be extensive and automatic. The exchange must be able to identify the faulty software and hardware and must be able to issue an alarm, usually to a remotely located operator.

Recovery After Failure. After a fault has been detected, the effect should be restricted to only the individual call or process (for instance, an operator procedure or a location update by a mobile subscriber) or an individual hardware device. This call or process is aborted, while the rest of the exchange is not affected. The recovery must be automatic and secure. In a small fraction of events, the fault remains after the low-level recovery, or the initial fault is considered too severe by the fault handling software, so that a more powerful recovery procedure must be used. The process abort can be escalated to temporary blocking of hardware devices or software applications and, if required, result in the restart of an entire processor or a number of processors. If the restart fails in recovering the exchange into normal traffic handling, new data and software are loaded from internal or external memory.
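The escalation ladder described above can be sketched as a simple policy: each level is tried only after the milder levels have failed. The trigger rule (escalate once per failed attempt) is an illustrative assumption; real fault-handling software uses more elaborate counters and timers.

```python
# Recovery actions from least to most disruptive, following the text.
ESCALATION = [
    "abort the affected call or process",
    "temporarily block the hardware device or software application",
    "restart one or more processors",
    "reload data and software from internal or external memory",
]

def recovery_action(failed_attempts):
    """Select the recovery level after `failed_attempts` milder
    recoveries have failed to restore normal traffic handling."""
    level = min(failed_attempts, len(ESCALATION) - 1)
    return ESCALATION[level]
```

A first fault thus costs only one call, while a fault that survives every restart finally triggers a reload — the most disruptive but most thorough recovery.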

Handling of Overload. The exchange is required to execute traffic literally nonstop, and when offered more traffic than it can handle, it should reject the overflow traffic gracefully. ITU requires that an exchange offered 150% of the traffic it was designed for should still deliver 90% of its maximum traffic handling capacity. The exchange must also be able to function without failure during extreme traffic loads. Such extreme loads can be both short peaks lasting a few milliseconds and sustained overload due to failures in other parts of the network. Overload handling is accomplished by rejecting excess traffic very early in the call setup, before it has used too much processor time or any of the scarce resources in the switching path. Figure 9 shows the overload performance with and without an overload control function.
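Early rejection can be sketched as a call-admission gate that sheds attempts at call setup, before they consume processor time or switching-path resources. The fixed acceptance budget per measurement interval is an illustrative policy (a simple call-count gap), not the specific mechanism any standard mandates.

```python
class OverloadControl:
    """Admission gate that rejects excess call attempts at setup.

    Attempts beyond the per-interval budget are shed immediately,
    so accepted calls keep full access to processor time and
    switching-path resources instead of all calls degrading together.
    """

    def __init__(self, max_accepts_per_interval):
        self.budget = max_accepts_per_interval
        self.accepted = 0

    def new_interval(self):
        """Reset the acceptance counter at each measurement interval."""
        self.accepted = 0

    def offer(self):
        """Return True if the attempt is admitted, False if shed early."""
        if self.accepted < self.budget:
            self.accepted += 1
            return True
        return False
```

Offering 150 attempts against a budget of 100 admits exactly 100: throughput holds at the engineered capacity rather than collapsing under the overload, which is the behavior contrasted in Figure 9.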

Disturbance-Free Upgrades and Corrections. Upgrades of the exchange must not disturb ongoing traffic execution. This should be true both for fault corrections and when new functions are introduced. The architecture must thus allow traffic to be executed in redundant parts, while some parts of the exchange are upgraded.

Robustness to Operator Errors. Security against unauthorized access is accomplished by use of passwords and physically locked exchange premises. The user interface part of the exchange can supervise that valid operation instructions are followed for operation and maintenance, and it can issue an alert or prohibit other procedures. Logging of operator procedures and functions for undoing a procedure can be used. If a board is incorrectly removed from the exchange, the exchange should restrict the fault to that particular board, use redundant hardware to minimize the effect of the fault, and then indicate that this board is unavailable. A simple user interface with on-line support makes operator errors less probable.

Grade of Service. The real-time delays in the exchange must be restricted to transmit speech correctly. Packet-switched connections have problems achieving good real-time speech quality for this reason, especially during heavy usage. Circuit-switched networks have so far given the best real-time performance regarding delays and grade of service for voice and video, compared with packet data networks. ATM switching (due to its short, fixed-size cells) also fulfills the grade of service requirements and in addition defines service classes for different requirements.

Scalability

There is a need for exchanges that scale in capacity, from the very small (such as base stations and local exchanges in sparsely populated areas) to the very large, mainly the hubs (transit exchanges) and MSCs of the networks and exchanges in metropolitan areas. Furthermore, there is sometimes a requirement for downward scalability regarding physical size and power consumption, particularly for indoor or inner-city mobile telephony.

The following are the common system limits for down­ward scalability of an exchange:

• The cost to manufacture and to operate in service will be too high per subscriber or line for small configurations.

• The physical size is limited by the hardware technology, by the requirements for robustness to the environment, and by what is cost-efficient to handle.

• The power consumption is limited by the hardware technology chosen.

The following are the common system limits for upward scalability of an exchange, each treated briefly below:

• (Dynamic) real-time capacity

• (Static) traffic handling capacity

• Grade of service (delays)

• Memory limits

• Data transfer capacity

• Dependability risks

Processing Capacity. New, more advanced services and techniques require more processing capacity. This trend has held for (a) the replacement of analog technology with digital, processor-controlled functions and (b) the development of signaling systems from decadic and multifrequency to packet-mode digital signaling, including trunk signaling protocols such as the ISDN user part (ISUP) together with the mobile application part (MAP) and the transaction capability application part (TCAP), and the development of mobile telephony. In the charging area, the trend from pulse metering to detailed billing affects the call capacity.

The number of calls per subscriber has also increased due to lower call costs from deregulation and due to the use of subscriber redirection and answering services.

Traffic Capacity. It is very important to design and configure hardware and software correctly, in order to minimize hardware costs and at the same time ensure sufficiently low congestion in the exchange and in the network. Normally, the switch fabric is virtually nonblocking, and the congestion occurs in other resources, such as the access lines, trunk lines, and other equipment. The relation between congestion probability and the amount of traffic is well known if all devices are accessible and the traffic follows a Poisson process; that is, the times between offered calls are independent and exponentially distributed. In some cases, the probability can be calculated explicitly. In more complex device configurations with non-Poisson traffic, the congestion probabilities are most easily calculated by simulation techniques.
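For the explicitly calculable case — Poisson traffic of E erlangs offered to a full-availability group of m devices — the blocking probability is given by the Erlang B formula, conveniently evaluated with the standard recursion B(E, 0) = 1, B(E, m) = E·B(E, m−1) / (m + E·B(E, m−1)):

```python
def erlang_b(traffic_erlangs, num_devices):
    """Blocking probability for Poisson traffic of `traffic_erlangs`
    erlangs offered to a full-availability group of `num_devices`
    devices, with blocked calls cleared (Erlang B).

    Uses the numerically stable recursion
    B(E, 0) = 1,  B(E, m) = E*B(E, m-1) / (m + E*B(E, m-1)),
    avoiding the large factorials of the closed-form expression.
    """
    b = 1.0
    for m in range(1, num_devices + 1):
        b = traffic_erlangs * b / (m + traffic_erlangs * b)
    return b
```

For example, 2 erlangs offered to 2 devices gives `erlang_b(2.0, 2)` ≈ 0.40, while adding devices drives the blocking down rapidly — the calculation used when dimensioning trunk groups for a target congestion probability.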

Memory. The amount of memory per subscriber line or trunk line is another way to measure the complexity of a telecom application. The trend in this area is similar to that of processing capacity, and the same factors are responsible for the large increase in memory needs. Due to the real-time requirements, fast memory is used extensively, and secondary memory is only used for storage of backups, log files, and other data where time is not critical.

Transfer Capacity. A third part of the switching system capacity is the data transfer from the exchange to other nodes, for example, to other exchanges and network databases, billing centers, statistical post-processing, and nodes for centralized operation. There has been growing demand for signaling link capacity due to large STPs, for transfer capacity from the exchange to a billing center due to detailed billing and large amounts of real-time statistics, and for transfer capacity into the exchange due to the increased amount of memory to reload at exchange failure.

Dependability Risks. Although dependability has increased for digital exchanges, there is a limit to how large the nodes in the network can be built. First, the more hardware and the more software functions assembled in one exchange, the more faults there are. The vast majority of these faults will be handled by low-level recovery, transparent to the telecom function, or will only affect one process. However, a small fraction of the faults can result in a major outage that affects the entire exchange for some time.

As an example, assume that the risk of a one-hour complete exchange failure during a year is 1% for one exchange. If we add the functionality of an SCP to an HLR node, then we more than double the amount of software, and presumably the number of faults, in the node. The risk of a major outage is correspondingly larger in a new exchange introducing new software with new faults. Only if unavailability due to completely stopped traffic execution is much less than the total effect of process abortions and blocked hardware devices can we build exchanges of unlimited software complexity.
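The arithmetic behind this example can be made explicit; the 1% risk, the one-hour duration, and the simple doubling of the fault count are the assumptions stated above, not measured figures.

```python
# Expected annual downtime contributed by rare complete outages,
# compared against the availability budget of one or a few minutes
# of subscriber unavailability per year.
outage_risk_per_year = 0.01    # assumed 1% chance of a complete failure
outage_duration_min = 60.0     # assumed one-hour outage

expected_downtime = outage_risk_per_year * outage_duration_min
# 0.01 * 60 min = 0.6 minutes per year from major outages alone

# Merging SCP functionality into the HLR node roughly doubles the
# software and, pessimistically, the fault count and outage risk:
merged_downtime = 2 * expected_downtime
# 1.2 minutes per year, already a large share of the total budget
```

Even a 1% annual outage risk thus consumes much of the per-year unavailability budget, which is why major-outage unavailability must stay far below the aggregate effect of process abortions and blocked devices.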

The second reason for limited complexity from a dependability point of view is that network redundancy is required and can only be used if there are several transit exchanges in the network.

Life-Cycle Cost

Since the 1980s, the operating cost has become larger than the investment cost of an exchange. Thus, the emphasis on efficient operation and maintenance has increased, regarding both ease of use and the utilization of centers that remotely operate a number of unstaffed exchanges. For ease of use, the telecommunication management network (TMN) was an attempt by the ITU to standardize the operator interface. After several years, this standard is still not widely used. Instead, the operator interface depends to a large extent on the exchange manufacturer as well as on the requirements of the telecom operator company. Several open and proprietary user interfaces are common.

For central operation, more robust methods of remote activities have evolved. Software upgrades and corrections, alarm supervision and handling, collection of statistics and charging data, and handling and definition of subscriber data are all made remotely. Transmission uses a multitude of techniques and protocols. Open standard protocols have taken over from proprietary protocols.

In addition, important parts of the life-cycle cost are (a) product handling for ordering and installation and (b) spare part supply.
