Reconfigurable computing machines are constructed by interconnecting one or more FPGAs. Functionally, we can view FPGA-based systems as consisting of two components: reprogrammable FPGAs providing logic implementation, and field-programmable interconnect chips (FPICs) providing connectivity among the FPGAs. The FPICs, in turn, can be implemented as ASICs or using FPGAs. Most systems include other elements, such as microprocessors and storage, which can be treated as processing elements and memory that are interconnected. Obviously, the arrangement of these elements affects system performance and routability.

The simplest topology involves FPGAs directly connected in a ring, mesh, or other fixed pattern. The FPGAs serve as both logic and interconnect, providing direct communication between adjacent devices. Such an architecture is predicated on locality in the circuit design and further assumes that the design maps well to the planar mesh. This architecture fits well for applications with regular local communications (30). In general, however, high performance is hard to obtain for arbitrary communication patterns because the architecture provides direct communication only between neighboring FPGAs; two distant FPGAs may need many intermediate devices as "hops" to communicate, resulting in long and widely variable delays. Furthermore, FPGAs used as interconnect often exhibit poor timing characteristics.
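As an illustrative sketch (not from the article), the hop count between two devices in such a nearest-neighbor mesh is simply their Manhattan distance on the grid, which makes the delay variability concrete:

```python
# Illustrative model of a 2-D FPGA mesh with nearest-neighbor links only.
# The number of inter-chip hops between two devices is their Manhattan
# distance; each hop adds routing delay through an intermediate FPGA.
def mesh_hops(src, dst):
    """Hops between FPGAs at grid positions src and dst (row, col tuples)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

# Adjacent FPGAs communicate directly in one hop...
assert mesh_hops((0, 0), (0, 1)) == 1
# ...but corner-to-corner traffic in a 4x4 mesh crosses 6 links,
# so delays vary widely with placement.
assert mesh_hops((0, 0), (3, 3)) == 6
```

The model ignores contention and per-chip routing delay, but it captures why arbitrary (non-local) communication patterns map poorly onto a fixed mesh.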

A major change in the architecture of FPGA-based systems was the concept of a partial-crossbar interconnect, as in the Realizer (26) and BORG (31). This scheme is common in logic emulation systems. Interconnection through FPICs makes all pairs of FPGAs neighbors, resulting in predictable interconnect delays, better timing characteristics, and better overall system performance (32, 33). Figure 4, from Reference (26), depicts a reconfigurable computing system designed for logic emulation. Arrays of reconfigurable processors and FPICs, both implemented using FPGAs, reside on the emulation modules. The user inputs the emulated design netlist and commands from the workstation. The workstation and control processor personalize the emulation modules, which are used in place of the emulated chip. Thus, the target system can function properly before the actual chip is available. Furthermore, testing and design changes can be made by modifying software instead of reworking hardware.
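A hypothetical sketch (class and parameter names are illustrative, not from the cited systems) of why the partial crossbar makes every FPGA pair a neighbor: each FPGA's I/O pins are split into groups, group *i* of every FPGA is wired to FPIC *i*, and a net between any two FPGAs is routed by finding one FPIC with a free pin on both endpoints, giving a fixed one-FPIC hop:

```python
# Hypothetical partial-crossbar model. Every FPGA dedicates a fixed group
# of pins to each FPIC, so any two FPGAs can connect through any FPIC that
# still has free pins on both sides -- always exactly one hop.
class PartialCrossbar:
    def __init__(self, n_fpgas, n_fpics, pins_per_group):
        # free[f][i] = unused pins of FPGA f in the group wired to FPIC i
        self.free = [[pins_per_group] * n_fpics for _ in range(n_fpgas)]

    def route(self, a, b):
        """Return the index of an FPIC carrying a net from FPGA a to FPGA b."""
        for i in range(len(self.free[a])):
            if self.free[a][i] > 0 and self.free[b][i] > 0:
                self.free[a][i] -= 1
                self.free[b][i] -= 1
                return i       # one hop through FPIC i: delay is predictable
        return None            # no FPIC has free pins on both endpoints

xbar = PartialCrossbar(n_fpgas=4, n_fpics=2, pins_per_group=1)
assert xbar.route(0, 1) == 0      # first net uses FPIC 0
assert xbar.route(0, 2) == 1      # FPGA 0's FPIC-0 pin is taken, so FPIC 1
assert xbar.route(0, 3) is None   # FPGA 0 has exhausted its pins
```

The routability limit shown in the last line is the real trade-off of partial crossbars: delay is uniform, but pin-group sizing bounds how many nets each FPGA can source.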

Figure 5 depicts the SPLASH 2 architecture (34). Each board contains 16 FPGAs, X1 through X16. The blocks M1 through M16 are local memories of the FPGAs. A simplified 36-bit-bus crossbar, with no permutation of the bit lines within each bus, interconnects the 16 FPGAs. Another 36-bit bus connects the FPGAs in a linear systolic fashion. The local memories are dual-ported, with one port connecting to the FPGAs and the other port connecting to the external bus. It is interesting to note that the crossbar was added to the SPLASH 2 machine; the original SPLASH 1 machine had only the linear connections. SPLASH 2 has been successfully used for custom computing applications such as search in genetic databases and string matching (20).
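As an illustrative sketch (not SPLASH 2 code), the linear systolic style of string matching that such a daisy-chained bus supports can be modeled in software: each pipeline stage holds one pattern character, text characters stream through the chain one per clock, and a match flag propagates stage by stage:

```python
# Software model of a linear systolic string matcher. Stage i of the
# pipeline compares the streamed character against pattern[i] and ANDs
# the result with the flag arriving from stage i-1.
def systolic_match(pattern, text):
    """Return start offsets where pattern occurs in text."""
    m = len(pattern)
    flags = [False] * m   # flags[i]: pattern[:i+1] matched, ending at current char
    hits = []
    for t, ch in enumerate(text):
        # Update back-to-front to mimic all stages clocking in parallel.
        for i in range(m - 1, 0, -1):
            flags[i] = flags[i - 1] and (ch == pattern[i])
        flags[0] = (ch == pattern[0])
        if flags[m - 1]:
            hits.append(t - m + 1)   # start offset of the completed match
    return hits

assert systolic_match("aba", "ababa") == [0, 2]
```

Each clock consumes one text character regardless of the pattern length, which is what makes the linear systolic organization attractive for streaming applications such as genetic database search.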

Other designs have used a hierarchy of interconnect schemes, differing in performance. The multi-gigabit transceivers (MGTs) available on contemporary FPGAs allow high-bandwidth interconnection using commodity components. An example is the Berkeley Emulation Engine 2 (BEE2) (22), designed for reconfigurable computing and illustrated in Fig. 6. Each compute module consists of five FPGAs (Xilinx XC2VP70) connected to four double data rate 2 (DDR2) dual inline memory modules (DIMMs), with a maximum capacity of 4 GB per FPGA. Four FPGAs are used for computation and one for control. Each FPGA has two PowerPC 405 processor cores. A local mesh connects the computation FPGAs in a 2-D grid using low-voltage CMOS (LVCMOS) parallel signaling. Off-module communication is via 18 Infiniband 4X channel-bonded 2.5-Gbps connectors (two from the control FPGA and four from each of the compute FPGAs) that operate full-duplex, which corresponds to 180 Gbps of full-duplex off-module communication bandwidth. Modules can be interconnected in different topologies, including tree, 3-D mesh, or crossbar. The use of standard interfaces allows standard network switches such as Infiniband and 10-Gigabit Ethernet to be used. Finally, a 100Base-T Ethernet connection to the control FPGA is provided for out-of-band communications, monitoring, and control.
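The bandwidth figure quoted for the BEE2 follows directly from the connector counts, assuming each Infiniband 4X connector bonds four 2.5-Gbps lanes:

```python
# Sanity check of the BEE2 off-module bandwidth arithmetic quoted above.
control_links = 2          # Infiniband 4X connectors on the control FPGA
compute_links = 4 * 4      # four connectors on each of four compute FPGAs
connectors = control_links + compute_links

lanes_per_4x = 4           # an Infiniband 4X channel bonds 4 lanes
gbps_per_lane = 2.5        # per-lane signaling rate

assert connectors == 18
assert connectors * lanes_per_4x * gbps_per_lane == 180.0  # Gbps, full-duplex
```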

Commercial machines such as the Cray XD1 (35), SRC SRC-7 (36), and Silicon Graphics RASC blade (37) have an interconnect structure similar to that of the BEE2, in that they are parallel machines employing high-performance microprocessors tightly coupled to a relatively small number of FPGA devices per node. Nodes are interconnected via high-speed switches, and for specialized applications such machines can achieve orders-of-magnitude performance improvement over conventional architectures. Switching topologies can be altered via configuration of the switching fabric.
