Multiple processor systems are an important class of parallel systems. Over the years, several architectures have been proposed to build such systems to satisfy the requirements of high performance computing. These architectures span a wide variety of system types. At the low end of the spectrum, we can build a small, shared-memory parallel system with tens of processors. These systems typically use a bus to interconnect the processors and memory. Such systems, for example, are becoming commonplace in high-performance graph- ics workstations. These systems are called uniform memory access (UMA) multiprocessors because they provide uniform access of memory to all pro- cessors. These systems provide a single address space, which is preferred by programmers. This architecture, however, cannot be extended even to medium systems with hundreds of processors due to bus bandwidth limitations. To scale systems to medium range i. e. , to hundreds of processors, non-bus interconnection networks have been proposed. These systems, for example, use a multistage dynamic interconnection network. Such systems also provide global, shared memory like the UMA systems. However, they introduce local and remote memories, which lead to non-uniform memory access (NUMA) architecture. Distributed-memory architecture is used for systems with thousands of pro- cessors. These systems differ from the shared-memory architectures in that there is no globally accessible shared memory. Instead, they use message pass- ing to facilitate communication among the processors. As a result, they do not provide single address space.