Overview of Microsoft® Windows® Compute Cluster Server 2003

Microsoft Windows Compute Cluster Server 2003 provides an integrated application platform for developing, deploying, running, and managing high-performance computing (HPC) applications. Using this platform, individuals and organizations can perform multi-node workload computing on commodity hardware in an environment that shortens their time to insight. The combination of this cluster platform with SGI® Altix® XE high-performance clusters provides customers with one of the industry's most advanced cluster solutions available today.

HPC is increasingly being achieved with clusters of industry-standard servers that can range from a few nodes (individual computers) to hundreds of nodes. Wiring, provisioning, configuring, monitoring, and managing these nodes, and providing appropriate, secured user access, is a complex endeavor that often requires costly support and administrative resources. Because users typically spend more time on cluster administration and management tasks than on running jobs, organizations also lose productivity. The goals of Windows Compute Cluster Server 2003 are to simplify management and reduce the total cost of ownership (TCO) of compute clusters, making them accessible to a broader audience.

Based on these goals, Windows Compute Cluster Server 2003 has been designed to be intuitive to administer and manage. Its installation and system configuration processes are fully prescribed and largely automated. In addition, users will probably be familiar with the standard Windows features that it includes for deploying and managing clusters remotely. For example, because Windows Compute Cluster Server 2003 is fully integrated with the Microsoft Windows Server System™ solution stack, users can also benefit from the advanced management technologies available in the Active Directory® directory service and in Microsoft Operations Manager (MOM). Users familiar with the Windows Server™ platform can become productive faster.

Users whose work demands HPC solutions also require applications that perform complex computations and produce elaborate data output. Microsoft has worked with independent software vendors (ISVs) to port applications to Windows Compute Cluster Server 2003 that serve several markets, such as manufacturing, life sciences, geological sciences, and financial services. To help deliver on the promise of usability, a full-function Job Scheduler is provided that enables comprehensive job management through a Job Manager user interface (UI) or through a command line interface (CLI).

Windows Compute Cluster Server 2003 supports the execution of parallel applications based on the Message Passing Interface (MPI) standard. Users can take advantage of the enhancements in Microsoft Visual Studio® 2005 aimed at parallel computing, including support for the OpenMP standard and a parallel debugging capability that supports MPI.

When a user submits a job to the cluster, the job is recorded in the head node database along with its properties, entered into the execution queue, and then run when the resources it requires become available. Because the cluster is in the user's Active Directory domain, jobs execute using that user's permissions. As a result, the complexity of using and synchronizing different credentials is eliminated, and the user does not have to use different methods of sharing data or compensate for permission differences among different operating systems. This means that Windows Compute Cluster Server 2003 offers transparent execution, access to data, and integrated security technologies.

Windows Compute Cluster Server 2003 is made up of the components listed in Table 1 and is deployed by installing two CDs. CD1 contains Windows Server 2003, Compute Cluster Edition. CD2 contains the Compute Cluster Pack.
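The job lifecycle described above (record the job, queue it, run it when resources become available) can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the actual Job Scheduler: the class names, the first-in, first-out policy, the state names, and the account names are all hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    owner: str            # account the job runs as (hypothetical field)
    nodes_needed: int
    state: str = "Queued"


class HeadNode:
    """Toy stand-in for the head node: a job store plus a FIFO queue."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.database = {}        # job name -> recorded job and its properties
        self.queue = deque()      # jobs waiting for resources

    def submit(self, job):
        # Record the job, enter it into the queue, then try to start it.
        self.database[job.name] = job
        self.queue.append(job)
        self.dispatch()

    def dispatch(self):
        # Start jobs from the front of the queue while enough nodes are free.
        while self.queue and self.queue[0].nodes_needed <= self.free_nodes:
            job = self.queue.popleft()
            self.free_nodes -= job.nodes_needed
            job.state = "Running"

    def finish(self, name):
        # Release the job's nodes so that waiting jobs can start.
        job = self.database[name]
        job.state = "Finished"
        self.free_nodes += job.nodes_needed
        self.dispatch()


head = HeadNode(total_nodes=4)
head.submit(Job("render", "EXAMPLE\\alice", 3))
head.submit(Job("solve", "EXAMPLE\\bob", 2))
print(head.database["solve"].state)   # Queued (only one node is free)
head.finish("render")
print(head.database["solve"].state)   # Running
```

In this sketch the second job waits until the first one releases its nodes, which mirrors the "run when the resources it requires become available" behavior in the simplest possible way.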

Table 1. Components of Windows Compute Cluster Server 2003

Component: Description
Active Directory directory service: Each node of a cluster must be a member of an Active Directory domain, because Active Directory provides authorization and authentication services for Windows Compute Cluster Server 2003. The domain can be independent of the cluster; for example, the cluster can run in an existing production Active Directory domain. Alternatively, the domain can run within the cluster, on the head node, in scenarios where the cluster is a production environment.
Head Node: Provides user interfaces (UIs) and management services to the server cluster. The UIs include the Compute Cluster Administrator, the Compute Cluster Job Manager, and a command line interface (CLI). Management services include job scheduling, job and resource management, and Remote Installation Services (RIS). The head node can also serve as a network address translation (NAT) gateway between the public network and the private network that make up the cluster.
Compute Nodes: Computers configured as part of a compute cluster that provide the computational resources users need to run jobs. Compute nodes can only be created on computers running a supported operating system, but nodes within the same cluster do not have to run the same operating system or use the same hardware configurations. However, a similar configuration simplifies deployment, administration, and (especially) resource management. Nodes with different hardware configurations will limit the cluster's capabilities, because jobs running in Parallel mode and requiring nodes of different capabilities will be able to run only at the speed of the slowest processor in the selected nodes.
Job Scheduler: A service that runs on the head node and manages the job queue, resource allocation, and job execution by communicating with the Node Manager Service that runs on each compute node.
MS-MPI Software: A key networking component of the cluster that can use any Ethernet connection supported by Windows Server 2003, as well as low-latency and high-bandwidth connections such as InfiniBand or Myrinet. Gigabit Ethernet provides a high-speed and cost-effective connection fabric, while InfiniBand is ideal for latency-sensitive and high-bandwidth applications. MS-MPI supports several networking scenarios.
Management Infrastructure: The Compute Cluster Pack offers a complete management infrastructure that enables the cluster administrator to deploy and manage compute nodes. This infrastructure consists of the cluster services running on the head node and all compute nodes, providing the administrative, user, and command-line interfaces used to administer the cluster, submit jobs, and manage the job queue.
Compute Cluster Administrator and Job Manager: Interfaces used by cluster administrators and users for cluster operations, job submissions, and management. The Compute Cluster Administrator is used to configure the cluster, manage nodes, and monitor cluster activity and health. The Job Manager is used for job creation, submission, and monitoring.
Command Line Interface (CLI): The Compute Cluster Pack offers a CLI for node and job management. These operations can also be scripted. Administrators can use the CLI to automate job, job queue, and node operations.
Public and Private Networks: Compute nodes are often connected to each other through network interfaces. To manage and deploy nodes, administrators can configure compute clusters with a private network. They can also use a private network for MPI traffic. MPI traffic can share the private network used for management, but the highest level of performance is achieved with a second, dedicated private network that supports only MPI traffic.
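
The Compute Nodes entry above notes that a Parallel-mode job spanning nodes of different capabilities runs only at the speed of the slowest processor. The following Python sketch works through that arithmetic under a simplifying assumption that is not stated in the source: the work is split evenly and the nodes advance in lockstep, so every node contributes only the slowest node's rate. The function name and the per-node rates are hypothetical.

```python
def effective_parallel_rate(node_rates):
    """Aggregate rate of a lockstep parallel job across the given nodes.

    Each node waits for the slowest one at every step, so each node
    effectively contributes only the slowest node's rate.
    """
    if not node_rates:
        raise ValueError("at least one node is required")
    return min(node_rates) * len(node_rates)


uniform = [3.0, 3.0, 3.0, 3.0]   # hypothetical per-node rates
mixed = [3.0, 3.0, 3.0, 2.0]     # one slower node in the selection

print(effective_parallel_rate(uniform))  # 12.0
print(effective_parallel_rate(mixed))    # 8.0
```

Under this model, a single slower node reduces the aggregate rate of the whole selection, which is one reason a uniform hardware configuration simplifies resource management.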

About MPI, MPICH, and MS-MPI
MPI is a standard API and specification for message passing. It was designed specifically for HPC scenarios executed on large computer systems or on clustered commodity computers.

MS-MPI is a version of the Argonne National Laboratory open source MPICH2 implementation, which is widely used by existing HPC clusters. MS-MPI is compatible with the MPICH2 Reference Implementation and includes a full-featured API with more than 160 function calls.
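
To make the message-passing idea concrete, the following Python sketch mimics a common pattern: each process (a "rank") computes a partial result on its share of the data and sends it to rank 0, which combines the results. This is only an analogy built from standard-library threads and a queue; it does not use MS-MPI or any MPI API, and the function name `run_ranks` is hypothetical. A real MPI program would use calls from the MPI standard, such as `MPI_Send`, `MPI_Recv`, or `MPI_Reduce`, and the ranks would typically run as separate processes on different nodes.

```python
import queue
import threading


def run_ranks(size, data):
    """Run `size` worker threads that mimic ranks summing slices of `data`."""
    inbox = queue.Queue()   # stands in for rank 0's receive buffer

    def worker(rank):
        # Each rank works on its own interleaved slice of the data.
        partial = sum(data[rank::size])
        inbox.put((rank, partial))   # "send" the partial result to rank 0

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Rank 0's role: "receive" one message per rank and combine them.
    partials = dict(inbox.get() for _ in range(size))
    return sum(partials.values())


print(run_ranks(4, list(range(100))))  # 4950
```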

More Information: