SGI® Management Suite

Provisioning, System Health Management and Power Resource Management Tools

SGI Management Center

High Speed Provisioning
SGI Management Center combines discovery of cluster nodes with multicast provisioning to significantly shorten bare-metal provisioning time for large-scale clusters. Provisioning uses the system configuration file to locate each server in the clustered system that requires a copy of the operating system. Multicast technology provisions servers in parallel, significantly decreasing downtime during maintenance periods.

Benefits include:
  • Systems can be installed or updated in a matter of minutes instead of hours or days
  • A single provisioning session works at large scale, with no need to break up the cluster
  • Archive multiple Linux operating system images that can be quickly provisioned on demand
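
To illustrate why provisioning servers in parallel shortens the maintenance window, the following back-of-the-envelope sketch compares pushing an image to each node in turn with sending one multicast stream to all nodes. The function names, node count, image size, and link speed are assumptions chosen for illustration; they are not SGI figures, and the model ignores retransmissions and slow receivers.

```python
def unicast_minutes(nodes: int, image_gb: float, link_gbps: float) -> float:
    """Time to push the image to each node one after another over a shared link."""
    seconds_per_node = image_gb * 8 / link_gbps  # gigabytes -> gigabits
    return nodes * seconds_per_node / 60


def multicast_minutes(nodes: int, image_gb: float, link_gbps: float) -> float:
    """Time when a single multicast stream serves every node at once.

    Idealized: the node count does not affect the result because all
    receivers read the same stream.
    """
    return image_gb * 8 / link_gbps / 60


# Illustrative values only (assumed, not taken from SGI documentation).
NODES, IMAGE_GB, LINK_GBPS = 512, 4.0, 10.0

print(f"unicast:   {unicast_minutes(NODES, IMAGE_GB, LINK_GBPS):.1f} min")
print(f"multicast: {multicast_minutes(NODES, IMAGE_GB, LINK_GBPS):.1f} min")
```

Under these assumptions the sequential time grows linearly with the number of nodes, while the multicast time depends only on image size and link speed.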

Version-Controlled Image Management
Version-controlled image management is built into SGI Management Center. It tracks changes to the Linux® operating system over time. The images, stored in RPM format, are easily deployed onto servers in the system. If problems arise after an operating system upgrade, the system can easily be returned to a known working state.

Benefits include:
  • Run different Linux operating system versions on compute nodes to support users' application requirements
  • Reduce the risk of upgrading to a new Linux operating system
  • Easily revert to the previous working software image if issues occur
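
The idea of returning to a known working state can be pictured with a minimal sketch of a revision history. The `ImageHistory` class and the image names are hypothetical and exist only for this example; they do not represent the SGI Management Center interface.

```python
from dataclasses import dataclass, field


@dataclass
class ImageHistory:
    """Hypothetical record of the image revisions deployed to a set of nodes."""

    revisions: list[str] = field(default_factory=list)

    def deploy(self, revision: str) -> None:
        """Record a newly deployed image revision as the current one."""
        self.revisions.append(revision)

    @property
    def current(self) -> str:
        """The most recently deployed revision."""
        return self.revisions[-1]

    def revert(self) -> str:
        """Drop the current revision and return the previous one."""
        if len(self.revisions) < 2:
            raise RuntimeError("no earlier revision to revert to")
        self.revisions.pop()
        return self.current


history = ImageHistory()
history.deploy("compute-image-r1")  # known working image (hypothetical name)
history.deploy("compute-image-r2")  # upgrade that turns out to be faulty
print(history.revert())
```

Keeping every deployed revision in order is what makes the rollback a lookup rather than a rebuild.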

System Health Monitoring and Management
System administrators depend on regular system health updates in the datacenter. SGI Management Suite collects health status information from the hardware system, identifies changes that require action and provides proactive solutions to specific problems.

SGI Management Suite provides all-around coverage with 24x7 monitoring of the hardware system.

System Health Monitoring
SGI Management Suite offers two methods of system monitoring, which can be enabled together or independently. SGI Management Center monitors the system and gathers health information from the system logs, alerting the system administrator to issues that need attention. SGI Remote Services also monitors the system logs for alerts, which are sent immediately to SGI Technical Support for analysis, action, and communication with the customer's system administrator.

  • Predictive Failure Analysis and Proactive Action for Memory
    Memory errors can occur with little warning, and failures can cause unplanned downtime and lost productivity on production systems. To avoid system downtime, SGI developed predictive failure analysis for memory, which monitors the logged memory status data and retires the 4 KB memory page containing the failure. The page retirement system moves any user programs or data from the failing page to a new memory page. All of this is done without interrupting or shutting down the production system.

  • Power Resource Management
    SGI Power Management collects power measurements on individual servers, including cluster nodes, and provides flexible reporting of those measurements in kilowatts by node, by rack, and in total for the system. Limiting power, or "power capping," is a key feature of SGI Power Management. Power can be limited on a per-node basis for all nodes or for a specific set of nodes in a system, which enables finer-grained control over power usage across the system.

    Benefits include:
    • Accurately measure and predict power usage for better capacity planning
    • Cap power before datacenter limits are exceeded
    • Prevent servers from overheating in unexpected circumstances
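
The per-node, per-rack, and system-wide reporting described under Power Resource Management can be sketched as simple bookkeeping over per-node readings. The readings, rack and node names, and function names below are hypothetical. The sketch only identifies nodes that exceed a cap; how SGI Power Management actually enforces a cap is not described here.

```python
from collections import defaultdict

# Hypothetical readings: (rack, node) -> measured draw in watts.
readings = {
    ("rack1", "n001"): 410.0,
    ("rack1", "n002"): 385.0,
    ("rack2", "n003"): 520.0,
    ("rack2", "n004"): 495.0,
}


def kilowatts_by_rack(samples):
    """Sum node readings per rack and convert watts to kilowatts."""
    totals = defaultdict(float)
    for (rack, _node), watts in samples.items():
        totals[rack] += watts / 1000.0
    return dict(totals)


def total_kilowatts(samples):
    """Total draw of all nodes in kilowatts."""
    return sum(samples.values()) / 1000.0


def nodes_over_cap(samples, cap_watts, only=None):
    """Return nodes drawing more than the cap, optionally limited to a node set."""
    return sorted(
        node
        for (_rack, node), watts in samples.items()
        if (only is None or node in only) and watts > cap_watts
    )


print(kilowatts_by_rack(readings))
print(f"{total_kilowatts(readings):.2f} kW total")
print(nodes_over_cap(readings, cap_watts=450.0))
```

Passing a set such as `only={"n001", "n003"}` restricts the check to a specific group of nodes, mirroring the choice between capping all nodes and capping a subset.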
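
The page retirement described under Predictive Failure Analysis and Proactive Action for Memory can be pictured with a minimal sketch that maps each logged error address to its 4 KB page and flags a page once errors accumulate. The class name and the error threshold are assumptions made for illustration; the actual criterion SGI uses is not stated here, and moving user data to a new page is not modeled.

```python
from collections import Counter

PAGE_SHIFT = 12       # 4 KB pages: 2**12 bytes
ERROR_THRESHOLD = 3   # assumed retirement threshold, for illustration only


class PageRetirementTracker:
    """Counts logged memory errors per 4 KB page and flags pages for retirement."""

    def __init__(self, threshold: int = ERROR_THRESHOLD) -> None:
        self.threshold = threshold
        self.errors = Counter()
        self.retired = set()

    def record_error(self, physical_address: int) -> bool:
        """Log one error; return True if this call retires the page."""
        page = physical_address >> PAGE_SHIFT
        if page in self.retired:
            return False  # page is already out of service
        self.errors[page] += 1
        if self.errors[page] >= self.threshold:
            self.retired.add(page)
            return True
        return False


tracker = PageRetirementTracker()
for address in (0x1A2B3000, 0x1A2B3040, 0x1A2B3FF8, 0x7F000010):
    if tracker.record_error(address):
        print(f"retire page starting at {address >> PAGE_SHIFT << PAGE_SHIFT:#x}")
```

The first three addresses fall in the same 4 KB page, so only that page is flagged; the fourth address belongs to a different page with a single logged error.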