SGI® Management Suite
Provisioning, System Health Management and Power Resource Management Tools
Efficiently deploy and manage all your SGI solutions with one common tool. SGI Management Suite consists of SGI Management Center and SGI Foundation Software, and offers an array of features that can reduce the time and resources spent deploying, running, and administering systems.
Reduce Downtime During Maintenance Periods
SGI Management Center combines discovery of cluster nodes with multicast technology to significantly shorten bare-metal provisioning time for large-scale clusters. Because many nodes are provisioned in parallel, system administrators can install systems in a matter of minutes instead of hours or days.
Enable Users to Run More Jobs Without Interruption
Proactively maintaining the health of a growing HPC environment can be a significant challenge, and unresolved health problems can ultimately reduce the productivity of your users. SGI Management Suite has an array of built-in features that collect data on overall system health, identify changes that require action, and provide proactive solutions to specific problems.
Optimize Power Efficiency and Confidently Operate Within Predefined Power Envelopes
SGI Power Management provides fine-grained control of power usage across your entire HPC system. Power can be limited on a per-node basis for any combination of nodes in the system, or even at the job level with Altair PBS Professional. This enables administrators to cap power before data center limits are exceeded, and can ultimately help prevent servers from overheating in unexpected circumstances.
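As a rough illustration of per-node capping under a fixed power envelope, the sketch below scales each node's requested power proportionally so that the total never exceeds the envelope. It is a generic example, not the SGI Power Management interface; the function name `allocate_power_caps`, the node names, and the wattages are all hypothetical.

```python
def allocate_power_caps(requested_watts, envelope_watts):
    """Return a per-node power cap (in watts) that fits within the envelope.

    requested_watts: dict mapping a node name to the power it would like to draw.
    envelope_watts:  total power the selected nodes are allowed to draw together.

    Hypothetical helper for illustration only; not part of any SGI tool.
    """
    if envelope_watts < 0:
        raise ValueError("envelope_watts must be non-negative")

    total_requested = sum(requested_watts.values())

    # Everything already fits, so no node needs to be capped below its request.
    if total_requested <= envelope_watts:
        return dict(requested_watts)

    # Otherwise shrink every request by the same ratio so the caps sum to the envelope.
    return {
        node: watts * envelope_watts / total_requested
        for node, watts in requested_watts.items()
    }


# Example: three nodes asking for 1000 W in total under an 800 W envelope.
caps = allocate_power_caps({"n001": 400.0, "n002": 400.0, "n003": 200.0}, 800.0)
```

Proportional scaling is only one possible policy. A production tool would also need to respect each node's minimum operating power and could weight nodes by job priority.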
Protect Against Memory Failures
Memory errors can occur with little warning, and failures can cause unplanned downtime and lost productivity on production systems. To help avoid system downtime, SGI Foundation Software includes predictive failure analysis for memory, which continually analyzes logged memory status, identifies bad memory, and safely retires it. All of this is done without disruption, and the end result is reduced downtime.
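One common way to approach predictive failure analysis for memory is to count correctable errors per memory page within a sliding time window and retire a page once the count crosses a threshold. The sketch below illustrates that general idea only. It is not the SGI Foundation Software implementation, and the class name `CorrectableErrorMonitor`, the threshold, and the window length are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class CorrectableErrorMonitor:
    """Flag memory pages for retirement after repeated correctable errors.

    Hypothetical illustration; thresholds and names are assumptions.
    """

    def __init__(self, threshold=3, window_seconds=3600.0):
        self.threshold = threshold            # errors within the window that trigger retirement
        self.window_seconds = window_seconds  # length of the sliding window
        self._events = defaultdict(deque)     # page -> timestamps of recent errors
        self.retired = set()                  # pages already taken out of service

    def record(self, page, timestamp):
        """Log one correctable error. Return True if the page is newly retired."""
        if page in self.retired:
            return False

        events = self._events[page]
        events.append(timestamp)

        # Discard errors that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()

        if len(events) >= self.threshold:
            self.retired.add(page)
            del self._events[page]
            return True
        return False


# Example: three errors on the same page within 60 seconds retire that page.
monitor = CorrectableErrorMonitor(threshold=3, window_seconds=60.0)
results = [monitor.record(0x1000, t) for t in (0.0, 10.0, 20.0)]
```

In this sketch, retirement is just bookkeeping in a set. On a real system, the retirement step would ask the operating system to stop using the affected memory.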