Performance Co-Pilot™
Monitoring and Managing System-Level Performance

Performance Co-Pilot is an exciting family of products from SGI that delivers system-level performance monitoring and management services. Performance Co-Pilot is designed for both operational monitoring (tactical performance management) and the in-depth analysis that is needed to understand and manage the hardest performance problems in our most complex systems (strategic performance management).

Performance data can be collected by Performance Co-Pilot from the hardware, the operating system, layered services, end-user applications, the network, and distributed solution architectures. Performance Co-Pilot tools process this information to provide a complete picture of the factors influencing resource utilization, bottlenecks, and end-user performance, delivering unprecedented power and flexibility in the ways you may view and manage the performance of your system.

Features
The Performance Co-Pilot Product Overview provides a shorter description of the product capabilities.

Product Positioning and Background

Performance Co-Pilot is an exciting family of products from SGI that provides a suite of tools that cooperate to deliver system-level performance monitoring and management services. Performance Co-Pilot is designed for both operational monitoring (tactical performance management) and the in-depth analysis that is needed to understand and manage the hardest performance problems in our most complex systems (strategic performance management).

The focus of Performance Co-Pilot is on system-level performance, where the contributing factors may span multiple areas, namely:
Given the diverse ownership of the areas and the radical variations in application mixes and operational environments between systems, Performance Co-Pilot has been designed to deliver a rich and powerful collection of services, data collection tools, performance metric delivery infrastructures, and tool kits that can be deployed, configured, extended, and customized to meet the performance needs of individual customers and sites.

Performance Co-Pilot collects and makes available low-level performance data from the hardware and the operating system, and more abstract or application-specific performance data from layered services (such as domain name servers, Web and e-mail servers, and RPC servers), environmental monitors, Cisco® routers, response-time probes, and other "interesting" processes (e.g., those making excessive resource demands). Libraries, tools, debuggers, and source-code examples are provided to encourage new agents to be developed and deployed to export performance data from end-user applications and quality-of-service probes.

Performance Co-Pilot is targeted at those with an interest in overall system performance: performance analysts, benchmarkers, engineering developers, database administrators, capacity planners, and system administrators.
Uniform Naming and Access to Performance Metrics

The Performance Co-Pilot protocols and interfaces provide an abstraction that hides all of the implementation details from multiple domains of performance metrics (e.g., where the performance data comes from, who owns it, and how it was collected). Metadata describing the format, interpretation, units, and scale of the data is also provided so that the data semantics can be discovered at run-time and the semantics may change over time without requiring changes to the applications that process the performance data.

At the lowest level, performance metrics are collected and managed in autonomous performance domains, e.g., IRIX®, a Web server, or an end-user application, and the Performance Co-Pilot infrastructure reflects this with independent Performance Metric Domain Agents (PMDAs or plugins) for each domain, as shown below.
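To make the idea of uniform naming plus run-time metadata more concrete, here is a minimal Python sketch. It is not the Performance Co-Pilot API: the classes `MetricDesc`, `DomainAgent`, and `Namespace`, the `render` helper, and the metric names and values are all hypothetical, chosen only to show a client that displays values from two independent domains using nothing but the metadata each domain supplies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class MetricDesc:
    """Metadata that travels with a metric (hypothetical, simplified)."""
    name: str       # dotted name in one shared namespace
    units: str      # e.g. "Kbyte" or "count"
    semantics: str  # "counter" (monotonic) or "instant" (point in time)


class DomainAgent:
    """Stand-in for one autonomous performance domain."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self._metrics: Dict[str, Tuple[MetricDesc, Callable[[], float]]] = {}

    def export(self, desc: MetricDesc, read: Callable[[], float]) -> None:
        # 'read' is called on demand each time the metric is fetched.
        self._metrics[desc.name] = (desc, read)

    def names(self) -> List[str]:
        return sorted(self._metrics)

    def describe(self, name: str) -> MetricDesc:
        return self._metrics[name][0]

    def fetch(self, name: str) -> float:
        return self._metrics[name][1]()


class Namespace:
    """Single lookup point that hides which agent owns which metric."""

    def __init__(self) -> None:
        self._owner: Dict[str, DomainAgent] = {}

    def attach(self, agent: DomainAgent) -> None:
        for name in agent.names():
            self._owner[name] = agent

    def describe(self, name: str) -> MetricDesc:
        return self._owner[name].describe(name)

    def fetch(self, name: str) -> float:
        return self._owner[name].fetch(name)


def render(ns: Namespace, name: str) -> str:
    """Format any metric using only its descriptor, with no per-metric code."""
    desc = ns.describe(name)
    return f"{name} = {ns.fetch(name)} {desc.units} ({desc.semantics})"


# Two made-up domains with made-up metrics and constant values.
kernel = DomainAgent("kernel")
kernel.export(MetricDesc("kernel.mem.free", "Kbyte", "instant"), lambda: 2048)
web = DomainAgent("web")
web.export(MetricDesc("web.requests.total", "count", "counter"), lambda: 1200)

ns = Namespace()
ns.attach(kernel)
ns.attach(web)

for metric in ("kernel.mem.free", "web.requests.total"):
    print(render(ns, metric))
```

Because `render` consults the descriptor rather than a hard-coded table, a domain could change a metric's units in this sketch without any change to the client code, which is the property the metadata is meant to provide.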
The Performance Metrics Collector Daemon is a message routing server, accepting requests from the client monitoring tools, forwarding the relevant components of the request to each PMDA, co-ordinating the responses from the PMDAs, and sending a single reply to the client tool.

Flexible Logging and Retrospective Analysis

Often, performance analysis is expedited when it is possible to compare today's end-user performance, activity levels, and resource utilization against the same information from yesterday, last week, or last month. This form of retrospective playback is most useful in problem analysis, hypothesis evaluation, remote diagnosis, and capacity planning.
A universal replay mechanism (modeled on a VCR paradigm) is used by most Performance Co-Pilot tools to provide "stop, seek, rewind, and replay at variable speed" processing of historical performance data. The requirement for uniformity also leads Performance Co-Pilot to treat real-time and historical sources of performance data as interchangeable and semantically equivalent.

A set of scripts and control files combine to provide integrated management of the process of collecting Performance Co-Pilot archives, including automatic starting and monitoring of the logger processes, daily log rotation, log culling, log merging and extraction, and flexible deployment of the logs and logging processes across multiple hosts.

From a purely pragmatic viewpoint, a single workstation must be able to concurrently monitor the performance of multiple remote hosts. At the same time, a single host may be monitored from multiple remote workstations. Performance Co-Pilot uses a classical "client/server" architecture to provide seamless and concurrent access to performance metrics, independent of their host locations. In this way, Performance Co-Pilot enables centralized performance monitoring and management for highly distributed application deployments.

Performance Co-Pilot provides an inference engine that evaluates a set of assertions against a time-series of performance data collected in real-time or from one or more Performance Co-Pilot archives. For those assertions that are found to be true, the inference engine is able to print messages, activate alarms, write syslog entries, and launch arbitrary programs. Typical use of automated reasoning about system-level performance might include:
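As one hypothetical illustration of such an assertion, the Python sketch below evaluates a single rule against a time series of samples. It does not use the inference engine's own rule language: the `Rule` class, the `evaluate` function, the metric, the 0.9 threshold, and the requirement that the condition hold for a number of consecutive samples are all assumptions made for this example. The input is any iterable of `(timestamp, value)` pairs, so the same function would accept samples arriving live or read back from an archive.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


@dataclass
class Rule:
    """One assertion plus the action to take when it is found to be true."""
    name: str
    predicate: Callable[[float], bool]
    hold: int  # assumed: consecutive true samples required before firing
    action: Callable[[str, float, float], None]


def evaluate(rule: Rule, samples: Iterable[Sample]) -> List[float]:
    """Run the rule over the samples; return the timestamps at which it fired."""
    streak = 0
    fired: List[float] = []
    for ts, value in samples:
        if rule.predicate(value):
            streak += 1
            # Fire once, at the moment the streak first reaches 'hold'.
            if streak == rule.hold:
                rule.action(rule.name, ts, value)
                fired.append(ts)
        else:
            streak = 0
    return fired


messages: List[str] = []


def record_message(name: str, ts: float, value: float) -> None:
    # Stand-in for printing, raising an alarm, or launching a program.
    messages.append(f"{name} true at t={ts:.0f}s (value {value:.2f})")


# Made-up CPU utilization samples, as if replayed from an archive.
archived = [(0.0, 0.50), (10.0, 0.95), (20.0, 0.97), (30.0, 0.99),
            (40.0, 0.40), (50.0, 0.96), (60.0, 0.98)]

busy_cpu = Rule(name="cpu_busy", predicate=lambda v: v > 0.9,
                hold=2, action=record_message)
fired_at = evaluate(busy_cpu, archived)
print(fired_at)
print(messages)
```

Requiring the condition to hold across consecutive samples is one simple way to avoid reacting to a single transient spike; other policies are equally plausible.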
Visualization of Exported Performance Data
Performance Co-Pilot provides libraries, source code examples, tools, and debuggers that encourage the development and integration of new sources of performance metrics as peers into the collection infrastructure. In the simplest case, agent development involves no more than writing a single function in C to instantiate metric values on demand, with all communication, protocol handling, and administrative services delegated to Performance Co-Pilot libraries. Production-quality agents have been routinely developed in a matter of a few hours, and source code examples are included in the Performance Co-Pilot distribution.

For applications where source code is available, another Performance Co-Pilot library supports a simple API for collecting measurements of application activity and aggregate elapsed time for arbitrary operations. The library automatically arranges for the collected data to be exported to the Performance Co-Pilot framework using a purpose-built domain agent (the trace PMDA).
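The kind of application instrumentation described above can be sketched in a few lines of Python. This is not the Performance Co-Pilot trace library or its API: the `Tracer` class and its `transaction` and `snapshot` methods are hypothetical names, and the automatic export to a domain agent is replaced here by a plain dictionary snapshot. The sketch only shows the bookkeeping involved, namely a count and an aggregate elapsed time per named operation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class Tracer:
    """Counts named operations and accumulates their elapsed time."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        # The clock is injectable so the behaviour can be checked deterministically.
        self._clock = clock
        self._count: Dict[str, int] = defaultdict(int)
        self._elapsed: Dict[str, float] = defaultdict(float)

    @contextmanager
    def transaction(self, tag: str) -> Iterator[None]:
        """Time the enclosed block and record it under 'tag'."""
        start = self._clock()
        try:
            yield
        finally:
            # Recorded even if the block raises, so failures are still counted.
            self._count[tag] += 1
            self._elapsed[tag] += self._clock() - start

    def snapshot(self) -> Dict[str, Dict[str, float]]:
        """Return the current totals; a real agent would export these instead."""
        return {
            tag: {"count": self._count[tag], "total_seconds": self._elapsed[tag]}
            for tag in self._count
        }


# Usage: wrap any operation of interest in the application.
tracer = Tracer()
for _ in range(3):
    with tracer.transaction("db.query"):
        sum(range(10_000))  # placeholder for real application work

print(tracer.snapshot())
```

Keeping only a running count and total per tag keeps the overhead in the instrumented code path small, at the cost of losing the distribution of individual timings.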