Operations process management tools
Release change process management tool: serves as the system interface through which operations works with other roles. It also provides an approval step to control the risk of release changes. The process management tool does not execute specific business operations; it serves only as a ticketing system that tracks each process and ensures the loop is closed.
Alarm and incident management tool: automatically creates and manages tickets for alarms that reflect business damage. After manual confirmation, a ticket is escalated to an incident ticket. By managing alarms through tickets and closing the loop on the incident handling process, it provides KPIs and the ability to summarize lessons from each failure; it does not measure the availability of the business.
Operations release change tools
Version management tool (database): all releases should start from version management. The version package developed by the company is first imported into the version management tool, and then distributed from the version management tool to the live network. This puts an end to releasing by rsync from one server to another.
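The import-then-distribute flow above can be sketched as a small in-memory store. This is only an illustration under stated assumptions: the class name `VersionStore`, its methods and the use of a SHA-256 digest are hypothetical choices, not something the text prescribes.

```python
import hashlib


class VersionStore:
    """Hypothetical in-memory version store: every release goes through it."""

    def __init__(self):
        self._packages = {}  # (name, version) -> (sha256 hex digest, bytes)

    def import_package(self, name, version, data):
        """Register a package once; re-importing the same version is refused."""
        key = (name, version)
        if key in self._packages:
            raise ValueError(f"{name} {version} is already imported")
        digest = hashlib.sha256(data).hexdigest()
        self._packages[key] = (digest, data)
        return digest

    def fetch(self, name, version):
        """Return (digest, data) so a delivery tool can verify what it ships."""
        return self._packages[(name, version)]


store = VersionStore()
digest = store.import_package("app", "1.4.2", b"package contents")
```

Refusing to overwrite an imported version is one way to make sure that what was approved is what gets distributed.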
Configuration management tool (database): version plus configuration equals the state of each machine on the live network. The most coarse-grained configuration management works at the IP level, which amounts to asset management for machines, grouping them by business concepts such as service, module and region. Fine-grained configuration management covers processes and the configuration related to each process.
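As a rough sketch of the coarse-grained, IP-level view described above, machines can be modelled as records tagged with service, module and region, and the desired state as version plus configuration per IP. The names `Machine`, `select` and `desired_state`, and all sample values, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    ip: str
    service: str
    module: str
    region: str


def select(machines, **filters):
    """Return the IPs of machines whose attributes match every filter."""
    return sorted(
        m.ip for m in machines
        if all(getattr(m, key) == value for key, value in filters.items())
    )


machines = [
    Machine("10.0.0.1", "shop", "web", "south"),
    Machine("10.0.0.2", "shop", "web", "north"),
    Machine("10.0.0.3", "shop", "db", "south"),
]

# Desired state of a machine = version + configuration (hypothetical values).
desired_state = {
    "10.0.0.1": ("web-1.4.2", {"workers": 8}),
    "10.0.0.3": ("db-5.6.1", {"read_only": False}),
}
```

A call such as `select(machines, module="web", region="south")` then answers "which machines does this change touch".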
Configuration and version delivery tools: deliver the specified version and configuration to the live network. Different versions and configuration methods require completely different delivery methods. The delivery method represented by ssh/fabric is script-centric; the delivery method represented by puppet/chef is configuration-centric.
Live network status synchronization tool: to keep the state of the live network from drifting away from the records in the management tools, a tool is needed that reports the actual state of the live network at regular intervals.
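The comparison between recorded and reported state can be sketched as follows. It assumes, purely for illustration, that both sides are available as a mapping from IP to a version string; the function name `find_drift` and the sample data are hypothetical.

```python
def find_drift(recorded, actual):
    """Return {ip: (recorded, actual)} for every machine whose reported
    state differs from the record; a missing side shows up as None."""
    drift = {}
    for ip in sorted(set(recorded) | set(actual)):
        want = recorded.get(ip)
        have = actual.get(ip)
        if want != have:
            drift[ip] = (want, have)
    return drift


recorded = {"10.0.0.1": "app-1.4.2", "10.0.0.2": "app-1.4.2"}
actual = {
    "10.0.0.1": "app-1.4.2",
    "10.0.0.2": "app-1.4.1",   # rolled back by hand, never recorded
    "10.0.0.3": "app-1.3.0",   # machine unknown to the management tool
}
drift = find_drift(recorded, actual)
```

Running such a check on every periodic report turns silent drift into an explicit list that can be reconciled or alarmed on.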
Service scheduling tool: release changes often require a serial process, doing module A first and then module B. Across many machines, operations need to run concurrently, yet not all of them can run in parallel at once. In addition, many release change processes depend on services outside the scope of operations management, such as cloud server records in the cloud. This calls for a service scheduling tool that uniformly schedules the configuration and version delivery tools, the process ticketing tools and the API interfaces of other systems, and assembles them into one process.
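One way to picture the scheduling described above is a runner that processes modules strictly in order while limiting how many machines are touched at once within a module. This is a minimal sketch: `run_release`, the `deliver` callback and the `max_parallel` parameter are hypothetical names, and a real scheduler would also call the ticketing and external APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_release(stages, deliver, max_parallel=2):
    """Run stages serially; inside a stage, run deliver(module, machine)
    on at most max_parallel machines at a time. Later stages are skipped
    as soon as one stage reports a failure."""
    log = []
    for module, machines in stages:
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            results = list(pool.map(lambda m: deliver(module, m), machines))
        log.append((module, results))
        if not all(results):
            break
    return log


def fake_deliver(module, machine):
    """Stand-in for a real delivery tool: fails on one specific machine."""
    return machine != "10.0.0.9"


stages = [
    ("A", ["10.0.0.1", "10.0.0.2", "10.0.0.3"]),
    ("B", ["10.0.0.9"]),
    ("C", ["10.0.0.4"]),
]
release_log = run_release(stages, fake_deliver)
```

Here module C is never started because module B failed, which is the point of keeping the stages serial.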
Resource management and isolation tools: tools represented by xen/kvm let operations carve up resources more flexibly, for example starting and stopping virtual machines quickly and drifting IPs within the IDC. Tools represented by lxc/docker let operations carve up resources further, down to the process level. The fine-grained resource control of a resource isolation agent allows better resource utilization and makes resource provisioning easier to scale.
Unified interface for release changes: wraps all the underlying tools and provides a simple interface for completing standardized release changes.
Operations monitoring and alarm tools
Acquisition tools: generally gather data from log files, or periodically poll a DB or other system interfaces. The popular open source solution is logstash.
Collection tool: the acquisition tools report what they gather to the collection tool. Alternatively, developers modify their code to report indicators to the collection tool directly. The open source solution for this step is still logstash.
Statistics and storage tools: a report may be sent once per call, in which case the statistics tool counts the number of calls per minute. A report may also carry a value every 5 seconds, in which case the statistics tool computes the maximum value within one minute. Statistics tools exist to make reporting convenient. The popular open source solution is statsd, and some large companies build their own on top of storm.
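The two roll-ups mentioned above (calls counted per minute, and the per-minute maximum of a value sampled every few seconds) can be sketched in a few lines. The function names and the tuple layouts are assumptions for illustration; this is not how statsd itself is implemented.

```python
from collections import defaultdict


def count_per_minute(events):
    """events: iterable of (unix_seconds, name), one entry per call.
    Returns {(name, minute_start): number of calls in that minute}."""
    counts = defaultdict(int)
    for ts, name in events:
        counts[(name, ts // 60 * 60)] += 1
    return dict(counts)


def max_per_minute(samples):
    """samples: iterable of (unix_seconds, name, value), e.g. one every 5 s.
    Returns {(name, minute_start): largest value seen in that minute}."""
    peaks = {}
    for ts, name, value in samples:
        key = (name, ts // 60 * 60)
        peaks[key] = max(value, peaks.get(key, value))
    return peaks


calls = [(0, "login"), (10, "login"), (61, "login")]
cpu = [(0, "cpu", 0.2), (5, "cpu", 0.9), (60, "cpu", 0.4)]
```

Because the reporter only sends raw events or samples, the choice of window and aggregate stays in one place, which is what makes reporting convenient.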
Time series database: all time series indicators are stored in this database. The database behind monitoring and alarms must support very large volumes of data, but it has no strict ACID requirements.
Operations event database: records all alarms, including alerts obtained from other systems, and records all changes to the live network. This data supports analysis of the cause of an alarm.
Indicator anomaly detection tool: uses a mathematical model to determine whether an indicator has deviated from its past stable pattern, and infers from that a change in the state of the live network.
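The text does not say which mathematical model is used. As one simple stand-in, a z-score check compares a new value with the mean and standard deviation of recent history; the function name `is_anomalous` and the threshold of 3 standard deviations are assumptions made for this sketch.

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Flag value if it lies more than `threshold` sample standard
    deviations away from the mean of the past stable values."""
    if len(history) < 2:
        return False  # not enough data to define a stable pattern
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # perfectly flat history: any change is a deviation
    return abs(value - mu) / sigma > threshold


recent_latency_ms = [100, 102, 98, 101, 99]
```

Real indicators often have daily or weekly cycles, so a production model would likely compare against the same time of day rather than a flat window.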
Dial test tool: periodically issues a PING or HTTP GET to simulate whether a real user would find the service interrupted, and generates an alarm if so. At the same time, it reports indicators to the collection system. Dial tests are divided into local and remote. Local dial tests can detect local problems such as a read-only disk. Remote dial tests can simulate the geographical distribution of users, so the link status of the network also falls within dial test coverage.
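A single HTTP GET dial test can be sketched with the standard library as below. The function names `http_get` and `probe`, the result fields and the URL in the usage note are hypothetical; the `fetch` parameter exists only so the probe can be exercised without a network.

```python
import time
import urllib.request


def http_get(url, timeout=3.0):
    """Return the HTTP status code; urlopen raises on connection
    errors and on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def probe(url, fetch=http_get, clock=time.monotonic):
    """Run one dial test and return a result suitable both for alarming
    (ok) and for reporting as an indicator (latency_s)."""
    start = clock()
    try:
        status = fetch(url)
        ok = 200 <= status < 400
    except Exception:
        status, ok = None, False
    return {"url": url, "ok": ok, "status": status, "latency_s": clock() - start}
```

A scheduler would call something like `probe("http://service.example/health")` at a fixed interval, raise an alarm when `ok` is false, and report `latency_s` to the collection system. Running the same probe from several regions gives the remote variant.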
Alarm convergence tool: combines alarms from all sources and applies frequency convergence and root cause analysis. The results are aggregated into unified reports that prompt manual repair.
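Frequency convergence can be sketched as folding repeated alarms with the same key into one entry with a counter, as long as they arrive within a time window. The function name `converge`, the tuple layout and the 300-second window are assumptions; root cause analysis is not covered by this sketch.

```python
def converge(alarms, window_s=300):
    """alarms: list of (unix_seconds, source, key) sorted by time.
    The first alarm for a key opens a group; repeats that arrive less than
    window_s after the group opened only increase its count."""
    merged = []
    open_groups = {}  # key -> index of the latest group in merged
    for ts, source, key in alarms:
        idx = open_groups.get(key)
        if idx is not None and ts - merged[idx]["first_seen"] < window_s:
            merged[idx]["count"] += 1
            merged[idx]["sources"].add(source)
        else:
            open_groups[key] = len(merged)
            merged.append(
                {"key": key, "first_seen": ts, "count": 1, "sources": {source}}
            )
    return merged


alarms = [
    (0, "dial-test", "db-1 down"),
    (30, "metrics", "db-1 down"),
    (60, "dial-test", "web-2 slow"),
    (400, "dial-test", "db-1 down"),
]
report = converge(alarms)
```

The report then holds three entries instead of four alarms, and the first entry records that two different sources saw the same problem.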
Automatic alarm repair tool: accepts alarms and handles them automatically. At the operations level, it carries out the routine operation of taking a faulty machine out of service. Or, where the service itself is not highly available, it performs live network repair operations such as replacing the faulty machine or drifting an IP, which improves service availability to some extent.
Alarm notification tool: important alarms need to be escalated to a phone call. This requires highly available notification interfaces for phone, SMS, WeChat and other channels.
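One way to approach high availability at the notification layer is to try channels in a priority order and fall back when one fails. The sketch below assumes each channel is a callable that returns True on success; `channels_for`, `notify`, the severity label "critical" and the chosen orderings are all hypothetical.

```python
def channels_for(severity, phone, sms, wechat):
    """Important alarms start with a phone call; others start with WeChat."""
    if severity == "critical":
        return [("phone", phone), ("sms", sms), ("wechat", wechat)]
    return [("wechat", wechat), ("sms", sms)]


def notify(alarm, channels):
    """Try each channel in order; return the name of the first one that
    reports success, or None if every channel failed."""
    for name, send in channels:
        try:
            if send(alarm):
                return name
        except Exception:
            continue  # a broken channel must not block the next one
    return None


def broken_phone(alarm):
    """Stand-in for a phone gateway that is currently unreachable."""
    raise ConnectionError("phone gateway unreachable")


def working(alarm):
    """Stand-in for a channel that delivers successfully."""
    return True


used = notify("db-1 down", channels_for("critical", broken_phone, working, working))
```

In this run the phone gateway is down, so the alarm goes out by SMS instead of being lost.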
Unified interface for monitoring and alarms: hides the various lower-layer tools and provides unified agent installation, indicator collection settings, indicator curve display and an alarm query interface. It is the one place to learn about all problems on the live network.