
System Monitoring

A set of probes specific to your web applications


Each server EMENCIA deploys has a set of basic probes (network connectivity, CPU, hard drive and memory usage).

When the Customer’s applications are deployed, EMENCIA puts in place a set of probes that are specific to these applications. For example, when a Java application server is installed, these probes check that the number of processes and threads, as well as the memory and processor usage of the application server, stay within an acceptable range. For an SQL server, a specific probe regularly checks that a connection to the server can be opened and that a basic query completes within a reasonable time.
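
The SQL probe described above can be sketched roughly as follows. This is a minimal illustration, not EMENCIA's actual probe: the function name `probe_sql`, the `SELECT 1` query and the 2-second limit are assumptions, and an in-memory sqlite3 database stands in for the real SQL server.

```python
import sqlite3
import time
from typing import Callable


def probe_sql(connect: Callable[[], sqlite3.Connection],
              query: str = "SELECT 1",
              max_seconds: float = 2.0) -> dict:
    """Open a connection, run a basic query, and time the whole round trip."""
    started = time.monotonic()
    try:
        conn = connect()
        try:
            conn.execute(query).fetchone()
        finally:
            conn.close()
    except Exception as exc:  # any failure means the probe failed
        return {"ok": False,
                "elapsed": time.monotonic() - started,
                "error": str(exc)}
    elapsed = time.monotonic() - started
    # The probe passes only if the query also finished in time.
    return {"ok": elapsed <= max_seconds, "elapsed": elapsed, "error": None}


# Assumption: sqlite3 in memory replaces the production SQL server here.
result = probe_sql(lambda: sqlite3.connect(":memory:"))
```

Passing the connection factory as a parameter keeps the timing and error handling independent of the database driver, so the same sketch could wrap any DB-API style `connect` call.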

Installation of specific probes

At the time of implementation, EMENCIA deploys a set of probes that constantly check:

 

  • Network connectivity (PING response) of virtual machines assigned to the project
  • The availability of the PostgreSQL database
  • The proper operation of the application stack, by checking the HTTP response code for a predefined set of URLs (e.g., site home pages and the result of a search form or a dynamic page that queries the database)
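
As an illustration of the last check in the list, here is a minimal sketch of an HTTP status probe over a predefined set of URLs. The function names, the expected code of 200 and the 5-second timeout are assumptions made for the example; the actual probes may be configured differently.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def check_urls(urls: Iterable[str],
               fetch: Callable[[str], int] = fetch_status,
               expected: int = 200) -> Dict[str, bool]:
    """Map each URL to True when it returned the expected status code."""
    return {url: fetch(url) == expected for url in urls}
```

The `fetch` parameter makes it possible to exercise the check without network access, for example `check_urls(["https://example.invalid/"], fetch=lambda url: 500)`.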

All application components are monitored in real time by the monit system. If an incident is identified, an email is automatically sent to EMENCIA’s NOC and, on request, to the Customer. For some types of incident, monit may take corrective action to restore proper service operation. If resource consumption (processor or memory) exceeds critical thresholds, monit can stop or restart certain services in order to protect the main components. For example, a site considered secondary can be stopped when resources are saturated, so that the main sites continue to operate optimally.
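
The decision to sacrifice secondary services under load can be pictured with the short sketch below. It is written in Python for illustration only and is not monit configuration; the `Service` type, the function name and the 90% thresholds are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    name: str
    secondary: bool        # True if the service may be stopped under load
    running: bool = True


def services_to_stop(cpu_percent: float, memory_percent: float,
                     services: List[Service],
                     cpu_limit: float = 90.0,      # assumed threshold
                     memory_limit: float = 90.0    # assumed threshold
                     ) -> List[str]:
    """Name the running secondary services to stop when a limit is exceeded."""
    saturated = cpu_percent > cpu_limit or memory_percent > memory_limit
    if not saturated:
        return []
    return [s.name for s in services if s.secondary and s.running]


sites = [Service("main-site", secondary=False),
         Service("blog", secondary=True)]
```

With these hypothetical sites, `services_to_stop(95.0, 40.0, sites)` selects only the secondary one, while the main site is never a candidate.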

In addition to these monitoring and self-healing features, monit offers a general overview of the services (see the screenshot below).

Monit Service Manager

Munin, System and Network Monitoring

MUNIN is an open source system and network monitoring tool released under the GNU General Public License. It relies on RRDtool (a data logging and graphing system), and its framework is written in Perl.

It records everything it has observed on the network and then presents this information as graphs available through a web interface. A sample screenshot of the server’s weekly memory usage is shown below.

This tool allows for easy monitoring of the performance of the system, network and applications. It helps determine the time at which a performance issue arises.

For more information

PINGDOM Probe

Emencia can configure a PINGDOM probe on the site to be monitored, with a text message (SMS) alert. Pingdom provides an application monitoring service and measures the availability of applications and services from a number of geographic locations.


It offers:

  • An average application response time
  • Type of checking available: HTTP, HTTP CUSTOM, TCP Port, Ping, DNS, UDP, SMTP, POP3, IMAP
  • Uptime
  • Performance graphs
  • Configurable email, Twitter and text alerts in the event of incidents
  • Incident log details.
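
To make the uptime and average response time figures concrete, here is a minimal sketch of how such values could be derived from a series of check results. It is an assumption about the arithmetic, not Pingdom's documented method, and the function name `summarize` is hypothetical.

```python
from typing import List, Optional, Tuple


def summarize(checks: List[Optional[float]]) -> Tuple[float, Optional[float]]:
    """Return (uptime percentage, average response time in ms).

    checks holds one response time in milliseconds per probe,
    or None when the probe failed.
    """
    if not checks:
        raise ValueError("no checks recorded")
    successes = [ms for ms in checks if ms is not None]
    uptime = 100.0 * len(successes) / len(checks)
    # The average is taken over successful checks only.
    average = sum(successes) / len(successes) if successes else None
    return uptime, average


uptime, average = summarize([100.0, 200.0, None, 300.0])
```

In this example three of four checks succeeded, so the uptime is 75% and the average response time over the successful checks is 200 ms.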


Monitoring Supervisord

Emencia deploys a proven open source monitoring solution, Supervisord, which is used by many companies and organizations. Our monitoring system is hosted on a dedicated server. This system allows for the monitoring of servers and related services.

The checks that verify proper operation of a service or server can be configured to react automatically when a problem occurs. Scripts can trigger an action on a service, for example automatically restarting a defective web server.

For more information


Nagios Monitoring

In addition to the onboard monit service, Emencia offers monitoring with the Nagios service. Nagios is an application that allows for system and network monitoring. It monitors the specified hosts and services, issuing alerts when systems work poorly and when they improve. It is free GPL-licensed software.

This is a modular program that is broken down into three parts:

  1. The application engine that schedules the monitoring tasks
  2. The web interface, which allows for an overview of the information system and potential anomalies
  3. The plugins, close to a hundred mini-programs that can be added as needed to monitor each service or resource available on any computer or network element of the information system.

For more information on Nagios:
www.nagios.org
wikipedia

EMENCIA’s monitoring includes queries to the customer’s web platform every two to five minutes in order to ensure it is operating properly. Once a problem is identified, a warning is recorded, then the verification is launched every minute. At the end of three unsuccessful verifications, an alert is sent by email and text message to the contacts the customer has declared, as well as to EMENCIA’s technical team, which intervenes in order to find a solution.
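
The check cycle described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the actual Nagios configuration: the normal interval is set to five minutes (the text gives a range of two to five), and the alert is assumed to fire on the third failed one-minute re-check after the initial failure.

```python
from typing import Iterable, List, Tuple

NORMAL_INTERVAL = 5 * 60   # seconds; assumed, the text says two to five minutes
RETRY_INTERVAL = 60        # seconds between re-checks after a failure
MAX_RETRIES = 3            # failed re-checks before an alert is sent


def simulate(results: Iterable[bool]) -> List[Tuple[int, str]]:
    """Replay check outcomes (True = OK) and return (time, event) pairs."""
    events: List[Tuple[int, str]] = []
    clock = 0
    failed_retries = -1    # -1 means no problem is currently open
    for ok in results:
        if ok:
            if failed_retries >= 0:
                events.append((clock, "recovered"))
            failed_retries = -1
            clock += NORMAL_INTERVAL
            continue
        if failed_retries < 0:
            # First failure: record a warning and switch to fast re-checks.
            failed_retries = 0
            events.append((clock, "warning"))
        else:
            failed_retries += 1
            if failed_retries == MAX_RETRIES:
                events.append((clock, "alert"))
        clock += RETRY_INTERVAL
    return events


timeline = simulate([True, False, False, False, False, True])
```

Because a brief outage recovers before the third failed re-check, a sequence such as `[False, False, True]` produces a warning and a recovery but no alert, which is the point of the retry window.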

In the example above, this system allows for the hardware standby team (authorized to work on the network) to be alerted without unnecessarily involving the software standby team (authorized to work on the web server).


Alert Escalation


When the Nagios system identifies an anomaly affecting a critical service that has not been corrected by monit’s self-healing procedures, an alert is sent by email and text message to EMENCIA’s team (and to the Customer, to inform them of the change in status). EMENCIA’s technicians then initiate the actions required to restore proper operation of the systems. These actions may range from a simple manual restart of a service to the execution of more complex procedures.

Some interventions may impact services other than the one at fault. For example, if a disk is full, temporary files may need to be erased or a database may need to be cleaned. If this occurs, EMENCIA informs the Customer for its approval or additional instructions before going further.

If the technical team does not resolve the alert, escalation is triggered in order to alert EMENCIA’s Level 2, as well as the emergency contacts designated by the Customer. The Customer may indicate, department by department, whether its teams should be contacted outside of normal working hours (so as to avoid triggering unnecessary action on the Customer’s side for a problem that is not urgent).
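
The escalation rule can be sketched as follows. All names, the contact addresses and the 09:00 to 18:00 weekday window are assumptions made for illustration; the real working hours and contact lists are defined with each Customer.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, List


@dataclass
class Department:
    contact: str
    notify_out_of_hours: bool   # the Customer's choice for this department


def in_working_hours(moment: datetime,
                     start: time = time(9, 0),
                     end: time = time(18, 0)) -> bool:
    # Monday is 0 and Friday is 4; the 09:00-18:00 window is an assumption.
    return moment.weekday() < 5 and start <= moment.time() < end


def escalation_recipients(moment: datetime,
                          departments: Dict[str, Department],
                          level2: str = "level2@example.invalid") -> List[str]:
    """Contacts to alert once the first-level team has not resolved the alert."""
    recipients = [level2]   # Level 2 is always alerted on escalation
    working = in_working_hours(moment)
    for dept in departments.values():
        if working or dept.notify_out_of_hours:
            recipients.append(dept.contact)
    return recipients


departments = {
    "operations": Department("ops@customer.invalid", notify_out_of_hours=True),
    "marketing": Department("marketing@customer.invalid", notify_out_of_hours=False),
}
```

With these hypothetical departments, an escalation at 03:00 on a Saturday reaches Level 2 and operations only, while one at 10:00 on a Wednesday reaches all three contacts.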
