NOTICE TIME ZONE Australia Brisbane GMT+10
STATUS (Open/Closed) Closed
INCIDENT START DATE 20200528
INCIDENT START TIME (HH:MM) 13:05
OUTAGE DURATION 20 Minutes
ESTIMATED TIME TO RESOLUTION Resolved
XCRM TICKET NUMBER Not Applicable
BRAND XRACK
PRIORITY P2
CUSTOMERS AFFECTED Approximately 4 clients: X32, X67, X73 & X98
DESCRIPTION OF INCIDENT Multiple virtual machines lost network connectivity. Initial findings pointed to a faulty NIC. The redundant NIC was tried, but the hypervisor locked up and prevented changes. A reboot of the server was performed and services resumed, resulting in a 20-minute outage.
DESCRIPTION IMPACT
  • Primary Effect
6 virtual machines lost network connectivity through the primary network port
  • Secondary Effect
Access to the VMs and the services they run was lost, disconnecting some users
  • Residual Effect
None expected once the problem is resolved
EVENT TIMELINE
  • 13:05
PRTG alerts indicated a problem with numerous client services
  • 13:06
NOC staff alerted senior technicians that services were offline
  • 13:11
Technician Luke began identifying the primary cause
  • 13:12
Technician Luke identified the cause as isolated to one server, specifically its customer-facing network interface
  • 13:13
Technician Luke attempted to migrate a virtual machine to the redundant network interface
  • 13:17
Technician Luke found that changes were hanging and not being applied because the hypervisor had entered a hung state
  • 13:18
Technician Luke initiated a restart of the server
  • 13:23
Technician Luke confirmed the server was back online and accessible
  • 13:25
The machine was fully operational again and all services had resumed normal operation
RECOVERY & RESOLUTION XSTRA identified that the issue was related to communication between the network interface and the hypervisor of the host server. A reboot allowed the hypervisor to return to normal operation.
ROOT CAUSE The hypervisor entered a hung state, resulting in no communication between the virtual switch and the physical network interface
CORRECTIVE & PREVENTATIVE MEASURES Remove the host from production and perform routine maintenance and updates