Roy

Dallas PoP Downtime

Hello everyone,

 

At 1:47 PM CST, our Dallas PoP server went offline. Because our physical machines were routing to this PoP server, all of our game servers on the Anycast network (92.119.148.0/24) went offline as well. About three minutes later, everything came back up. The server had been restarted, though not by any of us.
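If you want to check whether a particular server address falls inside the affected Anycast range, here is a minimal Python sketch. The range comes from this post; the function name and the sample addresses are made up for illustration and are not real game server IPs.

```python
import ipaddress

# The Anycast range mentioned in this post.
ANYCAST_NET = ipaddress.ip_network("92.119.148.0/24")


def is_affected(ip: str) -> bool:
    """Return True if the address is inside the Anycast range."""
    return ipaddress.ip_address(ip) in ANYCAST_NET


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for ip in ("92.119.148.25", "203.0.113.10"):
        print(ip, is_affected(ip))
```

Any address from 92.119.148.0 through 92.119.148.255 counts as inside the range.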

 

I inspected the boot and system logs for anything suspicious, but I couldn't find anything that would have caused the machine to go down or reboot. There is also no monitoring on the server itself.

 

However, monitoring was enabled for the node it was hosted on. About two hours after the event, our hosting provider emailed us to say that the node was having issues and that, after investigating, they restarted it (which restarted every instance on that node, including ours).
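Since the server itself has no monitoring, here is a rough sketch of what a basic external check could look like in Python. This is not something we run today. It assumes the PoP exposes some TCP port to probe (SSH, for example), and the function names and probe interval are purely hypothetical. It only shows the idea of probing on a timer and turning the results into downtime windows.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def downtime_windows(samples):
    """Turn time-ordered (timestamp, ok) probe results into downtime windows.

    Each window is (first failed probe, first successful probe after it).
    The end is None if the target was still down at the last sample.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # outage begins
        elif ok and start is not None:
            windows.append((start, ts))  # outage ends
            start = None
    if start is not None:
        windows.append((start, None))
    return windows
```

With a probe every 30 seconds, an outage like this one would show up as a single window roughly three minutes long. The measured length is only accurate to within the probe interval.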

 

Here is the email of the report:

 

Quote

Dear Customer,

 

Regarding the following subscriptions: 


xxxx MB Server - xxx.xxx.xxx.xxx (xxxxx) in Dallas

 

Our monitoring system indicated an issue with the hardware node hosting the instances listed in this email. Our engineering team has investigated the issue and initiated a restart of the host node in question.

 

Please note: While this event rebooted the instances listed in this email, we expect no impact on data and/or configurations.

 

Thank you,


Vultr.com Support

 

This issue was on our hosting provider's side, but things seem to be okay now.

 

I apologize for the inconvenience and thank you for understanding.

In addition to the above, we experienced another period of timeouts from the Dallas PoP. The server was accessible, but I wasn't able to pin down the issue at the time (I wasn't in the best mindset, lol).

 

I stopped announcing Telia on the Dallas PoP, which resulted in traffic getting routed to the LA PoP and then to Chicago.

 

I'll be announcing Telia again tomorrow at some point.

 

Unfortunately, I haven't seen any updates about the timeouts that occurred earlier.

 

Thank you for understanding.
