Liloz01

Outages 03/06/2020


Please find updates to the current situation in this thread.

 

Currently, all servers except CS:GO ZE, CS:GO Surf #4, GMod Hide and Seek and CS:S Surf have been taken offline because they are unable to connect to their databases.


All servers are coming back up; however, the cause of the issue has not been fixed or even properly identified. The servers may go back down, and we will let you know when they are stable again.


Update

Unfortunately, I was asleep when all of this was going down. This was likely not due to any filters recently put in place on certain POPs. Services weren't able to connect because they couldn't resolve the needed host names. It's hard to say what the exact cause was since I wasn't here. However, our NYC POP's DC was having a network issue around the time this started occurring. I know much of our services' traffic would route through the NYC POP, and this probably included DNS.

 

With that said, there was a (D)DoS attack at this time as well (what a surprise), but it only lasted a couple of minutes and, based on the graphs I've seen, the traffic wasn't being forwarded to the game server machine. The CPU on the NYC POP only spiked to 60% from this, and we saw around 500 - 600 Mbps inbound from the attack. The outage also began a bit after the attack.

 

I've advised our staff to resolve the host names and replace them with the IPs next time. This should allow us to see whether it was only a DNS issue or whether the POP (probably NYC) was having actual networking issues while all needed traffic (including DNS) was trying to flow through it.

 

If this happens again, the following troubleshooting steps will need to be attempted:

  • Resolve the host names and replace them with the IPs. This indicates whether it is only a DNS issue or not.
  • If it is not only a DNS issue, go into the web machine (where database traffic and so on is coming from) and perform an MTR to the network to see which POP the web server routes through.
  • Connect to the affected POP and do generic troubleshooting. My guess is the NYC POP wouldn't even be connectable due to the DC outage.
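
For the first step, here is a minimal Python sketch of the "resolve the host names" check, using only the standard library. The function name `resolve_hosts` and the host names in the usage note are hypothetical and not part of our actual setup; the resolver is passed in as a parameter only so the function can be exercised without touching real DNS.

```python
import socket


def resolve_hosts(hostnames, resolver=socket.getaddrinfo):
    """Map each host name to a sorted list of IPs, or None if it fails to resolve.

    `resolver` defaults to socket.getaddrinfo; it is injectable purely for testing.
    """
    results = {}
    for name in hostnames:
        try:
            infos = resolver(name, None)
        except socket.gaierror:
            # Resolution failed: this points at DNS rather than the service itself.
            results[name] = None
            continue
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

Calling `resolve_hosts(["db.example.internal"])` (a placeholder name) returns a dictionary. Any host mapped to `None` could not be resolved. Hosts that do resolve give you the IPs to substitute into the configuration, and if connections still fail with the IPs in place, the problem is probably not limited to DNS.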

 

Thanks.


Last Update

I just wanted to provide one last update regarding this issue. As stated in my last reply, the cause of this issue was more than likely the NYC DC having an outage that affected our NYC POP (where traffic was flowing through). Unfortunately, in cases like these, there's not much we can do. However, I'm going to look into BIRD and see if it's possible to turn BIRD off when the network is detected as down. This way, if there's an outage of some sort, the POP will stop announcing the network and therefore no traffic should flow through it.
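
To illustrate the idea, here is a rough Python sketch of such a watchdog. It is not something we run today. It assumes a Linux POP where BIRD runs as a systemd unit named `bird` (the unit name may differ per distribution), that `ping` is the iputils version, and that the probe targets are addresses you pick yourself; the function names are made up for this example.

```python
import subprocess


def ping(target, timeout_s=2):
    """Return True if a single ICMP echo to `target` succeeds (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def stop_bird():
    """Stop BIRD so the POP stops announcing. Assumes a systemd unit called 'bird'."""
    subprocess.run(["systemctl", "stop", "bird"], check=False)


def check_and_withdraw(targets, probe=ping, withdraw=stop_bird):
    """Withdraw only if every probe target is unreachable. Returns True if withdrawn."""
    if any(probe(t) for t in targets):
        # At least one target answered, so the network is treated as up.
        return False
    withdraw()
    return True
```

Something like this could be run every minute from cron with a few targets outside the DC. Requiring every target to fail before stopping BIRD is a deliberate choice, so that one unreachable host doesn't pull the whole POP out of the Anycast network.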

 

With that said, we have a plan for Compressor V2 under which outbound traffic from the game server, other than traffic to Steam or back to players, will no longer have to go through the Anycast network. This'll be a lot more reliable, but won't be ready until Compressor V2 (or whatever name we go with).

 

In the meantime, I'm going to be looking into the BIRD option.

 

Thank you!
