Roy

Yet Another Issue with Vultr's Dallas PoP...


Hello everyone,

 

Around an hour and a half ago, we started experiencing yet another major issue with Vultr's Dallas network. Our physical hosting provider (Nexril) started routing to the New York City PoP server instead of the PoP server located right in Dallas, TX. I also noticed my home network's route had changed from Dallas, TX to NYC. After adjusting Vultr's BGP communities on the NYC and Dallas PoPs to see if I could get the route back, I saw there was little to no traffic on our Dallas PoP:

 

466-05-19-2019-aJgJYEoG.png

 

Therefore, I decided to stop Compressor on the Dallas PoP and run a tcpdump for our Anycast network. There was one poor client who I believe was still connected to one of our game servers through the Dallas PoP (judging by the number of port 27015 packets they were sending, which was roughly 30 - 50 per second). They more than likely timed out twice during this session :( Anyways, here's the output of the tcpdump command with that client excluded:

 

roy@da01:~$ sudo tcpdump -i any net 92.119.148.0/24 and not host xxx.xxx.xxx.xxx -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
06:08:45.369813 IP xxx.xxx.xxx.xxx.55013 > 92.119.148.239.5984: Flags [S], seq 2175052956, win 65535, length 0
06:08:58.826801 IP xxx.xxx.xxx.xxx.55028 > 92.119.148.10.27015: UDP, length 25
06:08:59.549474 IP xxx.xxx.xxx.xxx.55028 > 92.119.148.19.27015: UDP, length 25
06:08:59.862646 IP xxx.xxx.xxx.xxx.55028 > 92.119.148.30.27015: UDP, length 25
06:09:04.985738 IP xxx.xxx.xxx.xxx > 92.119.148.33: ICMP xxx.xxx.xxx.xxx udp port 7777 unreachable, length 70
06:09:28.129399 IP xxx.xxx.xxx.xxx.46212 > 92.119.148.38.5984: Flags [S], seq 209625597, win 65535, length 0
06:09:54.208208 IP xxx.xxx.xxx.xxx.61494 > 92.119.148.56.53: 37944+ A? 153edd71.openresolverproject.org. (50)

As you can see, we only received 7 packets in total over a span of just over a minute. Other PoP servers normally have 400 - 500+ packets by this point from what I've seen.
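For anyone who wants to check the numbers, here is a rough Python sketch that counts the packets in the capture above and works out the time span and per-minute rate. The function names (`parse_timestamp`, `summarize`) are made up for this example, and it assumes tcpdump's default `HH:MM:SS.microseconds` timestamp format and a capture that doesn't cross midnight.

```python
import re
from datetime import timedelta

# Matches tcpdump's default timestamp followed by the "IP"/"IP6" marker.
TS_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{6})\s+IP6?\s")


def parse_timestamp(line):
    """Return the packet time as a timedelta since midnight, or None for non-packet lines."""
    m = TS_RE.match(line)
    if m is None:
        return None
    h, mi, s, us = (int(g) for g in m.groups())
    return timedelta(hours=h, minutes=mi, seconds=s, microseconds=us)


def summarize(lines):
    """Return (packet_count, span_seconds, packets_per_minute) for tcpdump output lines."""
    stamps = [t for t in map(parse_timestamp, lines) if t is not None]
    if not stamps:
        return 0, 0.0, 0.0
    # Assumes the capture does not wrap past midnight.
    span = (stamps[-1] - stamps[0]).total_seconds()
    rate = len(stamps) / span * 60 if span > 0 else 0.0
    return len(stamps), span, rate


# The capture from this post (source addresses were already masked).
CAPTURE = """\
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
06:08:45.369813 IP xxx.xxx.xxx.xxx.55013 > 92.119.148.239.5984: Flags [S], seq 2175052956, win 65535, length 0
06:08:58.826801 IP xxx.xxx.xxx.xxx.55028 > 92.119.148.10.27015: UDP, length 25
06:08:59.549474 IP xxx.xxx.xxx.xxx.55028 > 92.119.148.19.27015: UDP, length 25
06:08:59.862646 IP xxx.xxx.xxx.xxx.55028 > 92.119.148.30.27015: UDP, length 25
06:09:04.985738 IP xxx.xxx.xxx.xxx > 92.119.148.33: ICMP xxx.xxx.xxx.xxx udp port 7777 unreachable, length 70
06:09:28.129399 IP xxx.xxx.xxx.xxx.46212 > 92.119.148.38.5984: Flags [S], seq 209625597, win 65535, length 0
06:09:54.208208 IP xxx.xxx.xxx.xxx.61494 > 92.119.148.56.53: 37944+ A? 153edd71.openresolverproject.org. (50)
"""

if __name__ == "__main__":
    count, span, rate = summarize(CAPTURE.splitlines())
    print(f"{count} packets over {span:.1f}s ({rate:.1f} packets/minute)")
```

Running the same function against a capture from a healthy PoP would make the difference in packets per minute easy to compare.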

 

I've submitted a ticket to Vultr about this and asked them to look into it ASAP. We will more than likely be moving off of Vultr in most locations once we acquire our own ASN, for a number of reasons (I'll write more about this in the next big network update). Once we do, we shouldn't have this issue again because we'll have a PoP server hosted with the same provider as our physical servers.

 

This is more than likely adding 50 - 100+ ms of ping for clients connected to our game servers under our Anycast network.

 

I've also submitted a ticket to Nexril regarding this to see if there's anything they can do. However, I highly doubt it considering there are only a few hosts routing to our Dallas PoP server right now, LOL.

 

I do apologize for the inconvenience this has caused. For whatever reason, Vultr's NYC PoP was also experiencing packet loss from the network (thanks, Vultr), so all of our clients were experiencing it as well. This is still a new network to us and there are many upgrades we need to make to it in the future (all of this will start happening once we acquire our own ASN, which should be very soon).

 

Thank you for understanding and I will post back here once I receive an update or the issue resolves itself.


We have resolved this issue ourselves. We stopped announcing to Level 3 and Cogent from the New York City PoP, which resulted in the physical servers routing back to the Dallas PoP. For some reason, this didn't work the night the issue started (today I basically put in the same BGP communities I had tried the other night). Perhaps it needed some propagation time, although changes are usually instant, as they were today. Either way, I don't really like not announcing to Level 3 and Cogent at all, since they are Tier 1 carriers. Unfortunately, we'll have to do this for now until Vultr improves their networking (which, at this point, doesn't seem very likely).
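To illustrate the idea, here is a small Python sketch that builds "do not announce to this carrier" communities for the two carriers mentioned. The ASNs (3356 for Level 3, 174 for Cogent) are their public AS numbers. The helper name `no_announce_communities` is made up for this example, and the `64600` action prefix used in the usage line is an assumption about Vultr's community scheme, not the exact values from our config, so check Vultr's current BGP documentation before relying on it.

```python
# Public AS numbers of the two carriers named in this post.
LEVEL3_ASN = 3356
COGENT_ASN = 174


def no_announce_communities(action_prefix, peer_asns):
    """Build standard BGP communities of the form '<action_prefix>:<peer_asn>'.

    A standard community is two 16-bit values, so both halves must fit in 0..65535.
    """
    if not 0 <= action_prefix <= 0xFFFF:
        raise ValueError(f"action prefix {action_prefix} does not fit in 16 bits")
    communities = []
    for asn in peer_asns:
        if not 0 < asn <= 0xFFFF:
            raise ValueError(f"ASN {asn} does not fit in a standard community")
        communities.append(f"{action_prefix}:{asn}")
    return communities


if __name__ == "__main__":
    # ASSUMPTION: 64600 is used here as the provider's "do not announce to AS" prefix.
    for community in no_announce_communities(64600, [LEVEL3_ASN, COGENT_ASN]):
        print(community)
```

The resulting strings would be attached to the prefix announcement on the NYC PoP only, so the Dallas PoP keeps announcing to both carriers.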

 

I do apologize for the delay on this. As soon as we have our own ASN, we are going to put measures in place to make sure this issue won't occur again. We've been trying to push Vultr to look into this more deeply, but without much success. I seriously can't wait until we acquire our own ASN :P

 

Considering most clients that were routing to the Dallas PoP started preferring a route over Level 3 to the NYC PoP, this was ultimately an issue with Vultr's Dallas networking.

 

Anyways, things should be good now.

 

With that said, there is currently a major issue with our GS09 machine. I will be posting about that shortly.

 

Thank you.
