
Roy

[C] IPIP Mapper + Why It'd Be Useful For GFL's Anycast Network


Hey everyone,

 

I just wanted to share the latest project I started working on last night. Keep in mind, this is not fully tested yet; however, it should work in theory. The only thing I'm uncertain about is whether the BPF map will be shared between both TC programs, which is required in this case. If it isn't, I believe I'll need to build libbpf, include the objects in the loader, and implement changes to the userspace/loader program.

 

Why This Would Benefit GFL's Anycast Network + Description/Overview

What this project does is allow multiple remote IPs to be used on the IPIP endpoint. I made this in preparation for possible changes I'll be pushing to GFL's Anycast network at some point in the future. However, my future is undetermined at the moment, so I don't know whether I'll be making changes to the GFL Anycast network directly or not.

 

Anyways, the main issue we've faced with our current Anycast network setup is that the endpoint IPIP tunnel only supports one remote IP. Since this needed to be set to the Anycast IP, all outbound traffic from the game server (through the IPIP tunnel) routed to the closest POP server on our Anycast network, which resulted in one of our POP servers getting pegged with a lot of traffic. A work-around I made was my IPIP Direct program, which sent traffic back to the client directly from the game server machine. While this was a nice work-around, the game server machines would need to support sending traffic sourced as the Anycast network (which, in a lot of cases, is technically considered spoofing), and the traffic also wouldn't go back to the client through the same network it came in on (this could be a pro or a con, but it's rather inconsistent either way).

 

The solution to this, which I've brought up in my Bifrost updates on this forum, was to create a "driver" that looks for all incoming IPIP packets and maps the client IP (the inner IP header's source address) to the remote/POP IP (the outer IP header's source address). I decided to build this using TC (Traffic Control), which is the same hook the IPIP Direct program uses. Since TC also utilizes BPF (similar to XDP), I can use BPF (key => value) maps to map the client and remote IPs. And since the TC hook lives within the Linux kernel, the packet never gets pushed to user space; we can modify it directly in memory from the TC programs themselves without it being copied to user space, which is better for performance.
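
 

To make the idea more concrete, here's a rough, untested sketch of what the ingress side could look like. This is not the actual code linked below; the map name client_to_pop, the section names, and the assumption of a libbpf-style loader are all mine. It simply records which POP each client last arrived through:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Client (inner source) IP => remote/POP (outer source) IP. */
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 65536);
    __type(key, __u32);
    __type(value, __u32);
} client_to_pop SEC(".maps");

SEC("tc")
int ipip_mapper_ingress(struct __sk_buff *skb)
{
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;

    struct iphdr *outer = (void *)(eth + 1);
    if ((void *)(outer + 1) > data_end)
        return TC_ACT_OK;

    /* Only IPIP-encapsulated traffic is interesting here. */
    if (outer->protocol != IPPROTO_IPIP)
        return TC_ACT_OK;

    /* Assumes no IP options on the outer header. */
    struct iphdr *inner = (void *)(outer + 1);
    if ((void *)(inner + 1) > data_end)
        return TC_ACT_OK;

    __u32 client_ip = inner->saddr;
    __u32 pop_ip = outer->saddr;

    /* Remember which POP this client last arrived through. */
    bpf_map_update_elem(&client_to_pop, &client_ip, &pop_ip, BPF_ANY);

    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";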

 

The TC program (ingress) that maps the client IP and remote IP may be found here. The TC program (egress) that changes the outgoing IPIP packets and replaces the remote IP if the client IP is mapped to one may be found here.

 

If the client IP isn't mapped, the packet is passed along unchanged.
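
 

And here's a matching, equally hypothetical sketch of the egress side. It assumes it can see the same client_to_pop map the ingress program fills (which is exactly the map-sharing question I mentioned at the top), looks up the inner destination (the client), and rewrites the outer destination to that client's POP:

#include <stddef.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Must be the same map the ingress program fills (pinned or shared
 * through the loader, which is the open question mentioned above). */
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 65536);
    __type(key, __u32);
    __type(value, __u32);
} client_to_pop SEC(".maps");

SEC("tc")
int ipip_mapper_egress(struct __sk_buff *skb)
{
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;

    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;

    struct iphdr *outer = (void *)(eth + 1);
    if ((void *)(outer + 1) > data_end)
        return TC_ACT_OK;

    if (outer->protocol != IPPROTO_IPIP)
        return TC_ACT_OK;

    /* Assumes no IP options on the outer header. */
    struct iphdr *inner = (void *)(outer + 1);
    if ((void *)(inner + 1) > data_end)
        return TC_ACT_OK;

    /* Look up the client (inner destination); pass unchanged if unmapped. */
    __u32 client_ip = inner->daddr;
    __u32 *pop_ip = bpf_map_lookup_elem(&client_to_pop, &client_ip);
    if (!pop_ip)
        return TC_ACT_OK;

    __u32 old_daddr = outer->daddr;

    /* Rewrite the outer destination to the client's POP and fix the
     * outer IP header checksum. */
    bpf_l3_csum_replace(skb, ETH_HLEN + offsetof(struct iphdr, check),
                        old_daddr, *pop_ip, sizeof(*pop_ip));
    bpf_skb_store_bytes(skb, ETH_HLEN + offsetof(struct iphdr, daddr),
                        pop_ip, sizeof(*pop_ip), 0);

    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";

Note the checksum fix-up with bpf_l3_csum_replace(); since only the outer destination changes, only the outer IP header checksum needs updating.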

 

This implementation would come with quite a few pros and only one con, which may be found below.

 

  • [Pro] Since outgoing packets destined for players/clients would go back through the same POP they came from, the traffic would stay within the same network. Therefore, this would be more consistent.
  • [Pro] The game server machine would not need to source as the Anycast network (considered spoofing for some hosting providers since we're not announcing our IP blocks with them).
  • [Pro] We'd be able to implement more consistent filters since this would allow us to verify server-side responses from the game server machine.
  • [Con] Each POP would now start seeing outbound traffic resulting in higher CPU usage.

 

Implementation/Preparations Needed For This Change

Unfortunately, this change would require some preparation, and there would definitely be downtime associated with it. Assuming we had a script to update everything needed and things worked, a server restart would be all that's required.

 

At the moment, IPIP traffic from our POP servers to our game server machines looks like the following.

 

(Outer IP Header) Anycast IP => Game Server Machine IP
(Inner IP Header) Client IP => Game Server Internal IP

 

The internal IP of the game server is something we assign within the LAN IP ranges. It doesn't matter what LAN IP the server is assigned to as long as all of the POPs point towards the same one. This is the /32 IP the endpoint IPIP tunnel/interface in each Docker container/network namespace is assigned to.

 

For this project to be implemented, we'd need to change Compressor (our current packet processing software) or ensure we're using the following format with future packet processing/filtering software we make.

 

(Outer IP Header) Remote/POP IP => Game Server Machine IP
(Inner IP Header) Client IP => Anycast IP

 

With that said, we'll need to modify our endpoint scripts on the game server machines themselves, such as Docker Gen, to ensure we're assigning the Anycast IP address itself to the endpoint IPIP tunnel/interface. This replaces the internal IP. I don't believe there will be any issues with using the Anycast IP as the default remote IP when sending outbound IPIP traffic.

 

That's really about it, though. This would be a huge step forward for us and I'm glad I finally started working on these programs.

 

GitHub Repository



_Rocket_

My major question is: How much more CPU would be utilized for this to work? I can imagine it'll be costly if it's covering all of our servers. But if it isn't TOO costly (so you don't gotta upgrade the processor, which in itself will be expensive as all hell), it would definitely be worth it.

 

Also is the driver written in C? 🙂



 


1 hour ago, _Rocket_ said:

My major question is: How much more CPU would be utilized for this to work? I can imagine it'll be costly if it's covering all of our servers. But if it isn't TOO costly (so you don't gotta upgrade the processor, which in itself will be expensive as all hell), it would definitely be worth it.

 

Also is the driver written in C? 🙂

Good question! The additional packet processing would be the traffic the game server sends back to the client. I haven't inspected packet captures in both directions yet, but according to net_graph 4 while playing on GFL's CS:S BHop server, the server sends slightly more data than the client sends to the server. However, that's bandwidth rather than packets per second, and PPS is likely what matters most in this scenario, because Compressor has to decapsulate each IPIP packet, which takes CPU cycles.

 

My guess is the CPU usage on each POP would roughly double, which is obviously quite a bit, but we can easily scale out from there. Additionally, we already perform a lot of filtering checks on incoming packets from clients on the POP servers, so the current average CPU usage is probably higher because of that as well.

 

With that being said, this is not our biggest bottleneck in terms of CPU usage on our POP servers. As of right now, Compressor's XDP program is attached to the SKB/generic hook, which is A LOT slower than DRV mode. The DRV hook occurs within the NIC driver itself, whereas the SKB mode/hook occurs after SKB allocation and has performance similar to the Netfilter hook (IPTables and NFTables). The reason we aren't using DRV mode right now is that the AF_XDP sockets, which are responsible for A2S_INFO caching and responses altogether, break when we do. I explain a lot about my findings on this here (under the "Handling Cached Packets" section). At first, I thought it was due to Compressor using outdated AF_XDP code, but after creating a test AF_XDP program here based off the AF_XDP code from the official xdp-tutorial repository here, I still encounter the same exact issue. I'm convinced I'm doing something wrong, but I'm really not sure what; perhaps it could be something related to the virtio_net driver. This is some pretty complicated code and, in my opinion, requires extensive knowledge of kernel networking within Linux. I'm going to figure it out eventually, and if I can't, I'll likely make another attempt at the mailing list (I already tried many months ago here). AF_XDP sockets are so complicated at times 😞
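
 

For reference, this is roughly what the mode selection looks like from a libbpf-based loader. It's only a hedged sketch; the interface name and object file name are placeholders, and bpf_xdp_attach() needs a reasonably recent libbpf (older code uses bpf_set_link_xdp_fd() instead):

#include <stdio.h>
#include <net/if.h>
#include <linux/if_link.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>

int main(void)
{
    int ifindex = if_nametoindex("eth0"); /* placeholder interface */
    if (!ifindex)
        return 1;

    /* Placeholder object file name. */
    struct bpf_object *obj = bpf_object__open_file("compressor_xdp.o", NULL);
    if (!obj || bpf_object__load(obj))
        return 1;

    struct bpf_program *prog = bpf_object__next_program(obj, NULL);
    if (!prog)
        return 1;

    int prog_fd = bpf_program__fd(prog);

    /* Try native/driver mode first; fall back to generic (SKB) mode,
     * which is effectively what Compressor is stuck on today. */
    if (bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_DRV_MODE, NULL) < 0) {
        fprintf(stderr, "DRV mode failed, falling back to SKB mode\n");
        if (bpf_xdp_attach(ifindex, prog_fd, XDP_FLAGS_SKB_MODE, NULL) < 0)
            return 1;
    }

    return 0;
}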

 

Additionally, with this change, I could also try moving the caching management into the XDP code itself instead of using AF_XDP sockets. While AF_XDP sockets are fast because XDP creates a fast path to them within the Linux kernel along with zero-copy support, they are quite complicated and aren't as fast as performing everything inside of the XDP program itself. If we were to do this, we would eliminate the need for a Redis server, and each POP would have its own A2S_INFO cache for each game server inside of BPF maps. This does come with a drawback, though, which I'll cover below. With that said, modifying the packet will be tricky within the XDP program due to how strict the BPF verifier is, but I do believe it should be possible. The hardest part will be shrinking and expanding the packet itself within XDP. Since we'd be modifying the packet directly and sending it back out the TX path, we'd have to use something like the following to shrink or expand the packet based on how long the original packet is compared to the cached A2S_INFO response.

 

Quote

long bpf_xdp_adjust_head(struct xdp_buff *xdp_md, int delta)

        Description
                Adjust (move) xdp_md->data by delta bytes. Note
                that it is possible to use a negative value for
                delta. This helper can be used to prepare the
                packet for pushing or popping headers.

                A call to this helper is susceptible to change the
                underlying packet buffer. Therefore, at load time,
                all checks on pointers previously done by the
                verifier are invalidated and must be performed
                again, if the helper is used in combination with
                direct packet access.

        Return 0 on success, or a negative error in case of
                failure.
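
 

Just to illustrate the part of that description that bites the hardest in practice: after the helper runs, every packet pointer has to be re-derived and bounds checked again before it can be used. Here's a minimal, hypothetical usage sketch (the delta and the header rebuilding are only hinted at):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_adjust_example(struct xdp_md *ctx)
{
    /* A negative delta pulls data into the headroom (the packet grows at
     * the front); a positive delta pops bytes off the front. */
    if (bpf_xdp_adjust_head(ctx, -16))
        return XDP_PASS; /* not enough headroom, leave the packet alone */

    /* The old data/data_end pointers are now invalid; fetch and bounds
     * check them again before touching the packet. */
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    if (data + 16 > data_end)
        return XDP_PASS;

    /* ... rebuild the Ethernet/IP/UDP headers and the cached A2S_INFO
     * payload here, then bounce the packet back out ... */

    return XDP_TX;
}

char _license[] SEC("license") = "GPL";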

 

The only drawback would depend on how long we keep responses cached. Since, to my understanding, we can't create a packet within the XDP program, when a response's cache time is up we'd have to send the A2S_INFO request packet through to the game server machine, which means that client would get the response time from the game server machine instead of a POP. So let's say, for example, the cache time is 45 seconds; every 45 seconds or so, there's a possibility a client will get the response time from the game server itself instead of the closest POP. I'm going to check whether it's possible to create an additional packet within the XDP program itself. Another idea I just thought of is, when the cache time is up, to set a key => value in a BPF map and parse that from user space every few seconds. Afterwards, we can create the second packet from user space instead, and the XDP program should be able to intercept the response and handle it. Actually, I like that idea better 🙂 If there's no cached response, the request will still need to be sent to the game server, but that should only happen on the first A2S_INFO request per server after the packet processing software is started.
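
 

As a rough illustration of that last idea (entirely hypothetical; the pin path, the map layout, and the send_a2s_info_request() helper are all made up), the user-space side would just need to walk the map every few seconds:

#include <stdint.h>
#include <unistd.h>
#include <bpf/bpf.h>

/* Placeholder: craft and send the actual A2S_INFO query from user space. */
static void send_a2s_info_request(uint32_t server_ip)
{
    (void)server_ip;
}

int poll_expired_cache(void)
{
    /* The XDP program would add an entry keyed by server IP whenever a
     * cached response's time is up. The pin path is made up. */
    int map_fd = bpf_obj_get("/sys/fs/bpf/a2s_expired");
    if (map_fd < 0)
        return -1;

    for (;;) {
        uint32_t key, next_key, flag;
        int err = bpf_map_get_next_key(map_fd, NULL, &key);

        while (!err) {
            /* Grab the next key before deleting the current one so the
             * iteration doesn't restart from the beginning. */
            err = bpf_map_get_next_key(map_fd, &key, &next_key);

            if (bpf_map_lookup_elem(map_fd, &key, &flag) == 0) {
                send_a2s_info_request(key);
                bpf_map_delete_elem(map_fd, &key);
            }

            key = next_key;
        }

        sleep(2); /* "parse that inside the user space every few seconds" */
    }
}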

 

Also yes, the TC programs (basically the driver) are written entirely in C 🙂

 

I hope this clears some things up and shows my thought process, and of course answers your question too!



_Rocket_

Haha, I appreciate the detailed response! I wish I could say I understood everything, but man, I've spent all my time on software development hahaha. Still, I believe I've got the general gist. Good work, man. I know Aurora is gonna scold the hell out of me for saying this one, but I always respect an "I'll write this myself" mindset. Being able to create your own libraries, drivers, and processes is super rewarding. Good luck with that stuff, man.



 
