
Roy

[C] XDP Forwarding


Hey everyone and happy New Years!

 

I just wanted to share a big project I've been working on recently. It is still a big WIP, but I've made a lot of progress with it and the basic functionality works. I'm actually really happy with this project so far and it has so much potential.

 

I've made an XDP forwarding program that performs basic layer 3/4 forwarding. What's really neat about the program is that it does source port mapping similar to IPTables/NFTables to keep track of connections. I haven't seen any XDP/BPF programs do this to date. To use the maximum port number of 65535, I had to significantly raise some of the BPF verifier's limits. Patches for these can be found inside the GitHub repository. You will need to compile the Linux kernel yourself to get it working with higher max port numbers. My home VM is running it without any issues on a custom kernel when using 65535 as the max port range 🙂

 

I also made a YouTube video demonstrating the forwarding. In it, I show what happens when only one source port is available per bind address and multiple connections on the same protocol (e.g. UDP) go to the same bind port: packet loss occurs.

 

 

 

XDP Forwarding

 

Here is the README copied from the GitHub repository on December 31st, 2020 (I'd suggest viewing the GitHub repo because I'm sure this will change in the coming weeks).

 

Description
A program that attaches to the XDP hook and performs basic layer 3/4 forwarding. This program does source port mapping similar to IPTables and NFTables for handling connections.

 

The XDP program tries to use DRV mode at first, but if that does not attach properly, it will fall back to SKB mode. You may specify the -o flag (as seen below) to use HW mode.
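The fallback order described above can be sketched roughly as follows. This is a userspace illustration only, not the program's actual loader: the `sketch_xdp_mode` names, `attach_fn`, `attach_with_fallback`, and `fake_attach` are all hypothetical, and the real attach call is replaced by a callback. The flag values are meant to mirror `XDP_FLAGS_SKB_MODE`, `XDP_FLAGS_DRV_MODE`, and `XDP_FLAGS_HW_MODE` from `<linux/if_link.h>`.

```c
#include <stddef.h>

/* Hypothetical mode flags for this sketch; the values mirror
 * XDP_FLAGS_SKB_MODE, XDP_FLAGS_DRV_MODE and XDP_FLAGS_HW_MODE. */
enum sketch_xdp_mode {
    SKETCH_MODE_SKB = 1u << 1,
    SKETCH_MODE_DRV = 1u << 2,
    SKETCH_MODE_HW  = 1u << 3
};

/* Callback that tries to attach in a single mode; returns 0 on success. */
typedef int (*attach_fn)(unsigned int mode_flag, void *user);

/* Try HW first only when offload was requested, then DRV, then SKB.
 * Returns the mode flag that worked, or 0 if every attempt failed. */
unsigned int attach_with_fallback(int want_offload, attach_fn attach, void *user)
{
    const unsigned int order[] = { SKETCH_MODE_HW, SKETCH_MODE_DRV, SKETCH_MODE_SKB };
    size_t start = want_offload ? 0 : 1;

    for (size_t i = start; i < sizeof(order) / sizeof(order[0]); i++) {
        if (attach(order[i], user) == 0)
            return order[i];
    }

    return 0;
}

/* Stand-in for a real attach call: succeeds only for modes present in the
 * "supported" bitmask that `user` points to. */
int fake_attach(unsigned int mode_flag, void *user)
{
    unsigned int supported = *(unsigned int *)user;

    return (supported & mode_flag) ? 0 : -1;
}
```

In a real loader, the callback would wrap whatever libbpf attach function is in use and pass the mode flag through to it.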

 

WARNING - There are still many things that need to be done to this project and as of right now, it only supports IPv4. IPv6 support will be added before official release. As of right now, the program may include bugs and some forwarding features aren't yet available.

 

Note - Before release, I plan on making benchmarks on the XDP Forwarding program vs IPTables/NFTables. As of right now, I have no benchmarks.

 

Limitations
The default maximum ports that can be used per bind address is 1000 and is set here. You may raise this constant if you'd like along with the others there.

 

At first, I was trying to use all available ports (1 - 65535). However, due to BPF verifier limitations, I had to raise some constants inside the Linux kernel and recompile the kernel. I made patches for these and have everything documented here. I am able to run the program with 65535 max ports per bind address without any issues with the custom kernel I built using patches I made.

 

Mounting The BPF File System
In order to use xdpfwd-add and xdpfwd-del, you must mount the BPF file system since the XDP program pins the BPF maps to /sys/fs/bpf/xdpfwd. There's a high chance this is already done for you via iproute2 or something similar, but if it isn't, you may use the following command:

 

mount -t bpf bpf /sys/fs/bpf/


Command Line Usage
Basic
Basic command line usage includes:

 

-o --offload => Attempt to load the XDP program in HW/offload mode. If that fails, it will try DRV and SKB mode, in that order.
-c --config => Location to XDP Forward config (default is /etc/xdpfwd/xdpfwd.conf).
-h --help => Print out command line usage. (Not implemented yet).


XDP Add Program
The xdpfwd-add executable, which is added to the $PATH via /usr/bin on install, accepts the following arguments:

 

-b --baddr => The address to bind/look for.
-B --bport => The port to bind/look for.
-d --daddr => The destination address.
-D --dport => The destination port.
-p --protocol => The protocol number to use (17 for UDP, 6 for TCP, and 1 for ICMP).


This will add a forwarding rule while the XDP program is running. As of right now, it does not save this rule into the XDP config file. However, I will be implementing save functionality before release.

 

Additionally, the protocol will accept a string input in the future (before release) such as "udp", "tcp", and "icmp". This functionality is not currently implemented, though.
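Since string protocol names are not implemented yet, here is one way such a lookup could work. This is only a sketch: `protocol_from_string` and the `PROTO_*` macros are hypothetical names, not part of xdpfwd-add. It accepts the three names mentioned above, plus the numeric form the tool takes today.

```c
#include <stdlib.h>
#include <string.h>

/* IANA protocol numbers, matching the -p option description above. */
#define PROTO_ICMP 1
#define PROTO_TCP  6
#define PROTO_UDP  17

/* Hypothetical helper: map "udp", "tcp", "icmp" (or "17", "6", "1") to a
 * protocol number. Returns -1 for anything else. */
int protocol_from_string(const char *name)
{
    if (name == NULL || *name == '\0')
        return -1;

    if (strcmp(name, "icmp") == 0)
        return PROTO_ICMP;
    if (strcmp(name, "tcp") == 0)
        return PROTO_TCP;
    if (strcmp(name, "udp") == 0)
        return PROTO_UDP;

    /* Fall back to a numeric string, but only for the supported protocols. */
    char *end = NULL;
    long num = strtol(name, &end, 10);

    if (*end != '\0')
        return -1;
    if (num == PROTO_ICMP || num == PROTO_TCP || num == PROTO_UDP)
        return (int)num;

    return -1;
}
```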

 

XDP Delete Program
The xdpfwd-del executable, which is added to the $PATH via /usr/bin on install, accepts the following arguments:

 

-b --baddr => The address to bind/look for.
-B --bport => The port to bind/look for.
-p --protocol => The protocol number to use (17 for UDP, 6 for TCP, and 1 for ICMP).


This will delete a forwarding rule while the XDP program is running. As of right now, it does not save the results into the XDP config file. However, I will be implementing save functionality before release.

 

Additionally, the protocol will accept a string input in the future (before release) such as "udp", "tcp", and "icmp". This functionality is not currently implemented, though.

 

Configuration
The default config file is located at /etc/xdpfwd/xdpfwd.conf and uses the libconfig syntax. Here's an example config using all of its current features.

 

interface = "ens18"; // The interface the XDP program attaches to.

// Forwarding rules array.
forwarding = (
    {
        bind = "10.50.0.3",     // The bind address which incoming packets must match.
        bindport = 80,          // The bind port which incoming packets must match.

        protocol = "tcp",       // The protocol (as of right now "udp", "tcp", and "icmp" are supported). Right now, you must specify a protocol. However, in the future I will be implementing functionality so you don't have to and it'll do full layer-3 forwarding.

        dest = "10.50.0.4",     // The address we're forwarding to.
        destport = 8080         // The port we're forwarding to (if not set, will use the bind port).
    },
    ...
);
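To illustrate the destport default noted in the config comments, a parsed rule could be held in a struct like the one below. The struct layout, the field names, and the use of 0 to mean "not set" are assumptions made for this sketch; the program's real types may look different.

```c
#include <stdint.h>

/* Hypothetical in-memory form of one entry in the forwarding array. */
struct fwd_rule_cfg {
    const char *bind;      /* bind address, e.g. "10.50.0.3" */
    uint16_t bindport;     /* bind port, e.g. 80 */
    const char *protocol;  /* "udp", "tcp" or "icmp" */
    const char *dest;      /* destination address, e.g. "10.50.0.4" */
    uint16_t destport;     /* destination port; 0 means "not set" in this sketch */
};

/* If destport was not set, fall back to the bind port. */
uint16_t effective_destport(const struct fwd_rule_cfg *rule)
{
    return rule->destport != 0 ? rule->destport : rule->bindport;
}
```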


Credits

 

I hope you guys like this neat project 🙂

 



Just figured I'd share this:

 

[Image: linux-laptop-bigmode-06-14-40.png]

 

I shared this project on my LinkedIn and the CEO of VyOS commented, tagging the VyOS maintainer. This is really neat because I remember using VyOS years ago and loved it! That said, they recently added XDP support and it's crazy these guys are likely viewing my code 😄 



Hey everyone!

 

I rewrote parts of the XDP Forwarding program last night and today. For one, it turns out the way I was doing it before didn't actually use the correct last seen time when prioritizing existing connections on port exhaustion. Therefore, I rewrote parts of the XDP program to use the correct stats.

 

In addition to these changes, I decided to use the number of packets per nanosecond each connection has to prioritize existing connections on port exhaustion (when selecting a new source port). I believe this makes the most sense because these are the connections that'll be most sensitive to losing packets. Therefore, keeping these connections the least interrupted makes sense in my opinion.

 

For example, a simple HTTP GET request will only send a few packets, whereas UDP connections to a game server will send many packets per second. If the existing connection for the HTTP GET request gets removed due to port exhaustion, the program will more than likely select a new source port and be able to send the rest of the packets back with very little additional latency/overhead for that legitimate connection. That said, this only matters when the ports are actually being exhausted, which is likely only an issue when somebody is deliberately trying to exhaust all available source ports in unique ways (they'd need to send a consistent PPS from as many IPs as there are source ports to cause any real issues, and at that point you'll need a firewall either way).
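The prioritization idea can be sketched in plain userspace C like this. It is not the XDP program's code: `conn_stats`, `packets_per_ns`, and `pick_slot` are hypothetical names, the rate is measured over the connection's whole lifetime (an assumption), and floating point is used for simplicity even though a BPF program could not do that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-source-port connection record for this sketch. */
struct conn_stats {
    uint16_t port;          /* mapped source port */
    uint64_t first_seen_ns; /* when the connection was created */
    uint64_t packets;       /* packets seen so far */
    int in_use;             /* 0 if the port is free */
};

/* Packets per nanosecond over the connection's lifetime. */
double packets_per_ns(const struct conn_stats *c, uint64_t now_ns)
{
    uint64_t age = now_ns > c->first_seen_ns ? now_ns - c->first_seen_ns : 1;

    return (double)c->packets / (double)age;
}

/* Returns the index of the first free slot if there is one. Otherwise
 * returns the index of the connection with the lowest packet rate, which
 * is the one to replace on port exhaustion. Returns -1 if count is 0. */
int pick_slot(const struct conn_stats *conns, size_t count, uint64_t now_ns)
{
    int victim = -1;
    double lowest = 0.0;

    for (size_t i = 0; i < count; i++) {
        if (!conns[i].in_use)
            return (int)i;

        double rate = packets_per_ns(&conns[i], now_ns);

        if (victim < 0 || rate < lowest) {
            victim = (int)i;
            lowest = rate;
        }
    }

    return victim;
}
```

With one slot holding a short HTTP exchange and another holding a busy game-server connection, the HTTP slot is the one picked for replacement.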

 

I decided to do some pen-testing using my Packet Sequence program here against my XDP Forwarding program and IPTables. The video I made on this can be found below.

 

 

Note - I said in the video that the Packet Sequence delay was in milliseconds, but I'm dumb, it's in microseconds (I've been all over the place today, so I wasn't thinking about the measurement).

 

The Packet Sequence config I used can be found below.

 

interface: "ens18"

sequences:
    one:
        count: 0
        time: 0
        delay: 1000
        trackcount: true

        eth:
            dmac: "1a:c4:df:70:d8:a6"

        ip:
            ranges:
                - 172.16.0.0/16
                - 192.168.0.0/16
            #srcip: "192.168.90.4"

            protocol: "udp"
            dstip: "10.50.0.3"

        udp:
            srcport: 0
            dstport: 27015

 

I tested different delays (which are in microseconds as stated above). I tested with 10000, 1000, 100, 10, and 0. This sequence sends packets from the 172.16.0.0/16 or 192.168.0.0/16 IPv4 ranges with a random source port against UDP 10.50.0.3:27015. This is an example of an attack that is aimed to exhaust all of the source ports due to the randomness. The payload length is only 0 bytes, so these are empty UDP packets. IP and UDP checksums are OK and set within the program as well (no NIC offload).
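To put the delays into perspective, the arithmetic below converts a per-packet delay in microseconds into a rough ceiling on packets per second. It assumes a single sender whose only pause between packets is the delay itself, which may not match how Packet Sequence actually schedules sends; `max_pps_for_delay` is a hypothetical helper.

```c
#include <stdint.h>

/* Rough ceiling on packets per second for one sender that waits delay_us
 * microseconds between packets. A delay of 0 means "no pacing", which this
 * sketch reports as 0 because the real rate then depends on the hardware. */
uint64_t max_pps_for_delay(uint64_t delay_us)
{
    if (delay_us == 0)
        return 0;

    return 1000000ULL / delay_us;
}
```

Under that assumption, the tested delays of 10000, 1000, 100, and 10 microseconds correspond to at most 100, 1000, 10000, and 100000 packets per second.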

 

My IPTables NAT chains are set up as follows.

 

Chain PREROUTING (policy ACCEPT 2 packets, 124 bytes)
 pkts bytes target     prot opt in     out     source               destination
 604K   17M DNAT       udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:27015 to:10.50.0.4:27015

Chain INPUT (policy ACCEPT 2 packets, 124 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 99 packets, 6934 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 99 packets, 6934 bytes)
 pkts bytes target     prot opt in     out     source               destination
2380K   67M MASQUERADE  udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:27015

 

I also set net.ipv4.ip_forward to 1 via sysctl -w net.ipv4.ip_forward=1. The IPTables entries can be added with the following two commands (we want to add to the PREROUTING and POSTROUTING chains).

 

# Prerouting command to do DNAT from 10.50.0.3:27015 (or 0.0.0.0:27015) to 10.50.0.4:27015.
iptables -t nat -A PREROUTING -p udp --dport 27015 -j DNAT --to-destination 10.50.0.4:27015

# Postrouting command so we forward the traffic as the Test02's interface IP (masquerade) instead of the client IP.
iptables -t nat -A POSTROUTING -p udp --dport 27015 -j MASQUERADE

 

The XDP Forwarding program is set up as follows.

 

interface = "ens18";

forwarding = (
        {
                bind = "10.50.0.3",
                bindport = 27015,

                dest = "10.50.0.4",
                destport = 27015,

                protocol = "udp"
        },
        {
                bind = "10.50.0.3",
                bindport = 8030,
                dest = "10.50.0.4",
                destport = 80,
                protocol = "tcp"
        }
);

 

For my testing, I was connecting to my CS:S test server on my home VM which the packets were forwarding to. These connections are UDP-based and very sensitive compared to simple TCP connections such as HTTP GET requests. Therefore, I felt using this type of connection for testing is best.

 

Additionally, I was only using 20 source ports on the XDP Forwarding program within the 500 - 520 port range. This is the default limit due to BPF limitations, but can be raised with a custom kernel (which my home VM is running). I also tried doing testing with 65500 max source ports and the results were actually quite neat (basically, it consumes a lot more resources than using 20 source ports which makes sense). I show this in the video and will briefly go into this below.

 

When using IPTables, I didn't have any packet loss at 10000 or 1000 microsecond delays. However, once I bumped it down to 100 or below, I started seeing heavy packet loss.

 

When using my XDP Forwarding program, I didn't have any packet loss at 10000 or 1000 microsecond delays. When I bumped the delay down to 100, I didn't have any packet loss either! When I bumped it down to 10, I didn't have much packet loss, but there was a small spike 15 seconds or so into the attack, which can be seen in the video. This was probably just a hiccup with the BPF program and to be expected during this attack on such a weak VM.

 

When I removed the delay entirely (so it sends as many packets as possible), I started suffering from high packet loss/timeouts. Since these packets didn't have any payload, there were probably a lot sent! I don't believe port exhaustion was even the issue at this point, but the server's resources themselves being fully consumed.

 

I'm not sure how IPTables prioritizes existing connections. However, in this case, my XDP Forwarding program did perform better. I'm sure the IPTables algorithm has its pros in certain cases, though. I am going to be studying this in the future.

 

I also tried testing with 65500 max source ports using the same Packet Sequence config above. The XDP Forwarding program did suffer from packet loss at 1000 microsecond delays. This is almost certainly because, when choosing an available source port, it had to loop through more ports at a time. This is where the hardware itself comes into play, along with where the XDP program is loaded (e.g. in SKB, DRV, or HW mode). My home server is fairly weak, running on an older Intel Xeon CPU, and since the XDP program was loaded in SKB mode (closer to the netfilter hook), it most likely struggled with this. I believe performance would be much better on a more modern CPU with the XDP program loaded in DRV or HW mode. Overall though, I don't think using the full 65500+ ports is a good option unless you absolutely need to support that many concurrent legitimate connections and have very good hardware with DRV or HW XDP support.

 

Overall, I found these results very interesting and I hope you all do as well! I'm personally pretty happy with the progress I've made on this XDP Forwarding program 🙂



  • 1 month later...

I've implemented ICMP support in the program. This was a bit more difficult than the other protocols because the ICMP header has nothing to map connections on by default. Therefore, when forwarding a packet from the forwarding server, I append the client's 32-bit unsigned IP address (in network byte order) to the ICMP data. When the reply comes back, I parse the last four bytes of the ICMP data, which hold the address of the client to send the reply to, and then remove those four bytes from the packet.

 

The tricky part was the BPF verifier. For some reason, you can't base a pointer on the packet's end in memory without the verifier complaining, whereas you can base one on the packet's starting location without any issues. I have no idea why this is, and I tried making a thread on the XDP Newbies mailing list here. So I had to calculate the packet's length, which isn't difficult, but I kept running into a lot of verifier issues regarding minimum bounds, etc. In the end, I had to AND the offset derived from the length with 0x3fff, which convinced the verifier it was safe.
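The effect of the 0x3fff mask can be seen in a plain userspace sketch. This does not run through the BPF verifier and is not the program's code: `read_tail_u32` and `TAIL_OFF_MASK` are hypothetical names, and the explicit bounds check compares offsets where the XDP code compares pointers against data_end.

```c
#include <stdint.h>
#include <string.h>

#define TAIL_OFF_MASK 0x3fff /* caps the offset at 16383 bytes */

/* Copy the last four bytes of [data, data_end) into *out.
 * Returns 0 on success and -1 if the bounds check fails. */
int read_tail_u32(const uint8_t *data, const uint8_t *data_end, uint32_t *out)
{
    unsigned int len = (unsigned int)(data_end - data);

    /* For len < 4 the subtraction wraps around; the mask keeps the result
     * small and the bounds check below rejects it. For buffers longer than
     * the mask allows, the offset would alias to an earlier position. */
    unsigned int off = (len - (unsigned int)sizeof(uint32_t)) & TAIL_OFF_MASK;

    if (off + sizeof(uint32_t) > len)
        return -1;

    memcpy(out, data + off, sizeof(uint32_t));

    return 0;
}
```

The mask gives the offset a small, known upper bound, and the explicit check then covers the short-packet case.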

 

Anyways, here's the main code for what I said above.

 

// Handle ICMP protocol.
if (icmph)
{
    if (info)
    {
        // We'll want to add the client's unsigned 32-bit (4 bytes) IP address to the ICMP data so we know where to send it when it replies back.
        // First, let's add four bytes to the packet.
        if (bpf_xdp_adjust_tail(ctx, (int)sizeof(uint32_t)))
        {
            return XDP_DROP;
        }

        // We need to redefine packet and check headers again.
        data = (void *)(long)ctx->data;
        data_end = (void *)(long)ctx->data_end;

        eth = data;

        if (eth + 1 > (struct ethhdr *)data_end)
        {
            return XDP_DROP;
        }

        iph = data + sizeof(struct ethhdr);

        if (iph + 1 > (struct iphdr *)data_end)
        {
            return XDP_DROP;
        }

        icmph = data + sizeof(struct ethhdr) + (iph->ihl * 4);

        if (icmph + 1 > (struct icmphdr *)data_end)
        {
            return XDP_DROP;
        }

        // Now let's add the new data.

        // Unfortunately, we can't start from the packet end (data_end) pointer. Therefore, we must calculate the length of the packet and use the data pointer. Thanks for the help, Srivats! (https://lore.kernel.org/bpf/CANzUK5-g9wLiwUF88em4uVzMja_aR4xj9yzMS_ZObNKjvX6C6g@mail.gmail.com/)
        unsigned int len = (ctx->data_end - ctx->data);

        if (data + len > data_end)
        {
            return XDP_DROP;
        }

        unsigned int off = (len - sizeof(uint32_t)) & 0x3fff;

        uint32_t *icmpdata = data + off;

        if (icmpdata + 1 > (uint32_t *)data_end)
        {
            return XDP_DROP;
        }

        memcpy(icmpdata, &conn->clientaddr, sizeof(uint32_t));

        // We'll want to add four bytes to the IP header.
        iph->tot_len = htons(ntohs(iph->tot_len) + sizeof(uint32_t));

        // Recalculate ICMP checksum.
        icmph->checksum = csum_diff4(0, conn->clientaddr, icmph->checksum);
    }
    else
    {
        // When sending packets back, we'll want to get the client IP address from the ICMP data (last four bytes).
        // First ensure the ICMP data is enough.
        // Use byte arithmetic here; `icmph + sizeof(uint32_t)` would advance by four whole ICMP headers.
        if ((void *)(icmph + 1) + sizeof(uint32_t) > data_end)
        {
            return XDP_PASS;
        }
        
        // Now access the data.
        unsigned int len = (ctx->data_end - ctx->data);

        if (data + len > data_end)
        {
            return XDP_DROP;
        }

        unsigned int off = (len - sizeof(uint32_t)) & 0x3fff;

        uint32_t *clientaddr = data + off;

        if (clientaddr + 1 > (uint32_t *)data_end)
        {
            return XDP_DROP;
        }

        iph->daddr = *clientaddr;
        
        // Now we'll want to remove the additional four bytes we added when forwarding.
        if (bpf_xdp_adjust_tail(ctx, 0 - (int)sizeof(uint32_t)))
        {
            return XDP_DROP;
        }

        // We need to redefine packet and check headers again.
        data = (void *)(long)ctx->data;
        data_end = (void *)(long)ctx->data_end;

        eth = data;

        if (eth + 1 > (struct ethhdr *)data_end)
        {
            return XDP_DROP;
        }

        iph = data + sizeof(struct ethhdr);

        if (iph + 1 > (struct iphdr *)data_end)
        {
            return XDP_DROP;
        }

        icmph = data + sizeof(struct ethhdr) + (iph->ihl * 4);

        if (icmph + 1 > (struct icmphdr *)data_end)
        {
            return XDP_DROP;
        }

        // Remove four bytes from the IP header's total length.
        iph->tot_len = htons(ntohs(iph->tot_len) - sizeof(uint32_t));

        // Recalculate ICMP checksum.
        icmph->checksum = csum_diff4(iph->daddr, 0, icmph->checksum);
    }
}
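The `csum_diff4` helper isn't shown above. As a rough illustration of the same idea, here is a userspace sketch of updating an Internet checksum incrementally when four bytes are appended, in the style of RFC 1624. The names `csum_fold32`, `csum_full`, and `csum_append4` are hypothetical, the code works on bytes in network order, and it assumes the data length before appending is even. The real helper may work differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits (one's-complement addition). */
uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}

/* Full Internet checksum (RFC 1071) over len bytes, reading the buffer as
 * big-endian 16-bit words. The checksum field itself must be zero. */
uint16_t csum_full(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];

    if (i < len)
        sum += (uint32_t)buf[i] << 8;

    return (uint16_t)~csum_fold32(sum);
}

/* Update an existing checksum after appending four bytes at an even offset,
 * without walking the rest of the packet again. */
uint16_t csum_append4(uint16_t old_csum, const uint8_t word[4])
{
    uint32_t sum = (uint16_t)~old_csum;

    sum += ((uint32_t)word[0] << 8) | word[1];
    sum += ((uint32_t)word[2] << 8) | word[3];

    return (uint16_t)~csum_fold32(sum);
}
```

For a packet whose checksum was computed over the original data, `csum_append4` gives the same result as recomputing the checksum over the data plus the appended address.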

 

Fun stuff 🙂 
