
Roy

POP Upgrades + Some Other Changes


An Update On Bifrost

I am currently recompiling the 5.10.1 kernel on a one-core VPS to raise the max jump sequence limit from 8192 to 126K :) Hopefully this doesn't fail at the very end because if it does, I'll jump out the window. It's already taking a really long time as is!

 

Anyways, I believe Dreae and I have a plan. After further testing, it turns out the bpf_map_push_elem() function is only available for the BPF_MAP_TYPE_STACK map type (and its sibling, BPF_MAP_TYPE_QUEUE). When using BPF_MAP_TYPE_ARRAY or BPF_MAP_TYPE_HASH, I'd always receive the following from the verifier when loading the program.

 

36: (85) call bpf_map_push_elem#87
cannot pass map_type 2 into func bpf_map_push_elem#87
processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1

 

Here, map_type was either 1 (BPF_MAP_TYPE_HASH) or 2 (BPF_MAP_TYPE_ARRAY), depending on which of the maps above I used. It compiled using the stack map type, but I could never get bpf_map_push_elem() to work with bpf_map_peek_elem() (I was hoping this would grab the last inserted key) anyways.
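To make the numbers in that verifier message easier to read, here is a small lookup sketch. The IDs are the values from enum bpf_map_type in <linux/bpf.h>; the table is only a subset, and the struct and the lookup_map_type() helper are hypothetical names made up for this illustration.

```c
#include <stddef.h>

/* Subset of enum bpf_map_type values from <linux/bpf.h>. */
struct map_type_info {
    int id;
    const char *name;
    int supports_push; /* 1 if bpf_map_push_elem() accepts this map type */
};

const struct map_type_info map_types[] = {
    { 1,  "BPF_MAP_TYPE_HASH",     0 },
    { 2,  "BPF_MAP_TYPE_ARRAY",    0 },
    { 9,  "BPF_MAP_TYPE_LRU_HASH", 0 },
    { 22, "BPF_MAP_TYPE_QUEUE",    1 },
    { 23, "BPF_MAP_TYPE_STACK",    1 },
};

/* Hypothetical helper: returns the table entry for a map_type ID,
 * or NULL if the ID is not in this (partial) table. */
const struct map_type_info *lookup_map_type(int id)
{
    size_t count = sizeof(map_types) / sizeof(map_types[0]);

    for (size_t i = 0; i < count; i++) {
        if (map_types[i].id == id)
            return &map_types[i];
    }

    return NULL;
}
```

With this, lookup_map_type(2) resolves to the array map, which is not flagged as push-capable, matching the "cannot pass map_type 2" error above.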


Therefore, we're left to do everything manually within an LRU hash map (we want to find a random available port). After the discussions on Discord, which can be found below, I believe we found a loop that will suit us (possibly with slight modifications).

 

[Screenshots of the Discord discussion: 3562-12-23-2020-CdDFI6fd.png, 3563-12-23-2020-9vZia4Mp.png, 3564-12-23-2020-bAVyOrsK.png, 3565-12-23-2020-2FDmHZvC.png, 3566-12-23-2020-XM311v6s.png]

 

In this case, the loop below is probably the best way to approach this.

 

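// Assumes 'now' holds the current time, e.g. from bpf_ktime_get_ns().
uint64_t leasttime = UINT64_MAX;
uint32_t leastkey = 0;

// Use a 32-bit counter: a uint16_t can never exceed 65535, so the loop would never end.
for (uint32_t i = 1; i <= 65535; i++)
{
  uint32_t key = i;

  uint64_t *val = bpf_map_lookup_elem(&port_map, &key);

  if (!val)
  {
    // Port is unused, so take it right away.
    leasttime = now;
    leastkey = i;

    break;
  }
  else if (*val < leasttime)
  {
    // Remember the least recently used port seen so far.
    leasttime = *val;
    leastkey = i;
  }
}

bpf_map_update_elem(&port_map, &leastkey, &now, BPF_ANY);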
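For anyone who wants to play with the selection logic outside of BPF, here is a minimal user-space sketch of the same idea. It swaps the BPF map for a plain array, and it treats a timestamp of 0 as "port is free", which is an assumption of this model only (in the BPF version a missing map entry plays that role). The pick_port() name is made up for the example.

```c
#include <stdint.h>

/* User-space model of the port search.
 * last_seen must have max_port + 1 entries; index 0 is unused.
 * A value of 0 means the port is free (assumption of this sketch). */
uint16_t pick_port(const uint64_t *last_seen, uint32_t max_port)
{
    uint64_t least_time = UINT64_MAX;
    uint16_t least_port = 0;

    /* 32-bit counter on purpose: with a uint16_t counter the condition
     * i <= 65535 would always be true and the loop would never end. */
    for (uint32_t i = 1; i <= max_port; i++) {
        if (last_seen[i] == 0)
            return (uint16_t)i; /* free port found, use it */

        if (last_seen[i] < least_time) {
            least_time = last_seen[i];
            least_port = (uint16_t)i;
        }
    }

    /* No free port: fall back to the least recently used one. */
    return least_port;
}
```

Calling pick_port() on a table where every port is taken returns the port with the oldest timestamp, which is the entry the loop above would overwrite.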

 

I just figured I'd be transparent about all of this. The custom kernel I'm building is still compiling, but I'll let you know the results :) Assuming we don't hit the BPF stack limit of 512 bytes or any other limitations, I believe we should be good to go if we can get the forwarding aspect working with BPF/XDP :)

 

Thanks!



Another Update

I spent most of the day compiling the 5.10.1 Linux kernel. Initially, I tried raising the max jump sequence limit to 126K and recompiling the kernel. However, I then started running into log buffer size issues with iproute2.

 

root@test02:/home/roy/iproute2/ip# ./ip link set ens18 xdpgeneric obj /home/dev/HelloWorld/xdp_bpf_push.o section xdp_prog
Log buffer too small to dump verifier log 33554432 bytes (11 tries)!
Error fetching program/map!

 

It turns out the log buffer was too small to hold the BPF verifier output. I tried changing iproute2's max log size to an unsigned 64-bit integer set to UINT64_MAX and compiling iproute2 on my own.

 

Unfortunately, I had no luck. Although the log size was higher than the original, it still wasn't enough. Therefore, I had to load the XDP program using libbpf in C, which allowed me to see the error.

 

60: (85) call bpf_map_lookup_elem#1
61: (15) if r0 == 0x0 goto pc+38
 R0_w=map_value(id=0,off=0,ks=2,vs=24,imm=0) R6=pkt(id=0,off=26,r=34,imm=0) R7_w=inv(id=176463) R8_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R9_w=inv58821 R10=fp0 fp-8=mm?????? fp-32=??????mm fp-40=mmmmmmmm
62: (79) r1 = *(u64 *)(r0 +0)
 R0_w=map_value(id=0,off=0,ks=2,vs=24,imm=0) R6=pkt(id=0,off=26,r=34,imm=0) R7_w=inv(id=176463) R8_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R9_w=inv58821 R10=fp0 fp-8=mm?????? fp-32=??????mm fp-40=mmmmmmmm
63: (3d) if r1 >= r7 goto pc+3
 R0_w=map_value(id=0,off=0,ks=2,vs=24,imm=0) R1_w=inv(id=0) R6=pkt(id=0,off=26,r=34,imm=0) R7_w=inv(id=176463) R8_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R9_w=inv58821 R10=fp0 fp-8=mm?????? fp-32=??????mm fp-40=mmmmmmmm
BPF program is too large. Processed 1000001 insn
processed 1000001 insns (limit 1000000) max_states_per_insn 4 total_states 11772 peak_states 11772 mark_read 2

 

The verifier rejected the program with "BPF program is too large" after hitting its limit of 1 million processed instructions.

 

The check occurs in the verifier, so I had to raise the BPF_COMPLEXITY_LIMIT_INSNS constant it compares against. I raised this to 100 million (instead of 1 million).

 

Sadly, I then ran into the max jump sequence limitation again, so I had to raise my already increased max jump sequence limit from 126K to 126 million.
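As a rough back-of-the-envelope check on why the 1 million limit gets hit: the verifier more or less walks a bounded loop one iteration at a time, so the processed instruction count grows with iterations times instructions per iteration. The sketch below just does that multiplication. The figure of 20 instructions per iteration is purely an assumption for illustration (the real number depends on the compiled loop body and on how much state pruning the verifier manages), and both function names are made up.

```c
#include <stdint.h>

/* Rough upper bound on the instructions the verifier processes for one
 * loop, ignoring state pruning. */
uint64_t estimated_insns(uint64_t iterations, uint64_t insns_per_iteration)
{
    return iterations * insns_per_iteration;
}

/* Returns 1 if the estimate is above the given complexity limit, else 0. */
int exceeds_limit(uint64_t iterations, uint64_t insns_per_iteration,
                  uint64_t limit)
{
    return estimated_insns(iterations, insns_per_iteration) > limit;
}
```

Under that assumption, a 64000-iteration loop lands at about 1.28 million processed instructions, which is over the stock limit of 1 million but well under the raised limit of 100 million.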

 

I also found out how to rebuild the Linux kernel into deb files without cleaning everything first. The cleaning is what made each compilation take 4 - 6 hours, even with 4 cores. I just had to use the make bindeb-pkg -j 4 command instead of make deb-pkg -j 4. With that, generating the deb files only took around 40 minutes or so :)

 

On my last attempt, I was able to load the BPF/XDP program successfully and also checked /sys/kernel/debug/tracing/trace (or trace_pipe) to ensure the program was working properly as well.

 

root@test02:/home/dev/BPF-Loader# ./loader
libbpf: Kernel error message: virtio_net: Too few free TX rings available
XDP-Native may not be supported with this NIC. Using SKB instead.

root@test02:/home/dev/BPF-Loader# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 1a:c4:df:70:d8:a6 brd ff:ff:ff:ff:ff:ff
root@test02:/home/dev/BPF-Loader# ip link set ens18 xdpgeneric obj /home/dev/HelloWorld/xdp_bpf_push.o section xdp_prog
Note: 16 bytes struct bpf_elf_map fixup performed due to size mismatch!
root@test02:/home/dev/BPF-Loader# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 1a:c4:df:70:d8:a6 brd ff:ff:ff:ff:ff:ff
    prog/xdp id 15 tag 3b09307435253d95 jited
root@test02:/home/dev/BPF-Loader# ip link set ens18 xdpgeneric off
root@test02:/home/dev/BPF-Loader#

 

From the trace pipe:

 

          <idle>-0       [007] d.s.   429.892795: bpf_trace_printk: Using port 1 with 50331914

          <idle>-0       [007] d.s.   429.937832: bpf_trace_printk: Using port 1 with 50331914

          <idle>-0       [007] d.s.   429.982814: bpf_trace_printk: Using port 1 with 50331914

          <idle>-0       [007] d.s.   430.027811: bpf_trace_printk: Using port 1 with 50331914

          <idle>-0       [007] d.s.   430.073012: bpf_trace_printk: Using port 1 with 50331914

          <idle>-0       [007] d.s.   430.073291: bpf_trace_printk: Using port 1 with 50331914

 

This shows the program is working :) Here's the sample XDP program I made.

 

#include <linux/bpf.h>
#include <linux/bpf_common.h>

#include <inttypes.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/in.h>

#include "/home/dev/XDP-Firewall/libbpf/src/bpf_helpers.h"

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define htons(x) ((__be16)___constant_swab16((x)))
#define ntohs(x) ((__be16)___constant_swab16((x)))
#define htonl(x) ((__be32)___constant_swab32((x)))
#define ntohl(x) ((__be32)___constant_swab32((x)))
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define htons(x) (x)
#define ntohs(x) (x)
#define htonl(x) (x)
#define ntohl(x) (x)
#endif

struct connection
{
    uint64_t lastseen;
    uint64_t count;

    uint32_t clientaddr;
    uint16_t srcport;
};

struct bpf_map_def SEC("maps") connection_map =
{
    .type = BPF_MAP_TYPE_LRU_HASH,
    .key_size = sizeof(uint32_t),
    .value_size = sizeof(uint16_t),
    .max_entries = 65535
};

struct bpf_map_def SEC("maps") port_map =
{
    .type = BPF_MAP_TYPE_LRU_HASH,
    .key_size = sizeof(uint16_t),
    .value_size = sizeof(struct connection),
    .max_entries = 65535
};

SEC("xdp_prog")
int xdp_prog_func(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = (data);

    if (eth + 1 > (struct ethhdr *)data_end)
    {
        return XDP_DROP;
    }

    if (eth->h_proto != htons(ETH_P_IP))
    {
        return XDP_PASS;
    }

    struct iphdr *iph = (data + sizeof(struct ethhdr));

    if (iph + 1 > (struct iphdr *)data_end)
    {
        return XDP_DROP;
    }

    if (iph->protocol == IPPROTO_TCP)
    {
        struct tcphdr *tcph = data + sizeof(struct ethhdr) + (iph->ihl * 4);

        if (tcph + 1 > (struct tcphdr *)data_end)
        {
            return XDP_DROP;
        }

        uint64_t now = bpf_ktime_get_ns();

        uint16_t *sport = bpf_map_lookup_elem(&connection_map, &iph->saddr);

        if (sport)
        {
            struct connection *conn = bpf_map_lookup_elem(&port_map, sport);

            if (conn)
            {
                if (conn->clientaddr == iph->saddr)
                {
                    bpf_printk("Using port %" PRIu16 " with %" PRIu32 "\n", *sport, iph->saddr);
                    conn->lastseen = now;

                    return XDP_PASS;
                }
            }

            bpf_map_delete_elem(&connection_map, &iph->saddr);
        }

        // Look for available ports.
        uint16_t port = 0;
        uint64_t smallest = UINT64_MAX;

        for (uint32_t i = 1; i <= 64000; i++)
        {
            uint16_t tmp = (uint16_t)i;

            struct connection *conn = bpf_map_lookup_elem(&port_map, &tmp);

            if (!conn)
            {
                port = tmp;

                break;
            }
            else
            {
                if (conn->lastseen < smallest)
                {
                    smallest = conn->lastseen;
                    port = tmp;
                }
            }
        }

        if (port > 0)
        {
            // New entry.
            bpf_map_update_elem(&connection_map, &iph->saddr, &port, BPF_ANY);

            struct connection conn = {0};
            conn.clientaddr = iph->saddr;
            conn.lastseen = now;
            conn.srcport = port;

            bpf_map_update_elem(&port_map, &port, &conn, BPF_ANY);
        }
        
    }

    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
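A side note on reading the trace output: the bpf_printk() call prints iph->saddr as a plain unsigned integer, which is why the trace pipe shows 50331914 instead of an IP address. Here is a small user-space sketch for turning that number back into a dotted address. It assumes the value was printed on a little-endian host (so the first octet of the address sits in the lowest byte), and the function names are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns octet idx (0..3) of an IPv4 address that was printed as a
 * 32-bit integer by a little-endian host (assumption of this sketch). */
uint8_t saddr_octet(uint32_t printed, unsigned idx)
{
    return (uint8_t)((printed >> (8 * idx)) & 0xff);
}

/* Formats the address as dotted decimal into buf.
 * Returns the number of characters snprintf() would have written. */
int saddr_to_string(uint32_t printed, char *buf, size_t len)
{
    return snprintf(buf, len, "%u.%u.%u.%u",
                    (unsigned)saddr_octet(printed, 0),
                    (unsigned)saddr_octet(printed, 1),
                    (unsigned)saddr_octet(printed, 2),
                    (unsigned)saddr_octet(printed, 3));
}
```

For the value in the trace output, 50331914 is 0x0300010A, so under the little-endian assumption the octets come out as 10, 1, 0 and 3.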

 

Now I'm able to actually start working on Bifrost's forwarding aspect in XDP without running into limitations. If I do run into any more, I know how to raise them and recompile the kernel :)



Update On New POPs

For players routing through our new POPs announced above (LA, Toronto, and Miami), there was a high chance you weren't seeing servers properly within the server browser (e.g. sometimes they'd show, sometimes they wouldn't). This was because, for some reason, these new servers in particular have eight RX queues instead of two like the rest of our POP servers. It'd make more sense if they had only two RX queues, since they have two cores. Either way, we just modified Compressor's config to attempt to set up AF_XDP sockets on eight RX queues instead of two. Compressor won't error out if an RX queue isn't available.

 

Sample on Toronto POP:

 

xxxxx@xxxxx:~/compressor-private# ls -la /sys/class/net/ens3/queues/
total 0
drwxr-xr-x 18 root root 0 Dec 25 02:08 .
drwxr-xr-x  5 root root 0 Dec 25 02:08 ..
drwxr-xr-x  3 root root 0 Dec 25 02:08 rx-0
drwxr-xr-x  3 root root 0 Dec 25 02:08 rx-1
drwxr-xr-x  3 root root 0 Dec 25 02:08 rx-2
drwxr-xr-x  3 root root 0 Dec 25 02:08 rx-3
drwxr-xr-x  3 root root 0 Dec 25 02:08 rx-4
drwxr-xr-x  3 root root 0 Dec 25 02:08 rx-5
drwxr-xr-x  3 root root 0 Dec 25 02:08 rx-6
drwxr-xr-x  3 root root 0 Dec 25 02:08 rx-7
drwxr-xr-x  3 root root 0 Dec 25 02:08 tx-0
drwxr-xr-x  3 root root 0 Dec 25 02:08 tx-1
drwxr-xr-x  3 root root 0 Dec 25 02:08 tx-2
drwxr-xr-x  3 root root 0 Dec 25 02:08 tx-3
drwxr-xr-x  3 root root 0 Dec 25 02:08 tx-4
drwxr-xr-x  3 root root 0 Dec 25 02:08 tx-5
drwxr-xr-x  3 root root 0 Dec 25 02:08 tx-6
drwxr-xr-x  3 root root 0 Dec 25 02:08 tx-7

 

Sample on Singapore POP (same package as Toronto POP):

 

xxxxx@xxxxx:~# ls -la /sys/class/net/ens3/queues/
total 0
drwxr-xr-x 6 root root 0 Dec 25 02:11 .
drwxr-xr-x 5 root root 0 Dec 25 02:11 ..
drwxr-xr-x 3 root root 0 Dec 25 02:11 rx-0
drwxr-xr-x 3 root root 0 Dec 25 02:11 rx-1
drwxr-xr-x 3 root root 0 Dec 25 02:11 tx-0
drwxr-xr-x 3 root root 0 Dec 25 02:11 tx-1

 

Doesn't make sense to me, but should be resolved at least :)
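If you'd rather check the RX queue count programmatically than eyeball the ls output, here is a minimal sketch that counts the rx-* entries in an interface's queues directory. It uses only POSIX dirent calls; the function names are made up for this example and this is not how Compressor itself does it.

```c
#include <dirent.h>
#include <string.h>

/* Returns 1 if a directory entry name looks like an RX queue ("rx-N"). */
int is_rx_queue(const char *name)
{
    return strncmp(name, "rx-", 3) == 0;
}

/* Counts rx-* entries in a directory such as /sys/class/net/ens3/queues.
 * Returns -1 if the directory cannot be opened. */
int count_rx_queues(const char *queues_dir)
{
    DIR *dir = opendir(queues_dir);

    if (!dir)
        return -1;

    int count = 0;
    struct dirent *entry;

    while ((entry = readdir(dir)) != NULL) {
        if (is_rx_queue(entry->d_name))
            count++;
    }

    closedir(dir);

    return count;
}
```

Going by the listings above, count_rx_queues("/sys/class/net/ens3/queues") should report eight on the Toronto POP and two on the Singapore POP.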




