Sherlock Holmes and the Case of the Vanishing Multicast Packets

So this was a particularly awkward issue that occurred recently in our socket handling code. Now we push a lot of data around our network. By a lot I mean hundreds of thousands of messages per second, rapidly closing on millions. This needs to go to a lot of services, so we use IP multicasting to do this. We have expensive network cards and fancy switches to make sure we can do this fast and at scale. Now we were reconfiguring one of our environments this week to use said fancy network cards, and the multicast stopped working. This was very confusing – it worked locally on a single box, and the setup looked correct – the packets were being sent by the application, but they never arrived at the other box we were testing on.

The journey and the solution both ended up teaching me some useful things, both about how multicast works and about debugging these low level issues in Linux.

How does multicasting work?

Health warning: some knowledge of sockets is assumed here, but it’s fairly basic stuff. The basic premise of IP multicasting is that instead of sending to a specific computer’s IP address, you send your data to a multicast group, which is just a reserved IP address. People that want to receive data can join the multicast group with a special subscription command. Then when a sender pushes a new packet out, the router makes sure that all the subscribers get the packet. There are a few things worth noting here:

  1. Multicast is strictly unidirectional. This is kinda obvious when you think about it – the sender has no idea who it is sending to, and the receivers have no idea who is sending.
  2. It naturally supports MPMC – you can have multiple producers and multiple consumers. All the consumers get all the packets, so it’s not like a work sharing queue, more of a broadcast channel, but it does still have a lot of potential scaling use cases.
  3. It’s built on top of UDP – this means there is no ordering guarantee for packets, and delivery is unreliable (i.e. no guarantee that packets arrive at all). Protocols exist to deal with this (e.g. PGM, MoldUDP64), but they are outside the scope of this post.
  4. Multicast is very poorly supported. For a start you can’t use it over the open internet, none of the main cloud providers support it, and it doesn’t work in Docker. It can be used on a local machine, but to use it properly over a network, you need switches that support it.

Here is the obligatory very minimal example, in Rust, using the standard library. We could also use IPv6 instead of IPv4, but the code is similar, and most people still use IPv4 in practice for multicast, because the packet headers are smaller (if you are sending millions of packets a second, this is actually meaningful).

Sender

use std::io;
use std::net::UdpSocket;
use std::thread;
use std::time::Duration;

fn main() -> io::Result<()> {
    // We don't actually care what we bind to, but std::net forces our hand here
    let sock = UdpSocket::bind("0.0.0.0:0")?;

    // Connect to the group address. Note that multicast addresses 
    // must be in the range 224.0.0.1 to 239.255.255.255.
    // We don't need to connect to send, we can use send_to,
    // and specify the group address and port there instead.
    sock.connect("224.0.0.1:4211")?; 

    for i in 1.. {
        sock.send(format!("Message {}", i).as_bytes())?;
        thread::sleep(Duration::from_secs(1));
    }
    Ok(())
}

Receiver

use std::io;
use std::net::{Ipv4Addr, UdpSocket};

fn main() -> io::Result<()> {
    // We want to make sure we are bound to the right port here
    // We could bind to a unicast address here, but that means we
    // would receive the unicast packets as well as the multicast packets.
    // Also, the behaviour and the correct bind address are different on Windows.
    let sock = UdpSocket::bind("224.0.0.1:4211")?; 

    // This subscribes to the group by sending an IGMP membership report.
    // The second parameter is the interface to join on.
    // For this toy example, INADDR_ANY (the unspecified address) works fine.
    sock.join_multicast_v4(&Ipv4Addr::new(224, 0, 0, 1), &Ipv4Addr::UNSPECIFIED)?;

    let mut buff = [0u8;65536]; // byte buffer to receive our packets in.
    loop {
        let bytes = sock.recv(&mut buff)?;
        println!("{}", std::str::from_utf8(&buff[..bytes]).unwrap());
    }
}

Narrowing down the problem

When trying to debug issues like this, you need to narrow the problem down. Packets not arriving at their destination could be caused by a lot of things. Now these machines are big machines with a lot of network cards, so we started by checking the load on the network cards on both machines, and then kicking off our services. There are many tools to do this, but watch netstat -i is a clean, simple way of seeing which network interfaces are doing stuff, and what they are doing. It’s at times like this, especially as I come from a heavy Windows programming background, that I marvel at the scale and comprehensiveness of the tools available on Linux.
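If you want to poke at the same counters programmatically, they ultimately come from /proc/net/dev, which is what the tooling reads. Here is a quick sketch of my own (assuming the standard two-header-line layout of that file, nothing to do with our production code) that dumps per-interface packet counts:

use std::fs;

fn main() -> std::io::Result<()> {
    // /proc/net/dev has two header lines, then one line per interface:
    // "eth0: <8 receive counters> <8 transmit counters>"
    let stats = fs::read_to_string("/proc/net/dev")?;
    for line in stats.lines().skip(2) {
        let (name, counters) = line.split_once(':').unwrap_or(("?", ""));
        let fields: Vec<&str> = counters.split_whitespace().collect();
        if fields.len() >= 10 {
            // Field 1 is RX packets, field 9 is TX packets.
            println!(
                "{:<10} rx packets: {:>12} tx packets: {:>12}",
                name.trim(),
                fields[1],
                fields[9]
            );
        }
    }
    Ok(())
}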

Boom, kick off the application, and 300K packets are being sent a second! Woo! But on the wrong interfaces! Boo! But we are getting somewhere.

A twist in the tale

So wrong interface = wrong configuration somewhere. At least that was what we thought. Sooo, I double checked in our config what interface we used to bind the socket on the publisher. This interface was the correct one. I double checked the port and the multicast group. These too were correct. We ran netstat again, several times, in case the magical network daemons* that carry the packets around were mocking us. They were not. Rage ensued. Many screams of “but it works on my machine!” were uttered.

Fortunately I have a colleague who is a hardware/bash/Linux ninja. He also did a lot of the work setting up the boxes originally, and he had tools. One of these was a multicast test tool called msend. To see if we could get anything to work, we fired it up and tested sending and receiving box to box. This worked, and on the right interfaces! So the hardware was correct, and there were no weird routing rules stopping it from working. Given we had double checked the application configuration already, that left the code.

There was nothing obvious I could see from inspection. The code looked fairly similar to the toy sender example above, except that I used a specific interface in the call to bind (roughly as sketched below). However this appeared not to work. A quick scan of some of the examples in C didn’t show anything obviously different from my code, except that they didn’t use specific interfaces.
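To make the failure mode concrete, here is roughly what our sender setup amounted to (a reconstruction rather than the actual code, and the interface address is made up):

use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    // Bind to the unicast address of the NIC we wanted to send from
    // (a made-up address), assuming this would also select the interface
    // the multicast packets leave on...
    let sock = UdpSocket::bind("10.0.0.12:0")?;
    // ...then connect to the group and send, as in the toy example above.
    sock.connect("224.0.0.1:4211")?;
    sock.send(b"hello")?;
    Ok(())
}

This compiles and happily reports the bytes as sent; it just doesn’t pin the multicast traffic to that interface, which matches what we were seeing.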

Linux tools to the rescue

At this point my Linux ninja colleague suggested using strace to see what the msend program was doing differently. strace is an awesome tool that logs all the system calls that your program makes to stderr. We ran it on my code and on the msend tool and compared the results. While they were mostly similar, there were two differences that stood out. Firstly, msend didn’t call bind. Secondly, there was one extra syscall in msend: setsockopt(3, SOL_IP, IP_MULTICAST_IF, .... Now that looks like a difference that could cause packets to be sent out over the wrong interface. A quick google on IP_MULTICAST_IF revealed that this is how you specify which network interface multicast is sent on. This interestingly meant that the fact that it had worked perfectly for two years on many different machines was mostly down to blind luck and occasional blood sacrifices to the network daemons.
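For the curious, this is roughly what that syscall looks like if you make it by hand from Rust – a sketch using the libc crate and the socket’s raw fd (we didn’t end up shipping this, as the next section explains):

use std::net::{Ipv4Addr, UdpSocket};
use std::os::unix::io::AsRawFd;

// Point IP_MULTICAST_IF at the interface whose unicast address is `iface`,
// mirroring the setsockopt call we saw in the strace output of msend.
fn set_multicast_if(sock: &UdpSocket, iface: Ipv4Addr) -> std::io::Result<()> {
    // The option takes a struct in_addr holding the interface's address
    // in network byte order.
    let addr = libc::in_addr {
        s_addr: u32::from(iface).to_be(),
    };
    let ret = unsafe {
        libc::setsockopt(
            sock.as_raw_fd(),
            libc::IPPROTO_IP,
            libc::IP_MULTICAST_IF,
            &addr as *const libc::in_addr as *const libc::c_void,
            std::mem::size_of::<libc::in_addr>() as libc::socklen_t,
        )
    };
    if ret == 0 { Ok(()) } else { Err(std::io::Error::last_os_error()) }
}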

One last hurdle

It turns out that there is no way to set IP_MULTICAST_IF on std::net::UdpSocket without resorting to RawFd and OS specific code. So instead we ported our code to Alex Crichton’s socket2 crate. This provides a much thinner wrapper around the OS socket APIs and is considerably more comprehensive than std::net. The end result looked something like the following:

use std::io;
use std::net::{Ipv4Addr, SocketAddrV4};
use std::thread;
use std::time::Duration;

use socket2::{Domain, Socket, Type};

fn send(iface: Ipv4Addr, group: Ipv4Addr, port: u16) -> io::Result<()> {
    // No call to bind, just create a socket with protocol `None`
    let socket = Socket::new(Domain::ipv4(), Type::dgram(), None)?;
    // Set the interface correctly
    socket.set_multicast_if_v4(&iface)?;
    let sock_addr = SocketAddrV4::new(group, port);
    socket.connect(&sock_addr.into())?;

    for i in 1.. {
        socket.send(format!("Message {}", i).as_bytes())?;
        thread::sleep(Duration::from_secs(1));
    }
    Ok(())
}
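Calling it looks something like this (building on the function above; the addresses here are hypothetical – the first argument is the unicast address of the interface you want the packets to leave on):

fn main() -> io::Result<()> {
    send(
        Ipv4Addr::new(10, 0, 0, 12), // interface to send on (hypothetical)
        Ipv4Addr::new(239, 1, 1, 1), // multicast group (hypothetical)
        4211,                        // port the receivers are listening on
    )
}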

This worked!

Summary

So in summary:

  • IP multicast is a weird beast that looks like a regular UDP socket, but behaves very differently. You have to know the differences, especially when it comes to configuring it correctly with multiple interfaces.
  • There is a wealth of tooling available in Linux to help debug these things.
  • No network daemons were harmed while debugging this issue.

What I’d like from Rust 2021

I’ve been a user of Rust now for 2 years. It was something of a revelation – I was amazed at how well thought out the core of the language is – many classic C++ gotchas and kludges (null pointers, double free, use after free, slicing, diamond inheritance) have been completely designed out. In that time the language and compiler have evolved rapidly, with const functions, async await, and (hopefully by the end of the year) const generics. I’m lucky enough to have been working with Rust full time for most of the last 2 years, building an exchange for a fintech startup. As such, performance is a key goal, and the reason we picked Rust over Java or Go. We are very happy with the choice, but there are always things we can improve on. So with that in mind, here are my themes and priorities for what I’d like from Rust in 2021.

1. Expressive Performance

Rust is already both expressive and performant, but it has some way to go to match C++ yet. C++ generics are still the gold standard to beat, with specialization, const generics and higher kinded (duck) types. From the perspective of building high performance, ultra low latency software, being able to express things at compile time is critical for controlling performance. Therefore I’d really like to see const generics (https://github.com/rust-lang/rust/issues/44580), specialization (https://github.com/rust-lang/rust/issues/31844), and GATs (https://github.com/rust-lang/rust/issues/44265) pushed forward in 2021.
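As a tiny illustration of what I mean (my own sketch, nothing taken from the tracking issues; it needs a compiler with const generics support): the buffer capacity below is a compile-time parameter of the type, so there is no heap allocation and the size is a known constant at every call site.

// A fixed-capacity ring buffer whose size is part of the type.
struct Ring<const N: usize> {
    buf: [u64; N],
    head: usize,
}

impl<const N: usize> Ring<N> {
    fn new() -> Self {
        Ring { buf: [0; N], head: 0 }
    }

    fn push(&mut self, value: u64) {
        // N is a constant here, so the modulo and the bounds check can be
        // optimised against a known value (and N can be a power of two).
        self.buf[self.head % N] = value;
        self.head = self.head.wrapping_add(1);
    }
}

fn main() {
    // Capacity picked per call site, with no allocation.
    let mut ring: Ring<1024> = Ring::new();
    ring.push(42);
}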

2. Rust and rust-analyzer, together forever <3

rust-analyzer is awesome; in particular, the pace of development is a sight to behold. But it’s still not quite aligned with Rust. I’d love to see both the rust-analyzer team and the rustc team push forward on the work (like chalk integration in rustc) that would help bring rust-analyzer towards full parity with rustc.

I often still have issues with rust-analyzer not working on my large workspace, whereas when I open up a simple project, it’s flawless. I have yet to work out exactly what causes this, but it feels like it needs some work on large workspaces.

3. Scaling Cargo

I love Cargo – partly because it’s not (C)Make, but mostly because it makes building Rust code so extremely simple. However it has some issues as codebases scale up and get more complex. Larger codebases can use multiple languages, sometimes more than one toolchain, and pre or post build processing. Not every tool was designed to work seamlessly with Rust and Cargo, nor should we expect it to.

  • Post build scripting: This would be incredibly helpful for projects that require creating packages, or generating manifests etc. There is an issue tracking this, which appears to be in RFC hell: https://github.com/rust-lang/cargo/issues/545.
  • Workspaces: Workspaces have some weird quirks at the moment – for example a project’s dependencies can have different features activated depending on whether it is compiled as part of the whole workspace or with -p! (https://github.com/rust-lang/cargo/issues/4463)
  • Cargo config: Cargo config is done using hidden files, which seem to be relative to the working folder you run Cargo from, not relative to the project they apply to. This in particular caused some weird, subtle bugs that took us a while to work around. It feels like a lot of this ought to be in the Cargo.toml manifest instead. (https://github.com/rust-lang/cargo/issues/8244)

4. Encouraging contributions on work for the roadmap

Rust has a stellar community full of bright developers who are keen to contribute. I’d like to contribute, but I don’t know where to begin. Particularly for the hot topic issues, like GATs or specialization – the issues are huge, and I have no idea who is currently working on them, what state they are in, or even whether I have the knowledge to help. If we are to get these big features out of RFC hell and into the stable compiler, it feels important that they are brought to the fore more publicly than they are now. How we do this I’m not sure – possibly a section of the main website which lists roadmap progress and calls for assistance. These things are hard to find currently, even when you know what you are looking for :).

5. Compiler speed is not a priority

I’m sure many people will disagree with me here. I think that this is currently less critical than the outstanding feature work. Whilst we could all do with more compilation speed, I’d rather have an amazing rust-analyzer experience and a super expressive type system. These things together would have a much bigger impact on my productivity than a faster compiler.