Sherlock Holmes and the Case of the Vanishing Multicast Packets

So this was a particularly awkward issue that occurred recently in our socket handling code. Now we push a lot of data around our network. By a lot I mean hundreds of thousands of messages per second, rapidly closing on millions. This needs to go to a lot of services, so we use IP multicasting to do this. We have expensive network cards and fancy switches to make sure we can do this fast and at scale. Now we were reconfiguring one of our environments this week to use said fancy network cards, and the multicast stopped working. This was very confusing – it worked locally on a single box, and setup looked correct – the packets were being sent by the application, but never got to the other box we were testing on.

The journey and solution both ended up teaching me some useful things, both in terms of how multicast works, and debugging these low level issues in Linux.

How does multicasting work?

Health warning: some knowledge of sockets is assumed here, but it’s fairly basic stuff. The basic premise of IP multicasting is that instead of sending to a specific computer’s IP address, you send your data to a multicast group, which is just a reserved IP address. People that want to receive data can join the multicast group with a special subscription command. Then when a sender pushes a new packet out, the router makes sure that all the subscribers get the packet. There are a few things worth noting here:

  1. Multicast is strictly unidirectional. This is kinda obvious when you think about it – the sender has no idea who it is sending to, and the receivers have no idea who is sending.
  2. It naturally supports MPMC – you can have multiple producers and multiple consumers. All the consumers get all the packets, so it’s not like a work sharing queue, more of a broadcast channel, but it does still have a lot of potential scaling use cases.
  3. It’s built on top of UDP, which means packets can arrive out of order and delivery is unreliable (i.e. no guarantee that packets get delivered). Protocols exist to deal with this issue (e.g. PGM, MoldUDP64), but that is outside the scope of this document.
  4. Multicast is very poorly supported. For a start you can’t use it over the open internet, none of the main cloud providers support it, and it doesn’t work in docker. It can be used over a local machine, but to use it properly over a network, you need switches that support it.
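To make "a multicast group is just a reserved IP address" a bit more concrete, here is a minimal sketch that checks whether a string is a usable IPv4 group address. It leans on the standard library's Ipv4Addr::is_multicast, which is true for anything in 224.0.0.0/4. The helper name is_multicast_group is made up for this illustration.

```rust
use std::net::Ipv4Addr;

/// Returns true if `addr` parses as an IPv4 address inside the multicast
/// block 224.0.0.0/4. (Hypothetical helper, for illustration only.)
fn is_multicast_group(addr: &str) -> bool {
    match addr.parse::<Ipv4Addr>() {
        // is_multicast() checks that the first octet is in 224..=239.
        Ok(ip) => ip.is_multicast(),
        // Anything that is not a valid IPv4 address is not a group.
        Err(_) => false,
    }
}
```

Calling is_multicast_group("224.0.0.1") returns true, while an ordinary unicast address such as "192.168.1.10" returns false. A check like this is a cheap way to catch a mistyped group address in configuration before any sockets are opened.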

Here is the obligatory very minimal example, in Rust, using the standard library. We could also use IPv6 instead of IPv4, but the code is similar, and most people still use IPv4 in practice for multicast, because the packet headers are smaller (if you are sending millions of packets a second, this is actually meaningful).

Sender

use std::io;
use std::net::UdpSocket;
use std::thread;
use std::time::Duration;

fn main() -> io::Result<()> {
    // We don't actually care what we bind to, but std::net forces our hand here
    let sock = UdpSocket::bind("0.0.0.0:0")?;

    // Connect to the group address. Note that IPv4 multicast addresses
    // must be in the range 224.0.0.0 to 239.255.255.255 (224.0.0.0/4).
    // We don't need to connect to send, we can use send_to,
    // and specify the group address and port there instead.
    sock.connect("224.0.0.1:4211")?;

    for i in 1.. {
        sock.send(format!("Message {}", i).as_bytes())?;
        thread::sleep(Duration::from_secs(1));
    }
    Ok(())
}
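The comment in the sender mentions that connect is optional. As a rough sketch of that alternative, the helper below passes the destination to send_to on every call instead. The name send_numbered and the count parameter are assumptions made for this example, not part of the original code.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Sends `count` numbered messages to `dest` without connecting the socket
/// first. Returns the total number of payload bytes handed to the OS.
/// (`send_numbered` is a hypothetical helper name.)
fn send_numbered(sock: &UdpSocket, dest: SocketAddr, count: u32) -> io::Result<usize> {
    let mut total = 0;
    for i in 1..=count {
        // send_to takes the destination on every call, so the same socket
        // could publish to several groups.
        total += sock.send_to(format!("Message {}", i).as_bytes(), dest)?;
    }
    Ok(total)
}
```

With a socket bound to "0.0.0.0:0" as in the sender above, send_numbered(&sock, "224.0.0.1:4211".parse().unwrap(), 3) would publish three messages to the same group and port. Nothing in the helper is specific to multicast, so it can be tried against a plain unicast address first.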

Receiver

use std::io;
use std::net::{Ipv4Addr, UdpSocket};

fn main() -> io::Result<()> {
    // We want to make sure we are bound to the right port here
    // We could bind to a unicast address here, but that means we
    // would receive the unicast packets as well as the multicast packets.
    // Also behaviour and correct bind address on Windows is different.
    let sock = UdpSocket::bind("224.0.0.1:4211")?; 

    // This subscribes to the group by sending an IGMP membership report.
    // The second parameter is the interface to join on.
    // For this toy example, INADDR_ANY (Ipv4Addr::UNSPECIFIED) works fine.
    sock.join_multicast_v4(&Ipv4Addr::new(224, 0, 0, 1), &Ipv4Addr::UNSPECIFIED)?;

    let mut buff = [0u8;65536]; // byte buffer to receive our packets in.
    loop {
        let bytes = sock.recv(&mut buff)?;
        println!("{}", std::str::from_utf8(&buff[..bytes]).unwrap());
    }
}

Narrowing down the problem

When trying to debug issues like this, you need to narrow the problem down. Packets not arriving at their destination could be a lot of things. Now these machines are big machines with a lot of network cards. So we started by checking the load of the network cards on both machines, and then kicking off our services. There are many tools to do this, but watch netstat -i is a clean simple way of seeing which network interfaces are doing stuff, and what they are doing. It’s times like this, especially as I come from a heavy Windows programming background, that I marvel at the scale and comprehensiveness of the tools available on Linux.
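If you would rather watch the counters from code than from a terminal, Linux exposes per-interface packet counts in /proc/net/dev, the same kind of numbers that netstat -i displays. The sketch below parses that file. The struct and function names are invented for this example, and it assumes the usual layout where received packets are the second field after the interface name and transmitted packets the tenth.

```rust
use std::fs;
use std::io;

/// Packet counters for one interface (hypothetical struct for this sketch).
#[derive(Debug, PartialEq)]
struct IfaceCounters {
    name: String,
    rx_packets: u64,
    tx_packets: u64,
}

/// Parses one data line of /proc/net/dev, e.g.
/// "  eth0: 1500 10 0 0 0 0 0 0 3000 20 0 0 0 0 0 0".
/// Returns None for the header lines or anything that does not fit.
fn parse_dev_line(line: &str) -> Option<IfaceCounters> {
    // Data lines look like "<name>: <16 numeric fields>"; headers have no colon.
    let (name, rest) = line.split_once(':')?;
    let fields: Vec<u64> = rest
        .split_whitespace()
        .map(|f| f.parse().ok())
        .collect::<Option<Vec<u64>>>()?;
    if fields.len() < 10 {
        return None;
    }
    Some(IfaceCounters {
        name: name.trim().to_string(),
        rx_packets: fields[1], // receive side: bytes, packets, ...
        tx_packets: fields[9], // transmit side starts at field 8: bytes, packets, ...
    })
}

/// Reads the counters for every interface on this machine (Linux only).
fn read_counters() -> io::Result<Vec<IfaceCounters>> {
    let text = fs::read_to_string("/proc/net/dev")?;
    Ok(text.lines().filter_map(parse_dev_line).collect())
}
```

Calling read_counters twice a second apart and subtracting tx_packets per interface gives a rough packets-per-second figure, which is enough to see which interface the traffic is actually leaving on.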

Boom, kick off the application, and 300K packets are being sent a second! Woo! But on the wrong interfaces! Boo! But we are getting somewhere.

A twist in the tale

So wrong interface = wrong configuration somewhere. At least that was what we thought. Sooo, I double checked in our config what interface we used to bind the socket on the publisher. This interface was the correct one. I double checked the port and the multicast group. These too were correct. We ran netstat again, several times, in case the magical network daemons* that carry the packets around were mocking us. They were not. Rage ensued. Many screams of “but it works on my machine!” were uttered.

Fortunately I have a colleague who is a hardware/bash/linux ninja. He also did a lot of the work setting up the boxes originally and he had tools. One of these was a multicast test tool called msend. So to see if we could get anything to work, we fired it up and tested sending and receiving box to box. This worked, and on the right interfaces! So the hardware was correct, and there were no weird routing rules stopping it from working. Given we had double checked the application configuration already, that left the code.

There was nothing obvious I could see from inspection. The code looked fairly similar to the toy sender example above, except that I used a specific interface in the call to bind. However this appeared not to work. A quick scan of some of the examples in C seemed not to show anything obviously different from my code, except they didn’t use specific interfaces.

Linux tools to the rescue

At this point my linux ninja colleague suggested using strace to see what the msend program was doing differently. strace is an awesome tool that logs to stderr all the system calls that your program makes. We did this on my code and on the msend tool and compared the results. While they were mostly similar, there were two differences that stood out. Firstly, msend didn’t call bind. Secondly, there was one extra syscall in msend: setsockopt(3, SOL_IP, IP_MULTICAST_IF, .... Now that looks like a difference that would cause packets to be sent out over the wrong interface. A quick google on IP_MULTICAST_IF revealed that this is how you specify the network interface to send multicast on. This interestingly meant that the fact that it had worked perfectly for two years on many different machines was mostly down to blind luck and occasional blood sacrifices to the network daemons.

One last hurdle

It turns out that there is no way to set IP_MULTICAST_IF on std::net::UdpSocket, without resorting to RawFd and OS specific code. So instead we ported our code to Alex Crichton’s socket2 crate. This is a much thinner wrapper around the OS socket API, and it exposes far more of the available socket options than std::net does. The end result looked something like the following:

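use std::io;
use std::net::{Ipv4Addr, SocketAddrV4};
use std::thread;
use std::time::Duration;

use socket2::{Domain, Socket, Type};

// Written against socket2 0.3; from 0.4 onwards the constructors are
// spelled Domain::IPV4 and Type::DGRAM.
fn send(iface: Ipv4Addr, group: Ipv4Addr, port: u16) -> io::Result<()> {
    // No call to bind, just create a socket with protocol `None`
    let socket = Socket::new(Domain::ipv4(), Type::dgram(), None)?;
    // Set the interface correctly
    socket.set_multicast_if_v4(&iface)?;
    let sock_addr = SocketAddrV4::new(group, port);
    socket.connect(&sock_addr.into())?;

    for i in 1.. {
        socket.send(format!("Message {}", i).as_bytes())?;
        thread::sleep(Duration::from_secs(1));
    }
    Ok(())
}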

This worked!
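For comparison, here is roughly what the RawFd route mentioned above might look like if you wanted to stay on std::net. Treat it as a sketch under assumptions rather than what we shipped: it is Linux only, the option values are hard-coded from the Linux headers, setsockopt is declared by hand to avoid pulling in a crate, and set_multicast_if is a name invented for this example.

```rust
use std::io;
use std::net::{Ipv4Addr, UdpSocket};
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;

// Linux values (SOL_IP / IPPROTO_IP is 0, IP_MULTICAST_IF is 32).
// Other platforms use different numbers.
const IPPROTO_IP: c_int = 0;
const IP_MULTICAST_IF: c_int = 32;

// Declared by hand so the sketch needs no external crate; the `libc` crate
// provides the same binding. (Edition 2024 spells this `unsafe extern "C"`.)
extern "C" {
    fn setsockopt(fd: c_int, level: c_int, name: c_int, value: *const c_void, len: u32) -> c_int;
}

/// Selects the outgoing interface for multicast sent on `sock`, identified
/// by that interface's IPv4 address. (Hypothetical helper; Linux only.)
fn set_multicast_if(sock: &UdpSocket, iface: Ipv4Addr) -> io::Result<()> {
    // The kernel accepts a struct in_addr here: four bytes in network
    // order, which is exactly what octets() returns.
    let addr = iface.octets();
    let ret = unsafe {
        setsockopt(
            sock.as_raw_fd(),
            IPPROTO_IP,
            IP_MULTICAST_IF,
            addr.as_ptr() as *const c_void,
            addr.len() as u32,
        )
    };
    if ret == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}
```

Called right after creating the sending socket and before the first send, this issues the same setsockopt(fd, SOL_IP, IP_MULTICAST_IF, ...) call that strace showed msend making. Passing an address that no local interface owns comes back as an error, which is a handy early warning for a misconfigured interface. The socket2 version is less code and more portable, which is why we went that way.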

Summary

So in summary:

  • IP multicast is a weird beast that looks like a regular UDP socket, but behaves very differently. You have to know the differences, especially when it comes to configuring it correctly with multiple interfaces.
  • There is a wealth of tooling available in Linux to help debug these things.
  • No network daemons were harmed while debugging this issue.