Testing all the libp2ps

# Background

There are many implementations of libp2p with varying degrees of feature completeness. To name a few, there are Go, Rust, JS, Nim, Java, and Zig implementations. Each implementation supports some set of transports, secure channels, and muxers. How do we make sure that each implementation can communicate with every other implementation? And how do we check that they can communicate over each supported combination (transport+encryption+muxer tuple)? In this post I'll cover how we test every implementation against every other, across every connection strategy and many versions, along with a couple of challenges we ran into along the way. Finally, I'll highlight some open problems that you can contribute to. Be warned, nothing in here is novel or particularly fancy. It's just basic plumbing to check whether two libp2p implementations can talk to each other.

Testing connectivity interoperability is as simple as starting up two libp2p nodes and having them ping each other. The difficulty is in creating a reproducible environment for every implementation and connection strategy. The first attempt used Testground. That attempt didn't get too far for various reasons you can read about here, but, to summarize, Testground was too complicated and slow for what we wanted to do: start up two nodes and have them ping each other. The next attempt used Docker Compose directly, with some TypeScript to help generate the compose files. This was much easier to build, and I got a working setup in half a day.
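The real generator lives in the test-plans repo; as a rough sketch of the idea only, generating a compose file for one test pair might look like the following. The service names and environment variables here are assumptions for illustration, not the repo's actual ones:

```typescript
// Hypothetical sketch: build a compose file for one (dialer, listener) pair.
import yaml from "js-yaml";

function composeFor(dialerImage: string, listenerImage: string, transport: string): string {
  const common = { redis_addr: "redis:6379", transport };
  return yaml.dump({
    services: {
      // Redis acts as the synchronization point (see below).
      redis: { image: "redis:7-alpine" },
      listener: {
        image: listenerImage,
        depends_on: ["redis"],
        environment: { ...common, is_dialer: "false" },
      },
      dialer: {
        image: dialerImage,
        depends_on: ["redis"],
        environment: { ...common, is_dialer: "true" },
      },
    },
  });
}
```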

# Problems to solve

Compose handles spinning up the test environment and putting the nodes on the same network, but there were still a couple of problems to solve: How do we define this test environment? How do we share the listener's address with the dialer so that it knows who to dial? And how do we build each implementation?

We solved the first problem in perhaps an unconventional way: SQLite. The problem of "given these implementations that support these parameters, find all combinations of implementations and parameters that should be able to communicate with each other" is equivalent to a join operation. We could have solved this by manually implementing a join with nested for loops, but, and this might be my database background peeking through, it's a lot more straightforward to define this as a simple query. To populate the SQLite tables, we create an in-memory database and load data by iterating over each version defined in versions.ts.
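As a minimal sketch of the idea (assuming the better-sqlite3 package; the table, columns, and rows below are illustrative rather than the actual schema):

```typescript
// Sketch: find every (dialer, listener) pair that shares a transport,
// expressed as a self-join instead of nested for loops.
import Database from "better-sqlite3";

const db = new Database(":memory:");
db.exec(`
  CREATE TABLE transports (impl TEXT, transport TEXT);
  INSERT INTO transports VALUES
    ('go-v0.28.0', 'tcp'), ('go-v0.28.0', 'quic-v1'),
    ('rust-v0.52.0', 'tcp'), ('js-v0.46.0', 'quic-v1');
`);

const pairs = db.prepare(`
  SELECT a.impl AS dialer, b.impl AS listener, a.transport
  FROM transports a
  JOIN transports b ON a.transport = b.transport
`).all();
console.log(pairs); // e.g. { dialer: 'go-v0.28.0', listener: 'rust-v0.52.0', transport: 'tcp' }
```

The same query shape extends naturally to joining on secure channels and muxers as well.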

The second problem we solved by using Redis as a synchronization point. The listener pushes its address to Redis (RPUSH), and the dialer blocks until it can read the address from Redis (BLPOP). I chose this because I liked how Testground used the database as a synchronization primitive, and I wanted to keep that property. In retrospect, I don't think we actually need Redis here, and it adds some unnecessary complexity. We could just as easily have had the test runner read the address from the listener's stdout and feed it via stdin to the dialer.
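Sketched out (assuming the ioredis client; the key name is illustrative), the handoff looks something like this:

```typescript
// Sketch of the Redis handoff between listener and dialer.
import Redis from "ioredis";

const redis = new Redis({ host: "redis", port: 6379 });

// Listener side: publish the multiaddr once it is listening.
async function publishAddr(addr: string): Promise<void> {
  await redis.rpush("listenerAddr", addr);
}

// Dialer side: block until the listener's address shows up (timeout 0 = wait forever).
async function awaitAddr(): Promise<string> {
  const result = await redis.blpop("listenerAddr", 0);
  return result![1]; // blpop resolves to a [key, value] pair
}
```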

Now we have a way of defining which implementations will be connected to each other and how they’ll interact, but we still haven’t defined the implementations. At a high level, the things we care about for building the implementations are:

  1. Be reproducible for a given commit.
  2. Caching is an optimization. Things should be fine without it.
  3. If we have a cache hit, be fast.

The test runner accepts version definitions that describe what a given released version of an implementation is capable of. We test against the latest version of each implementation as well as older versions. Here's the type definition and an example:

```ts
type Version = {
    id: string,
    containerImageID: string,
    transports: Array<(string | { name: string, onlyDial: boolean })>,
    secureChannels: string[],
    muxers: string[],
    // If defined, this will increase the timeout for tests using this version
    timeoutSecs?: number,
}
```

```ts
// Example
{
    id: "go-v0.28.0",
    transports: ["tcp", "ws", "quic", "quic-v1", "webtransport"],
    secureChannels: ["tls", "noise"],
    muxers: ["mplex", "yamux"],
    containerImageID: "sha256:598fe4..."
}

// Example 2: here we are testing a new version of go-libp2p
{
    id: "go-v0.29.0",
    transports: ["tcp", "ws", "quic", "quic-v1", "webtransport"],
    secureChannels: ["tls", "noise"],
    muxers: ["mplex", "yamux"],
    containerImageID: "sha256:ead2d2..."
}
```

The whole file for every version and implementation can be found at [multidim-interop/versions.ts](https://github.com/libp2p/test-plans/blob/master/multidim-interop/versions.ts). Every time a new libp2p version is released, we update this file. Right now this is manual (and for go-libp2p it's part of our release checklist).

Compose itself only references the provided container image ID. It's up to each implementation to decide how it wants to build a containerized version of its node. Implementations define how to build themselves with a Makefile (see the example for go-libp2p v0.28). These builds produce an image.json file that defines the container image ID. The caching layer caches these images by the SHA256 hash of their inputs (e.g. the Makefile) and the target architecture (x86 vs arm64). We store the cached images in S3, and the tool will try to load cached images before building (see caching). Forks and libp2p implementations have read access to this cache so they can benefit from faster builds.
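As a hypothetical sketch of that cache-key derivation (the function name and S3 layout below are mine, not the repo's actual code):

```typescript
// Sketch: derive a cache key from the build inputs plus the target architecture.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import { arch } from "node:os";

function imageCacheKey(inputFiles: string[]): string {
  const h = createHash("sha256");
  for (const file of [...inputFiles].sort()) {
    h.update(readFileSync(file)); // hash file contents, e.g. the Makefile
  }
  h.update(arch()); // x86 and arm64 builds get different keys
  return h.digest("hex");
}

// Cached images might then live at something like:
//   s3://<bucket>/imageCache/<imageCacheKey(["Makefile"])>.tar.gz
```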

This system supports testing browsers by having the node use Playwright to spin up a browser. We test WebRTC by having the listener node also include a relay node to facilitate the WebRTC handshake. From the test system's point of view this looks identical to a non-browser test.
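For browser tests, the node-side driver boils down to something like the following sketch (the page URL and the in-page `runPing` hook are assumptions for illustration):

```typescript
// Sketch: spin up a headless browser with Playwright and run the in-page test.
import { chromium } from "playwright";

async function runBrowserPing(testPageUrl: string): Promise<number> {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(testPageUrl);
    // Assume the bundled test page exposes a global that dials the listener
    // (via the relay for WebRTC) and resolves with the ping RTT in ms.
    return await page.evaluate(() => (window as any).runPing());
  } finally {
    await browser.close();
  }
}
```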

# Coverage

Right now, the system tests 6 different libp2p implementations and runs about 1700 tests. The tests also run on each PR in {Go, Rust, JS, Nim, and Zig}-libp2p. Are you working on a libp2p implementation and want to make sure you're compatible? Check out the README.md for the specifics on how to implement this test.

*Multidim Interop coverage matrix*

See the latest full run at https://github.com/libp2p/test-plans/actions/workflows/transport-interop.yml?query=branch%3Amaster. The run is defined by this GitHub action.

# Impact realized so far

The system has already helped catch a couple of bugs, and it has helped validate some big code changes.

# Next steps

There are still a couple of improvements we could make here. The biggest one is making this faster. In CI, each test (from starting Docker Compose until it exits) takes about 2s. It's unclear why this takes so long. My hunch is that it's not the actual test that takes a while (the handshake and ping together take around 50-300ms), but the cost of setting up the cgroups, network namespaces, and other Docker-specific things. Running the simplest Docker image, hello-world, takes around 400ms on my machine (`time docker run --rm hello-world`). There must be something here we can optimize.

Besides those issues, there is a whole host of things to work on for testing interoperability beyond basic connectivity, such as adding tests for mDNS and testing interoperability of higher-level libp2p protocols.

There are some great first issues, and we'd welcome anyone to jump in and support libp2p in its core tenet of providing rock-solid stability.