This blog post will be split into 4 parts that will be published over the next weeks:
- Overview and motivation
- Porting approach with various details, examples and problems I ran into along the way
- Performance optimizations
- Building Rust code into a C library as drop-in replacement
The initial versions of the ebur128 crate were built around the libebur128 C library (and included its code for ease of building); versions 0.1.2 and newer are the pure Rust implementation.
libebur128 implements the EBU R128 loudness standard. The Wikipedia page gives a good summary of the standard, but in short it describes how to measure loudness of an audio signal and how to use this for loudness normalization.
While this intuitively doesn’t sound very complicated, there are lots of little details (like how human ears actually work) that make this not as easy as one might expect. This results in there being many different ways of measuring loudness and is one of the reasons why this standard was introduced. Of course it is also not the only standard for this.
libebur128 is also the library that I used in the GStreamer loudness normalization plugin, about which I wrote a few weeks ago already. By porting the underlying loudness measurement code to Rust, the only remaining C dependency of that plugin is GStreamer itself.
Apart from that it is used by FFmpeg (which includes its own modified copy), as well as by many other projects that need some kind of loudness measurement and don’t use ReplayGain, another older but widely used standard for the same problem.
Before going over the details of what I did, let me first explain why I did this work at all.
libebur128 is a perfectly well working library, in wide use for a long time and probably rather bug-free at this point, and it was already possible to use the C implementation from Rust just fine. That’s what the initial versions of the ebur128 crate were doing.
My main reason for doing this was simply that it seemed like a fun little project. It isn’t a lot of code and it doesn’t change often, so once ported it should be more or less finished and it shouldn’t be much work to stay in sync with the C version. I started thinking about doing this right after the initial release of the C-based ebur128 crate, but reading Joe Neeman’s blog post about porting another C audio library (RNNoise) to Rust gave me the final push to actually start porting the code and to follow through until it was done.
However, don’t go around and ask other people to rewrite their projects in Rust (don’t be rude) or think that your own rewrite is magically going to be much faster and less buggy than the existing implementation. While Rust saves you from a big class of possible bugs, it doesn’t save you from yourself and usually rewrites contain bugs that didn’t exist in the original implementation. Also getting good performance in Rust requires, like in every other language, some effort. Before rewriting any software, think about the goals of this rewrite realistically as well as the effort required to actually get it finished.
Apart from fun there were also a few technical and non-technical reasons for me to look into this. I’m going to just list two here (curiosity and portability). I will skip the usual Rust memory-safety argument as that seems less important with this code: the C code has been widely used for a long time, is not changing a lot and has easy-to-follow memory access patterns. While it definitely had a memory safety bug (see further below), it was rather difficult to trigger and has been fixed in the meantime.
Personally, and at my company Centricular, we try to do any new projects in Rust where it makes sense. While this worked very well in the past and we got great results, there were some questions for future projects for which I wanted to get some answers, hard data and personal experience:
- How difficult is it to port a C codebase function by function to Rust while keeping everything working along the way?
- How difficult is it to get the same or better performance with idiomatic Rust code for low-level media processing code?
- How much bigger or smaller is the resulting code and do Rust’s higher-level concepts like iterators help to keep code concise?
- How difficult is it to create a C-compatible library in Rust with the same API and ABI?
I have some answers to all these questions already, but previous work on this was not well structured and the results were also not documented, which I’m trying to change here now, both to have a reference for myself in the future and to convince other people that Rust is a reasonable technology choice for such projects.
As you can see, the general pattern of these questions is introducing Rust into an existing codebase, replacing existing components with Rust and writing new components in Rust, which also relates to my work on the Rust GStreamer bindings.
C is a very old language and while there is a standard, each compiler has its own quirks and each platform different APIs on top of the bare minimum that the C standard defines. C itself is very portable, but it is not easy to write portable C code, especially when not using a library like GLib that hides these differences and provides basic data structures and algorithms.
This seems to be something that is often forgotten when the portability of C is given as an argument against Rust, and that’s the reason why I wanted to mention this here specifically. While you can get a C compiler basically everywhere, writing C code that also runs well everywhere is another story and C doesn’t make this easy by design. Rust on the other hand makes writing portable code quite easy in my experience.
In practice there were three specific issues I had for this codebase. Most of the advantages of Rust here are because it is a new language and doesn’t have to carry a lot of historical baggage.
Mathematical Constants and Functions
Mathematical constants such as π are not available by default everywhere: Microsoft’s MSVC doesn’t define them unless you #define _USE_MATH_DEFINES before including the math header.
While not a big problem per se, it is annoying and indeed caused the initial version of the ebur128 Rust crate to not compile with MSVC because I forgot about it.
Similarly, which mathematical functions are available depends a lot on the target platform and which version of the C standard is supported. An example of this is the log10 function to calculate the base-10 logarithm. For portability reasons, libebur128 didn’t use it but instead calculated it via the natural logarithm (ln(x) / ln(10) = log10(x)) because it’s only available in POSIX and since C99. While C99 is from 1999, there are still many compilers out there that don’t fully support it, again most prominently MSVC until very recently.
Using log10 directly instead of going via the natural logarithm is faster and more precise (the detour via the natural logarithm accumulates additional floating point rounding errors), which is why the Rust implementation uses it. In C it would be required to check at build time whether the function is available, which complicates the build process and can easily be forgotten. libebur128 decided to not bother with these complications and simply not use it. Because of that, some conditional code in the Rust implementation is necessary for ensuring that both implementations return the same results in the tests.
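To make the difference concrete, here is a minimal sketch (not the actual code of the crate) that computes the base-10 logarithm both ways and uses π from the standard library. The function names are made up for illustration.

```rust
use std::f64::consts::PI;

/// Base-10 logarithm via the natural logarithm, the way portable C code
/// often does it when `log10` may not be available.
fn log10_via_ln(x: f64) -> f64 {
    x.ln() / 10.0_f64.ln()
}

/// Base-10 logarithm using the method from the Rust standard library,
/// which is available on every target.
fn log10_direct(x: f64) -> f64 {
    x.log10()
}

/// Hypothetical helper, only here to show that the constant needs no
/// platform-specific define.
fn circle_area(radius: f64) -> f64 {
    PI * radius * radius
}
```

Both logarithm variants agree up to tiny rounding differences, which is exactly the kind of difference that makes bit-exact comparisons between two implementations harder.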
libebur128 uses a linked-list-based queue data structure. As the C standard library is very minimal, no collection data structures are included. However, on the BSDs and also on Linux with the GNU C library there is one available in a system header.
Of course MSVC does not have this and other compilers/platforms probably won’t have it either, so libebur128 included a local copy of that queue implementation. Now when building, one has to decide whether there is a system implementation available or otherwise use the internal version. Or simply always use the internal version.
Copying implementations of basic data structures and algorithms into every single project is ugly and error-prone, so let’s maybe not do that. C not having a standardized mechanism for dependency handling doesn’t help with this, which is unfortunately why this is very common in C projects.
Thread-safe one-time initialization is another thing that is not defined by the C standard, and depending on your platform there are different APIs available for it or none at all. POSIX again defines one that is widely available, but you can’t really depend on it unconditionally.
This complicates the code and build procedure, so libebur128 simply did not do that and did its one-time initializations of some global arrays every time a new instance was created. This is probably fine, but a bit wasteful and, strictly speaking, probably not actually thread-safe according to the C standard.
The initial version of the ebur128 Rust crate side-stepped this problem by simply doing this initialization once with the API provided by the Rust standard library. See part 2 and part 3 of this blog post for some more details about this.
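As an illustration of thread-safe one-time initialization with only the standard library, here is a minimal sketch using std::sync::OnceLock. The table and its contents are placeholders, and the crate itself may well use a different standard library API for this.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the table is actually computed (for illustration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical lookup table that is initialized lazily, exactly once.
static TABLE: OnceLock<[f64; 16]> = OnceLock::new();

/// Returns the shared table, computing it on first use. Concurrent callers
/// wait until the single initialization has finished.
fn table() -> &'static [f64; 16] {
    TABLE.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        let mut values = [0.0; 16];
        for (i, v) in values.iter_mut().enumerate() {
            // Placeholder contents: 10^(i / 10).
            *v = 10f64.powf(i as f64 / 10.0);
        }
        values
    })
}
```

No matter how many threads call table() at the same time, the closure runs only once and every caller gets a reference to the same data.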
Easier to Compile and Integrate
A Rust port only requires a Rust compiler, while a mixed C/Rust codebase requires at least a C compiler in addition and some kind of build system for the C code.
libebur128 is built with CMake, which would be an additional dependency, so in the initial version of the ebur128 crate I went via cargo’s build.rs build scripts and the cc crate, as building libebur128 is easy enough. This works, but build scripts are problematic for integration of the Rust code into build systems other than cargo.
The Rust port also makes use of conditional compilation in various places. In C this means dealing with the preprocessor, non-standardized and inconsistent platform #defines, and the need to integrate everything in a custom way into the build system. Rust, in contrast, has a principled and well-designed approach to this problem. This makes it easier to keep the code clean, easier to maintain and more portable.
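A small sketch of what conditional compilation looks like in Rust. The function names and returned strings are made up for illustration; only the cfg predicates themselves are standard.

```rust
/// Exactly one of the two definitions below is compiled in, depending on
/// the target architecture.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn denormal_strategy() -> &'static str {
    "flush-to-zero via MXCSR"
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn denormal_strategy() -> &'static str {
    "no special handling"
}

/// `cfg!` evaluates the same kind of predicate to a boolean constant,
/// which is handy inside expressions.
fn is_windows_msvc() -> bool {
    cfg!(all(target_os = "windows", target_env = "msvc"))
}
```

The predicates are the same on every platform and are understood by cargo and the compiler directly, so no build system glue is needed.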
In addition to build system related simplifications, by not having any C code it is also much easier to compile the code to other targets like WebAssembly, which is natively supported by Rust. It is also possible to compile C to WebAssembly but getting both toolchains to agree with each other and produce compatible code seems not very easy.
The current version of the code produces the exact same results as the C version. This is enforced by the quickcheck tests that run randomized inputs through both versions and check that the results are the same. The code also passes all the tests in the EBU loudness test set, so it should hopefully be standards compliant as long as the test implementation is not wrong.
Performance-wise the Rust implementation is at least as fast as the C implementation. In some configurations it’s a few percent faster but probably not enough that it actually matters in practice. There are various benchmarks for both versions in different configurations available. The benchmarks are based on the criterion crate, which uses statistical methods to give results that are as accurate as possible. criterion also generates nice reports with graphs, making analysis of the results more pleasant. See part 3 of this blog post for more details.
Writing tests and benchmarks for Rust is so much easier and feels more natural than doing it in C, so the Rust implementation has quite good coverage of the different code paths now. In particular, thanks to cargo and Rust having built-in support for tests, no struggling with build systems was necessary like it would have been in C. This alone seems to have the potential to cause Rust code to have, on average, better quality than similar code written in C.
It is also possible to compile the Rust implementation into a C library with the great cargo-c tool. This easily builds the code as a static/dynamic C library and installs the library, a C header file and also a pkg-config file. With this the Rust implementation is a 100% drop-in replacement of the C libebur128. It is not even necessary to recompile existing code. See part 4 of this blog post for more details.
Apart from the Rust standard library, the Rust implementation depends on two other small and widely used crates. Unlike with C, depending on external dependencies is rather simple with Rust and cargo. The two crates in question are:
- smallvec, for dynamically sized vectors/arrays that can be stored on the stack up to a certain size and only then fall back to heap allocations. This allows avoiding a couple of heap allocations under normal usage.
- bitflags, which provides a macro for implementing properly typed bitflags. This is used in the constructor of the main type for selecting the features and modes that should be enabled, which directly maps to how the C API works (just with less type-safety).
A common question when announcing a Rust port of some C library is how much unsafe code was necessary to reach the same performance as the C code. In this case there are two uses of unsafe code, not counting the FFI code for calling the C implementation in the tests/benchmarks and the code for the C API.
The first use is in a ring buffer, where explicit indexing of a slice is needed. While the indexes are already manually checked to wrap around when needed, the Rust compiler and LLVM can’t figure that out, so additional bounds checks plus panic handling are present in the compiled code. Apart from slowing down the loop with the additional condition, the panic code also causes the whole loop to be optimized less well.
So to get around that, unsafe indexing into the slice is used for performance reasons. While it now requires a human to check the memory safety of the code instead of relying on the compiler, the code in question is simple and small enough that it shouldn’t be a problem in practice.
More on this in part 2 and part 3 of this blog post.
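To illustrate the pattern, here is a simplified sketch of a ring buffer loop with and without bounds checks. It is not the code from the crate; the function names and the dot product are only an example of a loop with a manually wrapped index.

```rust
/// Dot product of `coeffs` with the samples in the ring buffer `ring`,
/// starting at `start` and wrapping around at the end (safe version).
fn dot_checked(ring: &[f32], start: usize, coeffs: &[f32]) -> f32 {
    assert!(!ring.is_empty() && start < ring.len());
    let mut idx = start;
    let mut acc = 0.0;
    for &c in coeffs {
        // Every access carries a bounds check and a panic path.
        acc += c * ring[idx];
        idx += 1;
        if idx == ring.len() {
            idx = 0;
        }
    }
    acc
}

/// Same computation, but without bounds checks inside the loop.
fn dot_unchecked(ring: &[f32], start: usize, coeffs: &[f32]) -> f32 {
    assert!(!ring.is_empty() && start < ring.len());
    let mut idx = start;
    let mut acc = 0.0;
    for &c in coeffs {
        // SAFETY: `idx < ring.len()` holds on entry (asserted above) and is
        // re-established by the wrap-around below after every increment.
        acc += c * unsafe { *ring.get_unchecked(idx) };
        idx += 1;
        if idx == ring.len() {
            idx = 0;
        }
    }
    acc
}
```

The invariant that makes the unsafe version sound is checked once before the loop and then maintained by the wrap-around logic, which is the part a human reviewer has to verify.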
Flushing Denormals to Zero
The other use of unsafe code is in the filter that is applied to the incoming audio signal. On x86/x86-64 the MXCSR register temporarily gets the _MM_FLUSH_ZERO_ON bit set to flush denormal floating point numbers to zero. That is, denormals (i.e. very small numbers close to zero) that are the result of any floating point operation are treated as zero.
This happens both for performance reasons as well as correctness reasons. Operations on denormals are generally much slower than on normalized floating point numbers. This has a measurable impact on the performance in this case.
Also, the C library does the same, and not flushing denormals to zero would lead to slightly different results. While this difference doesn’t matter in practice as it’s very, very small, it would make it harder to compare the results of both implementations as they wouldn’t be as close to each other anymore.
Doing this affects every floating point operation that happens while that bit is set, but because these are only the floating point operations performed by this crate and it’s guaranteed that the bit is unset again (even in case of panics) before leaving the filter, this shouldn’t cause any problems for other code.
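One way to get the "unset again even in case of panics" guarantee is a guard object that restores the register in its Drop implementation. The following is a sketch under a few assumptions: it only handles x86-64 (with a no-op fallback elsewhere), the module and type names are made up, and it uses the MXCSR intrinsics from std::arch, which recent Rust versions mark as deprecated in favour of inline assembly.

```rust
#[cfg(target_arch = "x86_64")]
#[allow(deprecated)] // the MXCSR intrinsics are deprecated in newer Rust versions
mod ftz {
    use std::arch::x86_64::{_mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON};

    /// Sets the flush-to-zero bit on creation and restores the previous
    /// MXCSR value when dropped, including during panic unwinding.
    pub struct FlushToZero {
        previous: u32,
    }

    impl FlushToZero {
        pub fn new() -> Self {
            // SAFETY: SSE is always available on x86-64.
            let previous = unsafe { _mm_getcsr() };
            unsafe { _mm_setcsr(previous | _MM_FLUSH_ZERO_ON) };
            FlushToZero { previous }
        }
    }

    impl Drop for FlushToZero {
        fn drop(&mut self) {
            // SAFETY: restores a value that was read from the register before.
            unsafe { _mm_setcsr(self.previous) };
        }
    }

    /// Reports whether the flush-to-zero bit is currently set.
    pub fn is_enabled() -> bool {
        (unsafe { _mm_getcsr() } & _MM_FLUSH_ZERO_ON) != 0
    }
}

#[cfg(not(target_arch = "x86_64"))]
mod ftz {
    /// No-op fallback for other architectures.
    pub struct FlushToZero;

    impl FlushToZero {
        pub fn new() -> Self {
            FlushToZero
        }
    }

    pub fn is_enabled() -> bool {
        false
    }
}

/// Runs `f` with denormals flushed to zero (on x86-64) and restores the
/// previous floating point state afterwards.
fn with_denormals_flushed<R>(f: impl FnOnce() -> R) -> R {
    let _guard = ftz::FlushToZero::new();
    f()
}
```

Because the guard only lives for the duration of the closure, code outside of it never observes the changed floating point mode.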
Once the C library was ported and performance was comparable to the C implementation, I briefly checked the issues reported on the C library to see whether there were any useful feature requests or bug reports that I could implement / fix in the Rust implementation. There were three, one of which I also wanted for a future project.
None of the new features are available via the C API at this point for compatibility reasons.
Resetting the State
For this one there was a PR already for the C library. Previously the only way to reset all measurements was to create a new instance, which involves new memory allocations, filter initialization, etc.
It’s easy enough to provide a reset method that does only the minimal work required to reset all measurements and restart with a fresh state, so I’ve added that to the Rust implementation.
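The idea can be sketched with a heavily simplified, hypothetical meter type. The fields and methods below are invented for illustration and do not mirror the real state of the crate.

```rust
/// Hypothetical, heavily simplified meter state.
struct Meter {
    window: Vec<f64>, // preallocated sample window
    write_pos: usize,
    energy_sum: f64,
    samples_seen: u64,
}

impl Meter {
    fn new(window_len: usize) -> Self {
        Meter {
            window: vec![0.0; window_len],
            write_pos: 0,
            energy_sum: 0.0,
            samples_seen: 0,
        }
    }

    fn add_sample(&mut self, sample: f64) {
        if self.window.is_empty() {
            return;
        }
        self.window[self.write_pos] = sample;
        self.write_pos = (self.write_pos + 1) % self.window.len();
        self.energy_sum += sample * sample;
        self.samples_seen += 1;
    }

    /// Clears all measurements but keeps the existing allocation.
    fn reset(&mut self) {
        self.window.fill(0.0);
        self.write_pos = 0;
        self.energy_sum = 0.0;
        self.samples_seen = 0;
    }
}
```

Compared to creating a new instance, reset only overwrites existing memory and does not touch the allocator.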
Fixing set_max_window() to actually work
This was a bug introduced in the C implementation a while ago in an attempt to prevent integer overflows when calculating sizes of memory allocations, which then would cause memory safety bugs because less memory was allocated than expected. Accidentally this fix restricted the allowed values for the maximum window size too much. There is a PR for fixing this in the C implementation.
On the Rust side this bug also existed because I simply ported over the checks. If I hadn’t ported over the checks, or had ported an earlier version without the checks, there fortunately wouldn’t have been any memory safety bug on the Rust side. Instead, one of two situations would have happened:
- In debug builds integer overflows cause a panic, so instead of allocating less memory than expected during the setting of the parameters there would’ve been a panic immediately instead of invalid memory accesses later.
- In release builds integer overflows simply wrap around for performance reasons. This would’ve caused less memory than expected to be allocated, but later when trying to access the memory there would’ve been a panic when trying to access memory outside the allocated area.
While a panic is also not nice, it at least leads to no undefined behaviour and prevents worse things from happening.
The proper fix in this case was to not restrict the maximum window size statically but to instead check for overflows during the calculations. This is the same thing the PR for the C implementation does, but on the Rust side this is much easier because of built-in operations like checked_mul for doing an overflow-checking multiplication. In C this requires some rather convoluted code (check the PR for details).
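A minimal sketch of such an overflow-checked size calculation. The function name, its parameters and the exact formula are assumptions for illustration and not the calculation used in the crate.

```rust
/// Number of sample slots needed for a window of `window_ms` milliseconds,
/// or `None` if any intermediate multiplication overflows `usize`.
fn window_buffer_len(sample_rate: usize, window_ms: usize, channels: usize) -> Option<usize> {
    // `?` returns `None` right away if the multiplication overflowed.
    let frames = sample_rate.checked_mul(window_ms)? / 1000;
    frames.checked_mul(channels)
}
```

The caller can then turn a None into a proper error instead of allocating too little memory or panicking.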
Support for Planar Audio Input
The last additional feature that I implemented was support for planar audio input, for which a PR to the C implementation also exists already.
Most of the time audio signals have the samples of each channel interleaved with each other, so for example for stereo you have an array of samples with the first sample for the left channel, the first sample for the right channel, the second sample for the left channel, etc. While this representation has some advantages, in other situations it is easier or faster to work with planar audio: the samples of each channel are contiguous one after another, so you have e.g. first all the samples of the left channel one after another and only then all samples of the right channel.
The PR for the C implementation does this with some code duplication of existing macro code (which can be prevented by making the macros more complicated). On the Rust side I implemented this without any code duplication by adding an internal abstraction for interleaved/planar audio and for iterating over the samples, and then working with that in normal, generic Rust code. This required some minor refactoring and code reorganization but in the end was rather painless. Note that most of the change is the addition of new tests and moving some code around.
When looking at the Samples trait, the main part of this refactoring, one might wonder why I used closures instead of Rust iterators for iterating over the samples. The reason is unfortunately performance. More on this in part 3 of this blog post.
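To give an idea of what a closure-based abstraction over both layouts can look like, here is a simplified, hypothetical sketch. The trait and type names are invented and this is not the actual Samples trait of the crate. It assumes at least one channel for the interleaved layout.

```rust
/// Hypothetical, simplified abstraction over the two sample layouts.
trait SampleLayout {
    fn channels(&self) -> usize;
    /// Calls `f` for every sample of `channel`, in order.
    fn foreach_sample(&self, channel: usize, f: impl FnMut(f32));
}

/// All channels in one slice: L R L R ...
struct Interleaved<'a> {
    data: &'a [f32],
    channels: usize,
}

/// One slice per channel: L L L ... and R R R ...
struct Planar<'a> {
    planes: &'a [&'a [f32]],
}

impl SampleLayout for Interleaved<'_> {
    fn channels(&self) -> usize {
        self.channels
    }

    fn foreach_sample(&self, channel: usize, f: impl FnMut(f32)) {
        // Skip to the first sample of the channel, then step over whole frames.
        self.data
            .iter()
            .skip(channel)
            .step_by(self.channels)
            .copied()
            .for_each(f);
    }
}

impl SampleLayout for Planar<'_> {
    fn channels(&self) -> usize {
        self.planes.len()
    }

    fn foreach_sample(&self, channel: usize, f: impl FnMut(f32)) {
        self.planes[channel].iter().copied().for_each(f);
    }
}

/// Generic processing code that works with either layout.
fn channel_energy(samples: &impl SampleLayout, channel: usize) -> f32 {
    let mut sum = 0.0;
    samples.foreach_sample(channel, |s| sum += s * s);
    sum
}
```

The processing function is written once and is monomorphized for each layout, so no code has to be duplicated by hand.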
In the next part of this blog post I will describe the porting approach in detail and also give various examples of how to port C code to idiomatic Rust, along with some examples of problems I ran into.