Linking and shrinking Rust static libraries: a tale of fire
From Centricular Devlog by Amy (Centricular)
At the GStreamer project, we produce SDKs for lots of platforms: Linux, Android, macOS, iOS, and Windows. However, as we port more and more plugins to Rust 🦀, we are finding ourselves backed into a corner.
Rust static libraries are simply too big.
To give you an example, the AWS folks changed their SDK back in March to switch their cryptographic toolkit over to their aws-lc-rs crate [1]. However, that causes a 2-10x increase in code size (bug reports here and here), which gets duplicated in every plugin that makes use of their ecosystem!
What are Rust staticlibs made of?
To summarise, each Rust plugin packs a copy of its dependencies, plus a copy of the Rust standard library. This is not a problem on shared libraries and executables by their very nature, but on static libraries it causes several issues:
- Rust leaks unexported symbols from native staticlibs
- On some platforms, linking against multiple Rust staticlibs is impossible
 
First approach: Single-Object Prelinking
I won't bore you with the details as I've written another blog post on the subject; the gist is that you can unpack the library, and then ask the linker to perform "partial linking" or "relocatable linking" (Linux term) or "Single-Object Prelinking" (the Apple term, which I'll use throughout the post) over the object files. Setting which symbols you want to be visible for downstream consumers lets dead-code elimination take place at the plugin level, ensuring your libraries are now back to a reasonable size.
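As a rough illustration of the relocatable-link step on Linux, the sketch below only assembles the argument vector for the linker's -r mode. The function name and file names are hypothetical, and controlling which symbols stay visible is deliberately left out.

```rust
/// Builds the argv for a relocatable ("partial") link.
/// `-r` asks ld to emit another relocatable object instead of a final binary.
/// Hypothetical helper: it does not run the linker, it only prepares the call.
fn prelink_argv(objects: &[&str], output: &str) -> Vec<String> {
    let mut argv = vec![
        "ld".to_string(),
        "-r".to_string(),
        "-o".to_string(),
        output.to_string(),
    ];
    // Every object unpacked from the static library becomes an input.
    argv.extend(objects.iter().map(|o| o.to_string()));
    argv
}
```

The resulting vector could be handed to std::process::Command, with the first element as the program and the rest as its arguments.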
Why is it not enough?
Single-Object Prelinking has two drawbacks:
- Unoptimized code: the linker won't be able to deduplicate functions between melded objects, as they've been hidden by the prelinking process.
- Windows: there are no officially supported tools (read: Visual Studio, LLVM, GCC) to perform this at the compiler level. It is possible to do this with binutils, but the PE-COFF format doesn't allow changing the visibility of unexported functions.
 
Melt all the object files with the power of dragons' fire!
As said earlier, no tools on Windows support prelinking officially yet, but there's another thing we can do: library deduplication.
Thanks to Rust's comprehensive crate ecosystem, I wrote a new CLI tool which I called dragonfire. Given a complete Rust workspace or list of static libraries, dragonfire:
- reads all the static libraries in one pass
- deduplicates the object files inside them based on their size and naming (Rust has its own, unique naming convention for object files -- pretty useful!)
- copies the duplicate objects into a new static library (usually called gstrsworkspace, as its primary use is for the GStreamer ecosystem)
- removes the duplicates from the rest of the libraries
- updates the symbol table in each of the libraries with the bundled LLVM tools
 
Thanks to the ar crate, the unpacking and writing only happen at the third step (copying the duplicate objects), ensuring no wasteful I/O slowdowns take place. The llvm-tools-preview component in turn takes care of locating and calling llvm-ar to update the workspace's symbol tables.
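The planning part of the steps above can be sketched without touching any archive. This is a minimal in-memory model, not dragonfire's actual code: the type and function names are hypothetical, and each archive member is reduced to the pair of its library-independent name and its size in bytes.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Identity used to decide that two archive members are the same object:
/// (library-independent member name, size in bytes). Hypothetical type.
type ObjectId = (String, u64);

/// Outcome of the planning pass. Hypothetical type.
#[derive(Debug, PartialEq)]
struct MeltPlan {
    /// Objects found in more than one library; they move to the shared library.
    shared: BTreeSet<ObjectId>,
    /// Members that stay behind in each input library.
    remaining: BTreeMap<String, Vec<ObjectId>>,
}

fn plan_melt(libs: &BTreeMap<String, Vec<ObjectId>>) -> MeltPlan {
    // Count in how many distinct libraries each object appears.
    let mut seen_in: BTreeMap<&ObjectId, usize> = BTreeMap::new();
    for members in libs.values() {
        let unique: BTreeSet<&ObjectId> = members.iter().collect();
        for id in unique {
            *seen_in.entry(id).or_insert(0) += 1;
        }
    }

    // Anything present in two or more libraries is a duplicate.
    let shared: BTreeSet<ObjectId> = seen_in
        .into_iter()
        .filter(|(_, count)| *count > 1)
        .map(|(id, _)| id.clone())
        .collect();

    // Each library keeps only the members that were not moved out.
    let remaining: BTreeMap<String, Vec<ObjectId>> = libs
        .iter()
        .map(|(lib, members)| {
            let kept: Vec<ObjectId> = members
                .iter()
                .filter(|id| !shared.contains(*id))
                .cloned()
                .collect();
            (lib.clone(), kept)
        })
        .collect();

    MeltPlan { shared, remaining }
}
```

Keeping the plan separate from the archive I/O means the libraries only need to be rewritten once the full set of duplicates is known.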
The object files' naming convention deserves a special mention. Assuming a Rust staticlib named libfoo, its object files will be named as follows:
- crate_name-hash1.crate_name.hash2-cgu.nnn.rcgu.o
- On Windows only: foo.crate_name-hash1.crate_name.hash2-cgu.nnn.rcgu.o
- On non-Windows platforms: same as above, but replacing foo with libfoo-hash
In all cases, crate_name means a dependency present somewhere in the workspace tree, and nnn is a number that will be bigger than zero whenever -C codegen-units was set to higher than 1.
For dragonfire's purposes, dropping the library prefix is enough to be able to deduplicate object files; however, on Windows we can also find import library stubs, which LLVM can generate on its own through the raw-dylib link kind, i.e. the #[link(kind = "raw-dylib")] attribute [2]. Import stubs can have any extension, e.g. .dll, .exe and .sys (the latter two coming from private Win32 APIs). These stubs cannot be deduplicated as they are generated individually per imported function, so dragonfire must preserve them where they are.
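Dropping the library prefix can be sketched as a small parser that walks the member name from the right. This is an illustration based only on the naming pattern described above, not dragonfire's implementation; the function name and the hashes in the usage example are made up. Names that do not match the pattern, such as import stubs, yield None and would be left where they are.

```rust
/// Returns the library-independent part of a Rust object file name,
/// i.e. `crate_name-hash1.crate_name.hash2-cgu.nnn.rcgu.o`, or None when
/// the name does not follow that pattern (e.g. an import stub).
fn dedup_key(member: &str) -> Option<&str> {
    // Peel the fixed suffix and the codegen-unit index off the right end.
    let stem = member.strip_suffix(".rcgu.o")?;
    let (rest, cgu_index) = stem.rsplit_once('.')?;
    if cgu_index.is_empty() || !cgu_index.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let rest = rest.strip_suffix("-cgu")?;

    // Next come hash2 and the second copy of the crate name.
    let (rest, _hash2) = rest.rsplit_once('.')?;
    let (head, crate_name) = rest.rsplit_once('.')?;
    if crate_name.is_empty() {
        return None;
    }

    // `head` is "[prefix.]crate_name-hash1"; drop hash1, then locate the crate name.
    let (head_no_hash, _hash1) = head.rsplit_once('-')?;
    if !head_no_hash.ends_with(crate_name) {
        return None;
    }
    let start = head_no_hash.len() - crate_name.len();

    // A library prefix, when present, must be separated by a dot.
    if start != 0 && !head_no_hash[..start].ends_with('.') {
        return None;
    }
    Some(&member[start..])
}
```

With made-up hashes, both foo.serde-1a2b.serde.3c4d-cgu.0.rcgu.o and libfoo-9f8e.serde-1a2b.serde.3c4d-cgu.0.rcgu.o map to the same key, serde-1a2b.serde.3c4d-cgu.0.rcgu.o, while a stub named kernel32.dll is not matched.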
Drawbacks of object file deduplication
Again, this approach has several disadvantages. On Apple platforms, deduplicating libraries triggers a strange linker error that I had not seen before:
ld: multiple errors: compact unwind must have at least 1 fixup in '<framework>/GStreamer[arm64][1021](libgstrsworkspace_a-3f2b47962471807d-lse_ldset4_acq.o)'; r_symbolnum=-19 out of range in '<framework>/GStreamer[arm64][1022](libgstrsworkspace_a-compiler_builtins-350c23344d78cfbc.compiler_builtins.5e126dca1f5284a9-cgu.162.rcgu.o)'
This also led me to find that Rust libraries were packing bitcode, which is forbidden by Apple. (This was thankfully already fixed before shipping time, but we've not yet updated our Rust minimum version to take advantage of it.)
Another drawback is that Rust's implementation of LTO causes dead-code elimination at the crate level, as opposed to the workspace level. This makes object file deduplication impossible, as each copy is different.
For the Windows platform, there is an extra drawback which specifically affects object files produced by LLVM: the COMDAT sections are set to IMAGE_COMDAT_SELECT_NODUPLICATES. This means that the linker will outright reject functions with multiple definitions, rather than realise they're all duplicates and discard all but one of the copies. MSVC in particular performs symbol resolution before dead-code elimination. This means that linking will fail because of unresolved symbols before dead-code elimination kicks in; to use deduplicated libraries, one must set the linker flags /OPT:REF /FORCE:UNRESOLVED to ensure the dead code can be successfully eliminated.
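If the final consumer happens to be a Cargo project, one way to pass these flags is through a build script. The sketch below is an assumption about how a downstream project might do it, not something GStreamer ships; the function name is hypothetical, while cargo:rustc-link-arg and CARGO_CFG_TARGET_ENV are standard Cargo build-script facilities.

```rust
/// Returns the build-script directives that forward the two linker flags
/// mentioned above, but only when targeting the MSVC toolchain.
/// Hypothetical helper for a downstream build.rs.
fn msvc_dedup_link_directives(target_env: &str) -> Vec<String> {
    if target_env != "msvc" {
        // Other linkers do not understand these flags.
        return Vec::new();
    }
    ["/OPT:REF", "/FORCE:UNRESOLVED"]
        .iter()
        .map(|flag| format!("cargo:rustc-link-arg={flag}"))
        .collect()
}
```

A build.rs would read CARGO_CFG_TARGET_ENV from the environment, call this function, and print each returned line to standard output.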
Results
With library deduplication, we can make static libraries up to 44x smaller when building under MSVC [3] (see the tables below for the full comparison):
- gstaws.lib: from 173M to 71M (~2.4x)
- gstrswebrtc.lib: from 193M to 66M (~2.9x)
- gstwebrtchttp.lib: from 66M to 1,5M (~44x)
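The ratios above can be reproduced from the size strings in the tables. This is a small sketch with hypothetical function names; it assumes that K and M are powers of 1024 and that the comma is a decimal separator, as the table formatting suggests.

```rust
/// Parses a size as printed in the tables, e.g. "173M", "1,5M" or "572K".
/// Assumption: K = 1024 bytes, M = 1024 * 1024 bytes, ',' is the decimal mark.
fn parse_size(text: &str) -> Option<f64> {
    let text = text.trim();
    let unit = text.chars().last()?;
    let multiplier = match unit {
        'K' => 1024.0,
        'M' => 1024.0 * 1024.0,
        _ => return None,
    };
    // Everything before the unit letter is the numeric part.
    let number = &text[..text.len() - unit.len_utf8()];
    let value: f64 = number.replace(',', ".").parse().ok()?;
    Some(value * multiplier)
}

/// How many times smaller `after` is compared to `before`.
fn shrink_factor(before: &str, after: &str) -> Option<f64> {
    Some(parse_size(before)? / parse_size(after)?)
}
```

For instance, shrink_factor("66M", "1,5M") gives the 44x figure quoted for gstwebrtchttp.lib.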
 
Table: before and after melding under MSVC
| file | no prelinking | melded | 
|---|---|---|
| gstaws.lib | 173M | 71M | 
| gstcdg.lib | 36M | 572K | 
| gstclaxon.lib | 32M | 568K | 
| gstdav1d.lib | 34M | 936K | 
| gstelevenlabs.lib | 59M | 1008K | 
| gstfallbackswitch.lib | 37M | 2,3M | 
| gstffv1.lib | 34M | 744K | 
| gstfmp4.lib | 39M | 3,2M | 
| gstgif.lib | 34M | 1,1M | 
| gstgopbuffer.lib | 30M | 456K | 
| gsthlsmultivariantsink.lib | 46M | 1,6M | 
| gsthlssink3.lib | 41M | 1,2M | 
| gsthsv.lib | 34M | 796K | 
| gstjson.lib | 31M | 704K | 
| gstlewton.lib | 33M | 1,2M | 
| gstlivesync.lib | 33M | 728K | 
| gstmp4.lib | 38M | 2,2M | 
| gstmpegtslive.lib | 31M | 704K | 
| gstndi.lib | 38M | 2,8M | 
| gstoriginalbuffer.lib | 34M | 376K | 
| gstquinn.lib | 75M | 23M | 
| gstraptorq.lib | 33M | 2,4M | 
| gstrav1e.lib | 46M | 11M | 
| gstregex.lib | 38M | 404K | 
| gstreqwest.lib | 58M | 1,4M | 
| gstrsanalytics.lib | 35M | 1000K | 
| gstrsaudiofx.lib | 54M | 22M | 
| gstrsclosedcaption.lib | 52M | 8,4M | 
| gstrsinter.lib | 35M | 604K | 
| gstrsonvif.lib | 46M | 2,0M | 
| gstrspng.lib | 35M | 1,2M | 
| gstrsrtp.lib | 59M | 11M | 
| gstrsrtsp.lib | 57M | 4,4M | 
| gstrstracers.lib | 40M | 2,4M | 
| gstrsvideofx.lib | 48M | 11M | 
| gstrswebrtc.lib | 193M | 66M | 
| gstrsworkspace.lib | N/A | 137M | 
| gststreamgrouper.lib | 30M | 376K | 
| gsttextahead.lib | 30M | 332K | 
| gsttextwrap.lib | 32M | 2,1M | 
| gstthreadshare.lib | 52M | 12M | 
| gsttogglerecord.lib | 35M | 808K | 
| gsturiplaylistbin.lib | 31M | 648K | 
| gstvvdec.lib | 34M | 564K | 
| gstwebrtchttp.lib | 66M | 1,5M | 
The results from the melding above can be compared with the file sizes obtained using LTO on Windows [4] (remember it doesn't actually fix linking against plugins):
- gstaws.lib: from 67M (LTO) to 71M (melded) (+6.0%)
 - gstrswebrtc.lib: from 105M to 66M (-37.1%)
 - gstwebrtchttp.lib: from 28M to 1,5M (-94.6%)
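The percentages in this comparison are plain relative changes from the LTO size to the melded size. As a worked example (the function name is hypothetical, and sizes are taken in megabytes as listed):

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean the file got smaller.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

Calling percent_change(105.0, 66.0) for gstrswebrtc.lib yields roughly -37.1, and percent_change(28.0, 1.5) for gstwebrtchttp.lib yields roughly -94.6.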
 
Table: before and after LTO under MSVC (no melding involved)
| file (codegen-units=1 in all cases) | no prelinking | lto=thin | opt-level=s + lto=thin | debug=1 + opt-level=s | debug=1 + lto=thin + opt-level=s | 
|---|---|---|---|---|---|
| old/gstaws.lib | 199M | 199M | 171M | 78M | 67M | 
| old/gstcdg.lib | 11M | 11M | 11M | 7,5M | 7,5M | 
| old/gstclaxon.lib | 11M | 11M | 11M | 7,7M | 7,7M | 
| old/gstdav1d.lib | 12M | 12M | 12M | 7,9M | 7,8M | 
| old/gstelevenlabs.lib | 52M | 52M | 49M | 24M | 22M | 
| old/gstfallbackswitch.lib | 18M | 18M | 17M | 11M | 11M | 
| old/gstffv1.lib | 11M | 11M | 11M | 7,6M | 7,6M | 
| old/gstfmp4.lib | 20M | 20M | 19M | 12M | 11M | 
| old/gstgif.lib | 12M | 12M | 12M | 7,9M | 7,9M | 
| old/gstgopbuffer.lib | 9,7M | 9,7M | 9,7M | 7,5M | 7,4M | 
| old/gsthlsmultivariantsink.lib | 16M | 16M | 16M | 9,6M | 9,4M | 
| old/gsthlssink3.lib | 14M | 14M | 14M | 8,9M | 8,8M | 
| old/gsthsv.lib | 11M | 11M | 11M | 7,8M | 7,7M | 
| old/gstjson.lib | 12M | 12M | 12M | 8,4M | 8,2M | 
| old/gstlewton.lib | 12M | 12M | 12M | 8,1M | 8,1M | 
| old/gstlivesync.lib | 12M | 12M | 12M | 8,3M | 8,2M | 
| old/gstmp4.lib | 17M | 17M | 17M | 9,9M | 9,7M | 
| old/gstmpegtslive.lib | 12M | 12M | 12M | 8,0M | 7,9M | 
| old/gstndi.lib | 21M | 21M | 20M | 12M | 11M | 
| old/gstoriginalbuffer.lib | 9,6M | 9,6M | 9,7M | 7,4M | 7,3M | 
| old/gstquinn.lib | 94M | 94M | 86M | 39M | 35M | 
| old/gstraptorq.lib | 18M | 18M | 17M | 9,8M | 9,4M | 
| old/gstrav1e.lib | 39M | 39M | 37M | 19M | 18M | 
| old/gstregex.lib | 26M | 26M | 25M | 14M | 14M | 
| old/gstreqwest.lib | 53M | 53M | 49M | 24M | 22M | 
| old/gstrsanalytics.lib | 15M | 15M | 14M | 9,2M | 8,9M | 
| old/gstrsaudiofx.lib | 57M | 57M | 56M | 23M | 22M | 
| old/gstrsclosedcaption.lib | 40M | 40M | 36M | 20M | 18M | 
| old/gstrsinter.lib | 14M | 14M | 13M | 8,5M | 8,4M | 
| old/gstrsonvif.lib | 21M | 21M | 20M | 11M | 11M | 
| old/gstrspng.lib | 13M | 13M | 13M | 8,2M | 8,2M | 
| old/gstrsrtp.lib | 47M | 47M | 44M | 22M | 20M | 
| old/gstrsrtsp.lib | 35M | 35M | 33M | 16M | 15M | 
| old/gstrstracers.lib | 28M | 28M | 27M | 16M | 15M | 
| old/gstrsvideofx.lib | 16M | 16M | 35M | 9,2M | 15M | 
| old/gstrswebrtc.lib | 329M | 329M | 284M | 124M | 105M | 
| old/gststreamgrouper.lib | 9,6M | 9,6M | 9,7M | 7,2M | 7,2M | 
| old/gsttextahead.lib | 9,6M | 9,6M | 9,5M | 7,4M | 7,3M | 
| old/gsttextwrap.lib | 13M | 13M | 13M | 8,4M | 8,4M | 
| old/gstthreadshare.lib | 49M | 49M | 45M | 23M | 20M | 
| old/gsttogglerecord.lib | 13M | 13M | 13M | 8,5M | 8,4M | 
| old/gsturiplaylistbin.lib | 11M | 11M | 11M | 7,9M | 7,9M | 
| old/gstvvdec.lib | 11M | 11M | 11M | 7,5M | 7,5M | 
| old/gstwebrtchttp.lib | 69M | 69M | 63M | 30M | 28M | 
Conclusion
This article presents several longstanding pain points in Rust, namely staticlib binary sizes, symbol leaking, and incompatibilities between Rust and MSVC. I have presented dragonfire, a tool that aims to address or work around these issues where possible, along with the issues that remain open.
As explained earlier, dragonfire-treated libraries are live on all platforms except Apple's if you use the development packages from mainline; hopefully this is on track for the 1.28 release of GStreamer. There's already a merge request pending to enable it for Apple platforms; we're only waiting to update the Rust minimum version.
If you want to have a look, dragonfire's source code is available at Freedesktop's GitLab instance. Please note that at the moment I have no plans to submit this to crates.io.
Feel free to contact me with any feedback, and thanks for reading!
[1] See its default-https-client feature at lib.rs; you will find it throughout the AWS SDK ecosystem.
[2] https://doc.rust-lang.org/reference/items/external-blocks.html#dylib-versus-raw-dylib
[3] In all cases the -C flags are debug=1 + codegen-units=1 + opt-level=s; see this comment for the complete results across all platforms.
[4] Source: https://gitlab.freedesktop.org/gstreamer/cerbero/-/merge_requests/1895