November 29, 2020

Phil Normand: Catching up on WebKit GStreamer WebAudio backends maintenance

(Phil Normand)

Over the past few months the WebKit development team has been working on modernizing support for the WebAudio specification. This post highlights some of the changes that were recently merged, focusing on the GStreamer ports.

My fellow WebKit colleague, Chris Dumez, has been very active lately, updating the WebAudio implementation …

by Philippe Normand at November 29, 2020 12:45 PM

November 26, 2020

Jean-François Fortin Tam: The orange cones of free software: on the waning of enthusiasm in the face of our technological construction sites

In this post, my friend "Antistress" (Thibaut) takes stock of the big technological changes in the free software world around the GNU+Linux platform over the last fifteen years or so, since many of us started using this platform more intensively. He also lets a certain exhaustion show through, a "lack of excitement" that is probably felt by many of us.

For my part, I started tinkering with Linux in 2003, but it was only with the first version of Ubuntu, 4.10, that I could stick to it full-time (note: I have since moved to Fedora, in 2010). And ever since, it has been a story of patience. Lots and lots of patience.

I often see parallels between the free software world and the city of Montreal. Like the roadwork in Montreal's streets that seems eternal (to the point where an orange traffic cone serves as our informal mascot), we never seem to see the end of our technological construction sites.

The orange cones typical of the Montreal landscape, in a photo from Le Devoir (borrowed for my parody purposes)

The task of technologically modernizing our favorite operating system is colossal; it has involved many individuals and companies (which came and went at various points of the "desktop wars" and "mobile wars"), and many plot twists and lessons learned along the way. Consider, for example:

  • Xorg and its many attempts at compositing that actually holds up (including XGL, AIGLX, Compiz/Beryl, Metacity vs Mutter, GNOME Shell, KWin), and the eventual "fed up, let's start over with an architecture that doesn't date from the 1970s" moment that is Wayland (required viewing: this marvelous presentation by Daniel Stone);
  • the multimedia work with GStreamer and PulseAudio (and now/soon: PipeWire);
  • Firefox's long battle to modernize its technology in order to survive the web's obesity epidemic and catch up with Chrome/Chromium's performance;
  • the wait for free, performant and stable graphics drivers for AMD graphics cards (since their strategy change circa 2010, when AMD started publishing documentation to support the free drivers, and a few years later went further and consolidated their proprietary drivers with the free ones);
  • the interminable migration of the application ecosystem from GTK2 to GTK3 (just in time for GTK4!), and likewise for Python 2 vs Python 3;
  • the interminable wait, from 2010 to 2019, for competent people to take a concerted look at GNOME Shell performance optimization (since 3.36 and 3.38, it's pure bliss);
  • etc.

In the free software world, we are all a bit sado-masochistic. Masochistic in the patience we have to show, and sadistic in our love-hate relationship with our favorite software (in the past, I did not shy away from being quite critical on my blog about Firefox's performance; criticism that was, a decade later, finally addressed), much like Montrealers have a love-hate relationship with orange cones (even without being VLC fans). And that's without counting Google's and Apple's depressing hegemony over the mobile world among the general population, which confines our impact to the "desktop" world.

So yes, after all these years, we get a little less excited than in our youth, when we were fervent evangelists of free software. Thibaut is quite clear when he writes this:

[…] I would say that apart from the first item (finishing the ongoing work on Firefox for GNU/Linux), I am somewhat lacking in excitement.

To explain this phenomenon, I think I would put it this way: on the "desktop Linux" side there has been a lot of progress pretty much everywhere over the years, and one of the reasons we may be less excited about the new developments on our computers is that, "overall, it works now" (remember the pre-Xorg era, when you had to expect the graphical interface not to work on first boot? And the 2004-2012 era, when you had to fight to get working WiFi? Thankfully I haven't come across a Broadcom chip in recent years…).

So there aren't that many glaring issues left to fix. And what remains to be fixed… often replaces things that already "work". From roughly 2003 to 2013, I ran into several situations where my ext4 filesystem spontaneously corrupted itself, without explanation (even though ext4 was supposedly stable and had been widely tested for years). Since 2013-2014, it has not happened to me again. So, to come back to Antistress's observation about the modest adoption of Btrfs: excuse me, but I have no interest in, nor enthusiasm for, reformatting my 2-terabyte hard drive (holding my /home) from ext4 to Btrfs. Nor any urge to switch to Fedora Silverblue instead of Fedora Workstation. I'll let the young ones have fun with that!

by Jeff at November 26, 2020 05:25 PM

November 05, 2020

Jean-François Fortin Tam: Enabling real-time noise reduction and acoustic echo cancellation for a microphone on Linux with PulseAudio (and making people believe you drive a Porsche)

You may recall that, several years ago, I professed my admiration for the acoustic echo cancelling that Empathy/Telepathy got via PulseAudio.

Well, Empathy and Telepathy are dead and buried, and WebRTC has largely taken their place (we saw that one coming) with applications like Jitsi Meet, BigBlueButton, and plenty of proprietary SaaS websites.

What I forgot to mention in 2012 is that the AEC and noise reduction can also be enabled permanently for all applications using the microphone, not just Empathy/Telepathy. This feature is still available in PulseAudio in 2020*.

This can be useful if you don't want to clean up your recordings in Audacity every single time, or for the few videoconferencing use cases where acoustic echo cancellation is not handled by your application (keep in mind that echo concealment is not the same thing as echo cancellation). It's not really my case (I have studio production equipment with well-amplified, noise-free microphones), but it can come in handy for less well-equipped free software folks who still have to take part in increasingly frequent conference calls, and who don't want to inflict on others the noise of a crappy analog microphone taped to the fan of an overheating computer. Then again, I suppose all sorts of people want to get rid of their echo in 2020:

To enable the feature per-user (it survives system reinstalls if you preserve your home partition), edit the file ~/.config/pulse/default.pa; if, instead, you want to enable it system-wide for all users, edit /etc/pulse/default.pa. In either case, insert the following content:

load-module module-echo-cancel source_name=noechosource sink_name=noechosink
set-default-source noechosource
set-default-sink noechosink

I suppose we could have named these "Fiji" instead of "noechosource" and "lavabo" instead of "noechosink". There is also a plethora of possible options for the echo-cancel module, but let's keep it simple. Like a Toyota.
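
That said, if your PulseAudio build ships the WebRTC audio processing module (most distribution builds do, as far as I know), one option worth knowing about is aec_method, which selects the cancellation algorithm:

load-module module-echo-cancel aec_method=webrtc source_name=noechosource sink_name=noechosink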

In the case of ~/.config/pulse/default.pa, don't forget that your file must inherit from the system-wide file, and therefore start with this line at the top:

.include /etc/pulse/default.pa

Then you can restart your user session's PulseAudio instance by stopping it (it will restart on its own, at least if you are running GNOME): pulseaudio --kill
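
To check that the module actually loaded, you can list the sources and sinks and look for the names chosen above:

pactl list short sources | grep noecho
pactl list short sinks | grep noecho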

New virtual recording devices will then be available to you in pavucontrol (or GNOME Control Center's sound mixer). But before you embark on this whole adventure, you may want to rename your hardware devices as PulseAudio sees them… so everyone in the video call will believe you drive a Porsche.


*: at least, until we get fed up with PulseAudio and replace it, somewhere in the coming years, with another system that also makes coffee…

by Jeff at November 05, 2020 12:00 PM

October 31, 2020

Jean-François Fortin Tam: Spooky GTG features to try out for Halloween 2020

Are you an irresistible creature with an insatiable love for the dead… bugs? Well, grab your bug hunter crossbow, because we need you to test some big technological changes in GTG so that we can confidently release version 0.5 sometime soon (way before the year end, ideally).

Synchronizing your tasks across devices using CalDAV

A new sync backend/plugin was developed by Mildred during the summer. I think that's super cool and it will probably be useful to many of you! I will now let this thematic tweet speak for itself (so that you can retweet it too today):

It would be great to have many adventurous early adopters try out this feature. See the details (and report issues) in this ticket. This has not yet been merged into the main branch of the Git version of GTG, as I would like more people to test it before it is included by default in what will become a stable GTG release. Please indicate your successes or failures by voting on this comment.

Changes related to the local file format data storage

In an attempt to rule out the possibility of our XML file format being the cause of GTG’s performance problems that occur when you have multiple hundreds of tasks (I have a thousand), Diego ported GTG’s XML storage backend to LXML, which is reportedly 40 times faster.

  • Unfortunately, that was not the silver bullet; the performance issues remained, but at least now we can rule out XML (and the task editor, see below) as suspects, and we can move on to a different part of the problem.
  • We suspect our performance woes have something to do with sorting in liblarch, or perhaps deeper: maybe it’s all GtkTreeView/GtkTreeModel’s fault? Who knows.
  • Do you take pride in optimizing the sh!t out of GTK applications? Do you have experience in duct-taping a jetpack onto a turtle? Join the party! Your help is more than welcome in investigating our performance problems and devising potential fixes. We will owe you beer and candy.

In addition to the LXML port, another area where Diego brought down the might of his code refactoring skills is the new “task editor” backend and view renderer, which includes a bunch of new features and refinements such as support for subheadings (with the “#” markdown syntax), the use of GTK checkboxes to represent subtasks, tag highlight colors that match the tag’s color, and a bunch of bugfixes. It was also envisioned as a way to address one of the performance issues I theorized about. This is what it looks like:

New taskview screenshot

And as if that wasn’t enough, Diego set out to redesign the XML file format from a new specification devised with the help of Brent Saner (proposal episodes 1, 2 and 3). The new file format is not yet merged into master, and we consider it quite experimental. If you’re feeling courageous and are looking for a bug hunting challenge, back up your data (really) before trying it out with the “file_format_2” branch (which you can see in the current list of branches).

As you can imagine, Diego’s changes are all major, invasive technological changes (particularly the proposed file format change), and they would benefit from extensive testing by everybody before 0.5 happens. I’ve done some pretty exhaustive testing before he merged his first two branches of course, but I’m only human, and it is possible that issues might remain. To make testing easier, a snapshot of the current git version has been packaged as a Flatpak package, available here. As for the third (file format) branch, we’d like to have more people testing it with copies of “real data” before we feel confident about merging it, so you’ll have to build the git version yourself instead of using the flatpak. So please grab GTG’s git version ASAP, with a copy of your data (see the instructions in the README, including the “Where is my user data and config stored?” section), and torture-test it to make sure everything is working properly, and report issues you may find (if any).

Other ongoing improvements

There have been countless bug fixes and improvements in recent months; here are a few that I can remember off the top of my head:

  • The main window’s previously active tab/mode is now restored when you launch the application.
  • Task windows no longer all reopen on launch when you shut down the PC without closing GTG first.
  • There is a setting in GTG’s preference now to specify whether you are using a dark theme or not, so that GTG can adapt its colors accordingly. From what I am told, it seems GNOME still doesn’t have a mechanism for applications to know for certain if the user’s system is using a dark theme, or to connect to a signal when this changes…
  • Izidor Matusov made a surprise appearance to nuke the pyxdg dependency from orbit
  • Diego landed the new icon, along with a nightly version of it. I mourn the pixel-perfectness of the oldschool bitmap icon, but there are worse things going on in 2020 ;)
  • The about dialog now properly shows the application icon
  • Parent tasks automatically close their window when you click to open a subtask, which makes that consistent with closing the subtask’s window when you click “Open parent”
  • Neui started by working on the German translation, then quickly got addicted and started fixing a bunch of papercuts in the code! Among other things, Neui removed dbus-python in favor of gdbus, made the plugins’ “about” text translatable, documented the contribution process for translators, and various other things.
  • New Swedish and Hungarian translation updates have been contributed!
  • Francisco Lavin fixed the Hamster plugin! Productivity metric rodents rejoice! Please test this thing.
  • Recurring tasks. Holy shit.

More great things to come.


P.s.: I realized today that I totally forgot to mention this in the context of GTG: I have a personal news announcement mailing list where I send out notifications about my new blog posts (with a short hand-crafted summary), as a courtesy to my readers. Over the past 12 months, I have sent 5 such emails (so, not exactly a deluge), and they have all had something to do with GTG so far. If you're afraid of missing news about GTG, or don't have time to follow every ticket and every tweet, feel free to subscribe to my mailing list. Details and subscription form can be found here (that page sucks, I know, I need to write something simpler).

by Jeff at October 31, 2020 09:29 PM

Jean-François Fortin Tam: The unspeakable horror of SQLite files (and Firefox profile corruption mandelbugs)

This Halloween, I am sharing with you a little horror story and a few tips, the fruit of my observations of a mysterious phenomenon that occurred in 2017. In fact, this post has been sitting in my drafts for three years and three weeks (try to beat that record, you slackers), and I figured I had better wrap it up before Trump's re-election and finally publish it… and hey, it's been forever since I blogged in French and on Planet Libre, so hello Planet Libre, it's been a while. Look, I resurrected GTG this year; it's great, it practically makes coffee. Also, keep an eye on my blog, because there is more coming, even if not necessarily in French (and therefore not on Planet Libre)…


The unspeakable horror of SQLite files

I don't have much love for SQLite files, a widespread data storage format for software applications. I find that:

  1. they obscure the data (not inspectable and editable with a text editor, as XML is, for example);
  2. the format tends to create performance problems where you regularly have to vacuum the database (fun fact: it was four years later that Jim, then the maintainer of Geary, came to discover the performance problem that SQLite can represent… by studying my blog post!);
  3. in the case of Firefox (and Tracker), it has the annoying habit of getting corrupted far too easily (and at that point, the only solution is to blow away the whole file, because of point #1).

The curious incident of the red panda in the night-time

So, about that "recent" little anecdote…

In early autumn 2017, I noticed that some dynamic websites (like LinkedIn, appear.in, rocketchat) refused to load in Firefox 55 more and more often, and then, with Firefox 56, declared an unlimited general strike and stopped working entirely. Perplexed, I tried disabling all my extensions, clearing the web cache, clearing the application cache from Firefox's preferences page, locking up the village madman; nothing worked.

Now, based on my previous experiences, I told myself: "If the problem occurs on one particular computer and not on another… the cause is probably a corrupted SQLite file." And sure enough, using Firefox's web debugging console, I discovered that it was grumbling with "NS_ERROR_FILE_CORRUPTED" errors when trying to load LinkedIn, which led me to this discussion thread on StackOverflow, among others.

It turned out that, this time, my ~/.mozilla/firefox/profil_normal.default/webappsstore.sqlite file had mutated out of control. It weighed in at 86 MB, which is very fishy in itself, and even when I opened it with sqlite3 and ran "VACUUM;", its size only went down to 84 MB. It was clear that it had become…

So, only one solution remained…

I therefore deleted the file entirely, and it recreated itself afterwards. Its new size: 622 kB. And with that, all my web applications started working again. QED.
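
For the record, that vacuum attempt is a one-liner from the command line (the profile path being, of course, specific to my machine):

sqlite3 ~/.mozilla/firefox/profil_normal.default/webappsstore.sqlite "VACUUM;"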

I then took the opportunity to inspect the Firefox profile on my other computer, simply by sorting files by size; experience has taught me a certain distrust of "big files" in that folder. "Well, well," I exclaimed, "this webappsstore.sqlite looks rather plump as well!"

So, that time, the culprit was webappsstore.sqlite, instead of the usual "urlclassifier3.sqlite" (which no longer seems to exist since around the summer of 2016, as the modification dates indicate) or, occasionally, "places.sqlite" (careful with that last one: it appears to also contain your bookmarks, which you would do well to export before any nuclear strike; see the glossary of the various files found in the Firefox profile folder).

Fortunately, I have not run into new problems since that 2017 incident. It seems that, over the years, these mandelbugs have become a bit rarer, or more discreet. But sometimes, alone in the cold October night, I wonder whether there is not still some mutant critter crawling around in my Firefox's fur. Damn corruptible SQLite file storage concept. Damn bipedal nuclear battle tank.


P.s.: In spite of everything, I still like Firefox; it's the browser I use and recommend (then again, it's the only somewhat trustworthy browser we have left with an independent engine…). But this post is nevertheless an amusing anecdote to tell, and I figure it could potentially be useful to other people.

by Jeff at October 31, 2020 06:22 AM

October 28, 2020

Bastien Nocera: Sandboxing inside the sandbox: No rogue thumbnailers inside Flatpak

(Bastien Nocera)

A couple of years ago, we sandboxed thumbnailers using bubblewrap to avoid drive-by downloads taking advantage of thumbnailers with security issues.

It's a great tool, and it's a tool that Flatpak relies upon to create its own sandboxes. But that also meant that we couldn't use it inside the Flatpak sandboxes themselves, and those aren't always as closed as they could be, to support legacy applications.

We've finally implemented support for sandboxing thumbnailers within Flatpak, using the Spawn D-Bus interface (indirectly).

This should all land in GNOME 40, though it should already be possible to integrate it into your Flatpaks. Make sure to use the latest gnome-desktop development version, and that the flatpak-spawn utility is new enough in the runtime you're targeting (it's been updated in the freedesktop.org runtimes #1, #2, #3, but it takes time to trickle down to GNOME versions). Example JSON snippets:

{
    "name": "flatpak-xdg-utils",
    "buildsystem": "meson",
    "sources": [
        {
            "type": "git",
            "url": "https://github.com/flatpak/flatpak-xdg-utils.git",
            "tag": "1.0.4"
        }
    ]
},
{
    "name": "gnome-desktop",
    "buildsystem": "meson",
    "config-opts": ["-Ddebug_tools=true", "-Dudev=disabled"],
    "sources": [
        {
            "type": "git",
            "url": "https://gitlab.gnome.org/GNOME/gnome-desktop.git"
        }
    ]
}

(We also sped up GStreamer-based thumbnailers by allowing them to use a cache, and added profiling information to the thumbnail test tools, which could prove useful if you want to investigate performance or bugs in that area)

Edit: corrected a link; thanks to the commenters for the notice

by Bastien Nocera (noreply@blogger.com) at October 28, 2020 11:20 AM

October 26, 2020

GStreamer: GStreamer 1.18.1 stable bug fix release

(GStreamer)

The GStreamer team is pleased to announce the first bug fix release in the stable 1.18 release series of your favourite cross-platform multimedia framework!

This release contains important security fixes. We suggest you upgrade at your earliest convenience.

This release only contains bugfixes and it should be safe to update from 1.18.0.

See the GStreamer 1.18.1 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Download tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-devtools, gstreamer-vaapi, gstreamer-sharp, gst-omx, or gstreamer-docs.

October 26, 2020 11:00 AM

October 21, 2020

GStreamer: GStreamer 1.16.3 old-stable bug fix release

(GStreamer)

The GStreamer team is pleased to announce the third and likely last bug fix release in the stable 1.16 release series of your favourite cross-platform multimedia framework!

This release contains important security fixes. We suggest you upgrade at your earliest convenience, either to 1.16.3 or 1.18.

This release only contains bugfixes and it should be safe to update from 1.16.x.

See the GStreamer 1.16.3 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Download tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-vaapi, or gst-omx.

October 21, 2020 08:00 PM

October 15, 2020

Andy Wingo: on "binary security of webassembly"

(Andy Wingo)

Greets!

You may have seen an interesting paper cross your radar a couple months ago: Everything Old is New Again: Binary Security of WebAssembly, by Daniel Lehmann, Johannes Kinder and Michael Pradel. The paper makes some strong claims and I would like to share some thoughts on it.

reader-response theory

For context, I have been working on web browsers for the last 8 years or so, most recently on the JavaScript and WebAssembly engine in Firefox. My work mostly consists of implementing new features, which if you are familiar with software development translates as "writing bugs". Almost all of those bugs are security bugs, potentially causing Firefox to go from being an agent of the user to an agent of the Mossad, or of cryptocurrency thieves, or anything else.

Mitigating browser bug flow takes a siege mentality. Web browsers treat all web pages and their corresponding CSS, media, JavaScript, and WebAssembly as hostile. We try to reason about global security properties, and translate those properties into invariants ensured at compile-time and run-time, for example to ensure that a web page from site A can't access cookies from site B.

In this regard, WebAssembly has some of the strongest isolation invariants in the whole platform. A WebAssembly module has access to nothing, by default: neither functionality nor data. Even a module's memory is isolated from the rest of the browser, both by construction (that's just how WebAssembly is specified) and by run-time measures (given that pointers are 32 bits in today's WebAssembly, we generally reserve a multi-gigabyte region for a module's memory that can contain nothing else).

All of this may seem obvious, but consider that a C++ program compiled to native code on a POSIX platform can use essentially everything that the person running it has access to: your SSH secrets, your email, all of your programs, and so on. That same program compiled to WebAssembly does not -- any capability it has must have been given to it by the person running the program. For POSIX-like programs, the WebAssembly community is working on a POSIX for the web that standardizes a limited-capability access to data and functionality from the world, and in web browsers, well of course the module has access only to the capabilities that the embedding web page gives to it. Mostly, as the JS run-time accompanying the WebAssembly is usually generated by emscripten, this set of capabilities is a function of the program itself.

Of course, complex WebAssembly systems may contain multiple agents, acting on behalf of different parties. For example, a module might, through capabilities provided by the host, be able to ask flickr to delete a photo, but might also be able to crawl a URL for photos. Probably in this system, crawling a web page shouldn't be able to "trick" the WebAssembly module into deleting a photo. The C++ program compiled to WebAssembly could have a bug of course, in which case, you get to keep both pieces.

I mention all of this because we who work on WebAssembly are proud of this work! It is a pleasure to design and build a platform for high-performance code that provides robust capabilities-based security properties.

the new criticism

Therefore it was with skepticism that I started reading the Lehmann et al paper. The paper focusses on WebAssembly itself, not any particular implementation thereof; what could be wrong about WebAssembly?

I found the answer to be quite nuanced. To me, the paper shows three interesting things:

  1. Memory-safety bugs in C/C++ programs when compiled to WebAssembly can cause control-flow edges that were not present in the source program.

  2. Unexpected control-flow in a web browser can sometimes end up in a call to eval with the permissions of the web page, which is not good.

  3. It's easier in some ways to exploit bugs in a C/C++ program when compiled to WebAssembly than when compiled natively, because many common mitigations aren't used by the WebAssembly compiler toolchain.

Firstly, let's discuss the control-flow point. Let's say that the program has a bug, and you have made an exploit to overwrite some memory location. What can you do with it? Well, consider indirect calls (call_indirect). This is what a compiler will emit for a vtable method call, or for a call to a function pointer. The possible targets for the indirect call are stored in a table, which is a side array of all possible call_indirect targets. The actual target is selected at run-time based on an index; WebAssembly function pointers are just indices into this table.

So if a function loads an index into the indirect call table from memory, and some exploit can change this index, then you can cause a call site to change its callee. Although there is a run-time type check that occurs at the call_indirect site to ensure that the callee is called with the right type, many functions in a module can have compatible types and thus be callable without an error.
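
As a minimal illustration of my own (not an example from the paper), consider how a C function-pointer call lowers to call_indirect. Both callees below share the WebAssembly type (i32, i32) -> i32, so an exploit that overwrites the stored index can swap one for the other and still pass the run-time type check:

typedef int (*binop)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

/* Both add and sub end up in the module's indirect call table; the
   index that selects between them lives in linear memory, where a
   heap overflow can reach it. */
int apply(int which, int a, int b) {
  static binop ops[] = { add, sub };
  return ops[which](a, b); /* compiled to call_indirect */
}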

OK, so that's not great. But what can you do with it? Well it turns out that emscripten will sometimes provide JavaScript's eval to the WebAssembly module. Usually it will be called only with a static string, but anything can happen. If an attacker can redirect a call site to eval instead of one of the possible targets from the source code, you can (e.g.) send the user's cookies to evil.com.

There's a similar vulnerability regarding changing the operand to eval, instead. Strings are represented in linear memory as well, and there's no write protection on them, even if they are read-only data. If your write primitive can change the string being passed to eval, that's also a win for the attacker. More details in the paper.

This observation brings us to the last point, which is that many basic mitigations in (e.g.) POSIX deployments aren't present in WebAssembly. There are no OS-level read-only protections for static data, and the compiler doesn't enforce this either. Also WebAssembly programs have to bundle their own malloc, but the implementations provided by emscripten don't implement the "hardening" techniques. There is no address-space layout randomization, so exploits are deterministic. And so on.

on mitigations

It must be said that for most people working on WebAssembly, security "mitigations" are... unsatisfactory. They aren't necessary for memory-safe programs, and they can't prevent memory-unsafe programs from having unexpected behavior. Besides, we who work on WebAssembly are more focussed on the security properties of the WebAssembly program as embedded in its environment, but not on the program itself. Garbage in, garbage out, right?

In that regard, I think that one answer to this paper is just "don't". Don't ship memory-unsafe programs, or if you do, don't give them eval capabilities. No general mitigation will make these programs safe. Writing your program in e.g. safe Rust is a comprehensive fix to this class of bug.

But, we have to admit also that shipping programs written in C and C++ is a primary goal of WebAssembly, and that no matter how hard we try, some buggy programs will get shipped, and therefore that there is marginal value to including mitigations like read-only data or even address space randomization. We definitely need to work on getting control-flow integrity protections working well with the WebAssembly toolchain, probably via multi-table support (part of the reference types extension; my colleague Paulo Matos just landed a patch in this area). And certainly Emscripten should work towards minimizing the capabilities set exposed to WebAssembly by the generated JavaScript, notably by compiling away uses of eval by embind.

Finally, I think that many of the problems identified by this paper will be comprehensively fixed in a different way by more "managed" languages. The problem is that C/C++ pointers are capabilities into all of undifferentiated linear memory. By contrast, handles to GC-managed objects are unforgeable: given object A, you can't get to object B except if object A references B. It would be great if we could bring some of the benefits of this more capability-based approach to in-memory objects to languages like C and C++; more on that in a future note, I think.

chapeau

In the end, despite my initial orneriness, I have to admit that the paper authors point out some interesting areas to work on. It's clear that there's more work to do. I was also relieved to find that my code is not at fault in this particular instance :) Onwards and upwards, and until next time, happy hacking!

by Andy Wingo at October 15, 2020 10:29 AM

October 13, 2020

Nirbheek Chauhan: Building GStreamer on Windows the Correct Way

For the past 4 years, Tim and I have spent thousands of hours on better Windows support for GStreamer. This work started in May 2016, when I first wrote about it, and continued with the first draft of the work, which was later revised, updated, and upstreamed.

Since then, we've worked tirelessly to improve Windows support in GStreamer with patches to many projects such as the Meson build system and GStreamer's Cerbero meta-build system, and by writing build files for several non-GStreamer projects such as x264, openh264, ffmpeg, zlib, bzip2, libffi, glib, fontconfig, freetype, fribidi, harfbuzz, cairo, pango, gtk, libsrtp, opus, and many more that I've forgotten.

More recently, Seungha has also been working on new GStreamer elements for Windows such as d3d11, mediafoundation, wasapi2, etc. Sometimes we're able to find someone to sponsor all this work, but most of the time it's on our own dime.

Most of this has been happening in the background; noticed only by people who follow GStreamer development. I think more people should know about the work that's been happening upstream, and the official and supported ways to build GStreamer on Windows. Searching for this on Google can be a very confusing experience with the top results being outdated links or just plain clickbait.

So here's an overview of your options when you want to use GStreamer on Windows:

Installing GStreamer on Windows

 
GStreamer has released MinGW binary installers for Windows since the early 1.0 days, using the Cerbero meta-build system, which was created by Andoni for the non-upstream "GStreamer SDK" project (based on GStreamer 0.10).
 
Today it supports building GStreamer with both MinGW and Visual Studio, and even supports outputting UWP packages. You can download all of those from the download page.


This is the easiest way to get started with GStreamer on Windows.
 

Building GStreamer yourself for Deployment

 
If you need to build GStreamer with a custom configuration for deployment, the easiest option is to use Cerbero, which is a meta-build system. It will download all the dependencies for you (including most of the build-tools), build them with Autotools, CMake, or Meson (as appropriate), and output a neat little MSI installer.
 
The README contains all the information you need, including screenshots for how to set things up.
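
In a nutshell, a typical bootstrap-and-package run from a Cerbero checkout looks something like this (assuming the commands have not changed since the time of writing; the README remains the authoritative reference):

./cerbero-uninstalled bootstrap
./cerbero-uninstalled package gstreamer-1.0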


As of a few days ago, after months of work the native Cerbero Windows builds have also been integrated into our Continuous Integration pipeline that runs on every merge request, which further improves the quality of our Windows support. We already had native Windows CI using gst-build, but this increases our coverage.

Contributing to GStreamer on Windows

 
If you want to contribute to GStreamer from Windows, the best option is to use gst-build (created by Thibault), which is basically a Meson 'wrapper' project that aggregates all the GStreamer repositories as subprojects. Once again, the README file is pretty easy to follow and has screenshots for how to set things up.
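
For reference, the day-to-day workflow described there is the standard Meson one (paths and options may vary; the README is authoritative):

meson builddir
ninja -C builddir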


This is also the method used by all GStreamer developers to hack on gstreamer on all platforms, so it should work pretty well out of the box, and it's tested on the CI. If it doesn't work, come poke us on #gstreamer on FreeNode IRC or on the gstreamer mailing list.
 

It's All Upstream.

 
You don't need any special steps, and you don't need to read complicated blog posts to build GStreamer on Windows. Everything is upstream.

This post previously contained examples of such articles and posts that are spreading misinformation, but I have removed those paragraphs after discussion with the people who were responsible for them, and to keep this post simple. All I can hope is that it doesn't happen again.

by Nirbheek (noreply@blogger.com) at October 13, 2020 05:16 PM

Andy Wingo: malloc as a service

(Andy Wingo)

Greetings, internet! Today I have the silliest of demos for you: malloc-as-a-service.

[Interactive demo: an input box here loads walloc in your browser and lets you call its bindings. If JavaScript is disabled, see the walloc web page for more information.]

The above input box, if things managed to work, loads up a simple bare-bones malloc implementation, and exposes "malloc" and "free" bindings. But the neat thing is that it's built without emscripten: it's a standalone C file that compiles directly to WebAssembly, with no JavaScript run-time at all. I share it here because it might come in handy to people working on WebAssembly toolchains, and also because it was an amusing experience to build.

wat?

The name of the allocator is "walloc", in which the w is for WebAssembly.

Walloc was designed with the following priorities, in order:

  1. Standalone. No stdlib needed; no emscripten. Can be included in a project without pulling in anything else.

  2. Reasonable allocation speed and fragmentation/overhead.

  3. Small size, to minimize download time.

  4. Standard interface: a drop-in replacement for malloc.

  5. Single-threaded (currently, anyway).

Emscripten includes a couple of good malloc implementations (dlmalloc and emmalloc) which probably you should use instead. But if you are really looking for a bare-bones malloc, walloc is fine.

You can check out all the details at the walloc project page; a selection of salient bits are below.

Firstly, to build walloc, it's just a straight-up compile:

clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o walloc.o walloc.c

The resulting walloc.o is a conforming WebAssembly file on its own, but which also contains additional symbol table and relocation sections which allow wasm-ld to combine separate compilation units into a single final WebAssembly file. walloc.c on its own doesn't import or export anything, in the WebAssembly sense; to make bindings visible to JS, you need to add a little wrapper:

typedef __SIZE_TYPE__ size_t;

#define WASM_EXPORT(name) \
  __attribute__((export_name(#name))) \
  name

// Declare these as coming from walloc.c.
void *malloc(size_t size);
void free(void *p);
                          
void* WASM_EXPORT(walloc)(size_t size) {
  return malloc(size);
}

void WASM_EXPORT(wfree)(void* ptr) {
  free(ptr);
}

If you compile that to exports.o and link via wasm-ld --no-entry --import-memory -o walloc.wasm exports.o walloc.o, you end up with the walloc.wasm used in the demo above. See your inspector for the URL.

The resulting wasm file is about 2 kB (uncompressed).

Walloc isn't the smallest allocator out there. A simple bump-pointer allocator that never frees is the fastest thing you can have. There is also an alternate allocator for Rust, wee_alloc, which is said to be smaller than walloc, though I think it is less space-efficient for small objects. But still, walloc is pretty small.

implementation notes

When a C program is compiled to WebAssembly, the resulting wasm module (usually) has associated linear memory. It can be linked in a way that the memory is created by the module when it's instantiated, or such that the module is given a memory by its host. The above example passed --import-memory to the linker, allowing the host to bound memory usage for the module instance.

The linear memory has the usual data, stack, and heap segments. The data and stack are placed first. The heap starts at the &__heap_base symbol. (This symbol is computed and defined by the linker.) All bytes above &__heap_base can be used by the wasm program as it likes. So &__heap_base is the lower bound of memory managed by walloc.

                                              memory growth ->
+----------------+-----------+-------------+-------------+----
| data and stack | alignment | walloc page | walloc page | ...
+----------------+-----------+-------------+-------------+----
^ 0              ^ &__heap_base            ^ 64 kB aligned

Interestingly, there are a few different orderings of data and stack used by different toolchains. It used to even be the case that the stack grew up. This diagram from the recent "Everything Old is New Again: Binary Security of WebAssembly" paper by Lehmann et al is a good summary:

The sensible thing to prevent accidental overflow (underflow, really) is to have the stack grow down to 0, with data at higher addresses. But this can cause WebAssembly code that references data to take up more bytes, because addresses are written using variable-length "LEB" encodings that favor short offsets, so it isn't the default, right now at least.

Anyway! The upper bound of memory managed by walloc is the total size of the memory, which is aligned on 64-kilobyte boundaries. (WebAssembly ensures this alignment.) Walloc manages memory in 64 kB pages as well. It starts with whatever memory is initially given to the module, and will expand the memory if it runs out. The host can specify a maximum memory size, in pages; if no more pages are available, walloc's malloc will simply return NULL; handling out-of-memory is up to the caller.
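
For the curious, the growth path can be sketched in standalone C using clang's wasm builtins. This is an illustrative sketch, not walloc's actual code:

typedef __SIZE_TYPE__ size_t;

extern char __heap_base; /* defined by wasm-ld; the lower bound discussed above */

#define WASM_PAGE_SIZE 65536

/* Grow the module's memory by `pages` 64 kB pages. Returns the start
   of the newly added region, or NULL when the host refuses to grow;
   malloc then propagates that NULL to the caller. */
static void *grow_heap(size_t pages) {
  size_t prev_pages = __builtin_wasm_memory_grow(0, pages);
  if (prev_pages == (size_t)-1)
    return 0;
  return (void *)(prev_pages * WASM_PAGE_SIZE);
}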

Walloc has two allocation strategies: small and large objects.

big bois

A large object is more than 256 bytes.

There is a global freelist of available large objects, each of which has a header indicating its size. When allocating, walloc does a best-fit search through that list.

struct large_object {
  struct large_object *next;
  size_t size;
  char payload[0];
};
struct large_object* large_object_free_list;
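
A best-fit walk over that freelist might look like the following sketch (illustrative only; the real allocator also splits the winner and updates page headers):

/* Find the smallest free object that can hold `size` bytes; returns
   NULL (0) when nothing fits and the heap must be grown instead. */
static struct large_object *find_best_fit(size_t size) {
  struct large_object *best = 0;
  for (struct large_object *o = large_object_free_list; o; o = o->next)
    if (o->size >= size && (!best || o->size < best->size))
      best = o;
  return best;
}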

Large object allocations are rounded up to 256-byte boundaries, including the header.

If there is no object on the freelist that can satisfy an allocation, walloc will expand the heap by the size of the allocation, or by half of the current walloc heap size, whichever is larger. The resulting page or pages form a large object that can satisfy the allocation.

If the best object on the freelist has more than a chunk of space on the end, it is split, and the tail put back on the freelist. A chunk is 256 bytes.

+-------------+---------+---------+-----+-----------+
| page header | chunk 1 | chunk 2 | ... | chunk 255 |
+-------------+---------+---------+-----+-----------+
^ +0          ^ +256    ^ +512                      ^ +64 kB

As each page is 65536 bytes, and each chunk is 256 bytes, there are therefore 256 chunks in a page. The first chunk in a page that begins an allocated object, large or small, contains a header chunk. The page header has a byte for each of the 256 chunks in the page. The byte is 255 if the corresponding chunk starts a large object; otherwise the byte indicates the size class for packed small-object allocations (see below).

+-------------+---------+---------+----------+-----------+
| page header | large object 1    | large object 2 ...   |
+-------------+---------+---------+----------+-----------+
^ +0          ^ +256    ^ +512                           ^ +64 kB

When splitting large objects, we avoid starting a new large object on a page header chunk. A large object can only span where a page header chunk would be if it includes the entire page.

Freeing a large object pushes it on the global freelist. We know a pointer is a large object by looking at the page header. We know the size of the allocation, because the large object header precedes the allocation. When the next large object allocation happens after a free, the freelist will be compacted by merging adjacent large objects.

small fry

Small objects are allocated from segregated freelists. The granule size is 8 bytes. Small object allocations are packed in a chunk of uniform allocation size. There are size classes for allocations of each size from 1 to 6 granules, then 8, 10, 16, and 32 granules; 10 sizes in all. For example, an allocation of 12 granules will be satisfied from a 16-granule chunk. Each size class has its own free list.

struct small_object_freelist {
  struct small_object_freelist *next;
};
struct small_object_freelist small_object_freelists[10];
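
To make the mapping concrete, here is one way the byte-count-to-size-class computation could be written; the class table follows the sizes listed above, but this is an illustrative sketch rather than walloc's actual code:

#define GRANULE_SIZE 8

static const unsigned char granules_per_class[10] =
  { 1, 2, 3, 4, 5, 6, 8, 10, 16, 32 };

/* Map a request in bytes to the smallest size class that fits, or -1
   when it needs more than 32 granules and is thus a large object. */
static int size_class_for(size_t bytes) {
  size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
  for (int i = 0; i < 10; i++)
    if (granules <= granules_per_class[i])
      return i;
  return -1;
}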

When allocating, if there is nothing on the corresponding freelist, walloc will allocate a new large object, then change its chunk kind in the page header to the size class. It then goes through the fresh chunk, threading the objects through each other onto a free list.

+-------------+---------+---------+------------+---------------------+
| page header | large object 1    | granules=4 | large object 2' ... |
+-------------+---------+---------+------------+---------------------+
^ +0          ^ +256    ^ +512    ^ +768       ^ +1024               ^ +64 kB

In this example, we imagine that the 4-granules freelist was empty, and that the large object freelist contained only large object 2, running all the way to the end of the page. We allocated a new 4-granules chunk, splitting the first chunk off the large object, and pushing the newly trimmed large object back onto the large object freelist, updating the page header appropriately. We then thread the 4-granules (32-byte) allocations in the fresh chunk together (the chunk has room for 8 of them), treating them as if they were instances of struct small_object_freelist, pushing them onto the global freelist for 4-granules allocations.

           in fresh chunk, next link for object N points to object N+1
                                 /--------\                     
                                 |        |
            +------------------+-^--------v-----+----------+
granules=4: | (padding, maybe) | object 0 | ... | object 7 |
            +------------------+----------+-----+----------+
                               ^ 4-granule freelist now points here 
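
In code, that threading step could look roughly like this (again an illustrative sketch; it ignores the possible padding at the start of the chunk):

/* Push every object of a fresh 256-byte chunk onto the freelist of
   size class `klass`, whose objects are `object_size` bytes each.
   Iterating backwards makes object N link to object N+1, matching
   the diagram above. */
static void thread_chunk(char *chunk, size_t object_size, int klass) {
  size_t count = 256 / object_size;
  for (size_t i = count; i > 0; i--) {
    struct small_object_freelist *obj =
      (struct small_object_freelist *)(chunk + (i - 1) * object_size);
    obj->next = small_object_freelists[klass].next;
    small_object_freelists[klass].next = obj;
  }
}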

The size classes were chosen so that any wasted space (padding) is less than the size class.

Freeing a small object pushes it back on its size class's free list. Given a pointer, we know its size class by looking in the chunk kind in the page header.

and that's it

Hey have fun with the thing! Let me know if you find it useful. Happy hacking and until next time!

by Andy Wingo at October 13, 2020 01:34 PM

October 11, 2020

Jean-François Fortin Tam: Introducing Atypica

A while ago, I envisioned building a new* professional video production collective for commercial and non-commercial projects, both as a “creative outlet” for one of my long-standing passions, and as a way to build a specialized service offering that can act as a bridge between my own Montreal-based marketing agency and other collaborators or artists and freelancers.

For various reasons—including my brain getting monopolized by a handful of demanding contracts, and encountering a lot of “new and interesting” challenges in life—I had to shelve that project until I could give it proper attention.

As things quieted down over the past year or so, and after I finished an epic yak shaving journey (including but not limited to “bringing Getting Things GNOME back to life with a new release“), I recently dusted off my old notes and plans to present my vision for this particular creative offering.

The result is the Atypica video production collective.

I spun this into a standalone website and name because I didn't necessarily want to make it all about idéemarque, and I didn't want to overload idéemarque's own offering anyway; video production is a large enough topic, and it involves a different enough set of people in general, that it warranted its own website to adequately explain.

Whenever I get the opportunity to add some more of our video productions that meet or exceed this level of quality in the future, I will; and if anyone you know wants to hire a great video producer and editor who brings a human touch to any topic, now you know where to find my specific offering in that space.

Next, I will be reworking idéemarque’s offering and website. That’s a story for another day.


P.s.: a note on the FLOSS community's sometimes surprising level of expectations—upon publication of this blog post, someone messaged me to make the case, in very strongly-worded terms, that my video work is not worth presenting; that I "should really remove" my portfolio, and "maybe stick to programming or do product management", implying I have no idea how to make a worthwhile video, as evidenced by <insert list of nitpicks about some lighting or motion imperfections found in my 2019 GStreamer conference video>. Never mind that I had to fix the colors and motion of hundreds of clips in post-production to overcome some technical limitations I encountered on-site, and made thousands of (invisible) cuts to fix the interviewees' sentences, because sound quality and story flow were more important to me than Oscar-winning visuals…
So, while I appreciate (constructive) feedback, if any photography connoisseur is watching that video and feels compelled to rectify my shortcomings, please keep the following in mind, rather than assuming a lack of knowledge or competence: picture the constraints of a jet-lagged run-and-gunner who couldn't bring even a third of his equipment (let alone any crew to help) on site, and simultaneously had to: handle all technical aspects; chase down attendees all day to encourage them to overcome their shyness and speak to the camera in-between talks; wrestle for physical space throughout the day in less-than-optimal shooting conditions; record over 150 B-roll scenes to complement the interviews in a diversified way to truly convey the atmosphere of the event; and try to accomplish all of that in only a few hours. It might not be the most perfect video in the eyes of a demanding DP, but I do think I've done a fairly good job in the circumstances.

Luckily I have really thick skin (as I've been involved in free & open-source software for over 15 years), but not many "real world" marketeers or PMs are as understanding as me. Benevolent marketeers are already rare in this segment of the industry due to cultural and economic factors; if they receive this kind of dismissive criticism when publishing creative work for FLOSS projects, it is no wonder they are typically nowhere to be found in our community. In the business world, I've had CEOs express wild enthusiasm over much, much less technically-refined videos I produced before these—I guess I'm glad they have no taste! ;)


*: I used to run a video production collective decades ago, but none of the material produced back then would meet my quality standards today (whether from a technological or “technique” standpoint), and the team that was part of the collective back then almost all became bankers (they also gave up on their award-winning jazz band. True story.)

by Jeff at October 11, 2020 06:12 PM

October 09, 2020

Jan Schmidt: Rift CV1 – multi-threaded tracking

(Jan Schmidt)

This video shows the completion of work to split the tracking code into 3 threads – video capture, fast analysis and long analysis.

If the projected pose of an object doesn’t line up with the LEDs where we expect it to be, the frame is sent off for more expensive analysis in another thread. That way, it doesn’t block tracking of other objects – the fast analysis thread can continue with the next frame.

As a new blob is detected in a video frame, it is assigned an ID, and tracked between frames using motion flow. When the analysis results are available at some point in the future, the ID lets us find blobs that still exist in that most recent video frame. If the blobs are still unknowns in the new frame, the code labels them with the LED ID it found – and then hopefully in the next frame, the fast analysis is locked onto the object again.

There are some obvious next things to work on:

  • It’s hard to decide what constitutes a ‘good’ pose match, especially around partially visible LEDs at the edges. More experimentation and refinement needed
  • The IMU dead-reckoning between frames is bad – the accelerometer biases, especially for the controllers, tend to make them zoom off very quickly and lose tracking. More filtering, bias extraction and investigation should improve that, and help with staying locked onto fast-moving objects.
  • The code that decides whether an LED is expected to be visible in a given pose can use some improving.
  • Often the orientation of a device is good, but the position is wrong – a matching mode that only searches for translational matches could be good.
  • Taking the gravity vector of the device into account can help reject invalid poses, as could some tests against plausible location based on human movements limits.

Code is at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

by thaytan at October 09, 2020 04:56 AM

October 07, 2020

Sebastian Pölsterl: scikit-survival 0.14 with Improved Documentation Released

Today marks the release of version 0.14.0 of scikit-survival. The biggest change in this release is actually not in the code, but in the documentation. This release features a complete overhaul of the documentation. Most importantly, the documentation has a more modern feel to it, thanks to the visually pleasing pydata Sphinx theme, which also powers pandas.

Moreover, the documentation now contains a User Guide section that bundles several topics surrounding the use of scikit-survival. Some of these were available as separate Jupyter notebooks previously, such as the guide on Evaluating Survival Models. There are two new guides: the first one is on penalized Cox models; it provides a hands-on introduction to Cox’s proportional hazards model with $\ell_2$ (Ridge) and $\ell_1$ (LASSO) penalty. The second guide is on Gradient Boosted Models and covers how gradient boosting can be used to obtain a non-linear proportional hazards model or a non-linear accelerated failure time model by using regression tree base learners. The second part of this guide covers a variant of gradient boosting that is most suitable for high-dimensional data and is based on component-wise least squares base learners.

To make it easier to get started, all notebooks can now be run in a Jupyter notebook, right from your browser, just by clicking on

In addition to the vastly improved documentation, this release includes important bug fixes. It fixes several bugs in CoxnetSurvivalAnalysis, where predict, predict_survival_function, and predict_cumulative_hazard_function returned wrong values if features of the training data were not centered. Moreover, the score function of ComponentwiseGradientBoostingSurvivalAnalysis and GradientBoostingSurvivalAnalysis will now correctly compute the concordance index if loss='ipcwls' or loss='squared'.

For a full list of changes in scikit-survival 0.14.0, please see the release notes.

Pre-built conda packages are available for Linux, macOS, and Windows via

 conda install -c sebp scikit-survival

Alternatively, scikit-survival can be installed from source following these instructions.

October 07, 2020 07:30 PM

October 05, 2020

Jan Schmidt: Rift CV1 update

(Jan Schmidt)

This is another in my series of updates on developing positional tracking for the Oculus Rift CV1 in OpenHMD.

In the last post I ended with a TODO list. Since then I’ve crossed off a few things from that, and fixed a handful of very important bugs that were messing things up. I took last week off work, which gave me some extra hacking hours and enthusiasm too, and really helped push things forward.

Here’s the updated list:

  • The full model search for re-acquiring lock when we start, or when we lose tracking takes a long time. More work will mean avoiding that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed. Partially Fixed
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting. Partially Fixed
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test. Much Improved
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware information but ignored right now. Fixed
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well the pose of devices is extracted from the computer vision and how quickly, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • Hot-plug detection of cameras and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.

As you can see, a few of the top-level items have been fixed, or mostly so. I split the computer vision for the tracking into several threads:

  • 1 thread shared between all sensors to capture USB packets and assemble them into frames
  • 1 thread per sensor to analyse the frame and update poses

The goal with that initial split was to prevent the processing of multiple sensors from interfering with each other, but I found that it also has a strong benefit even with a single sensor. I realised something in the last week that I probably should have noted earlier: the Rift sensors capture a video frame every 19.2ms, but that frame then takes a full 17ms to deliver across the USB – this means that when everything was in one thread, even with 1 sensor there were only about 2.2ms for the full analysis to take place, or else we’d miss a packet of the next frame and have to throw it away. With the analysis now happening in a separate thread and a ping-pong double buffer in place, the analysis can take quite a bit longer without losing any video frames.

I plan to add a 2nd per-sensor thread that will divide the analysis further. The current thread will do only fast pass validation of any existing tracking lock, and will defer any longer term analysis to the other thread. That means that if we have a good lock on the HMD, but can’t (for example) find one of the controllers, searching for the controller will be deferred and the fast pass thread will move onto the next frame and keep tracking lock on the headset.

I fixed some bugs in the calculations that move between frames of reference – converting to/from the global position and orientation in the world to the position and orientation relative to each camera sensor when predicting what the appearance of the LEDs should be. I also added in the IMU offset and orientation of the LED models from the firmware, to make the predictions more accurate when devices move in the time between camera exposures.

Yaw Correction: when a device is observed by a sensor, the orientation is incorporated into what the IMU is measuring. The IMU can sense gravity and knows which way is up or down, but not which way is forward. The observation from the camera now corrects for that yaw drift, to keep things pointing the way you expect them to.

Some other bits:

  • Fixing numerical overflow issues in the OpenHMD maths routines
  • Capturing the IMU orientation and prediction that most closely corresponds to the moment each camera image is recorded, instead of when the camera image finishes transferring to the PC (which is 17ms later)
  • Improving the annotated debug view, to help understand what’s happening in the tracking computer vision steps
  • A 1st order estimate of device velocity to help improve the next predicted position

I posted a longer form video walkthrough of the current code in action, and discussing some of the remaining shortcomings.

As previously, the code is available at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

by thaytan at October 05, 2020 03:20 PM

September 29, 2020

Sebastian DrögePorting EBU R128 audio loudness analysis from C to Rust – Porting Details

(Sebastian Dröge)

This blog post is part two of a four part series

  1. Overview, summary and motivation
  2. Porting approach with various details, examples and problems I ran into along the way
  3. Performance optimizations
  4. Building Rust code into a C library as drop-in replacement

In this part I’ll go through the actual porting process of the libebur128 C code to Rust, the approach I’ve chosen with various examples and a few problems I was running into.

It will be rather technical. I won’t explain details about how the C code works but will only focus on the aspects that are relevant for porting to Rust, otherwise this blog post would become even longer than it already is.

Porting

With the warnings out of the way, let’s get started. As a reminder, the code can be found on GitHub and you can also follow along the actual chronological porting process by going through the git history there. It’s not very different to what will follow here but just in case you prefer looking at diffs instead.

Approach

The approach I’ve taken is basically the same one that Federico took for librsvg or Joe Neeman took for nnnoiseless:

  1. Start with the C code and safe Rust bindings around the C API
  2. Look for a function or component very low in the call graph without dependencies on (too much) other C code
  3. Rewrite that code in Rust and add an internal C API for it
  4. Call the new internal C API for that new Rust code from the C code and get rid of the C implementation of the component
  5. Make sure the tests are still passing
  6. Go to 2. and repeat

Compared to what I did when porting the FFmpeg loudness normalization filter this has the advantage that at every step there is a working version of the code and you don’t only notice at the very end that somewhere along the way you made a mistake. At each step you can validate that what you did was correct and the amount of code to debug if something went wrong is limited.

Thanks to Rust having a good FFI story for interoperability with C in either direction, writing the parts of the code that are called from C or calling into C is not that much of a headache and not worse than actually writing C.

Rust Bindings around C Library

This step could’ve been skipped if all I cared about was having a C API for the ported code later, or if I wanted to work with the tests of the C library for validation and worry about calling it from Rust at a later point. In this case I had already done safe Rust bindings around the C library before, and having a Rust API made it much easier to write tests that could be used during the porting and that could be automatically run at each step.

bindgen

As a first step for creating the Rust bindings there needs to be a way to actually call into the C code. In C there are the header files with the type definitions and function declarations, but Rust can’t directly work from those. The solution to this was in this case bindgen, which basically converts the C header files into something that Rust can understand. The resulting API is completely unsafe still but can be used in a next step to write safe Rust bindings around it.

I would recommend using bindgen for any non-trivial C API for which there is no better translation tool available, or for which there is no machine-readable description of the API that could be used instead by another tool. Parsing C headers is no fun and there is very little information available in C for generating safe bindings. For GObject-based libraries, for example, using gir would be a better idea as it works from a rich XML description of the API that contains information about e.g. ownership transfer and allows autogenerating safe Rust bindings in many cases.

Also the dependency on clang makes it hard to run bindgen as part of every build, so instead I’ve made sure that the code generated by bindgen is platform independent and included it inside the repository. If you use bindgen, please try to do the same. Requiring clang for building your crate makes everything more complicated for your users, especially if they’re unfortunate enough to use Windows.
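For illustration, such a one-off generator using bindgen’s Rust API could look roughly like this (a sketch with hypothetical paths, not the exact setup used in the repository):

// One-off generator: run manually and commit the resulting ffi.rs
// instead of invoking bindgen from build.rs on every build.
// The header and output paths here are hypothetical.
fn main() {
    let bindings = bindgen::Builder::default()
        .header("src/c/ebur128.h")
        .generate()
        .expect("failed to generate bindings");

    bindings
        .write_to_file("src/ffi.rs")
        .expect("failed to write bindings");
}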

But back to the topic. What bindgen generates is basically a translation of the C header into Rust: type definitions and function declarations. This looks for example as follows

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct ebur128_state {
    pub mode: ::std::os::raw::c_int,
    pub channels: ::std::os::raw::c_uint,
    pub samplerate: ::std::os::raw::c_ulong,
    pub d: *mut ebur128_state_internal,
}

extern "C" {
    pub fn ebur128_init(
        channels: ::std::os::raw::c_uint,
        samplerate: ::std::os::raw::c_ulong,
        mode: ::std::os::raw::c_int,
    ) -> *mut ebur128_state;

    pub fn ebur128_destroy(st: *mut *mut ebur128_state);

    pub fn ebur128_add_frames_int(
        st: *mut ebur128_state,
        src: *const ::std::os::raw::c_int,
        frames: usize,
    ) -> ::std::os::raw::c_int;
}

Based on this it is possible to call the C functions directly from unsafe Rust code and access the members of all the structs. It requires working with raw pointers and ensuring that everything is done correctly at any point to not cause memory corruption or worse. It’s just like using the API from C with a slightly different syntax.

Build System

To be able to call into the C API its implementation somehow has to be linked into your crate. As the C code later also has to be modified to call into the already ported Rust functions instead of the original C code, it makes most sense to build it as part of the crate instead of linking to an external version of it.

This can be done with the cc crate. It is called from cargo's build.rs, where it is configured with, for example, which C files to compile and how. Once done it is possible to call any exported C function from the Rust code. The build.rs is not really complicated in this case

fn main() {
    cc::Build::new()
        .file("src/c/ebur128.c")
        .compile("ebur128");
}
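Note that for this the cc crate goes into the [build-dependencies] section of Cargo.toml rather than [dependencies], as it is only used by build.rs at build time.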

Safe Rust API

With all that in place a safe Rust API around the unsafe C functions can be written now. How this looks in practice differs from API to API and might require some more thought in case of a more complex API to ensure everything is still safe and sound from a Rust point of view. In this case it was fortunately rather simple.

For example the struct definition, the constructor and the destructor (Drop impl) looks as follows based on what bindgen generated above

pub struct EbuR128(ptr::NonNull<ffi::ebur128_state>);

The struct is a simple wrapper around std::ptr::NonNull, which itself is a zero-cost wrapper around raw pointers that additionally ensures that the stored pointer is never NULL and allows additional optimizations to take place based on that.

In other words: the Rust struct is just a raw pointer but with additional safety guarantees.

impl EbuR128 {
    pub fn new(channels: u32, samplerate: u32, mode: Mode) -> Result<Self, Error> {
        static ONCE: std::sync::Once = std::sync::Once::new();

        ONCE.call_once(|| unsafe { ffi::ebur128_libinit() });

        unsafe {
            let ptr = ffi::ebur128_init(channels, samplerate as _, mode.bits() as i32);
            let ptr = ptr::NonNull::new(ptr).ok_or(Error::NoMem)?;
            Ok(EbuR128(ptr))
        }
    }
}

The constructor is slightly more complicated as it also has to ensure that the one-time initialization function is called exactly once, which requires using std::sync::Once as above.

After that it calls the C constructor with the given parameters. This can return NULL in various cases when not enough memory could be allocated as described in the documentation of the C library. This needs to be handled gracefully here and instead of panicking an error is returned to the caller. ptr::NonNull::new() is returning an Option and if NULL is passed it would return None. If this happens it is transformed into an error together with an early return via the ? operator.

In the end the pointer then only has to be wrapped in the struct and be returned.

impl Drop for EbuR128 {
    fn drop(&mut self) {
        unsafe {
            let mut state = self.0.as_ptr();
            ffi::ebur128_destroy(&mut state);
        }
    }
}

The Drop trait is used for defining what should happen if a value of the struct goes out of scope and what should be done to clean up after it. In this case this means calling the destroy function of the C library. It takes a pointer to a pointer to its state, which is then set to NULL. As such it is necessary to store the raw pointer in a local variable and pass a mutable reference to it. Otherwise the ptr::NonNull would end up with a NULL pointer inside it, which would result in undefined behaviour.

The last function that I want to mention here is the one that takes a slice of audio samples for processing

    pub fn add_frames_i32(&mut self, frames: &[i32]) -> Result<(), Error> {
        unsafe {
            let channels = self.0.as_ref().channels as usize;
            if frames.len() % channels != 0 {
                return Err(Error::NoMem);
            }

            let res = ffi::ebur128_add_frames_int(
                self.0.as_ptr(),
                frames.as_ptr(),
                frames.len() / channels,
            );
            Error::from_ffi(res as ffi::error, || ())
        }
    }

Apart from calling the C function it is again necessary to check various pre-conditions before doing so. The C function will cause out of bounds reads if passed a slice that doesn’t contain a sample for each channel, so this must be checked beforehand or otherwise the caller (safe Rust code!) could cause out of bounds memory accesses.

In the end after calling the function its return value is converted into a Result, converting any errors into the crate’s own Error enum.

As can be seen here, writing safe Rust bindings around the C API requires reading of the documentation of the C code and keeping all the safety guarantees of Rust in mind to ensure that it is impossible to violate those safety guarantees, no matter what the caller passes into the functions.

Not having to read the documentation and still being guaranteed that the code can’t cause memory corruption is one of the big advantages of Rust. That is, assuming there even is documentation, and that it actually mentions such details.

Replacing first function: Resampler

Once the safe Rust API is done, it is possible to write safe Rust code that makes use of the C code. And among other things that allows to write tests to ensure that the ported code still does the same as the previous C code. But this will be the topic of the next section. Writing tests is boring, porting code is more interesting and that’s what I will start with.

To find the first function to port, I first read the C code to get a general idea of how the different pieces fit together and what the overall structure of the code is. Based on this I selected the resampler that is used in the true peak measurement. It is one of the leaf functions of the call-graph, that is, it is not calling into any other C code and is relatively independent from everything else. Unlike many other parts of the code it is also already factored out into separate structs and functions.

In the C code this can be found in the interp_create(), interp_destroy() and interp_process() functions. The resulting Rust code can be found in the interp module, which provides basically the same API in Rust except for the destroy() function which Rust provides for free, and corresponding extern "C" functions that can be called from C.

The create() function is not very interesting from a porting point of view: it simply allocates some memory and initializes it. The C and Rust versions of it look basically the same. The Rust version is only missing the checks for allocation failures as those can’t currently be handled in Rust, and handling allocation failures like in the C code is almost useless with modern operating systems and overcommitting allocators.

Also the struct definition is not too interesting: it is approximately the same as the one in C, just that pointers to arrays are replaced by Vecs and their lengths are taken directly from the Vec instead of being stored separately. In a later version the Vecs were replaced by boxed slices and SmallVec, but that shouldn’t be a concern here for now.

The main interesting part here is the processing function and how to provide all the functions to the C code.

Processing Function: Using Iterators

The processing function, interp_process(), is basically 4 nested for loops over each frame in the input data, each channel in each frame, the interpolator factors and finally the filter coefficients.

unsigned int out_stride = interp->channels * interp->factor;

for (frame = 0; frame < frames; frame++) {
  for (chan = 0; chan < interp->channels; chan++) {
    interp->z[chan][interp->zi] = *in++;
    outp = out + chan;
    for (f = 0; f < interp->factor; f++) {
      acc = 0.0;
      for (t = 0; t < interp->filter[f].count; t++) {
        int i = (int) interp->zi - (int) interp->filter[f].index[t];
        if (i < 0) {
          i += (int) interp->delay;
        }
        c = interp->filter[f].coeff[t];
        acc += (double) interp->z[chan][i] * c;
      }
      *outp = (float) acc;
      outp += interp->channels;
    }
  }
  out += out_stride;
  interp->zi++;
  if (interp->zi == interp->delay) {
    interp->zi = 0;
  }
}

This could be written the same way in Rust, but slice indexing is not very idiomatic and using iterators is preferred as it leads to more declarative code. Also, all the indexing would lead to suboptimal performance due to the required bounds checks. So the main task here is to translate the code to iterators as much as possible.

When looking at the C code, the outer loop iterates over chunks of channels samples from the input and chunks of channels * factor samples from the output. This is exactly what the chunks_exact iterator on slices does in Rust. Similarly the second outer loop iterates over all samples of the input chunks and the two-dimensional array z, which has delay items per channel. On the Rust side I represented z as a flat, one-dimensional array for simplicity, so instead of indexing, the chunks_exact() iterator is used again for iterating over it.

This leads to the following for the two outer loops

for (src, dst) in src
    .chunks_exact(self.channels)
    .zip(dst.chunks_exact_mut(self.channels * self.factor)) {
    for (channel_idx, (src, z)) in src
        .iter()
        .zip(self.z.chunks_exact_mut(delay))
        .enumerate() {
        // insert more code here
    }
}

Apart from making it more clear what the data access patterns are, this is also less error prone and gives more information to the compiler for optimizations.

Inside this loop, a ringbuffer z / zi is used to store the incoming samples for each channel, keeping the last delay samples for further processing. We’ll keep this part as it is for now and use explicit indexing. While a VecDeque, or any similar data structure from other crates, could be used here, it would complicate the code and cause more allocations. I’ll revisit this piece of code in part 3 of this blog post.

The first inner loop now iterates over all filters, of which there are factor many and chunks of size channels from the output, for which the same translation as before is used. The second inner loop iterates over all coefficients of the filter and the z ringbuffer, and sums up the product of both which is then stored in the output.

So overall the body of the second outer loop above with the two inner loops would look as follows

z[self.zi] = *src;

for (filter, dst) in self
    .filter
    .iter()
    .zip(dst.chunks_exact_mut(self.channels)) {
    let mut acc = 0.0;
    for (c, index) in &filter.coeff {
        let mut i = self.zi as i32 - *index as i32;
        if i < 0 {
            i += self.delay as i32;
        }
        acc += z[i as usize] as f64 * c;
    }

    dst[channel_idx] = acc as f32;
}

self.zi += 1;
if self.zi == self.delay {
    self.zi = 0;
}

The full code after porting can be found here.

My general approach for porting C code with loops is what I did above: first try to understand the data access patterns and then find ways to express these with Rust iterators, and only if there is no obvious way use explicit indexing. And if the explicit indexing turns out to be a performance problem due to bounds checks, first try to reorganize the data so that direct iteration is possible (which usually also improves performance due to cache locality) and otherwise use some well-targeted usages of unsafe code. But more about that in part 3 of this blog post in the context of optimizations.
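As a generic illustration of this approach (a toy example, not code from libebur128), here is the same small computation once with C-style explicit indexing and once with iterators:

// Direct translation of a C-style loop: every buf[i] access is bounds-checked.
fn sum_squares_indexed(buf: &[f32]) -> f64 {
    let mut sum = 0.0;
    for i in 0..buf.len() {
        sum += (buf[i] as f64) * (buf[i] as f64);
    }
    sum
}

// Iterator version: no explicit indexing, so no bounds checks are needed
// and the data access pattern is visible at a glance.
fn sum_squares_iter(buf: &[f32]) -> f64 {
    buf.iter().map(|&x| (x as f64) * (x as f64)).sum()
}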

Exporting C Functions

To be able to call the above code from C, a C-compatible function needs to be exported for it. This involves working with unsafe code and raw pointers again as that’s the only thing C understands. The unsafe Rust code needs to assert the implicit assumptions that can’t be expressed in C and that the calling C code has to follow.

For the interpolator, functions with the same API as the previous C code are exported to keep the amount of changes minimal: create(), destroy() and process() functions.

#[no_mangle]
pub unsafe extern "C" fn interp_create(taps: u32, factor: u32, channels: u32) -> *mut Interp {
    Box::into_raw(Box::new(Interp::new(
        taps,
        factor,
        channels,
    )))
}

#[no_mangle]
pub unsafe extern "C" fn interp_destroy(interp: *mut Interp) {
    drop(Box::from_raw(interp));
}

The #[no_mangle] attribute on functions makes sure that the symbols are exported as is instead of being mangled by the compiler. extern "C" makes sure that the function has the same calling convention as a corresponding C function.

For the create() function the interpolator struct is wrapped in a Box, which is the most basic mechanism to do heap allocations in Rust. This is needed because the C code shouldn’t have to know about the layout of the Interp struct and should just handle it as an opaque pointer. Box::into_raw() converts the Box into a raw pointer that can be returned directly to C. This also passes ownership of the memory to C.

The destroy() function is doing the inverse of the above and calls Box::from_raw() to get a Box back from the raw pointer. This requires that the raw pointer passed in was actually allocated via a Box and is of the correct type, which is something that can’t really be checked. The function has to trust the C code to do this correctly. The standard approach to memory safety in C: trust that everybody is writing correct code.

After getting back the Box, it only has to be dropped and Rust will take care of deallocating any memory as needed.

#[no_mangle]
pub unsafe extern "C" fn interp_process(
    interp: *mut Interp,
    frames: usize,
    src: *const f32,
    dst: *mut f32,
) -> usize {
    use std::slice;

    let interp = &mut *interp;
    let src = slice::from_raw_parts(src, interp.channels * frames);
    let dst = slice::from_raw_parts_mut(dst, interp.channels * interp.factor * frames);

    interp.process(src, dst);

    interp.factor * frames
}

The main interesting part for the processing function is the usage of slice::from_raw_parts. From the C side, the function again has to trust that the pointers are correct and actually point to frames audio frames. In Rust a slice knows about its size so some conversion between the two is needed: a slice of the correct size has to be created around the pointer. This does not involve any copying of memory, it only stores the length information together with the raw pointer. This is also the reason why it’s not required anywhere to pass the length separately to the Rust version of the processing function.

With this the interpolator is fully ported and the C functions can be called directly from the C code. On the C side they are declared as follows and then called as before

typedef void * interpolator;

extern interpolator* interp_create(unsigned int taps, unsigned int factor, unsigned int channels);
extern void interp_destroy(interpolator* interp);
extern size_t interp_process(interpolator* interp, size_t frames, float* in, float* out);

The commit that did all this also adds some tests to ensure that everything still works correctly. It also contains some optimizations on top of the code above and is not 100% the same code.

Writing Tests

For testing that porting the code part by part to Rust doesn’t introduce any problems, I went for the common two-layer approach: 1. integration tests that check if the whole thing still works correctly, and 2. unit tests for the rewritten component alone.

The integration tests come in two variants inside the ebur128 module: one variant just testing via assertions that the results on a fixed input are the expected ones, and one variant comparing the C and Rust implementations. The unit tests only come in the second variant for now.

To test the C implementation in the integration tests an old version of the crate that had no ported code yet is pulled in. For comparing the C implementation of the individual components, I extracted the C code into a separate C file that exported the same API as the corresponding Rust code and called both from the tests.

Comparing Floating Point Numbers

The first variant is not very interesting apart from the complications involved when comparing floating point numbers. For this I used the float_eq crate, which provides different ways of comparing floating point numbers.

The first variant of tests uses it by checking if the absolute difference between the expected and actual result is very small, which is less strict than the ULP check used for the second variant of tests. Unfortunately this was required because, depending on the CPU and toolchain used, the results differ from the expected static results (which were generated with one specific CPU and toolchain), while comparing the C and Rust implementations on the same CPU gives basically the same results.

quickcheck

quickcheck is a crate that allows writing randomized property tests. This seemed like the perfect tool for writing tests that compare the two implementations: the property to check for is equality of results, and it should hold for any possible input.

Using quickcheck is simple in principle. You write a test function that takes the inputs as parameters, do the processing and then check that the property you want to test for holds by either using assertions or returning a bool or TestResult.

#[quickcheck]
fn compare_c_impl_i16(signal: Signal<i16>) {
    let mut ebu = EbuR128::new(signal.channels, signal.rate, ebur128::Mode::all()).unwrap();
    ebu.add_frames_i16(&signal.data).unwrap();

    let mut ebu_c =
        ebur128_c::EbuR128::new(signal.channels, signal.rate, ebur128_c::Mode::all()).unwrap();
    ebu_c.add_frames_i16(&signal.data).unwrap();

    compare_results(&ebu, &ebu_c, signal.channels);
}

quickcheck will then generate random values for the function parameters via the Arbitrary impl of the given types and call it many times. If one run fails, it tries to find a minimal testcase that fails based on “shrinking” the initial failure case and then prints that failing, shrunk testcase.

And this is the part of using quickcheck that involves some more effort: writing a reasonable Arbitrary impl for the inputs that can also be shrunk in a useful way on failures.

For the tests here I came up with a Signal type. Its Arbitrary implementation creates an audio signal with 1-16 channels and multiple sine waves of different amplitudes and frequencies. Shrinking first tries to reproduce the problem with a single channel, and then by halving the signal length.
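To sketch what such a shrinking strategy could look like (a simplified illustration, not the actual implementation used in the tests, with the quickcheck integration omitted):

#[derive(Clone, Debug)]
pub struct Signal<T> {
    pub data: Vec<T>,
    pub channels: u32,
    pub rate: u32,
}

impl<T: Copy> Signal<T> {
    // Shrinking candidates: first try to reproduce the failure with only
    // the first channel, then with the signal cut to half its length.
    fn shrink_candidates(&self) -> Vec<Signal<T>> {
        let mut candidates = Vec::new();

        if self.channels > 1 {
            // Keep only the samples of the first channel.
            let data = self
                .data
                .iter()
                .step_by(self.channels as usize)
                .copied()
                .collect();
            candidates.push(Signal { data, channels: 1, rate: self.rate });
        }

        let channels = self.channels as usize;
        if self.data.len() / channels > 1 {
            // Halve the length, keeping only whole frames.
            let frames = self.data.len() / channels / 2;
            let data = self.data[..(frames * channels)].to_vec();
            candidates.push(Signal { data, channels: self.channels, rate: self.rate });
        }

        candidates
    }
}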

This worked well in practice so far. It doesn’t cover all possible inputs, but it should cover the cases that are likely to fail, and the simple shrinking approach also helped to find smallish testcases when something failed. But of course it’s not a perfect solution, only a practical one.

Based on these sets of tests I could be reasonably certain that the C and Rust implementation provide exactly the same results, so I could start porting the next part of the code.

True Peak: Macros vs. Traits

C doesn’t really have any mechanisms for writing code that is generic over different types (other than void *), or any more advanced means of abstraction than functions and structs. For that reason the C code uses macros via the C preprocessor in various places to write code once for the different input types (i16, i32, f32 and f64 or in C terms short, int, float and double). The C preprocessor is just a fancy mechanism for string concatenation so this is rather unpleasant to write and read, results might not even be valid C code and the resulting compiler errors are often rather confusing.

In Rust macros could also be used for this. While cleaner than C macros because of macro hygiene rules and because they work on a typed token tree instead of just strings, this would still end up with hard to write and read code and possibly confusing compiler errors. For abstracting over different types, Rust provides traits. These allow writing code that is generic over different types with a well-defined interface. They allow much more than that, but I won’t cover it here.

One example of macro usage in the C code is the input processing, of which the true peak measurement is one part. In C this basically looks as follows, with some parts left out because they’re not relevant for the true peak measurement itself

static void ebur128_check_true_peak(ebur128_state* st, size_t frames) {
  size_t c, i, frames_out;

  frames_out =
      interp_process(st->d->interp, frames, st->d->resampler_buffer_input,
                     st->d->resampler_buffer_output);

  for (i = 0; i < frames_out; ++i) {
    for (c = 0; c < st->channels; ++c) {
      double val =
          (double) st->d->resampler_buffer_output[i * st->channels + c];

      if (EBUR128_MAX(val, -val) > st->d->prev_true_peak[c]) {
        st->d->prev_true_peak[c] = EBUR128_MAX(val, -val);
      }
    }
  }
}

#define EBUR128_FILTER(type, min_scale, max_scale)                             \
  static void ebur128_filter_##type(ebur128_state* st, const type* src,        \
                                    size_t frames) {                           \
    static double scaling_factor =                                             \
        EBUR128_MAX(-((double) (min_scale)), (double) (max_scale));            \
    double* audio_data = st->d->audio_data + st->d->audio_data_index;          \
                                                                               \
    /* some other code */                                                      \
                                                                               \
    if ((st->mode & EBUR128_MODE_TRUE_PEAK) == EBUR128_MODE_TRUE_PEAK &&       \
        st->d->interp) {                                                       \
      for (i = 0; i < frames; ++i) {                                           \
        for (c = 0; c < st->channels; ++c) {                                   \
          st->d->resampler_buffer_input[i * st->channels + c] =                \
              (float) ((double) src[i * st->channels + c] / scaling_factor);   \
        }                                                                      \
      }                                                                        \
      ebur128_check_true_peak(st, frames);                                     \
    }                                                                          \
                                                                               \
    /* some other code */                                                      \
}

EBUR128_FILTER(short, SHRT_MIN, SHRT_MAX)
EBUR128_FILTER(int, INT_MIN, INT_MAX)
EBUR128_FILTER(float, -1.0f, 1.0f)
EBUR128_FILTER(double, -1.0, 1.0)

What the invocations of the macro at the bottom do is to take the whole macro body and replace the usages of type, min_scale and max_scale accordingly. That is, one ends up with a function ebur128_filter_short() that works on a const short *, uses 32768.0 as scaling_factor, and correspondingly for the 3 other types.

To convert this to Rust, first a trait that provides all the required operations has to be defined and then implemented on the 4 numeric types that are supported as audio input. In this case, the only required operation is to convert the input values to an f32 between -1.0 and +1.0.

pub(crate) trait AsF32: Copy {
    fn as_f32(self) -> f32;
}

impl AsF32 for i16 {
    fn as_f32(self) -> f32 {
        self as f32 / -(i16::MIN as f32)
    }
}

// And the same for i32, f32 and f64
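For completeness, the remaining implementations follow the same pattern and would look roughly like this (a sketch; the actual crate may differ in details):

impl AsF32 for i32 {
    fn as_f32(self) -> f32 {
        // scale by INT_MIN, matching the C macro's scaling_factor
        self as f32 / -(i32::MIN as f32)
    }
}

impl AsF32 for f32 {
    fn as_f32(self) -> f32 {
        // already in the -1.0..1.0 range
        self
    }
}

impl AsF32 for f64 {
    fn as_f32(self) -> f32 {
        self as f32
    }
}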

Once this trait is defined and implemented on the needed types the Rust function can be written generically over the trait

pub(crate) fn check_true_peak<T: AsF32>(&mut self, src: &[T], peaks: &mut [f64]) {
    assert!(src.len() <= self.buffer_input.len());
    assert!(peaks.len() == self.channels);

    for (o, i) in self.buffer_input.iter_mut().zip(src.iter()) {
        *o = i.as_f32();
    }

    self.interp.process(
        &self.buffer_input[..(src.len())],
        &mut self.buffer_output[..(src.len() * self.interp_factor)],
    );

    for (channel_idx, peak) in peaks.iter_mut().enumerate() {
        for o in self.buffer_output[..(src.len() * self.interp_factor)]
            .chunks_exact(self.channels) {
            let v = o[channel_idx].abs() as f64;
            if v > *peak {
              *peak = v;
            }
        }
    }
}

This is not a direct translation of the C code though. As part of rewriting the C code I also factored out the true peak detection from the filter function into its own function. It is called from the filter function shown in the C code a bit further above. This way it was easy to switch only this part from a C implementation to a Rust implementation while keeping the other parts of the filter, and to also test it separately from the whole filter function.

All this can be found in this commit together with tests and benchmarks. The overall code is a bit different than what is listed above, and also the latest version in the repository looks a bit different but more on that in part 3 of this blog post.

One last thing worth mentioning here is that the AsF32 trait is not public API of the crate and neither are the functions generic over the input type. Instead, the generic functions are only used internally and the public API only provides 4 functions that are specialized to the concrete input types. This keeps the API surface smaller and makes the API easier to understand for users.
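Schematically this looks as follows (a sketch, not the crate’s exact code):

impl EbuR128 {
    // Public, monomorphic entry points for each supported sample type...
    pub fn add_frames_i16(&mut self, frames: &[i16]) -> Result<(), Error> {
        self.add_frames(frames)
    }

    pub fn add_frames_f32(&mut self, frames: &[f32]) -> Result<(), Error> {
        self.add_frames(frames)
    }

    // ...all forwarding to a single internal function that is generic
    // over the private AsF32 trait.
    fn add_frames<T: AsF32>(&mut self, frames: &[T]) -> Result<(), Error> {
        // shared processing code elided in this sketch
        todo!()
    }
}

This way the generic code is monomorphized internally while users only ever see concrete entry points for the four sample types.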

Loudness History

The next component I ported to Rust was the loudness history data structure. This is used to keep a history of previous loudness measurements for giving longer-term results and it exists in two variants: a histogram-based one and a queue-based one with a maximum length.

As part of the history data structure there are also a couple of operations to calculate values from it, but I won’t go into details of them here. They were more or less direct translations of the C code.

In C this data structure and the operations on it were distributed all over the code and part of the main state struct, so the first step to port it over to Rust was to identify all those pieces and put them behind some kind of API.

This is also something that I noticed when porting the FFmpeg loudness normalization filter: because Rust makes it much less effort to define new structs, functions or even modules than C, this often seems to lead to more modular code with clear component boundaries instead of everything being put together in the same place. Requirements from the borrow checker also often make it more natural to split components into separate structs and functions.

Check the commit for the full details but in the end I ended up with the following functions that are called from the C code

typedef void * history;

extern history* history_create(int use_histogram, size_t max);
extern void history_add(history* hist, double energy);
extern void history_set_max_size(history* hist, size_t max);
extern double history_gated_loudness(const history* hist);
extern double history_relative_threshold(const history* hist);
extern double history_loudness_range(const history* hist);
extern void history_destroy(history *hist);

Enums

In the C code the two variants of the history were implemented by having both of them always present in the state struct but only one of them initialized and then at every usage site having code like the following

if (st->d->use_histogram) {
    // histogram code follows here
} else {
    // queue code follows here
}

Doing it like this is error-prone and easy to forget, and having fields in the struct that are unused in certain configurations seems wasteful. In Rust this situation is naturally expressed with enums

enum History {
    Histogram(Histogram),
    Queue(Queue),
}

struct Histogram { ... }
struct Queue { ... }

This allows using the same storage for both variants, and at each usage site the compiler enforces that both variants are explicitly handled

match history {
    History::Histogram(ref hist) => {
        // histogram code follows here
    }
    History::Queue(ref queue) => {
        // queue code follows here
    }
}

Splitting it up like this with an enum also leads to implementing the operations of the two variants directly on their structs, with only the common code implemented on the History enum itself. This also improves readability because it’s immediately clear what a piece of code applies to.
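As a small sketch of that pattern (assuming hypothetical add() methods on Histogram and Queue):

impl History {
    fn add(&mut self, energy: f64) {
        // the compiler enforces that both variants are handled
        match self {
            History::Histogram(hist) => hist.add(energy),
            History::Queue(queue) => queue.add(energy),
        }
    }
}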

Logarithms

As mentioned in the overview already, there are some portability concerns in the C code. One of them showed up when porting the history and comparing the results of the ported code with the C code. This resulted in the following rather ugly code

fn energy_to_loudness(energy: f64) -> f64 {
    #[cfg(feature = "internal-tests")]
    {
        10.0 * (f64::ln(energy) / f64::ln(10.0)) - 0.691
    }
    #[cfg(not(feature = "internal-tests"))]
    {
        10.0 * f64::log10(energy) - 0.691
    }
}

In the C code, ln(x) / ln(10) is used everywhere for calculating the base 10 logarithm. Mathematically that’s the same thing, but in practice it is unfortunately not, and the explicit log10() function is both faster and more accurate. Unfortunately it’s not available everywhere in C (it’s available since C99), so it was not used in the C code. In Rust it is always available, so I first used it unconditionally.

When running the tests later, they failed because the results of the C code were slightly different from the results of the Rust code. In the end I tracked it down to the usage of log10(), so for now the slower and less accurate version is used when comparing the two implementations.

Lazy Initialization

Another topic I already mentioned in the overview is one-time initialization. For using the histogram efficiently it is necessary to calculate the values at the edges of each histogram bin as well as the center values. These values are always the same so they could be calculated once up-front. The C code calculated them whenever a new instance was created.

In Rust, one could build something around std::sync::Once together with static mut variables for storing the data, but that would not be very convenient and would also require using some unsafe code: static mut variables are inherently unsafe. Instead this can be simplified with the lazy_static or once_cell crates, and the API of the latter is now also available as part of the Rust standard library in the nightly versions.

Here I used lazy_static, which leads to the following code

lazy_static::lazy_static! {
    static ref HISTOGRAM_ENERGIES: [f64; 1000] = {
        let mut energies = [0.0; 1000];

        for (i, o) in energies.iter_mut().enumerate() {
            *o = f64::powf(10.0, (i as f64 / 10.0 - 69.95 + 0.691) / 10.0);
        }

        energies
    };

    static ref HISTOGRAM_ENERGY_BOUNDARIES: [f64; 1001] = {
        let mut boundaries = [0.0; 1001];

        for (i, o) in boundaries.iter_mut().enumerate() {
            *o = f64::powf(10.0, (i as f64 / 10.0 - 70.0 + 0.691) / 10.0);
        }

        boundaries
    };
}

On first access to e.g. HISTOGRAM_ENERGIES the corresponding code would be executed and from that point onwards it would be available as a read-only array with the calculated values. In practice this later turned out to cause performance problems, but more on that in part 3 of this blog post.
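For comparison, the equivalent using the once_cell crate mentioned above would look roughly like this:

use once_cell::sync::Lazy;

static HISTOGRAM_ENERGIES: Lazy<[f64; 1000]> = Lazy::new(|| {
    let mut energies = [0.0; 1000];

    for (i, o) in energies.iter_mut().enumerate() {
        *o = f64::powf(10.0, (i as f64 / 10.0 - 69.95 + 0.691) / 10.0);
    }

    energies
});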

Another approach for calculating these constant numbers would be to calculate them at compile-time via const functions. This is almost possible with Rust now; the only part missing is a const variant of f64::powf(). It is also not available as a const function in C++, so there is probably a deeper reason behind this. Otherwise the code would look exactly like the code above, except that the variables would be plain static instead of static ref and all calculations would happen at compile-time.

In the latest version of the code, and until f64::powf() is available as a const function, I’ve decided to simply include a static array with the calculated values inside the code.

Data Structures

And the last topic for the history is an implementation detail of the queue-based implementation. As I also mentioned during the overview, the C code is using a linked-list-based queue, and this is exactly where it is used.

The queue is storing one f64 value per entry, which means that in the end there is one heap memory allocation of 12 or 16 bytes per entry, depending on pointer size. That’s a lot of very small allocations; each allocation is 50-100% bigger than the actual payload, and that’s ignoring any overhead from the allocator itself. This wastes quite some memory, and because each value lives in a separate allocation and a pointer has to be followed to get to the next one, any operation over all values is not going to be very cache-efficient.

As C doesn’t have any data structures in its standard library and this linked-list-based queue is something that is readily available on the BSDs and Linux at least, it probably makes sense to use it here instead of implementing a more efficient data structure inside the code. But it still seems really suboptimal for this use-case.

In Rust the standard library provides a ringbuffer-based VecDeque, which offers exactly the API that is needed here, stores all values tightly packed and thus doesn’t waste any memory per value, and at the same time provides better cache efficiency. And it is available everywhere the Rust standard library is available, unlike the BSD queue used by the C implementation.
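The pattern needed here is just a bounded queue, which only takes a few lines with VecDeque (a sketch; max_len and energy are placeholder names):

use std::collections::VecDeque;

fn push_energy(queue: &mut VecDeque<f64>, max_len: usize, energy: f64) {
    // drop the oldest measurement once the maximum history size is reached
    if queue.len() == max_len {
        queue.pop_front();
    }
    queue.push_back(energy);
}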

In practice, apart from the obvious savings in memory, this also caused the Rust implementation without any further optimizations to take only 50%-70% of the time that the C implementation took, depending on the operation.

Filter: Flushing Denormals to Zero

Overall porting the filter function from C was the same as everything mentioned before so I won’t go into details here. The whole commit porting it can be found here.

There is only one aspect I want to focus on here: if available on x86/x86-64, the MXCSR register temporarily gets the _MM_FLUSH_ZERO_ON bit set to flush denormal floating point numbers to zero. That is, denormals (i.e. very small numbers close to zero) resulting from any floating point operation are considered to be zero. If hardware support is not available, at the end of each filter call the values kept for the next call are manually set to zero if they contain denormals.

This is done in the C code for performance reasons. Operations on denormals are generally much slower than on normalized floating point numbers and it has a measurable impact on the performance in this case.

In Rust this had to be replicated. Not only for the performance reasons but also because otherwise the results of both implementations would be slightly different and comparing them in the tests would be harder.

On the C side, this requires some build system integration and usage of the C preprocessor to decide whether the hardware support for this can be used or not, and then some conditional code that is used in the EBUR128_FILTER macro that was shown a few sections above already. Specifically this is the code

#if defined(__SSE2_MATH__) || defined(_M_X64) || _M_IX86_FP >= 2
#include <xmmintrin.h>
#define TURN_ON_FTZ                                                            \
  unsigned int mxcsr = _mm_getcsr();                                           \
  _mm_setcsr(mxcsr | _MM_FLUSH_ZERO_ON);
#define TURN_OFF_FTZ _mm_setcsr(mxcsr);
#define FLUSH_MANUALLY
#else
#warning "manual FTZ is being used, please enable SSE2 (-msse2 -mfpmath=sse)"
#define TURN_ON_FTZ
#define TURN_OFF_FTZ
#define FLUSH_MANUALLY                                                         \
  st->d->v[c][4] = fabs(st->d->v[c][4]) < DBL_MIN ? 0.0 : st->d->v[c][4];      \
  st->d->v[c][3] = fabs(st->d->v[c][3]) < DBL_MIN ? 0.0 : st->d->v[c][3];      \
  st->d->v[c][2] = fabs(st->d->v[c][2]) < DBL_MIN ? 0.0 : st->d->v[c][2];      \
  st->d->v[c][1] = fabs(st->d->v[c][1]) < DBL_MIN ? 0.0 : st->d->v[c][1];
#endif

This is not really that bad and my only concern here would be that it’s relatively easy to forget calling TURN_OFF_FTZ once the filter is done. This would then affect all future floating point operations outside the filter and potentially cause a lot of problems. This blog post gives a nice example of an interesting bug caused by this and shows how hard it was to debug it.

When porting this to more idiomatic Rust, this problem does not exist anymore.

This is the Rust implementation I ended up with

#[cfg(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "sse2"
))]
mod ftz {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::{_mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON};
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::{_mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON};

    pub struct Ftz(u32);

    impl Ftz {
        pub fn new() -> Option<Self> {
            unsafe {
                let csr = _mm_getcsr();
                _mm_setcsr(csr | _MM_FLUSH_ZERO_ON);
                Some(Ftz(csr))
            }
        }
    }

    impl Drop for Ftz {
        fn drop(&mut self) {
            unsafe {
                _mm_setcsr(self.0);
            }
        }
    }
}

#[cfg(not(any(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "sse2"
))))]
mod ftz {
    pub struct Ftz;

    impl Ftz {
        pub fn new() -> Option<Self> {
            None
        }
    }
}

While a bit longer, it is also mostly whitespace. The important part to notice here is that when using the hardware support, a struct with a Drop impl is returned and once this struct is leaving the scope it would reset the MXCSR register again to its previous value. This way it can’t be forgotten and would also be reset as part of stack unwinding in case of panics.

On the usage side this looks as follows

let ftz = ftz::Ftz::new();

// all the calculations

if ftz.is_none() {
    // manual flushing of denormals to zero
}

No macros are required in Rust for this and all the platform-specific code is nicely abstracted away in a separate module. In the future support for this on e.g. ARM could be added and it would require no changes anywhere else, just the addition of another implementation of the Ftz struct.

Making it bullet-proof

As Anthony Ramine quickly noticed and told me on Twitter, the above is not actually sufficient. For non-malicious code using the Ftz type everything is alright: on every return path, including panics, the register would be reset again.

However malicious (or simply very confused?) code could make use of e.g. mem::forget(), Box::leak() or some other function to “leak” the Ftz value and cause the Drop implementation to never actually run and reset the register’s value. It’s perfectly valid to leak memory in safe Rust, so it’s not a good idea to rely on Drop implementations too much.

The solution for this can be found in this commit but the basic idea is to never actually give out a value of the Ftz type but only pass an immutable reference to safe Rust code. This then basically looks as follows

mod ftz {
    pub fn with_ftz<T, F: FnOnce(Option<&Ftz>) -> T>(func: F) -> T {
        // Ftz::new() returns Some(guard) with hardware support, None without
        let ftz = Ftz::new();
        func(ftz.as_ref())
    }
}

ftz::with_ftz(|ftz| {
    // do things or check if `ftz` is None
});

This way it is impossible for any code outside the ftz module to leak the value and prevent resetting of the register.

Input Processing: Order of Operations Matters

The other parts of the processing code were relatively straightforward to port and not really different from anything I already mentioned above. However as part of porting that code I ran into a problem that took quite a while to debug: Once ported, the results of the C and Rust implementation were slightly different again.

I went through the affected code in detail and didn’t notice anything obvious. Both the C code and the Rust code were doing the same thing, so why were the results different?

This is the relevant part of the C code

size_t i;
double channel_sum;

channel_sum = 0.0;
if (st->d->audio_data_index < frames_per_block * st->channels) {
  for (i = 0; i < st->d->audio_data_index / st->channels; ++i) {
    channel_sum += st->d->audio_data[i * st->channels + c] *
                   st->d->audio_data[i * st->channels + c];
  }
  for (i = st->d->audio_data_frames -
           (frames_per_block - st->d->audio_data_index / st->channels);
       i < st->d->audio_data_frames; ++i) {
    channel_sum += st->d->audio_data[i * st->channels + c] *
                   st->d->audio_data[i * st->channels + c];
  }
} else {
  for (i = st->d->audio_data_index / st->channels - frames_per_block;
       i < st->d->audio_data_index / st->channels; ++i) {
    channel_sum += st->d->audio_data[i * st->channels + c] *
                   st->d->audio_data[i * st->channels + c];
  }
}

and the first version of the Rust code

let mut channel_sum = 0.0;

if audio_data_index < frames_per_block * channels {
    channel_sum += audio_data[..audio_data_index]
        .chunks_exact(channels)
        .map(|f| f[c] * f[c])
        .sum();

    channel_sum += audio_data
        [(audio_data.len() - frames_per_block * channels + audio_data_index)..]
        .chunks_exact(channels)
        .map(|f| f[c] * f[c])
        .sum();
} else {
    channel_sum += audio_data
        [(audio_data_index - frames_per_block * channels)..audio_data_index]
        .chunks_exact(channels)
        .map(|f| f[c] * f[c])
        .sum();
}

The difference between the two variants is the order of floating point operations in the if branch. The C code sums up all values into the same accumulator, while the Rust code first sums each part into its own accumulator and then adds the partial sums together. I changed the Rust code to do exactly the same as the C code, and with that the tests passed again.

The order in which floating point operations are done matters, unfortunately, and in the example above the difference was big enough to cause the tests to fail. And the above is a nice practical example that shows that addition on floating point numbers is actually not associative.
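A tiny self-contained example of this effect (not from the codebase):

fn main() {
    let a = 1.0e16_f64;
    let b = -1.0e16_f64;
    let c = 1.0_f64;

    // The two large values cancel first, so c survives.
    assert_eq!((a + b) + c, 1.0);
    // c is absorbed by the rounding of b + c before the cancellation happens.
    assert_eq!(a + (b + c), 0.0);
}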

Last C Code: Replacing API Layer

The last step was the most satisfying one: getting rid of all the C code. This can be seen in this commit. Note that the performance numbers in the commit message are wrong. At that point both versions were much closer already performance-wise and the Rust implementation was even faster in some configurations.

Up to that point all the internal code was already ported to Rust and the only remaining C code was the public C API, which did some minor tasks and then called into the Rust code. Technically this was not very interesting so I won’t get into any details here. It doesn’t add any new insights and this blog post is already long enough!

If you check the git history, all commits that followed after this one were cleanups, addition of some new features, adding more tests, performance optimizations (see the next part of this blog post) and adding a C-compatible API around the Rust implementation (see the last part of this blog post). The main part of the work was done.

Difficulty and Code Size

With the initial porting out of the way, I can now also answer the first two questions I wanted to get answered as part of this project.

Porting the C code to Rust was not causing any particular difficulties. The main challenges were

  • Understanding what the C code actually does and mapping the data access patterns to Rust iterators for more idiomatic and faster code. This also has the advantage of making the code clearer than with the C-style for loops. Porting was generally a pleasant experience and the language did not get in the way when implementing such code.
    Also, by being able to port it incrementally I could always do a little bit during breaks and didn’t have to invest longer, contiguous blocks of time into the porting.
  • Refactoring the C code to be able to replace parts of it step by step with Rust implementations. This was complicated a bit by the fact that the C code did not factor out the different logical components nicely but instead kept everything entangled.
    From having worked a lot with C for more than 15 years, I wouldn’t say that this is because the code is bad (it is not!) but simply because C encourages writing code like this. Defining new structs or new functions seems like effort, and it’s even worse if you try to move code into separate files, because then you also have to worry about a header file and keeping the code and the header in sync. Rust simplifies this noticeably, and the way the language behaves encourages splitting the code more into separate components.

Now for the size of the code. This is a slightly more complicated question. Rust and the default code style of rustfmt cause code to be spread out over more lines and to have more whitespace than structurally equivalent C code formatted in the common C styles. In my experience, Rust code visually looks much less dense than C code for this reason.

Intuitively I would say that I have written much less Rust code for the actual implementation than there was C code, but lacking any other metrics let’s take a look at the lines of code while ignoring tests and comments. I used tokei for this.

  • 1211 lines for the Rust code
  • 1741 lines for the C code. Of this, 494 lines are headers and 367 lines of the headers are the queue implementation. That is, there are 1247 lines of non-header C code.

This makes the Rust implementation only slightly smaller, if we ignore the C headers. Rust allows writing more concise code, so I would have expected the difference to be bigger. At least partially this can probably be attributed to the different code formatting, which causes Rust code to be less visually dense and as a result spread out over more lines than it would be otherwise.

In any case, overall I’m happy with the results so far.

I will look at another metric of code size in the last part of this blog post for some further comparison: the size of the compiled code.

Next Part

In the next part of this blog post I will describe the performance optimizations I did to make the Rust implementation at least as fast as the C one and the problems I ran into while doing so. The previous two parts of the blog post had nothing negative to say about Rust but this will change in the third part. The Rust implementation without any optimizations was already almost as fast as the C implementation thanks to how well idiomatic Rust code can be optimized by the compiler, but the last few percent were not as painless as one would hope. In practice the performance difference probably wouldn’t have mattered.

From looking at the git history and comparing the code, you will also notice that some of the performance optimizations already happened as part of the porting. The final code is not exactly what I presented above.

by slomo at September 29, 2020 04:00 PM

September 22, 2020

Sebastian DrögePorting EBU R128 audio loudness analysis from C to Rust

(Sebastian Dröge)

Over the last few weeks I ported the libebur128 C library to Rust, both with a proper Rust API as well as a 100% compatible C API.

This blog post will be split into 4 parts that will be published over the coming weeks:

  1. Overview and motivation
  2. Porting approach with various details, examples and problems I ran into along the way
  3. Performance optimizations
  4. Building Rust code into a C library as drop-in replacement

If you’re only interested in the code, that can be found on GitHub and in the ebur128 crate on crates.io.

The initial versions of the ebur128 crate were built around the libebur128 C library (and included its code for ease of building); versions 0.1.2 and newer are the pure Rust implementation.

EBU R128

libebur128 implements the EBU R128 loudness standard. The Wikipedia page gives a good summary of the standard, but in short it describes how to measure loudness of an audio signal and how to use this for loudness normalization.

While this intuitively doesn’t sound very complicated, there are lots of little details (like how human ears actually work) that make it not as easy as one might expect. This results in there being many different ways of measuring loudness, which is one of the reasons why this standard was introduced. Of course it is also not the only standard for this.
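
To give a rough idea of the computation involved: the ITU-R BS.1770 algorithm that R128 builds upon filters each channel to approximate the ear’s frequency response (so-called K-weighting), computes the mean square z_i of each filtered channel, and combines those as loudness = -0.691 + 10 * log10(sum over channels of G_i * z_i) LUFS, where the G_i are per-channel weights (surround channels count more than front ones). This is a simplified summary: the actual standard additionally specifies gating and measurement windows on top.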

libebur128 is also the library that I used in the GStreamer loudness normalization plugin, about which I wrote a few weeks ago already. By porting the underlying loudness measurement code to Rust, the only remaining C dependency of that plugin is GStreamer itself.

Apart from that it is used by FFmpeg (though they include their own modified copy), as well as by many other projects that need some kind of loudness measurement and don’t use ReplayGain, an older but still widely used standard for the same problem.

Why?

Before going over the details of what I did, let me first explain why I did this work at all. libebur128 is a perfectly fine library: it has been in wide use for a long time, it is probably rather bug-free at this point, and it was already possible to use the C implementation from Rust just fine. That’s what the initial versions of the ebur128 crate were doing.

My main reason for doing this was simply that it seemed like a fun little project. It isn’t a lot of code and it doesn’t change often, so once ported it should be more or less finished and it shouldn’t be much work to stay in sync with the C version. I had started thinking about doing this after the initial release of the C-based ebur128 crate, but reading Joe Neeman’s blog post about porting another C audio library (RNNoise) to Rust gave me the final push to actually start porting the code and to follow through until it was done.

However, don’t go around asking other people to rewrite their projects in Rust (don’t be rude), and don’t assume your own rewrite will magically be much faster and less buggy than the existing implementation. While Rust saves you from a big class of possible bugs, it doesn’t save you from yourself, and rewrites usually contain bugs that didn’t exist in the original implementation. Also, getting good performance in Rust requires, as in every other language, some effort. Before rewriting any software, think realistically about the goals of the rewrite as well as the effort required to actually get it finished.

Apart from fun there were also a few technical and non-technical reasons for me to look into this. I’m only going to list two here (curiosity and portability). I will skip the usual Rust memory-safety argument, as that seems less important with this code: the C code has been widely used for a long time, is not changing a lot, and has easy-to-follow memory access patterns. While it definitely had a memory safety bug (see above), it was rather difficult to trigger and has been fixed in the meantime.

Curiosity

Personally, and at my company Centricular, we try to do new projects in Rust where it makes sense. While this has worked very well in the past and we got great results, there were some questions about future projects that I wanted answers, hard data and personal experience for:

  • How difficult is it to port a C codebase function by function to Rust while keeping everything working along the way?
  • How difficult is it to get the same or better performance with idiomatic Rust code for low-level media processing code?
  • How much bigger or smaller is the resulting code and do Rust’s higher-level concepts like iterators help to keep code concise?
  • How difficult is it to create a C-compatible library in Rust with the same API and ABI?

I have some answers to all these questions already, but the previous work on this was not well structured and the results were not documented, which I’m trying to change here: both to have a reference for myself in the future and to convince other people that Rust is a reasonable technology choice for such projects.

As you can see, the general pattern behind these questions is introducing Rust into an existing codebase, replacing existing components with Rust and writing new components in Rust, which also relates to my work on the Rust GStreamer bindings.

Portability

C is a very old language, and while there is a standard, each compiler has its own quirks and each platform has different APIs on top of the bare minimum that the C standard defines. C itself is very portable, but it is not easy to write portable C code, especially when not using a library like GLib that hides these differences and provides basic data structures and algorithms.

This seems to be something that is often forgotten when the portability of C is given as an argument against Rust, and that’s the reason why I wanted to mention this here specifically. While you can get a C compiler basically everywhere, writing C code that also runs well everywhere is another story and C doesn’t make this easy by design. Rust on the other hand makes writing portable code quite easy in my experience.

In practice there were three specific issues I had for this codebase. Most of the advantages of Rust here are because it is a new language and doesn’t have to carry a lot of historical baggage.

Mathematical Constants and Functions

Mathematical constants are not actually part of any C standard. Most compilers nonetheless define M_PI (for π), M_E (for 𝖾) and others in math.h, as they’re defined by POSIX and UNIX98.

Microsoft’s MSVC doesn’t, but instead you have to #define _USE_MATH_DEFINES before including math.h.

While not a big problem per se, it is annoying, and it indeed caused the initial version of the ebur128 Rust crate to not compile with MSVC because I forgot about it.

Similarly, which mathematical functions are available depends a lot on the target platform and on which version of the C standard is supported. An example of this is the log10 function for calculating the base-10 logarithm. It is only available in POSIX and since C99, so for portability reasons libebur128 didn’t use it and instead calculated it via the natural logarithm (log10(x) = ln(x) / ln(10)). While C99 is from 1999, there are still many compilers out there that don’t fully support it, most prominently MSVC until very recently.

Using log10 directly instead of going via the natural logarithm is faster and more precise (for floating point reasons), which is why the Rust implementation uses it. In C it would be necessary to check at build time whether the function is available, which complicates the build process and can easily be forgotten; libebur128 decided not to bother with these complications and simply not use it. Because of that, some conditional code in the Rust implementation is necessary to ensure that both implementations return the same results in the tests.
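
As a small illustration of the difference (sketch code, not the actual crate internals; the -0.691 offset is the constant the loudness standard uses for this conversion):

// What the C code does for portability: go via the natural logarithm.
fn energy_to_loudness_portable_c_style(energy: f64) -> f64 {
    10.0 * (energy.ln() / std::f64::consts::LN_10) - 0.691
}

// What the Rust port can do, because std guarantees log10 everywhere.
fn energy_to_loudness(energy: f64) -> f64 {
    10.0 * energy.log10() - 0.691
}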

Data Structures

libebur128 uses a linked-list-based queue data structure. The C standard library is very minimal and includes no collection data structures, but on the BSDs, and also on Linux with the GNU C library, one is available in sys/queue.h.

Of course MSVC does not have this, and other compilers/platforms probably won’t have it either, so libebur128 includes a local copy of that queue implementation. When building, one then has to decide whether a system implementation is available, or otherwise use the internal version. Or simply always use the internal version.

Copying implementations of basic data structures and algorithms into every single project is ugly and error-prone, so let’s maybe not do that. C not having a standardized mechanism for dependency handling doesn’t help, which is unfortunately why this is very common in C projects.
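
In Rust, by contrast, the standard library already provides suitable collections, and anything beyond that is a single cargo dependency away. A minimal sketch (illustrative, not the crate’s actual data structures) using std’s VecDeque:

use std::collections::VecDeque;

// A bounded history of per-block energy values.
struct BlockHistory {
    max_len: usize,
    blocks: VecDeque<f64>,
}

impl BlockHistory {
    fn push(&mut self, energy: f64) {
        // Drop the oldest entry once the configured capacity is reached.
        if self.blocks.len() == self.max_len {
            self.blocks.pop_front();
        }
        self.blocks.push_back(energy);
    }
}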

One-time Initialization

Thread-safe one-time initialization is another thing that is not defined by the C standard, and depending on your platform there are different APIs available for it or none at all. POSIX again defines one that is widely available, but you can’t really depend on it unconditionally.

This complicates the code and the build procedure, so libebur128 simply did not do that, and instead performed its one-time initialization of some global arrays every time a new instance was created. That is probably fine, but a bit wasteful, and strictly speaking, according to the C standard, probably not actually thread-safe.

The initial version of the ebur128 Rust crate side-stepped this problem by simply doing this initialization once with the API provided by the Rust standard library. See part 2 and part 3 of this blog post for some more details about this.
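
For illustration, the classic std::sync::Once pattern for lazily initializing a global lookup table looks roughly like this (note that this textbook form needs a bit of unsafe for the static mut; the actual crate gets by without that, as described in the later parts):

use std::sync::Once;

static INIT: Once = Once::new();
static mut TABLE: [f64; 16] = [0.0; 16];

fn table() -> &'static [f64; 16] {
    // call_once guarantees the closure runs exactly once, even when
    // multiple threads race here; all other callers block until the
    // initialization has finished.
    INIT.call_once(|| unsafe {
        for (i, v) in TABLE.iter_mut().enumerate() {
            *v = (i as f64).sqrt(); // stand-in for the real computation
        }
    });
    // SAFETY: after call_once returns, TABLE is never written again.
    unsafe { &TABLE }
}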

Easier to Compile and Integrate

A Rust port only requires a Rust compiler, a mixed C/Rust codebase requires at least a C compiler in addition and some kind of build system for the C code.

libebur128 uses CMake, which would be an additional dependency, so in the initial version of the ebur128 crate I went via cargo‘s build.rs build scripts and the cc crate, as building libebur128 is easy enough. This works, but build scripts are problematic when integrating the Rust code into build systems other than cargo.

The Rust port also makes use of conditional compilation in various places. Unlike C, with its preprocessor, its non-standardized and inconsistent platform #defines, and the need to integrate everything into the build system in some custom way, Rust has a principled and well-designed approach to this problem. This makes it easier to keep the code clean and maintainable, and more portable.
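
A small hypothetical sketch of what this looks like (the actual crate’s cfg usage differs in the details):

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
mod denormals {
    // On x86/x86-64 the real implementation manipulates the MXCSR
    // register (see "Flushing Denormals to Zero" further down).
    pub fn flush_to_zero_supported() -> bool {
        true
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
mod denormals {
    // Portable fallback for all other architectures.
    pub fn flush_to_zero_supported() -> bool {
        false
    }
}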

In addition to the build system related simplifications, not having any C code also makes it much easier to compile the code to other targets like WebAssembly, which is natively supported by Rust. It is also possible to compile C to WebAssembly, but getting both toolchains to agree with each other and produce compatible code does not seem very easy.

Overview

As mentioned above, the code can be found on GitHub and in the ebur128 crate on crates.io.

The current version of the code produces the exact same results as the C version. This is enforced by quickcheck tests that run randomized inputs through both versions and check that the results are the same (the idea is sketched below). The code also passes all the tests in the EBU loudness test set, so it should hopefully be standards-compliant, as long as the test implementation is not wrong.
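
The idea behind these tests, as a self-contained sketch (the rust_impl and c_impl modules here are stand-ins for the two implementations, not the crate’s real API):

#[cfg(test)]
mod tests {
    use quickcheck::quickcheck;

    mod rust_impl {
        // Stand-in for the pure Rust implementation.
        pub fn loudness(samples: &[f32]) -> f64 {
            samples.iter().map(|&s| f64::from(s) * f64::from(s)).sum()
        }
    }

    mod c_impl {
        // Stand-in for the FFI wrapper around the C library.
        pub fn loudness(samples: &[f32]) -> f64 {
            samples.iter().map(|&s| f64::from(s) * f64::from(s)).sum()
        }
    }

    quickcheck! {
        // quickcheck feeds many randomized buffers through both
        // implementations; they must agree on every single one.
        fn matches_c_implementation(samples: Vec<f32>) -> bool {
            rust_impl::loudness(&samples) == c_impl::loudness(&samples)
        }
    }
}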

Performance-wise the Rust implementation is at least as fast as the C implementation. In some configurations it’s a few percent faster, but probably not by enough that it actually matters in practice. Various benchmarks for both versions in different configurations are available. The benchmarks are based on the criterion crate, which uses statistical methods to give results that are as accurate as possible. criterion also generates nice graphs that make analysing the results more pleasant. See part 3 of this blog post for more details.

Writing tests and benchmarks in Rust is so much easier and feels more natural than doing it in C, so the Rust implementation now has quite good coverage of the different code paths. In particular, no struggling with build systems was necessary, as it would have been in C, thanks to cargo and Rust’s built-in support. This alone seems to have the potential to give Rust code, on average, better quality than similar code written in C.

It is also possible to compile the Rust implementation into a C library with the great cargo-c tool. It easily builds the code as a static/dynamic C library and installs the library, a C header file and a pkg-config file. With this, the Rust implementation is a 100% drop-in replacement for the C libebur128. It is not even necessary to recompile existing code. See part 4 of this blog post for more details.

Dependencies

Apart from the Rust standard library, the Rust implementation depends on two other small, widely used crates. Unlike with C, depending on external dependencies is rather simple with Rust and cargo. The two crates in question are

  • smallvec for dynamically sized vectors/arrays that can be stored on the stack up to a certain size and only then fall back to heap allocations. This avoids a couple of heap allocations under normal usage.
  • bitflags, which provides a macro for implementing properly typed bitflags. This is used in the constructor of the main type for selecting the features and modes that should be enabled, which directly maps to how the C API works (just that the C API has less type-safety); see the sketch after this list.
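
The bitflags pattern, sketched (the flag names here are illustrative, not necessarily the crate’s exact API):

use bitflags::bitflags;

bitflags! {
    pub struct Mode: u32 {
        const M         = 0b0001; // momentary loudness
        const S         = 0b0010; // short-term loudness
        const I         = 0b0100; // integrated loudness
        const TRUE_PEAK = 0b1000; // true peak measurement
    }
}

fn main() {
    // A properly typed set of flags instead of a raw C int of OR-ed
    // #define values:
    let mode = Mode::I | Mode::TRUE_PEAK;
    assert!(mode.contains(Mode::I));
    assert!(!mode.contains(Mode::S));
}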

Unsafe Code

A common question when announcing a Rust port of some C library is how much unsafe code was necessary to reach the same performance as the C code. In this case there are two uses of unsafe code outside the FFI code to call the C implementation in the tests/benchmarks and the C API.

Resampler

The True Peak measurement uses a resampler to upsample the audio signal to a higher sample rate. In the innermost loop of the resampler, a statically sized ringbuffer is used.

That ringbuffer requires explicit indexing of a slice. While the indices are already manually checked to wrap around when needed, the Rust compiler and LLVM can’t figure that out, so additional bounds checks plus panic handling are present in the compiled code. Apart from slowing down the loop with the additional condition, the panic code also causes the whole loop to be optimized less well.

So to get around that, unsafe indexing into the slice is used for performance reasons. While a human now has to check the memory safety of the code instead of relying on the compiler, the code in question is simple and small enough that it shouldn’t be a problem in practice.
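
A simplified sketch of the pattern (not the actual resampler code):

// `pos` may be at most one buffer length past the end; wrap it manually.
fn ringbuffer_get(buf: &[f32; 64], mut pos: usize) -> f32 {
    if pos >= buf.len() {
        pos -= buf.len();
    }
    // SAFETY: `pos` was wrapped into 0..buf.len() just above, so the
    // access is always in bounds, even though the compiler cannot
    // prove that across the surrounding loop.
    unsafe { *buf.get_unchecked(pos) }
}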

More on this in part 2 and part 3 of this blog post.

Flushing Denormals to Zero

The other use of unsafe code is in the filter that is applied to the incoming audio signal. On x86/x86-64, the MXCSR register temporarily gets the _MM_FLUSH_ZERO_ON bit set to flush denormal floating point numbers to zero. That is, denormals (i.e. very small numbers close to zero) resulting from any floating point operation are treated as zero.

This happens both for performance reasons as well as correctness reasons. Operations on denormals are generally much slower than on normalized floating point numbers. This has a measurable impact on the performance in this case.

Also, the C library does the same, and not flushing denormals to zero would lead to slightly different results. While this difference doesn’t matter in practice as it’s very, very small, it would make it harder to compare the results of both implementations as they wouldn’t be as close to each other anymore.

Doing this affects every floating point operation that happens while that bit is set, but because these are only the floating point operations performed by this crate, and it is guaranteed that the bit is unset again (even in case of panics) before leaving the filter, this shouldn’t cause any problems for other code.
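
A sketch of how this can look with the intrinsics from Rust’s standard library (simplified; among other things, the 32-bit x86 case is left out here):

#[cfg(target_arch = "x86_64")]
fn with_flush_to_zero<T>(f: impl FnOnce() -> T) -> T {
    use std::arch::x86_64::{
        _MM_FLUSH_ZERO_ON, _MM_GET_FLUSH_ZERO_MODE, _MM_SET_FLUSH_ZERO_MODE,
    };

    // Restore the previous mode on drop; drops also run during
    // unwinding, so the bit is guaranteed to be unset again even if
    // `f` panics.
    struct Guard(u32);
    impl Drop for Guard {
        fn drop(&mut self) {
            unsafe { _MM_SET_FLUSH_ZERO_MODE(self.0) };
        }
    }

    let _guard = unsafe {
        let old = _MM_GET_FLUSH_ZERO_MODE();
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        Guard(old)
    };

    f()
}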

Additional Features

Once the C library was ported and performance was comparable to the C implementation, I briefly went through the issues reported on the C library to see if there were any useful feature requests or bug reports that I could implement / fix in the Rust implementation. There were three, one of which I also wanted for a future project.

None of the new features are available via the C API at this point for compatibility reasons.

Resetting the State

For this one there was already a PR for the C library. Previously the only way to reset all measurements was to create a new instance, which involves new memory allocations, filter initialization, etc.

It’s easy enough to provide a reset method that does only the minimal work required to reset all measurements and restart with a fresh state, so I’ve added that to the Rust implementation.

Fix set_max_window() to actually work

This was a bug introduced in the C implementation a while ago in an attempt to prevent integer overflows when calculating the sizes of memory allocations, which would otherwise cause memory safety bugs because less memory was allocated than expected. This fix accidentally restricted the allowed values for the maximum window size too much. There is a PR for fixing this in the C implementation.

On the Rust side this bug also existed because I simply ported over the checks. If I hadn’t ported over the checks, or had ported an earlier version without them, there fortunately wouldn’t have been any memory safety bug on the Rust side; instead, one of two situations would have happened

  1. In debug builds integer overflows cause a panic, so instead of allocating less memory than expected while setting the parameters, there would’ve been an immediate panic rather than invalid memory accesses later.
  2. In release builds integer overflows simply wrap around for performance reasons. This would’ve caused less memory than expected to be allocated, and later, when trying to access memory outside the allocated area, there would’ve been a panic.

While a panic is also not nice, it at least leads to no undefined behaviour and prevents worse things from happening.

The proper fix in this case was to not restrict the maximum window size statically but to instead check for overflows during the calculations. This is the same thing the PR for the C implementation does, but on the Rust side it is much easier because of built-in operations like checked_mul for overflow-checking multiplication. In C this requires some rather convoluted code (check the PR for details).
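
A sketch of the pattern, with made-up parameter names:

// Returns None if any intermediate multiplication would overflow,
// instead of silently wrapping around.
fn window_size_in_samples(
    window_ms: usize,
    rate: usize,
    channels: usize,
) -> Option<usize> {
    window_ms
        .checked_mul(rate)?
        .checked_div(1000)?
        .checked_mul(channels)
}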

Support for Planar Audio Input

The last additional feature that I implemented was support for planar audio input, for which also a PR to the C implementation exists already.

Most of the time, audio signals have the samples of the channels interleaved with each other, so for example for stereo you have an array of samples with the first sample of the left channel, the first sample of the right channel, the second sample of the left channel, etc. While this representation has some advantages, in other situations it is easier or faster to work with planar audio: the samples of each channel are contiguous, one after another, so you have e.g. first all the samples of the left channel and only then all the samples of the right channel.

The PR for the C implementation does this with some duplication of existing macro code (which could be prevented by making the macros more complicated). On the Rust side I implemented it without any code duplication, by adding an internal abstraction for iterating over interleaved/planar samples and then working with that in normal, generic Rust code. This required some minor refactoring and code reorganization, but in the end it was rather painless. Note that most of the change consists of new tests and of moving some code around.

When looking at the Samples trait, the main part of this refactoring, one might wonder why I used closures instead of Rust iterators for iterating over the samples. The reason is, unfortunately, performance. More on this in part 3 of this blog post.
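
To give an idea of the shape of such an abstraction, here is a heavily simplified sketch of the closure-based approach (the crate’s actual Samples trait is more involved and generic over sample formats):

trait Samples {
    // Calls `f(channel, sample)` for every sample in the buffer.
    fn foreach_sample<F: FnMut(usize, f32)>(&self, f: F);
}

struct Interleaved<'a> {
    data: &'a [f32],
    channels: usize,
}

struct Planar<'a> {
    planes: &'a [&'a [f32]],
}

impl<'a> Samples for Interleaved<'a> {
    fn foreach_sample<F: FnMut(usize, f32)>(&self, mut f: F) {
        for frame in self.data.chunks_exact(self.channels) {
            for (channel, &sample) in frame.iter().enumerate() {
                f(channel, sample);
            }
        }
    }
}

impl<'a> Samples for Planar<'a> {
    fn foreach_sample<F: FnMut(usize, f32)>(&self, mut f: F) {
        for (channel, plane) in self.planes.iter().enumerate() {
            for &sample in *plane {
                f(channel, sample);
            }
        }
    }
}

// The analysis code is then written once, generically over the layout:
fn peak<S: Samples>(samples: &S) -> f32 {
    let mut max = 0.0f32;
    samples.foreach_sample(|_, s| max = max.max(s.abs()));
    max
}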

Next Part

In the next part of this blog post I will describe the porting approach in detail and also give various examples for how to port C code to idiomatic Rust, and some examples of problems I was running into.

by slomo at September 22, 2020 01:00 PM

September 21, 2020

Seungha YangContinuing to make GStreamer more Windows-friendly

The long-awaited GStreamer 1.18 has finally been released!

GStreamer 1.18 includes various exciting features especially new Windows plugins: Direct3D11, Media Foundation, UWP support, DXGI desktop capture and rewrite of the WASAPI audio plugin using Windows 10 APIs.

In this blog post, I’d like to briefly talk about the future work needed to optimize GStreamer support for Windows-specific features, with a focus on the Media Foundation plugin😊

GStreamer’s Media Foundation plugin was implemented to support various hardware using modern Windows APIs. The most valuable benefit of the plugin is that it doesn’t require any hardware vendor-specific APIs, because Media Foundation is a standard Windows API, like DXVA2.

This means that, in theory, if your application uses the mediafoundation plugin, it shouldn’t need any hardware-specific code to run on a Windows 10 device, even non-desktop ones such as the Xbox, HoloLens, IoT devices and so on.

At present the Media Foundation plugin in GStreamer does not have feature parity with the GStreamer plugins that use vendor-specific SDKs such as the NVIDIA Codec SDK (nvcodec) or the Intel Media SDK (msdk). The good news is that with some more work we can bridge that gap. The most important pieces for that are:

  • Direct3D integration for the Media Foundation plugin
  • Avoid memory copy/transfer operations between sysmem and device memory

Direct3D integration requires significant work, but zerocopy support for the Media Foundation encoder has already been merged, via a custom COM implementation of the IMFMediaBuffer and IMF2DBuffer interfaces.

Currently the Media Foundation encoder works great when being fed data by the CPU (such as from a software decoder). However, the hardware-accelerated scenario requires Direct3D awareness via a d3d11 library.

I expect that once it’s implemented, we will be a big step closer to having GStreamer’s Direct3D11 support on par with the existing OpenGL (plugin & library) and Vulkan (plugin & library) support.

Contact me if you’re interested in sponsoring this work!

by Seungha Yang at September 21, 2020 04:04 PM

September 20, 2020

Jan SchmidtOculus Rift CV1 progress

(Jan Schmidt)

In my last post here, I gave some background on how Oculus Rift devices work, and promised to talk more about Rift S internals. I’ll do that another day – today I want to provide an update on implementing positional tracking for the Rift CV1.

I was working on CV1 support quite a lot earlier in the year, and then I took a detour to get basic Rift S support in place. Now that the Rift S works as a 3DOF device, I’ve gone back to plugging away at getting full positional support on the older CV1 headset.

So, what have I been doing on that front? Back in March, I posted this video of a new tracking algorithm I was working on to match LED constellations to object models to get the pose of the headset and controllers:

The core of this matching is a brute-force search that (somewhat cleverly) takes permutations of observed LEDs in the video footage and tests them against permutations of LEDs from the device models. It then uses an implementation of the Lambda Twist P3P algorithm (courtesy of pH5) to compute the possible poses for each combination of LEDs. Next, it projects the points of the candidate pose to count how many other LED blobs will match LEDs in the 3D model and how closely. Finally, it computes a fitness metric and tracks the ‘best’ pose match for each tracked object.

For the video above, I had the algorithm implemented as a GStreamer plugin that ran offline to process a recording of the device movements. I’ve now merged it back into OpenHMD, so it runs against the live camera data. When it runs in OpenHMD, it also has access to the IMU motion stream, which lets it predict motion between frames – which can help with retaining the tracking lock when the devices move around.

This weekend, I used that code to close the loop between the camera and the IMU. It’s a little simple for now, but is starting to work. What’s in place at the moment is:

  • At startup time, the devices track their movement only as 3DOF devices with default orientation and position.
  • When a camera view gets the first “good” tracking lock on the HMD, it calls that the zero position for the headset, and uses it to compute the pose of the camera relative to the playing space.
  • Camera observations of the position and orientation are now fed back into the IMU fusion to update the position and correct for yaw drift on the IMU (vertical orientation is still taken directly from the IMU detection of gravity).
  • Between camera frames, the IMU fusion interpolates the orientation and position.
  • When a new camera frame arrives, the current interpolated pose is transformed back into the camera’s reference frame and used to test if we still have a visual lock on the device’s LEDs, and to label any newly appearing LEDs if they match the tracked pose.
  • The device’s pose is refined using all visible LEDs and fed back to the IMU fusion.

With this simple loop in place, OpenHMD can now track multiple devices, and can do it using multiple cameras – somewhat. The first time the tracking block associated with a camera thinks it has a good lock on the HMD, it uses that to compute the pose of that camera. As long as the lock is genuinely good at that point, and the pose the IMU fusion is tracking is good, then the relative pose between all the cameras is consistent and the tracking is OK. However, it’s easy for that to go wrong and end up with an inconsistency between different camera views that leads to jittery or jumpy tracking…

In the best case, it looks like this:

Which I am pretty happy with 🙂

In that test, I was using a single tracking camera and had the controller sitting on the desk where the camera couldn’t see it, which is why it was just floating there. Despite the fact that SteamVR draws it with a Vive controller model, the various controller buttons and triggers work, but there’s still something weird going on with the controller tracking.

What next? I have a list of known problems and TODO items to tackle:

  • The full model search for re-acquiring lock when we start, or when we lose tracking, takes a long time. More work is needed to avoid that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed.
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting.
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test.
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware but is ignored right now.
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well the pose of devices is extracted from the computer vision and how quickly, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • Hot-plug detection of cameras, and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.

A nice side effect of all this work is that it can all feed in later to Rift S support. The Rift S uses inside-out tracking to determine the headset’s position in the world – but the filtering to combine those observations with the IMU data will be largely the same, and once you know where the headset is, finding and tracking the controller LED constellations still looks a lot like the CV1’s system.

If you want to try it out, or take a look at the code – it’s up on Github. I’m working in the rift-correspondence-search branch of my OpenHMD repository at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

by thaytan at September 20, 2020 08:15 PM

September 10, 2020

Bastien Nocerapower-profiles-daemon: new project announcement

(Bastien Nocera)

Despite what this might look like, I don't actually enjoy starting new projects: it's a lot easier to clean up some build warnings, or add a CI, than it is to start from an empty directory.

But sometimes needs must, and I've just released version 0.1 of such a project. Below you'll find an excerpt from the README, which should answer most of the questions. Please read the README directly in the repository if you're getting to this blog post more than a couple of days after it was first published.

Feel free to file new issues in the tracker if you have ideas for possible power-saving or performance enhancements. Currently the only “Performance” mode supported interacts with Intel CPUs with P-State support. More hardware support is planned.

TL;DR: this setting will be in the GNOME 3.40 development branch soon; Fedora packages are done, and API docs are available.

From the README:

Introduction

power-profiles-daemon offers to modify system behaviour based upon user-selected power profiles. There are 3 different power profiles: a "balanced" default mode, a "power-saver" mode, and a "performance" mode. The first 2 of those are available on every system. The "performance" mode is only available on select systems and is implemented by different "drivers" based on the system or systems it targets.

In addition to those 2 or 3 modes (depending on the system), "actions" can be hooked up to change the behaviour of a particular device. For example, this can be used to disable the fast-charging for some USB devices when in power-saver mode.

GNOME's Settings and shell both include interfaces to select the current mode, but they are also expected to adjust the behaviour of the desktop depending on the mode, such as turning the screen off after inaction more aggressively when in power-saver mode.

Note that power-profiles-daemon does not save the currently active profile across system restarts and will always start with the "balanced" profile selected.

Why power-profiles-daemon

The power-profiles-daemon project was created to help provide a solution for two separate use cases, for desktops, laptops, and other devices running a “traditional Linux desktop”.

The first one is a "Low Power" mode, that users could toggle themselves, or have the system toggle for them, with the intent to save battery. Mobile devices running iOS and Android have had a similar feature available to end-users and application developers alike.

The second use case was to allow a "Performance" mode on systems where the hardware maker would provide and design such a mode. The idea is that the Linux kernel would provide a way to access this mode which usually only exists as a configuration option in some machines' "UEFI Setup" screen.

This second use case is the reason why we didn't implement the "Low Power" mode in UPower, as was originally discussed.

As the daemon would change kernel settings, we would need to run it as root, and make its API available over D-Bus, as has been customary for more than 10 years. We would also design that API to be as easily usable to build graphical interfaces as possible.

Why not...

This section will contain explanations of why this new daemon was written rather than re-using, or modifying an existing one. Each project obviously has its own goals and needs, and those comparisons are not meant as a slight on the project.

As the code bases for both those projects listed and power-profiles-daemon are ever evolving, the comments were understood to be correct when made.

thermald

thermald only works on Intel CPUs, and is very focused on allowing maximum performance based on a "maximum temperature" for the system. As such, it could be seen as complementary to power-profiles-daemon.

tuned and TLP

Both projects have similar goals, allowing for tweaks to be applied, for a variety of workloads that goes far beyond the workloads and use cases that power-profiles-daemon targets.

A fair number of the tweaks that could apply to devices running GNOME or another free desktop are either potentially destructive (e.g. some of the SATA power-saving modes resulting in corrupted data), or work well enough to be put into place by default (e.g. audio codec power-saving), even if we need to disable the power saving on some hardware that reacts badly to it.

Both are good projects to use for the purpose of experimenting with particular settings to see if they'd be something that can be implemented by default, or to put some fine-grained, static, policies in place on server-type workloads which are not as fluid and changing as desktop workloads can be.

auto-cpufreq

It doesn't take user intent into account, doesn't have a D-Bus interface, and seems to want to work automatically by monitoring CPU usage, which kind of goes against the user's wishes, as a user might still want to conserve as much energy as possible even under high CPU usage.

by Bastien Nocera (noreply@blogger.com) at September 10, 2020 07:00 PM

Bastien NoceraAvoid “Tag: v-3.38.0-fixed-brown-paper-bag”

(Bastien Nocera)

Over the past couple of (gasp!) decades, I've had my fair share of release blunders: forgetting to clean the tree before making a tarball by hand, forgetting to update the NEWS file, forgetting to push after creating the tarball locally, forgetting to update the appdata file (causing problems on Flathub)...

That's where check-news.sh comes in, to replace the check-news function of the autotools. Ideally you would:

- make sure your CI runs a dist job

- always use a merge request to do releases

- integrate check-news.sh to your meson build (though I would relax the appdata checks for devel releases)

by Bastien Nocera (noreply@blogger.com) at September 10, 2020 04:47 PM

September 09, 2020

Nirbheek ChauhanGStreamer 1.18 supports the Universal Windows Platform

tl;dr: The GStreamer 1.18 release ships with UWP support out of the box, with official GStreamer binary releases for it. Try out the 1.18.0 release and let us know how it goes! There's also an example gstreamer app for UWP that showcases OpenGL support (via ANGLE), audio/video capture, hardware codecs, and WebRTC.

Short History Lesson

Last year at the GStreamer Conference in Lyon, I gave a talk (slides) about how “Firefox Reality” for the Microsoft HoloLens 2 mixed-reality headset is actually Servo, and it uses GStreamer for all media handling: WebAudio, HTML5 Video, and WebRTC.

I also spoke about the work we at Centricular did to port GStreamer to the HoloLens 2. The HoloLens 2 uses the new development target for Windows Store apps: the Universal Windows Platform. The majority of win32 APIs have been deprecated, and apps have to use the new Windows Runtime, which is a language-agnostic API written from the ground up.

So the majority of work went into making sure that Win32 code didn't use deprecated APIs (we used a bunch of them!), and making sure that we could build using the UWP toolchain. Most of that involved two components:
  • GLib, a cross-platform low-level library / abstraction layer used by GNOME (almost all our win32 code is in here)
  • Cerbero, the build aggregator used by GStreamer to build binaries for all platforms supported: Android, iOS, Linux, macOS, Windows (MSVC, MinGW, UWP)
The target was to port the core of GStreamer, and those plugins with external dependencies that were needed to do playback in <audio> and <video> tags. This meant that the only external plugin dependency we needed was FFmpeg, for the gst-libav plugin. All this went well, and Firefox Reality successfully shipped with that work.

Upstreaming and WebRTC

Building upon that work, for the past few months we've been working on adding support for the WebRTC plugin, and also upstreaming as much of the work as possible. This involved a bunch of pieces:
  1. Use only OpenSSL and not GnuTLS in Cerbero because OpenSSL supports targeting UWP. This also had the advantage of moving us from two SSL stacks to one.
  2. Port a bunch of external optional dependencies to Meson so that they could be built with Meson, which is the easiest way for a cross-platform project to support UWP. If your Meson project builds on Windows, it will build on UWP with minimal or no build changes.
  3. Rebase the GLib patches that I didn't find the time to upstream last year on top of 2.62, split into smaller pieces that will be easier to upstream, update for new Windows SDK changes, remove some of the hacks, and so on.
  4. Rework and rewrite the Cerbero patches I wrote last year that were in no shape to be upstreamed.
  5. Ensure that our OpenGL support continues to work using Servo's ANGLE UWP port
  6. Write a new plugin for audio capture called wasapi2, great work by Seungha Yang.
  7. Write a new plugin for video capture called mfvideosrc as part of the media foundation plugin which is new in GStreamer 1.18, also by Seungha.
  8. Write a new example UWP app to test all this work, also done by Seungha! 😄
  9. Run the app through the Windows App Certification Kit
And several miscellaneous tasks and bugfixes that we've lost count of.

Our highest priority this time around was making sure that everything can be upstreamed to GStreamer, and it was quite a success! Everything needed for WebRTC support on UWP has been merged, and you can use GStreamer in your UWP app by downloading the official GStreamer binaries starting with the 1.18 release.

On top of everything in the above list, thanks to Seungha, GStreamer on UWP now also supports:

Try it out!

The example gstreamer app I mentioned above showcases all this. Go check it out, and don't forget to read the README file!

Next Steps

The most important next step is to upstream as many of the GLib patches we worked on as possible, and then spend time porting a bunch of GLib APIs that we currently stub out when building for UWP.

Other than that, enabling gst-libav is also an interesting task, since it will allow apps to use FFmpeg software codecs in their GStreamer UWP app. People should use the hardware-accelerated d3d11 decoders and mediafoundation encoders for optimal power consumption and performance, but sometimes that's not possible because codec support is very device-dependent.

Parting Thoughts

I'd like to thank Mozilla for sponsoring the bulk of this work. We at Centricular greatly value partners that understand the importance of working with upstream projects, and it has been excellent working with the Servo team members, particularly Josh Matthews, Alan Jeffrey, and Manish Goregaokar.

In the second week of August, Mozilla restructured and the Servo team was one of the teams that was dissolved. I wish them all the best in their future endeavors, and I can't wait to see what they work on next. They're all brilliant people.

Thanks to the forward-looking and community-focused approach of the Servo team, I am confident that the project will figure things out to forge its own way forward, and for the same reason, I expect that GStreamer's UWP support will continue to grow.

by Nirbheek (noreply@blogger.com) at September 09, 2020 04:08 PM

September 08, 2020

Bastien NoceraVideos in GNOME 3.38

(Bastien Nocera)

This is going to be a short post, as changes to Videos have been few and far between in the past couple of releases.

The major change to the latest release is that we've gained Tracker 3 support through a grilo plugin (which meant very few changes to our own code). But the Tracker 3 libraries are incompatible with the Tracker 2 daemon that's usually shipped in distributions, including on this author's development system.

So we made use of the ability of Tracker to run inside a Flatpak sandbox along with the video player, removing the need to have Tracker installed by the distribution, on the host. This should also make it easier to give users control of the directories they want to use to store their movies, in the future.

The release candidate for GNOME 3.38 is available right now as the stable version on Flathub.

by Bastien Nocera (noreply@blogger.com) at September 08, 2020 02:31 PM

GStreamerGStreamer 1.18.0 new major stable release

(GStreamer)

The GStreamer team is excited to announce a new major feature release of your favourite cross-platform multimedia framework!

As always, this release is again packed with new features, bug fixes and other improvements.

The 1.18 release series adds new features on top of the previous 1.16 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

Highlights:

  • GstTranscoder: new high level API for applications to transcode media files from one format to another
  • High Dynamic Range (HDR) video information representation and signalling enhancements
  • Instant playback rate change support
  • Active Format Description (AFD) and Bar Data support
  • ONVIF trick modes support in both GStreamer RTSP server and client
  • Hardware-accelerated video decoding on Windows via DXVA2 / Direct3D11
  • Microsoft Media Foundation plugin for video capture and hardware-accelerated video encoding on Windows
  • qmlgloverlay: New overlay element that renders a QtQuick scene over the top of an input video stream
  • New imagesequencesrc element to easily create a video stream from a sequence of jpeg or png images
  • dashsink: Add new sink to produce DASH content
  • dvbsubenc: DVB Subtitle encoder element
  • TV broadcast compliant MPEG-TS muxing with constant bitrate muxing and SCTE-35 support
  • rtmp2: new RTMP client source and sink element implementation
  • svthevcenc: new SVT-HEVC-based H.265 video encoder (https://github.com/OpenVisualCloud/SVT-HEVC)
  • vaapioverlay compositor element using VA-API
  • rtpmanager support for Google's Transport-Wide Congestion Control (twcc) RTP extension
  • splitmuxsink and splitmuxsrc gained support for auxiliary video streams
  • webrtcbin now contains some initial support for renegotiation involving stream addition and removal
  • New RTP source and sink elements to set up RTP streaming via rtp:// URIs
  • New Audio Video Transport Protocol (AVTP) plugin for Time-Sensitive Applications
  • Support for the Video Services Forum's Reliable Internet Stream Transport (RIST) TR-06-1 Simple Profile
  • Universal Windows Platform (UWP) support
  • rpicamsrc element for capturing from the Raspberry Pi camera
  • RTSP Server TCP interleaved backpressure handling improvements as well as support for Scale/Speed headers
  • GStreamer Editing Services gained support for nested timelines, per-clip speed rate control and the OpenTimelineIO format (https://opentimelineio.readthedocs.io).
  • Autotools build system has been removed in favour of Meson
  • Many performance improvements

Full release notes can be found here.

Binaries for Android, iOS, Mac OS X and Windows will be provided in due course.

You can download release tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-devtools, gstreamer-vaapi, gstreamer-sharp, gst-omx, or gstreamer-docs.

September 08, 2020 12:30 AM

September 07, 2020

Víctor JáquezReview of Igalia Multimedia activities (2020/H1)

This blog post is a review of the various activities the Igalia Multimedia team was involved in during the first half of 2020.

Our previous reports are:

Just before a new virus turned into a pandemic, we could enjoy our traditional FOSDEM. There, our colleague Phil gave a talk about many of the topics covered in this report.

GstWPE

GstWPE’s wpesrc element produces a video texture representing a web page rendered off-screen by WPE.

We have worked on a new iteration of the GstWPE demo, focusing on one-to-many, web-augmented overlays, broadcasting with WebRTC and Janus.

Also, since the merge of the gstwpe plugin into gst-plugins-bad (the staging area for new elements), new users have come along, spotting rough areas and improving the element along the way.

Video Editing

GStreamer Editing Services (GES) is a library that simplifies the creation of multimedia editing applications. It is based on the GStreamer multimedia framework and is heavily used by Pitivi video editor.

Implemented frame accuracy in the GStreamer Editing Services (GES)

As required by the industry, it is now possible to reference all times in frame numbers, providing a precise mapping between frame number and play time. Many issues were fixed in GStreamer to reach enough precision to make this work, and intensive regression tests were added.

Implemented time effects support in GES

Important refactoring happened inside GStreamer Editing Services to allow cleanly and safely changing the playback speed of individual clips.

Implemented reverse playback in GES

Several issues have been fixed inside GStreamer core elements and base classes in order to support reverse playback. This allows us to implement reliable and frame accurate reverse playback for individual clips.

Implemented ImageSequence support in GStreamer and GES

Since OpenTimelineIO implemented ImageSequence support, many users in the community had said it was really required. We reviewed and finished up the imagesequencesrc element, which had been awaiting review for years.

This feature is now also supported in the OpenTimelineIO GES adapter.

Optimized nested timelines preroll time by an order of magnitude

Caps negotiation, which is done while the pipeline transitions from the paused state to the playing state and tests the whole pipeline’s functionality, was the bottleneck for nested timelines, so pipelines were reworked to avoid useless negotiations. At the same time, other members of the GStreamer community have improved caps negotiation performance in general.

Last but not least, our colleague Thibault gave a talk at The Pipeline Conference about The Motion Picture Industry and Open Source Software: GStreamer as an Alternative, explaining how and why GStreamer could be leveraged in the motion picture industry to allow faster innovation and to solve issues by reusing all the multi-platform infrastructure the community has to offer.

WebKit multimedia

There has been a lot of work on WebKit multimedia, particularly for WebKitGTK and WPE ports which use GStreamer framework as backend.

WebKit Flatpak SDK

But first of all we would like to draw readers’ attention to the new WebKit Flatpak SDK. It was not a contribution only from the multimedia team, but rather a joint effort among different teams in Igalia.

Before the WebKit Flatpak SDK, JHBuild was used for setting up a WebKitGTK/WPE environment for testing and development. Its purpose is to provide a common set of well-defined dependencies instead of relying on the ones available in the different Linux distributions, which might produce different outputs. Nonetheless, Flatpak offers a much more coherent environment for testing and development, isolated from the rest of the build host and approaching reproducible outputs.

Another great advantage of the WebKit Flatpak SDK, at least for the multimedia team, is the possibility of using gst-build to set up a custom GStreamer environment, with the latest master, for example.

Now, for the sake of brevity, let us sketch a non-exhaustive list of activities and achievements related to WebKit multimedia.

General multimedia

Media Source Extensions (MSE)

Encrypted Media Extension (EME)

One of the major results of this first half is the upstreaming of ThunderCDM, which is an implementation of a Content Decryption Module providing Widevine decryption support. Recently, our colleague Xabier published a blog post in this regard.

And it has enabled client-side video rendering support, which ensures video frames remain protected in GPU memory so they can’t be reached by third parties. This is a requirement for DRM/EME.

WebRTC

GStreamer

Though we normally contribute to GStreamer through the activities listed above, there are other tasks not related to WebKit. Among these we can enumerate the following:

GStreamer VAAPI

  • Reviewed a lot of patches.
  • Support for media-driver (iHD), the new VAAPI driver for Intel, mostly for Gen9 onwards. There are a lot of features with this driver.
  • A new vaapioverlay element.
  • Deep code cleanups. Among these we would like to mention:
    • Added quirk mechanism for different backends.
    • Change base classes to GstObject and GstMiniObject of most of classes and buffers types.
  • Enhanced caps negotiation given current driver’s constraints

Conclusions

The multimedia team at Igalia has kept working, throughout the first half of this strange year, on our three main areas: browsers (mainly WebKitGTK and WPE), video editing, and the GStreamer framework.

We worked on adding and enhancing WebKitGTK and WPE multimedia features in order to offer a solid platform for media providers.

We have enhanced the Video Editing support in GStreamer.

And, alongside these tasks, we have contributed just as much to the GStreamer framework, particularly in hardware-accelerated decoding and encoding, and in VA-API.

by vjaquez at September 07, 2020 03:12 PM

September 04, 2020

Christian SchallerPipeWire Late Summer Update 2020

(Christian Schaller)

Wim Taymans

Wim Taymans talking about current state of PipeWire


Wim Taymans did an internal demonstration yesterday for the desktop team at Red Hat of the current state of PipeWire. For those still unaware, PipeWire is our effort to bring together audio, video and pro audio under Linux, creating a smooth and modern experience. Before PipeWire there was PulseAudio for consumer audio, JACK for pro audio, and just unending pain and frustration for video. PipeWire is being developed with the aim of being ABI-compatible with ALSA, PulseAudio and JACK, meaning that PulseAudio and JACK apps should just keep working on top of PipeWire without the need for rewrites (and with the same low latency for JACK apps).

As Wim reported yesterday, things are coming together, with the PulseAudio, JACK and ALSA backends all being usable, if not yet 100% feature-complete. Wim has been running his system with PipeWire as the only sound server for a while now, and things are in a state where we feel ready to ask the wider community to test it and help provide feedback and test cases.

Carla on PipeWire

Carla running on PipeWire

Carla, as shown above, is a popular JACK application, and it provides, among other things, this patchbay view of your audio devices and applications. I recommend you all click in and take a close look at the screenshot above. That is the JACK application Carla running, and as you can see, PulseAudio applications like GNOME Settings and Google Chrome are also showing up now thanks to the unified architecture of PipeWire, alongside JACK apps like Hydrogen. All of this without any changes to Carla or any of the other applications shown.

At the moment Wim is primarily testing using Cheese, GNOME Control center, Chrome, Firefox, Ardour, Carla, vlc, mplayer, totem, mpv, Catia, pavucontrol, paman, qsynth, zrythm, helm, Spotify and Calf Studio Gear. So these are the applications you should be getting the most mileage from when testing, but most others should work too.

Anyway, let me quickly go over some of the highlights from Wim’s presentation.

Session Manager

PipeWire now has a functioning session manager that allows for things like

  • Metadata, system for tagging objects with properties, visible to all clients (if permitted)
  • Load and save of volumes, automatic routing
  • Default source and sink with metadata, saved and loaded as well
  • Moving streams with metadata

Currently this is a simple sample session manager that Wim created himself, but we also have a more advanced session manager called WirePlumber being developed by Collabora, which they created for automotive Linux use cases, but which we will probably move to over time for the desktop as well.

Human readable handling of Audio Devices

Wim took the ALSA Card Profile code and configuration data from PulseAudio and created a standalone library that can be shared between PipeWire and PulseAudio. This library handles ALSA sound card profiles, devices, mixers and UCM (the use case manager used to configure newer audio chips, like the one in the Lenovo X1 Carbon), and lets PipeWire provide the correct information to things like GNOME Control Center or pavucontrol. Using the same code as PulseAudio has the added benefit that when you switch from PulseAudio to PipeWire your devices don’t change names. So everything should look and feel just like PulseAudio from an application perspective. In fact, just below is a screenshot of pavucontrol, the PulseAudio mixer application, running on top of PipeWire without a problem.

PulseAudio Mixer

Pavucontrol, the PulseAudio mixer, on PipeWire

Creating audio sink devices with Jack
PipeWire now allows you to create new audio sink devices with JACK. The example command below creates a PipeWire sink node out of calfjackhost and sets it up so that we can output, for instance, the audio from Firefox into it. At the moment you can do that by running your JACK apps like this:

PIPEWIRE_PROPS="media.class=Audio/Sink" calfjackhost

But eventually we hope to move this functionality into the GNOME Control Center or similar so that you can do this setup graphically. The screenshot below shows us using CalfJackHost as an audio sink, outputing the audio from Firefox (a PulseAudio application) and CalfJackHost generating an analyzer graph of the audio.

Calfjackhost on pipewire

CalfJackHost being used as an audio sink for Firefox
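Until that graphical support lands, one hedged way to do the routing from the command line is through the regular PulseAudio tooling, since PipeWire speaks the PulseAudio protocol. The stream index and sink name below are placeholders you would look up on your own system:

pactl list short sink-inputs    # find the index of the Firefox stream
pactl list short sinks          # find the name of the new Jack-backed sink
pactl move-sink-input 42 calfjackhost_sink    # 42 and the sink name are placeholders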

Creating devices with GStreamer
We can also use GStreamer to create PipeWire devices now. The command below takes the popular Big Buck Bunny animation created by the great folks over at Blender and sets it up as a video source in PipeWire. So if, for instance, you always wanted to play back a video inside Cheese, to apply the Cheese effects to it, you can do that this way without Cheese needing to change to handle video playback. As one can imagine, this opens up the ability to string together a lot of applications in interesting ways to achieve things that there might not be an application for yet. Of course application developers can also take more direct advantage of this to easily add features to their applications; for instance, I am really looking forward to something like OBS Studio taking full advantage of PipeWire.

gst-launch-1.0 uridecodebin uri=file:///home/wim/data/BigBuckBunny_320x180.mp4 ! pipewiresink mode=provide stream-properties="props,media.class=Video/Source,node.description=BBB"

Cheese playing a video through PipeWire

Cheese playing a video provided by GStreamer through PipeWire.
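To illustrate the other side, here is a hedged sketch of consuming a PipeWire video source from another GStreamer pipeline. With no target configured, pipewiresrc should negotiate a default video source; the exact property for selecting a specific node (such as the "BBB" one created above) varies between PipeWire releases.

gst-launch-1.0 pipewiresrc ! videoconvert ! autovideosink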

How to get started testing PipeWire
Ok, so after seeing all of this you might be thinking: how can I test all of this stuff out and find out how my favorite applications work with PipeWire? Well, the first thing you should do is make sure you are running Fedora Workstation 32 or later, as that is where we are developing all of this. Once you have done that, you need to make sure you have all the needed pieces installed:

sudo dnf install pipewire-libpulse pipewire-libjack pipewire-alsa

Once that dnf command finishes, run the following to get PulseAudio replaced by PipeWire.


cd /usr/lib64/

sudo ln -sf pipewire-0.3/pulse/libpulse-mainloop-glib.so.0 /usr/lib64/libpulse-mainloop-glib.so.0.999.0
sudo ln -sf pipewire-0.3/pulse/libpulse-simple.so.0 /usr/lib64/libpulse-simple.so.0.999.0
sudo ln -sf pipewire-0.3/pulse/libpulse.so.0 /usr/lib64/libpulse.so.0.999.0

sudo ln -sf pipewire-0.3/jack/libjack.so.0 /usr/lib64/libjack.so.0.999.0
sudo ln -sf pipewire-0.3/jack/libjacknet.so.0 /usr/lib64/libjacknet.so.0.999.0
sudo ln -sf pipewire-0.3/jack/libjackserver.so.0 /usr/lib64/libjackserver.so.0.999.0

sudo ldconfig


(You can also find those commands here.)

Once you run these commands you should be able to run

pactl info

and see this as the first line returned:
Server String: pipewire-0

I do recommend rebooting, to be 100% sure you are on a PipeWire system with everything outputting through PipeWire. Once that is done you are ready to start testing!
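If you also want to confirm that the Jack side is being served by PipeWire, a couple of hedged sanity checks (jack_lsp is a standard JACK utility; install whatever package provides it on your distribution):

pactl info | grep "Server String"    # should print: Server String: pipewire-0
jack_lsp                             # should list JACK ports provided by PipeWire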

Our goal is to use the remainder of the Fedora Workstation 32 lifecycle and the Fedora Workstation 33 lifecycle to stabilize and finish the last major features of PipeWire, and then start relying on it in Fedora Workstation 34. So I hope this article will encourage more people to get involved and join us on GitLab and in the PipeWire IRC channel, #pipewire on Freenode.

As we are trying to stabilize PipeWire, we are working through it on a bug-by-bug basis at the moment. So if you end up testing the current state of PipeWire, be sure to report issues back to us through the PipeWire issue tracker, but do try to ensure you have a good test case/reproducer, as we are still too early in the development process to dig into obscure or unreproducible bugs.

Also, if you want or need to go back to PulseAudio, you can run the commands here.
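The linked instructions are authoritative, but since the switch above only added .999.0 symlinks, a hedged sketch of the inverse is to remove them and refresh the linker cache:

sudo rm /usr/lib64/libpulse{,-simple,-mainloop-glib}.so.0.999.0
sudo rm /usr/lib64/libjack{,net,server}.so.0.999.0
sudo ldconfig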

Also, if you just want to test a single application rather than switching your whole system over, you should be able to do that by launching it through one of the following wrapper commands, as sketched below:

pw-pulse

or

pw-jack
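Assuming both wrappers work like pw-jack does today, taking the target program as their argument, usage looks something like this (the application names are just examples):

pw-jack carla          # run a Jack application against PipeWire
pw-pulse pavucontrol   # run a PulseAudio application against PipeWire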

Next Steps
So what are our exact development plans at this point? Well, here is a list in rough priority order:

  1. Stabilize – Our top priority now is to make PipeWire stable enough that the power users we hope to attract as our first batch of users are comfortable running PipeWire as their only audio server. This is critical to building up a user base that can help us identify and prioritize remaining issues, and to ensuring that when we do switch Fedora Workstation over to PipeWire as the default and only supported audio server, it will be a great experience for users.
  2. Jackdbus – We want to implement support for the jackdbus API soon, as we know it’s an important feature for the Fedora Jam folks, so we hope to get to this in the not too distant future.
  3. Flatpak portal for JACK/audio applications – The future of application packaging is Flatpaks, and being able to sandbox Jack applications properly inside a Flatpak is something we want to enable.
  4. Bluetooth – Bluetooth has been supported in PipeWire from the start, but as Wim’s focus has moved elsewhere it has gone a little stale. So we are looking at cycling back to it and cleaning it up to get it production ready. This includes proper support for things like LDAC and AAC passthrough, which is currently not handled in PulseAudio. Wim hopes to push an updated PipeWire to Fedora next week which should at least get Bluetooth into a basic working state, but the big fix will come later.
  5. Pulse effects – Wim has looked at this, but there are some bugs that block data from moving through the pipeline.
  6. Latency compensation – We want complete latency compensation implemented. This is not actually in Jack currently, so it would be a net new feature.
  7. Network audio – PulseAudio style network audio is not implemented yet.

by uraeus at September 04, 2020 04:47 PM

September 02, 2020

Xabier Rodríguez CalvarSerious Encrypted Media Extensions on GStreamer based WebKit ports

Encrypted Media Extensions (a.k.a. EME) is the W3C standard for encrypted media on the web. With it, media providers such as Hulu, Netflix, HBO, Disney+, Prime Video, etc. can provide their content with a reasonable amount of confidence that it will be very complicated for people to “save” their assets without permission. Why do I use the word “serious” in the title? WebKit already supports Clear Key, which is the W3C EME reference implementation, but EME supports more encryption systems, even proprietary ones (I have my opinion about this; you can ask me privately). No service provider (that I know of) supports Clear Key; they usually rely on Widevine, PlayReady or some other system.

Three years ago, my colleague Žan Doberšek finished the implementation of what was going to be the shell of WebKit’s modern EME implementation, following the latest W3C proposal. We also implemented that downstream (at Web Platform for Embedded) using Thunder, which includes as a plugin a fork of what was the Open Content Decryption Module (a.k.a. OpenCDM). The OpenCDM API changed quite a lot during this journey. It works well and there are millions of set-top boxes using it currently.

The delta between downstream and the upstream GStreamer based WebKit ports was quite big, testing was difficult and syncing was not always easy, so we decided to reverse the situation.

Our first step was taken by my colleague Charlie Turner, who made Clear Key work upstream again while adapting some changes the Apple folks had made in the meantime. It was amazing to see the Clear Key tests passing again, and his work on the CDMProxy related classes was awesome. After having Clear Key working, I had to adapt it a bit to accommodate Thunder. To explain a bit about the WebKit EME architecture: there are two layers. The first is the cross-platform one, which implements the W3C API (MediaKeys, MediaKeySession, CDM…). These classes rely on the platform ones (CDMPrivate, CDMInstance, CDMInstanceSession), which form the second layer, to handle platform management, message exchange, etc. Apple’s playback system is fully integrated with their DRM system, so they don’t need anything else. We do, because we need to integrate our own decryptors to defer to Thunder for decryption, so in the GStreamer based ports we also need the CDMProxy related classes: CDMProxy, CDMInstanceProxy, CDMInstanceSessionProxy… The last two extend CDMInstance and CDMInstanceSession respectively to be able to deal with key management, which is abstracted into the KeyHandle and KeyStore.

Once the abstraction is there (let’s remember that the abstraction works both for Clear Key and Thunder), the Thunder implementation is quite simple: just gluing the CDMProxy, CDMInstanceProxy and CDMInstanceSessionProxy classes to the Thunder system and writing a GStreamer decryptor element for it. I might have made a mistake when selecting the files, but considering the Thunder classes plus the GStreamer common decryptor code, cloc says it is just 1198 lines of platform code. I think that is pretty low for what it does. Apart from that, obviously, there are 5760 lines of cross-platform code.

To build and run all this you need to do several things:

  1. Build the dependencies with WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes" to enable the old fashioned JHBuild build and force it to build the Thunder dependencies. All dependencies are in JHBuild; even Widevine is referenced, but to download it you need the proper credentials, as it is closed source.
  2. Pass --thunder when calling build-webkit.sh.
  3. Run MiniBrowser with WEBKIT_GST_EME_RANK_PRIORITY="Thunder" and pass the parameters --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow. The autoplay policy is usually optional, but in this case it is necessary for the YouTube TV tests. We need to give the Thunder decryptor a higher priority because of WebM, which does not specify a key system; without the higher rank, the Clear Key decryptor can be selected and fail. MP4 does not create trouble because the protection system is specified and the caps negotiation does its magic. A combined sketch of these steps follows below.
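Putting those steps together, a hedged sketch of the whole sequence might look like this; the script names follow the usual WebKit Tools/Scripts layout, so adjust paths and the port to your checkout:

WEBKIT_JHBUILD=1 JHBUILD_ENABLE_THUNDER="yes" Tools/Scripts/update-webkitgtk-libs
WEBKIT_JHBUILD=1 Tools/Scripts/build-webkit --gtk --thunder
WEBKIT_GST_EME_RANK_PRIORITY="Thunder" Tools/Scripts/run-minibrowser --gtk \
  --enable-mediasource=TRUE --enable-encrypted-media=TRUE --autoplay-policy=allow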

As you might have guessed from a closer look at the GStreamer JHBuild moduleset, only Widevine is supported so far. To support more key systems, you only have to make them build in the Thunder ecosystem and add them to CDMFactoryThunder::supportedKeySystems.

When I wrote this code, all YouTube TV tests for Widevine were green on the desktop. At the moment of writing this post they aren’t, because of some problem with the Widevine installation that will be sorted out quickly, I hope.

by calvaris at September 02, 2020 02:59 PM

August 31, 2020

Christian SchallerFirst Lenovo laptop with Fedora now available on the web!

(Christian Schaller)

This weekend the X1 Carbon with Fedora Workstation went live in North America on Lenovo’s webstore. This is a big milestone for us and for Lenovo, as it’s the first time Fedora ships pre-installed on a laptop from a major vendor, and the first time the world’s largest laptop maker ships premium laptops with Linux directly to consumers. Currently only the X1 Carbon is available, but more models are on the way and more geographies will be added soon too. As a sidenote, the X1 Carbon and more have actually been available from Lenovo for a couple of months now; it is just the web sales that went online now. So if you are an IT department buying Lenovo laptops in bulk, be aware that you can already buy the X1 Carbon and the P1, for instance, through the direct-to-business sales channel.

Also, as a reminder for people looking to deploy Fedora laptops or workstations in numbers, be sure to check out Fleet Commander, our tool for helping you manage configurations across such a fleet.

I am very happy with the work that has been done to get to this point, both by Lenovo and by the engineers on my team here at Red Hat. For example, Lenovo made sure to get all of their component makers to ramp up their Linux support, and we have been working with them both to help them get started writing drivers for Linux and to help add infrastructure they could plug their hardware into. We also worked hard to get them all set up on the Linux Vendor Firmware Service, so that you can be assured of getting updated firmware not just for the laptop itself, but also for its components.

We also have a list of improvements that we are working on to ensure you get the full benefit of your new laptop with Fedora and Lenovo. This includes improved power management features, such as power consumption profiles with a high performance mode for some laptops, allowing them to run even faster on AC power, and at the other end a low power mode to maximize battery life. As part of that we are also working on adding lap detection support, so that we can ensure your laptop doesn’t risk running too hot in your lap and burning you, and that radio antennas aren’t transmitting too strongly when that close to your body.

So I hope you decide to take the leap and get one of the great developer laptops we are doing together with Lenovo. This is a unique collaboration between the world’s largest laptop maker and the world’s largest Linux company. What we are doing here isn’t just a minimal hardware enablement effort, but a concerted effort to evolve Linux as a laptop operating system, done in a proper open source way. This is the culmination of our work over the last few years: creating the LVFS, adding Thunderbolt support to Linux, improving fingerprint reader support, supporting HiDPI screens and HiDPI mice, creating the possibility of a secure desktop with Wayland, working with NVidia to ensure that the Mesa and NVidia drivers can co-exist through glvnd, and creating Flatpak to bring the advantages of containers to the desktop space in a vendor neutral way. So when you buy a Lenovo laptop with Fedora Workstation, you are not just getting a great system, but you are also supporting our efforts to take Linux to the next level, something for which I think we are truly the only Linux vendor with the scale and engineering ability.

Of course we are not stopping here, so let me also use this chance to talk a bit about some of our other efforts.

Toolbox
Containers are popular for deploying software, but a lot of people are also discovering that they are an incredible way to develop software, even if that software is not going to be deployed as a Flatpak or Kubernetes container. The term often used for containers used as a development tool is pet containers, and with the Toolbox project we are aiming to create the best tool possible for developers to work with pet containers. Toolbox allows you to always have a clean environment to work in, which you can change to suit each project you work on, however you like, without affecting your host system. So for instance if you need to install a development snapshot of Python, you can do that inside your Toolbox container and be confident that various other parts of your desktop will not start crashing due to the change. And when you are done with your project and don’t want that toolbox around anymore, you can easily delete it, without having to spend time figuring out which packages you installed can now be safely uninstalled from your host system, or just not bothering and having your host get bloated over time with stuff you are not actually using anymore.
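As a hedged sketch of what that workflow looks like in practice (the container and package names are just examples):

toolbox create --container py-snapshot    # make a fresh pet container
toolbox enter --container py-snapshot     # get a shell inside it
# ...dnf install your development snapshot of Python, hack away...
exit
toolbox rm py-snapshot                    # throw the container away when done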

One big advantage we have at Red Hat is that we are a major contributor to container technologies across the stack. We are a major participant in the Open Container Initiative, and alongside Google we are the biggest contributor to the Kubernetes project. This includes having created a set of container tools called Podman. So when we started prototyping Toolbox we could base it on Podman and get access to all the power and features Podman provides, while at the same time making them easier to use and consume from your developer laptop or workstation.

Our initial motivation was also driven by the fact that for image based operating systems like Fedora Silverblue and Fedora CoreOS, where the host system is immutable, you still need some way to install packages and do development, but we quickly realized that the pet container development model is superior to the old ‘on host’ model even if you are using a traditional package based system like Fedora Workstation. So we started out by prototyping the baseline functionality, writing it as a shell script to quickly test out our initial ideas. Of course, as Toolbox picked up in popularity, we realized we needed to transition quickly to a proper development language so that we wouldn’t end up with an unmaintainable mess written in shell, and thus Debarshi Ray and Ondřej Míchal have recently completed the rewrite in Go. (The choice of Go was to make it easier for the wider container community to contribute, since almost all container tools are written in Go.)

Leading up to Fedora Workstation 33 we are trying to figure out a few things. One is how we can give you access to a RHEL based toolbox through the Red Hat Developer Program in an easy and straightforward manner, and this is another area where pet container development shines: you can set up your pet container to run a different Linux version than your host. So you can use Fedora to get the latest features for your laptop, but target RHEL inside your Toolbox to get an easy and quick deployment path to your company’s RHEL servers. I would love it if we can extend this even further as we go along, for instance to let you set up a Steam runtime toolbox for game development targeting Steam.
Setting up a RHEL toolbox is already technically possible, but requires a lot more knowledge and understanding of the underlying technologies than we would wish.
The second thing we are looking at is how we deal with graphical applications in the context of these pet containers. While you can install, for instance, Visual Studio Code inside the toolbox container and launch it from the command line, we realize that is not a great model for how you interact with GUI applications. At the moment the only IDE that is set up to run on the host while interacting with containers properly is GNOME Builder, but we realize there are a lot more IDEs people are using, and thus we want to find ways to make them work better with Toolbox containers beyond launching them from the command line inside the container. There are some extensions available for things like Visual Studio Code starting to improve things (those extensions are not created by us, but they look at solving a similar problem), but we want to see how we can help provide a polished experience here. Over time we believe the pet container model of development is so good that most IDEs will follow in GNOME Builder’s footsteps and make in-container development a core part of their feature set, but for now we need to figure out a good bridging strategy.

Wayland – headless and variable refresh rate.
Since switching to Wayland we have continued working to improve how GNOME works under Wayland, removing any major feature regressions compared to X11 and starting to take advantage of the opportunities Wayland gives us. One of the things Jonas Ådahl has been hard at work on recently is ensuring we have headless support for running GNOME on systems without a screen. We know there are a lot of sysadmins, for instance, who want to be able to launch a desktop session on their servers as a tool to test and debug issues. These desktops are then accessed through tools such as VNC or NICE DCV. As part of that work he also made sure we can deal with multiple connected monitors that have different refresh rates. Before that fix you would get the lowest common denominator between your screens, but now if you have, for instance, a 60Hz monitor and a 75Hz monitor, they will function independently of each other and run at their maximum refresh rates. With the variable refresh rate work now landed upstream, Jonas is racing to get the headless support finished and landed in time for Fedora Workstation 33.

Linux Vendor Firmware Service
Richard Hughes is continuing to move the LVFS forward, having spent time this cycle working with the Linux Foundation to ensure the service can scale even better. He is also continuously onboarding new vendors and helping existing vendors use LVFS for even more things. LVFS has become so popular that we are now getting reports of major hardware companies, who up to now haven’t been too interested in the LVFS, being told by their customers to start using it or they will switch suppliers. So expect the rapid growth of vendors joining the LVFS to keep increasing. It is also worth noting that many of the vendors already set up on LVFS are steadily increasing the number of systems they support on it and pushing their suppliers to do the same. Also, for enterprise use of LVFS firmware, Marc Richter wrote an article on access.redhat.com about how to use LVFS with Red Hat Satellite. Satellite, for those who don’t know it, is Red Hat’s tool for managing and keeping servers up to date and secure. For large companies, having their machines, especially servers, access LVFS directly is not a wanted behaviour, so now they can use Satellite to provide a local repository of the LVFS firmware.
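For the client side, consuming LVFS firmware on a Fedora or RHEL machine goes through fwupd; a minimal sketch:

fwupdmgr refresh       # pull the latest firmware metadata from LVFS
fwupdmgr get-updates   # list devices with pending firmware updates
fwupdmgr update        # download and apply them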

PipeWire
One of the changes we have been working on that I am personally extremely excited about is PipeWire.

For those of you who don’t know it, PipeWire is one of our major swamp draining efforts, aiming to bring together audio, pro-audio and video under Linux and provide a modern infrastructure for us to move forward. It does so while staying ABI compatible with both Jack and PulseAudio, meaning applications do not need to be ported to work with PipeWire. We have been using it for video for a while already, to handle screen capture under Wayland and to give Flatpak containers access to webcams in a secure way, and Wim Taymans has been working tirelessly on moving the project forward over the last six months, focusing heavily on fixing corner cases in the Jack support and ramping up the PulseAudio support. We had hoped to start wide testing of the audio parts of PipeWire in Fedora Workstation 32, but since a key advantage PipeWire brings is not just replacing Jack or PulseAudio but ensuring the two use cases co-exist and interact properly, we didn’t want to ask people to test until the PulseAudio support was close to production ready. Wim has been making progress by leaps and bounds recently, and while I can’t promise it 100% yet, we expect to roll out the audio bits of PipeWire for more widescale testing in Fedora Workstation 33, with the goal of making it the default in Fedora Workstation 34 or, more likely, Fedora Workstation 35. Wim is doing an internal demo this week, so I will try to put out a blog post talking about that later in the week.

Flatpak – incremental updates
One of the features we added to Flatpak was the ability to distribute Flatpaks as Open Container Initiative compliant containers. The reason for this was that as companies, Red Hat included, built infrastructure for hosting and distributing containers, we could also use that infrastructure for Flatpaks. This is a great advantage for a variety of reasons, but it had one large downside compared to the traditional way of distributing Flatpaks (as OSTree images): each update comes as one big download, as opposed to the incremental update model that OSTree provides.
This is why, if you compare the same application shipped from Flathub, which uses OSTree, and from the Fedora container registry, you quickly notice that you get much smaller updates from Flathub. For Kubernetes containers this hasn’t been considered a huge problem, as their main use case is being copied around on a high-speed network inside your cloud provider, but for desktop users it is annoying. So Alex Larsson and Owen Taylor have been working on a way to do incremental updates for OCI/Docker/Kubernetes containers too, which not only means we can get very close to the Flathub update sizes in the Fedora Container Catalog, but also, since we implemented this in a way that works for all OCI/Kubernetes containers, that you will get incremental update functionality for those too. This matters especially as such containers make their way into edge computing, where update sizes matter just like they do on the desktop.

Hangul input under Wayland
Red Hat, like Lenovo, targets most of the world with our products and projects. This means that we want them to work great even for people who don’t use English or another European language. To achieve this, we have a team within my group at Red Hat dedicated to ensuring that not just Linux, but all Red Hat products, work well for international users. That team, led by Jens Petersen, is distributed around the globe, with engineers in Japan, China, India, Singapore and Germany. The team contributes to a lot of different things like font maintenance, input method development, i18n infrastructure and more.
One thing this team recently discovered was that Korean input under Wayland needed fixing. So Peng Wu, Takao Fujiwara and Carlos Garnacho worked together on a series of patches for ibus and GNOME Shell to ensure that Fedora Workstation on Wayland works perfectly for Korean input. I wanted to highlight this effort because, while I don’t usually mention efforts with such a regional impact in my blog posts, it is a critical part of keeping Linux viable and usable across the globe. Ensuring that you can use your computer in your own language is something we feel is important and want to enable, and it is also an area where I believe Red Hat is investing more than any other vendor out there.

GLX on EGL
We meet with NVidia on a regular basis to discuss topics of shared interest, and one thing we have been looking at for a while now is the best way to support the NVidia binary driver under XWayland. As part of that, Adam Jackson has been working on a research project to see how feasible it would be to run GLX applications on top of EGL. As one might imagine, EGL doesn’t have a 1-to-1 match with the GLX APIs, but based on what we have seen so far it should be close enough to get things going (Adam already got glxgears running :). The goal here is to have an initial version that works OK, and then in collaboration with NVidia evolve it into a great solution for even the most demanding OpenGL/GLX applications. Currently the code causes an extra memcpy compared to running on native GLX, but this is something we think can be resolved in collaboration with NVidia. Of course, this is still an early stage effort and Adam and NVidia are currently looking at it, so there is still a chance we will hit a snag and have to go back to the drawing board. For those interested, you can take a look at this Mesa merge request to see the current state.

by uraeus at August 31, 2020 04:27 PM

August 21, 2020

GStreamerGStreamer 1.17.90 pre-release (1.18 rc1)

(GStreamer)

The GStreamer team is pleased to announce the first release candidate for the upcoming stable 1.18 release series.

The unstable 1.17 release series adds new features on top of the current stable 1.16 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

The 1.17.90 pre-release series is for testing and development purposes in the lead-up to the stable 1.18 series which is now feature frozen and scheduled for release soon. Any newly-added API can still change until that point, although it is very rare for that to happen at this point.

Depending on how things go there might be another release candidate next week and then hopefully 1.18.0 shortly after.

Full release notes will be provided in the near future, highlighting all the new features, bugfixes, performance optimizations and other important changes.

The autotools build has been dropped entirely for this release, so it's finally all Meson from here on.

This pre-release is for all developers, distributors and early adopters, and anyone who still needs to update their build/packaging setup for Meson.

On the documentation front we have switched from gtk-doc to hotdoc, but we now provide a release tarball of the built documentation in HTML and Devhelp formats, and we recommend distributors switch to that and provide a single GStreamer documentation package in the future. Packagers will not need to use hotdoc themselves.

Instead of a gst-validate tarball we now ship a gst-devtools tarball, and the gstreamer-editing-services tarball has been renamed to gst-editing-services for consistency with the module name in Gitlab.

Packagers: please note that plugins may have moved between modules, so please take extra care and make sure inter-module version dependencies are such that users can only upgrade all modules in one go, instead of seeing a mix of 1.17 and 1.16 on their system.

Binaries for Android, iOS, Mac OS X and Windows are also available at the usual location.

Release tarballs can be downloaded directly here:

As always, please let us know of any issues you run into by filing an issue in Gitlab.

August 21, 2020 06:00 PM

August 15, 2020

Vivek Rcvtracker: OpenCV object tracking plugin

I’ve been selected as a student developer at Pitivi for Google Summer of Code 2020. My project is to create an object tracking and blurring feature.

The tracking is done by passing the video clip through a pipeline which includes a tracker plugin. So, the first goal of the project was to implement the tracker plugin in GStreamer.

Introducing cvtracker

This is a GStreamer plugin which allows the user to select an object in the initial frame of a clip by specifying the object’s bounding box (x, y, width and height coordinates). The element then tracks the object during the subsequent frames of the clip.

This plugin is in the gst-plugins-bad module. It is currently a merge request.

Anyone can use the plugin just by installing the module. An example pipeline is given below.

Example

A sample pipeline with cvtracker looks like this:

gst-launch-1.0 filesrc location=t.mp4 ! decodebin ! videoconvert ! cvtracker object-initial-x=175 object-initial-y=40 object-initial-width=300 object-initial-height=150 algorithm=1 ! videoconvert ! xvimagesink

Here’s a demo of the pipeline given above: YouTube

Algorithm

The tracker incorporates OpenCV’s long term tracker cv::Tracker.

The available tracking algorithms are:

Boosting         - the Boosting tracker
CSRT             - the CSRT tracker
KCF              - the KCF (Kernelized Correlation Filter) tracker
MedianFlow       - the Median Flow tracker
MIL              - the MIL tracker
MOSSE            - the MOSSE (Minimum Output Sum of Squared Error) tracker
TLD              - the TLD (Tracking, learning and detection) tracker

You might wonder why we skipped the GOTURN algorithm. It was left out due to the added complexity of requiring the user to set up its models.

Properties

algorithm                   - the tracking algorithm to use
draw-rect                   - to draw a rectangle around the tracked object
object-initial-x            - object’s initial x coordinate
object-initial-y            - object’s initial y coordinate
object-initial-height       - object’s initial height
object-initial-width        - object’s initial width

The element sends the tracked object’s bounding box (x, y, width and height) through the pipeline bus and also via the buffers. If you want live tracking during playback, you can use the draw-rect property, as in the sketch below.
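For example, a hedged variation of the earlier pipeline with the live overlay enabled (assuming draw-rect is a boolean property, as its description suggests):

gst-launch-1.0 filesrc location=t.mp4 ! decodebin ! videoconvert ! \
  cvtracker object-initial-x=175 object-initial-y=40 \
  object-initial-width=300 object-initial-height=150 \
  algorithm=1 draw-rect=true ! videoconvert ! xvimagesink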

August 15, 2020 04:27 PM