July 20, 2018

GStreamer: GStreamer 1.14.2 stable bug fix release

(GStreamer)

The GStreamer team is pleased to announce the second bug fix release in the stable 1.14 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.14.x.

See /releases/1.14/ for the details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Download tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-sharp, gstreamer-vaapi, or gst-omx.

July 20, 2018 03:00 PM

July 18, 2018

Christian Schaller: An update from Fedora Workstation land

(Christian Schaller)

Battery life
I was very happy to see that Fedora Workstation 28 got the best power consumption on a Dell XPS 13 in the Phoronix benchmark. Improving battery life has been a priority for us and Hans de Goede has been doing some incredible work so far. And best of all, more is to come :). So if you want great battery life with Linux, be sure to be running Fedora on your laptop! On that note, and to state the obvious, be aware that Fedora Workstation adoption rates are actually a major metric for us when deciding where to put our efforts. If we see good growth in Fedora due to people enjoying the improved battery life, it enables us to keep investing in it; if we don’t see that growth, we will need to conclude people don’t care that much and move our investment elsewhere.

Desktop remoting under Wayland
The team is also making great strides with desktop remoting under Wayland. In Fedora Workstation 29 we will have the VNC-based GNOME Shell integrated desktop sharing working under Wayland, thanks to the work done by Jonas Ådahl. It relies on PipeWire to share your Wayland session over VNC.
On a similar note, Jan Grulich, Tomas Popela and Eike Rathke have been working on enabling Wayland desktop sharing through Firefox and Chromium. They are reporting good progress and actually did a video call between Firefox and Chromium last week, sharing their desktops with each other. This is enabled by providing a PipeWire backend for both Firefox and Chromium. They are now cleaning up their patches and preparing them for submission upstream. We are also looking at providing a patched Firefox in Fedora Workstation 28 supporting this.

PipeWire
Wim Taymans talked about and demonstrated the latest improvements to PipeWire during GUADEC last week. PipeWire now provides a libpulse.so drop-in replacement, which allows applications like Totem and Rhythmbox to play audio through PipeWire using the PulseAudio GStreamer plugin. Wim also keeps improving the JACK support in PipeWire by testing JACK applications one by one and fixing corner cases as he discovers them or as they are reported by the Linux pro-audio community. We also ended up ordering Wim a Sony HT-Z9F soundbar for testing, as we want to ensure PipeWire has great support for passthrough, be that SPDIF, HDMI or Bluetooth. The HT-Z9F also supports LDAC, a new high-quality audio codec for Bluetooth, and we want PipeWire to have full support for it.
To accelerate PipeWire development and adoption for audio we also decided to try to organize a PipeWire and Linux Audio hackfest this fall, with the goal of mapping out the remaining issues and trying to bring the wider Linux audio community together. So I am very happy that Arun Raghavan of PulseAudio fame agreed to be one of the co-organizers of this hackfest. Anyone interested in attending the PipeWire 2018 hackfest can either add themselves to the attendee list or contact me (contact information can be found through the hackfest page) and I’d be happy to add you. The primary goal is to have developers from the PulseAudio and JACK communities attend alongside Wim Taymans and Bastien Nocera, so we can make sure we have everything we need on the development roadmap and try to ensure we all pull in the same direction.

GNOME Builder
Christian Hergert gave an update during GUADEC this year on GNOME Builder. As usual there is a ton of interesting stuff happening, including new support for developing towards embedded devices like the upcoming Purism phone. In his talk Christian mentioned how Builder is probably the world’s first ‘Container Native IDE’: it is developed with being packaged as a Flatpak in mind, but also with the aim of creating Flatpaks as its primary output. So a lot of effort is being put into making sure it works well inside a container itself, while also providing all the bells and whistles for creating containers from your code. Another point worth mentioning is that Builder is also one of the best IDEs for doing Rust development in general!

Game mode in Fedora
Feral Interactive, one of the leading Linux game companies, released a tool called gamemode for Linux not long ago. Since we want gamers to be first class citizens in Fedora Workstation, we went back and forth internally a bit about what to do with it, basically discussing whether there was another way to resolve the problem even more seamlessly than gamemode. In the end we concluded that while the ideal solution would be for the default CPU governor to deal with games better, the technical challenge games pose to the CPU governor, with their very uneven workload, is hard to resolve automatically and not something we currently have the resources to take a deep dive into. So we decided that just packaging gamemode was the most reasonable way forward. The package is lined up for the next batch update in Fedora 28, so you should soon be able to install it, and for Fedora Workstation 29 we are looking at including it as part of the default install.

by uraeus at July 18, 2018 03:23 PM

June 26, 2018

Gustavo Orrillo: Experiments with Flutter and Dart

Some time ago I heard about the Flutter SDK for cross-platform mobile development. Having written some apps as part of my research that needed to be available for both iOS and Android, and being sometimes frustrated by the fact that I could not run Processing sketches on iPhones, I’ve been interested in this kind of cross-platform SDK. In fact, I used Kivy in the past to create an app for prognosis of Ebola patients. I liked that Kivy is based on Python and had a simple UI toolkit. However, it was not clear to me if Kivy would allow me to create native UIs on iOS and Android, and implementing the Processing API as a Python library that could be used in Kivy seemed difficult at the time (although, with the new P5 library from Abhik Pal, this may be easier now). More recently, I revisited the prognosis app and created a new version which incorporates a more sophisticated visualization of prognosis predictions, as well as patient care and management recommendations. The Android version is here, and the iOS version here. In this case, I ended up creating separate projects for each version, one with Android Studio and the Android SDK from Google, and the other with XCode and the iOS SDK from Apple (and consequently, the development time doubled).

(To be fair, you can use the Processing iCompiler from Frogg to write and run Processing sketches directly on your phone, but that’s a different use scenario)

In any case, I was also aware of a number of other existing SDKs/frameworks for cross-platform mobile development: Xamarin, React Native, Cordova, PhoneGap… but never got enough time to explore at least a few of them in some detail. React Native seems very popular nowadays and it is JavaScript-based (there is even a p5-wrapper that one can use to embed p5.js sketches into mobile apps). Coming more from a Java background, and being quite involved in the Processing for Android project, I was looking for something closer to these programming languages/environments.

Flutter is based on a relatively new language called Dart, which, according to what I found online, is easy to learn for people with experience in Java. So, I decided to give it a try and see how hard it would be to write Processing-like sketches with the Dart language together with the Flutter SDK.

I had to get used to Flutter’s widget and layout system first, but eventually I got to understand the basics of it (the online documentation is pretty good) and to access lower-level rendering functions in Flutter. The CustomPainter class was key to achieving this, as it allows you to paint pretty much anything you want on the canvas of a widget:

class PPainter extends ChangeNotifier implements CustomPainter {
  @override
  void paint(Canvas canvas, Size size) {
    draw(); 
  }

  void draw() {
    // draw anything you want...
  }
}

Next, I had to figure out how to animate the drawing, and handle touch events. Luckily, Flutter includes several built-in classes to generate UI animations, and I could use them to trigger the drawing in the custom painter on a continuous loop. Finally, some online code gave me enough hints to implement basic input handling:

class PWidget extends StatelessWidget {
  PPainter painter;
  
  @override
  Widget build(BuildContext context) {
    return new Container(
      child: new ClipRect(
          child: new CustomPaint(
            painter: painter,
            child: new GestureDetector(
              onTapDown: (details) {
                painter.onTapDown(context, details);
              },
              onTapUp: (details) {
                painter.onTapUp(context, details);
              },
            ),
          )
      ),
    );
  }
}

With these basic elements in place, I was able to start implementing a few functions from the Processing API in order to create a simple proof-of-concept Dart package that encapsulates these functions and can be imported into a Flutter app. In the end, a Dart sketch in Flutter looks surprisingly similar to a Java Processing sketch:

class MySketch extends PPainter {
  var strokes = new List<List<PVector>>();

  void setup() {
    fullScreen();
  }

  void draw() {
    background(color(255, 255, 255));

    noFill();
    strokeWeight(10);
    stroke(color(10, 40, 200, 60));
    for (var stroke in strokes) {
      beginShape();
      for (var p in stroke) {
        vertex(p.x, p.y);
      }
      endShape();
    }
  }

  void mousePressed() {
    strokes.add([new PVector(mouseX, mouseY)]);
  }

  void mouseDragged() {
    var stroke = strokes.last;
    stroke.add(new PVector(mouseX, mouseY));
  }
}

This sketch can be embedded into a minimal UI layout in Flutter, if the purpose is just to show the sketch’s output with no other UI components. This is what the sketch above looks like when running on an iPhone and an Android device, side by side:

Running a simple Processing sketch on iOS and Android

So far, the experiment proved to be quite successful! I published the p5 package on the Dart Pub repository, so it can be easily imported into any Flutter app. Currently, it only includes a handful of the Processing API functions, so it is not that useful yet, but a promising starting point nonetheless!
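
To give an idea of how the pieces could fit together, here is a minimal sketch of an app entry point that wires MySketch into the PWidget shown earlier. This is just an illustration based on the snippets above, not the actual code of the p5 package, and it assumes a PWidget constructor that takes the painter (not shown in the earlier snippet):

import 'package:flutter/material.dart';

void main() {
  // Create the sketch and hand it to the widget that paints it.
  // (Assumes PWidget exposes a constructor taking its painter.)
  var sketch = new MySketch();
  runApp(new MaterialApp(
    home: new Scaffold(body: new PWidget(sketch)),
  ));
}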

And in the end, I was able to write a Processing sketch and run it on an iPhone and an Android device from Android Studio, which was pretty satisfying:

Using Android Studio to debug on an iPhone

by Andres Colubri at June 26, 2018 11:00 PM

June 22, 2018

Bastien Nocera: Thomson 8-bit computers, a history

(Bastien Nocera) In March 1986, my dad was in the market for a Thomson TO7/70. I have the circled classified ads in “Téo” issue 1 to prove that :)



TO7/70 with its chiclet keyboard and optical pen, courtesy of MO5.com

The “Plan Informatique pour Tous” was in full swing, and Thomson were supplying schools with micro-computers. My dad, as a primary school teacher, needed to know how to operate those computers, and eventually teach them to kids.

The first thing he showed us when he got the computer, on the living room TV, was a game called “Panic” or “Panique” where you controlled a missile, protecting a town from flying saucers that flew across the screen from either side, faster and faster as the game went on. I still haven't been able to locate this game again.

A couple of years later, the TO7/70 was replaced by a TO9, with a floppy disk, and my dad used that computer to write an educational program about top-down addition, as part of a training program run by the teacher training schools (“Écoles Normales”, renamed to “IUFM” in 1990).

After months of nagging, and some spring cleaning, he found the listings of his educational software, which I've liberated, with his permission. I'm currently still working out how to generate floppy disks that are usable directly in emulators. But here's an early screenshot.


Later on, my dad got an IBM PC compatible, an Olivetti PC/1, on which I'd play a clone of Asteroids for hours, but that's another story. The TO9 got passed down to me, and after spending a full summer planning my hot-dog and chips van business (I was 10 or 11, and I had weird hobbies already), and entering every game from the “102 Programmes pour...” series of books, the TO9 got put to the side at Christmas, replaced by a Sega Master System, using that same handy SCART connector on the Thomson monitor.

But how does this concern you? Well, I've worked with RetroManCave on a Minitel episode not too long ago, and he agreed to do a history of the Thomson micro-computers. I did a fair bit of the research and fact-checking, as well as some needed repairs to the (prototype!) hardware I managed to find for the occasion. The result is this first look at the history of Thomson.



Finally, if you fancy diving into the Thomson computers, there will be an episode coming shortly about the MO5E hardware, and some games worth running on it, on the same YouTube channel.

I'm currently working on bringing the “TeoTO8D” emulator to Flathub, for Linux users. When that's ready, grab some games from the DCMOTO archival site, and have some fun!

I'll also be posting some nitty-gritty details about Thomson repairs on my Micro Repairs Twitter feed for the more technically inclined among you.

by Bastien Nocera (noreply@blogger.com) at June 22, 2018 04:06 PM

June 12, 2018

Bastien Nocera: Fingerprint reader support, the second coming

(Bastien Nocera) Fingerprint readers are more and more common on Windows laptops, and hardware makers would really like to not have to make a separate SKU without the fingerprint reader just for Linux, if that fingerprint reader is unsupported there.

The original makers of those fingerprint readers just need to send patches to the libfprint Bugzilla, I hear you say, and the problem's solved!

But it turns out it's pretty difficult to write those new drivers, and those patches, without an insight on how the internals of libfprint work, and what all those internal, undocumented APIs mean.

Most of the drivers already present in libfprint are the results of reverse engineering, which means that none of them is a best-of-breed example of a driver, with all the unknown values and magic numbers.

Let's try to fix all this!

Step 1: fail faster

When you're writing a driver, the last thing you want is to have to wait for your compilation to fail. We ported libfprint to meson and shaved off a significant amount of time from a successful compilation. We also reduced the number of places where new drivers need to be declared to be added to the compilation.

Step 2: make it clearer

While doxygen is nice because it requires very little scaffolding to generate API documentation, the output is also not up to the level we expect. We ported the documentation to gtk-doc, which has a more readable page layout, easy support for cross-references, and gives us more control over how introductory paragraphs are laid out. See the before and after for yourselves.

Step 3: fail elsewhere

You created your patch locally, tested it out, and it's ready to go! But you don't know about git-bz, and you ended up attaching a patch file which you uploaded. Except you uploaded the wrong patch. Or the patch with the right name but from the wrong directory. Or you know git-bz but used the wrong commit id and uploaded another unrelated patch. This is all a bit too much.

We migrated our bugs and repository for both libfprint and fprintd to Freedesktop.org's GitLab. Merge Requests are automatically built, discussions are easier to follow!

Step 4: show it to me

Now that we have spiffy documentation, and unified bugs, patches and sources under one roof, we need to modernise our website. We used GitLab's CI/CD integration to generate our website from sources, including creating API documentation and listing supported devices from git master, to reduce the need to search the sources for that information.

Step 5: simplify

This process has started, but isn't finished yet. We're slowly splitting up the internal API between "internal internal" (what the library uses to work internally) and "internal for drivers" which we eventually hope to document to make writing drivers easier. This is partially done, but will need a lot more work in the coming months.

TL;DR: We migrated libfprint to meson, gtk-doc, GitLab, added a CI, and are writing docs for driver authors, everything's on the website!

by Bastien Nocera (noreply@blogger.com) at June 12, 2018 07:00 PM

GStreamer: GStreamer Conference 2018 Announced

(GStreamer)

The GStreamer project is happy to announce that this year's GStreamer Conference will take place on Thursday-Friday 25-26 October 2018 in Edinburgh, Scotland.

You can find more details about the conference on the GStreamer Conference 2018 web site.

A call for papers will be sent out shortly. Registration will open at a later time. We will announce those and any further updates on the gstreamer-announce mailing list, the website, and on Twitter.

Talk slots will be available in varying durations from 20 minutes up to 45 minutes. Whatever you're doing or planning to do with GStreamer, we'd like to hear from you!

We also plan to have sessions with short lightning talks / demos / showcase talks for those who just want to show what they've been working on or do a mini-talk instead of a full-length talk. Lightning talk slots will be allocated on a first-come-first-serve basis, so make sure to reserve your slot if you plan on giving a lightning talk.

There will also be a social event again on Thursday evening.

There are also plans to have a hackfest the weekend right after the conference.

We hope to see you in Edinburgh!

June 12, 2018 12:00 PM

June 07, 2018

Christian Schaller: 3rd Party Software in Fedora Workstation

(Christian Schaller)

So you have probably noticed by now that we started offering some 3rd party software in the latest Fedora Workstation, namely Google Chrome, Steam, the NVidia driver and PyCharm. This has come about due to a long discussion in the Fedora community on how we position Fedora Workstation and how we can improve our user experience. You can read up on the principles we base this policy on in this policy document. To sum it up, the idea is that while the Fedora operating system you install will continue, as it has been for the last decade, to be based on only free software (with an exception for firmware), you will be able to more easily find and install the plethora of applications out there through our software store application, GNOME Software. We also expect that as the world of Linux software moves towards containers in general and Flatpaks specifically, we will have an increasing number of these 3rd party applications available in Fedora.

So the question I know some of you will have is: what does one actually have to do in order to get a 3rd party application listed in Fedora Workstation? Well, wonder no longer, as we have now put up a few documents outlining the steps you will need to take. Compared to traditional Linux packaging, the major difference is the requirement for improved metadata for your application, so we are covering that aspect in special detail. These documents cover both RPMs and Flatpaks.

First of all you can get a general overview from our 3rd Party guidelines document. This document also explains how you submit a request to the Fedora Workstation Working group for your application to be added.

Then, if you want to dig into the details of what metadata you need to create for your application, there is an in-depth metadata tutorial here, and finally, once you are ready to set up your repository, there is a tutorial explaining how to ensure your repository is able to provide the metadata you created above.

We expect to add more and more applications to Fedora Workstation over time, and I would especially recommend that you look into Flatpaking your 3rd party application, as it will decouple your application from the host operating system and thus decrease the workload of maintaining your application for use in Fedora Workstation (and elsewhere).

by uraeus at June 07, 2018 01:22 PM

May 29, 2018

Christian Schaller: Adding support for the Dell Canvas and Totem

(Christian Schaller)

I am very happy to see that Benjamin Tissoires' work to enable the Dell Canvas and Totem has started to land in the upstream kernel. This work is the result of a collaboration between ourselves at Red Hat and Dell to bring this exciting device to Linux users.

Dell Canvas 27

Dell Canvas

The Dell Canvas and Totem are essentially a graphics tablet with a stylus, plus a turnable knob (the Totem) that can be placed onto the tablet. Dell feature some videos on their site showcasing the Dell Canvas being used in areas such as drawing, video editing and CAD.

So for Linux applications that already support graphics drawing tablets, the Canvas should work once this lands, but where we hope to see application developers step up is in adding support in their applications for the Totem. I have been pondering how we could help make that happen, as we would be happy to donate a Dell Canvas to help kickstart application support, but I am just unsure about the best way to go ahead.
I was considering offering one as a prize for the first application to add support for the Totem, but that seems to be a chicken and egg problem by definition. Does anyone have any suggestions for how to get one of these into the hands of the developer most interested in and able to take advantage of it?

by uraeus at May 29, 2018 01:28 PM

May 21, 2018

Andy Wingo: correct or inotify: pick one

(Andy Wingo)

Let's say you decide that you'd like to see what some other processes on your system are doing to a subtree of the file system. You don't want to have to change how those processes work -- you just want to see what files those processes create and delete.

One approach would be to just scan the file-system tree periodically, enumerating its contents. But when the file system tree is large and the change rate is low, that's not an optimal thing to do.

Fortunately, Linux provides an API to allow a process to receive notifications on file-system change events, called inotify. So you open up the inotify(7) manual page, and are greeted with this:

With careful programming, an application can use inotify to efficiently monitor and cache the state of a set of filesystem objects. However, robust applications should allow for the fact that bugs in the monitoring logic or races of the kind described below may leave the cache inconsistent with the filesystem state. It is probably wise to do some consistency checking, and rebuild the cache when inconsistencies are detected.

It's not exactly reassuring, is it? I mean, "you had one job" and all.

Reading down a bit farther, I thought that with some "careful programming", I could get by. After a day of trying, I am now certain that it is impossible to build a correct recursive directory monitor with inotify, and I am not even sure that "good enough" solutions exist.

pitfall the first: buffer overflow

Fundamentally, inotify races the monitoring process with all other processes on the system. Events are delivered to the monitoring process via a fixed-size buffer that can overflow, and the monitoring process provides no back-pressure on the system's rate of filesystem modifications. With inotify, you have to be ready to lose events.

This I think is probably the easiest limitation to work around. The kernel can let you know when the buffer overflows, and you can tweak the buffer size. Still, it's a first indication that perfect is not possible.

pitfall the second: now you see it, now you don't

This one is the real kicker. Say you get an event that says that a file "frenemies.txt" has been created in the directory "/contacts/". You go to open the file -- but is it still there? By the time you get around to looking for it, it could have been deleted, or renamed, or maybe even created again or replaced! This is a TOCTTOU race, built-in to the inotify API. It is literally impossible to use inotify without this class of error.

The canonical solution to this kind of issue in the kernel is to use file descriptors instead. Instead of or possibly in addition to getting a name with the file change event, you get a descriptor to a (possibly-unlinked) open file, which you would then be responsible for closing. But that's not what inotify does. Oh well!

pitfall the third: race conditions between inotify instances

When you inotify a directory, you get change notifications for just that directory. If you want to get change notifications for subdirectories, you need to open more inotify instances and poll on them all. However now you have N² problems: as poll and the like return an unordered set of readable file descriptors, each with their own ordering, you no longer have access to a linear order in which changes occurred.

It is impossible to build a recursive directory watcher that definitively says "ok, first /contacts/frenemies.txt was created, then /contacts was renamed to /peeps, ..." because you have no ordering between the different watches. You don't know that there was ever even a time that /contacts/frenemies.txt was an accessible file name; it could have been only ever openable as /peeps/frenemies.txt.

Of course, this is the most basic ordering problem. If you are building a monitoring tool that actually wants to open files -- good luck bubster! It literally cannot be correct. (It might work well enough, of course.)

reflections

As far as I am aware, inotify came out to address the needs of desktop search tools like the belated Beagle (11/10 good pupper just trying to get his pup on). Especially in the days of spinning metal, grovelling over the whole hard-drive was a real non-starter, especially if the search database was to be kept up-to-date.

But after looking into inotify, I start to see why someone at Google said that desktop search was in some ways harder than web search -- I mean we all struggle to find files on our own machines, even now, 15 years after the whole dnotify/inotify thing started. Part of it is that, given the choice between supporting reliable, fool-proof file system indexes on the one hand, and overclocking the IOPS benchmarks on the other, the kernel gave us inotify. I understand it, but inotify still sucks.

I dunno about you all but whenever I've had to document such an egregious uncorrectable failure mode as any of the ones in the inotify manual, I have rewritten the software instead. In that spirit, I hope that some day we shall send inotify to the pet cemetery, to rest in peace beside Beagle.

by Andy Wingo at May 21, 2018 02:29 PM

May 17, 2018

GStreamer: GStreamer 1.14.1 stable bug fix release

(GStreamer)

The GStreamer team is pleased to announce the first bug fix release in the stable 1.14 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.14.x.

See /releases/1.14/ for the details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Download tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-vaapi, or gst-omx.

May 17, 2018 05:00 PM

May 16, 2018

Andy Wingo: lightweight concurrency in lua

(Andy Wingo)

Hello, all! Today I'd like to share some work I have done recently as part of the Snabb user-space networking toolkit. Snabb is mainly about high-performance packet processing, but it also needs to communicate with management-oriented parts of network infrastructure. These communication needs are performed by a dedicated manager process, but that process has many things to do, and can't afford to make blocking operations.

Snabb is written in Lua, which doesn't have built-in facilities for concurrency. What we'd like is to have fibers. Fortunately, Lua's coroutines are powerful enough to implement fibers. Let's do that!

fibers in lua

First we need a scheduling facility. Here's the smallest possible scheduler: simply a queue of tasks and a function to run those tasks.

local task_queue = {}

function schedule_task(thunk)
   table.insert(task_queue, thunk)
end

function run_tasks()
   local queue = task_queue
   task_queue = {}
   for _,thunk in ipairs(queue) do thunk() end
end

For our purposes, a task is just a function that will be called with no arguments.
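
Just to make that concrete, a tiny (purely illustrative) use of these two functions looks like this:

schedule_task(function () print("hello") end)
schedule_task(function () print("world") end)
run_tasks()  -- runs both queued tasks, in order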

Now let's build fibers. This is easier than you might think!

local current_fiber = false

function spawn_fiber(fn)
   local fiber = coroutine.create(fn)
   schedule_task(function () resume_fiber(fiber) end)
end

function resume_fiber(fiber, ...)
   current_fiber = fiber
   local ok, err = coroutine.resume(fiber, ...)
   current_fiber = nil
   if not ok then
      print('Error while running fiber: '..tostring(err))
   end
end

function suspend_current_fiber(block, ...)
   -- The block function should arrange to reschedule
   -- the fiber when it becomes runnable.
   block(current_fiber, ...)
   return coroutine.yield()
end

Here, a fiber is simply a coroutine underneath. Suspending a fiber suspends the coroutine. Resuming a fiber runs the coroutine. If you're unfamiliar with coroutines, or coroutines in Lua, maybe have a look at the lua-users wiki page on the topic.

The difference between a fibers facility and just coroutines is that with fibers, you have a scheduler as well. Very much like Scheme's call-with-prompt, coroutines are one of those powerful language building blocks that should rarely be used directly; concurrent programming needs more structure than what Lua offers.

If you're following along, it's probably worth it here to think how you would implement yield based on these functions. A yield implementation should yield control to the scheduler, and resume the fiber on the next scheduler turn. The answer is here.
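
For reference, here is one possible answer sketched in terms of the functions above; the name yield_fiber is mine and not necessarily what the linked code uses:

function yield_fiber()
   -- Suspend the current fiber with a block function that simply
   -- reschedules it, so it resumes on the next scheduler turn.
   suspend_current_fiber(function (fiber)
      schedule_task(function () resume_fiber(fiber) end)
   end)
end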

communication

Once you have fibers and a scheduler, you have concurrency, which means that if you're not careful, you have a mess. Here I think the Go language got the essence of the idea exactly right: Do not communicate by sharing memory; instead, share memory by communicating.

Even though Lua doesn't support multiple machine threads running concurrently, concurrency between fibers can still be fraught with bugs. Tony Hoare's Communicating Sequential Processes showed that we can avoid a class of these bugs by treating communication as a first-class concept.

Happily, the Concurrent ML project showed that it's possible to build these first-class communication facilities as a library, provided the language you are working in has threads of some kind, and fibers are enough. Last year I built a Concurrent ML library for Guile Scheme, and when in Snabb we had a similar need, I ported that code over to Lua. As it's a new take on the problem in a different language, I think I've been able to simplify things even more.

So let's take a crack at implementing Concurrent ML in Lua. In CML, the fundamental primitive for communication is the operation. An operation represents the potential for communication. For example, if you have a channel, it would have methods to return "get operations" and "put operations" on that channel. Actually receiving or sending a message on a channel occurs by performing those operations. One operation can be performed many times, or not at all.

Compared to a system like Go, for example, there are two main advantages of CML. The first is that CML allows non-deterministic choice between a number of potential operations in a generic way. For example, you can construct an operation that, when performed, will either get on one channel or wait for a condition variable to be signalled, whichever comes first. In Go, you can only select between operations on channels.

The other interesting part of CML is that operations are built from a uniform protocol, and so users can implement new kinds of operations. Compare again to Go where all you have are channels, and nothing else.

The CML operation protocol consists of three related functions: try, which attempts to directly complete an operation in a non-blocking way; block, which is called after a fiber has suspended, and which arranges to resume the fiber when the operation completes; and wrap, which is called on the result of a successfully performed operation.

In Lua, we can call this an implementation of an operation, and create it like this:

function new_op_impl(try, block, wrap)
   return { try=try, block=block, wrap=wrap }
end

Now let's go ahead and write the guts of CML: the operation implementation. We'll represent an operation as a Lua object with two methods. The perform method will attempt to perform the operation, and return the resulting value. If the operation can complete immediately, the call to perform will return directly. Otherwise, perform will suspend the current fiber and arrange to continue only when the operation completes.

The wrap method "decorates" an operation, returning a new operation that, if and when it completes, will "wrap" the result of the completed operation with a function, by applying the function to the result. It's useful to distinguish the sub-operations of a non-deterministic choice from each other.

Here our new_op function will take an array of operation implementations and return an operation that, when performed, will synchronize on the first available operation. As you can see, it already has the equivalent of Go's select built in.

function new_op(impls)
   local op = { impls=impls }
   
   function op.perform()
      for _,impl in ipairs(impls) do
         local success, val = impl.try()
         if success then return impl.wrap(val) end
      end
      local function block(fiber)
         local suspension = new_suspension(fiber)
         for _,impl in ipairs(impls) do
            impl.block(suspension, impl.wrap)
         end
      end
      local wrap, val = suspend_current_fiber(block)
      return wrap(val)
   end

   function op.wrap(f)
      local wrapped = {}
      for _, impl in ipairs(impls) do
         local function wrap(val)
            return f(impl.wrap(val))
         end
         local impl = new_op_impl(impl.try, impl.block, wrap)
         table.insert(wrapped, impl)
      end
      return new_op(wrapped)
   end

   return op
end

There's only one thing missing there, which is new_suspension. When you go to suspend a fiber because none of the operations that it's trying to do can complete directly (i.e. all of the try functions of its impls returned false), at that point the corresponding block functions will publish the fact that the fiber is waiting. However the fiber only waits until the first operation is ready; subsequent operations becoming ready should be ignored. The suspension is the object that manages this state.

function new_suspension(fiber)
   local waiting = true
   local suspension = {}
   function suspension.waiting() return waiting end
   function suspension.complete(wrap, val)
      assert(waiting)
      waiting = false
      local function resume()
         resume_fiber(fiber, wrap, val)
      end
      schedule_task(resume)
   end
   return suspension
end

As you can see, the suspension's complete method is also the bit that actually arranges to resume a suspended fiber.

Finally, just to round out the implementation, here's a function implementing non-deterministic choice from among a number of sub-operations:

function choice(...)
   local impls = {}
   for _, op in ipairs({...}) do
      for _, impl in ipairs(op.impls) do
         table.insert(impls, impl)
      end
   end
   return new_op(impls)
end

on cml

OK, I'm sure this seems a bit abstract at this point. Let's implement something concrete in terms of these primitives: channels.

Channels expose two similar but different kinds of operations: put operations, which try to send a value, and get operations, which try to receive a value. If there's a sender already waiting to send when we go to perform a get_op, the operation continues directly, and we resume the sender; otherwise the receiver publishes its suspension to a queue. The put_op case is similar.

Finally we add some synchronous put and get convenience methods, in terms of their corresponding CML operations.

function new_channel()
   local ch = {}
   -- Queues of suspended fibers waiting to get or put values
   -- via this channel.
   local getq, putq = {}, {}

   local function default_wrap(val) return val end
   local function is_empty(q) return #q == 0 end
   local function peek_front(q) return q[1] end
   local function pop_front(q) return table.remove(q, 1) end
   local function push_back(q, x) q[#q+1] = x end

   -- Since a suspension could complete in multiple ways
   -- because of non-deterministic choice, it could be that
   -- suspensions on a channel's putq or getq are already
   -- completed.  This helper removes already-completed
   -- suspensions.
   local function remove_stale_entries(q)
      local i = 1
      while i <= #q do
         if q[i].suspension.waiting() then
            i = i + 1
         else
            table.remove(q, i)
         end
      end
   end

   -- Make an operation that if and when it completes will
   -- rendezvous with a receiver fiber to send VAL over the
   -- channel.  Result of performing operation is nil.
   function ch.put_op(val)
      local function try()
         remove_stale_entries(getq)
         if is_empty(getq) then
            return false, nil
         else
            local remote = pop_front(getq)
            remote.suspension.complete(remote.wrap, val)
            return true, nil
         end
      end
      local function block(suspension, wrap)
         remove_stale_entries(putq)
         push_back(putq, {suspension=suspension, wrap=wrap, val=val})
      end
      return new_op({new_op_impl(try, block, default_wrap)})
   end

   -- Make an operation that if and when it completes will
   -- rendezvous with a sender fiber to receive one value from
   -- the channel.  Result is the value received.
   function ch.get_op()
      local function try()
         remove_stale_entries(putq)
         if is_empty(putq) then
            return false, nil
         else
            local remote = pop_front(putq)
            remote.suspension.complete(remote.wrap)
            return true, remote.val
         end
      end
      local function block(suspension, wrap)
         remove_stale_entries(getq)
         push_back(getq, {suspension=suspension, wrap=wrap})
      end
      return new_op({new_op_impl(try, block, default_wrap)})
   end

   function ch.put(val) return ch.put_op(val).perform() end
   function ch.get()    return ch.get_op().perform()    end

   return ch
end

a wee example

You might be wondering what it's like to program with channels in Lua, so here's a little example that shows a prime sieve based on channels. It's not a great example of concurrency in that it's not an inherently concurrent problem, but it's cute to show computations in terms of infinite streams.

function prime_sieve(count)
   local function sieve(p, rx)
      local tx = new_channel()
      spawn_fiber(function ()
         while true do
            local n = rx.get()
            if n % p ~= 0 then tx.put(n) end
         end
      end)
      return tx
   end

   local function integers_from(n)
      local tx = new_channel()
      spawn_fiber(function ()
         while true do
            tx.put(n)
            n = n + 1
         end
      end)
      return tx
   end

   local function primes()
      local tx = new_channel()
      spawn_fiber(function ()
         local rx = integers_from(2)
         while true do
            local p = rx.get()
            tx.put(p)
            rx = sieve(p, rx)
         end
      end)
      return tx
   end

   local done = false
   spawn_fiber(function()
      local rx = primes()
      for i=1,count do print(rx.get()) end
      done = true
   end)

   while not done do run_tasks() end
end

Here you also see an example of running the scheduler in the last line.

where next?

Let's put this into perspective: in a couple hundred lines of code, we've gone from minimal Lua to a language with lightweight multitasking, extensible CML-based operations, and CSP-style channels; truly a delight.

There are a number of possible ways to extend this code. One of them is to implement true multithreading, if the language you are working in supports that. In that case there are some small protocol modifications to take into account; see the notes on the Guile CML implementation and especially the Manticore Parallel CML project.

The implementation above is pleasantly small, but it could be faster with the choice of more specialized data structures. I think interested readers probably see a number of opportunities there.

In a library, you might want to avoid the global task_queue and implement nested or multiple independent schedulers, and of course in a parallel situation you'll want core-local schedulers as well.

The implementation above has no notion of time. What we did in the Snabb implementation of fibers was to implement a timer wheel, inspired by Juho Snellman's Ratas, and then add that timer wheel as a task source to Snabb's scheduler. In Snabb, every time the equivalent of run_tasks() is called, a scheduler asks its sources to schedule additional tasks. The timer wheel implementation schedules expired timers. It's straightforward to build CML timeout operations in terms of timers.
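
As a rough sketch of how such a timeout operation could look, assuming a hypothetical schedule_task_after(seconds, thunk) hook provided by a timer wheel (no function by that name exists in the code above):

function sleep_op(seconds)
   local function try() return false, nil end
   local function block(suspension, wrap)
      schedule_task_after(seconds, function ()
         -- In a choice, another operation may have completed first;
         -- only resume the fiber if it is still waiting.
         if suspension.waiting() then suspension.complete(wrap) end
      end)
   end
   return new_op({new_op_impl(try, block, function (val) return val end)})
end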

Additionally, your system probably has other external sources of communication, such as sockets. The trick to integrating sockets into fibers is to suspend the current fiber whenever an operation on a file descriptor would block, and arrange to resume it when the operation can proceed. Here's the implementation in Snabb.

The only difficult bit with getting nice nonblocking socket support is that you need to be able to suspend the calling thread when you see the EWOULDBLOCK condition, and for coroutines that is often only possible if you implemented the buffered I/O yourself. In Snabb that's what we did: we implemented a compatible replacement for Lua's built-in streams, in Lua. That lets us handle EWOULDBLOCK conditions in a flexible manner. Integrating epoll as a task source also lets us sleep when there are no runnable tasks.

Likewise in the Snabb context, we are also working on a TCP implementation. In that case you want to structure TCP endpoints as fibers, and arrange to suspend and resume them as appropriate, while also allowing timeouts. I think the scheduler and CML patterns are going to allow us to do that without much trouble. (Of course, the TCP implementation will give us lots of trouble!)

Additionally your system might want to communicate with fibers from other threads. It's entirely possible to implement CML on top of pthreads, and it's entirely possible as well to support communication between pthreads and fibers. If this is interesting to you, see Guile's implementation.

When I talked about fibers in an earlier article, I built them in terms of delimited continuations. Delimited continuations are fun and more expressive than coroutines, but it turns out that for fibers, all you need is the expressive power of coroutines -- multi-shot continuations aren't useful. Also I think the presentation might be more straightforward. So if all your language has is coroutines, that's still good enough.

There are many more kinds of standard CML operations; implementing those is also another next step. In particular, I have found semaphores and condition variables to be quite useful. Also, standard CML supports "guards", invoked when an operation is performed, and "nacks", invoked when an operation is definitively not performed because a choice selected some other operation. These can be layered on top; see the Parallel CML paper for notes on "primitive CML".
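
As one example of what such an operation might look like, here is a rough sketch of a broadcast-style condition built from the same protocol; the naming and exact semantics here are mine, not standard CML's:

function new_condition()
   local cond = { waitq = {} }
   -- Operation that completes the next time the condition is signalled.
   function cond.wait_op()
      local function try() return false, nil end
      local function block(suspension, wrap)
         table.insert(cond.waitq, {suspension=suspension, wrap=wrap})
      end
      return new_op({new_op_impl(try, block, function (val) return val end)})
   end
   -- Wake every fiber currently waiting on the condition.
   function cond.signal()
      local q = cond.waitq
      cond.waitq = {}
      for _, waiter in ipairs(q) do
         if waiter.suspension.waiting() then
            waiter.suspension.complete(waiter.wrap)
         end
      end
   end
   function cond.wait() return cond.wait_op().perform() end
   return cond
end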

Also, the choice operator above is left-biased: it will prefer earlier impls over later ones. You might want to not always start with the first impl in the list.
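
One simple way to do that, sketched here, is to shuffle the impls before building the choice:

function shuffled_choice(...)
   local impls = {}
   for _, op in ipairs({...}) do
      for _, impl in ipairs(op.impls) do
         table.insert(impls, impl)
      end
   end
   -- Fisher-Yates shuffle, so no single impl is always tried first.
   for i = #impls, 2, -1 do
      local j = math.random(i)
      impls[i], impls[j] = impls[j], impls[i]
   end
   return new_op(impls)
end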

The scheduler shown above is the simplest thing I could come up with. You may want to experiment with other scheduling algorithms, e.g. capability-based scheduling, or kill-safe abstractions. Do it!

Or, it could be you already have a scheduler, like some kind of main loop that's already there. Cool, you can use it directly -- all that fibers needs is some way to schedule functions to run.

godspeed

In summary, I think Concurrent ML should be better-known. Its simplicity and expressivity make it a valuable part of any concurrent system. Already in Snabb it helped us solve some longstanding gnarly issues by making the right solutions expressible.

As Adam Solove says, Concurrent ML is great, but it has a branding problem. Its ideas haven't penetrated the industrial concurrent programming world to the extent that they should. This article is another attempt to try to get the word out. Thanks to Adam for the observation that CML is really a protocol; I'm sure the concepts could be made even more clear, but at least this is a step forward.

All the code in this article is up on a gitlab snippet along with instructions for running the example program from the command line. Give it a go, and happy hacking with CML!

by Andy Wingo at May 16, 2018 03:17 PM

May 07, 2018

Gustavo Orrillo: Lassa fever in Nigeria: lessons learnt

Back in March of this year we reached an important milestone in the collaboration between the Sabeti Lab and the Irrua Specialist Teaching Hospital (ISTH) in Edo State, Nigeria: we published a joint paper in the journal The Lancet Infectious Diseases describing the largest and most detailed retrospective cohort study of Lassa fever patients in Nigeria, and identifying the clinical and laboratory predictors of outcome observed at ISTH for this deadly disease. This is the culmination of several years of work, starting in 2013.

At the entrance of the Lassa ward at ISTH with Christopher Iruolagbe, one of the clinicians.

First of all, I will give a brief introduction to Lassa fever. It is a member of a family of diseases called Viral Hemorrhagic Fevers (VHFs), which includes Ebola, dengue, and yellow fever among many others. These illnesses are caused by different viruses, but all have fever and hemorrhage as clinical manifestations. The 2014-2016 Ebola outbreak caused widespread concern due to its high mortality and fears of a worldwide pandemic, although its spread was largely contained to the African nations of Liberia, Sierra Leone, and Guinea. The outbreak left a large impact on the region, with almost 30,000 total cases and over 15,000 deaths. In contrast, Lassa fever is an endemic disease, with cases occurring throughout the year in West Africa, and presenting a wide range of clinical severity: most people don’t get sick enough to go to the hospital, but a small percentage become acutely ill and need medical attention. It is estimated that 300,000 people get infected with the Lassa virus every year, but less than 5% of those end up going to the hospital. The overall mortality among hospitalized cases is around 20%, lower than Ebola’s, but it can be much higher than that for elderly patients and pregnant women. Another difference with Ebola is the natural host of the virus: in the case of Lassa fever, it is the mastomys rat, which enters people’s homes looking for food and spreads the virus through feces and urine, while for Ebola it is likely to be fruit bats instead. Even though the pathogens causing these diseases were discovered only in the last 50 years (Lassa fever in 1969, Ebola in 1976), genetic studies from our lab indicate that Lassa fever is an old disease, with the virus spreading out of Nigeria at least 400 years ago. A similar picture appears for Ebola, leading to the question of whether we are in the presence of emerging diseases or emerging diagnoses.

TEM micrograph of Lassa virus virions.

Back in 2013, I had just begun developing the visualization tool Mirador at Fathom Information Design, and was looking for “real world” datasets to apply Mirador to. Around that time, Dr Peter Okokhere, the head of the Lassa fever ward at ISTH, was visiting the Sabeti Lab, and had compiled the records of all the patients treated in the ward since 2011 into an anonymized Excel spreadsheet. Mirador was designed precisely to handle that kind of tabular data, so it seemed to us that it should be straightforward to use Mirador to carry out exploratory analysis of Dr Okokhere’s dataset. In particular, we were interested in finding correlations between the different demographic, clinical, and laboratory variables collected for all patients and their outcome (death or survival). A subsequent step was to apply Machine Learning in order to train prognostic models that could eventually be helpful for clinicians, for example to triage patients upon admission to the ward depending on their death risk (so that time and material resources would be prioritized for high-risk patients). The models’ predictions could also identify the clinical features most strongly associated with that risk, and hence inform medical judgment.

Mirador displaying ISTH data.

However, the work on the ISTH clinical data had to be put on hold as the lab shifted its focus and resources to the Ebola outbreak during the next two years. Some of the initial modeling approaches we were developing for the ISTH dataset were applied to similar datasets from Ebola patients, and this work led to some publications (here and here), where we investigated the possibility of deploying these prognostic models to the clinic in the form of medical apps for patient triage, care, and management. By late 2015, the Ebola outbreak was beginning to subside, and we were able to come back to our research on Lassa. In the meantime, Dr Okokhere had incorporated patients treated in 2014 and 2015, which enlarged the dataset to nearly 300 patients. At that time, I started to realize that I had originally been too enthusiastic about the applicability of prognostic models to inform clinical decisions. As I learned after delving deeper into the topic, such models need to be trained on much larger cohorts and then validated on independently-obtained datasets, so that they can be generalized to new patients. Widely used prognostic scores such as APACHE II or the Framingham Risk Score took thousands of patients and many years to develop. As unique and detailed as the ISTH dataset is, it cannot support the creation of a Lassa fever prognostic score yet. Thanks to Dr Okokhere’s efforts to digitize the paper medical records of the patients treated in the ward, we were able to have some data, but unfortunately there are no similar datasets available for analysis and validation. Only one other study, published in 1987, included more than 300 confirmed cases of Lassa fever and detailed demographic, clinical, and laboratory data from patients in Sierra Leone between 1977 and 1979.

Regardless of the current limitations, the ISTH dataset allowed us to provide a significant update on the clinical knowledge of Lassa fever in Nigeria, and to find the most important manifestations of the disease among the patients treated at ISTH.

Map of cases treated at ISTH between 2011 and 2015, clustered by geographical location and shaded by case fatality rate.

The predictive models also proved to be very useful, not as prognostic scores, but to test the independence between various biomarkers that characterize the pathophysiology of the disease. This, in addition to other complementary results and previous experience from the clinicians at ISTH, led us to formulate hypotheses of medical relevance (e.g.: Lassa virus may damage the kidney cells in some of the patients), which we need to explore further.

Predictive performance of the logistic regression model trained on the ISTH data. The horizontal axis shows the mortality risk threshold used to predict death vs survival, and the right vertical axis the corresponding sensitivity and specificity of the model. The bars indicate the actual number of patients and mortality within each risk bin.

However, one of the requirements to move forward with this research, and also to improve patient care, is to build better on-site capacity for data collection and management. This impacts not only the handling of the patients’ medical records, but also the laboratory samples that are used for diagnosis, clinical decisions, and research. For this reason, we have been designing mobile apps for collecting clinical and laboratory data, using Dimagi’s CommCare platform. This platform has enabled us to translate paper-based protocols for data collection into digital forms that can work with limited internet connectivity, store the records in a HIPAA-compliant database, and generate real-time reports. These apps are currently undergoing the final stages of debugging and testing, and will soon be deployed at ISTH’s Lassa ward and research lab for daily use.

CommCare clinical app for patient management.

As part of the testing process, I had the chance to visit ISTH recently, together with my lab colleague, Dr Kayla Barnes. We worked with the clinicians and lab staff, going through all the data entry modules in the apps and making sure that they function as expected and fit the patient and sample management workflows. Being my first time in Nigeria, and in Africa in general, this was an exceptional experience for me, and I felt incredibly welcomed by all the people we met during the visit. We even got tailor-made traditional Nigerian dresses, which we wore on the last day of our stay at ISTH, as you can see in some of the pictures below!

ISTH's main entrance; the Lassa ward building; staff preparing a Ribavirin shot for a patient; working at the Lassa research lab; discussing the data collection protocols; testing the CommCare lab app; wearing traditional Nigerian dresses (right to left: Eghosa Uyige, Kayla Barnes, Ikponmwosa Odia, John Aiyepada, and myself).

Some pictures from our trip to ISTH in March 2018.

As much as we enjoyed a very successful visit and the hospitality of our Nigerian hosts, Nigeria, and ISTH in particular, were still recovering from the largest recorded Lassa fever outbreak in history: more than 200 confirmed cases were received and tested at ISTH between January and February of this year, which is more than all the cases from 2016 and 2017 combined. This outbreak was covered by several news sources, and it put a lot of strain on the Lassa ward personnel as they ran out of beds and had to resort to temporary spaces to accommodate all patients. Fortunately, by the time we arrived the situation was gradually coming back to normal, as reported by the Nigerian Center for Disease Control. However, this recent event is a reminder of the threat posed by Lassa fever and other emerging infectious diseases, and of how we need efficient detection and containment mechanisms to stop outbreaks and avoid loss of human life.

Research continues at ISTH and we hope to find insights about the recent outbreak from the viral sequences generated in Nigeria, thanks to the capacity building efforts of the African Center of Excellence for Genomics of Infectious Diseases, which our lab is part of together with several African and international partners. Among further next steps, we plan to carry out analyses of the combined genomic and clinical data that could shed light on the genetic origin of the variability in the clinical manifestation of Lassa fever, and to develop new privacy-preserving Machine Learning algorithms that would allow us to respond to an outbreak faster by sharing data and models in real time while protecting patients’ privacy.

by Andres Colubri at May 07, 2018 05:00 PM

April 24, 2018

Christian SchallerWarming up for Fedora Workstation 28

(Christian Schaller)

It has been some time since my last update on what is happening in Fedora Workstation, and with the current plan to release Fedora Workstation 28 in early May I thought this would be a good time to write something. As usual this is just a small subset of what the team has been doing, and I always end up feeling a bit bad for not talking about the avalanche of general fixes and improvements the team adds to each release.

Thunderbolt
Christian Kellner has done a tremendous job keeping everyone informed of his work making sure we have proper Thunderbolt support in Fedora Workstation 28. One important aspect of this improved Thunderbolt support is that a lot of upcoming docking stations will require it, and thus without this work you would not be able to use a wide range of docking stations. For a lot of screenshots and more details about how the Thunderbolt support is done I recommend reading this article on Christian's blog.

3rd party applications
It has taken us quite some time to get here, as getting this feature right involved a lot of internal discussion about the policies around it as well as the implementation details. But starting with Fedora Workstation 28 you will be able to find more 3rd party software listed in GNOME Software if you enable it. The way it will work is that, as part of the initial setup, you will be asked if you want to have 3rd party software show up in GNOME Software. If you are upgrading you will be asked inside GNOME Software if you want to enable 3rd party software. You can also disable 3rd party software after enabling it from the GNOME Software settings, as seen below:

GNOME Software settings

In Fedora Workstation 27 we did have PyCharm available, but we have now added the NVidia driver and Steam to the list for Fedora Workstation 28.

We have also been working with Google to try to get Chrome included here, and we are almost there: they merged the needed AppStream metadata some time ago, but the last step requires some tweaking of how Google generates their package repository (basically adding the AppStream metadata to their yum repository). We don’t have a clear timeline for when that will happen, but as soon as it does Chrome will also appear in GNOME Software if you have 3rd party software enabled.

As we speak all 3rd party packages are RPMs, but we expect that going forward we will be adding applications packaged as Flatpaks too.

Finally if you want to propose 3rd party applications for inclusion you can find some instructions for how to do it here.

VirtualBox guest
Another major feature we worked on for this release, and which got some attention, was Hans de Goede's work to ensure Fedora Workstation can run as a VirtualBox guest out of the box. We know there are many people whose first experience with Linux is running it under VirtualBox on Windows or macOS, and we wanted to make that first experience as good as possible. Hans worked with the VirtualBox team to clean up their kernel drivers and agree on a stable ABI so that they could be merged into the kernel and maintained there from now on.

Firmware updates
The Spectre/Meltdown situation did hammer home to a lot of people the need to have firmware updates easily available and easy to apply. We created the Linux Vendor Firmware Service for Fedora Workstation users with that in mind, and it was great to see the service paying off for many Linux users, not only on Fedora, but also on other distributions which started using the service we provided. I would like to call out Dell, who was a critical partner for the Linux Vendor Firmware effort from day 1, and thus their users got the most benefit from it when Spectre and Meltdown hit. Spectre and Meltdown also helped get a lot of other vendors off the fence or accelerated their efforts to support LVFS, and Richard Hughes and Peter Jones have been working closely with a lot of new vendors during this cycle to get support for their hardware and devices into LVFS. In fact Peter even flew down to the offices of one of the biggest laptop vendors recently to help them resolve the last issues before their hardware starts showing up in the firmware service. Thanks to the work of Richard Hughes and Peter Jones you will see not only a wider range of brands supported in the Linux Vendor Firmware Service in Fedora Workstation 28, but also a wider range of device classes.

Server side GL Vendor Neutral Dispatch
This is a bit of a technical detail, but Adam Jackson and Lyude Paul have been working hard this cycle on getting what we call server-side GLVND ready for Fedora Workstation 28. Currently we are looking at enabling it either as a zero-day update or shortly afterwards. So what is server-side GLVND, you say? Well, it is basically the last missing piece we need to enable the use of the NVidia binary driver through XWayland. Currently the NVidia driver works with Wayland-native OpenGL applications, but if you are trying to run an OpenGL application that requires X we need this to support it. And to be clear, once we ship this in Fedora Workstation 28 it will also require a driver update from NVidia to use it, so us shipping it is just step 1 here. We do also expect there to be some need for further tuning once we get all the pieces released in order to get top-notch performance. Of course over time we hope and expect all applications to become Wayland-native, but this is a crucial transition technology for many of our users. Of course if you are using Intel or AMD graphics with the Mesa drivers things already work great and this change will not affect you in any way.

Flatpak
Flatpaks basically already work, but we have kept our focus this time around on fleshing out the story in terms of the so-called Portals. Portals are essentially how applications are meant to interact with things outside of the container on your desktop. Jan Grulich has put in a lot of great effort making sure we get portal support for Qt and KDE applications, most recently by adding support for the screen capture portal on top of PipeWire. You can read more about that on Jan Grulich's blog. He is now focusing on getting the printing portal working with Qt.

Wim Taymans has also kept going full steam ahead on PipeWire, which is critical for us to enable applications dealing with cameras and similar devices on your system to be containerized. More details on that in my previous blog entry talking specifically about PipeWire.

It is also worth noting that we are working with Canonical engineers to ensure Portals also works with Snappy as we want to ensure that developers have a single set of APIs to target in order to allow their applications to be sandboxed on Linux. Alexander Larsson has already reviewed quite a bit of code from the Snappy developers to that effect.

Performance work
Our engineers have spent significant time looking at various performance and memory improvements since the last release. The main credit for fixing the recently much talked about ‘memory leak’ goes to Georges Basile Stavracas Neto from Endless, but many from our engineering team helped with diagnosing it and also fixed many other smaller issues along the way. More details about the ‘memory leak’ fix can be found in Georges’ blog.

We are not done here though and Alberto Ruiz is organizing a big performance focused hackfest in Cambridge, England in May. We hope to bring together many of our core engineers to work with other members of the community to look at possible improvements. The Raspberry Pi will be the main target, but of course most improvements we do to make GNOME Shell run better on a Raspberry Pi also means improvements for normal x86 systems too.

Laptop Battery life
In our efforts to make Linux even better on laptops, Hans de Goede spent a lot of time figuring out things we could do to make Fedora Workstation 28 have better battery life. How valuable these changes are will of course depend on your exact hardware, but I expect more or less everyone to get somewhat better battery life on Fedora Workstation 28, and for some it could be a lot better. You can read a bit more about these changes on Hans de Goede's blog.

by uraeus at April 24, 2018 05:15 PM

April 23, 2018

Sebastian DrögeGLib/GIO async operations and Rust futures + async/await

(Sebastian Dröge)

Unfortunately I was not able to attend the Rust+GNOME hackfest in Madrid last week, but I could at least spend some of my work time at Centricular on implementing one of the things I wanted to work on during the hackfest. The other one, more closely related to the gnome-class work, will be the topic of a future blog post once I actually have something to show.

So back to the topic. With the latest GIT version of the Rust bindings for GLib, GTK, etc it is now possible to make use of the Rust futures infrastructure for GIO async operations and various other functions. This should make writing of GNOME, and in general GLib-using, applications in Rust quite a bit more convenient.

For the impatient, the summary is that you can use Rust futures with GLib and GIO now, that it works both on the stable and nightly version of the compiler, and with the nightly version of the compiler it is also possible to use async/await. An example with the latter can be found here, and an example just using futures without async/await here.

Table of Contents

  1. Futures
    1. Futures in Rust
    2. Async/Await
    3. Tokio
  2. Futures & GLib/GIO
    1. Callbacks
    2. GLib Futures
    3. GIO Asynchronous Operations
    4. Async/Await
  3. The Future

Futures

First of all, what are futures and how do they work in Rust? In a few words, a future (also called a promise elsewhere) is a value that represents the result of an asynchronous operation, e.g. establishing a TCP connection. The operation itself (usually) runs in the background, and only once the operation is finished (or fails) does the future resolve to the result of that operation. There are all kinds of ways to combine futures, e.g. to execute some other (potentially async) code with the result once the first operation has finished.

It’s a concept that is also widely used in various other programming languages (e.g. C#, JavaScript, Python, …) for asynchronous programming and can probably be considered a proven concept at this point.

Futures in Rust

In Rust, a future is basically an implementation of a relatively simple trait called Future. The following is the definition as of now, but there are currently discussions to change/simplify/generalize it and to also move it to the Rust standard library:

pub trait Future {
    type Item;
    type Error;

    fn poll(&mut self, cx: &mut task::Context) -> Poll<Self::Item, Self::Error>;
}

Anything that implements this trait can be considered an asynchronous operation that resolves to either an Item or an Error. Consumers of the future would call the poll method to check if the future has resolved already (to a result or error), or if the future is not ready yet. In the latter case, the future itself will, at a later point once it is ready to proceed, notify the consumer about that. It gets a way to send such notifications from the Context that is passed in, and proceeding does not necessarily mean that the future will resolve after this; it could also just advance its internal state closer to the final resolution.
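To make that poll contract a bit more concrete, here is a minimal hand-written future. This is only a sketch, assuming the futures 0.2-style API of the trait above, where Poll<Self::Item, Self::Error> is an alias for Result<Async<Self::Item>, Self::Error>; it resolves immediately instead of waiting for a notification.

extern crate futures;

use futures::prelude::*;
use futures::task;

// A trivial future that is immediately ready with a stored value.
struct Ready(Option<u32>);

impl Future for Ready {
    type Item = u32;
    type Error = ();

    fn poll(&mut self, _cx: &mut task::Context) -> Poll<u32, ()> {
        // A future that cannot make progress yet would return
        // Ok(Async::Pending) here and use the Context to get woken
        // up again later, once it is able to proceed.
        Ok(Async::Ready(self.0.take().expect("polled after completion")))
    }
}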

Calling poll manually is kind of inconvenient, so generally this is handled by an Executor on which the futures are scheduled and which runs them until their resolution. Equally, it's inconvenient to have to implement that trait directly, so for most common operations there are combinators that can be used on futures to build new futures, usually via closures in one way or another. For example the following would run the passed closure with the successful result of the future, and then have it return another future (Ok(()) is converted via IntoFuture to the future that always resolves successfully with ()), and it also maps any errors to ().

fn our_future() -> impl Future<Item = (), Error = ()> {
    some_future
        .and_then(|res| {
            do_something(res);
            Ok(())
        })
        .map_err(|_| ())
}

A future represents only a single value, but there is also a trait for something producing multiple values: a Stream. For more details, best to check the documentation.
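For completeness, and assuming the same futures version as above, the Stream trait looks roughly like this: it is essentially Future with a poll_next() method that can yield a whole sequence of items, terminated by None.

pub trait Stream {
    type Item;
    type Error;

    // Resolves to Some(item) for every produced value, and to None
    // once the stream has finished.
    fn poll_next(&mut self, cx: &mut task::Context) -> Poll<Option<Self::Item>, Self::Error>;
}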

Async/Await

The above way of combining futures via combinators and closures is still not too great, and is still close to callback hell. In other languages (e.g. C#, JavaScript, Python, …) this was solved by introducing new features to the language: async for declaring futures with normal code flow, and await for suspending execution transparently and resuming at that point in the code with the result of a future.

Of course this was also implemented in Rust. Currently based on procedural macros, but there are discussions to actually move this also directly into the language and standard library.

The above example would look something like the following with the current version of the macros

#[async]
fn our_future() -> Result<(), ()> {
    let res = await!(some_future)
        .map_err(|_| ())?;

    do_something(res);
    Ok(())
}

This looks almost like normal, synchronous code but is internally converted into a future and completely asynchronous.

Unfortunately this is currently only available on the nightly version of Rust until various bits and pieces get stabilized.

Tokio

Most of the time when people talk about futures in Rust, they implicitly also mean Tokio. Tokio is a pure Rust, cross-platform asynchronous IO library and based on the futures abstraction above. It provides a futures executor and various types for asynchronous IO, e.g. sockets and socket streams.

But while Tokio is a great library, we’re not going to use it here and instead implement a futures executor around GLib. And on top of that implement various futures, also around GLib’s sister library GIO, which is providing lots of API for synchronous and asynchronous IO.

Just like all IO operations in Tokio, all GLib/GIO asynchronous operations are dependent on running with their respective event loop (i.e. the futures executor) and while it’s possible to use both in the same process, each operation has to be scheduled on the correct one.

Futures & GLib/GIO

Asynchronous operations and generally everything event related (timeouts, …) are based on callbacks that you have to register, and are running via a GMainLoop that is executing events from a GMainContext. The latter is just something that stores everything that is scheduled and provides API for polling if something is ready to be executed now, while the former does exactly that: executing.

Callbacks

The callback based API is also available via the Rust bindings, and would for example look as follows

glib::timeout_add(20, || {
    do_something_after_20ms();
    glib::Continue(false) // don't call again
});

glib::idle_add(|| {
    do_something_from_the_main_loop();
    glib::Continue(false) // don't call again
});

some_async_operation(|res| {
    match res {
        Err(err) => report_error_somehow(),
        Ok(res) => {
            do_something_with_result(res);
            some_other_async_operation(|res| {
                do_something_with_other_result(res);
            });
        }
    }
});

As can be seen here already, the callback-based approach leads to quite non-linear code and deep indentation due to all the closures. Also error handling becomes quite tricky, because errors somehow have to be handled from a completely different call stack.

Compared to C this is still far more convenient due to actually having closures that can capture their environment, but we can definitely do better in Rust.

The above code also assumes that somewhere a main loop is running on the default main context, which could be achieved with the following e.g. inside main()

let ctx = glib::MainContext::default();
let l = glib::MainLoop::new(Some(&ctx), false);
ctx.push_thread_default();

// All operations here would be scheduled on this main context
do_things(&l);

// Run everything until someone calls l.quit()
l.run();
ctx.pop_thread_default();

It is also possible to explicitly select for various operations on which main context they should run, but that’s just a minor detail.

GLib Futures

To make this situation a bit nicer, I’ve implemented support for futures in the Rust bindings. This means that the GLib MainContext is now a futures executor (and arbitrary futures can be scheduled on it), all the GSource related operations in GLib (timeouts, UNIX signals, …) have futures- or stream-based variants, and all the GIO asynchronous operations also come with futures variants now. The latter are autogenerated with the gir bindings code generator.

For enabling usage of this, the futures feature of the glib and gio crates has to be enabled, but that’s about it. It is currently still hidden behind a feature gate because the futures infrastructure is still going to go through some API-incompatible changes in the near future.

So let’s take a look at how to use it. First of all, setting up the main context and executing a trivial future on it

let c = glib::MainContext::default();
let l = glib::MainLoop::new(Some(&c), false);

c.push_thread_default();

// Spawn a future that is called from the main context
// and after printing something just quits the main loop
let l_clone = l.clone();
c.spawn(futures::lazy(move |_| {
    println!("we're called from the main context");
    l_clone.quit();
    Ok(())
}));

l.run();

c.pop_thread_default();

Apart from spawn(), there is also a spawn_local(). The former can be called from any thread but requires the future to implement the Send trait (that is, it must be safe to send it to other threads) while the latter can only be called from the thread that owns the main context, but it allows any kind of future to be spawned. In addition there is also a block_on() function on the main context, which allows running non-static futures up to their completion and returns their result. The spawn functions only work with static futures (i.e. they have no references to any stack frame) and require the futures to be infallible and to resolve to ().

The above code already showed one of the advantages of using futures: it is possible to use all generic futures (that don’t require a specific executor), like futures::lazy or the mpsc/oneshot channels, with GLib now. And so can any of the combinators that are available on futures:

let c = MainContext::new();
                                                                                                       
let res = c.block_on(timeout_future(20)
    .and_then(move |_| {
        // Called after 20ms
        Ok(1)
    })
);

assert_eq!(res, Ok(1));

This example also shows the block_on functionality to return an actual value from the future (1 in this case).

GIO Asynchronous Operations

Similarly, all asynchronous GIO operations are now available as futures. For example to open a file asynchronously and getting a gio::InputStream to read from, the following could be done

let file = gio::File::new_for_path("Cargo.toml");

let l_clone = l.clone();
c.spawn_local(
    // Try to open the file
    file.read_async_future(glib::PRIORITY_DEFAULT)
        .map_err(|(_file, err)| {
            format!("Failed to open file: {}", err)
        })
        .and_then(move |(_file, strm)| {
            // Here we could now read from the stream, but
            // instead we just quit the main loop
            l_clone.quit();

            Ok(())
        })
);

A bigger example can be found in the gtk-rs examples repository here. This example is basically reading a file asynchronously in 64 byte chunks and printing it to stdout, then closing the file.

In the same way, network operations or any other asynchronous operation can be handled via futures now.

Async/Await

Compared to a callback-based approach, that bigger example is already a lot nicer but still quite heavy to read. With the async/await extension that I mentioned above already, the code looks much nicer in comparison and really almost like synchronous code. Except that it is not synchronous.

#[async]
fn read_file(file: gio::File) -> Result<(), String> {
    // Try to open the file
    let (_file, strm) = await!(file.read_async_future(glib::PRIORITY_DEFAULT))
        .map_err(|(_file, err)| format!("Failed to open file: {}", err))?;

    Ok(())
}

fn main() {
    [...]
    let future = async_block! {
        match await!(read_file(file)) {
            Ok(()) => (),
            Err(err) => eprintln!("Got error: {}", err),
        }
        l_clone.quit();
        Ok(())
    };

    c.spawn_local(future);
    [...]
}

For compiling this code, the futures-nightly feature has to be enabled for the glib crate, and a nightly compiler must be used.

The bigger example from before with async/await can be found here.

With this we’re already very close in Rust to having the same convenience as in other languages with asynchronous programming. And also it is very similar to what is possible in Vala with GIO asynchronous operations.

The Future

For now this is all finished and available from GIT of the glib and gio crates. This will have to be updated in the future whenever the futures API changes, but it is planned to stabilize all this in Rust by the end of this year.

In the future it might also make sense to add futures variants for all the GObject signal handlers, so that e.g. handling a click on a GTK+ button could be done similarly from a future (or rather from a Stream, as a signal can be emitted multiple times). Whether this is in the end more convenient than the callback-based approach that is currently used remains to be seen. Some experimentation would be necessary here. Also how to handle return values of signal handlers would have to be figured out.

by slomo at April 23, 2018 08:46 AM

April 17, 2018

Víctor JáquezHow to setup a gst-build environment with Intel’s VA-API stack

gst-build is far superior to the gst-uninstalled scripts for developing GStreamer, mainly because of its use of meson and ninja. Nonetheless, integrating external dependencies is not as easy as in gst-uninstalled.

This guide aims to show how to integrate GStreamer-VAAPI dependencies, in this case with the Intel VA-API driver.

Dependencies

For now we will need meson master, since this patch is required. The pull request is already merged but has not been released yet.

Installation

clone gst-build repository

$ git clone git://anongit.freedesktop.org/gstreamer/gst-build
$ cd gst-build

apply custom patch for gst-build

The patch will add the repositories for libva and intel-vaapi-driver.

$ wget https://people.igalia.com/vjaquez/gst-build-vaapi/0001-add-libva-and-intel-vaapi-driver-as-subprojects.patch
$ git am 0001-add-libva-and-intel-vaapi-driver-as-subprojects.patch

0001-add-libva-and-intel-vaapi-driver-as-subprojects.patch

configure

Running this command, all dependency repositories will be cloned, symbolic links created, and the build directory configured.

$ meson build

apply custom patches for libva, intel-vaapi-driver and gstreamer-vaapi

libva

This patch is required since the uninstalled paths of the header files don’t match the ones in the “include” directives.

$ cd libva
$ wget https://people.igalia.com/vjaquez/gst-build-vaapi/0001-build-add-headers-for-uninstalled-setup.patch
$ git am 0001-build-add-headers-for-uninstalled-setup.patch
$ cd -

0001-build-add-headers-for-uninstalled-setup.patch

intel-vaapi-driver

The patch handles libva dependency as a subproject.

$ cd intel-vaapi-driver
$ wget https://people.igalia.com/vjaquez/gst-build-vaapi/0001-meson-support-libva-as-subproject.patch
$ git am 0001-meson-support-libva-as-subproject.patch
$ cd -

0001-meson-support-libva-as-subproject.patch

gstreamer-vaapi

Note to myself: this patch must be split and merged in upstream.

$ cd gstreamer-vaapi
$ wget https://people.igalia.com/vjaquez/gst-build-vaapi/0001-build-meson-libva-gst-uninstall-friendly.patch
$ git am 0001-build-meson-libva-gst-uninstall-friendly.patch
$ cd -

0001-build-meson-libva-gst-uninstall-friendly.patch updated: 2018/04/24

build

$ ninja -C build

And wait a couple of minutes.

run uninstalled environment for testing

$ ninja -C build uninstalled
[gst-master] $ gst-inspect-1.0 vaapi
Plugin Details:
  Name                     vaapi
  Description              VA-API based elements
  Filename                 /opt/gst/gst-build/build/subprojects/gstreamer-vaapi/gst/vaapi/libgstvaapi.so
  Version                  1.15.0.1
  License                  LGPL
  Source module            gstreamer-vaapi
  Binary package           gstreamer-vaapi
  Origin URL               http://bugzilla.gnome.org/enter_bug.cgi?product=GStreamer

  vaapih264enc: VA-API H264 encoder
  vaapimpeg2enc: VA-API MPEG-2 encoder
  vaapisink: VA-API sink
  vaapidecodebin: VA-API Decode Bin
  vaapipostproc: VA-API video postprocessing
  vaapivc1dec: VA-API VC1 decoder
  vaapih264dec: VA-API H264 decoder
  vaapimpeg2dec: VA-API MPEG2 decoder
  vaapijpegdec: VA-API JPEG decoder

  9 features:
  +-- 9 elements

by vjaquez at April 17, 2018 02:43 PM

April 11, 2018

Nirbheek ChauhanA simple method of measuring audio latency

In my previous blog post, I talked about how I improved the latency of GStreamer's default audio capture and render elements on Windows.

An important part of any such work is a way to accurately measure the latencies in your audio path.

Ideally, one would use a mechanism that can track your buffers and give you a detailed breakdown of how much latency each component of your system adds. For instance, with an audio pipeline like this:

audio-capture → filter1 → filter2 → filter3 → audio-output

If you use GStreamer, you can use the latency tracer to measure how much latency filter1 adds, filter2 adds, and so on.

However, sometimes you need to measure latencies added by components outside of your control, for instance the audio APIs provided by the operating system, the audio drivers, or even the hardware itself. In that case it's really difficult, bordering on impossible, to do an automated breakdown.

But we do need some way of measuring those latencies, and I needed that for the aforementioned work. Maybe we can get an aggregated (total) number?

There's a simple way to do that if we can create a loopback connection in the audio setup. What's a loopback you ask?

Ouroboros snake biting its tail

Essentially, if we can redirect the audio output back to the audio input, that's called a loopback. The simplest way to do this is to connect the speaker-out/line-out to the microphone-in/line-in with a two-sided 3.5mm jack.

photo of male-to-male 3.5mm jack connecting speaker-out to mic-in

Now, when we send an audio wave down to the audio output, it'll show up on the audio input.

Hmm, what if we store the current time when we send the wave out, and compare it with the current time when we get it back? Well, that's the total end-to-end latency!

If we send out a wave periodically, we can measure the latency continuously, even as things are switched around or the pipeline is dynamically reconfigured.

Some of you may notice that this is somewhat similar to how the `ping` command measures latencies across the Internet.

screenshot of ping to 192.168.1.1


Just like a network connection, the loopback connection can be lossy or noisy, f.ex. if you use loudspeakers and a microphone instead of a wire, or if you have (ugh) noise in your line. But unlike network packets, we lose all context once the waves leave our pipeline and we have no way of uniquely identifying each wave.

So the simplest reliable implementation is to have only one wave traveling down the pipeline at a time. If we send a wave out, say, once a second, we can wait about one second for it to show up, and otherwise presume that it was lost.
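In code, the core of such a measurement is tiny. The following is a simplified Rust sketch of the idea, not the actual audiolatency element; send_wave() and wave_detected() are hypothetical stand-ins for pushing a tone to the output and spotting it on the captured input.

use std::thread;
use std::time::{Duration, Instant};

fn send_wave() { /* hypothetical: push a short tone to the audio output */ }
fn wave_detected() -> bool { /* hypothetical: look for that tone on the input */ false }

// Send one wave, then wait (up to `timeout`) for it to come back over the
// loopback. The round-trip time is the total end-to-end latency.
fn measure_once(timeout: Duration) -> Option<Duration> {
    let sent_at = Instant::now();
    send_wave();

    while sent_at.elapsed() < timeout {
        if wave_detected() {
            return Some(sent_at.elapsed());
        }
        thread::sleep(Duration::from_millis(1));
    }
    None // the wave was lost (noise, dropouts, ...), try again with the next one
}

fn main() {
    // One wave per second, waiting about a second for each one, just like
    // the element described above.
    loop {
        match measure_once(Duration::from_secs(1)) {
            Some(latency) => println!("end-to-end latency: {:?}", latency),
            None => println!("wave lost, retrying"),
        }
    }
}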

That's exactly how the audiolatency GStreamer plugin that I wrote works! Here you can see its output while measuring the combined latency of the WASAPI source and sink elements:


The first measurement will always be wrong because of various implementation details in the audio stack, but the next measurements should all be correct.

This mechanism does place an upper bound on the latency that we can measure, and on how often we can measure it, but it should be possible to take more frequent measurements by sending a new wave as soon as the previous one was received (with a 1 second timeout). So this is an enhancement that can be done if people need this feature.

Hope you find the element useful; go forth and measure!

by Nirbheek (noreply@blogger.com) at April 11, 2018 02:13 PM

April 05, 2018

Sebastian DrögeImproving GStreamer performance on a high number of network streams by sharing threads between elements with Rust’s tokio crate

(Sebastian Dröge)

For one of our customers at Centricular we were working on a quite interesting project. Their use-case was basically to receive an as-high-as-possible number of audio RTP streams over UDP, transcode them, and then send them out via UDP again. Due to how GStreamer usually works, they were running into some performance issues.

This blog post will describe the first set of improvements that were implemented for this use-case, together with a minimal benchmark and the results. My colleague Mathieu will follow up with one or two other blog posts with the other improvements and a more full-featured benchmark.

The short version is that CPU usage decreased by about 65-75%, i.e. allowing 3-4x more streams with the same CPU usage. Also parallelization works better and usage of different CPU cores is more controllable, allowing for better scalability. And a fixed, but configurable number of threads is used, which is independent of the number of streams.

The code for this blog post can be found here.

Table of Contents

  1. GStreamer & Threads
  2. Thread-Sharing GStreamer Elements
  3. Available Elements
  4. Little Benchmark
  5. Conclusion

GStreamer & Threads

In GStreamer, by default each source runs from its own OS thread. Additionally, for receiving/sending RTP, there will be another thread in the RTP jitterbuffer, yet another thread for receiving RTCP (another source) and a last thread for sending RTCP at the right times. And RTCP has to be received and sent for both the receiver and the sender side of the pipeline, so the number of threads doubles. In sum this gives at least 1 + 1 + (1 + 1) * 2 = 6 threads per RTP stream in this scenario. In a normal audio scenario, there will be one packet received/sent e.g. every 20ms on each stream, and every now and then an RTCP packet. So most of the time all these threads are only waiting.

Apart from the obvious waste of OS resources (1000 streams would be 6000 threads), this also brings down performance as threads are being woken up all the time. This means that context switches have to happen basically all the time.

To solve this we implemented a mechanism to share threads, and as a result we now have a fixed, but configurable, number of threads that is independent of the number of streams. It can run e.g. 500 streams just fine on a single thread with a single core, which was completely impossible before. In addition we also did some work to reduce the number of allocations for each packet, so that after startup no additional allocations happen per packet anymore for buffers. See Mathieu’s upcoming blog post for details.

In this blog post, I’m going to write about a generic mechanism for sources, queues and similar elements to share their threads between each other. For the RTP related bits (RTP jitterbuffer and RTCP timer) this was not used due to reuse of existing C codebases.

Thread-Sharing GStreamer Elements

The code in question can be found here, a small benchmark is in the examples directory and it is going to be used for the results later. A full-featured benchmark will come in Mathieu’s blog post.

This is a new GStreamer plugin, written in Rust around the Tokio crate, which provides asynchronous IO and generally acts as a “task scheduler”.

While this could certainly also have been written in C around something like libuv, doing this kind of work in Rust is simply more productive and fun due to its safety guarantees and the strong type system, which definitely reduced the amount of debugging a lot. In addition, “modern” language features like closures make working with futures much more ergonomic.

When using these elements it is important to have full control over the pipeline and its elements, and the dataflow inside the pipeline has to be carefully considered to properly configure how to share threads. For example the following two restrictions should be kept in mind all the time:

  1. Downstream of such an element, the streaming thread must never ever block for considerable amounts of time. Otherwise all other elements inside the same thread-group would be blocked too, even if they could do any work now
  2. This generally all works better in live pipelines, where media is produced in real-time and not as fast as possible

Available Elements

So this repository currently contains the generic infrastructure (see the src/iocontext.rs source file) and a couple of elements:

  • an UDP source: ts-udpsrc, a replacement for udpsrc
  • an app source: ts-appsrc, a replacement for appsrc to inject packets into the pipeline from the application
  • a queue: ts-queue, a replacement for queue that is useful for adding buffering to a pipeline part. The upstream side of the queue will block if not called from another thread-sharing element, but if called from another thread-sharing element it will pause the current task asynchronously. That is, stop the upstream task from producing more data.
  • a proxysink/src element: ts-proxysrc, ts-proxysink, replacements for proxysink/proxysrc for connecting two pipelines with each other. This basically works like the queue, but split into two elements.
  • a tone generator source around spandsp: ts-tonesrc, a replacement for tonegeneratesrc. This also contains some minimal FFI bindings for that part of the spandsp C library.

All these elements have more or less the same API as their non-thread-sharing counterparts.

API-wise, each of these elements has a set of properties for controlling how it is sharing threads with other elements, and with which elements:

  • context: A string that defines in which group this element is. All elements with the same context are running on the same thread or group of threads,
  • context-threads: Number of threads to use in this context. -1 means exactly one thread, 1 and above use N+1 threads (1 thread for polling fds, N worker threads) and 0 sets N to the number of available CPU cores. As long as no considerable work is done in these threads, -1 has shown to be the most efficient. See also this tokio GitHub issue
  • context-wait: Number of milliseconds that the threads will wait on each iteration. This allows reducing CPU usage even further by handling all events/packets that arrived during that timespan at once, instead of waking up the thread every time a little event happens, thus reducing context switches again

The elements are all pushing data downstream from a tokio thread whenever data is available, assuming that downstream does not block. If downstream is another thread-sharing element and it would have to block (e.g. a full queue), it instead returns a new future to upstream so that upstream can asynchronously wait on that future before producing more output. By this, back-pressure is implemented between different GStreamer elements without ever blocking any of the tokio threads. All this is implemented around the normal GStreamer data-flow mechanisms, there is no “tokio fast-path” between elements.
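To illustrate the back-pressure idea, here is a much-simplified conceptual sketch, not the actual plugin code: a queue that, when full, hands a future back to upstream instead of blocking the shared thread. It assumes the oneshot channel from the futures 0.1 crate; the real elements do this through the normal GStreamer data-flow mechanisms mentioned above.

extern crate futures;

use futures::sync::oneshot;
use std::collections::VecDeque;

// What upstream gets back when it pushes a buffer into the queue.
enum PushResult {
    Pushed,
    // Queue is full: upstream keeps the buffer, waits on this future and
    // retries once it resolves, instead of blocking its (shared) thread.
    Full(oneshot::Receiver<()>),
}

struct Queue {
    items: VecDeque<u32>,
    capacity: usize,
    pending: Option<oneshot::Sender<()>>,
}

impl Queue {
    fn push(&mut self, item: u32) -> PushResult {
        if self.items.len() < self.capacity {
            self.items.push_back(item);
            PushResult::Pushed
        } else {
            let (tx, rx) = oneshot::channel();
            self.pending = Some(tx);
            PushResult::Full(rx)
        }
    }

    fn pop(&mut self) -> Option<u32> {
        let item = self.items.pop_front();
        // There is space again: resolve the future upstream is waiting on.
        if let Some(tx) = self.pending.take() {
            let _ = tx.send(());
        }
        item
    }
}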

Little Benchmark

As mentioned above, there’s a small benchmark application in the examples directory. This basically sets up a configurable number of streams and directly connects them to a fakesink, throwing away all packets. Additionally there is another thread that is sending all these packets. As such, this is really the most basic benchmark and not very realistic but nonetheless it shows the same performance improvement as the real application. Again, see Mathieu’s upcoming blog post for a more realistic and complete benchmark.

When running it, make sure that your user can create enough fds. The benchmark will just abort if not enough fds can be allocated. You can control this with ulimit -n SOME_NUMBER, and allowing a couple of thousands is generally a good idea. The benchmarks below were running with 10000.

After running cargo build --release to build the plugin itself, you can run the benchmark with:

cargo run --release --example udpsrc-benchmark -- 1000 ts-udpsrc -1 1 20

and in another shell the UDP sender with

cargo run --release --example udpsrc-benchmark-sender -- 1000

This runs 1000 streams, uses ts-udpsrc (the alternative would be udpsrc), and configures exactly one thread (-1), 1 context, and a wait time of 20ms. See above for what these settings mean. You can check CPU usage with e.g. top. Testing was done on an Intel i7-4790K, with Rust 1.25 and GStreamer 1.14. One packet is sent every 20ms for each stream.

Source     Streams  Threads  Contexts  Wait  CPU
udpsrc     1000     1000     x         x     44%
ts-udpsrc  1000     -1       1         0     18%
ts-udpsrc  1000     -1       1         20    13%
ts-udpsrc  1000     -1       2         20    15%
ts-udpsrc  1000     2        1         20    16%
ts-udpsrc  1000     2        2         20    27%

Source     Streams  Threads  Contexts  Wait  CPU
udpsrc     2000     2000     x         x     95%
ts-udpsrc  2000     -1       1         20    29%
ts-udpsrc  2000     -1       2         20    31%

Source     Streams  Threads  Contexts  Wait  CPU
ts-udpsrc  3000     -1       1         20    36%
ts-udpsrc  3000     -1       2         20    47%

Results for 3000 streams for the old udpsrc are not included as starting up that many threads needs too long.

The best configuration is apparently a single thread per context (see this tokio GitHub issue) and waiting 20ms on every iteration. Compared to the old udpsrc, CPU usage is about one third in that setting, and generally it seems to parallelize well. It’s not clear to me why the last test needs 11% more CPU with two contexts, while in every other test the number of contexts does not really make a difference, nor does it for that many streams in the real test-case.

The waiting does not reduce CPU usage a lot in this benchmark, but on the real test-case it does. The reason is most likely that this benchmark basically sends all packets at once, then waits for the remaining time, then sends the next packets.

Take these numbers with caution, the real test-case in Mathieu’s blog post will show the improvements in the bigger picture, where it was generally a quarter of CPU usage and almost perfect parallelization when increasing the number of contexts.

Conclusion

Generally this was a fun exercise and we’re quite happy with the results, especially the real-world results. It took me some time to understand how tokio works internally so that I could implement all kinds of customizations on top of it, but for normal usage of tokio that should not be required, and the overall design makes a lot of sense to me, as well as the way futures are implemented in Rust. It requires some learning and understanding of how exactly the API can be used and behaves, but once that point is reached it seems like a very productive and performant solution for asynchronous IO. And modelling asynchronous IO problems based on the Rust-style futures seems a nice and intuitive fit.

The performance measurements also showed that GStreamer’s default usage of threads is not always optimal, and a model like in upipe or pipewire (or rather SPA) can provide better performance. But as this also shows, it is possible to implement something like this on top of GStreamer and for the common case, using threads like in GStreamer reduces the cognitive load on the developer a lot.

For a future version of GStreamer, I don’t think we should make the threading “manual” like in these two other projects, but instead provide some API additions that make it nicer to implement thread-sharing elements and to add ways in the GStreamer core to make streaming threads non-blocking. All this can be implemented already, but it could be nicer.

All this “only” improved the number of threads, and thus the threading and context switching overhead. Many other optimizations in other areas are still possible on top of this, for example optimizing receive performance and reducing the number of memory copies inside the pipeline even further. If that’s something you would be interested in, feel free to get in touch.

And with that: Read Mathieu’s upcoming blog posts about the other parts, RTP jitterbuffer / RTCP timer thread sharing, and no allocations, and the full benchmark.

by slomo at April 05, 2018 03:21 PM

Nirbheek ChauhanLatency in Digital Audio

We've come a long way since Alexander Graham Bell, and everything's turned digital.

Compared to analog audio, digital audio processing is extremely versatile, is much easier to design and implement than analog processing, and also adds effectively zero noise along the way. With rising computing power and dropping costs, every operating system has had drivers, engines, and libraries to record, process, playback, transmit, and store audio for over 20 years.

Today we'll talk about the some of the differences between analog and digital audio, and how the widespread use of digital audio adds a new challenge: latency.

Analog vs Digital


Analog data flows like water through an empty pipe. You open the tap, and the time it takes for the first drop of water to reach you is the latency. When analog audio is transmitted through, say, an RCA cable, the transmission happens at the speed of electricity and your latency is:

wire length/speed of electricity

This number is ridiculously small, especially when compared to the speed of sound. An electrical signal takes 0.001 milliseconds to travel 300 metres (984 feet). Sound takes 874 milliseconds (almost a second).

All analog effects and filters obey similar equations. If you're using, say, an analog pedal with an electric guitar, the signal is transformed continuously by an electrical circuit, so the latency is a function of the wire length (plus capacitors/transistors/etc), and is almost always negligible.

Digital audio is transmitted in "packets" (buffers) of a particular size, like a bucket brigade, but at the speed of electricity. Since the real world is analog, this means to record audio, you must use an Analog-Digital Converter. The ADC quantizes the signal into digital measurements (samples), packs multiple samples into a buffer, and sends it forward. This means your latency is now:

(wire length/speed of electricity) + buffer size

We saw above that the first part is insignificant, what about the second part?

Latency is measured in time, but buffer size is measured in bytes. For 16-bit integer audio, each measurement (sample) is stored as a 16-bit integer, which is 2 bytes. That's the theoretical lower limit on the buffer size. The sample rate defines how often measurements are made, and these days, is usually 48KHz. This means each sample contains ~0.021ms of audio. To go lower, we need to increase the sample rate to 96KHz or 192KHz.

However, when general-purpose computers are involved, the buffer size is almost never lower than 32 bytes, and is usually 128 bytes or larger. For single-channel 16-bit integer audio at 48KHz, a 32 byte buffer is 0.33ms, and a 128 byte buffer is 1.33ms. This is our buffer size and hence the base latency while recording (or playing) digital audio.
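To make that arithmetic concrete, here is a tiny Rust helper that converts a buffer size into latency for the single-channel, 16-bit integer, 48KHz case used throughout this post:

// Latency contributed by one buffer of single-channel 16-bit integer audio.
fn buffer_latency_ms(buffer_bytes: u32, sample_rate_hz: u32) -> f64 {
    let bytes_per_sample = 2; // 16-bit mono
    let samples = buffer_bytes / bytes_per_sample;
    samples as f64 * 1000.0 / sample_rate_hz as f64
}

fn main() {
    println!("{:.3} ms", buffer_latency_ms(2, 48_000));   // one sample, ~0.021 ms
    println!("{:.2} ms", buffer_latency_ms(32, 48_000));  // ~0.33 ms
    println!("{:.2} ms", buffer_latency_ms(128, 48_000)); // ~1.33 ms
}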

Digital effects operate on individual buffers, and will add an additional amount of latency depending on the delay added by the CPU processing required by the effect. Such effects may also add latency if the algorithm used requires that, but that's the same with analog effects.

The Digital Age


So everyone's using digital. But isn't 1.33ms a lot of additional latency?

It might seem that way till you think about it in real-world terms. Sound travels less than half a meter (1½ feet) in that time, and that sort of delay is completely unnoticeable by humans; otherwise we'd notice people's lips moving before we heard their words.

In fact, 1.33ms is too small for the majority of audio applications!

To process such small buffer sizes, you'd have to wake the CPU up 750 times a second, just for audio. This is highly inefficient and wastes a lot of power. You really don't want that on your phone or your laptop, and it is completely unnecessary in most cases anyway.

For instance, your music player will usually use a buffer size of ~200ms, which is just 5 CPU wakeups per second. Note that this doesn't mean that you will hear sound 200ms after hitting "play". The audio player will just send 200ms of audio to the sound card at once, and playback will begin immediately.

Of course, you can't do that with live playback such as video calls—you can't "read-ahead" data you don't have. You'd have to invent a time machine first. As a result, apps that use real-time communication have to use smaller buffer sizes because that directly affects the latency of live playback.

That brings us back to efficiency. These apps also need to conserve power, and 1.33ms buffers are really wasteful. Most consumer apps that require low latency use 10-15ms buffers, and that's good enough for things like voice/video calling, video games, notification sounds, and so on.

Ultra Low Latency


There's one category left: musicians, sound engineers, and other folk that work in the pro-audio business. For them, 10ms of latency is much too high!

You usually can't notice a 10ms delay between an event and the sound for it, but when making music, you can hear it when two instruments are out-of-sync by 10ms or if the sound for an instrument you're playing is delayed. Instruments such as drum snare are more susceptible to this problem than others, which is why the stage monitors used in live concerts must not add any latency.

The standard in the music business is to use buffers that are 5ms or lower, down to the 0.33ms number that we talked about above.

Power consumption is absolutely no concern, and the real problems are the accumulation of small amounts of latencies everywhere in your stack, and ensuring that you're able to read buffers from the hardware or write buffers to the hardware fast enough.

Let's say you're using an app on your computer to apply digital effects to a guitar that you're playing. This involves capturing audio from the line-in port, sending it to the application for processing, and playing it from the sound card to your amp.

The latency while capturing and outputting audio are both multiples of the buffer size, so it adds up very quickly. The effects app itself will also add a variable amount of latency, and at 1.33ms buffer sizes you will find yourself quickly approaching a 10ms latency from line-in to amp-out. The only way to lower this is to use a smaller buffer size, which is precisely what pro-audio hardware and software enables.

The second problem is that of CPU scheduling. You need to ensure that the threads that are fetching/sending audio data to the hardware and processing the audio have the highest priority, so that nothing else will steal CPU-time away from them and cause glitching due to buffers arriving late.

This gets harder as you lower the buffer size because the audio stack has to do more work for each bit of audio. The fact that we're doing this on a general-purpose operating system makes it even harder, and requires implementing real-time scheduling features across several layers. But that's a story for another time!

I hope you found this dive into digital audio interesting! My next post will be about my journey in implementing ultra low latency capture and render on Windows in the WASAPI plugin for GStreamer. This was already possible on Linux with the JACK GStreamer plugin and on macOS with the CoreAudio GStreamer plugin, so it will be interesting to see how the same problems are solved on Windows. Tune in!

by Nirbheek (noreply@blogger.com) at April 05, 2018 04:04 AM

March 28, 2018

GStreamerGStreamer 1.12.5 old-stable bugfix release

(GStreamer)

The GStreamer team is pleased to announce the fifth and likely last bugfix release in the old stable 1.12 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.12.x.

The 1.12 stable series is now superseded by the 1.14 stable series, and 1.12.5 will likely be the last bugfix release in the 1.12 series.

See /releases/1.12/ for the details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Download tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-vaapi, or gst-omx.

March 28, 2018 11:30 PM

Víctor JáquezGStreamer VA-API Troubleshooting

GStreamer VA-API is not a trivial piece of software. Even though, in my opinion, it is a bit over-engineered, the complexity lies in its layered architecture: the user must troubleshoot in which layer the failure is.

So, bear in mind this architecture:

libva architecture

And the point of failure could be anywhere.

Drivers

libva is a library designed to load another library called driver or back-end. This driver is responsible to talk with the kernel, windowing platform, memory handling library, or any other piece of software or hardware that actually will do the video processing.

There are many drivers in the wild. As it is an API aiming to stateless video processing, and the industry is moving towards that way to process video, it is expected more drivers would appear in the future.

Nonetheless, not all the drivers have the same level of maturity, and some of them are abandon-ware. For this reason we decided, some time ago, to add to GStreamer VA-API a white list of functional drivers: basically, those developed by Mesa3D and this one from Intel™. If you wish to disable that white-list, you can do it by setting an environment variable:

$ export GST_VAAPI_ALL_DRIVERS=1

Remember, if you set it, you are on your own, since we do not trust on the maturity of that driver yet.

Internal libva↔driver version

Thus, there is an internal API between libva and the driver, and it is versioned, meaning that the internal API version of the installed libva library must match the internal API version exposed by the driver. One of the reasons libva may fail to initialize a driver is that these internal API versions do not match.

Drivers path and driver name

By default there is a path where libva looks for drivers to load. That path is defined at compilation time. Following Debian’s file-system hierarchy standard (FHS) it should be set by distributions in /usr/lib/x86_64-linux-gnu/dri/. But the user can control this path with an environment variable:

$ export LIBVA_DRIVERS_PATH=${HOME}/src/intel-vaapi-driver/src/.libs

The driver path, as a directory, might contain several drivers. libva will try to guess the correct one by querying the instantiated VA display (which could be either KMS/DRM, Wayland, Android or X11). If the user instantiates a VA display different from the running environment, the guess will be erroneous and the library loading will fail.

Although, there is a way for the user to set the driver’s name too. Again, by setting an environment variable:

$ export LIBVA_DRIVER_NAME=iHD

With this setting, libva will try to load iHD_drv_video.so (a new and experimental open source driver from Intel™, targeted for MediaSDK —do not use it yet with GStreamer VAAPI—).

vainfo

vainfo is the diagnostic tool for VA-API. In a couple of words, it will iterate over a list of VA displays, in a trial-and-error strategy, and try to initialize VA. In case of success, vainfo will report the driver signature, and it will query the driver for the available profiles and entry-points.

For example, my skylake board for development will report

$ vainfo
error: can't connect to X server!
libva info: VA-API version 1.1.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /home/vjaquez/gst/master/intel-vaapi-driver/src/.libs/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_1
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.1 (libva 2.1.1.pre1)
vainfo: Driver version: Intel i965 driver for Intel(R) Skylake - 2.1.1.pre1 (2.1.0-41-g99c3748)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileH264ConstrainedBaseline: VAEntrypointFEI
      VAProfileH264ConstrainedBaseline: VAEntrypointStats
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264Main               : VAEntrypointFEI
      VAProfileH264Main               : VAEntrypointStats
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointFEI
      VAProfileH264High               : VAEntrypointStats
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointEncSlice
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointEncSlice
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice

And my AMD board with stable packages replies:

$ vainfo
libva info: VA-API version 0.40.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_0_40
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.40 (libva )
vainfo: Driver version: mesa gallium vaapi
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileNone                   : VAEntrypointVideoProc

Does this mean that VA-API processes video? No. It means that there is a usable VA display which could open a driver correctly and that libva can extract symbols from it.

I would like to mention another tool, not official, but one I like a lot, since it extracts almost all of the VA information available in the driver: vadumpcaps.c, written by Mark Thompson.

GStreamer VA-API registration

When GStreamer is launched, normally it will register all the available plugins and plugin features (elements, device providers, etc.). All that data is cached and kept until the cache file is deleted or the cache is invalidated by some event.

At registration time, GStreamer VA-API will instantiate a DRM-based VA display, which works without the need of a real display (in other words, headless), and will query the driver for the profile and entry-point tuples, in order to register only the available elements (encoders, decoders, sink, post-processor). If the DRM VA display fails, a list of VA displays will be tried.

If libva cannot load any driver, or the loaded driver is not in the white-list, GStreamer VA-API will not register any element. Otherwise, gst-inspect-1.0 will show the registered elements:

$ gst-inspect-1.0 vaapi
Plugin Details:
  Name                     vaapi
  Description              VA-API based elements
  Filename                 /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstvaapi.so
  Version                  1.12.4
  License                  LGPL
  Source module            gstreamer-vaapi
  Source release date      2017-12-07
  Binary package           gstreamer-vaapi
  Origin URL               http://bugzilla.gnome.org/enter_bug.cgi?product=GStreamer

  vaapijpegdec: VA-API JPEG decoder
  vaapimpeg2dec: VA-API MPEG2 decoder
  vaapih264dec: VA-API H264 decoder
  vaapivc1dec: VA-API VC1 decoder
  vaapivp8dec: VA-API VP8 decoder
  vaapih265dec: VA-API H265 decoder
  vaapipostproc: VA-API video postprocessing
  vaapidecodebin: VA-API Decode Bin
  vaapisink: VA-API sink
  vaapimpeg2enc: VA-API MPEG-2 encoder
  vaapih265enc: VA-API H265 encoder
  vaapijpegenc: VA-API JPEG encoder
  vaapih264enc: VA-API H264 encoder

  13 features:
  +-- 13 elements

Besides the normal behavior, GStreamer VA-API will also invalidate GStreamer’s cache at every boot, or when any of the mentioned environment variables change.

Conclusion

Here is a simple checklist to review when GStreamer VA-API is not working at all:

1. Check your LIBVA_* environment variables
2. Verify that vainfo returns sensible information
3. Invalidate GStreamer’s cache (or just delete the file)
4. Check the output of gst-inspect-1.0 vaapi (see the command sketch below)
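
A minimal shell walk-through of that checklist might look like the following sketch. The driver name, the registry file location and the use of GST_VAAPI_ALL_DRIVERS (which bypasses the white-list) are illustrative assumptions; adjust them to your system:

$ export LIBVA_DRIVERS_PATH=/usr/lib/x86_64-linux-gnu/dri    # 1. where your VA driver lives (assumed path)
$ export LIBVA_DRIVER_NAME=i965                              # 1. force a specific driver (assumed name)
$ vainfo                                                     # 2. sanity-check libva and the driver
$ rm ~/.cache/gstreamer-1.0/registry.x86_64.bin              # 3. delete GStreamer's registry cache (typical location)
$ GST_VAAPI_ALL_DRIVERS=1 gst-inspect-1.0 vaapi              # 4. list the registered VA-API elements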

And, if you decide to file a bug in bugzilla, please do not forget to attach the output of vainfo and the logs if the developer asks for them.

by vjaquez at March 28, 2018 06:11 PM

March 27, 2018

Víctor JáquezGStreamer VA-API 1.14: what’s new?

As you may already know, there is a new release of GStreamer, 1.14. In this blog post we will talk about the new features and improvements of the GStreamer VA-API module, though you can find a more comprehensive list of changes in the release notes.

Most of the topics explained in this blog post are already mentioned in the release notes, but here they are covered in a bit more detail.

DMABuf usage

We have improved DMA-buf usage, mostly downstream.

In the case of upstream, we just got rid of a nasty hack which detected when to instantiate and use a buffer pool in the sink pad with a dma-buf based allocator. This functionality had already been broken for a while, and that code was the wrong way to enable it. The sharing of a dma-buf based buffer pool with upstream is going to be re-enabled after bug 792034 is merged.

For downstream, we have added handling of the memory:DMABuf caps feature. The purpose of this caps feature is to negotiate media when the buffers are not mappable into user space, because of digital rights management or platform restrictions.

For example, intel-vaapi-driver currently doesn’t allow mapping of the dma-buf descriptors it produces. But, as we cannot know whether a back-end produces mappable dma-buf descriptors or not, gstreamer-vaapi creates a dummy buffer when the allocator is instantiated and tries to map it: if that fails, the memory:DMABuf caps feature is negotiated; otherwise, normal video caps are used.
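
A rough way to see what actually gets negotiated on your setup is the verbose flag of gst-launch-1.0, which prints the caps chosen on every pad; whether the memory:DMABuf feature shows up depends on your driver and on the downstream element (the file name below is just a placeholder):

$ gst-launch-1.0 -v filesrc location=sample.h264 ! h264parse ! vaapih264dec ! glimagesink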

VA-API usage

First of all, GStreamer VA-API now supports libva-2.0, which means VA-API 1.0. We had to guard some of the deprecated symbols as well as the new ones. Nowadays most distributions have upgraded to libva-2.0.

We have improved the initialization of the internal VA display structure (GstVaapiDisplay). Previously, if an X-based display was instantiated, it immediately tried to grab the screen resolution. Obviously, this broke usage on headless systems. We now delay the screen resolution check until the VA display truly requires that information.

New API was added to VA, particularly for log handling. Now it is possible to redirect the log messages into a callback. We use it to redirect VA-API messages into the GStreamer log mechanism, uncluttering the console output.

Also, we have blacklisted, in autoconf and meson, libva version 0.99.0, because that version is used internally by the closed-source version of Intel MediaSDK, which is incompatible with official libva. By the way, there is a new open-source version of MediaSDK, but we will talk about it in a future blog post.

Application VA Display sharing

Normally, the GstVaapiDisplay object is shared along the pipeline through the GstContext mechanism. But this class is defined internally and has not been exposed to users since release 1.6. This posed a problem when an application wanted to create its own VA Display and share it with an embedded pipeline. The solution is a new context application message: gst.vaapi.app.Display, defined as a GstStructure with two fields: va-display with the application’s vaDisplay, and x11-display with the application’s X11 native display. In the future, a Wayland native handle will be processed too. Please note that this context message is only processed by vaapisink.

One precondition for this solution was the removal of the VA display cache mechanism, a long-standing request from users, which, of course, we did.

Interoperability with appsink and similar

A hardware-accelerated driver, such as the Intel one, may use custom offsets and strides for specific memory regions. We use GstVideoMeta to convey these custom values. The problem comes when downstream does not handle this meta, for example appsink. The user then expects the “normal” values for those variables, so in the case of GStreamer VA-API with a hardware-based driver, the frame is shown corrupted when the user displays it.

In order to fix this, we have to make a memory copy from our custom VA-API images to allocated system memory. Of course there is a big CPU penalty, but it is better than delivering wrong video frames. If the user wants better performance, they should look for a different approach.

Resurrection of GstGLUploadTextureMeta for EGL renders

I know, GstGLUploadTextureMeta must die, right? I am convinced of it. But the Clutter video sink uses it, and it has a vast number of users, so we still have to support it.

In the last release we had to remove the support for EGL/Wayland at the last minute because we found a terrible bug just before the release. GLX support has always been there.

With Daniel van Vugt’s efforts, we resurrected the support for that meta in EGL. Though I expect the replacement of the Clutter sink with glimagesink someday, soon.

vaapisink demoted in Wayland

vaapisink was demoted to marginal rank on Wayland because COGL cannot display YUV surfaces.

This means, by default, vaapisink won’t be auto-plugged when playing in Wayland.

The reason is that Mutter (aka GNOME) cannot display the frames processed by vaapisink in Wayland. Nonetheless, please note that it works just fine in Weston.

Decoders

We have improved upstream renegotiation a little bit: if the new stream is compatible with the previous one, there is no need to reset the internal parser, with the exception of changes in codec-data.

low-latency property in H.264

A new property has been added, to the H.264 decoder only: low-latency. Its purpose is for live streams that do not conform to the H.264 specification (sadly there are many in the wild) and need the spec implementation to be tweaked. This property forces the frames in the decoded picture buffer to be pushed downstream as soon as possible.
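
For instance, a hedged sketch of enabling it on a live RTSP stream (the URL is a placeholder):

$ gst-launch-1.0 rtspsrc location=rtsp://camera.local/stream ! rtph264depay ! h264parse ! vaapih264dec low-latency=true ! vaapisink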

base-only property in H.264

This is the result of the Google Summer of Code 2017 work by Orestis Floros. When this property is enabled, all the MVC (Multiview Video Coding) or SVC (Scalable Video Coding) frames are dropped. This is useful if you want to reduce the processing time or if your VA-API driver does not support those kinds of streams.
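
For example, a sketch of decoding only the base view of a multiview stream (the file name is a placeholder):

$ gst-launch-1.0 filesrc location=multiview-sample.h264 ! h264parse ! vaapih264dec base-only=true ! vaapisink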

Encoders

In this release we have put a lot of effort into encoders.

Processing Regions of Interest

It is possible, for certain back-ends and profiles (for example, H.264 and H.265 encoders with the Intel driver), to specify a set of regions of interest per frame, with a delta-qp per region. This means that we can ask for more quality in those regions.

In order to process regions of interest, upstream must attach a list of GstVideoRegionOfInterestMeta to the video frame. The encoder then traverses this list and requests the regions if the VA-API profile, in the driver, supports them.

The common use case for this feature is, for example, when you want higher quality in regions of the picture that contain faces or text messages.

New encoding properties

  • quality-level: Available in all the encoders. This is a number between 1 and 8, where a lower number means higher quality (and slower processing).

  • aud: For the H.264 encoder only, and available only with certain drivers and platforms. When it is enabled, an AU delimiter is inserted for each encoded frame. This is useful for network streaming, and more particularly for Apple clients.

  • mbbrc: For H.264 only. Controls (auto/on/off) the macro-block bit-rate.

  • temporal-levels: For H.264 only. It specifies the number of temporal levels to include in the hierarchical frame prediction.

  • prediction-type: For H.264 only. It selects the reference picture selection mode.

    The frames are encoded as different layers. A frame in a particular layer will use pictures in the same or a lower layer as references. This means the decoder can drop frames in an upper layer and still decode the lower-layer frames.

    • hierarchical-p: P frames, except in the top layer, are reference frames. Base layer frames are I or B.

    • hierarchical-b: B frames, except in the topmost layer, are reference frames. All the base layer frames are I or P.

  • refs: Added for H.265 (it was already supported for H.264). It specifies the number of reference pictures.

  • qp-ip and qp-ib: For the H.264 and H.265 encoders. They handle the QP (quantization parameter) difference between the I and P frames, and between the I and B frames, respectively.

  • Set media profile via downstream caps: the H.264 and H.265 encoders can now configure the desired media profile through the downstream caps (see the example pipeline below).
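
As an illustration, here is a hedged encoding sketch that combines one of the new properties with a profile requested through the downstream caps (the property value and output file name are arbitrary examples):

$ gst-launch-1.0 videotestsrc num-buffers=300 ! video/x-raw,width=1280,height=720 ! vaapih264enc quality-level=2 ! video/x-h264,profile=high ! h264parse ! matroskamux ! filesink location=test.mkv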

    Contributors

    Many thanks to all the contributors and bug reporters.

         1  Daniel van Vugt
        46  Hyunjun Ko
         1  Jan Schmidt
         3  Julien Isorce
         1  Matt Staples
         2  Matteo Valdina
         2  Matthew Waters
         1  Michael Tretter
         4  Nicolas Dufresne
         9  Orestis Floros
         1  Philippe Normand
         4  Sebastian Dröge
        24  Sreerenj Balachandran
         1  Thibault Saunier
        13  Tim-Philipp Müller
         1  Tomas Rataj
         2  U. Artie Eoff
         1  VaL Doroshchuk
       172  Víctor Manuel Jáquez Leal
         3  XuGuangxin
         2  Yi A Wang
    

    by vjaquez at March 27, 2018 10:52 AM

    March 24, 2018

    Nirbheek ChauhanLow-latency audio on Windows with GStreamer

    Digital audio is so ubiquitous that we rarely stop to think or wonder how the gears turn underneath our all-pervasive apps for entertainment. Today we'll look at one specific piece of the machinery: latency.

    Let's say you're making a video of someone's birthday party with an app on your phone. Once the recording starts, you don't care when the app starts writing it to disk, as long as everything is there in the end.

    However, if you're having a Skype call with your friend, it matters a whole lot how long it takes for the video to reach the other end and vice versa. It's impossible to have a conversation if the lag (latency) is too high.

    The difference is, do you need real-time feedback or not?

    Other examples, in order of increasingly stricter latency requirements are: live video streaming, security cameras, augmented reality games such as Pokémon Go, multiplayer video games in general, audio effects apps for live music recording, and many many more.

    “But Nirbheek”, you might ask, “why doesn't everyone always ‘immediately’ send/store/show whatever is recorded? Why do people have to worry about latency?” and that's a great question!

    To understand that, check out my previous blog post, Latency in Digital Audio. It's also a good primer on analog vs digital audio!

    Low latency on consumer operating systems


    Each operating system has its own set of application APIs for audio, and each has a lower bound on the achievable latency.


    GStreamer already has plugins for almost all of these¹ (plus others that aren't listed here), and on Windows, GStreamer has been using the DirectSound API by default for audio capture and output since the very beginning.

    However, the DirectSound API was deprecated in Windows XP, and with Vista, it was removed and replaced with an emulation layer on top of the newly-released WASAPI. As a result, the plugin can't be configured to have less than 200ms of latency, which makes it unsuitable for all the low-latency use-cases mentioned above. The DirectSound API is quite crufty and unnecessarily complex anyway.

    GStreamer is rarely used in video games, but it is widely used for live streaming, audio/video calls, and other real-time applications. Worse, the WASAPI GStreamer plugins were effectively untouched and unused since the initial implementation in 2008 and were completely broken².

    This left no way to achieve low-latency audio capture or playback on Windows using GStreamer.

    The situation became particularly dire when GStreamer added a new implementation of the WebRTC spec in this release cycle. People trying it out on Windows would see much higher latencies than they should.

    Luckily, I rewrote most of the WASAPI plugin code in January and February, and it should now work well on all versions of Windows from Vista to 10! You can get binary installers for GStreamer or build it from source.

    Shared and Exclusive WASAPI


    WASAPI allows applications to open sound devices in two modes: shared and exclusive. As the name suggests, shared mode allows multiple applications to output to (or capture from) an audio device at the same time, whereas exclusive mode does not.

    Almost all applications should open audio devices in shared mode. It would be quite disastrous if your YouTube videos played without sound because Spotify decided to open your speakers in exclusive mode.

    In shared mode, the audio engine has to resample and mix audio streams from all the applications that want to output to that device. This increases latency because it must maintain its own audio ringbuffer for doing all this, from which audio buffers will be periodically written out to the audio device.

    In theory, hardware mixing could be used if the sound card supports it, but very few sound cards implement that now since it's so cheap to do in software. On Windows, only high-end audio interfaces used for professional audio implement this.

    Another option is to allocate your audio engine buffers directly in the sound card's memory with DMA, but that complicates the implementation and relies on good drivers from hardware manufacturers. Microsoft has tried similar approaches in the past with DirectSound and been burned by it, so it's not a route they took with WASAPI³.

    On the other hand, some applications know they will be the only ones using a device, and for them all this machinery is a hindrance. This is why exclusive mode exists. In this mode, if the audio driver is implemented correctly, the application's buffers will be directly written out to the sound card, which will yield the lowest possible latency.

    Audio latency with WASAPI


    So what kind of latencies can we get with WASAPI?

    That depends on the device period that is being used. The term device period is a fancy way of saying buffer size; specifically the buffer size that is used in each call to your application that fetches audio data.

    This is the same period with which audio data will be written out to the actual device, so it is the major contributor of latency in the entire machinery.

    If you're using the AudioClient interface in WASAPI to initialize your streams, the default period is 10ms. This means the theoretical minimum latency you can get in shared mode would be 10ms (audio engine) + 10ms (driver) = 20ms. In practice, it'll be somewhat higher due to various inefficiencies in the subsystem.

    When using exclusive mode, there's no engine latency, so the same number goes down to ~10ms.

    These numbers are decent for most use-cases, but like I explained in my previous blog post, this is totally insufficient for pro-audio use-cases such as applying live effects to music recordings. You really need latencies that are lower than 10ms there.

    Ultra-low latency with WASAPI


    Starting with Windows 10, WASAPI removed most of its aforementioned inefficiencies, and introduced a new interface: AudioClient3. If you initialize your streams with this interface, and if your audio driver is implemented correctly, you can configure a device period of just 2.67ms at 48kHz.
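
    (For reference, at 48kHz a 2.67ms device period corresponds to roughly 128 audio frames: 48000 × 0.00267 ≈ 128.)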

    The best part is that this is the period not just in exclusive mode but also in shared mode, which brings WASAPI almost on par with JACK and CoreAudio.

    So that was the good news. Did I mention there's bad news too? Well, now you know.

    The first bit is that these numbers are only achievable if you use Microsoft's implementation of the Intel HD Audio standard for consumer drivers. This is fine; you follow some badly-documented steps and it turns out fine.

    Then you realize that if you want to use something more high-end than an Intel HD Audio sound card, unless you use one of the rare pro-audio interfaces that have drivers that use the new WaveRT driver model instead of the old WaveCyclic model, you still see 10ms device periods.

    It seems the pro-audio industry made the decision to stick with ASIO since it already provides <5ms latency. They don't care that the API is proprietary, and that most applications can't actually use it because of that. All the apps that are used in the pro-audio world already work with it.

    The strange part is that all this information is nowhere on the Internet and seems to lie solely in the minds of the Windows audio driver cabals across the US and Europe. It's surprising and frustrating for someone used to working in the open to see such counterproductive information asymmetry, and I'm not the only one.

    This is where I plug open-source and talk about how Linux has had ultra-low latencies for years since all the audio drivers are open-source, follow the same ALSA driver model, and are constantly improved. JACK is probably the most well-known low-latency audio engine in existence, and was born on Linux. People are even using Pulseaudio these days to work with <5ms latencies.

    But this blog post is about Windows and WASAPI, so let's get back on track.

    To be fair, Microsoft is not to blame here. Decades ago they made the decision not to work more closely with the companies that write drivers for their standard hardware components, and they're still paying the price for it. Blue screens of death were the most user-visible consequences, but the current audio situation is an indication that losing control of your platform has more dire consequences.

    There is one more bit of bad news. In my testing, I wasn't able to get glitch-free capture of audio in the source element using the AudioClient3 interface at the minimum configurable latency in shared mode, even with critical thread priorities unless there was nothing else running on the machine.

    As a result, this feature is disabled by default on the source element. This is unfortunate, but not a great loss since the same device period is achievable in exclusive mode without glitches.

    Measuring WASAPI latencies


    Now that we're back from our detour, the executive summary is that the GStreamer WASAPI source and sink elements now use the latest recommended WASAPI interfaces. You should test them out and see how well they work for you!

    By default, a device is opened in shared mode with a conservative latency setting. To force the stream into the lowest latency possible, set low-latency=true. If you're on Windows 10 and want to force-enable/disable the use of the AudioClient3 interface, toggle the use-audioclient3 property.

    To open a device in exclusive mode, set exclusive=true. This will ignore the low-latency and use-audioclient3 properties since they only apply to shared mode streams. When a device is opened in exclusive mode, the stream will always be configured for the lowest possible latency by WASAPI.
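
    As a quick sketch, the properties described above map onto pipelines like these; wasapisrc and wasapisink are the WASAPI source and sink elements, and the audioconvert/audioresample elements are only there to keep the formats compatible:

    $ gst-launch-1.0 audiotestsrc ! audioconvert ! audioresample ! wasapisink low-latency=true
    $ gst-launch-1.0 wasapisrc low-latency=true ! queue ! audioconvert ! wasapisink low-latency=true
    $ gst-launch-1.0 wasapisrc exclusive=true ! queue ! audioconvert ! wasapisink exclusive=true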

    To measure the actual latency in each configuration, you can use the new audiolatency plugin that I wrote to get hard numbers for the total end-to-end latency including the latency added by the GStreamer audio ringbuffers in the source and sink elements, the WASAPI audio engine (capture and render), the audio driver, and so on.
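
    For example, a hedged measurement sketch with the audio output physically looping back into the input; the print-latency property name is an assumption here, so check gst-inspect-1.0 audiolatency for the exact property set:

    $ gst-launch-1.0 wasapisrc low-latency=true ! audiolatency print-latency=true ! wasapisink low-latency=true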

    I look forward to hearing what your numbers are on Windows 7, 8.1, and 10 in all these configurations! ;)


    1. The only ones missing are AAudio, because it's very new, and ASIO, which is a proprietary API with licensing requirements.

    2. It's no secret that although lots of people use GStreamer on Windows, the majority of GStreamer developers work on Linux and macOS. As a result the Windows plugins haven't always gotten a lot of love. It doesn't help that building GStreamer on Windows can be a daunting task. This is actually one of the major reasons why we're moving to Meson, but I've already written about that elsewhere!

    3. My knowledge about the history of the decisions behind the Windows Audio API is spotty, so corrections and expansions on this are most welcome!

    4. The ALSA drivers in the Linux kernel should not be confused with the ALSA userspace library.

    by Nirbheek (noreply@blogger.com) at March 24, 2018 06:09 AM

    March 20, 2018

    Sebastian DrögeGStreamer Rust bindings 0.11 / plugin writing infrastructure 0.2 release

    (Sebastian Dröge)

    Following the GStreamer 1.14 release and the new round of gtk-rs releases, there are also new releases for the GStreamer Rust bindings (0.11) and the plugin writing infrastructure (0.2).

    Thanks also to all the contributors for making these releases happen and adding lots of valuable changes and API additions.

    GStreamer Rust Bindings

    The main changes in the Rust bindings were the update to GStreamer 1.14 (which brings in quite some new API, like GstPromise), a couple of API additions (GstBufferPool specifically) and the addition of the GstRtspServer and GstPbutils crates. The former allows writing a full RTSP server in a couple of lines of code (with lots of potential for customizations), the latter provides access to the GstDiscoverer helper object that allows inspecting files and streams for their container format, codecs, tags and all kinds of other metadata.

    The GstPbutils crate will also get other features added in the near future, like encoding profile bindings to allow using the encodebin GStreamer element (a helper element for automatically selecting/configuring encoders and muxers) from Rust.

    But the biggest change, in my opinion, is some refactoring that was done to the Event, Message and Query APIs. Previously you would have to use a view on a newly created query to be able to use the type-specific functions on it:

    let mut q = gst::Query::new_position(gst::Format::Time);
    if pipeline.query(q.get_mut().unwrap()) {
        match q.view() {
            QueryView::Position(ref p) => Some(p.get_result()),
            _ => None,
        }
    } else {
        None
    }

    Now you can directly use the type-specific functions on a newly created query:

    let mut q = gst::Query::new_position(gst::Format::Time);
    if pipeline.query(&mut q) {
        Some(q.get_result())
    } else {
        None
    }

    In addition, the views can now dereference directly to the event/message/query itself and provide access to their API, which simplifies some code even more.

    Plugin Writing Infrastructure

    While the plugin writing infrastructure did not see that many changes apart from a couple of bugfixes and updating to the new versions of everything else, this does not mean that development on it stalled. Quite the opposite. The existing code works very well already and there was just no need for adding anything new for the projects I and others did on top of it, most of the required API additions were in the GStreamer bindings.

    So the status here is the same as last time, get started writing GStreamer plugins in Rust. It works well!

    by slomo at March 20, 2018 11:42 AM

    March 19, 2018

    GStreamerGStreamer 1.14.0 new major stable release

    (GStreamer)

    The GStreamer team is proud to announce a new major feature release of your favourite cross-platform multimedia framework!

    The 1.14 release series adds new features on top of the previous 1.12 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

    Highlights:

    • WebRTC support: real-time audio/video streaming to and from web browsers
    • Experimental support for the next-gen royalty-free AV1 video codec
    • Video4Linux: encoding support, stable element names and faster device probing
    • Support for the Secure Reliable Transport (SRT) video streaming protocol
    • RTP Forward Error Correction (FEC) support (ULPFEC)
    • RTSP 2.0 support in rtspsrc and gst-rtsp-server
    • ONVIF audio backchannel support in gst-rtsp-server and rtspsrc
    • playbin3 gapless playback and pre-buffering support
    • tee, our stream splitter/duplication element, now does allocation query aggregation which is important for efficient data handling and zero-copy
    • QuickTime muxer has a new prefill recording mode that allows file import in Adobe Premiere and FinalCut Pro while the file is still being written.
    • rtpjitterbuffer fast-start mode and timestamp offset adjustment smoothing
    • souphttpsrc connection sharing, which allows for connection reuse, cookie sharing, etc.
    • nvdec: new plugin for hardware-accelerated video decoding using the NVIDIA NVDEC API
    • Adaptive DASH trick play support
    • ipcpipeline: new plugin that allows splitting a pipeline across multiple processes
    • Major gobject-introspection annotation improvements for large parts of the library API
    • GStreamer C# bindings have been revived and seen many updates and fixes
    • The externally-maintained GStreamer Rust bindings have many usability improvements and cover most of the API now

    Full release notes can be found here.

    Binaries for Android, iOS, Mac OS X and Windows will be provided in the next days.

    You can download release tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-vaapi, gstreamer-sharp, or gst-omx.

    March 19, 2018 08:00 PM

    Phil NormandGStreamer’s playbin3 overview for application developers

    (Phil Normand)

    Multimedia applications based on GStreamer usually handle playback with the playbin element. I recently added support for playbin3 in WebKit. This post aims to document the changes needed on the application side to support this new-generation flavour of playbin.

    So, first off, why is it named playbin3 anyway? The GStreamer …

    by Philippe Normand at March 19, 2018 07:13 AM

    March 18, 2018

    Phil NormandMoving to Pelican

    (Phil Normand)

    Time for a change! Almost 10 years ago I started to hack on a Blog engine with two friends; it was called Alinea and it powered this website for a long time. Back then hacking on your own Blog engine was the prerequisite to hosting your blog :) But nowadays …

    by Philippe Normand at March 18, 2018 09:18 AM

    Phil NormandThe GNOME-Shell Gajim extension maintenance

    (Phil Normand)

    Back in January 2011 I wrote a GNOME-Shell extension allowing Gajim users to carry on with their chats using the Empathy infrastructure and UI present in the Shell. For some time the extension was also part of the official gnome-shell-extensions module and then I had to move it to …

    by Philippe Normand at March 18, 2018 09:18 AM

    Phil NormandWeb Engines Hackfest 2014

    (Phil Normand)

    Last week I attended the Web Engines Hackfest. The event was sponsored by Igalia (also hosting the event), Adobe and Collabora.

    As usual I spent most of the time working on the WebKitGTK+ GStreamer backend and Sebastian Dröge kindly joined and helped out quite a bit, make sure to read …

    by Philippe Normand at March 18, 2018 09:18 AM

    March 12, 2018

    GStreamerGStreamer 1.13.91 pre-release (1.14 rc2)

    (GStreamer)

    The GStreamer team is pleased to announce the second and hopefully last release candidate for the upcoming stable 1.14 release series.

    The 1.14 release series adds new features on top of the current stable 1.12 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

    The 1.13.91 pre-release is for testing and development purposes in the lead-up to the stable 1.14 series which is now feature frozen and scheduled for release soon.

    Full release notes can be found on the 1.14 release notes page, highlighting all the new features, bugfixes, performance optimizations and other important changes.

    Packagers: please note that quite a few plugins and libraries have moved between modules, so please take extra care and make sure inter-module version dependencies are such that users can only upgrade all modules in one go, instead of seeing a mix of 1.13 and 1.12 on their system.

    Binaries for Android, iOS, Mac OS X and Windows will be provided shortly.

    Release tarballs can be downloaded directly here:

    March 12, 2018 11:00 PM

    March 03, 2018

    GStreamerGStreamer 1.13.90 pre-release (1.14 rc1)

    (GStreamer)

    The GStreamer team is pleased to announce the first release candidate for the upcoming stable 1.14 release series.

    The 1.14 release series adds new features on top of the current stable 1.12 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

    The 1.13.90 pre-release is for testing and development purposes in the lead-up to the stable 1.14 series which is now feature frozen and scheduled for release soon.

    Full release notes will be provided in the near future, highlighting all the new features, bugfixes, performance optimizations and other important changes.

    Packagers: please note that quite a few plugins and libraries have moved between modules, so please take extra care and make sure inter-module version dependencies are such that users can only upgrade all modules in one go, instead of seeing a mix of 1.13 and 1.12 on their system.

    Binaries for Android, iOS, Mac OS X and Windows will be provided shortly.

    Release tarballs can be downloaded directly here:

    March 03, 2018 11:00 PM