## January 11, 2018

### Andy Wingo — spectre and the end of langsec

I remember in 2008 seeing Gerald Sussman, co-creator of the Scheme language, resignedly describing a sea change in the MIT computer science curriculum. In response to a question from the audience, he said:

> The work of engineers used to be about taking small parts that they understood entirely and using simple techniques to compose them into larger things that do what they want.
>
> But programming now isn't so much like that. Nowadays you muck around with incomprehensible or nonexistent man pages for software you don't know who wrote. You have to do basic science on your libraries to see how they work, trying out different inputs and seeing how the code reacts. This is a fundamentally different job.

Like many I was profoundly saddened by this analysis. I want to believe in constructive correctness, in math and in proofs. And so with the rise of functional programming, I thought that this historical slide from reason towards observation was just that, historical, and that the "safe" languages had a compelling value that would be evident eventually: that "another world is possible".

In particular I found solace in "langsec", an approach to assessing and ensuring system security in terms of constructively correct programs. One obvious application is parsing of untrusted input, and indeed the langsec.org website appears to emphasize this domain as one in which a programming languages approach can be fruitful. It is, after all, a truth universally acknowledged, that a program with good use of data types, will be free from many common bugs. So far so good, and so far so successful.

The basis of language security is starting from a programming language with a well-defined, easy-to-understand semantics. From there you can prove (formally or informally) interesting security properties about particular programs. For example, if a program has a secret k, but some untrusted subcomponent C of it should not have access to k, one can prove whether or not k can leak to C. This approach is taken, for example, by Google's Caja compiler to isolate components from each other, even when they run in the context of the same web page.

But the Spectre and Meltdown attacks have seriously set back this endeavor. One manifestation of the Spectre vulnerability is that code running in a process can now read the entirety of its address space, bypassing invariants of the language in which it is written, even if it is written in a "safe" language. This is currently being used by JavaScript programs to exfiltrate passwords from a browser's password manager, or bitcoin wallets.

Mathematically, in terms of the semantics of e.g. JavaScript, these attacks should not be possible. But practically, they work. Spectre shows us that the building blocks provided to us by Intel, ARM, and all the rest are no longer "small parts understood entirely"; that instead now we have to do "basic science" on our CPUs and memory hierarchies to know what they do.

What's worse, we need to do basic science to come up with adequate mitigations to the Spectre vulnerabilities (side-channel exfiltration of results of speculative execution). Retpolines, poisons and masks, et cetera: none of these are proven to work. They are simply observed to be effective on current hardware. Indeed, mitigations are anathema to the correctness-by-construction approach: if you can prove that a problem doesn't exist, what is there to mitigate?

Spectre is not the first crack in the edifice of practical program correctness. In particular, timing side channels are rarely captured in language semantics. But I think it's fair to say that Spectre is the most devastating vulnerability in the langsec approach to security that has ever been uncovered.

Where do we go from here? I see but two options. One is to attempt to make the machines targeted by secure language implementations behave rigorously as architecturally specified, and in no other way. This is the approach taken by all of the deployed mitigations (retpolines, poisoned pointers, masked accesses): modify the compiler and runtime to prevent the CPU from speculating through vulnerable indirect branches (prevent speculative execution), or from using fetched values in further speculative fetches (prevent this particular side channel). I think we are missing a model and a proof that these mitigations restore target architectural semantics, though.

However, if we did have a model of what a CPU does, we would have another opportunity, which is to incorporate that model in a semantics of the target language of a compiler (e.g. micro-x86 versus x86). It could be that this model produces a co-evolution of the target architectures as well, whereby Intel decides to disclose and expose more of its microarchitecture to user code. Caching and other microarchitectural side-effects would then become explicit rather than transparent.

Rich Hickey has this thing where he talks about "simple versus easy". Both of them sound good but for him, only "simple" is good whereas "easy" is bad. It's the sort of subjective distinction that can lead to an endless string of Worse Is Better Is Worse Bourbaki papers, according to the perspective of the author. Anyway, transparent caching in the CPU has been marvelously easy for most application developers and fantastically beneficial from a performance perspective. People needing constant-time operations have complained, of course, but that kind of person always complains. Could it be, though, that actually there is some other, better-is-better kind of simplicity that should replace the all-pervasive, now-treacherous transparent caching?

I don't know. All I will say is that an ad-hoc approach to determining which branches and loads are safe and which are not is not a plan that inspires confidence. Godspeed to the langsec faithful in these dark times.

## December 30, 2017

### Michael Sheldon — Speech Recognition – Mozilla’s DeepSpeech, GStreamer and IBus

Recently Mozilla released an open source implementation of Baidu’s DeepSpeech architecture, along with a pre-trained model using data collected as part of their Common Voice project.

In an attempt to make it easier for application developers to start working with the DeepSpeech model I’ve developed a GStreamer plugin, an IBus plugin and created some PPAs. To demonstrate what’s possible here’s a video of the IBus plugin providing speech recognition to any application under Linux:

Video of DeepSpeech IBus Plugin

### GStreamer DeepSpeech Plugin

I’ve created a GStreamer element which can be placed into an audio pipeline; it will then report any recognised speech via bus messages. It automatically segments audio based on configurable silence thresholds, making it suitable for continuous dictation.

Here are a couple of example pipelines using gst-launch.

To perform speech recognition on a file, printing all bus messages to the terminal:

```
gst-launch-1.0 -m filesrc location=/path/to/file.ogg ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink
```

To perform speech recognition on audio recorded from the default system microphone, with changes to the silence thresholds:

```
gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink
```

The source code is available here: https://github.com/Elleo/gst-deepspeech.

### IBus Plugin

I’ve also created a proof of concept IBus plugin which allows speech recognition to be used as an input method for virtually any application. It uses the above GStreamer plugin to perform speech recognition and then commits the text to the currently focused input field whenever a bus message is received from the deepspeech element.

It’ll need a lot more work before it’s really useful, especially in terms of adding in various voice editing commands, but hopefully it’ll provide a useful starting point for something more complete.

The source code is available here: https://github.com/Elleo/ibus-deepspeech

### PPAs

To make it extra easy to get started playing around with these projects I’ve also created a couple of PPAs for Ubuntu 17.10:

DeepSpeech PPA – This contains packages for libdeepspeech, libdeepspeech-dev, libtensorflow-cc and deepspeech-model (be warned, the model is around 1.3GB).

gst-deepspeech PPA – This contains packages for my GStreamer and IBus plugins (gstreamer1.0-deepspeech and ibus-deepspeech). Please note that you’ll also need the DeepSpeech PPA enabled to fulfil the dependencies of these packages.

I’d love to hear about any projects that find these plugins useful.

## December 22, 2017

### Sebastian Dröge — GStreamer Rust bindings release 0.10.0 & gst-plugin release 0.1.0

Today I’ve released version 0.10.0 of the Rust GStreamer bindings, and after a journey of more than 1½ years the first release of the GStreamer plugin writing infrastructure crate “gst-plugin”.

Check the repositories of both for more details, the code and various examples.

#### GStreamer Bindings

Some of the changes since the 0.9.0 release were already outlined in the previous blog post, and most of the other changes were also things I found while writing GStreamer plugins. For the full changelog, take a look at the CHANGELOG.md in the repository.

Other changes include

• I went over the whole API over the last few days, added any missing things I found, simplified the API where it made sense, changed functions to take Option<_> where allowed, etc.
• Bindings for using and writing typefinders. Typefinders are the part of GStreamer that try to guess what kind of media is to be handled based on looking at the bytes. Especially writing those in Rust seems worthwhile, considering that basically all of the GIT log of the existing typefinders consists of fixes for various kinds of memory-safety problems.
• Bindings for the Registry and PluginFeature were added, and the relevant API that works with paths/filenames was fixed to actually work on Paths.
• Bindings for the GStreamer Net library were added, allowing you to build applications that synchronize their media over the network by using PTP, NTP or a custom GStreamer protocol (for which there also exists a server). This could be used for building video walls, systems recording the same scene from multiple cameras, etc., and provides (depending on network conditions) up to sub-millisecond synchronization between devices.

Generally, this is something like a “1.0” release for me now (due to depending on too many pre-1.0 crates this is not going to be 1.0 anytime soon). The basic API is all there and nicely usable now, and hopefully without any bugs; the known-missing APIs are not too important for now and can easily be added at a later time when needed. At this point I don’t expect many API changes anymore.

#### GStreamer Plugins

The other important part of this announcement is the first release of the “gst-plugin” crate. This provides the basic infrastructure for writing GStreamer plugins and elements in Rust, without having to write any unsafe code.

I started experimenting with using Rust for this more than 1½ years ago, and while a lot of things have changed in that time, this release is a nice milestone. In the beginning there were no GStreamer bindings and I was writing everything manually, and there were also still quite a few pieces of code written in C. Nowadays everything is in Rust and using the automatically generated GStreamer bindings.

Unfortunately there is no real documentation for any of this yet; there’s only the autogenerated rustdoc documentation available from here, and various example GStreamer plugins inside the GIT repository that can be used as a starting point. Various people have already written their GStreamer plugins in Rust based on this.

The basic idea of the API is however that everything is as Rust-y as possible. Which might not be too much due to having to map subtyping, virtual methods and the like to something reasonable in Rust, but I believe it’s nice to use now. You basically only have to implement one or more traits on your structs, and that’s it. There’s still quite some boilerplate required, but it’s far less than what would be required in C. The best example at this point might be the audioecho element.

Over the next days (or weeks?) I’m not going to write any documentation yet, but instead will write a couple of very simple, minimal elements that do basically nothing and can be used as starting points to learn how all this works together. And will write another blog post or two about the different parts of writing a GStreamer plugin and element in Rust, so that all of you can get started with that.

Let’s hope that the number of new GStreamer plugins written in C is going to decrease in the future, and maybe even new people who would’ve never done that in C, with all the footguns everywhere, can get started with writing GStreamer plugins in Rust now.

### Sebastian Pölsterl — Denoising Autoencoder as TensorFlow estimator


I recently started to use Google's deep learning framework TensorFlow. Since version 1.3, TensorFlow includes a high-level interface inspired by scikit-learn. Unfortunately, as of version 1.4, only 3 different classification and 3 different regression models implementing the Estimator interface are included. To better understand the Estimator interface, Dataset API, and components in tf-slim, I started to implement a simple Autoencoder and applied it to the well-known MNIST dataset of handwritten digits. This post is about my journey.

I will assume that you are familiar with TensorFlow basics. The full code is available at https://github.com/sebp/tf_autoencoder.

## Estimators

The tf.estimator.Estimator is at the heart of TensorFlow's high-level interface and is similar to Keras's Model API. It hides most of the boilerplate required to train a model: managing Sessions, writing summary statistics for TensorBoard, or saving and loading checkpoints. An Estimator has three main methods: train, evaluate, and predict. Each of these methods requires a callable input function as its first argument that feeds the data to the estimator (more on that later).

## Custom estimators

You can write your own custom model implementing the Estimator interface by passing a function returning an instance of tf.estimator.EstimatorSpec as first argument to tf.estimator.Estimator.

```python
def model_fn(features, labels, mode):
    …
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        loss=total_loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops)
```

The first argument – mode – is one of tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL or tf.estimator.ModeKeys.PREDICT and determines which of the remaining values must be provided.

In TRAIN mode:

• loss: A Tensor containing a scalar loss value.
• train_op: An Op that runs one step of training. We can use the return value of tf.contrib.layers.optimize_loss here.

In EVAL mode:

• loss: A scalar Tensor containing the loss on the validation data.
• eval_metric_ops: A dictionary that maps metric names to Tensors of metrics to calculate, typically one of the tf.metrics functions.

In PREDICT mode:

• predictions: A dictionary that maps key names of your choice to Tensors containing the predictions from the model.

An important difference to the Estimators included with TensorFlow is that we need to call relevant tf.summary functions in model_fn ourselves. However, the Estimator will take care of writing summaries to disk so we can inspect them in TensorBoard.

## Autoencoder model

The Autoencoder model is straightforward: it consists of two major parts, an encoder and a decoder. The encoder has an input layer (28*28 = 784 dimensions in the case of MNIST) and one or more hidden layers, decreasing in size. In the decoder, we reverse the operations of the encoder by blowing the output of the smallest hidden layer up to the size of the input (optionally, with hidden layers of increasing size in-between). The loss function computes the difference between the original image and the reconstructed image (the output of the decoder). Common loss functions are mean squared error and cross-entropy.

To construct the encoder network, we specify a list containing the number of hidden units for each layer and (optionally) add dropout layers in-between:

```python
def encoder(inputs, hidden_units, dropout, is_training):
    net = inputs
    for num_hidden_units in hidden_units:
        net = tf.contrib.layers.fully_connected(
            net, num_outputs=num_hidden_units)
        if dropout is not None:
            net = slim.dropout(net, is_training=is_training)
    return net
```

In the full code, a helper add_hidden_layer_summary additionally adds a histogram of the activations and the fraction of non-zero activations to be displayed in TensorBoard. The latter is particularly useful when debugging networks with rectified linear units (ReLU). If too many hidden units return 0 values early during optimization, the model won't be able to learn anymore, in which case one would typically try to lower the learning rate or choose a different activation function.
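As an illustration of that metric, here is a plain-Python sketch (a hypothetical helper, not part of the actual code) of the fraction of non-zero activations:

```python
def fraction_nonzero(activations):
    """Fraction of activations that are non-zero, as reported in TensorBoard."""
    return sum(1 for a in activations if a != 0) / len(activations)

# A ReLU layer where most units output 0 is a sign of "dead" units:
relu_out = [0.0, 1.3, 0.0, 0.0, 2.1, 0.0]
print(fraction_nonzero(relu_out))  # 2 of 6 units are active
```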

The network of the decoder is almost identical; we just explicitly use a linear activation function (activation_fn=None) and no dropout in the last layer:

```python
def decoder(inputs, hidden_units, dropout, is_training):
    net = inputs
    for num_hidden_units in hidden_units[:-1]:
        net = tf.contrib.layers.fully_connected(
            net, num_outputs=num_hidden_units)
        if dropout is not None:
            net = slim.dropout(net, is_training=is_training)

    net = tf.contrib.layers.fully_connected(net, hidden_units[-1],
                                            activation_fn=None)
    tf.summary.histogram('activation', net)
    return net
```

You may have noticed that we did not specify any activation function so far. Thanks to TensorFlow's arg_scope context manager, we can easily set the activation function for all fully connected layers. At the same time we set an appropriate weight initializer and (optionally) use weight decay:

```python
def autoencoder(inputs, hidden_units, activation_fn, dropout, weight_decay, mode):
    is_training = mode == tf.estimator.ModeKeys.TRAIN

    weights_init = slim.initializers.variance_scaling_initializer()
    if weight_decay is None:
        weights_reg = None
    else:
        weights_reg = tf.contrib.layers.l2_regularizer(weight_decay)

    with slim.arg_scope([tf.contrib.layers.fully_connected],
                        weights_initializer=weights_init,
                        weights_regularizer=weights_reg,
                        activation_fn=activation_fn):
        net = encoder(inputs, hidden_units, dropout, is_training)
        n_features = inputs.shape[1].value
        decoder_units = hidden_units[:-1][::-1] + [n_features]
        net = decoder(net, decoder_units, dropout, is_training)
    return net
```

where slim.initializers.variance_scaling_initializer corresponds to the initialization of He et al., which is the current recommendation for networks with ReLU activations.
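To make the mirroring of encoder and decoder concrete, here is what decoder_units evaluates to for an example configuration of three hidden layers and MNIST-sized inputs:

```python
hidden_units = [128, 64, 32]   # encoder layer sizes (example values)
n_features = 28 * 28           # 784 input dimensions for MNIST

# Drop the innermost layer, reverse the remaining sizes, append the input size:
decoder_units = hidden_units[:-1][::-1] + [n_features]
print(decoder_units)  # [64, 128, 784]
```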

This concludes the architecture of the autoencoder. Next, we need to implement the model_fn function passed to tf.estimator.Estimator as outlined above.

## Autoencoder model_fn

First, we construct the network's architecture using the autoencoder function described above:

```python
logits = autoencoder(inputs=features,
                     hidden_units=hidden_units,
                     activation_fn=activation_fn,
                     dropout=dropout,
                     weight_decay=weight_decay,
                     mode=mode)
```

Subsequent steps depend on the value of mode. In prediction mode, we merely have to return the reconstructed image, therefore we make sure all values are within the interval [0; 1] by applying the sigmoid function:

```python
probs = tf.nn.sigmoid(logits)
predictions = {"prediction": probs}
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions)
```

In training and evaluation mode, we need to compute the loss, which is cross-entropy in this example:

```python
tf.losses.sigmoid_cross_entropy(labels, logits)
total_loss = tf.losses.get_total_loss(add_regularization_losses=is_training)
```

The second line is needed to add the $\ell_2$-losses used in weight decay.
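Written out, the total loss being minimized during training is the cross-entropy plus the weight decay penalty. Writing $\lambda$ for the weight_decay coefficient and $W_j$ for the weights of the fully connected layers (and noting that tf.nn.l2_loss, which l2_regularizer builds on, includes a factor of $\tfrac{1}{2}$):
$$L_{\text{total}} = L_{\text{CE}} + \frac{\lambda}{2} \sum_j \lVert W_j \rVert_2^2 .$$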

Most importantly, training relies on choosing an optimizer, here we use Adam and an exponential learning rate decay. The latter dynamically updates the learning rate during training according to the formula
$$\text{decayed learning rate} = \text{base learning rate} \cdot 0.96^{\lfloor i / 1000 \rfloor} ,$$ where $i$ is the current iteration. It would probably work as well without learning rate decay, but I included it for the sake of completeness.
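The staircase schedule can be checked in plain Python (a hypothetical helper for illustration, not part of the model code):

```python
def decayed_learning_rate(base_lr, step, decay_steps=1000, decay_rate=0.96):
    """Staircase exponential decay: drop by a factor of 0.96 every 1000 steps."""
    return base_lr * decay_rate ** (step // decay_steps)

print(decayed_learning_rate(0.001, 500))   # first plateau: still 0.001
print(decayed_learning_rate(0.001, 2500))  # 0.001 * 0.96**2
```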

```python
if mode == tf.estimator.ModeKeys.TRAIN:
    train_op = tf.contrib.layers.optimize_loss(
        loss=total_loss,
        learning_rate=learning_rate,
        optimizer="Adam",
        learning_rate_decay_fn=lambda lr, gs: tf.train.exponential_decay(
            lr, gs, 1000, 0.96, staircase=True),
        global_step=tf.train.get_global_step())

    # Add histograms for trainable variables
    for var in tf.trainable_variables():
        tf.summary.histogram(var.op.name, var)
```

Note that we add a histogram of all trainable variables for TensorBoard in the last part.

Finally, we compute the root mean squared error when in evaluation mode:

```python
if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {
        "rmse": tf.metrics.root_mean_squared_error(
            tf.cast(labels, tf.float64), tf.cast(probs, tf.float64))
    }
```

and return the specification of our autoencoder estimator:

```python
return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops=eval_metric_ops)
```

## Feeding data to an Estimator via the Dataset API

Once we constructed our estimator, e.g. via

```python
estimator = AutoEncoder(hidden_units=[128, 64, 32],
                        dropout=None,
                        weight_decay=1e-5,
                        learning_rate=0.001)
```

we would like to train it by calling train, which expects a callable that returns two tensors, one representing the input data and one the groundtruth data. The easiest way would be to use tf.estimator.inputs.numpy_input_fn, but instead I want to introduce TensorFlow's Dataset API, which is more generic.

The Dataset API comprises two elements:

1. tf.data.Dataset represents a dataset and any transformations applied to it.
2. tf.data.Iterator is used to extract elements from a Dataset. In particular, Iterator.get_next() returns the next element of a Dataset and typically is what is fed to an estimator.

Here, I'm using what is called an initializable Iterator, inspired by this post. We define one placeholder for the input image and one for the groundtruth image and initialize the placeholders before training starts using a hook. First, let's create a Dataset from the placeholders:

```python
placeholders = [
    tf.placeholder(data.dtype, data.shape, name='input_image'),
    tf.placeholder(data.dtype, data.shape, name='groundtruth_image')
]
dataset = tf.data.Dataset.from_tensor_slices(placeholders)
```

Next, we shuffle the dataset and allow retrieving data from it until the specified number of epochs has been reached:

```python
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.repeat(num_epochs)
```

When creating input for evaluation or prediction, we are going to skip these two steps.

Finally, we combine multiple elements into a batch and create an iterator from the dataset:

```python
dataset = dataset.batch(batch_size)

iterator = dataset.make_initializable_iterator()
next_example, next_label = iterator.get_next()
```
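The semantics of shuffle, repeat and batch can be mimicked by a plain-Python generator (an analogy only; the real Dataset builds graph operations instead):

```python
import random

def batches(data, num_epochs, batch_size, shuffle=True):
    """Plain-Python analogy of Dataset.shuffle().repeat().batch()."""
    for _ in range(num_epochs):
        items = list(data)
        if shuffle:
            random.shuffle(items)  # a fresh shuffle each epoch
        for i in range(0, len(items), batch_size):
            yield items[i:i + batch_size]

print(list(batches(range(5), num_epochs=2, batch_size=2, shuffle=False)))
# [[0, 1], [2, 3], [4], [0, 1], [2, 3], [4]]
```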

To initialize the placeholders, we need to call tf.Session.run with feed_dict = {placeholders[0]: input_data, placeholders[1]: groundtruth_data}. Since the Estimator will create a Session for us, we need a way to call our initialization code after the session has been created and before training begins. The Estimator's train, evaluate and predict methods accept a list of SessionRunHook subclasses as the hooks argument, which we can use to inject our code in the right place. Therefore, we first create a generic hook that runs after the session has been created:

```python
class IteratorInitializerHook(tf.train.SessionRunHook):
    """Hook to initialise data iterator after Session is created."""

    def __init__(self):
        self.iterator_initializer_func = None

    def after_create_session(self, session, coord):
        """Initialise the iterator after the session has been created."""
        assert callable(self.iterator_initializer_func)
        self.iterator_initializer_func(session)
```

To make things a little bit nicer, we create an InputFunction class which implements the __call__ method. Thus, it will behave like a function and we can pass it directly to tf.estimator.Estimator.train and related methods.

```python
class InputFunction:
    def __init__(self, data, batch_size, num_epochs, mode):
        self.data = data
        self.batch_size = batch_size
        self.mode = mode
        self.num_epochs = num_epochs
        self.init_hook = IteratorInitializerHook()

    def __call__(self):
        # Define placeholders
        placeholders = [
            tf.placeholder(self.data.dtype, self.data.shape, name='input_image'),
            tf.placeholder(self.data.dtype, self.data.shape, name='reconstruct_image')
        ]

        # Build dataset pipeline
        dataset = tf.data.Dataset.from_tensor_slices(placeholders)
        if self.mode == tf.estimator.ModeKeys.TRAIN:
            dataset = dataset.shuffle(buffer_size=10000)
            dataset = dataset.repeat(self.num_epochs)
        dataset = dataset.batch(self.batch_size)

        # Create iterator from dataset
        iterator = dataset.make_initializable_iterator()
        next_example, next_label = iterator.get_next()

        # Create initialization hook
        def _init(sess):
            feed_dict = dict(zip(placeholders, [self.data, self.data]))
            sess.run(iterator.initializer, feed_dict=feed_dict)

        self.init_hook.iterator_initializer_func = _init

        return next_example, next_label
```

Finally, we can use the InputFunction class to train our autoencoder for 30 epochs:

```python
from tensorflow.examples.tutorials.mnist import input_data as mnist_data

# Load MNIST (the download directory is an example path)
mnist = mnist_data.read_data_sets("MNIST_data")

train_input_fn = InputFunction(
    data=mnist.train.images,
    batch_size=256,
    num_epochs=30,
    mode=tf.estimator.ModeKeys.TRAIN)
estimator.train(train_input_fn, hooks=[train_input_fn.init_hook])
```

The video below shows ten reconstructed images from the test data and their corresponding groundtruth after each epoch of training.

## Denoising Autoencoder

A denoising autoencoder is a slight variation of the autoencoder described above. The only difference is that input images are randomly corrupted before they are fed to the autoencoder (we still use the original, uncorrupted image to compute the loss). This acts as a form of regularization to avoid overfitting.
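In scalar terms, corrupting a pixel means adding noise and clipping the result back into [0, 1] (a hypothetical plain-Python sketch; the actual code below operates on whole Tensors):

```python
def corrupt_pixel(pixel, noise):
    """Add noise and clip into [0, 1], mirroring tf.clip_by_value."""
    return min(max(pixel + noise, 0.0), 1.0)

print(corrupt_pixel(0.9, 0.4))    # clipped at the top: 1.0
print(corrupt_pixel(0.1, -0.5))   # clipped at the bottom: 0.0
print(corrupt_pixel(0.5, 0.25))   # unchanged by clipping: 0.75
```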

```python
def add_noise(input_img, groundtruth):
    noise_factor = 0.5  # a float in [0; 1)

    noise = noise_factor * tf.random_normal(input_img.shape.as_list())
    input_corrupted = tf.clip_by_value(tf.add(input_img, noise), 0., 1.)
    return input_corrupted, groundtruth
```

The function above takes two Tensors representing the input and groundtruth image, respectively, and corrupts the input image by the specified amount of noise. We can use this function to transform all of the images using Dataset's map function:

```python
dataset = dataset.map(add_noise, num_parallel_calls=4)
dataset = dataset.prefetch(512)
```

The function passed to map will be part of the compute graph, so you have to use TensorFlow operations to modify your input, or use tf.py_func. The num_parallel_calls argument speeds up preprocessing significantly, because multiple images are transformed in parallel. The second line ensures a certain number of corrupted images is precomputed; otherwise the transformation would only be applied when executing iterator.get_next(), which would result in a delay for each batch and bad GPU utilization. The video below shows the groundtruth, input and output of the denoising autoencoder for up to 60 epochs.

I hope this tutorial gave you some insight on how to implement a custom TensorFlow estimator and use the Dataset API.


### Sebastian Pölsterl — Denoising Autoencoder as TensorFlow estimator

Denoising Autoencoder as TensorFlow estimator

I recently started to use Google's deep learning framework TensorFlow. Since version 1.3, TensorFlow includes a high-level interface inspired by scikit-learn. Unfortunately, as of version 1.4, only 3 different classification and 3 different regression models implementing the Estimator interface are included. To better understand the Estimator interface, Dataset API, and components in tf-slim, I started to implement a simple Autoencoder and applied it to the well-known MNIST dataset of handwritten digits. This post is about my journey and is split in the following sections:

I will assume that you are familiar with TensorFlow basics. The full code is available at https://github.com/sebp/tf_autoencoder.

## Estimators

The tf.estimator.Estimator is at the heart TenorFlow's high-level interface and is similar to Kera's Model API. It hides most of the boilerplate required to train a model: managing Sessions, writing summary statistics for TensorBoard, or saving and loading checkpoints. An Estimator has three main methods: train, evaluate, and predict. Each of these methods requires a callable input function as first argument that feeds the data to the estimator (more on that later).

## Custom estimators

You can write your own custom model implementing the Estimator interface by passing a function returning an instance of tf.estimator.EstimatorSpec as first argument to tf.estimator.Estimator.

def model_fn(features, labels, mode):
…
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=total_loss,
train_op=train_op,
eval_metric_ops=eval_metric_ops)

The first argument – mode – is one of tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL or tf.estimator.ModeKeys.PREDICT and determines which of the remaining values must be provided.

In TRAIN mode:

• loss: A Tensor containing a scalar loss value.
• train_op: An Op that runs one step of training. We can use the return value of tf.contrib.layers.optimize_loss here.

In EVAL mode:

• loss: A scalar Tensor containing the loss on the validation data.
• eval_metric_ops: A dictionary that maps metric names to Tensors of metrics to calculate, typically, one of the tf.metrics functions.

In PREDICT mode:

• predictions: A dictionary that maps key names of your choice to Tensors containing the predictions from the model.

An important difference to the Estimators included with TensorFlow is that we need to call relevant tf.summary functions in model_fn ourselves. However, the Estimator will take care of writing summaries to disk so we can inspect them in TenorBoard.

## Autoencoder model

The Autoencoder model is straightforward, it consists of two major parts: an encoder and an decoder. The encoder has an input layer (28*28 = 784 dimensions in the case of MNIST) and one or more hidden layers, decreasing in size. In the decoder, we reverse the operations of the encoder by blowing the output of the smallest hidden layer up to the size of the input (optionally, with hidden layers of increasing size in-between). The loss function computes the difference between the original image and the reconstructed image (the output of the decoder). Common loss functions are mean squared error and cross-entropy.

To construct the encoder network, we specify a list containing the number of hidden units for each layer and (optionally) add dropout layers in-between:

def encoder(inputs, hidden_units, dropout, is_training):
net = inputs
for num_hidden_units in hidden_units:
net = tf.contrib.layers.fully_connected(
net, num_outputs=num_hidden_units)
if dropout is not None:
net = slim.dropout(net, is_training=is_training)
return net

where add_hidden_layer_summary adds a histogram of the activations and the fraction of non-zero activations to be displayed in TensorBoard. The latter is particularly useful when debugging networks with rectified linear units (ReLU). If too many hidden units return 0 values early during optimization, the model won't be able to learn anymore, in which case one would typically try to lower the learning rate or choose a different activation function.

The network of the decoder is almost identical, we just explicitly use a linear activation function (activation_fn=None) and no dropout in the last layer:

def decoder(inputs, hidden_units, dropout, is_training):
    net = inputs
    for num_hidden_units in hidden_units[:-1]:
        net = tf.contrib.layers.fully_connected(
            net, num_outputs=num_hidden_units)
        if dropout is not None:
            net = slim.dropout(net, is_training=is_training)

    net = tf.contrib.layers.fully_connected(net, hidden_units[-1],
                                            activation_fn=None)
    tf.summary.histogram('activation', net)
    return net

You may have noticed that we did not specify any activation function so far. Thanks to TensorFlow's arg_scope context manager, we can easily set the activation function for all fully connected layers at once. At the same time, we set an appropriate weight initializer and (optionally) apply weight decay:

def autoencoder(inputs, hidden_units, activation_fn, dropout, weight_decay, mode):
    is_training = mode == tf.estimator.ModeKeys.TRAIN

    weights_init = slim.initializers.variance_scaling_initializer()
    if weight_decay is None:
        weights_reg = None
    else:
        weights_reg = tf.contrib.layers.l2_regularizer(weight_decay)

    with slim.arg_scope([tf.contrib.layers.fully_connected],
                        weights_initializer=weights_init,
                        weights_regularizer=weights_reg,
                        activation_fn=activation_fn):
        net = encoder(inputs, hidden_units, dropout, is_training)
        n_features = inputs.shape[1].value
        decoder_units = hidden_units[:-1][::-1] + [n_features]
        net = decoder(net, decoder_units, dropout, is_training)
    return net

where slim.initializers.variance_scaling_initializer corresponds to the initialization of He et al., which is the current recommendation for networks with ReLU activations.
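As a rough illustration of what that initializer does, weights are drawn with standard deviation sqrt(2 / fan_in), which keeps the variance of ReLU activations roughly constant across layers (he_init is a hypothetical helper for illustration, not the slim API):

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    # Variance-scaling (He) initialization: std = sqrt(2 / fan_in)
    return rng.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

rng = np.random.RandomState(42)
W = he_init(784, 128, rng)
print(round(float(W.std()), 4))  # close to sqrt(2 / 784), about 0.05
```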

This concludes the architecture of the autoencoder. Next, we need to implement the model_fn function passed to tf.estimator.Estimator as outlined above.

## Autoencoder model_fn

First, we construct the network's architecture using the autoencoder function described above:

logits = autoencoder(inputs=features,
                     hidden_units=hidden_units,
                     activation_fn=activation_fn,
                     dropout=dropout,
                     weight_decay=weight_decay,
                     mode=mode)

Subsequent steps depend on the value of mode. In prediction mode, we merely have to return the reconstructed image; to make sure all values lie within the interval [0; 1], we apply the sigmoid function:

probs = tf.nn.sigmoid(logits)
predictions = {"prediction": probs}
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions)

In training and evaluation mode, we need to compute the loss, which is cross-entropy in this example:

tf.losses.sigmoid_cross_entropy(labels, logits)
total_loss = tf.losses.get_total_loss(add_regularization_losses=is_training)

The second line is needed to add the $\ell_2$-losses used in weight decay.
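Written out by hand, the total loss is the data loss plus the weight-decay coefficient times the accumulated $\ell_2$ terms. A small numpy sketch (total_loss is an illustrative helper, assuming l2_regularizer's sum-of-squares-over-two convention from tf.nn.l2_loss):

```python
import numpy as np

def total_loss(data_loss, weights, weight_decay):
    # Each weight matrix contributes weight_decay * sum(w**2) / 2,
    # matching scale * tf.nn.l2_loss(w) in the graph version.
    l2 = sum(np.sum(w ** 2) / 2.0 for w in weights)
    return data_loss + weight_decay * l2

w = np.ones((2, 2))  # sum of squares = 4, so the l2 term is 2
print(total_loss(1.0, [w], 1e-5))
```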

Most importantly, training relies on choosing an optimizer, here we use Adam and an exponential learning rate decay. The latter dynamically updates the learning rate during training according to the formula
$$\text{decayed learning rate} = \text{base learning rate} \cdot 0.96^{\lfloor i / 1000 \rfloor} ,$$ where $i$ is the current iteration. It would probably work as well without learning rate decay, but I included it for the sake of completeness.

if mode == tf.estimator.ModeKeys.TRAIN:
    train_op = tf.contrib.layers.optimize_loss(
        loss=total_loss,
        learning_rate=learning_rate,
        optimizer="Adam",
        learning_rate_decay_fn=lambda lr, gs: tf.train.exponential_decay(
            lr, gs, 1000, 0.96, staircase=True),
        global_step=tf.train.get_global_step())

    # Add histograms for trainable variables
    for var in tf.trainable_variables():
        tf.summary.histogram(var.op.name, var)

Note that we add a histogram of all trainable variables for TensorBoard in the last part.
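The staircase schedule from the formula above can be checked in plain Python (decayed_lr is an illustrative helper mirroring tf.train.exponential_decay with staircase=True, not the TensorFlow op itself):

```python
def decayed_lr(base_lr, step, decay_steps=1000, decay_rate=0.96):
    # staircase=True: the exponent is the integer division step // decay_steps
    return base_lr * decay_rate ** (step // decay_steps)

print(decayed_lr(0.001, 999))   # still the base rate
print(decayed_lr(0.001, 1000))  # first decay: 0.001 * 0.96
```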

Finally, we compute the root mean squared error when in evaluation mode:

if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {
        "rmse": tf.metrics.root_mean_squared_error(
            tf.cast(labels, tf.float64), tf.cast(probs, tf.float64))
    }

and return the specification of our autoencoder estimator:
return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops=eval_metric_ops)

## Feeding data to an Estimator via the Dataset API

Once we have constructed our estimator, e.g. via

estimator = AutoEncoder(hidden_units=[128, 64, 32],
                        dropout=None,
                        weight_decay=1e-5,
                        learning_rate=0.001)

we would like to train it by calling train, which expects a callable that returns two tensors, one representing the input data and one the groundtruth data. The easiest way would be to use tf.estimator.inputs.numpy_input_fn, but instead I want to introduce TensorFlow's Dataset API, which is more generic.

The Dataset API comprises two elements:

1. tf.data.Dataset represents a dataset and any transformations applied to it.
2. tf.data.Iterator is used to extract elements from a Dataset. In particular, Iterator.get_next() returns the next element of a Dataset and typically is what is fed to an estimator.

Here, I'm using what is called an initializable Iterator, inspired by this post. We define one placeholder for the input image and one for the groundtruth image and initialize the placeholders before training starts using a hook. First, let's create a Dataset from the placeholders:

placeholders = [
    tf.placeholder(data.dtype, data.shape, name='input_image'),
    tf.placeholder(data.dtype, data.shape, name='groundtruth_image')
]
dataset = tf.data.Dataset.from_tensor_slices(placeholders)

Next, we shuffle the dataset and allow retrieving data from it until the specified number of epochs has been reached:

dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.repeat(num_epochs)

When creating input for evaluation or prediction, we are going to skip these two steps.
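The repeat/batch semantics can be sketched in plain Python with toy values: repeating concatenates the epochs into one stream, and batching then cuts that stream into fixed-size chunks (the numbers here are illustrative, not the MNIST ones):

```python
import itertools

data = list(range(5))  # a 5-element "dataset"
num_epochs = 2
batch_size = 2

# dataset.repeat(num_epochs): one stream containing every epoch
stream = list(itertools.chain.from_iterable([data] * num_epochs))

# dataset.batch(batch_size): fixed-size chunks; the last may be smaller
batches = [stream[i:i + batch_size] for i in range(0, len(stream), batch_size)]
print(batches)
# [[0, 1], [2, 3], [4, 0], [1, 2], [3, 4]]
```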

Finally, we combine multiple elements into a batch and create an iterator from the dataset:

dataset = dataset.batch(batch_size)

iterator = dataset.make_initializable_iterator()
next_example, next_label = iterator.get_next()

To initialize the placeholders, we need to call tf.Session.run with feed_dict = {placeholders[0]: input_data, placeholders[1]: groundtruth_data}. Since the Estimator will create a Session for us, we need a way to call our initialization code after the session has been created and before training begins. The Estimator's train, evaluate and predict methods accept a list of SessionRunHook subclasses as the hooks argument, which we can use to inject our code in the right place. Therefore, we first create a generic hook that runs after the session has been created:

class IteratorInitializerHook(tf.train.SessionRunHook):
    """Hook to initialise data iterator after Session is created."""

    def __init__(self):
        self.iterator_initializer_func = None

    def after_create_session(self, session, coord):
        """Initialise the iterator after the session has been created."""
        assert callable(self.iterator_initializer_func)
        self.iterator_initializer_func(session)

To make things a little bit nicer, we create an InputFunction class which implements the __call__ method. Thus, it will behave like a function and we can pass it directly to tf.estimator.Estimator.train and related methods.
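The __call__ trick itself is plain Python: any instance of a class that defines __call__ can be invoked like a function, which is why the Estimator accepts such an object as its input function. A tiny standalone example (Scaler is purely illustrative):

```python
class Scaler:
    """Configurable object that behaves like a function."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # Invoked when the instance is called like a function
        return self.factor * x

double = Scaler(2)
print(callable(double), double(21))  # True 42
```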

class InputFunction:
    def __init__(self, data, batch_size, num_epochs, mode):
        self.data = data
        self.batch_size = batch_size
        self.mode = mode
        self.num_epochs = num_epochs
        self.init_hook = IteratorInitializerHook()

    def __call__(self):
        # Define placeholders
        placeholders = [
            tf.placeholder(self.data.dtype, self.data.shape, name='input_image'),
            tf.placeholder(self.data.dtype, self.data.shape, name='reconstruct_image')
        ]

        # Build dataset pipeline
        dataset = tf.data.Dataset.from_tensor_slices(placeholders)
        if self.mode == tf.estimator.ModeKeys.TRAIN:
            dataset = dataset.shuffle(buffer_size=10000)
            dataset = dataset.repeat(self.num_epochs)
        dataset = dataset.batch(self.batch_size)

        # Create iterator from dataset
        iterator = dataset.make_initializable_iterator()
        next_example, next_label = iterator.get_next()

        # Create initialization hook
        def _init(sess):
            feed_dict = dict(zip(placeholders, [self.data, self.data]))
            sess.run(iterator.initializer, feed_dict=feed_dict)

        self.init_hook.iterator_initializer_func = _init

        return next_example, next_label

Finally, we can use the InputFunction class to train our autoencoder for 30 epochs:

from tensorflow.examples.tutorials.mnist import input_data as mnist_data

mnist = mnist_data.read_data_sets("MNIST_data")

train_input_fn = InputFunction(
    data=mnist.train.images,
    batch_size=256,
    num_epochs=30,
    mode=tf.estimator.ModeKeys.TRAIN)
estimator.train(train_input_fn, hooks=[train_input_fn.init_hook])

The video below shows ten reconstructed images from the test data and their corresponding groundtruth after each epoch of training:

## Denoising Autoencoder

A denoising autoencoder is a slight variation on the autoencoder described above. The only difference is that the input images are randomly corrupted before they are fed to the autoencoder (we still use the original, uncorrupted image to compute the loss). This acts as a form of regularization to avoid overfitting.
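Outside the graph, the corruption can be sketched in plain numpy to see what it does to pixel values: add scaled Gaussian noise, then clip back into [0; 1] (illustration only; inside the Dataset pipeline the TensorFlow ops below are required):

```python
import numpy as np

rng = np.random.RandomState(0)
image = rng.rand(28 * 28)  # a fake image with pixels in [0, 1]

noise_factor = 0.5
corrupted = np.clip(image + noise_factor * rng.randn(28 * 28), 0.0, 1.0)

# Clipping keeps the corrupted image a valid target for the sigmoid output
print(corrupted.min() >= 0.0, corrupted.max() <= 1.0)
```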

def add_noise(input_img, groundtruth):
    noise_factor = 0.5  # a float in [0; 1)
    noise = noise_factor * tf.random_normal(input_img.shape.as_list())
    input_corrupted = tf.clip_by_value(tf.add(input_img, noise), 0., 1.)
    return input_corrupted, groundtruth

The function above takes two Tensors representing the input and groundtruth image, respectively, and corrupts the input image by the specified amount of noise. We can use this function to transform all of the images using Dataset's map function:

dataset = dataset.map(add_noise, num_parallel_calls=4)
dataset = dataset.prefetch(512)

The function passed to map becomes part of the compute graph; thus you have to use TensorFlow operations to modify your input, or fall back to tf.py_func. The num_parallel_calls argument speeds up preprocessing significantly, because multiple images are transformed in parallel. The second line ensures that a certain number of corrupted images is precomputed; otherwise the transformation would only be applied when executing iterator.get_next(), which would result in a delay for each batch and poor GPU utilization. The video below shows the groundtruth, input and output of the denoising autoencoder for up to 60 epochs:

I hope this tutorial gave you some insight on how to implement a custom TensorFlow estimator and use the Dataset API.

sebp Fri, 12/22/2017 - 12:39

## December 19, 2017

### Christian Schaller — Why hasn’t The Year of the Linux Desktop happened yet?

Having spent 20 years of my life on Desktop Linux, I thought I should write up my thinking about why we haven't yet had the Linux on the Desktop breakthrough, and maybe more importantly talk about the avenues I see for that breakthrough still happening. A lot has been written about this over the years, with different people coming up with their explanations. My thesis is that there really isn't one reason, but rather a range of issues that have all contributed to holding the Linux Desktop back from reaching a bigger market. To put this into context: success here, in my mind, would be having something like 10% market share of desktop systems; that to me means we have reached critical mass. So let me start by listing some of the main reasons I see for why we are not at that 10% mark today, before going on to talk about how I think that goal might still be reached.

Things that have held us back

• Fragmented market
• One of the most common explanations for why the Linux Desktop never caught on more is the fragmented state of the Linux Desktop space. We have a large host of desktop projects like GNOME, KDE, Enlightenment, Cinnamon etc., and an even larger host of distributions shipping these desktops. I used to think this state should get a lot of the blame, and I still believe it owns some of the blame, but I have come to conclude in recent years that it is probably more of a symptom than a cause. If someone had come up with a model strong enough to let Desktop Linux break out of its current technical user niche, I am now convinced that model would easily have been strong enough to leave the Linux desktop fragmentation behind for all practical purposes, because at that point the alternative desktops for Linux would be about as important as the alternative MS Windows shells are. So in summary, the fragmentation hasn't helped for sure and is still not helpful, but it is probably a problem that has been overstated.

• Lack of special applications
• Another common item that has been pointed to is the lack of applications. For sure, in the early days of Desktop Linux the challenge you always had when trying to convince anyone to move to Desktop Linux was that they almost invariably had one or more applications they relied on that were only available on Windows. I remember in one of my first jobs after university, when I worked as a sysadmin, we had a long list of these applications that various parts of the organization relied on, be it special tools to interface with a supplier or the bank, or for dealing with nutritional values of food in the company cafeteria, etc. This is a problem that has been in rapid decline for the last 5-10 years due to the move to web applications, but I am sure that in a given major organization you can still probably find a few of them. Between the move to the web and Wine, though, I don't think this is a major issue anymore. So in summary, this was a major roadblock in the early years, but it is a lot less of an impediment these days.

• Lack of big name applications
• Adopting a new platform is always easier if you can take the applications you are familiar with with you, so the lack of things like MS Office and Adobe Photoshop would always make a switch less likely: in addition to switching OS you would also have to learn to use new tools. And along those lines there was always the challenge of file format compatibility, in the early days in the hard sense that you simply couldn't reliably load documents coming from some of these applications, and more recently softer problems like the lack of metrically identical fonts. The font issue, for example, has mostly been resolved since Google released fonts metrically compatible with the MS default fonts a few years ago, but it was definitely a hindrance to adoption for many years. The move to the web for a lot of these things has greatly reduced this problem too, with organizations adopting things like Google Docs at a rapid pace these days. So in summary, once again something that used to be a big problem, but which is at least a lot less of a problem these days; of course there are still apps not available for Linux that do stop people from adopting desktop Linux.

• Lack of API and ABI stability
• This is another item that many people have brought up over the years. I think I have personally vacillated over the importance of this one multiple times. Changing APIs are definitely not a fun thing for developers to deal with; they add extra work, often without bringing direct benefit to the application. The Linux packaging philosophy probably magnified this problem for developers, as anything that could be split out and packaged separately was, meaning that every application was always living on top of a lot of moving parts. That said, the reason I am sceptical of putting too much blame onto this is that you could always find stable subsets to rely on. For instance, if you targeted GTK2 or Qt back in the day and kept away from some of the more fast-moving stuff offered by GNOME and KDE, you would not be hit by this that often. And of course, if the Linux Desktop market share had been higher, people would have been prepared to deal with these challenges regardless, just like they are on other platforms that keep changing and evolving quickly, like the mobile operating systems.

• Apple resurgence
• This might of course be the result of subjective memory, but one of the times where it felt like there could have been a Linux desktop breakthrough was at the same time as Linux on the server started making serious inroads. The old Unix workstation market was coming apart and moving to Linux already, the worry of a Microsoft monopoly was at its peak, and Apple was in what seemed like mortal decline. There was a lot of media buzz around the Linux desktop, and VC-funded companies were set up to try to build a business around it. Reaching some kind of critical mass seemed like it could be within striking distance. Of course what happened was that Steve Jobs returned to Apple and we suddenly had MacOSX come onto the scene, taking at least some air out of the Linux Desktop space. The importance of this one I do find exceptionally hard to quantify though; part of me feels it had a lot of impact, but on the other hand it isn't 100% clear to me that the market and the players at the time would have been able to capitalize even if Apple had gone belly-up.

• Microsoft's aggressive response
• In the first 10 years of Desktop Linux there was no doubt that Microsoft was working hard to nip in the bud any sign of Desktop Linux gaining a foothold or momentum. I do remember for instance that Novell for quite some time was trying to establish a serious Desktop Linux business after having bought Miguel de Icaza's company Helix Code. However, a pattern quickly emerged: every time Novell or anyone else tried to announce a major Linux desktop deal, Microsoft came running in offering next-to-free Windows licensing to get people to stay put. Looking at Linux migrations even seemed to become a go-to tactic for negotiating better prices from Microsoft. So anyone wanting to attack the desktop market with Linux would have to contend with not only market inertia, but a general depression of the price of desktop operating systems, knowing that Microsoft would respond to any attempt to build momentum around Linux desktop deals with very aggressive sales efforts. This probably played an important part, as it meant that the pay-per-copy/subscription business model that for instance Red Hat built their server business around became really tough to make work in the desktop space: the price point ended up so low that it required gigantic volumes to become profitable, which of course is a hard thing to quickly achieve when fighting an entrenched market leader. So in summary, Microsoft in some sense successfully fended off Linux breaking through as a competitor, although it could be said they did so at the cost of fatally wounding the per-copy-fee business model they built their company around, ensuring that the next wave of competitors Microsoft had to deal with, like iOS and Android, based themselves on business models where the cost of the OS was assumed to be zero, thus contributing to the Windows Phone efforts being doomed.

• Piracy

• Red Hat mostly stayed away
• Few people probably remember or know this, but Red Hat was actually founded as a desktop Linux company. The first major investment in software development that Red Hat ever made was setting up the Red Hat Advanced Development Labs, hiring a bunch of core GNOME developers to move that effort forward. But when Red Hat pivoted to the server with the introduction of Red Hat Enterprise Linux, the desktop quickly started playing second fiddle. And before I proceed: all these events were many years before I joined the company, so just as with my other points here, read this as an analysis from someone without first-hand knowledge. While Red Hat has always offered a desktop product and has always been a major contributor to keeping the Linux desktop ecosystem viable, Red Hat was focused on server-side solutions, and the desktop offering was always aimed more narrowly at things like technical workstation customers and people developing towards the RHEL server. It is hard to say how big an impact Red Hat's decision not to go after this market has had; on one hand it would probably have been beneficial to have the Linux company with the deepest pockets and the strongest brand be a more active participant, but on the other hand staying mostly out of the fight gave other companies more room to give it a go.

• Canonical business model not working out
• This bullet point is probably going to be somewhat controversial considering I work for Red Hat (although this is my private blog with my own personal opinions), but on the other hand I feel one cannot talk about the trajectory of the Linux Desktop over the last decade without mentioning Canonical and Ubuntu. I have to assume that when Mark Shuttleworth was mulling over doing Ubuntu he probably saw a lot of the challenges that I mention above, especially the revenue-generation challenges that the competition from Microsoft created. So in the end he decided on the standard internet business model of the time, which was to quickly build up a huge userbase and deal with how to monetize it later. Ubuntu was launched with an effective price point of zero; in fact you could even get install media sent to you for free. The effort worked in the sense that Ubuntu quickly became the biggest player in the Linux desktop space, and it certainly helped the Linux desktop market share grow in the early years. Unfortunately I think it still basically failed, and the reason I am saying that is that it didn't manage to grow big enough to provide Ubuntu with enough revenue, through their app store or their partner agreements, to allow them to seriously re-invest in the Linux Desktop and fund the kind of marketing effort needed to take Linux to a less super-technical audience. So once it plateaued, what they had was enough revenue to keep a relatively barebones engineering effort going, but not the kind of income that would allow them to steadily build the Linux Desktop market further. Mark then tried to capitalize on the mindshare and market share he had managed to build by branching out into efforts like their TV and phone products, but all those efforts eventually failed.
It would probably be an article in itself to deeply discuss why the grow-the-userbase strategy failed here versus why, for instance, Android succeeded with this model, but I think the short version goes back to the fact that you had an entrenched market leader and the Linux Desktop isn't different enough from Mac or Windows desktops to drive the type of market change the transition from feature phones to smartphones was.
And to be clear, I am not criticizing Mark for the strategy he chose; if I were in his shoes back when he started Ubuntu, I am not sure I would have been able to come up with a different strategy that would plausibly have succeeded from his starting point. That said, it did contribute to pushing the expected price of desktop Linux even further down, making it even harder for people to generate significant revenue from desktop Linux. On the other hand one can argue that this would likely have happened anyway due to competitive pressure and Windows piracy. Canonical's recent pivot away from the desktop towards trying to build a business in the server and IoT space is in some sense a natural consequence of hitting the desktop growth plateau and not having enough revenue to invest in further growth.
So in summary, what was once seen as the most likely contender to take the Linux Desktop to critical mass turned out to have taken off with too little rocket fuel, and eventually gravity caught up with them. And what we can never know for sure is whether, during this run, they sucked so much air out of the market that it kept someone who could have taken us further with a different business model from jumping in.

• Original device manufacturer support
• This one is a bit of a chicken-and-egg issue. Yes, lack of (perfect) hardware support has for sure held Linux back on the Desktop, but lack of market share has also held hardware support back. As with any system, this is a question of reaching critical mass despite your challenges and thus eventually being so big that nobody can afford to ignore you. This is an area where even today we are still not fully there, but where I do feel we are getting closer all the time. When I installed Linux for the very first time, which I think was Red Hat Linux 3.1 (pre-RHEL days), I spent about a weekend fiddling just to get my sound card working; I think I had to grab an experimental driver from somewhere and compile it myself. These days I mostly expect everything to work out of the box except more unique hardware like ambient light sensors or fingerprint readers, but even such devices are starting to land, and thanks to efforts from vendors such as Dell things are looking pretty good here. But the memory of these issues is long, so a lot of people, especially those not using Linux themselves but who have heard about Linux, still assume hardware support is very much a hit-or-miss affair.

## What does the future hold?

So anyone who has read my blog posts probably knows I am an optimist by nature. This isn't just some kind of genetic disposition towards optimism, but also a philosophical belief that optimism breeds opportunity while pessimism breeds failure. So just because we haven't gotten the Linux Desktop to 10% market share so far doesn't mean it will not happen going forward; it just means we haven't achieved it yet. One of the key strengths of open source is that it is incredibly hard to kill: unlike proprietary software, it doesn't go away or stop getting developed just because a company goes out of business or decides to shut down a part of its business. As long as there is a strong community interested in pushing it forward, it remains and evolves, and thus when opportunity comes knocking again it is ready to try again. And that is definitely true of Desktop Linux, which from a technical perspective is better than it has ever been: the level of polish is higher than ever before, the level of hardware support is better than ever before, and the range of software available is better than ever before.

And the important thing to remember here is that we don't exist in a vacuum; the world around us constantly changes too, which means that the things or companies that blocked us in the past might not be around, or able to block us, tomorrow. Apple and Microsoft are very different companies today than they were 10 or 20 years ago, and their focus and who they compete with are very different. The dynamics of the desktop software market are changing with new technologies and paradigms all the time, like how online media consumption has moved from things like your laptop to phones and tablets. Five years ago I would have considered iTunes a big competitive problem; today the move to streaming services like Spotify, Hulu, Amazon or Netflix has made iTunes feel archaic and a symbol of bygone times.

And many of the problems we faced before, like weird Windows applications without a Linux counterpart, have been washed away by the switch to browser-based applications. And while Valve's SteamOS effort didn't take off, it has provided Linux users with access to a huge catalog of games, removing a reason that I know caused a few of my friends to mostly abandon using Linux on their computers. And you can actually, as a consumer, buy Linux from a range of vendors now who try to properly support Linux on their hardware, including a major player like Dell and smaller outfits like System76 and Purism.

And since I do work for Red Hat managing our Desktop Engineering team, I should address the question of whether Red Hat will be a major driver in taking Desktop Linux to that 10%. Well, Red Hat will continue to support and evolve our current RHEL Workstation product, and we are seeing steady growth of new customers for it. So if you are looking for a solid developer workstation for your company you should absolutely talk to Red Hat sales about RHEL Workstation, but Red Hat is not looking at aggressively targeting general consumer computers anytime soon. Caveat here: I am not a C-level executive at Red Hat, so I guess there is always a chance Jim Whitehurst or someone else in the top brass is mulling over a gigantic new desktop effort and I simply don't know about it, but I don't think it is likely and thus would not advise anyone to hold their breath waiting for such a thing to be announced :). That said, Red Hat, like any company out there, does react to market opportunities as they arise, so who knows what will happen down the road. And we will definitely keep pushing Fedora Workstation forward as the place to experience the leading edge of the Desktop Linux experience and a great portal into the world of Linux on servers and in the cloud.

So to summarize: there are a lot of things happening in the market that could provide the right set of people the opportunity they need to finally take Linux to critical mass. Whether there is anyone who has the timing and skills to pull it off is of course always an open question, and it is a question which will only be answered the day someone does it. The only thing I am sure of is that the Linux community is providing a stronger technical foundation for someone to succeed with than ever before, so the question is just whether someone can come up with the business model and the marketing skills to take it to the next level. There is also the chance that it will come in a shape we don't appreciate today. Maybe ChromeOS evolves into a more full-fledged operating system as it grows in popularity and thus ends up being the Linux on the Desktop endgame? Or maybe Valve relaunches their SteamOS effort and it provides the foundation for major general desktop growth? Or maybe market opportunities arise that cause us at Red Hat to go after the desktop market in a wider sense than we do today? Or maybe Endless succeeds with their vision for a Linux desktop operating system? Or maybe the idea of a desktop operating system gets supplanted to the degree that we in the end just sit there saying 'Alexa, please open the IDE and take dictation of this new graphics driver I am writing' (ok, probably not that last one ;)

And to be fair there are a lot of people saying that Linux already made it on the desktop in the form of things like Android tablets. Which is technically correct as Android does run on the Linux kernel, but I think for many of us it feels a bit more like a distant cousin as opposed to a close family member both in terms of use cases it targets and in terms of technological pedigree.

As a sidenote, I am heading off on Yuletide vacation tomorrow evening, taking my wife and kids to Norway to spend time with our family there. So don't expect a lot of new blog posts from me until I am back from DevConf in early February. I hope to see many of you at DevConf though; it is a great conference, and Brno is a great town even in freezing winter. As we say in Norway, there is no such thing as bad weather, only bad clothing.

## December 15, 2017

### Christian Schaller — Some predictions for 2018

So I spent a few hours polishing my crystal ball today, so here are some predictions for Linux on the Desktop in 2018. The advantage of course for me to publish these now is that I can then later selectively quote the ones I got right to prove my brilliance and the internet can selectively quote the ones I got wrong to prove my stupidity :)

Prediction 1: Meson becomes the defacto build system of the Linux community

Meson has been going from strength to strength this year, and a lot of projects
which passed on earlier attempts to replace autotools have adopted it. I predict this
trend will continue in 2018 and that by the end of the year everyone will agree that Meson
has replaced autotools as the Linux community's build system of choice. That said, I am not
convinced the Linux kernel itself will adopt Meson in 2018.

Prediction 2: Rust puts itself on a clear trajectory to replace C and C++ for low level programming

Another rising star of 2017 is the programming language Rust. And while its pace of adoption
will be slower than Meson's, I do believe that by the time 2018 comes to a close the general opinion will be
that Rust is the future of low-level programming, replacing old favorites like C and C++. Major projects
like GNOME and GStreamer are already adopting Rust at a rapid pace, and I believe even more projects will
join them in 2018.

Prediction 3: Apple's decline as a PC vendor becomes obvious

Ever since Steve Jobs died it has become quite clear, in my opinion, that the emphasis
on the traditional desktop is fading at Apple. The pace of hardware refreshes seems
to be slowing and macOS seems to be going more and more stale. Some pundits have already
started pointing this out, and I predict that in 2018 Apple will no longer be considered the
cool kid on the block for people looking for laptops, especially among the tech-savvy crowd.
Hopefully a good opportunity for Linux on the desktop to assert itself more.

Prediction 4: Traditional distro packaging for desktop applications
will start fading away in favour of Flatpak

From where I am standing, I think 2018 will be the breakout year for Flatpak as a replacement
for getting your desktop applications as RPMs or debs. I predict that by the end of 2018 more or
less every Linux desktop user will be running at least one Flatpak on their system.

Prediction 5: Linux Graphics competitive across the board

I think 2018 will be a breakout year for Linux graphics support. I think our GPU drivers and APIs will be competitive with any other platform, both in completeness and performance. So by the end of 2018 I predict that you will see Linux game ports by major porting houses
like Aspyr and Feral that perform just as well as their Windows counterparts. What is more, I also predict that by the end of 2018 discrete graphics will be considered a solved problem on Linux.

Prediction 6: H265 will be considered a failure

I predict that by the end of 2018 H265 will be considered a failed codec effort and the era of royalty-bearing media codecs will effectively start coming to an end. H264 will be considered the last successful royalty-bearing codec, and new codecs coming out will
all be open source and royalty free.

### Bastien Nocera — More Bluetooth (and gaming) features

In the midst of post-release bug fixing, we've also added a fair number of new features to our stack. As usual, new features span a number of different components, so integrators will have to be careful picking up all the components when, well, integrating.

Do you have a PlayStation 3 joypad that feels just a little bit "off"? You can't find the Sony logo anywhere on it? The figures on the face buttons look like barbed wire? And if it were a YouTube video, it would say "No copyright intended"?

Bingo. When plugged in via USB, those devices advertise themselves as SHANWAN or Gasia, and implement the bare minimum to work when plugged into a PlayStation 3 console. But as a Linux computer would behave slightly differently, we need to fix a couple of things.

The first fix was simple, but necessary to be able to do any work: disable the rumble motor that starts as soon as you plug the pad through USB.

Once that's done, we could work around the fact that the device isn't Bluetooth compliant, and hard-code the HID service it's supposed to offer.

Bluetooth LE Battery reporting

Bluetooth Low Energy is the new-fangled (7-year old) protocol for low throughput devices, from a single coin-cell powered sensor, to input devices. What's great is that there's finally a standardised way for devices to export their battery statuses. I've added support for this in BlueZ, which UPower then picks up for desktop integration goodness.

There are a number of Bluetooth LE joypads available for pickup, including a few that should be firmware upgradeable. Look for "Bluetooth 4" as well as "Bluetooth LE" when doing your holiday shopping.

gnome-bluetooth work

Finally, this is the boring part. Benjamin and I reworked code that's internal to gnome-bluetooth, as used in the Settings panel as well as the Shell, to make it use modern facilities like GDBusObjectManager. The overall effect of this is less code that is less brittle and more reactive when Bluetooth adapters come and go, such as when using airplane mode.

Apart from the kernel patch mentioned above (you'll know if you need it :), those features have been integrated in UPower 0.99.7 and in the upcoming BlueZ 5.48. And they will of course be available in Fedora, both in rawhide and as updates to Fedora 27 as soon as the releases have been done and built.

GG!

## December 14, 2017

### Sebastian Dröge — A GStreamer Plugin like the Rec Button on your Tape Recorder – A Multi-Threaded Plugin written in Rust

As Rust is known for “Fearless Concurrency”, that is being able to write concurrent, multi-threaded code without fear, it seemed like a good fit for a GStreamer element that we had to write at Centricular.

Previous experience with Rust for writing (mostly) single-threaded GStreamer elements and applications (also multi-threaded) were all quite successful and promising already. And in the end, this new element was also a pleasure to write and probably faster than doing the equivalent in C. For the impatient, the code, tests and a GTK+ example application (written with the great Rust GTK bindings, but the GStreamer element is also usable from C or any other language) can be found here.

#### What does it do?

The main idea of the element is that it basically works like the rec button on your tape recorder. There is a single boolean property called “record”, and whenever it is set to true it will pass-through data and whenever it is set to false it will drop all data. But different to the existing valve element, it

• Outputs a contiguous timeline without gaps, i.e. there are no gaps in the output when not recording. Similar to the recording you get on a tape recorder, you don’t have 10s of silence if you didn’t record for 10s.
• Handles and synchronizes multiple streams at once. When recording e.g. a video stream and an audio stream, every recorded segment starts and stops with both streams at the same time
• Is key-frame aware. If you record a compressed video stream, each recorded segment starts at a keyframe and ends right before the next keyframe to make it most likely that all frames can be successfully decoded
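The contiguous-timeline behaviour can be sketched in a few lines of plain Rust. This is a hypothetical simplification (a single stream, integer nanosecond timestamps, no keyframe handling), not the element's actual code: while not recording, input time keeps advancing, so the skipped duration is accumulated and subtracted from every recorded timestamp.

```rust
// Hypothetical sketch of the "contiguous timeline" idea: gaps created
// while not recording are removed from the output timestamps.
struct TimelineShift {
    recording: bool,
    skipped: u64,           // total duration dropped so far (ns)
    stop_time: Option<u64>, // input time when recording last stopped
}

impl TimelineShift {
    fn new() -> Self {
        TimelineShift { recording: true, skipped: 0, stop_time: None }
    }

    fn set_record(&mut self, record: bool, now: u64) {
        if self.recording && !record {
            // Stopping: remember where the gap begins.
            self.stop_time = Some(now);
        } else if !self.recording && record {
            // Restarting: add the whole gap to the accumulated skip.
            if let Some(stop) = self.stop_time.take() {
                self.skipped += now - stop;
            }
        }
        self.recording = record;
    }

    // Returns the shifted output timestamp, or None if dropped.
    // Assumes monotonically increasing input timestamps.
    fn handle_buffer(&mut self, ts: u64) -> Option<u64> {
        if self.recording { Some(ts - self.skipped) } else { None }
    }
}

fn main() {
    let mut shift = TimelineShift::new();
    assert_eq!(shift.handle_buffer(5), Some(5));
    shift.set_record(false, 10);               // stop at t = 10
    assert_eq!(shift.handle_buffer(15), None); // dropped while stopped
    shift.set_record(true, 20);                // restart at t = 20
    assert_eq!(shift.handle_buffer(20), Some(10)); // 10s gap removed
}
```

So a buffer arriving at t = 20s right after a 10s pause is output at t = 10s, just like the tape recorder analogy. The real element additionally has to apply the same shift to all streams at once.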

The multi-threading aspect here comes from the fact that in GStreamer each stream usually has its own thread, so in this case the video stream and the audio stream(s) would come from different threads but would have to be synchronized between each other.

The GTK+ example application for the plugin is playing a video with the current playback time and a beep every second, and allows to record this as an MP4 file in the current directory.

#### How did it go?

This new element was again based on the Rust GStreamer bindings and the infrastructure that I was writing over the last year or two for writing GStreamer plugins in Rust.

As written above, it generally went all fine and was quite a pleasure but there were a few things that seem noteworthy. But first of all, writing this in Rust was much more convenient and fun than writing it in C would’ve been, and I’ve written enough similar code in C before. It would’ve taken quite a bit longer, I would’ve had to debug more problems in the new code during development (there were actually surprisingly few things going wrong during development, I expected more!), and probably would’ve written less exhaustive tests because writing tests in C is just so inconvenient.

##### Rust does not prevent deadlocks

While this should be clear, and was also clear to myself before, this seems like it might need some reiteration. Safe Rust prevents data races, but not all possible bugs that multi-threaded programs can have. Rust is not magic, only a tool that helps you prevent some classes of potential bugs.

For example, you still can’t stop thinking about lock order when multiple mutexes are involved, and you can’t carelessly use condition variables without making sure that your conditions actually make sense and are accessed atomically. As a wise man once said, “the safest program is the one that does not run at all”, and a deadlocking program is very close to that.

The part about condition variables might be something that can be improved in Rust. Without this, you can easily end up in situations where you wait forever or your conditions are actually inconsistent. Currently Rust’s condition variables only require a mutex to be passed to the functions for waiting for the condition to be notified, but it would probably also make sense to require passing the same mutex to the constructor and notify functions to make it absolutely clear that you need to ensure that your conditions are always accessed/modified while this specific mutex is locked. Otherwise you might end up in debugging hell.
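A minimal sketch of that discipline with std's Condvar, where the condition flag is only read and modified while holding the same mutex that is passed to wait() (the function name here is made up for illustration):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Wait until another thread flips the flag. The flag is only touched
// while holding the same mutex that is passed to wait(), so no
// notification can be lost between checking and waiting.
fn wait_for_flag(pair: &(Mutex<bool>, Condvar)) -> bool {
    let (lock, cvar) = pair;
    let mut ready = lock.lock().unwrap();
    // wait() can wake spuriously, so always re-check the condition.
    while !*ready {
        ready = cvar.wait(ready).unwrap();
    }
    *ready
}

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair2 = Arc::clone(&pair);
    let t = thread::spawn(move || {
        let (lock, cvar) = &*pair2;
        *lock.lock().unwrap() = true; // modify condition under the mutex
        cvar.notify_one();
    });
    assert!(wait_for_flag(&pair));
    t.join().unwrap();
}
```

The API only enforces that *some* mutex is passed to wait(); that it is the same mutex guarding the flag is still up to the programmer, which is exactly the gap described above.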

Fortunately during development of the plugin I only ran into a simple deadlock, caused by accidentally keeping a mutex locked for too long and then running into conflict with another one. That is probably an easy trap to fall into, given that the most common way of unlocking a mutex is to let the mutex lock guard fall out of scope. This makes it impossible to forget to unlock the mutex, but it also makes it less explicit when the unlock happens, and sometimes explicitly unlocking by manually dropping the mutex lock guard is still necessary.
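The explicit-drop pattern looks like this (`sum_without_nesting` is a made-up name for illustration): dropping the first guard before taking the second lock keeps the critical sections short and avoids ever holding both mutexes at once.

```rust
use std::sync::Mutex;

// If one thread held `a` while taking `b` and another did the reverse,
// they could deadlock. Explicitly dropping the first guard, instead of
// letting it live to the end of the scope, avoids nested locking here.
fn sum_without_nesting(a: &Mutex<i32>, b: &Mutex<i32>) -> i32 {
    let guard_a = a.lock().unwrap();
    let x = *guard_a;
    drop(guard_a); // unlock `a` before touching `b`

    let guard_b = b.lock().unwrap();
    x + *guard_b
}

fn main() {
    let a = Mutex::new(1);
    let b = Mutex::new(2);
    println!("{}", sum_without_nesting(&a, &b)); // prints 3
}
```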

So in summary, while a big group of potential problems with multi-threaded programs are prevented by Rust, you still have to be careful to not run into any of the many others. Especially if you use lower-level constructs like condition variables, and not just e.g. channels. Everything is however far more convenient than doing the same in C, and with more support by the compiler, so I definitely prefer writing such code in Rust over doing the same in C.

##### Missing API

As usual, for the first dozen projects using a new library or new bindings to an existing library, you’ll notice some missing bits and pieces. That a relatively core part of GStreamer, the GstRegistry API, was missing was surprising nonetheless. True, you usually don’t use it directly, and I only needed it here for loading the new plugin from a non-standard location, but it was still surprising. Let’s hope this was the biggest oversight. If you look at the issues page on GitHub, you’ll find a few other things that are still missing, but nobody has needed them yet, so it’s probably fine for the time being.

Another batch of missing API that I noticed during development was that many manual (i.e. not auto-generated) bindings didn’t have the Debug trait implemented, or not in a very useful way. This is solved now; otherwise I wouldn’t have been able to properly log what is happening inside the element, to allow easier debugging later if something goes wrong.

Apart from that there were also various other smaller things that were missing, or bugs (see below) that I found in the bindings while going through all these. But those seem not very noteworthy – check the commit logs if you’re interested.

##### Bugs, bugs, bugs

I also found a couple of bugs in the bindings. They can be broadly categorized in two categories

• Annotation bugs in GStreamer. The auto-generated parts of the bindings are generated from an XML description of the API, that is generated from the C headers and code and annotations in there. There were a couple of annotations that were wrong (or missing) in GStreamer, which then caused memory leaks in my case. Such mistakes could also easily cause memory-safety issues though. The annotations are fixed now, which will also benefit all the other language bindings for GStreamer (and I’m not sure why nobody noticed the memory leaks there before me).
• Bugs in the manually written parts of the bindings. Similarly to the above, there was one memory leak, and another case where a function could have returned NULL but did not have this case covered on the Rust side by returning an Option.

Generally I was quite happy with the lack of bugs though, the bindings are really ready for production at this point. And especially, all the bugs that I found are things that are unfortunately “normal” and common when writing code in C, while Rust is preventing exactly these classes of bugs. As such, they have to be solved only once at the bindings layer, and then you’re free of them: you don’t have to spend any brain capacity on their existence anymore and can use your brain to solve the actual task at hand.

##### Inconvenient API

Similar to the missing API, whenever using some rather new API you will find things that are inconvenient and could ideally be done better. The biggest case here was the GstSegment API. A segment represents a (potentially open-ended) playback range and contains all the information to convert timestamps to the different time bases used in GStreamer. I’m not going to get into details here, best check the documentation for them.

A segment can be in different formats, e.g. in time or bytes. In the C API this is handled by storing the format inside the segment, and requiring you to pass the format together with the value to every function call, and internally there are some checks then that let the function fail if there is a format mismatch. In the previous version of the Rust segment API, this was done the same, and caused lots of unwrap() calls in this element.

But in Rust we can do better, and the new API for the segment now encodes the format in the type system (i.e. there is a Segment<Time>) and only values with the correct type (e.g. ClockTime) can be passed to the corresponding functions of the segment. In addition there is a type for a generic segment (which still has all the runtime checks) and functions to “cast” between the two.

Overall this gives more type-safety (the compiler already checks that you don’t mix calculations between seconds and bytes) and makes the API usage more convenient as various error conditions just can’t happen and thus don’t have to be handled. Or like in C, are simply ignored and not handled, potentially leaving a trap that can cause hard to debug bugs at a later time.
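The idea of moving the format into the type system can be illustrated with a phantom-type sketch. The names below mirror the concept but are not the bindings' actual API:

```rust
use std::marker::PhantomData;

// Hypothetical marker types standing in for GStreamer formats.
struct Time;
struct Bytes;

// The format is part of the type, so it needs no runtime storage.
struct Segment<F> {
    start: u64,
    stop: u64,
    _format: PhantomData<F>,
}

impl<F> Segment<F> {
    fn new(start: u64, stop: u64) -> Self {
        Segment { start, stop, _format: PhantomData }
    }

    // Clamp a position into the segment's range.
    fn clip(&self, pos: u64) -> u64 {
        pos.clamp(self.start, self.stop)
    }
}

// Only accepts a time segment: passing a Segment<Bytes> here is a
// compile-time error, where the C API would only fail at runtime.
fn percent_played(seg: &Segment<Time>, pos: u64) -> u64 {
    (seg.clip(pos) - seg.start) * 100 / (seg.stop - seg.start)
}

fn main() {
    let time_seg: Segment<Time> = Segment::new(0, 200);
    let byte_seg: Segment<Bytes> = Segment::new(0, 4096);
    assert_eq!(percent_played(&time_seg, 50), 25);
    assert_eq!(byte_seg.clip(9000), 4096);
    // percent_played(&byte_seg, 50); // would not compile
}
```

A generic, runtime-checked segment then corresponds to erasing the phantom parameter again, which is roughly what the "cast" functions between the two representations do.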

That Rust requires all errors to be handled makes it very obvious how many potential error cases the average C code out there is not handling at all, and also shows that a more expressive language than C can easily prevent many of these error cases at compile-time already.

## December 11, 2017

### GStreamer — GStreamer 1.12.4 stable release (binaries)

Pre-built binary images of the 1.12.4 stable release of GStreamer are now available for Windows 32/64-bit, iOS and Mac OS X and Android.

The builds are available for download from: Android, iOS, Mac OS X and Windows.

## December 09, 2017

### Sebastian Pölsterl — scikit-survival 0.5 released


Today, I released a new version of scikit-survival. This release adds support for the latest version of scikit-learn (0.19) and pandas (0.21). In turn, support for Python 3.4, scikit-learn 0.18 and pandas 0.18 has been dropped.

Many people are confused about the meaning of predictions. Often, they assume that predictions of a survival model should always be non-negative since the input is the time to an event. However, this is not always the case. In general, predictions are risk scores of arbitrary scale. In particular, survival models usually do not predict the exact time of an event, but the relative order of events. If samples are ordered according to their predicted risk score (in ascending order), one obtains the sequence of events, as predicted by the model. A more detailed explanation is available in the Understanding Predictions in Survival Analysis section of the documentation.

You can install the latest version via Anaconda (Linux, OSX and Windows):

conda install -c sebp scikit-survival

or via pip:

pip install -U scikit-survival
sebp Sat, 12/09/2017 - 12:33

### Survival functions

Loving the package. Is there a way to pull survival functions (a la Cox PH models) from your SVM and GBM estimators?

### Hi Joe,


there is a way, but it's currently not implemented, I'm afraid. See last paragraph of https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368 for details.

## December 07, 2017

### GStreamer — GStreamer 1.12.4 stable release

The GStreamer team is pleased to announce the fourth bugfix release in the stable 1.12 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.12.x.

See /releases/1.12/ for the full release notes.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Check out the release notes for GStreamer core, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-vaapi, or gst-omx, or download tarballs for gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-vaapi, or gst-omx.

### Víctor Jáquez — Enabling HuC for SKL/KBL in Debian/testing

Recently, our friend Florent complained that it was impossible to set a constant bitrate when encoding H.264 using the low-power profile with gstreamer-vaapi.

Low-power (LP) profiles are VA-API entry points, available in Intel Skylake-based processors and successors, which provide video encoding with low power consumption.

Later on, Ullysses and Sree pointed out that CBR in LP is only possible if HuC is enabled in the kernel.

HuC is a firmware, loaded by the i915 kernel module, designed to offload some of the media functions from the CPU to the GPU. One of these functions is bitrate control when encoding. HuC saves unnecessary CPU-GPU synchronization.

In order to load HuC, it is required to first load GuC, another Intel firmware designed to perform graphics workload scheduling on the various graphics parallel engines.

How can we install and configure these firmwares to enable CBR in the low-power profile, among other things, on Debian/testing?

## Check i915 parameters

First we shall confirm that our kernel and our i915 kernel module can handle this functionality:

$ sudo modinfo i915 | egrep -i "guc|huc|dmc"
firmware:       i915/bxt_dmc_ver1_07.bin
firmware:       i915/skl_dmc_ver1_26.bin
firmware:       i915/kbl_dmc_ver1_01.bin
firmware:       i915/kbl_guc_ver9_14.bin
firmware:       i915/bxt_guc_ver8_7.bin
firmware:       i915/skl_guc_ver6_1.bin
firmware:       i915/kbl_huc_ver02_00_1810.bin
firmware:       i915/bxt_huc_ver01_07_1398.bin
firmware:       i915/skl_huc_ver01_07_1398.bin
parm:           enable_guc_loading:Enable GuC firmware loading (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           enable_guc_submission:Enable GuC submission (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)
parm:           guc_firmware_path:GuC firmware path to use instead of the default one (charp)
parm:           huc_firmware_path:HuC firmware path to use instead of the default one (charp)

## Install firmware

$ sudo apt install firmware-misc-nonfree


UPDATE: In order to install this Debian package, you should have enabled the non-free apt repository in your sources list.

Verify the firmware is installed:

....
$ cat /etc/modprobe.d/i915.conf
options i915 enable_guc_loading=1 enable_guc_submission=1

## Reboot

$ sudo systemctl reboot


## Verification

Now it is possible to verify that the i915 kernel module loaded the firmware correctly by looking at the kernel logs:

$ journalctl -b -o short-monotonic -k | egrep -i "i915|dmr|dmc|guc|huc"
[   10.303849] miau kernel: Setting dangerous option enable_guc_loading - tainting kernel
[   10.303852] miau kernel: Setting dangerous option enable_guc_submission - tainting kernel
[   10.336318] miau kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   10.338664] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_dmc_ver1_01.bin
[   10.339635] miau kernel: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_01.bin (v1.1)
[   10.361811] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_huc_ver02_00_1810.bin
[   10.362422] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_guc_ver9_14.bin
[   10.393117] miau kernel: [drm] GuC submission enabled (firmware i915/kbl_guc_ver9_14.bin [version 9.14])
[   10.410008] miau kernel: [drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
[   10.559614] miau kernel: snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[   11.937413] miau kernel: i915 0000:00:02.0: fb0: inteldrmfb frame buffer device

That means that the HuC and GuC firmwares were loaded successfully. Now we can check the status of the modules using sysfs:

$ sudo cat /sys/kernel/debug/dri/0/i915_guc_load_status
GuC firmware status:
path: i915/kbl_guc_ver9_14.bin
fetch: SUCCESS
version wanted: 9.14
version found: 9.14
header: offset is 0; size = 128
uCode: offset is 128; size = 142272
RSA: offset is 142400; size = 256

GuC status 0x800330ed:
Bootrom status = 0x76
uKernel status = 0x30
MIA Core status = 0x3

Scratch registers:
0:     0xf0000000
1:     0x0
2:     0x0
3:     0x5f5e100
4:     0x600
5:     0xd5fd3
6:     0x0
7:     0x8
8:     0x3
9:     0x74240
10:     0x0
11:     0x0
12:     0x0
13:     0x0
14:     0x0
15:     0x0
$ sudo cat /sys/kernel/debug/dri/0/i915_huc_load_status
HuC firmware status:
path: i915/kbl_huc_ver02_00_1810.bin
fetch: SUCCESS
load: SUCCESS
version wanted: 2.0
version found: 2.0
header: offset is 0; size = 128
uCode: offset is 128; size = 218304
RSA: offset is 218432; size = 256

HuC status 0x00006080:

## Test GStreamer

$ gst-launch-1.0 videotestsrc num-buffers=1000 ! video/x-raw, format=NV12, width=1920, height=1080, framerate=(fraction)30/1 ! vaapih264enc bitrate=8000 keyframe-period=30 tune=low-power rate-control=cbr ! mp4mux ! filesink location=test.mp4
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Got context from element 'vaapiencodeh264-0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)"\(GstVaapiDisplayGLX\)\ vaapidisplayglx0";
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:11.620036001
Setting pipeline to PAUSED ...
Setting pipeline to NULL ...
Freeing pipeline ...
$ gst-discoverer-1.0 test.mp4
Analyzing file:///home/vjaquez/gst/master/intel-vaapi-driver/test.mp4
Done discovering file:///home/vjaquez/test.mp4

Topology:
  container: Quicktime
    video: H.264 (High Profile)

Properties:
  Duration: 0:00:33.333333333
  Seekable: yes
  Live: no
  Tags:
      video codec: H.264 / AVC
      bitrate: 8084005
      encoder: VA-API H264 encoder
      datetime: 2017-12-07T14:29:23Z
      container format: ISO MP4/M4A

Mission accomplished!

## References

## December 06, 2017

### Bastien Nocera — UTC and Anywhere on Earth support

A quick post to tell you that we finally added UTC support to Clocks' and the Shell's World Clocks section. And if you're into it, there's also Anywhere on Earth support.

You will need to have git master versions of libgweather (our cities and timezones database), and gnome-clocks. This feature will land in GNOME 3.28.

Many thanks to Giovanni for coming up with an API he was happy with after I attempted a couple of iterations on one. Enjoy!

Update: As expected, a bug crept in. Thanks to Colin Guthrie for spotting the error in the "Anywhere on Earth" timezone. See this section for the fun we have to deal with.

## November 26, 2017

### Sebastian Dröge — GStreamer Rust bindings release 0.9

About 3 months, a GStreamer Conference and two bug-fix releases have passed now since the GStreamer Rust bindings release 0.8.0. Today version 0.9.0 (and 0.9.1 with a small bugfix to export some forgotten types) with a couple of API improvements and lots of additions and cleanups was released. This new version depends on the new set of releases of the gtk-rs crates (glib/etc).

The full changelog can be found here, but below is a short overview of the (in my opinion) most interesting changes.

#### Tutorials

The basic tutorials 1 to 8 were ported from C to Rust by various contributors. The C versions and the corresponding explanatory text can be found here, and it should be relatively easy to follow the text together with the Rust code.
This should make learning to use GStreamer from Rust much easier, in combination with the few example applications that exist in the repository.

#### Type-safety Improvements

Previously querying the current playback position from a pipeline (and various other things analogous to that) gave you a plain 64-bit integer, just like in C. However in Rust we can easily do better.

The main problem with just getting an integer was that there are “special” values that have the meaning of “no value known”, specifically GST_CLOCK_TIME_NONE for values in time. In C this often causes bugs by code ignoring this special case and then doing calculations with such a value, resulting in completely wrong numbers. In the Rust bindings these are now expressed as an Option so that the special case has to be handled separately, and in combination with that, for timed values there is a new type called ClockTime that implements all the arithmetic traits and others, so you can still do normal arithmetic operations on the values while the implementation of those operations takes care of GST_CLOCK_TIME_NONE.

Also it was previously easy to get a value in bytes and add it to a value in time. Whenever multiple formats are possible, a new type called FormatValue is now used that combines the value itself with its format to prevent such mistakes.

#### Error Handling

Various operations in GStreamer can fail with a custom enum type: linking pads (PadLinkReturn), pushing a buffer (FlowReturn), changing an element’s state (StateChangeReturn). Previously handling these was not as convenient as the usual Result-based error handling in Rust. With this release, all these types provide a function into_result() that allows converting the enum into a Result that splits it into its good and bad cases, e.g. FlowSuccess and FlowError. Based on this, the usual Rust error handling is possible, including usage of the ?-operator.
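The None-propagating arithmetic described above can be sketched with a hypothetical newtype (this is not the bindings' real ClockTime implementation, just the idea): a value is either a known time or "none", and any arithmetic involving "none" yields "none" instead of silently producing garbage the way `GST_CLOCK_TIME_NONE + x` does in C.

```rust
use std::ops::Add;

// Hypothetical sketch: a time value that may be unknown, where the
// unknown case "poisons" arithmetic instead of corrupting results.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ClockTime(Option<u64>);

impl Add for ClockTime {
    type Output = ClockTime;
    fn add(self, other: ClockTime) -> ClockTime {
        match (self.0, other.0) {
            (Some(a), Some(b)) => ClockTime(Some(a + b)),
            _ => ClockTime(None), // any unknown operand → unknown result
        }
    }
}

fn main() {
    let position = ClockTime(Some(20));
    let offset = ClockTime(Some(22));
    let unknown = ClockTime(None); // role of GST_CLOCK_TIME_NONE
    assert_eq!(position + offset, ClockTime(Some(42)));
    assert_eq!(position + unknown, ClockTime(None));
}
```

The caller is then forced by the type to decide what an unknown position means, rather than accidentally doing math with a sentinel value.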
Once the Try trait is stable, it will also be possible to directly use the ?-operator on FlowReturn and the others before conversion into a Result.

All these enums are also marked as #[must_use] now, which causes a compiler warning if code is not specifically handling them (which could mean to explicitly ignore them), making it even harder to ignore errors caused by any failures of such operations.

In addition, all the examples and tutorials make use of the above now, and many examples were ported to the failure crate and implement proper error handling in all situations, for example the decodebin example.

#### Various New API

Apart from all of the above, a lot of new API was added, both for writing GStreamer-based applications, and making that easier, as well as for writing GStreamer plugins in Rust. For the latter, the gst-plugin-rs repository with various crates (and plugins) was ported to the GStreamer bindings and completely rewritten, but more on that in another blog post in the next couple of days, once the gst-plugin crate is released and published on crates.io.

## November 24, 2017

### Víctor Jáquez — Intel MediaSDK on Debian (testing)

Everybody knows it: installing Intel MediaSDK on GNU/Linux is a PITA. With CentOS or Yocto it is less cumbersome, if you blindly trust scripts run as root. I don’t like CentOS, I feel like I am living in the past with it. I like Debian (testing, of course) and I also wanted to understand a little more about MediaSDK. And this is what I did to get Intel MediaSDK working on Debian/testing.

First, I did a pristine installation of Debian testing with a netinst image in my NUC 6i5SYK, with a normal desktop user setup (GNOME 3). The madness comes later.

Intel identifies two types of MediaSDK installation: Gold and Generic. Gold is for CentOS, and Generic for the rest of the distributions. Obviously, Generic means you’re on your own.
For the purpose of this exercise I used as reference Generic Linux* Intel® Media Server Studio Installation.

Let’s begin by grabbing the Intel® Media Server Studio – Community Edition. You will need to register yourself and accept the user agreement, because this is proprietary software. At the end, you should have a tarball named MediaServerStudioEssentials2017R3.tar.gz

#### Extract the files for Generic installation

$ cd ~
$ tar xvf MediaServerStudioEssentials2017R3.tar.gz
$ cd MediaServerStudioEssentials2017R3
$ tar xvf SDK2017Production16.5.2.tar.gz
$ cd SDK2017Production16.5.2/Generic
$ mkdir tmp
$ tar -xvC tmp -f intel-linux-media_generic_16.5.2-64009_64bit.tar.gz


### Kernel

Bad news: in order to get MediaSDK working you need to patch the mainlined kernel.

Worse news: the available patches are only for version 4.4 of the kernel.

Still, systemd works on 4.4, as far as I know, so it would not be a big problem.

##### Grab building dependencies
$ sudo apt install build-essential devscripts libncurses5-dev
$ sudo apt build-dep linux


#### Grab kernel source

I like to use the sources from the git repository, since it would be possible to do some rebasing and blaming in the future.

$ cd ~
$ git clone https://github.com/torvalds/linux.git
...
$ git pull -v --tags
$ git checkout -b 4.4 v4.4


#### Extract MediaSDK patches

$ cd ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/patches/kmd/4.4
$ tar xvf intel-kernel-patches.tar.bz2
$ git am 0002-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch

TODO: Do I need to modify the EXTRAVERSION string in the kernel’s Makefile?

#### Build and install the kernel

Notice that we are using our current kernel configuration. That is error prone. I guess that is why I had to select NVM manually.

$ cp /boot/config-4.12.0-1-amd64 ./.config
$ make olddefconfig
$ make nconfig # -- select NVM
$ scripts/config --disable DEBUG_INFO
$ make deb-pkg
...
$ sudo dpkg -i linux-image-4.4.0+_4.4.0+-2_amd64.deb linux-headers-4.4.0+_4.4.0+-2_amd64.deb linux-firmware-image-4.4.0+_4.4.0+-2_amd64.deb

#### Configure GRUB2 to boot Linux 4.4 by default

This part was absolutely tricky for me. It took me a long time to figure out how to specify the kernel ID in the grubenv.

$ sudo vi /etc/default/grub


Change the line to GRUB_DEFAULT=saved (by default it is set to 0). Then update GRUB.

$ sudo update-grub

Now look for the ID of the installed kernel image in /boot/grub/grub.cfg and use it:

$ sudo grub-set-default "gnulinux-4.4.0+-advanced-2c246bc6-65bb-48ea-9517-4081b016facc>gnulinux-4.4.0+-advanced-2c246bc6-65bb-48ea-9517-4081b016facc"


Please note the ID appears twice, separated by a >: the part before the > identifies the GRUB submenu (“Advanced options”), and the part after it identifies the menu entry inside that submenu.

#### Copy MediaSDK firmware (and libraries too)

I like to use rsync rather than plain cp because of options like --dry-run and --itemize-changes, which let me verify what I am doing.

$ cd ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp
$ sudo rsync -av --itemize-changes ./lib /
$ sudo rsync -av --itemize-changes ./opt/intel/common /opt/intel
$ sudo rsync -av --itemize-changes ./opt/intel/mediasdk/{include,lib64,plugins} /opt/intel/mediasdk


All these directories contain blobs that do the MediaSDK magic. They are dlopened through hard-coded paths by mfx_dispatch, which will be explained later.

In /lib lives the firmware (kernel blob).

In /opt/intel/common… I have no idea what those shared objects are.

In /opt/intel/mediasdk/include live the header files for programming and compilation.

In /opt/intel/mediasdk/lib64 live the driver for the modified libva (iHD) and other libraries.

In /opt/intel/mediasdk/plugins live, well, plugins…

In conclusion, all these bytes are darkness and mystery.

$ sudo systemctl reboot

The system should boot, automatically, into GNU/Linux 4.4. Please log in with Xorg, not Wayland, since Wayland is not supported, as far as I know.

#### GStreamer

For compiling GStreamer I will use gst-uninstalled. Someone may say that I should use gst-build because it is newer and faster, but I feel more comfortable doing the following kind of hacks with the old & good autotools. Basically this is a reproduction of Quick-start guide to gst-uninstalled for GStreamer 1.x.

$ sudo apt build-dep gst-plugins-{base,good,bad}1.0
$ wget https://cgit.freedesktop.org/gstreamer/gstreamer/plain/scripts/create-uninstalled-setup.sh -q -O - | sh

I will modify the gst-uninstalled script and keep it outside of the repository. For that I will use the systemd file-hierarchy spec for user’s executables.

$ cd ~/gst
$ mkdir -p ~/.local/bin
$ mv master/gstreamer/scripts/gst-uninstalled ~/.local/bin
$ ln -sf ~/.local/bin/gst-uninstalled ./gst-master

Do not forget to edit your ~/.profile to add ~/.local/bin to the PATH environment variable.

#### Patch ~/.local/bin/gst-uninstalled

The modifications are to handle the three dependency libraries required by MediaSDK: libdrm, libva and mfx_dispatch.

diff --git a/scripts/gst-uninstalled b/scripts/gst-uninstalled
index 81f83b6c4..d79f19abd 100755
--- a/scripts/gst-uninstalled
+++ b/scripts/gst-uninstalled
@@ -122,7 +122,7 @@ GI_TYPELIB_PATH=$GST/gstreamer/gst:$GI_TYPELIB_PATH
 export LD_LIBRARY_PATH
 export DYLD_LIBRARY_PATH
 export GI_TYPELIB_PATH
-
+
 export PKG_CONFIG_PATH="\
 $GST_PREFIX/lib/pkgconfig\
 :$GST/gstreamer/pkgconfig\
@@ -140,6 +140,9 @@ $GST_PREFIX/lib/pkgconfig\
 :$GST/orc\
 :$GST/farsight2\
 :$GST/libnice/nice\
+:$GST/drm\
+:$GST/libva/pkgconfig\
+:$GST/mfx_dispatch\
 ${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
 
 export GST_PLUGIN_PATH="\
@@ -227,6 +230,16 @@ export GST_VALIDATE_APPS_DIR=$GST_VALIDATE_APPS_DIR:$GST/gst-editing-services/te
 export GST_VALIDATE_PLUGIN_PATH=$GST_VALIDATE_PLUGIN_PATH:$GST/gst-devtools/validate/plugins/
 export GIO_EXTRA_MODULES=$GST/prefix/lib/gio/modules:$GIO_EXTRA_MODULES
 
+# MediaSDK
+export LIBVA_DRIVERS_PATH=/opt/intel/mediasdk/lib64
+export LIBVA_DRIVER_NAME=iHD
+export LD_LIBRARY_PATH="\
+/opt/intel/common/mdf/lib64\
+:$GST/drm/.libs\
+:$GST/drm/intel/.libs\
+:$GST/libva/va/.libs\
+:$LD_LIBRARY_PATH"
+


Now, initialize the gst-uninstalled environment:

$ cd ~/gst
$ ./gst-master

##### libdrm

Grab libdrm from its repository and switch to the branch with the supported version by MediaSDK.

$ cd ~/gst/master
$ git clone git://anongit.freedesktop.org/mesa/drm
$ cd drm
$ git checkout -b intel libdrm-2.4.67


Extract the distributed tarball in the cloned repository.

$ tar -xv --strip-components=1 -C . -f ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/libdrm/2.4.67-64009/libdrm-2.4.67.tar.bz2

Then we can check the big delta between upstream and the changes done by Intel for MediaSDK. Let’s put it in a commit for later rebases.

$ git add -u
$ git add .
$ git commit -m "mediasdk changes"


Get build dependencies and compile.

$ sudo apt build-dep libdrm
$ ./configure
$ make -j8

Since the pkgconfig files (*.pc) of libdrm are generated to work installed, they need to be modified in order to work uninstalled.

$ prefix=${HOME}/gst/master/drm
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/.libs#" ${prefix}/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/*.pc

In order for the C preprocessor to find the uninstalled libdrm header files, we need to make them available in the path expected by the pkgconfig file, where right now they are not. To fix that we can create proper symbolic links.

$ cd ~/gst/master/drm
$ ln -s include/drm/ libdrm

#### libva

MediaSDK uses a modified version of libva. These modifications mess a bit with the open source version of libva, because Intel decided not to rename the library, or use some other disambiguation strategy. In gstreamer-vaapi we had to blacklist VA-API version 0.99, because that is the version number, arbitrarily set, of this modified libva for MediaSDK.

Again, grab the original libva from its repository and create a branch pointing at the divergence point. It was difficult to find the divergence commit id, since even the libva version number was changed. Doing some archaeology I guessed the branch point was version 1.0.15, but I’m not sure.

$ cd ~/gst/master
$ git clone https://github.com/01org/libva.git
$ cd libva
$ git checkout -b intel libva-1.0.15
$ tar -xv --strip-components=1 -C . -f ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/libva/1.67.0.pre1-64009/libva-1.67.0.pre1.tar.bz2
$ git add -u
$ git add .
$ git commit -m "mediasdk"

Before compiling, verify that the Makefile is going to link against the uninstalled libdrm. You can do that by grepping for LIBDRM in the Makefile.

Get compilation dependencies and build.

$ sudo apt build-dep libva
$ ./configure
$ make -j8


Modify the pkgconfig files for the uninstalled setup.

$ prefix=${HOME}/gst/master/libva
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/va/.libs#" ${prefix}/pkgconfig/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/pkgconfig/*.pc


$ cd ~/gst/master/libva/va
$ ln -sf drm/va_drm.h


#### mfx_dispatch

This is a static library that must be linked with MediaSDK applications; in our case, with the GStreamer plugin.

According to its documentation (included in the tarball):

the dispatcher is a layer that lies between the application and the SDK implementations. Upon initialization, the dispatcher locates the appropriate platform-specific SDK implementation. If there is none, it will select the software SDK implementation. The dispatcher will redirect subsequent function calls to the same functions in the selected SDK implementation.

In the tarball there is the source of the mfx_dispatcher, but it only compiles with cmake. I have not worked with cmake on uninstalled setups, but we are lucky: there is a repository with autotools support:

$ cd ~/gst/master
$ git clone https://github.com/lu-zero/mfx_dispatch.git


And compile. After running ./configure it is better to confirm, by grepping the generated Makefile, that the uninstalled versions of libdrm and libva are going to be used.

$ autoreconf --install
$ ./configure
$ make -j8

Finally, just as with the other libraries, it is required to fix the pkgconfig files:

$ prefix=${HOME}/gst/master/mfx_dispatch
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/.libs#" ${prefix}/*.pc

$ vainfo
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.99 (libva 1.67.0.pre1)
vainfo: Driver version: 16.5.2.64009-ubit
vainfo: Supported profile and entrypoints
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: <unknown entrypoint>
      VAProfileH264ConstrainedBaseline: <unknown entrypoint>
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : <unknown entrypoint>
      VAProfileH264Main               : <unknown entrypoint>
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : <unknown entrypoint>
      VAProfileH264High               : <unknown entrypoint>
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : <unknown entrypoint>
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileVP9Profile0            : <unknown entrypoint>
      <unknown profile>               : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : <unknown entrypoint>

It works!

#### Compile GStreamer

I normally make a copy of ~/gst/master/gstreamer/scripts/git-update.sh in ~/.local/bin in order to modify it, like adding support for ccache, disabling gtkdoc and gobject-introspection, increasing the parallel tasks, etc. But that is out of the scope of this document.

$ cd ~/gst/master/
$ ./gstreamer/scripts/git-update.sh

Everything should build without issues and, at the end, we can test whether the gst-msdk elements are available:

$ gst-inspect-1.0 msdk
Plugin Details:
Name                     msdk
Description              Intel Media SDK encoders
Version                  1.13.0.1
Source release date      2017-11-23 16:39 (UTC)
Binary package           GStreamer Bad Plug-ins git
Origin URL               Unknown package origin

msdkh264dec: Intel MSDK H264 decoder
msdkh264enc: Intel MSDK H264 encoder
msdkh265dec: Intel MSDK H265 decoder
msdkh265enc: Intel MSDK H265 encoder
msdkmjpegdec: Intel MSDK MJPEG decoder
msdkmjpegenc: Intel MSDK MJPEG encoder
msdkmpeg2enc: Intel MSDK MPEG2 encoder
msdkvp8dec: Intel MSDK VP8 decoder
msdkvp8enc: Intel MSDK VP8 encoder

9 features:
+-- 9 elements


Great!

Now, let’s run a simple pipeline. Please note that the gst-msdk elements have rank zero, so they will not be autoplugged; it is necessary to craft the pipeline manually:

$ gst-launch-1.0 filesrc location= ~/test.264 ! h264parse ! msdkh264dec ! videoconvert ! xvimagesink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:02.502411331
Setting pipeline to PAUSED ...
Setting pipeline to NULL ...
Freeing pipeline ...


\o/

## November 20, 2017

### GStreamer — Orc 0.4.28 bug-fix release

The GStreamer team is pleased to announce another maintenance bug-fix release of liborc, the Optimized Inner Loop Runtime Compiler. Main changes since the previous release:

• Numerous undefined behaviour fixes
• Ability to disable tests
• Fix meson dist behaviour

## November 17, 2017

Earlier this year I worked on a certain GStreamer plugin that is called “ipcpipeline”. This plugin provides elements that make it possible to interconnect GStreamer pipelines that run in different processes.  In this blog post I am going to explain how this plugin works and the reason why you might want to use it in your application.

## Why ipcpipeline?

In GStreamer, pipelines are meant to be built and run inside a single process. Normally one wouldn’t even think about involving multiple processes for a single pipeline. You can (and should) involve multiple threads, of course, which is easily done using the queue element, in order to do parallel processing. But since you can involve multiple threads, why would you want to involve multiple processes as well?

Splitting part of a pipeline to a different process is useful when there is one or more elements that need to be isolated for security reasons. Imagine the case where you have an application that uses a hardware video decoder and therefore has device access privileges. Also imagine that in the same pipeline you have elements that download and parse video content directly from a network server, like most Video On Demand applications would do. Although I don’t mean to say that GStreamer is not secure, it can be a good idea to think ahead and make it as hard as possible for an attacker to take advantage of potential security flaws. In theory, maybe someone could exploit a bug in the container parser by sending it crafted data from a fake server and then take control of other things by exploiting those device access privileges, or cause a system crash. ipcpipeline could help to prevent that.

## How does it work?

In the – oversimplified – diagram below we can see what the media pipeline in a video player would look like with GStreamer:

With ipcpipeline, this pipeline can be split into two processes, like this:

As you can see, the split mainly involves 2 elements: ipcpipelinesink, which serves as the sink for the first pipeline, and ipcpipelinesrc, which serves as the source for the second pipeline. These two elements internally talk to each other through a unix pipe or socket, transferring buffers, events, queries and messages over this socket, thus linking the two pipelines together.
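The transport idea can be sketched with a plain socketpair: one end plays the role of ipcpipelinesink, serializing a buffer and writing it to the fd, while the other plays the role of ipcpipelinesrc and reads it back. This is only an illustration of the fd-based IPC concept, with a made-up length-prefixed framing; it is not the elements' actual wire protocol:

```python
import socket
import struct

# A socketpair stands in for the fd that links the two pipelines.
sink_end, src_end = socket.socketpair()

def send_buffer(sock, payload):
    # Made-up framing: 4-byte big-endian length, then the payload.
    # The real protocol also carries events, queries and messages.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_buffer(sock):
    (size,) = struct.unpack("!I", sock.recv(4))
    data = b""
    while len(data) < size:
        data += sock.recv(size - len(data))
    return data

send_buffer(sink_end, b"fake video frame")
received = recv_buffer(src_end)
print(received)  # b'fake video frame'
```

In the real plugin the two ends live in different processes, which is exactly what makes the isolation described above possible.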

This mechanism doesn’t look very special, though. You might be wondering at this point, what is the difference between using ipcpipeline and some other existing mechanism like a pair of fdsink/fdsrc or udpsink/udpsrc or RTP? What is special about these elements is that the two pipelines behave as if they were a single pipeline, with the elements of the second one being part of a GstBin in the first one:

The diagram above illustrates how you can think of a pipeline that uses the ipcpipeline mechanism. As you can see, ipcpipelinesink behaves as a GstBin that contains the whole remote pipeline. This practically means that whenever you change the state of ipcpipelinesink, the remote pipeline’s state changes as well. It also means that all messages, events and queries that make sense are forwarded from one pipeline to the other, trying to implement as closely as possible the behavior that a GstBin would have.

This design practically allows you to modify an existing application to use this split-pipeline mechanism without having to change the pipeline control logic or implement your own IPC for controlling the second pipeline. It is all integrated in the mechanism already.

ipcpipeline follows a master-slave design. The pipeline that controls the state changes of the other pipeline is called the “master”, while the other one is called the “slave”. In the above example, the pipeline that contains the ipcpipelinesink element is the “master”, while the other one is the “slave”. At the time of writing, the opposite setup is not implemented, so it is always the downstream part of the pipeline that can be slaved, and ipcpipelinesink is always the “master”.

While there can be only one “master” pipeline, it is possible to have multiple “slave” ones. This allows, for example, splitting an audio decoder and a video decoder into different processes:

It is also possible to have multiple ipcpipelinesink elements connect to the same slave pipeline. In this case, the slave pipeline will follow the state that is closest to PLAYING between the two states that it will get from the two ipcpipelinesinks. Also, messages from the slave pipeline will only be forwarded through one of the two ipcpipelinesinks, so you will not notice any duplicate messages. Behavior should be exactly the same as in the split slaves scenario.
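The “closest to PLAYING” rule amounts to taking the maximum over the usual state ordering (NULL < READY < PAUSED < PLAYING). A tiny sketch, using an illustrative IntEnum rather than the real GstState type:

```python
from enum import IntEnum

# Illustrative ordering; the real GstState enum has the same relative order.
class State(IntEnum):
    NULL = 1
    READY = 2
    PAUSED = 3
    PLAYING = 4

def slave_state(requested):
    # The slave follows whichever requested state is closest to PLAYING,
    # i.e. the maximum under the ordering above.
    return max(requested)

print(slave_state([State.PAUSED, State.PLAYING]).name)  # PLAYING
print(slave_state([State.NULL, State.READY]).name)      # READY
```

So if one ipcpipelinesink asks for PAUSED and the other for PLAYING, the slave goes to PLAYING.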

## Where is the code?

ipcpipeline is part of the GStreamer bad plugins set (here). Documentation is included with the code and there are also some examples that you can try out to get familiar with it. Happy hacking!

## November 01, 2017

### Sebastian Pölsterl — scikit-survival 0.4 released and presented at PyCon UK 2017

scikit-survival 0.4 released and presented at PyCon UK 2017

I'm pleased to announce that scikit-survival version 0.4 has been released.

This release adds CoxnetSurvivalAnalysis, which implements an efficient algorithm to fit Cox’s proportional hazards model with LASSO, ridge, and elastic net penalty. This allows fitting a Cox model to high-dimensional data and performing feature selection. Moreover, it includes support for Windows with Python 3.5 and later by making the cvxopt package optional.
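For reference, the elastic net penalty combines the L1 (LASSO) and L2 (ridge) terms added to the Cox partial likelihood. A minimal sketch of just the penalty term, with parameter names chosen here for illustration rather than taken from the scikit-survival API:

```python
def elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5):
    """alpha * sum_j(l1_ratio * |b_j| + (1 - l1_ratio) / 2 * b_j**2).

    l1_ratio=1.0 gives the LASSO penalty, l1_ratio=0.0 the ridge penalty,
    and anything in between the elastic net.
    """
    return alpha * sum(
        l1_ratio * abs(b) + (1.0 - l1_ratio) / 2.0 * b * b for b in coefs
    )

print(elastic_net_penalty([1.0, -2.0], alpha=1.0, l1_ratio=1.0))  # 3.0
print(elastic_net_penalty([2.0], alpha=1.0, l1_ratio=0.0))        # 2.0
```

Driving the L1 weight up shrinks more coefficients exactly to zero, which is what enables the feature selection mentioned above.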

Install via conda:

conda install scikit-survival

or via pip (all platforms):

pip install -U scikit-survival

## PyCon UK

Last week, I presented an Introduction to Survival Analysis with scikit-survival at PyCon UK in Cardiff in front of a packed audience of genuinely interested people. I hope some people will give scikit-survival a try and use it in their work.

The slides of my presentation are available at https://k-d-w.org/pyconuk-2017/.

sebp Wed, 11/01/2017 - 23:29


## October 30, 2017

### Víctor Jáquez — GStreamer Conference 2017

This year, the GStreamer Conference happened in Prague, along with the traditional autumn Hackfest.

Prague is a beautiful city, though this year I couldn’t visit it as much as I wanted, since the Embedded Linux Conference Europe and the Open Source Summit also took place there, and Igalia, being a Linux Foundation sponsor, had a booth in the venue, where I talked about our work with WebKit, Snabb, and obviously, GStreamer.

But let’s get back to the GStreamer Hackfest and Conference.

One of the things I like most about the GStreamer project is its community: the people involved in it, writing code and sharing their work with many others. They might appear a bit tough at the beginning (or at least they seemed so to me), but in reality they are all kind and talented people, and I’m proud to consider myself part of this community. Nonetheless, it has a diversity problem, like many other Open Source communities.

During the Hackfest, Hyunjun and I met with Sree and talked about the plans for GStreamer-VAAPI, the new features in VA-API and libva, and how we could map them to GStreamer’s design. We also talked about future developments in the msdk elements, merged one year ago into gst-plugins-bad. I also talked a bit with Nicolas Dufresne about kmssink and DMABuf.

At the Conference, which happened in the same venue as the Hackfest, I talked with the authors of gstreamer-media-SDK. They are really energetic.

I delivered my usual talk about GStreamer-VAAPI. You can find the slides, as a web presentation, here. Also, as every year, our friends at Ubicast recorded the talks and made them available for streaming almost instantaneously:

My colleague Enrique talked in the Conference about the Media Source Extensions (MSE) on WebKit, and Hyunjun shared his experience with VA-API on Rust.

Also, at the conference venue, we showed a couple of demos. One of them was a MinnowBoard running WPE, rendering videos from YouTube using gstreamer-vaapi to decode video.

## October 25, 2017

### Sebastian Dröge — Multi-threaded raw video conversion and scaling in GStreamer

Another new feature that landed in GStreamer a while ago, and is included in the 1.12 release, is multi-threaded raw video conversion and scaling. The short story is that it led to, for example, a 3.2x speed-up converting 1080p video to 4k with 4 cores.

I had a few cases where a single core was not able to do rescaling in real-time anymore, even on a quite fast machine. One of the cases was 60fps 4k video in the v210 (10 bit YUV) color format, which is a lot of bytes per second in a not very processing-friendly format. GStreamer’s video converter and scaler is already quite optimized and using SIMD instructions like SSE or Neon, so there was not much potential for further optimizations in that direction.
However, basically every machine nowadays has multiple CPU cores that could be used, and raw video conversion/scaling is an almost perfectly parallelizable problem. Given how the conversion code was already written, it was relatively easy to add.

The way it works now is similar to the processing model of libraries like OpenMP or Rayon. The whole work is divided into smaller, equal sub-problems that are handled in parallel; then the runner waits until all parts are done and combines the result. In our specific case that means that each plane of the video frame is cut into 2, 4, or more slices of full rows, which are then converted separately. The “combining” step does not exist here: all sub-conversions are written directly to the correct place in the output already.
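The scheme can be sketched in plain Python: cut the rows of a stand-in frame into equal slices, convert each slice on its own thread, and write directly into a preallocated output so no combining step is needed. The per-row "conversion" below is a dummy; none of this is GStreamer API:

```python
from concurrent.futures import ThreadPoolExecutor

def convert_row(row):
    # Stand-in for the real per-row conversion/scaling work.
    return [px * 2 for px in row]

def convert_frame(frame, n_threads=4):
    out = [None] * len(frame)  # preallocated output: no combine step needed

    def work(start, stop):
        for i in range(start, stop):
            out[i] = convert_row(frame[i])

    # Cut the rows into n_threads slices of (nearly) equal size.
    step = -(-len(frame) // n_threads)  # ceiling division
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(work, s, min(s + step, len(frame)))
                   for s in range(0, len(frame), step)]
        for f in futures:
            f.result()  # wait until all slices are done, propagate errors
    return out

frame = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(convert_frame(frame, n_threads=2))  # [[2, 4], [6, 8], [10, 12], [14, 16]]
```

Because each slice writes to a disjoint range of rows, no locking is needed between the workers, which is what makes the problem almost perfectly parallelizable.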

As a small helper object for this kind of processing model, I wrote GstParallelizedTaskRunner which might also be useful for other pieces of code that want to do the same.

In the end it was not much work, but the results were satisfying. For example, the conversion of 1080p to 4k video in the v210 color format with 4 threads gave a speedup of 3.2x. At that point it looked like the main bottleneck was memory bandwidth, but I didn’t look closer, as this is already more than enough for the use cases I was interested in.

## October 21, 2017

### Sebastian Dröge — Rendering HTML5 video in Servo with GStreamer

At the Web Engines Hackfest in A Coruña at the beginning of October 2017, I was working on adding some proof-of-concept code to Servo to render HTML5 videos with GStreamer. For the impatient, the results can be seen in this video here.

And the code can be found here and here.

##### Details

Servo is Mozilla‘s experimental browser engine written in Rust, optimized for high-performance, parallelized rendering. Some parts of Servo are being merged into Firefox as part of Project Quantum, and already provide a lot of performance and stability improvements there.

During the hackfest I actually spent most of the time trying to wrap my head around the huge Servo codebase. It seems very well-structured and designed, exactly what you would expect when such a project is started from scratch by a company with decades of experience writing browser engines. After also having worked on WebKit in the past, I would say that you can see the difference between a legacy codebase from the end of the 90s and something written in a modern language with modern software engineering practices.

For the actual implementation of HTML5 video rendering via GStreamer, I started on top of the initial implementation that Philippe Normand had begun earlier. That one rendered the video in a separate window though, and did not work with the latest version of Servo anymore. I cleaned it up and made it work again (probably the best task for learning a new codebase), and then added support for actually rendering the video inside the web view.

This required quite a few additions on the Servo side, some of which are probably more hacks than anything else, but from the GStreamer side it was extremely simple. In Servo all the infrastructure for media rendering is currently still missing, while GStreamer has more than a decade of polishing for making integration into other software as easy as possible.

All the GStreamer code was written with the GStreamer Rust bindings, containing not a single line of unsafe code.

As you can see from the above video, the results work quite well already. Media controls or anything more fancy are not working though. Also, rendering is currently done completely in software, and an RGBA frame is then uploaded via OpenGL to the GPU for rendering. However, hardware codecs can already be used just fine, and basically every media format out there is supported.

##### Future

While this all might sound great, unfortunately Mozilla’s plans for media support in Servo are different. They’re planning to use the C++ Firefox/Gecko media backend instead of GStreamer. Best to ask them for reasons, I would probably not repeat them correctly.

Nonetheless, I’ll try to keep the changes updated with the latest Servo, and once they add more things for media support themselves, add the corresponding GStreamer implementations in my branch. It still provides value by showing that GStreamer is very well capable of handling web use cases (which it already showed in WebKit), as well as being a possibly better choice for people trying to use Servo on embedded systems or with hardware codecs in general. But as I’ll have to work based on what they do, I’m not going to add anything fundamentally new myself at this point, as I would have to rewrite it around whatever they decide for the implementation anyway.

Also once that part is there, having GStreamer directly render to an OpenGL texture would be added, which would allow direct rendering with hardware codecs to the screen without having the CPU worry about all the raw video data.

But for now, it’s waiting until they catch up with the Firefox/Gecko media backend.

## October 20, 2017

### Sebastian Dröge — DASH trick-mode playback in GStreamer: Fast-forward/rewind without saturating your network and CPU

GStreamer now has support for I-frame-only (aka keyframe) trick mode playback of DASH streams. It works only on DASH streams with ISOBMFF (aka MP4) fragments, and only if these contain all the required information. This is something I wanted to blog about since many months already, and it’s even included in the GStreamer 1.10 release already.

When trying to play back a DASH stream at rates much higher than real-time (say 32x), or playing the stream in reverse, you can easily run into various problems. This is something that was already supported by GStreamer in older versions, for DASH streams as well as local files or HLS streams, but it was far from ideal. What would happen is that you usually run out of available network bandwidth (you need to be able to download the stream 32x faster than usual), or out of CPU/GPU resources (it needs to be decoded 32x faster than usual), and even if all that works, there’s no point in displaying 960 (30fps at 32x) frames per second.

To get around that, GStreamer 1.10 can now (if explicitly requested with GST_SEEK_FLAG_TRICKMODE_KEY_UNITS) only download and decode I-frames. Depending on the distance of I-frames in the stream and the selected playback speed, this looks more or less smooth. Also depending on that, this might still yield too many frames to download or decode in real-time, so GStreamer also measures the distance between I-frames, how fast data can be downloaded, and whether decoders and sinks can catch up, to decide whether to skip over a couple of I-frames and maybe only download every third I-frame.
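As a toy model of that kind of decision: given the I-frame spacing, the playback rate, and how many frames per second the decoder and sink can actually handle, compute how many keyframes to skip over. The numbers and the function below are illustrative only, not dashdemux's actual heuristics:

```python
def keyframe_skip(iframe_interval_s, rate, max_decoded_fps):
    """Return n, meaning "download and decode only every n-th I-frame".

    At playback rate `rate`, keyframes arrive every iframe_interval_s / rate
    seconds of wall-clock time; if that exceeds what the decoder and sink
    can handle, skip enough keyframes to fit within the budget.
    """
    needed_fps = rate / iframe_interval_s  # keyframes per wall-clock second
    if needed_fps <= max_decoded_fps:
        return 1  # every I-frame fits, no skipping needed
    return int(-(-needed_fps // max_decoded_fps))  # ceiling division

# 2 s between I-frames, 32x playback, decoder comfortable at 10 fps:
print(keyframe_skip(2.0, 32, 10))  # 2 -> keep every 2nd keyframe
```

A real implementation would additionally fold in measured download speed and adapt continuously, as the text describes.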

If you want to test this, grab the playback-test from GStreamer, select the trickmode key-units mode, and seek in a DASH stream while providing a higher positive or negative (reverse) playback rate.

Let us know if you run into any problems with any specific streams!

##### Short Implementation Overview

From an implementation point of view this works by having the DASH element in GStreamer (dashdemux) not only download the ISOBMFF fragments but also parse the headers of each, to get the positions and distances of the I-frames in the fragment. Based on that it then decides which ones to download, or whether to skip ahead one or more fragments. The ISOBMFF headers are then passed to the MP4 demuxer (qtdemux), followed by discontinuous buffers that only contain the actual I-frames and nothing else. While this sounds rather simple from a high-level point of view, getting all the details right was the result of a couple of months of work by Edward Hervey and myself.

Currently the heuristics for deciding which I-frames to download and how much to skip ahead are rather minimal, but it’s working fine in many situations already. A lot of tuning can still be done though, and some streams are working less well than others which can also be improved.

## October 19, 2017

### Christian Schaller — Looking back at Fedora Workstation so far

So I have blogged regularly over the last few years about upcoming features in Fedora Workstation. Well, I thought that as we are putting the finishing touches on Fedora Workstation 27, I should look back at everything we have achieved since Fedora Workstation launched with Fedora 21. The efforts I highlight here are efforts where we have done significant or most of the development. There are of course a lot of other big changes that have happened over the last few years through the wider community that we leveraged and offer in Fedora Workstation; examples here include things like Meson and Rust. This post is not about those, but that said, I do want to write a post just talking about the achievements of the wider community at some point, because they are very important and crucial too. And along the same line, this post will not speak about the large number of improvements and bugfixes that we contributed to a long list of projects, like GNOME itself. This blog is about taking stock and taking some pride in what we have achieved so far, and the major hurdles we passed on our way to improving the Linux desktop experience.
This blog is also slightly different from my normal format, as I will not call out individual developers by name as I usually do; instead I will treat this as a totality and thus just say ‘we’.

• Wayland – We have been the biggest contributor since we joined the effort, and have taken the lead on putting in place all the pieces needed for actually using it on a desktop, including starting to ship it as our primary offering in Fedora Workstation 25. This includes putting a lot of effort into ensuring that XWayland works smoothly, to provide full legacy application support.
• Libinput – A new library we created for handling all input under both X and Wayland. This came about because Wayland needed input handling that was not tied to X, but it has improved input handling for X itself as well. Libinput is being rapidly developed and improved, with 1.9 coming out just a few days ago.
• glvnd – Dealing with multiple OpenGL implementations has been a pain under Linux for years. We worked with NVidia on this effort to ensure that you can install multiple OpenGL implementations on the system and have it use the correct one depending on which GPU and driver you are using. We keep expanding this solution to cover more use cases; for Fedora Workstation 27 we expect to bring glvnd support to XWayland, for instance.
• Porting Firefox to GTK3 – We ported Firefox to GTK3, including making sure it works under Wayland. This work also provided the foundation for HiDPI support in Firefox. We are the single biggest contributor to Firefox Linux support.
• Porting LibreOffice to GTK3 – We ported LibreOffice to GTK3, which included Wayland support, touch support and HiDPI support. Our team is one of the major contributors to LibreOffice and helps move the project forward on a lot of fronts.
• Google Drive integration – We extended the general Google integration in GNOME 3 to include support for Google Drive, as we found that a lot of our users were relying on Google Apps at work.
• Flatpak – We created Flatpak to lead the way in moving desktop applications into their own namespaces and containers, resolving a lot of long term challenges for desktop applications on Linux. We expect to have new infrastructure in place in Fedora soon to allow Fedora packagers to quickly and easily turn their applications into Flatpaks.
• Linux Firmware Service – We created the Linux Firmware Service to give Linux users easy access to UEFI firmware updates on their Linux systems, and worked with great vendors such as Dell and Logitech to get them to support it for their devices. Many bugs experienced by Linux users over the years could have been resolved by firmware updates, but with tooling being spotty, many Linux users were not even aware that fixes were available.
• GNOME Software – We created GNOME Software to give us a proper software store on Fedora and extended it over time with features such as fonts, GStreamer plugins, GNOME Shell extensions and UEFI firmware updates. Today it is the main store-type application used not just by us; our work has been adopted by other major distributions too.
• mp3, ac3 and aac support – We have spent a lot of time bringing support for some of the major audio codecs to Fedora, namely MP3, AC3 and AAC. In the age of streaming, codec support is maybe less important than it used to be, but there is still a lot of media on people's computers that they need and want access to.
• Fedora Media Writer – A cross-platform media creator making it very easy to create Fedora Workstation install media regardless of whether you are on Windows, Mac or Linux. As we move away from optical media, offering only ISO downloads started feeling more and more outdated; with the media writer we provide a uniform user experience for quickly creating your USB install media, which is especially important for new users coming from Windows and Mac environments.
• Captive portal – We added support for captive portals in Network Manager and GNOME 3, ensuring easy access to the internet over public wifi networks. This feature has been with us for a few years now, but it is still a much appreciated addition.
• HiDPI support – We worked to add HiDPI support across X, Wayland, GTK3 and GNOME 3. We led the way on HiDPI support under Linux and keep working on various applications to this day to polish up the support.
• Touch support – We worked to add touchscreen support across X, Wayland, GTK3 and GNOME 3. We spent significant resources enabling this, both for laptop touchscreens and for modern Wacom devices.
• QGNOME Platform – We created the QGNOME Platform to ensure that Qt applications work well under GNOME 3 and have a nice, native, integrated feel. So while we ship GNOME as our desktop offering, we want Qt applications to work well and feel native. This is an ongoing effort, but for many important applications it is already a great improvement.
• Nautilus improvements – Nautilus had been undermaintained for quite a while, so we had Carlos Soriano spend significant time reworking major parts of it and adding new features like renaming multiple files at once, updating the views and in general bringing it up to date.
• Night Light support in GNOME – We added support for automatically adjusting the color and light settings on your system, based on the light sensors found in modern laptops. This integrates functionality that previously required installing extra software such as Redshift.
• libratbag – We created a library that enables easy configuration of high-end mice and other kinds of input devices. This has led to increased collaboration with a lot of gaming-mice manufacturers to ensure full support for their devices under Linux.
• RADV – We created a fully open source Vulkan implementation for AMD GPUs, which recently got certified as Vulkan compliant. We wanted to give open source Vulkan a boost, so we created the RADV project, which now has an active community around it and is being tested with major games.
• GNOME Shell performance improvements – We have been working on various performance improvements to GNOME Shell over the last few years, with significant gains already made. We want to push the envelope further, though, and are planning a major hackfest around Shell performance and resource usage early next year.
• GNOME Terminal developer improvements – We worked to improve GNOME Terminal's features to make it an even better tool for developers, with items such as easier naming of terminals and notifications for long-running jobs.
• GNOME Builder – Improving the developer story is crucial for us, and we have been doing a lot of work to make GNOME Builder a great tool for developers, both for improving the desktop itself and for development in general.
• Pipewire – We created a new media server to unify audio, pro audio and video. The first version, which we are shipping in Fedora 27, handles our video capture.
• Fleet Commander – We launched Fleet Commander, our new tool for managing large Linux desktop deployments. It answers a long-standing call from many of Red Hat's major desktop customers, and from admins of large-scale Linux deployments at universities and similar institutions, for a powerful yet easy-to-use administration tool for large desktop deployments.

I am sure I missed something, but this is at least a decent list of Fedora Workstation highlights from the last few years. Next, on to my Fedora Workstation 27 blog post :)

## October 18, 2017

### Christian Schaller — Fleet Commander ready for takeoff!

Alberto Ruiz just announced Fleet Commander as production ready! Fleet Commander is our new tool for managing large deployments of Fedora Workstation and RHEL desktop systems. So head over to Alberto's Fleet Commander blog post for all the details.

## October 17, 2017

### Enrique Ocaña González — Attending the GStreamer Conference 2017

This weekend I’ll be in Node5 (Prague) presenting our Media Source Extensions platform implementation work in WebKit using GStreamer.

The Media Source Extensions HTML5 specification allows JavaScript to generate media streams for playback, and lets the web page have more control over complex use cases such as adaptive streaming.
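As a rough illustration of that flow, a minimal browser-side sketch of feeding a stream through MSE might look like this (the codec string and segment URLs below are assumptions for illustration, not taken from the talk):

```typescript
// Minimal MSE sketch: attach a MediaSource to a <video> element and feed
// it fetched media segments from JavaScript instead of a plain src URL.
const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  // Hypothetical codec string and segment names, for illustration only.
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
  for (const url of ['init.mp4', 'segment1.m4s', 'segment2.m4s']) {
    const data = await (await fetch(url)).arrayBuffer();
    sb.appendBuffer(data); // demuxing/decoding is done by the platform backend
    // appendBuffer is asynchronous; wait until the SourceBuffer is idle
    // before appending the next segment.
    await new Promise(resolve =>
      sb.addEventListener('updateend', resolve, { once: true }));
  }
  mediaSource.endOfStream();
});
```

An adaptive-streaming player builds on essentially this loop, choosing which quality variant's segments to fetch based on measured bandwidth.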

My plan for the talk is to start with a brief introduction about the motivation and basic usage of MSE. Next I’ll show a design overview of the WebKit implementation of the spec. Then we’ll go through the iterative evolution of the GStreamer platform-specific parts, as well as its implementation quirks and challenges faced during the development. The talk continues with a demo, some clues about the future work and a final round of questions.

Our recent MSE work has been on desktop WebKitGTK+ (the WebKit flavor powering Epiphany, aka GNOME Web), but we also have MSE working on WPE, optimized for a Raspberry Pi 2. We will be showing it at the Igalia booth, in case you want to see it working live.

I’ll also be attending the GStreamer Hackfest in the days before. There I plan to work on WebM support in MSE, focusing on any issues in the Matroska demuxer or the VP9/Opus/Vorbis decoders that break our use cases.

See you there!

UPDATE 2017-10-22:

The talk slides are available at https://eocanha.org/talks/gstconf2017/gstconf-2017-mse.pdf and the video is available at https://gstconf.ubicast.tv/videos/media-source-extension-on-webkit (the rest of the talks here).

## October 12, 2017

### Christian Schaller — AAC support will be available in Fedora Workstation 27!

So I am really happy to announce another major codec addition to Fedora Workstation 27, namely AAC. As you might have seen from Tom Callaway's announcement, it has just been cleared for inclusion in Fedora.

For those not well versed in the arcane lore of audio codecs: AAC is the codec used by things like iTunes and is found in a lot of general media files online. AAC stands for Advanced Audio Coding and was created by the MPEG working group as the successor to MP3. Especially with Apple embracing the format, there are a lot of files out there using it, and thus we wanted to support it in Fedora too.

What we will be shipping in Fedora is a modified version of the AAC implementation released by Google, which was originally written by Fraunhofer. On top of that we will of course provide GStreamer plugins to enable full support for playing and creating AAC files in GStreamer applications.

Be aware though that AAC is a bit of an umbrella term for a lot of different technologies, so you might come across files that claim to use AAC but which we cannot play back. The most likely reason would be that they require an AAC profile we do not support. The version of AAC we will be shipping has also been carefully created to fit within the requirements for software in Fedora, so if you are a packager, be aware that unlike with, for instance, MP3, this change does not mean you can package and ship any AAC implementation you want in Fedora.

I am expecting to have more major codec announcements soon, so stay tuned :)

## October 06, 2017

### Jean-François Fortin Tam — Software and hardware freedom: a recap of the September 16 episode of La Sphère

On September 13, I received a curious email inviting me to take part in the show "La Sphère" for an episode dedicated to free software, airing on Radio-Canada's main radio channel on Saturday, September 16.

A few minutes before the start of the show

The episode lasts about an hour, and the podcast version is split into several segments, but since I was brought in to comment across pretty much all of them, I invite you to listen to the full episode if you feel so inclined.

#### Murphy's law

Everything went well, although the potential topics I had prepared for did not match the questions I was asked on air:

• Having received four themes at midnight the night before the show, I hastily wrote, that very morning before heading to the studios, some 1200 words addressing them in a structured way. I had thus prepared talking points and clear examples to cite, in case I was asked about computer security, abuses by non-transparent corporations, or the perceived "futility" of free software in a world where the hardware is not necessarily under our control…
• That document, displayed on the screen of the Librem I had in front of me in the studio, ended up going unused, as the discussions took completely different turns. From the very first questions, I realized I would not get the opportunity to dive deep into technical, legal or philosophical matters, and that I therefore had to reorient my whole discussion strategy on the spot. All my answers during the show were thus constructed in real time, in the heat of the moment.

I was sometimes thrown stereotyped questions (a bit rhetorical, admittedly, but no doubt meant to raise the questions the target audience probably asks itself!), forcing me into a corrective/defensive position (where I first had to correct the misconception before I could even consider talking about anything else), but it is precisely relevant to debunk those misconceptions, since this is a popularization show for the general public…

#### A few moments of surprise

During the show, I also picked up on a few remarks around the table that were not as nuanced as I would have liked, as well as questions that sometimes left me speechless (such as "Deep down, don't people contribute to free software mainly to pad their CV, then drop it once they get hired?" and "If I want to deploy a CRM in a company, I could never use a Librem to do it"; in the second case, I was so taken aback by the sweeping nature of such a claim that I could only vaguely answer "Um… it depends?")

My reaction

If I had been able to prepare an answer to those two point-blank questions, I would for example have wanted to:

• say that nobody contributes to free software in such a Machiavellian way; contributing to free software is a matter of philosophy and ethics as much as methodology, and if your heart is not in it your contributions will not be convincing, so there would be nothing there to build a rich career on;
• find out what exactly my interlocutor meant regarding CRMs, given that free (or at least open) CRMs are plentiful and that, these days, such applications are mostly web-based anyway.

I would have wanted to open Pandora's box (the whole computer security and privacy angle, which is extremely rich in current events and particularly striking), respond to all the preconceptions, give impeccable examples and counter-arguments, but there was no time (as you can hear, the live show even had to end abruptly). At some point, you have to be aware of the constraints of the conversational flow and go for the simplest, most direct answer… otherwise, it would have taken three hours of discussion.

#### A positive outcome

I am admittedly a perfectionist (as you may have noticed above), but the fact remains that it was a good show. After all, let's put things in context:

• this is a show aimed at the general public, and most of the target listeners are not computer experts;
• in my eyes, it is almost miraculous that the better part of an hour of airtime on a national channel was devoted to the subject of free software.

I am therefore entirely grateful to the La Sphère team for having sought to popularize the subject and the stakes of free software… in the public sphere!