July 07, 2020

Jean-François Fortin Tam: Rebuild of EvanGTGelion: Getting Things GNOME 0.4 released!

We are very proud to be announcing today the 0.4 release of Getting Things GNOME (“GTG”), codenamed “You Are (Not) Done”. This much-awaited release is a major overhaul that brings together many updates and enhancements, including new features, a modernized user interface and updated underlying technology.

Screenshot of GTG 0.4

Beyond what is featured in these summarized release notes below, GTG itself has undergone over 630 changes affecting over 500 files, and received hundreds of bug fixes, as can be seen here and here.

We are thankful to all of our supporters and contributors, past and present, who made GTG 0.4 possible. Check out the “About” dialog for a list of contributors for this release.

A summary of GTG’s development history and a high-level explanation of its renaissance can be seen in this teaser video:

A demonstration video provides a tour of GTG’s current features:

A few words about the significance of this release

As a result of the new lean & agile project direction and contributor workflow we have formalized, this release—the first in over 6.5 years—constitutes a significant milestone in the revival of this community-driven project.

This milestone represents a very significant opportunity to breathe new life into the project, which is why I made sure to completely overhaul the “contributor experience”, clarifying and simplifying the process for new contributors to make an impact.

I would highly encourage everybody to contribute towards the next release. You can contribute all sorts of improvements to this project, be it bug fixes or new features, adopting one of the previous plugins, doing translation and localization work, working on documentation, or spreading the word. Your involvement is what makes this project a success.

— Jeff (yours truly)

“When I switched from Linux to macOS a few years ago, I never found a todo app that was as good as GTG, so I started to try every new shiny (and expensive) thing. I used Evernote, Todoist, Things and many others. Nothing came close. I spent the next 6 years trying every new productivity gadget in order to find the perfect combo.
In 2019, Jeff decided to take over GTG and bring it back from the grave. It’s a strange feeling to see your own creation continuing in the hands of others. Living to see your software being developed by others is quite an accomplishment. Jeff’s dedication demonstrated that, with GTG, we created a tool which can become an essential part of chaos warrior’s productivity system. A tool which is useful without being trendy, even years after it was designed. A tool that people still want to use. A tool that they can adapt and modernise. This is something incredible that can only happen with Open Source.”

— Lionel Dricot, original author of GTG (quote edited with permission)

It has been over seven years since the 0.3rd impact. This might very well be the Fourth Impact. Shinji, get in the f%?%$ing build bot!

— Gendo Ikari

Release notes

Technology Upgrades

GTG and libLarch have been fully ported to Python 3, GTK 3, and GObject introspection (PyGI).

User Interface and Frontend Improvements

General UI overhaul

The user interface has been updated to follow the current GNOME Human Interface Guidelines (HIG) style (see GH GTG PR #219 and GH GTG PR #235 for context) and design patterns:

  • Client-side window decorations using the GTK HeaderBar widget. Along with the removal of the menu bars, this saves a significant amount of space and allows for more content to be displayed on screen.
  • The Preferences dialog was redesigned, and its contents cleaned up to remove obsolete settings (see GH GTG PR #227).
  • All windows are properly parented (set as transient) with the main window, so that they can be handled better by window managers.
  • Symbolic icons are available throughout the UI.
  • Improvements to padding and borders are visible throughout the application.

Main window (“Task Browser”)

  • The menu bar has been replaced by a menu button. Non-contextual actions (for example: toggle Sidebar, Plugins, Preferences, Help, and About) have been moved to the main menu button.
  • Searching is now handled through a dedicated Search Bar that can be toggled on and off with the mouse, or the Ctrl+F keyboard shortcut.
  • The “Workview” mode has been renamed to the “Actionable” view. “Open”, “Actionable”, and “Closed” tasks view modes are available (see GH GTG PR #235).
  • An issue with sorting tasks by title in the Task Browser has been fixed: sorting is no longer case-sensitive, and now ignores tag marker characters (GH GTG issue #375).
  • Start/Due/Closed task dates now display as properly translated in the Task Browser (GH GTG issue #357)
  • In the Task Browser’s right-click context menus, more start/due dates choices are available, including common upcoming dates and a custom date picker (GH GTG issue #244).

Task Editor

  • The Calendar date picker pop-up widgets have been improved (see GH GTG PR #230).
  • The Task Editor now attempts to place newly created windows in a more logical way (GH GTG issue #287).
  • The title (first line of a task) has been changed to a neutral black header style, so that it doesn’t look like a hyperlink.

New Features

  • You can now open (or create) a task’s parent task (GH GTG issue #138).
  • You can now select multiple closed tasks and perform bulk actions on them (GH GTG issue #344).
  • It is now possible to rename or delete tags by right-clicking on them in the Task Browser.
  • You can automatically generate and assign tag colors. (LP GTG issue #644993)
  • The Quick Add entry now supports emojis 🤩
  • The Task Editor now provides a searchable “tag picker” widget.
  • The “Task Reaper” allows deleting old closed tasks for increased performance. Previously available as a plugin, it is now a built-in feature available in the Preferences dialog (GH GTG issue #222).
  • The Quick Deferral (previously, the “Do it Tomorrow” plugin) is now a built-in feature. It is now possible to defer multiple tasks at once to common upcoming days or to a custom date (GH GTG issue #244).
  • In the unlikely case where GTG might encounter a problem opening your data file, it will automatically attempt recovery from a previous backup snapshot and let you know about it (LP GTG issue #971651)

Backend and Code Quality improvements

  • Updates were made to overall code quality (GH GTG issue #237) to reduce barriers to contribution:
    • The code has been ported to use GtkApplication, resulting in simpler and more robust UI code overall.
    • GtkBuilder/Glade “.ui” files have been regrouped into one location.
    • Reorganization of various .py files for consistency.
    • The debugging/logging system has been simplified.
    • Various improvements to the test suite.
    • The codebase is mostly PEP8-compliant. We have also relaxed the PEP8 max line length convention to 100 characters for readability, because this is not the nineties anymore.
  • Support is available for Tox, for testing automation within virtualenvs (see GH GTG PR #239).
  • The application’s translatable strings have been reviewed and harmonized, to ensure the entire application is translatable (see GH GTG PR #346).
  • Application CSS has been moved to its own file (see GH GTG PR #229).
  • Outdated plugins and synchronization services have been removed (GH GTG issue #222).
  • GTG now provides an “AppData” (FreeDesktop AppStream metadata) file to properly present itself in distro-agnostic software-centers.
  • The Meson build system is now supported (see GH GTG PR #315).
    • The development version’s launch script now allows running the application with various languages/locales, using the LANG environment variable for example.
    • Appdata and desktop files are named based on the chosen Meson profile (see GH GTG PR #349).
    • Depending on the Meson profile, the HeaderBar style changes dynamically to indicate when the app is run in a dev environment, such as GNOME Builder (GH GTG issue #341).

Documentation Updates

  • The user manual has been rewritten, reorganized, and updated with new images (GH GTG issue #243). It is also now available as an online publication.
  • The contributor documentation has been rewritten to make it easier for developers to get involved and to clarify project contribution guidelines (GH GTG issue #200). Namely, the README.md file was updated to clarify the set-up process for the development version, and numerous new guides and documents for contributors were added in the docs/contributors/ folder.

Infrastructure and other notable updates

  • The entire GTG GNOME wiki site has been updated (GH GTG issue #200), broken links have been fixed, and references to the old website have been removed.
  • We have migrated from LaunchPad to GitHub (and eventually GitLab), so references to LaunchPad have been removed.
  • We now have social media accounts on Mastodon and Twitter (GH GTG issue #294).
  • Flatpak packages on Flathub are going to be our official direct upstream-to-user software distribution mechanism (GH GTG issue #233).

Notice

In order to get this release out the door, some plugins have been disabled and are awaiting adoption by new contributors to test and maintain them. Please consider stepping up to maintain your favorite plugin. Likewise, we had to remove the DBus module (and would welcome help bringing it back into better shape, for those who want to control the app via DBus).


Getting and installing GTG 0.4

We hope to have our flatpak package ready in time for this announcement, or shortly afterwards. See the install page for details.


Spreading this announcement

We have made some social postings on Twitter, on Mastodon and on LinkedIn that you can re-share/retweet/boost. Please feel free to link to this announcement on forums and blogs as well!

The post Rebuild of EvanGTGelion: Getting Things GNOME 0.4 released! appeared first on The Open Sourcerer.

by Jeff at July 07, 2020 12:00 PM

July 06, 2020

GStreamer: GStreamer Rust bindings 0.16.0 release

(GStreamer)

A new version of the GStreamer Rust bindings, 0.16.0, was released.

As usual this release follows the latest gtk-rs release.

This is the first version that includes optional support for new GStreamer 1.18 APIs. As GStreamer 1.18 has not been released yet, these new APIs might still change. The minimum supported version of the bindings is still GStreamer 1.8, and the targeted GStreamer API version can be selected by applications via feature flags.

Apart from this, the new version mostly features API cleanup and the addition of a few missing APIs. The focus of this release was to make usage of GStreamer from Rust as convenient and complete as possible.

The new release also brings a lot of bugfixes, most of which were already part of the 0.15.x bugfix releases.

A new release of the GStreamer Rust plugins will follow in the next days.

Details can be found in the release notes for gstreamer-rs and gstreamer-rs-sys.

The code and documentation for the bindings are available on the freedesktop.org GitLab, as well as on crates.io.

If you find any bugs, notice any missing features or run into other issues, please report them in GitLab.

July 06, 2020 02:00 PM

GStreamer: GStreamer 1.17.2 unstable development release

(GStreamer)

The GStreamer team is pleased to announce the second development release in the unstable 1.17 release series.

The unstable 1.17 release series adds new features on top of the current stable 1.16 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

The unstable 1.17 release series is for testing and development purposes in the lead-up to the stable 1.18 series which is scheduled for release in a few weeks time. Any newly-added API can still change until that point, although it is rare for that to happen.

Full release notes will be provided in the near future, highlighting all the new features, bugfixes, performance optimizations and other important changes.

The autotools build has been dropped entirely for this release, so it's finally all Meson from here on.

This development release is primarily for distributors and early adopters and anyone who still needs to update their build/packaging setup for Meson.

On the documentation front we have switched away from gtk-doc to hotdoc, but we now provide a release tarball of the built documentation in html and devhelp format, and we recommend distributors switch to that and provide a single gstreamer documentation package in future. Packagers will not need to use hotdoc themselves.

Instead of a gst-validate tarball we now ship a gst-devtools tarball, and the gstreamer-editing-services tarball has been renamed to gst-editing-services for consistency with the module name in Gitlab.

Packagers: please note that plugins may have moved between modules, so please take extra care and make sure inter-module version dependencies are such that users can only upgrade all modules in one go, instead of seeing a mix of 1.17 and 1.16 on their system.

Binaries for Android, iOS, Mac OS X and Windows are also available at the usual location.

Release tarballs can be downloaded directly here:

As always, please let us know of any issues you run into by filing an issue in Gitlab.

July 06, 2020 01:00 PM

July 02, 2020

Phil NormandWeb-augmented graphics overlay broadcasting with WPE and GStreamer

(Phil Normand)

Graphics overlays are everywhere nowadays in the live video broadcasting industry. In this post I introduce a new demo relying on GStreamer and WPEWebKit to deliver low-latency web-augmented video broadcasts.

Readers of this blog might remember a few posts about WPEWebKit and a GStreamer element we at Igalia worked on …

by Philippe Normand at July 02, 2020 01:00 PM

June 28, 2020

Sebastian Pölsterl: scikit-survival 0.13 Released

Today, I released version 0.13.0 of scikit-survival. Most notably, this release adds sksurv.metrics.brier_score and sksurv.metrics.integrated_brier_score, an updated PEP 517/518 compatible build system, and support for scikit-learn 0.23.

For a full list of changes in scikit-survival 0.13.0, please see the release notes.

Pre-built conda packages are available for Linux, macOS, and Windows via

 conda install -c sebp scikit-survival

Alternatively, scikit-survival can be installed from source following these instructions.

The time-dependent Brier score

The time-dependent Brier score is an extension of the mean squared error to right censored data:

$$ \mathrm{BS}^c(t) = \frac{1}{n} \sum_{i=1}^n I(y_i \leq t \land \delta_i = 1) \frac{(0 - \hat{\pi}(t | \mathbf{x}_i))^2}{\hat{G}(y_i)} + I(y_i > t) \frac{(1 - \hat{\pi}(t | \mathbf{x}_i))^2}{\hat{G}(t)} , $$

where $\hat{\pi}(t | \mathbf{x})$ is a model’s predicted probability of remaining event-free up to time point $t$ for feature vector $\mathbf{x}$, and $1/\hat{G}(t)$ is an inverse probability of censoring weight.

The Brier score is often used to assess calibration. If a model predicts a 10% risk of experiencing an event at time $t$, the observed frequency in the data should match this percentage for a well calibrated model. In addition, the Brier score is also a measure of discrimination: whether a model is able to predict risk scores that allow us to correctly determine the order of events. The concordance index is probably the most common measure of discrimination. However, the concordance index disregards the actual values of predicted risk scores – it is a ranking metric – and is unable to tell us anything about calibration.

Let’s consider an example based on data from the German Breast Cancer Study Group 2.

from sksurv.datasets import load_gbsg2
from sksurv.preprocessing import encode_categorical
from sklearn.model_selection import train_test_split
X, y = load_gbsg2()
X = encode_categorical(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y["cens"], random_state=1)

We want to train a model on the training data and assess its discrimination and calibration on the test data. Here, we consider a Random Survival Forest and Cox’s proportional hazards model with elastic-net penalty.

from sksurv.ensemble import RandomSurvivalForest
from sksurv.linear_model import CoxnetSurvivalAnalysis
rsf = RandomSurvivalForest(max_depth=2, random_state=1)
rsf.fit(X_train, y_train)
cph = CoxnetSurvivalAnalysis(l1_ratio=0.99, fit_baseline_model=True)
cph.fit(X_train, y_train)

First, let’s start with discrimination as measured by the concordance index.

rsf_c = rsf.score(X_test, y_test)
cph_c = cph.score(X_test, y_test)

The result indicates that both models perform equally well, achieving a concordance index of 0.688, which is significantly better than a random model with a 0.5 concordance index. Unfortunately, it doesn’t help us to decide which model we should choose. So let’s consider the time-dependent Brier score as an alternative, which assesses both discrimination and calibration.

We first need to determine the time points $t$ at which we want to compute the Brier score. We are going to use a data-driven approach here by selecting all time points between the 10th and 90th percentile of observed time points.

import numpy as np
lower, upper = np.percentile(y["time"], [10, 90])
times = np.arange(lower, upper + 1)

This returns 1690 time points, for which we need to estimate the probability of survival, which is given by the survival function. Thus, we iterate over the predicted survival functions on the test data and evaluate each at the time points from above.

rsf_surv_prob = np.row_stack([
    fn(times)
    for fn in rsf.predict_survival_function(X_test, return_array=False)
])
cph_surv_prob = np.row_stack([
    fn(times)
    for fn in cph.predict_survival_function(X_test)
])

Note that calling predict_survival_function for RandomSurvivalForest with return_array=False requires scikit-survival 0.13.

In addition, we want to have a baseline to tell us how much better our models are from random. A random model would simply predict 0.5 every time.

random_surv_prob = 0.5 * np.ones((y_test.shape[0], times.shape[0]))

Another useful reference is the Kaplan-Meier estimator, which does not consider any features: it estimates a survival function only from y_test. We replicate this estimate for all samples in the test data.

from sksurv.functions import StepFunction
from sksurv.nonparametric import kaplan_meier_estimator
km_func = StepFunction(*kaplan_meier_estimator(y_test["cens"], y_test["time"]))
km_surv_prob = np.tile(km_func(times), (y_test.shape[0], 1))

Instead of comparing calibration across all 1690 time points, we’ll be using the integrated Brier score (IBS) over all time points, which will give us a single number to compare the models by.

from sksurv.metrics import integrated_brier_score
random_brier = integrated_brier_score(y, y_test, random_surv_prob, times)
km_brier = integrated_brier_score(y, y_test, km_surv_prob, times)
rsf_brier = integrated_brier_score(y, y_test, rsf_surv_prob, times)
cph_brier = integrated_brier_score(y, y_test, cph_surv_prob, times)

The results are summarized in the table below:

          RSF     Coxnet   Random   Kaplan-Meier
c-index   0.688   0.688    0.500    n/a
IBS       0.194   0.188    0.247    0.217

Despite Random Survival Forest and Cox’s proportional hazards model performing equally well in terms of discrimination, there seems to be a notable difference in terms of calibration, with Cox’s proportional hazards model outperforming Random Survival Forest.
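
For readers who want to see where along the time axis the two models differ, scikit-survival 0.13 also exposes the per-time-point scores via sksurv.metrics.brier_score. Here is a minimal sketch, reusing the variables defined above and assuming the argument order mirrors integrated_brier_score:

from sksurv.metrics import brier_score
# Returns the evaluation times and one Brier score per time point,
# which can then be plotted to compare calibration over time.
eval_times, rsf_bs = brier_score(y, y_test, rsf_surv_prob, times)
_, cph_bs = brier_score(y, y_test, cph_surv_prob, times)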

As a final note, I want to clarify that the Brier score is only applicable for models that are able to estimate a survival function. Hence, it currently cannot be used with Survival Support Vector Machines.

June 28, 2020 04:02 PM

June 22, 2020

Michael Sheldon: Qt QML Maps – Using the OSM plugin with API keys

(Michael Sheldon)

For a recent side-project I’ve been working on (a cycle computer for UBPorts phones) I found that when using the QtLocation Map QML element, nearly all the map types provided by the OSM plugin (besides the basic streetmap type) require an API key from Thunderforest. Unfortunately, there doesn’t appear to be a documented way of supplying an API key to the plugin, and the handful of forum posts and Stack Overflow questions on the topic are either unanswered or answered by people believing that it’s not possible. It’s not obvious, but after a bit of digging into the way the OSM plugin works I’ve discovered a mechanism by which an API key can be supplied to tile servers that require one.

When the OSM plugin is initialised it communicates with the Qt providers repository which tells it what URLs to use for each map type. The location of the providers repository can be customised through the osm.mapping.providersrepository.address OSM plugin property, so all we need to do to use our API key is to set up our own providers repository with URLs that include our API key as a parameter. The repository itself is just a collection of JSON files, with specific names (cycle, cycle-hires, hiking, hiking-hires, night-transit, night-transit-hires, satellite, street, street-hires, terrain, terrain-hires, transit, transit-hires) each corresponding to a map type. The *-hires files provide URLs for tiles at twice the normal resolution, for high DPI displays.

For example, this is the cycle file served by the default Qt providers repository:

{
    "UrlTemplate" : "http://a.tile.thunderforest.com/cycle/%z/%x/%y.png",
    "ImageFormat" : "png",
    "QImageFormat" : "Indexed8",
    "ID" : "thf-cycle",
    "MaximumZoomLevel" : 20,
    "MapCopyRight" : "<a href='http://www.thunderforest.com/'>Thunderforest</a>",
    "DataCopyRight" : "<a href='http://www.openstreetmap.org/copyright'>OpenStreetMap</a> contributors"
}

To provide an API key with our tile requests we can simply modify the UrlTemplate:

    "UrlTemplate" : "http://a.tile.thunderforest.com/cycle/%z/%x/%y.png?apikey=YOUR_API_KEY",

Automatic repository setup

I’ve created a simple tool for setting up a complete repository using a custom API key here: https://github.com/Elleo/qt-osm-map-providers

  1. First obtain an API key from https://www.thunderforest.com/docs/apikeys/
  2. Next clone my repository: git clone https://github.com/Elleo/qt-osm-map-providers.git
  3. Run: ./set_api_keys.sh your_api_key (replacing your_api_key with the key you obtained in step 1)
  4. Copy the files from this repository to your webserver (e.g. http://www.mywebsite.com/osm_repository)
  5. Set the osm.mapping.providersrepository.address property to point to the location setup in step 4 (see the QML example below)

QML Example

Here’s a quick example QML app that will make use of the custom repository we’ve set up:

import QtQuick 2.7
import QtQuick.Controls 2.5
import QtLocation 5.10

ApplicationWindow {

    title: qsTr("Map Example")
    width: 1280
    height: 720

    Map {
        anchors.fill: parent
        zoomLevel: 14
        plugin: Plugin {
            name: "osm"
            PluginParameter { name: "osm.mapping.providersrepository.address"; value: "http://www.mywebsite.com/osm_repository" }
            PluginParameter { name: "osm.mapping.highdpi_tiles"; value: true }
        }
        activeMapType: supportedMapTypes[1] // Cycle map provided by Thunderforest
    }
    
}

by Mike at June 22, 2020 04:18 PM

GStreamer: GStreamer 1.17.1 unstable development release

(GStreamer)

The GStreamer team is pleased to announce the first development release in the unstable 1.17 release series.

The unstable 1.17 release series adds new features on top of the current stable 1.16 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

The unstable 1.17 release series is for testing and development purposes in the lead-up to the stable 1.18 series which is scheduled for release in a few weeks time. Any newly-added API can still change until that point, although it is rare for that to happen.

Full release notes will be provided in the near future, highlighting all the new features, bugfixes, performance optimizations and other important changes.

The autotools build has been dropped entirely for this release, so it's finally all Meson from here on.

This development release is primarily for distributors and early adopters and anyone who still needs to update their build/packaging setup for Meson.

On the documentation front we have switched away from gtk-doc to hotdoc, but we now provide a release tarball of the built documentation in html and devhelp format, and we recommend distributors switch to that and provide a single gstreamer documentation package in future.

Instead of a gst-validate tarball we now ship a gst-devtools tarball, and the gstreamer-editing-services tarball has been renamed to gst-editing-services for consistency with the module name in Gitlab.

Packagers: please note that plugins may have moved between modules, so please take extra care and make sure inter-module version dependencies are such that users can only upgrade all modules in one go, instead of seeing a mix of 1.17 and 1.16 on their system.

Binaries for Android, iOS, Mac OS X and Windows are also available at the usual location.

Release tarballs can be downloaded directly here:

As always, please let us know of any issues you run into by filing an issue in Gitlab.

June 22, 2020 02:30 PM

June 16, 2020

Víctor Jáquez: WebKit Flatpak SDK and gst-build

This post is an annex of Phil’s Introducing the WebKit Flatpak SDK. Please make sure to read it, if you haven’t already.

Recapitulating, nowadays WebKitGtk/WPE developers —and their CI infrastructure— are moving towards a Flatpak-based environment for their workflow. This Flatpak-based environment, or Flatpak SDK for short, can be visualized as a sandboxed software container, which bundles all the dependencies required to compile, run and debug WebKitGtk/WPE.

In day-to-day work, this approach removes the need to compile the world in order to obtain reproducible builds, improving the development and testing workflow.

But what if you are also involved in the development of one dependency?

This is the case of Igalia’s multimedia team where, besides developing the multimedia features for WebKitGtk and WPE, we also participate in the GStreamer development, the framework used for multimedia.

Because of this, in our workflow we usually need to build WebKit with a fix, hack or new feature in GStreamer. Is it possible to use our custom GStreamer build inside Flatpak without messing up its own GStreamer setup? Yes, it’s possible.

gst-build is a set of Python scripts which clone the GStreamer repositories, compile them and set up an uninstalled environment. This uninstalled environment allows a transient usage of the compiled framework from its build tree, avoiding installation and further messing up our system.

The WebKit scripts that wrap Flatpak operations are also capable of handling the gst-build scripts to build GStreamer inside the container, and, when running WebKit’s artifacts, the scripts enable the mentioned uninstalled environment, overriding Flatpak’s GStreamer.

How do we unveil all this magic?

First of all, set up a gst-build installation as documented. This installation is where the GStreamer plumbing is done.

Later, gst-build operations through WebKit compilation scripts are enabled when the environment variable GST_BUILD_PATH is exported. This variable should point to the directory where the gst-build tree is placed.

And that’s all!

But let’s put these words in actual commands. The following workflow assumes that WebKit repository is cloned in ~/WebKit and the gst-build tree is in ~/gst-build (please, excuse my bashisms).

Compiling WebKitGtk with symbols, using LLVM as toolchain (this command will also compile GStreamer):

$ cd ~/WebKit
$ CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/build-webkit --gtk --debug
...

Running the generated minibrowser (remember GST_BUILD_PATH is required again for correct linking):

$ GST_BUILD_PATH=/home/vjaquez/gst-build Tools/Scripts/run-minibrowser --gtk --debug
...

Running media layout tests:

$ GST_BUILD_PATH=/home/vjaquez/gst-build ./Tools/Scripts/run-webkit-tests --gtk --debug media

But wait! There’s more...

What if I want to parametrize the GStreamer compilation? Say, I would like to enable a GStreamer module or disable the build of a specific element.

gst-build, like the rest of the GStreamer modules, uses the Meson build system, so it’s possible to pass arguments to Meson through the environment variable GST_BUILD_ARGS.

For example, I would like to enable gstreamer-vaapi 😇

$ cd ~/WebKit
$ CC=clang CXX=clang++ GST_BUILD_PATH=/home/vjaquez/gst-build GST_BUILD_ARGS="-Dvaapi=enabled" Tools/Scripts/build-webkit --gtk --debug
...

by vjaquez at June 16, 2020 11:49 AM

June 13, 2020

Phil Normand: Setting up Debian containers on Fedora Silverblue

(Phil Normand)

After almost 20 years using Debian, I am trying something different, Fedora Silverblue. However for work I still need to use Debian/Ubuntu from time to time. In this post I am explaining the steps to setup Debian containers on Silverblue.

By default Silverblue comes with Toolbox which perfectly integrates …

by Philippe Normand at June 13, 2020 11:50 AM

June 11, 2020

Jean-François Fortin Tam: Revival of GTG, status update #2: git ready to test!

As a follow-up to my first global project situation update, I am happy to report great progress towards the successful revival of the GTG project.

You can see that in this fancy-pants teaser trailer (featuring epic music, big explosions and special effects), or this short status update video that also includes the trailer in it:

We’re getting really, really close. Here are some good recent news:

  1. We are seriously running out of bugs for what will become the 0.4 release.
    • I have tried pretty hard to break the git version of GTG and Diego has kept fixing issues faster than I could find new bugs 🤔 At this point, it seems to be quite robust and safe to use, so I need you to test it like maniacs.
    • The rest of the tickets in the issue tracker are all feature requests or non-critical issues that can wait for future releases (such as performance optimizations).
  2. Recently I have completed the reorganization and rewriting of the contributors documentation for the GTG project. Please take the time (7 to 9 minutes) to read that blog post.
  3. Thanks to Danielle Vansia’s diligent work, the effort to update and reorganize the user manual is well underway. I believe more great work is yet to come on that front. In addition to viewing it with Yelp, you can also read it online here.
  4. Thanks to Mart Raudsepp’s invaluable help, GTG now supports the highly-popular Meson build system.
    • That also means GNOME Builder can now build & run GTG directly.
    • You can still run GTG manually like before with the “launch.sh” script; simply ensure you have the “meson” package installed before doing so.
    • Unlike the previous launch script, this also facilitates translation work, as it automatically compiles the translation files and supports running the development version with a language environment variable, such as “LANG=fr_CA.UTF8 ./launch.sh”!
  5. I have spent a couple of days (including a 9-10 hours nonstop coding session) reworking the code to harmonize, improve and deduplicate translatable strings, and redo the whole French translation (now with more chocolatine) and bring it to 100% completion as a way to test and ensure everything in the UI that can possibly be translated is, indeed, translated (barring one strange bug). I can assure you, the fact that I did not eat for over 53 hours in a row was a mere coïncidence.
  6. We have a Twitter account and a Mastodon account now. Go nuts.
  7. We are supposed to be migrating from GitHub to GNOME’s GitLab instance eventually. We’ll need help.

In preparation for the upcoming 0.4 release, I have also made a new release of libLarch, 3.0.1. This is a picture of me making that libLarch release:

Call for contributors
(testers, hackers, translators, packagers)

Now is a great time to get involved, whether with code, translations, or pre-packaging.

  • Considering that I’ve run out of bugs to report, I want you to start testing GTG’s git version now, and report bugs in GitHub².
    • See the read-me for tips on how to build and run the Git version, or see the footnotes below regarding our flatpak packages¹
    • If nobody finds serious issues, then we can assume our code is “perfectly stable” and would be ready to make a release “any day now”… well, I still have to research and write release notes before that happens (wanna help?), however.
  • If you are a GNOME translator, now is your call to review and update your translations if you want to squeeze them in before the scheduled release (which, barring new showstopper bugs, should happen within weeks at most).
    • Yes, I know this isn’t much of an “advance notice” at all, but we live in special times this cycle;
    • Also yes, I know the project is on GitHub instead of GNOME’s gitlab, but I haven’t got the skills and time to fix that myself. I’ll accept translation files thrown at me by email if that makes it any easier. If you are working on the translation for a particular language, you should let others know through this ticket.
  • If you are a Linux package maintainer who wants to be able to offer GTG 0.4 and libLarch 3.0.1 “from day one” or as an update in your distro, you may want to start preparations for packaging this release, considering that GTG and libLarch no longer depend on Python 2 nor GTK 2…

Adopt a puppy plugin!

In order to be able to move fast towards 0.4 without being tied to single-handedly fixing “everything”, we’ve had to split the plugins (and data/synchronization backends) into a couple of categories: those that we can easily fix, those that are no longer relevant and those that are broken but “probably interesting to some users, while not mission-critical”. Those that were not trivial to fix have been deactivated (moved to the “unmaintained” subfolder)—at least until someone new (you?) cares enough about a particular feature to come fix and maintain it. Adopt a puppy today!

Alternate “backends” are particularly affected by this, as the only backend we’ve left enabled is the default “local storage” backend.

If you care about GTG integrating with Evolution, GNote or Tomboy (Tomboy-NG?), LaunchPad, Mantis, Bugzilla, Hamster, and Remember the Milk (that one seemed like a pretty popular backend), then please step up to contribute fixes and maintainership for your favorite plugin/backend. Otherwise, it will most likely stay deactivated.

You can see the issues related to plugins/backends here.


Footnotes

  1. All development infrastructure has been moved to GitHub; we will be decommissioning everything in LaunchPad (to the extent that it is possible to just “disable” things?) as soon as 0.4 comes out. We are supposed to be migrating from GitHub to GNOME’s GitLab instance next.
  2. One thing that is expected to be particularly important to release 0.4 to a wider audience is offering Flatpak packages. We’re mostly ready for this (see this ticket) but it might take a few more days before we can figure out how to have the nightly/dev package officially published as a flatpak (on Flathub, for example) before the 0.4 release package.
    If you don’t want to wait for that, and want a temporary flatpak to try the git version “now!!”, you can go in this folder, download the flatpak file that is sitting there and run: flatpak install -y --user gtg-git-2020-06-11.flatpak (for example). To uninstall it when you want to switch to a more official flatpak later on, do: flatpak uninstall --user org.gnome.GTGDevel; and if you have ideas on how to improve that flatpak package, feel free to help out in the ticket mentioned above.

P.s.: You might think grabbing a random package file from some obscure folder listing on a website is a bit reminiscent of Windows, and therefore by now you would inevitably be asking, “So where’s the keygen!?!”, but since there isn’t any, I would instead recommend you listen to the piece of music below to have the whole “install apps obtained from a website mentioned in a random post” experience!

                                   
   mmm        mmmmmmm          mmm 
 m"   "          #           m"   "
 #   mm          #           #   mm
 #    #          #           #    #
  "mmm"          #            "mmm"
                                   
                                   
Packaged by Diego.
Greetz to Bilal and the Flathub Team!!

“Against the Time”, by the ORiON group… Because that’s how software was installed back then.

The post Revival of GTG, status update #2: git ready to test! appeared first on The Open Sourcerer.

by Jeff at June 11, 2020 08:24 PM

June 09, 2020

Phil Normand: WebKitGTK and WPE now supporting videos in the img tag

(Phil Normand)

Using videos in the <img> HTML tag can lead to more responsive web-page loads in most cases. Colin Bendell blogged about this topic; make sure to read his post on the cloudinary website. As it turns out, this feature has been supported for more than 2 years in Safari, but …

by Philippe Normand at June 09, 2020 04:00 PM

June 08, 2020

Phil Normand: Introducing the WebKit Flatpak SDK

(Phil Normand)

Working on a web-engine often requires a complex build infrastructure. This post documents our transition from JHBuild to Flatpak for the WebKitGTK and WPEWebKit development builds.

For the last 10 years, WebKitGTK has been relying on a custom JHBuild moduleset to handle its dependencies and (try to) ensure a reproducible …

by Philippe Normand at June 08, 2020 04:50 PM

June 03, 2020

Andy Wingo: a baseline compiler for guile

(Andy Wingo)

Greets, my peeps! Today's article is on a new compiler for Guile. I made things better by making things worse!

The new compiler is a "baseline compiler", in the spirit of what modern web browsers use to get things running quickly. It is a very simple compiler whose goal is speed of compilation, not speed of generated code.

Honestly I didn't think Guile needed such a thing. Guile's distribution model isn't like the web, where every page you visit requires the browser to compile fresh hot mess; in Guile I thought it would be reasonable for someone to compile once and run many times. I was never happy with compile latency but I thought it was inevitable and anyway amortized over time. Turns out I was wrong on both points!

The straw that broke the camel's back was Guix, which defines the graph of all installable packages in an operating system using Scheme code. Lately it has been apparent that when you update the set of available packages via a "guix pull", Guix would spend too much time compiling the Scheme modules that contain the package graph.

The funny thing is that it's not important that the package definitions be optimized; they just need to be compiled in a basic way so that they are quick to load. This is the essential use-case for a baseline compiler: instead of trying to make an optimizing compiler go fast by turning off all the optimizations, just write a different compiler that goes from a high-level intermediate representation straight to code.

So that's what I did!

it don't do much

The baseline compiler skips any kind of flow analysis: there's no closure optimization, no contification, no unboxing of tagged numbers, no type inference, no control-flow optimizations, and so on. The only whole-program analysis that is done is a basic free-variables analysis so that closures can capture variables, as well as assignment conversion. Otherwise the baseline compiler just does a traversal over programs as terms of a simple tree intermediate language, emitting bytecode as it goes.

Interestingly the quality of the code produced at optimization level -O0 is pretty much the same.

This graph shows generated code performance of the CPS compiler relative to the new baseline compiler, at optimization level 0. Bars below the line mean the CPS compiler produces slower code. Bars above mean CPS makes faster code. You can click and zoom in for details. Note that the Y axis is logarithmic.

The tests in which -O0 CPS wins are mostly because the CPS-based compiler does a robust closure optimization pass that reduces allocation rate.

At optimization level -O1, which adds partial evaluation over the high-level tree intermediate language and support for inlining "primitive calls" like + and so on, I am not sure why CPS peels out in the lead. No additional important optimizations are enabled in CPS at that level. That's probably something to look into.

Note that the baseline of this graph is optimization level -O1, with the new baseline compiler.

But as I mentioned, I didn't write the baseline compiler to produce fast code; I wrote it to produce code fast. So does it actually go fast?

Well against the -O0 and -O1 configurations of the CPS compiler, it does excellently:

Here you can see comparisons between what will be Guile 3.0.3's -O0 and -O1, compared against their equivalents in 3.0.2. (In 3.0.2 the -O1 equivalent is actually -O1 -Oresolve-primitives, if you are following along at home.) What you can see is that at these optimization levels, for these 8 files, the baseline compiler is around 4 times as fast.

If we compare to Guile 3.0.3's default -O2 optimization level, or -O3, we see bigger disparities:

Which is to say that Guile's baseline compiler runs at about 10x the speed of its optimizing compiler, which incidentally is similar to what I found for WebAssembly compilers a while back.

Also of note is that -O0 and -O1 take essentially the same time, with -O1 often taking less time than -O0. This is because partial evaluation can make the program smaller, at a cost of being less straightforward to debug.

Similarly, -O3 usually takes less time than -O2. This is because -O3 is allowed to assume top-level bindings that aren't exported from a module can be transformed to lexical bindings, which are more available for contification and inlining, which usually leads to smaller programs; it is a similar debugging/performance tradeoff to the -O0/-O1 case.

But what does one gain when choosing to spend 10 times more on compilation? Here I have a gnarly graph that plots performance on some microbenchmarks for all the different optimization levels.

Like I said, it's gnarly, but the summary is that -O1 typically gets you a factor of 2 or 4 over -O0, and -O2 often gets you another factor of 2 above that. -O3 is mostly the same as -O2 except in magical circumstances like the mbrot case, where it adds an extra 16x or so over -O2.

worse is better

I haven't seen the numbers yet of this new compiler in Guix, but I hope it can have a good impact. Already in Guile itself though I've seen a couple interesting advantages.

One is that because it produces code faster, Guile's bootstrap from source can take less time. There is also a felicitous feedback effect in that because the baseline compiler is much smaller than the CPS compiler, it takes less time to macro-expand, which reduces bootstrap time (as bootstrap has to pay the cost of expanding the compiler, until the compiler is compiled).

The second fortunate result is that now I can use the baseline compiler as an oracle for the CPS compiler, when I'm working on new optimizations. There's nothing worse than suspecting that your compiler miscompiled itself, after all, and having a second compiler helps keep me sane.

stay safe, friends

The code, you ask? Voici.

Although this work has been ongoing throughout the past month, I need to add some words on the now before leaving you: there is a kind of cognitive dissonance between nerding out on compilers in the comfort of my home, rain pounding on the patio, and at the same time the world on righteous fire. I hope it is clear to everyone by now that the US police are an essentially racist institution: they harass, maim, and murder Black people at much higher rates than whites. My heart is with the protestors. Godspeed to you all, from afar. At the same time, all my non-Black readers should reflect on the ways they participate in systems that support white supremacy, and on strategies to tear them down. I know I will be. Stay safe, wear eye protection, and until next time: peace.

by Andy Wingo at June 03, 2020 08:39 PM

May 29, 2020

May 28, 2020

Christian Schaller: Into the world of Robo vacuums and Robo mops

(Christian Schaller)

So this is a blog post not related to Fedora or Red Hat, but rather my personal experience with getting a robo vacuum and robo mop into the house.

So about two months ago my wife and I decided to get a Robo vacuum while shopping at Costco (a US wholesaler outfit). So we brought home the iRobot Roomba 980. Over the next week we ended up also getting the newer iRobot Roomba i7+ and the iRobot Braava m6 mopping robot. Our dream was that we would never have to vacuum or mop again, instead leaving that to our new robots to handle. With two little kids, being able to cut that work from our todo list seemed like a dream come true.

I feel that whenever you get into a new technology it takes some time with your first product in that category to understand what questions to ask and what considerations to make. For instance I feel a lot more informed and confident in my knowledge about electric cars having owned a Nissan Leaf for a few years now (enough to wish I had a Tesla instead for instance :). I guess our experience with robot vacuums here is similar.

Anyway, if you are considering buying a robot vacuum or mop, I think the first lesson we learned is that it is definitely not a magic solution. You have to prepare your house quite a bit before each run, from obvious things like tidying up anything on the floor (the kids’ legos etc.) to discovering that certain furniture, like the IKEA Poang chairs, are mortal enemies with your robo vacuum. We had to put our chair on top of the sofa as the Roomba would get stuck on it every time we tried to vacuum the floor. Also the door mat in front of our entrance door kept having its corners sucked into the vacuum, getting it stuck. Anyway, our lesson learned is that vacuuming (or mopping) is not something we can do on an impulse or easily on a schedule, as it takes quite a bit of preparation. If you don’t have small kids leaving random stuff all over the house all the time you might be able to just set the vacuum on a schedule, but for us that has turned out to be a big no :). So in practice we only vacuum at night now when my wife and I have had time to prep the house after the kids have gone to bed.

It is worth noting that we only have one vacuum now. We got the i7+ after we got the 980 due to realizing that the 980 didn’t have features like the smart map allowing you to, for instance, vacuum specific rooms. It also had other niceties like self emptying and it was supposed to be quieter (which is nice when you run it at night). However in our experience it also had a less strong vacuum, so we felt it left more crap on the floor than the older 980 model. So in the end we returned the i7+ in favour of the 980, just because we felt it did a better job at vacuuming. It is quite loud though, so we can hear it very loud and clear up on the second floor while trying to fall asleep. So if you need a quiet house to sleep, this setup is not for you.

Another lesson we learned is that the vacuums or mops do not work great in darkness, so we now have to leave the light on downstairs at night when we want to vacuum or mop the floor. We should be able to automate that using Google Home, so Google Home could turn on the lights, start the vacuum and then, once done, turn off the lights again. We haven’t actually gotten around to testing that yet though.

As for the mop, I would say that it is not a replacement for mopping yourself, but it can reduce the frequency of you mopping yourself and thus help maintain a nice clean floor for longer after you’ve done a full manual mop yourself. Also the m6 is super sensitive to edges, which I assume is to avoid it trying to mop your rugs and mats, but it also means that it can not traverse even small thresholds. So for us who have small thresholds between our kitchen area and the rest of the house, we have to carry the mop over the thresholds and mop the rest of the first floor as a separate action, which is a bit of an annoyance now that we are running these things at night. That said, the kitchen is the one room which needs mopping more regularly, so in some sense the current setup where the Roomba vacuums the whole first floor and the Braava mop mops just the kitchen is a workable solution for us. One nice feature here is that they can be set up to run in order, so the mop will only start once the vacuum is done (that feature is the main reason we haven’t tested out other brand mops which might handle the threshold situation better).

So to conclude, would I recommend robot vacuums and robot mops to other parents with young kids? I would say yes, it has definitely helped us keep the house cleaner and nicer and let us spend less time cleaning the house. But it is not a miracle cure in any way or form; it still takes time and effort to prepare and set up the house, and sometimes you still need to do especially the mopping yourself to get things really clean. As for the question of iRobot versus other brands I have no input as I haven’t really tested any other brands. iRobot is a local company so their vacuums are available in a lot of stores around me and I drive by their HQ on a regular basis, so that is the more or less random reason I ended up with their products as opposed to competing ones.

by uraeus at May 28, 2020 04:37 PM

May 17, 2020

Sebastian Pölsterl: Survival Analysis for Deep Learning Tutorial for TensorFlow 2

A while back, I posted the Survival Analysis for Deep Learning tutorial. This tutorial was written for TensorFlow 1 using the tf.estimators API. The changes between version 1 and the current TensorFlow 2 are quite significant, which is why the code does not run when using a recent TensorFlow version. Therefore, I created a new version of the tutorial that is compatible with TensorFlow 2. The text is basically identical, but the training and evaluation procedure changed.

The complete notebook is available on GitHub, or you can run it directly using Google Colaboratory.

Notes on porting to TensorFlow 2

A nice feature of TensorFlow 2 is that in order to write custom metrics (such as concordance index) for TensorBoard, you don’t need to create a Summary protocol buffer manually; instead it suffices to call tf.summary.scalar and pass it a name and float. So instead of

from sksurv.metrics import concordance_index_censored
from tensorflow.core.framework import summary_pb2
c_index_metric = concordance_index_censored(…)[0]
writer = tf.summary.FileWriterCache.get(output_dir)
buf = summary_pb2.Summary(value=[summary_pb2.Summary.Value(
    tag="c-index", simple_value=c_index_metric)])
writer.add_summary(buf, global_step=global_step)

you can just do

from sksurv.metrics import concordance_index_censored
with tf.summary.create_file_writer(output_dir).as_default():
    c_index_metric = concordance_index_censored(…)[0]
    tf.summary.scalar("c-index", c_index_metric, step=step)

Another feature that I liked is that you can now iterate over an instance of tf.data.Dataset and directly access the tensors and their values. This is much more convenient than having to call make_one_shot_iterator first, which gives you an iterator, which you call get_next() on to get actual tensors.
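
For illustration, a minimal sketch (with made-up data) of what that looks like:

import numpy as np
import tensorflow as tf

# Build a small dataset of (features, label) pairs and iterate over it directly;
# each loop iteration yields eager tensors, with no iterator or get_next() boilerplate.
features = np.random.rand(8, 3).astype(np.float32)
labels = np.arange(8, dtype=np.int64)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

for x_batch, y_batch in dataset:
    print(x_batch.shape, y_batch.numpy())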

Unfortunately, I also encountered some negatives when moving to TensorFlow 2. First of all, there’s currently no officially supported way to produce a view of the executed Graph that is identical to what you get with TensorFlow 1, unless you use the Keras training loop with the TensorBoard callback. There’s tf.summary.trace_export, which, as described in this guide, sounds like it would produce the graph; however, using this approach you can only view individual operations in TensorBoard, but you can’t inspect the size of an operation’s input and output tensors. After searching for a while, I eventually found the answer in a Stack Overflow post, and, as it turns out, that is exactly what the TensorBoard callback is doing.
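
As a reference, a rough sketch of that workaround (the model, data and log directory are made up for illustration):

import numpy as np
import tensorflow as tf

x = np.random.rand(32, 4).astype(np.float32)
y = np.random.rand(32, 1).astype(np.float32)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# write_graph=True is what records the executed graph for inspection in TensorBoard.
tb_cb = tf.keras.callbacks.TensorBoard(log_dir="logs", write_graph=True)
model.fit(x, y, epochs=1, callbacks=[tb_cb])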

Another thing I found odd is that if you define your custom loss as a subclass of tf.keras.losses.Loss, it insists that there are only two inputs y_true and y_pred. In the case of Cox’s proportional hazards loss the true label comprises an event indicator and an indicator matrix specifying which pairs in a batch are comparable. Luckily, the contents of y_true don’t get checked, so you can just pass a list, but I would prefer to write something like

loss_fn(y_true_event=y_event, y_true_riskset=y_riskset, y_pred=pred_risk_score)

instead of

loss_fn(y_true=[y_event, y_riskset], y_pred=pred_risk_score)
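
For concreteness, here is a rough sketch of what such a loss subclass could look like when y_true is packed as a list (this is not the tutorial’s actual implementation; the masking trick and names are my own):

import tensorflow as tf

class CoxPHLoss(tf.keras.losses.Loss):
    """Negative partial log-likelihood with y_true packed as [event, riskset]."""

    def call(self, y_true, y_pred):
        # event: (batch, 1) indicator of observed events;
        # riskset: (batch, batch) boolean matrix of comparable pairs.
        event, riskset = y_true
        event = tf.cast(event, y_pred.dtype)
        mask = tf.cast(riskset, y_pred.dtype)
        pred_t = tf.reshape(y_pred, (1, -1))
        # Push entries outside the risk set towards -inf so they
        # do not contribute to the log-sum-exp.
        masked_pred = pred_t + (mask - 1.0) * 1e9
        log_risk = tf.reduce_logsumexp(masked_pred, axis=1, keepdims=True)
        # Per-sample loss; Keras applies the configured reduction afterwards.
        return event * (log_risk - y_pred)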

Finally, although eager execution is now enabled by default, the code runs significantly faster in graph mode, i.e. when annotating your model’s call method with @tf.function. I guess you are only supposed to use eager execution for debugging purposes.
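
In code, that annotation is a one-line change (a generic sketch, not the tutorial’s model):

import tensorflow as tf

class RiskModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    @tf.function  # traces call() into a graph instead of executing it eagerly
    def call(self, inputs):
        return self.dense(inputs)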

May 17, 2020 02:07 PM

May 07, 2020

Christian Schaller: GNOME is not the default for Fedora Workstation

(Christian Schaller)

We recently had a Fedora AMA where one of the questions asked was why GNOME is the default desktop for Fedora Workstation. In the AMA we answered why GNOME had been chosen for Fedora Workstation, but we didn’t challenge the underlying assumption built into the way the question was asked, and the answer to that assumption is that it isn’t the default. What I mean by this is that Fedora Workstation isn’t a box of parts, where you have default options that can be replaced; it’s a carefully procured and assembled operating system aimed at developers, sysadmins and makers in general. If you replace one or more parts of it, then it stops being Fedora Workstation and starts being ‘build your own operating system OS’. There is nothing wrong with wanting to or finding it interesting to build your own operating systems; I think a lot of us initially got into Linux due to enjoying doing that. And the Fedora project provides a lot of great infrastructure for people who want to build their own operating systems, themselves or by teaming up with others, which is why Fedora has so many spins and variants available.
The Fedora Workstation project is something we made using those tools and it has been tested and developed as an integrated whole, not as a collection of interchangeable components. The Fedora Workstation project might of course replace certain parts with other parts over time, like how we are migrating from X.org to Wayland. But at some point we are going to drop standalone X.org support and only support X applications through XWayland. But that is not the same as if each of our users individually did the same. And while it might be technically possible for a skilled user to still get things moved back onto X for some time after we make the formal deprecation, the fact is that you would no longer be using ‘Fedora Workstation’. You would be using a homebrew OS that contains parts taken from Fedora Workstation.

So why am I making this distinction? To be crystal clear, it is not to hate on you for wanting to assemble your own OS; in fact we love having anyone with that passion as part of the Fedora community. I would of course love for you to share our vision and join the Fedora Workstation effort, but the same is true for all the other spins and variant communities we have within the Fedora community too. No, the reason is that we have a very specific goal of creating a stable and well working experience for our users with Fedora Workstation, and one of the ways we achieve this is by having a tightly integrated operating system that we test and develop as a whole. Because that is the operating system we as the Fedora Workstation project want to make. We believe that doing anything else creates an impossible QA matrix, because if you tell people that ‘hey, any part of this OS is replaceable and should still work’ you have essentially created a testing matrix for yourself of infinite size. And while as software engineers I am sure many of us find experiments like ‘wonder if I can get Fedora Workstation running on a BSD kernel’ or ‘I wonder if I can make it work if I replace glibc with Bionic‘ fun and interesting, I am equally sure we all also realize that once we do that we are in self-support territory and that Fedora Workstation, or any other OS you use as your starting point, can’t be blamed if your system stops working very well. And replacing such a core thing as the desktop is no different to those other examples.

Having been in the game of trying to provide a high quality desktop experience both commercially in the form of RHEL Workstation and through our community efforts around Fedora Workstation, I have seen and experienced first hand the problems that the mindset of an interchangeable desktop creates. For instance, before we switched to the Fedora Workstation branding and it was all just ‘Fedora’, I experienced reviewers complaining about missing features, features we had actually spent serious effort implementing, because the reviewer decided to review a different spin of Fedora than the GNOME one. Other cases I remember are of customers trying to fix a problem by switching desktops, only to discover that while the initial issue they wanted fixed got resolved by the switch, they now got a new batch of issues that were equally problematic for them. And we were left trying to figure out if we should try to fix the original problem, the new ones, or maybe the problems reported by users of a third desktop option. We also had cases of users who, just like the reviewer mentioned earlier, assumed something was broken or missing because they were using a different desktop than the one where the feature was added. And at the same time, trying to add every feature everywhere would dilute our limited development resources so much that it made us move slowly and left us without the resources to focus on getting ready for major changes in the hardware landscape, for instance.
So for RHEL we now only offer GNOME as the desktop, and the same is true in Fedora Workstation. That is not because we don’t understand that people enjoy experimenting with other desktops, but because it allows us to work with our customers, users and hardware partners on fixing the issues they have with our operating system, because it is a clearly defined entity, and on adding the features they need going forward and properly supporting the hardware they are using, as opposed to spreading ourselves so thin that we just run around putting band-aids on the problems reported.
And in the longer run I actually believe this approach benefits those of you who want to build your own OS too, or use an OS built by another team around a different set of technologies, because while the improvements might come a bit later for you, the work we now have the ability to undertake due to having a clear focus, like our work on adding HiDPI support, getting Wayland ready for desktop use or enabling Thunderbolt support in Linux, makes it a lot easier for these other projects to eventually add support for these things too.

Update: Adam Jackson’s oft-quoted response to the old ‘Linux is about choice’ meme is also required reading for anyone wanting a high quality operating system.

by uraeus at May 07, 2020 05:57 PM

May 04, 2020

Bastien NoceraDual-GPU support: Launch on the discrete GPU automatically

(Bastien Nocera) *reality TV show deep voice guy*

In 2016, we added a way to launch apps on the discrete GPU.

*swoosh effects*

In 2019, we added a way for that to work with the NVidia drivers.

*explosions*

In 2020, we're adding a way for applications to launch automatically on the discrete GPU.

*fast cuts of loads of applications being launched and quiet*




Introducing the (badly-named-but-if-you-can-come-up-with-a-better-name-youre-ready-for-computers) “PrefersNonDefaultGPU” desktop entry key.

From the specification’s website:
If true, the application prefers to be run on a more powerful discrete GPU if available, which we describe as “a GPU other than the default one” in this spec to avoid the need to define what a discrete GPU is and in which cases it might be considered more powerful than the default GPU. This key is only a hint and support might not be present depending on the implementation. 
And support for that key is coming to GNOME Shell soon.

TL;DR

Add “PrefersNonDefaultGPU=true” to your application's .desktop file if it can benefit from being run on a more powerful GPU.
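
For illustration, here is a minimal sketch of what such a desktop file could look like; the application name and Exec line are made-up placeholders, and only the last key is the new bit:

[Desktop Entry]
Type=Application
Name=My Heavy 3D Game
Exec=my-heavy-3d-game
# Hint that this application benefits from the more powerful, non-default GPU
PrefersNonDefaultGPU=true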

We've also added a switcherooctl command to recent versions of switcheroo-control so you can launch your apps on the right GPU from your scripts and tweaks.
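
As a rough sketch of how that can look from a script (the exact subcommands may differ between switcheroo-control versions, so treat this as an assumption and check switcherooctl --help on your system):

# List the GPUs that switcheroo-control knows about
switcherooctl list
# Launch a (hypothetical) program on the non-default GPU
switcherooctl launch my-heavy-3d-game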

by Bastien Nocera (noreply@blogger.com) at May 04, 2020 04:52 PM

May 02, 2020

Jean-François Fortin TamOverhauling your Open Source project’s “Developer Experience” and redefining the workflow

This started out as a simple status report following my first report on the revival of the Getting Things GNOME project, but it turned into a full-fledged article that, I believe, will be relevant to many community managers and FLOSS project maintainers out there. Particularly if you have an established open-source project looking for sustainable development but don’t have the luxury of paid developers, it should be worth investing the 7-9 minutes to read this.

As the world came to a standstill and as I finished my tax season accounting (two unrelated things, really), this month I have completed a major overhaul of the “developer experience” for GTG. The objective is to make it easier and more exciting for people to contribute to the project, by having:

  • a very clear workflow, objectives, and set of rules;
  • helpful & up-to-date reference documentation (particularly when it comes to building, testing and developing the core application).
This arguably depicts my efforts to clean up the cruft and pick up the missing pieces.

Indeed, from a community management standpoint, the project was suffering from two fundamental problems:

  1. It was completely unclear what was critical and what wasn’t, and therefore what actually needed to be done to make a release. This would lead any potential contributor to feel overwhelmed and discouraged from working on the project. It is impossible to take action if you don’t know where you stand, don’t know how far you need to go, and if everything is vying for your attention.
  2. The documentation for contributors was mixed up with user documentation, and both were outdated and spread across four—or even five—websites. There were a gazillion things on LaunchPad, GitHub, ReadTheDocs, a defunct website/blog, and on the GTG wiki—which had at least 55 documentation pages, plus 50 pages of past Google Summer of Code projects, totalling somewhere over 105 pages, two thirds of which had broken links. When information was not “just” scattered, it was also often duplicated, conflicting, or so outdated that it was downright misleading. So, yeah.

Not everything is black and white in this world, but when you combine these two polarities (these two problems) together, you end up with a “mottled dove”—the Ikaruga.

Why yes, I am totally using a bipolar shoot-em-up bullet hell as the analogy for what the potential contributor’s developer experience must have felt like.

What the project probably looked like from an outsider’s perspective. Actually applies to many open-source projects out there.

Part 1: Fixing the workflow, redefining the objectives and policies

I am addressing the 1st problem mentioned above as follows.

Let’s take a minute to explain my philosophy.

To have a clear sense of direction, as a maintainer or core developer, you need to be able to know what is “critical” and what is better left for new contributors to tackle. This is why I created a dynamic list of issue labels and their descriptions, two of which are extremely important: “low-hanging-fruit” and “patch-or-wont-happen“. See CONTRIBUTING.md and the bug reporting & triage guide for further explanations.

Then, just as you must only assign “critical” (or “necessary”) issues to yourself, you must also be ruthless about the “minimum viable product”. If the release can be functional without a particular issue being solved, then that issue is not to be targeted to the milestone, unless a fix/patch is already being proposed or worked on somehow. That way your developers and maintainers can look only at the milestone as their guiding star and have a very clear sense of progression and of “when” it is done:

milestones progression
“That sounds like a reference to the Rebuild of Evangelion”, you say?
Well of course we’re Eva nerds, what did you expect?

Obviously this is meant for an atomic “release early and often” development model, not the “time-based releases” model (which I don’t think makes much sense for independent projects).

This is what setting expectations is all about. By clearly documenting the above, I am essentially establishing a “social contract” between users and contributors. This is not about being lazy, it’s about being brutally honest about the resources you have to contend with.

Part 2: Separating the documentation for contributors

To solve the 2nd fundamental problem, I spent some time analyzing the existing pages and documentation.

I decided that the wiki would now serve only for “Introducing/marketing the project” to users, acting as a website/landing page for the project. Other than historical documents, anything “documentation” would be relegated either to the official user manual, or to files in the development forge (both GitHub and GitLab automatically render Markdown files as nice HTML, so there is no need to use a wiki for that nowadays). This avoids everything becoming a giant kitchen sink mess, and makes it pleasant to read again.

To make that happen, this is what I’ve done in the past two weeks:

“Burning the Brushwood” (1893), by Eero Järnefelt
  1. Fixed all the broken links;
  2. Migrated any relevant contents to nicely rendered Markdown files into a central place (the main GTG Git repository on GitHub), then split, merged or rewrote a ton of “cornerstone” documentation including the new README, the new CONTRIBUTING file, and most of the stuff you see in docs/contributors/;
  3. Deleted the migrated wiki pages and associated links, archived the rest that remains there for “hysterical raisins“, by moving it to the bottom of the page;
  4. Wrote a new introduction and list of features & benefits for users, at the top of the wiki homepage;
  5. Rewrote remaining “cornerstone” wiki pages (new download/install page, new roadmap page, etc.);
  6. Archeologically recovered the epic lost manifesto page;
  7. Used some more archeology to create the press coverage page;
  8. Ordered the GTG.ReadTheDocs.io website to be destroyed and its remains cremated with the brushwood.

Behold: 37 wiki front page revisions later, the front page now does a decent job at answering the #1 question for people hearing about GTG for the first time: “Why would I use GTG? Why is it magical?” The wiki’s front page used to look like this, it now looks like this. Some might say it is now a very nice shrubbery.

On the other side, in the Git repository, my 23 commits involved 155 files, with 1399 line insertions and 888 line deletions.

The git commit timestamps don’t reflect the spread-out, multi-week nature of this work. Good thing I’m not invoicing GTG for that work, because it would cost more than a Nissan Micra.

Remaining GTG dev docs you can help with

Some documents in the new contributors docs folder are things that I have migrated but not actually reviewed for up-to-dateness or accuracy, such as the DBus API documentation or the plugins documentation. If there are outdated parts, I welcome you to contribute suggestions and ideally patches to address any remaining issues, as those areas are a bit outside my focus and expertise (I am many things, but I am neither an API architect nor a data structures specialist).

The post Overhauling your Open Source project’s “Developer Experience” and redefining the workflow appeared first on The Open Sourcerer.

by Jeff at May 02, 2020 08:17 PM

April 28, 2020

Christian SchallerFedora Workstation : Swamp draining for 6 years

(Christian Schaller)

As Fedora Workstation 32 was released today, I ended up looking back at our efforts to drain the swamp over the last 6 years. In April of 2014 I wrote a blog post outlining our vision for the Fedora Workstation effort and what we wanted to achieve with it. I hadn’t looked at that blog post in years, but it was interesting going back to it and realizing that, while some of the details have changed, it is still the vision we are pursuing today: to keep draining the swamp and make Fedora Workstation a top-notch operating system for developers and makers in general. Which I guess is one of the hallmarks of a decent vision, that it allows for the details to change without invalidating it.

One of my pet peeves at the time with Linux as a desktop operating system was that so many of the so-called efforts to make Linux user-friendly were essentially duct-taping over the problems, creating fragile solutions that often made it harder for us to really move forward. In the years since, we addressed a lot of major swamp issues with our efforts around HiDPI & Bolt (getting ahead of hardware enablement for new monitors and Thunderbolt devices respectively), Flatpaks, GNOME Software and AppStream (making applications discoverable, deployable and maintainable), Wayland (making your desktop secure and future proof), LVFS and firmware handling (making firmware easily available for Linux users), the fingerprint reader standard (ensuring your hardware is fully supported), and coming up with ways to improve the lives of developers with improvements to the terminal or Fedora Toolbox, our developer pet container tool.

Working on these and other issues, we realized early on that a model where hardware gets enabled in a reactive manner, in response to new laptops being sold, was never going to yield a good result for our users. As long as we followed that model, people were bound to always hit issues with laptops as they came out and then have to deal with those issues for the first 6-12 months of their life. This is why I am so excited about our new partnership with Lenovo that we pre-announced on Friday, as it is both the culmination of our efforts over the last 6 years and the starting point of a new era in terms of how we work with hardware makers. So instead of us spending a ton of time trying to reverse engineer basic drivers, we can now rely on our hardware partner and their component vendors providing that, and we can instead focus on what I call high-level hardware enablement. Meaning that as we see new features coming into laptops and computers, we can try to improve the infrastructure in the operating system to take full advantage of said hardware, and we can do so in collaboration with the hardware makers, knowing that once we provide the infrastructure they will ensure that drivers and similar pieces fit into it. Our work on fingerprint readers and Thunderbolt support, for instance, provides two great early examples of that.

Anyway, you are probably interested to know some of the new things coming in Fedora Workstation 32, so here are some of my personal highlights:

New lock screen

This is more of a cosmetic change, but one that every user will see upon logging into their Fedora system after a new install or upgrade. The new design features a faded version of your desktop background image, and it should also feel smoother, as the password dialog now appears on the lock screen page as opposed to before, where it sort of replaced it. The dialog now also tries to inform you more discreetly than before if you’re trying to type in the password while the lock screen is on. A big thanks to Allan Day and the GNOME design team for their work polishing this part of the user interface.

GNOME extension app

GNOME Shell extensions are little tweaks and additional features for the desktop that our users have gotten accustomed to and enjoy greatly. Extensions are also the technology that powers the GNOME Classic session, which provides those of our users who want it with a more traditional desktop experience. GNOME Shell extensions have gradually evolved in how we work with them since their inception, from something you install through your web browser to being handled through GNOME Software. With Fedora Workstation 32 we are making the new GNOME Shell extensions management app available as the next step in the evolution of GNOME Shell extensions, making it simple to turn any given extension on or off, or to quickly see which extensions you have installed.

GNOME Extensions app

GNOME Extensions handling app

Fedora Toolbox

Fedora Toolbox is our helper for making working with containers for development and testing as easy as it possibly can be. Debarshi Ray and Ondřej Míchal have been hard at work porting Fedora Toolbox from shell to Go for this release. For those wondering why we chose Go as the language, there were basically two reasons. One, we felt that the toolbox had gone as far as it could as a shell script; and two, Go is the language used by all the components we rely on and interact with in the container space, like buildah and podman. We also wanted to make it easy for developers on those projects to contribute by using the same language as they use in their projects.

Fedora Toolbox

Fedora Toolbox running on Fedora Workstation 32

Performance improvements

Another area we always try to give some love to is general performance improvement. For example, this time around Christian Hergert identified some really bad behavior of GNOME Shell when running on a system with very high I/O. On the face of it, GNOME Shell didn’t look like it should have been affected, but during some intensive debugging sessions Christian discovered that I/O was triggered by various API calls to do things like string translation. So he put together a set of patches to resolve the high I/O stalls and can now report that GNOME Shell keeps running smooth as silk, even in high disk I/O situations.

PipeWire

Wim Taymans keeps making great strides forward with PipeWire, our tool for creating a unified media handler for audio, pro-audio and video. In Fedora Workstation 32 we will be shipping the 0.3 version, which has quite complete Jack support. In fact we are hoping to team up with the Fedora Jam team to finalize the Jack support during the Fedora 32 lifecycle by testing it extensively. We have a lot of Jack apps already working with PipeWire, including a series of important Jack apps that we have put into Flatpaks in Fedora, like Carla. While the support is there in PipeWire in Fedora 32 right now, there is some convenience work we still need to do, but we hope to get that pushed out by next week, making it very simple to both replace Jack with PipeWire and undo the replacement for testing purposes.

The PulseAudio support is the last piece that is still in progress. It works for simple music playback, but it is not a drop-in replacement for PulseAudio yet, so while we had hoped to encourage widespread testing in F32, we will aim to delay that to F33 in order to polish the PulseAudio support more first. But once it is ready we will make it available for testing in a simple manner, just like the Jack support.

There has also been further work on the video side of PipeWire, adding support for zero-copy video capture. This has reduced the overhead of things like screen capture significantly and should be a nice performance and resource-usage improvement for everyone.

Firefox on Wayland

Martin Stransky and Jan Horak have been working hard to improve how Firefox runs and works when used as a Wayland-native application, fixing a truckload of bigger and smaller bugs this cycle. We feel that we have turned the corner now in terms of the Wayland version being just as stable and good as the X11 one. In fact, we could move beyond just fixing bugs to actually adding features this time around; for instance, Martin Stransky worked on WebGL hardware acceleration support, enabling us to have that enabled by default for the first time. We also made sure to take advantage of the PipeWire zero-copy support to improve video conferencing applications running under Firefox, which turned out to be even more important than we expected considering Covid-19 has everyone working from home.

Looking forward

We spent a lot of time and energy over the last 6 years to get to where we are now, putting in place a lot of the basic building blocks needed to make Linux a great desktop operating system. And it feels great that, just as we kick off the new line of Lenovo laptops running Fedora, we are also entering a new phase of development where we can move beyond getting our basic infrastructure in place and really start taking advantage of it to rapidly improve the experience we are providing even more. A good example is the Firefox work mentioned above, where we could finally move on from ‘make it work with Wayland and PipeWire’ to ‘let’s take advantage of these new pieces to make Firefox on Linux better’. Another example is that Adam Jackson is currently investigating how we can improve how Fedora Workstation performs for remote usage. This work includes looking at things like VNC and RDP and commercial offerings, and figuring out how we can make our stack work better with such tools, on top of the improvements that PipeWire brings for such use cases.

There is some more heavy lifting needed before our next-generation OS architecture, Silverblue, is ready to be our default offering, but it is improving by leaps and bounds each release and already has a loyal following. Personally, I am very excited that we are quickly moving closer to the point where we can make it our default and through that offer features like bulletproof OS updates, factory resets and solid version rollbacks.

On the Flatpak side, Owen Taylor and Alex Larsson are putting a lot of final touches on our Red Hat infrastructure. For RHEL 8.2 we will finally be able to build Flatpaks in RHEL infrastructure and provide a runtime and SDK for our RHEL customers to use. But equally exciting is that we will be able to offer these to the community at large, meaning we can offer a high quality Flatpak long-term-support runtime and SDK that ISVs can use to target not only RHEL users, but also Fedora and other Linux distributions, in a similar vein to how the Red Hat UBI works. We will also be looking at ways to make getting access to these on Fedora very simple for developers, so that developing against this runtime becomes quick and easy on your Fedora system. Alex and Owen are also working on an incremental updates feature to be shared between Kubernetes containers and OCI Flatpaks, making both technologies better and updates a lot smaller.

We are also looking at a host of other smaller improvements, many of them in collaboration with our friends at Lenovo, like lap detection (so you can be sure the laptop doesn’t burn you), privacy features (like making it harder to read your screen from an angle) and far-field microphones. There are also things like Lennart’s HomeD idea, which we will be looking at as a way to improve the end user experience.

So the future is looking bright, and I hope to see many new faces in the Fedora community going forward, whether you download Fedora Workstation 32 to install on your own system or join us by buying a Fedora laptop from Lenovo this summer.

by uraeus at April 28, 2020 03:46 PM

April 26, 2020

Seungha YangWindows DXVA2 (via Direct3D 11) Support in GStreamer 1.17

DXVA2-based hardware-accelerated decoding is now supported on Windows, as of GStreamer 1.17.

This is the list of supported codecs for now:

  • H.264 (d3d11h264dec)
  • HEVC (d3d11h265dec)
  • VP9 (d3d11vp9dec)
  • VP8 (d3d11vp8dec)

What should I do to use them?

Indeed, no special steps or extra dependencies are required to build these new elements.

The above-listed new decoder elements are part of the d3d11 plugin in GStreamer. The plugin doesn’t require any special build-time dependencies or libraries, as everything needed is already provided by the Windows SDK. Once it has been built, the only requirement is that your hardware (i.e., GPU) supports hardware decoding.

NOTE: This is a hardware decoding feature, so if the VM does not provide a way to pass-through the GPU, it will not work inside the VM.

When you run gst-inspect-1.0, it will show a list of available decoder elements. This is an example of what you might see with gst-inspect-1.0:

[gst-master] PS C:\Work\gst-build> gst-inspect-1.0.exe d3d11
Plugin Details:
Name d3d11
Description Direct3D11 plugin
Filename C:\Work\GST-BU~1\build\SUBPRO~1\GST-PL~3\sys\d3d11\gstd3d11.dll
Version 1.17.0.1
License LGPL
Source module gst-plugins-bad
Binary package GStreamer Bad Plug-ins git
Origin URL Unknown package origin

d3d11vp8dec: Direct3D11 VP8 Intel(R) Iris(R) Plus Graphics Decoder
d3d11vp9dec: Direct3D11 VP9 Intel(R) Iris(R) Plus Graphics Decoder
d3d11h265dec: Direct3D11 H.265 Intel(R) Iris(R) Plus Graphics Decoder
d3d11h264dec: Direct3D11 H.264 Intel(R) Iris(R) Plus Graphics Decoder
d3d11videosink: Direct3D11 video sink bin
d3d11videosinkelement: Direct3D11 video sink
d3d11colorconvert: Direct3D11 Colorspace converter
d3d11download: Direct3D11 downloader
d3d11upload: Direct3D11 uploader

The output might be slightly different in your case. For instance, the device name might be something different than “Intel(R) Iris(R) Plus Graphics”. That’s expected :) It will vary based on your hardware vendor and device naming.

Also, if the list doesn’t contain elements for some codecs (for instance, d3d11vp8dec), it’s very likely that your hardware doesn’t support decoding the codec (for example, some Nvidia GPUs don’t support VP8 decoding).

Moreover, if you have multiple GPUs on your device, you will see separate per-GPU decoder elements with longer names, for instance d3d11h264device1dec.

Why do I need new D3D11 decoders?

GStreamer already ships with two vendor-specific decoder implementations: one is the Nvidia (aka NVCODEC) plugin and the other is the Intel MSDK plugin. So what’s the benefit of this new implementation?

The main advantage of vendor-specific APIs is supposed to be that they perform better than generic APIs like DXVA2. In most cases this might be true, but sometimes it is not. The performance and reliability can vary depending on how well the API is integrated into a framework. Moreover, not just the decoder implementation itself, but “how the media pipeline is configured in an application” is a very important factor for performance and reliability.

In summary, the strengths of the new d3d11 decoders are:

  • Zero-copy playback with d3d11videosink
  • Vendor-independent implementation
  • UWP support

Zero-copy playback

On Windows, both NVCODEC and MSDK plugins will copy decoded data into a new memory space (cuda, gl, or sysmem depending on the details), which consumes more memory and will make applications slower.

However, when d3d11 decoder elements are configured and at the same time d3d11videosink element is selected for rendering, the decoded data will be passed to d3d11videosink without any copy operation.

The only memcpy-like operation will be the color space conversion (YUV to RGB format), but that is often an unavoidable operation, because YUV is not supported as the render format by most renderers (Windows DirectComposition seems to support it, but that’s a special case).
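
As a rough illustration (a sketch, assuming a local H.264 file called video.mp4 and a GStreamer 1.17 build that includes the d3d11 plugin), a playback pipeline that keeps the decoded frames on the GPU all the way to the sink could look like this:

# hypothetical sample file; the video branch stays in D3D11 memory end to end
gst-launch-1.0.exe filesrc location=video.mp4 ! qtdemux ! h264parse ! d3d11h264dec ! d3d11videosink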

Vendor-independent implementation

This is a very useful aspect of this new implementation. Due to the fact that DXVA2 and D3D11 are standard APIs on Windows and provided by the OS, in theory NO vendor-specific consideration is needed (in reality, app-specific workarounds would likely be needed due to buggy vendor-specific driver behavior).

So there would be no reason for the application to write hardware-specific code in the general case. Moreover, these new elements also work on AMD GPUs, whereas previously we only supported hardware decoding on Intel and Nvidia on Windows. They should also work with other GPUs supported by Windows 10, such as Qualcomm, but I have not tested that yet.

UWP support

When running on UWP (Universal Windows Platform), most (possibly all?) hardware-specific operations are required to be handled via the native Windows graphics layer such as Direct3D11/12. Due to this, these new d3d11 decoder elements are a requirement for hardware decoding of video on UWP. When I tested this with a UWP application on my laptop, it worked quite well (as I expected)!

Note that UWP is not officially supported by GStreamer yet, but I am expecting it will be possible soon thanks to the efforts of Nirbheek Chauhan who is an active GStreamer maintainer and also maintains Cerbero, the build system of GStreamer. Related to UWP, a very interesting talk from him is available here: https://gstconf.ubicast.tv/videos/gstreamer-windows-uwp-and-firefox-on-the-hololens-2/

We’re excited to see new Windows specific features coming to GStreamer more and more. Stay tuned for more news! :)

by Seungha Yang at April 26, 2020 11:23 AM

April 24, 2020

Christian SchallerA bold new chapter for Fedora Workstation

(Christian Schaller)

So you have probably seen the announcement that Lenovo are launching a set of Fedora Workstation based laptops. I am so happy and proud of this effort, as it comes as the culmination of our hard work over the last 6 years to drain the swamp and make Linux a more viable desktop operating system.
I am also so happy and proud that Lenovo was willing to work with us on this effort, as they provide us with an incredible opportunity to reach both new and old Linux users around the globe with these systems, being the world’s biggest laptop maker with the widest global reach. One important aspect of this is that Lenovo will provide these laptops through all their sales channels in all their markets. This means you can of course order them online through their website, but it also means companies can order them through Lenovo’s business-to-business channels, and it means that in any country where Lenovo is present you can order them. So this is not a North America-only or Europe-only offering; this is truly a global offering.

There are a lot of people who have been involved in helping to make this happen, but special thanks go to Egbert Gracias from Lenovo, who was critical in making this happen, and also to Alberto Ruiz, who spearheaded this effort from our side.

Our engineering team here at Red Hat has also been hard at work ensuring we can support these models very well, be that through bug fixes to kernel drivers or by polishing up things like the Linux fingerprint support. As we go forward we hope to build on this relationship to take Linux laptops to the next level, and I am also very happy to say that we now have Jared Dominguez on our team to help us develop better work practices and closer relationships with our hardware partners and original device manufacturers.


Also a special thanks to Jakub Steiner for putting together the little sizzle video above. It was supposed to be used at our booth at Red Hat Summit next week, but with that going virtual we repurposed it for this announcement.

by uraeus at April 24, 2020 02:38 PM

April 15, 2020

Sebastian Pölsterlscikit-survival 0.12 Released

Version 0.12 of scikit-survival adds support for scikit-learn 0.22 and Python 3.8 and comes with two notable improvements:

  1. sklearn.pipeline.Pipeline will now be automatically patched to add support for predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports it (see first example).
  2. The regularization strength of the ridge penalty in sksurv.linear_model.CoxPHSurvivalAnalysis can now be set per feature (see second example).

For a full list of changes in scikit-survival 0.12, please see the release notes.

Pre-built conda packages are available for Linux, macOS, and Windows via

 conda install -c sebp scikit-survival

Alternatively, scikit-survival can be installed from source via pip:

 pip install -U scikit-survival

Using pipelines

You can now create a scikit-learn pipeline and directly call predict_cumulative_hazard_function and predict_survival_function if the underlying estimator supports it, such as RandomSurvivalForest below.

from sklearn.pipeline import make_pipeline
from sksurv.datasets import load_breast_cancer
from sksurv.ensemble import RandomSurvivalForest
from sksurv.preprocessing import OneHotEncoder
X, y = load_breast_cancer()
pipe = make_pipeline(OneHotEncoder(), RandomSurvivalForest())
pipe.fit(X, y)
surv_fn = pipe.predict_survival_function(X)

Per-feature regularization strength

If you want to fit Cox’s proportional hazards model to a large set of features, but only shrink the coefficients for a subset of features, previously, you had to use CoxnetSurvivalAnalysis and set the penalty_factor parameter accordingly. This release adds a similar option to CoxPHSurvivalAnalysis, which only uses ridge regression.

For instance, consider the breast cancer data, which comprises 4 established markers (age, tumor size, tumor grade, and estrogen receptor status) and 76 genetic markers. It is sensible to fit a model where the established markers enter unpenalized and only the coefficients of the genetic markers get penalized. We can achieve this by creating an array for the regularization strength $\alpha$ where the entries corresponding to the established markers are zero.

import numpy as np
from sklearn.pipeline import make_pipeline
from sksurv.datasets import load_breast_cancer
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.preprocessing import OneHotEncoder
X, y = load_breast_cancer()
# the last 4 features are: age, er, grade, size
num_genes = X.shape[1] - 4
# add 2, because after one-hot encoding grade becomes three features
alphas = np.ones(X.shape[1] + 2)
# do not penalize established markers
alphas[num_genes:] = 0.0
# fit the model
pipe = make_pipeline(OneHotEncoder(), CoxPHSurvivalAnalysis(alpha=alphas))
pipe.fit(X, y)

April 15, 2020 10:10 AM

April 14, 2020

Andy Wingounderstanding webassembly code generation throughput

(Andy Wingo)

Greets! Today's article looks at browser WebAssembly implementations from a compiler throughput point of view. As I wrote in my article on Firefox's WebAssembly baseline compiler, web browsers have multiple wasm compilers: some that produce code fast, and some that produce fast code. Implementors are willing to pay the cost of having multiple compilers in order to satisfy these conflicting needs. So how well do they do their jobs? Why bother?

In this article, I'm going to take the simple path and just look at code generation throughput on a single chosen WebAssembly module. Think of it as X-ray diffraction to expose aspects of the inner structure of the WebAssembly implementations in SpiderMonkey (Firefox), V8 (Chrome), and JavaScriptCore (Safari).

experimental setup

As a workload, I am going to use a version of the "Zen Garden" demo. This is a 40-megabyte game engine and rendering demo, originally released for other platforms, and compiled to WebAssembly a couple years later. Unfortunately the original URL for the demo was disabled at some point in late 2019, so it no longer has a home on the web. A bit of a weird situation and I am not clear on licensing either. In any case I have a version downloaded, and have hacked out a minimal set of "imports" that the WebAssembly module needs from the host to allow the module to compile and link when run from a JavaScript shell, without requiring WebGL and similar facilities. So the benchmark is just to instantiate a WebAssembly module from the 40-megabyte byte array and see how long it takes. It would be better if I had more test cases (and would be happy to add them to the comparison!) but this is a start.

I start by benchmarking the various WebAssembly implementations, firstly in their standard configuration and then setting special run-time flags to measure the performance of the component compilers. I run these tests on the core-rich machine that I use for browser development (2 Xeon Silver 4114 CPUs for a total of 40 logical cores). The default-configuration numbers are therefore not indicative of performance on a low-end Android phone, but we can use them to extract aspects of the different implementations.

Since I'm interested in compiler throughput, I'm not particularly concerned about how well a compiler will use all 40 cores. Therefore when testing the specific compilers I will set implementation-specific flags to disable parallelism in the compiler and GC: --single-threaded on V8, --no-threads on SpiderMonkey, and --useConcurrentGC=false --useConcurrentJIT=false on JSC. To further restrict any threads that the implementation might decide to spawn, I'll bind these to a single core on my machine using taskset -c 4. Otherwise the machine is in its normal configuration (nothing else significant running, all cores available for scheduling, turbo boost enabled).
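
To make that concrete, the per-compiler invocations look roughly like the following sketch; the shell binaries are the stock SpiderMonkey (js), V8 (d8) and JavaScriptCore (jsc) shells, the flags are the ones listed in this article, and the driver script name is a placeholder for my local harness:

# SpiderMonkey shell: baseline compiler only, compiler threads disabled, pinned to one core
taskset -c 4 js --no-threads --wasm-compiler=baseline run-zen-garden.js
# V8 shell: Liftoff baseline only, no tier-up, single-threaded
taskset -c 4 d8 --single-threaded --liftoff --no-wasm-tier-up run-zen-garden.js
# JavaScriptCore shell: LLInt wasm interpreter only, concurrent JIT and GC disabled
taskset -c 4 jsc --useConcurrentGC=false --useConcurrentJIT=false --useWasmLLInt=true --useBBQJIT=false --useOMGJIT=false run-zen-garden.js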

I'll express results in nanoseconds per WebAssembly code byte. Of the 40 megabytes or so in the Zen Garden demo, only 23 891 164 bytes are actually function code; the rest is mostly static data (textures and so on). So I'll divide the total time by this code byte count.

I tested V8 at git revision 0961376575206, SpiderMonkey at hg revision 8ec2329bef74, and JavaScriptCore at subversion revision 259633. The benchmarks can be run using just a shell; see the pull request. I timed how long it took to instantiate the Zen Garden demo, ensuring that a basic export was callable. I collected results from 20 separate runs, sleeping a second between them. The bars in the charts below show the median times, with a histogram overlay of all results.

results & analysis

We can see some interesting results in this graph. Note that the Y axis is logarithmic. The "concurrent tiering" results in the graph correspond to the default configurations (no special flags, no taskset, all cores available).

The first interesting conclusions that pop out for me concern JavaScriptCore, which is the only implementation to have a baseline interpreter (run using --useWasmLLInt=true --useBBQJIT=false --useOMGJIT=false). JSC's WebAssembly interpreter is actually structured as a compiler that generates custom WebAssembly-specific bytecode, which is then run by a custom interpreter built using the same infrastructure as JSC's JavaScript interpreter (the LLInt). Directly interpreting WebAssembly might be possible as a low-latency implementation technique, but since you need to validate the WebAssembly anyway and eventually tier up to an optimizing compiler, apparently it made sense to emit fresh bytecode.

The part of JSC that generates baseline interpreter code runs slower than SpiderMonkey's baseline compiler, so one is tempted to wonder why JSC bothers to go the interpreter route; but then we recall that on iOS, we can't generate machine code in some contexts, so the LLInt does appear to address a need.

One interesting feature of the LLInt is that it allows tier-up to the optimizing compiler directly from loops, which neither V8 nor SpiderMonkey support currently. Failure to tier up can be quite confusing for users, so good on JSC hackers for implementing this.

Finally, while baseline interpreter code generation throughput handily beats V8's baseline compiler, it would seem that something in JavaScriptCore is not adequately taking advantage of multiple cores; if one core compiles at 51ns/byte, why do 40 cores only do 41ns/byte? It could be my tests are misconfigured, or it could be that there's a nice speed boost to be found somewhere in JSC.

JavaScriptCore's baseline compiler (run using --useWasmLLInt=false --useBBQJIT=true --useOMGJIT=false) runs much more slowly than SpiderMonkey's or V8's baseline compiler, which I think can be attributed to the fact that it builds a graph of basic blocks instead of doing a one-pass compile. To me these results validate SpiderMonkey's and V8's choices, looking strictly from a latency perspective.

I don't have graphs for code generation throughput of JavaScriptCore's optimizing compiler (run using --useWasmLLInt=false --useBBQJIT=false --useOMGJIT=true); it turns out that JSC wants one of the lower tiers to be present, and will only tier up from the LLInt or from BBQ. Oh well!

V8 and SpiderMonkey, on the other hand, are much of the same shape. Both implement a streaming baseline compiler and an optimizing compiler; for V8, we get these via --liftoff --no-wasm-tier-up or --no-liftoff, respectively, and for SpiderMonkey it's --wasm-compiler=baseline or --wasm-compiler=ion.

Here we should conclude directly that SpiderMonkey generates code around twice as fast as V8 does, in both tiers. SpiderMonkey can generate machine code faster even than JavaScriptCore can generate bytecode, and optimized machine code faster than JSC can make baseline machine code. It's a very impressive result!

Another conclusion concerns the efficacy of tiering: for both V8 and SpiderMonkey, their baseline compilers run more than 10 times as fast as the optimizing compiler, and the same ratio holds between JavaScriptCore's baseline interpreter and compiler.

Finally, it would seem that the current cross-implementation benchmark for lowest-tier code generation throughput on a desktop machine would then be around 50 ns per WebAssembly code byte for a single core, which corresponds to receiving code over the wire at somewhere around 160 megabits per second (Mbps). If we add in concurrency and manage to farm out compilation tasks well, we can obviously double or triple that bitrate. Optimizing compilers run at least an order of magnitude slower. We can conclude that to the desktop end user, WebAssembly compilation time is indistinguishable from download time for the lowest tier. The optimizing tier is noticeably slower though, running more around 10-15 Mbps per core, so time-to-tier-up is still a concern for faster networks.

Going back to the question posed at the start of the article: yes, tiering shows a clear benefit in terms of WebAssembly compilation latency, letting users interact with web sites sooner. So that's that. Happy hacking and until next time!

by Andy Wingo at April 14, 2020 08:59 AM

April 08, 2020

Andy Wingomulti-value webassembly in firefox: a binary interface

(Andy Wingo)

Hey hey hey! Hope everyone is staying safe at home in these weird times. Today I have a final dispatch on the implementation of the multi-value feature for WebAssembly in Firefox. Last week I wrote about multi-value in blocks; this week I cover function calls.

on the boundaries between things

In my article on Firefox's baseline compiler, I mentioned that all WebAssembly engines in web browsers treat the function as the unit of compilation. This facilitates streaming, parallel compilation of WebAssembly modules, by farming out compilation of individual functions to worker threads. It also allows for easy tier-up from quick-and-dirty code generated by the low-latency baseline compiler to the faster code produced by the optimizing compiler.

There are some interesting Conway's Law implications of this choice. One is that division of compilation tasks becomes an opportunity for division of human labor; there is a whole team working on the experimental Cranelift compiler that could replace the optimizing tier, and in my hackings on Firefox I have had minimal interaction with them. To my detriment, of course; they are fine people doing interesting things. But the code boundary means that we don't need to communicate as we work on different parts of the same system.

Boundaries are where places touch, and sometimes for fluid crossing we have to consider boundaries as places in their own right. Functions compiled with the baseline compiler, with Ion (the production optimizing compiler), and with Cranelift (the experimental optimizing compiler) are all able to call each other because they actively maintain a common boundary, a binary interface (ABI). (Incidentally the A originally stands for "application", essentially reflecting division of labor between groups of people making different components of a software system; Conway's Law again.) Let's look closer at this boundary-place, with an eye to how it changes with multi-value.

what's in an ABI?

Among other things, an ABI specifies a calling convention: which arguments go in registers, which on the stack, how the stack values are represented, how results are returned to the callers, which registers are preserved over calls, and so on. Intra-WebAssembly calls are a closed world, so we can design a custom ABI if we like; that's what V8 does. Sometimes WebAssembly may call functions from the run-time, though, and so it may be useful to be closer to the C++ ABI on that platform (the "native" ABI); that's what Firefox does. (Incidentally here I think Firefox is probably leaving a bit of performance on the table on Windows by using the inefficient native ABI that only allows four register parameters. I haven't measured though so perhaps it doesn't matter.) Using something closer to the native ABI makes debugging easier as well, as native debugger tools can apply more easily.

One thing that most native ABIs have in common is that they are really only optimized for a single result. This reflects their heritage as artifacts from a world built with C and C++ compilers, where there isn't a concept of a function with more than one result. If multiple results are required, they are represented instead as arguments, typically as pointers to memory somewhere. Consider the AMD64 SysV ABI, used on Unix-derived systems, which carefully specifies how to pass arbitrary numbers of arbitrary-sized data structures to a function (§3.2.3), while only specifying what to do for a single return value. If the return value is too big for registers, the ABI specifies that a pointer to result memory be passed as an argument instead.

So in a multi-result WebAssembly world, what are we to do? How should a function return multiple results to its caller? Let's assume that there are some finite number of general-purpose and floating-point registers devoted to return values, and that if the return values will fit into those registers, then that's where they go. The problem is then to determine which results will go there, and if there are remaining results that don't fit, then we have to put them in memory. The ABI should indicate how to address that memory.

When looking into a design, I considered three possibilities.

first thought: stack results precede stack arguments

When a function needs some of its arguments passed on the stack, it doesn't receive a pointer to those arguments; rather, the arguments are placed at a well-known offset to the stack pointer.

We could do the same thing with stack results, either reserving space deeper on the stack than stack arguments, or closer to the stack pointer. With the advent of tail calls, it would make more sense to place them deeper on the stack. Like this:

The diagram above shows the ordering of stack arguments as implemented by Firefox's WebAssembly compilers: later arguments are deeper (farther from the stack pointer). It's an arbitrary choice that happens to match up with what the native ABIs do, as it was easier to re-use bits of the already-existing optimizing compiler that way. (Native ABIs use this stack argument ordering because of sloppiness in a version of C from before I was born. If you were starting over from scratch, probably you wouldn't do things this way.)

Stack result order does matter to the baseline compiler, though. It's easier if the stack results are placed in the same order in which they would be pushed on the virtual stack, so that when the function completes, the results can just be memmove'd down into place (if needed). The same concern dictates another aspect of our ABI: unlike calls, registers are allocated to the last results rather than the first results. This is to make it easy to preserve stack invariant (1) from the previous article.

At first I thought this was the obvious option, but I ran into problems. It turns out that stack arguments are fundamentally unlike stack results in some important ways.

While a stack argument is logically consumed by a call, a stack result starts life with a call. As such, if you reserve space for stack results just by decrementing the stack pointer before a call, probably you will need to load the results eagerly into registers thereafter or shuffle them into other positions to be able to free the allocated stack space.

Eager shuffling is busy-work that should be avoided if possible. It's hard to avoid in the baseline compiler. For example, a call to a function with 10 arguments will consume 10 values from the temporary stack; any results will be pushed on after removing argument values from the stack. If there are any stack results, it's almost impossible to avoid a post-call memmove, to move stack results to where they should be before the 10 argument values were pushed on (and probably spilled). So the baseline compiler case is not optimal.

However, things get gnarlier with the Ion optimizing compiler. Like many other optimizing compilers, Ion is designed to compute the necessary stack frame size ahead of time, and to never move the stack pointer during an activation. The only exception is for pushing on any needed stack arguments for nested calls (which are popped directly after the nested call). So in that case, assuming there are a number of multi-value calls in a stack frame, we'll be shuffling in the optimizing compiler as well. Not great.

Besides the need to shuffle, stack arguments and stack results differ as regards ownership and garbage collection. A callee "owns" the memory for its stack arguments; it is responsible for them. The caller can't assume anything about the contents of that memory after a call, especially if the WebAssembly implementation supports tail calls (a whole 'nother blog post, that). If the values being passed are just bits, that's one thing, but with the reference types proposal, some result values may be managed by the garbage collector. The callee is responsible for making stack arguments visible to the garbage collector; the caller is responsible for the results. The caller will need to emit metadata to allow the garbage collector to see stack result references. For this reason, a stack result actually starts life just before a call, because it can become initialized at any point and thus needs to be traced during the entire callee activation. Not all callers can easily add garbage collection roots for writable stack slots, so the need to place stack results in a fixed position complicates calling multi-value WebAssembly functions in some cases (e.g. from C++).

second thought: pointers to individual stack results

Surely there are more well-trodden solutions to the multiple-result problem. If we encoded a multi-value return in C, how would we do it? Consider a function in C that has three 64-bit integer results. The idiomatic way to encode it would be to have one of the results be the return value of the function, and the two others to be passed "by reference":

#include <stdint.h>

/* Two of the three results are returned "by reference". */
int64_t foo(int64_t* a, int64_t* b) {
  *a = 1;
  *b = 2;
  return 3;
}
void call_foo(void) {
  int64_t a, b, c;
  c = foo(&a, &b);
}

This program shows us a possibility for encoding WebAssembly's multiple return values: pass an additional argument for each stack result, pointing to the location to which to write the stack result. Like this:

The result pointers are normal arguments, subject to normal argument allocation. In the above example, given that there are already stack arguments, they will probably be passed on the stack, but in many cases the stack result pointers may be passed in registers.

The result locations themselves don't even need to be on the stack, though they certainly will be in intra-WebAssembly calls. However the ability to write to any memory is a useful form of flexibility when e.g. calling into WebAssembly from C++.

The advantage of this approach is that we eliminate post-call shuffles, at least in optimizing compilers. But, having to make an argument for each stack result, each of which might itself become a stack argument, seems a bit offensive. I thought we might be able to do a little better.

third thought: stack result area, passed as pointer

Given that stack results are going to be written to memory, it doesn't really matter where they will be written, from the perspective of the optimizing compiler at least. What if we allocated them all in a block and just passed one pointer to the block? Like this:

Here there's just one additional argument, no matter how many stack results. While we're at it, we can specify that the layout of the stack results should be the same as how they would be written to the baseline stack, to make the baseline compiler's job easier.

As I started implementation with the baseline compiler, I chose this third approach, essentially because I was already allocating space for the results in a block in this way by bumping the stack pointer.

When I got to the optimizing compiler, however, it was quite difficult to convince Ion to allocate an area on the stack of the right shape.

Looking back on it now, I am not sure that I made the right choice. The thing is, the IonMonkey compiler started life as an optimizing compiler for JavaScript. It can represent unboxed values, which is how it came to be used as a compiler for asm.js and later WebAssembly, and it does a good job on them. However it has never had to represent aggregate data structures like a C++ class, so it didn't have support for spilling arbitrary-sized data to the stack. It took a while staring at the register allocator to convince it to allocate arbitrary-sized stack regions, and then to allocate component scalar values out of those regions. If I had just asked the register allocator to give me one appropriate-sized stack slot for each scalar, and hacked out the ability to pass separate pointers to the stack slots to WebAssembly calls with stack results, then I would have had an easier time of it, and perhaps stack slot allocation could be more dense because multiple results wouldn't need to be allocated contiguously.

As it is, I did manage to hack it in, and I think in a way that doesn't regress. I added a layer over an argument type vector that adds a synthetic stack results pointer argument, if the function returns stack results; iterating over this type with ABIArgIter will allocate a stack result area pointer, either as a register argument or a stack argument. In the optimizing compiler, I added a kind of value allocation corresponding to a variable-sized stack area (using pointer tagging again!), and extended the register allocator to allocate the LStackArea and the component stack results. Interestingly, I had to add a kind of definition that starts life on the stack; previously all Ion results started life in registers and were only spilled if needed.

In the end, a function will capture the incoming stack result area argument, either as a normal SSA value (for Ion) or stored to a stack slot (baseline), and when returning will write stack results to that pointer as appropriate. Passing in a pointer as an argument did make it relatively easy to implement calls between WebAssembly and C++; getting the variable-shape result area to be known to the garbage collector for C++-to-WebAssembly calls was simple in the end, but it took me a while to figure out.

Finally I was a bit exhausted from multi-value work and ready to walk away from the "JS API", the bit that allows multi-value WebAssembly functions to be called from JavaScript (they return an array) or for a JavaScript function to return multiple values to WebAssembly (via an iterable) -- but then when I got to thinking about this blog post I preferred to implement the feature rather than document its lack. Avoidance-of-document-driven development: it's a thing!

towards deployment

As I said in the last article, the multi-value feature is about improved code generation and also making a more capable base for expressing further developments in the WebAssembly language.

As far as code generation goes, things are progressing but it is still early days. Thomas Lively has implemented support in LLVM for emitting return of C++ aggregates via multiple results, which is enabled via the -experimental-multivalue-abi cc1 flag. Thomas has also been implementing multi-value support in the binaryen WebAssembly toolchain component, used by the emscripten C++-to-WebAssembly toolchain. I think it will be a few months though before everything lands in a way that end users can take advantage of.

On the specification side, the multi-value feature has been at phase 4 since January, which basically means things are all done there.

Implementation-wise, V8 has had experimental support since 2017 or so, and the feature was staged last fall, although V8 doesn't yet support multi-value in their baseline compiler. WebKit also landed support last fall.

Unlike V8 and SpiderMonkey, JavaScriptCore (the JS and wasm engine in WebKit) actually implements a WebAssembly interpreter as its solution to the one-pass streaming compilation problem. Then on the compiler side, there are two tiers that both operate on basic block graphs (OMG and BBQ; I just puked a little in my mouth typing that). This strategy makes the compiler implementation quite straightforward. It's also an interesting design point because JavaScriptCore's garbage collector scans the stack conservatively; there's no need for the compiler to do bookkeeping on the GC's behalf, which I'm sure was a relief to the hacker. Anyway, multi-value in WebKit is done too.

The new thing of course is that finally, in Firefox, the feature is now fully implemented (woo) and enabled by default on Nightly builds (woo!). I did that! It took me a while! Perhaps too long? Anyway it's done. Thanks again to Bloomberg for supporting this work; large ups to y'all for helping the web move forward.

See you next time with a more general article rounding up compile-time benchmarks on a variety of WebAssembly implementations. Until then, happy hacking!

by Andy Wingo at April 08, 2020 09:02 AM

April 03, 2020

Andy Wingo: multi-value webassembly in firefox: from 1 to n

(Andy Wingo)

Greetings, hackers! Today I'd like to write about something I worked on recently: implementation of the multi-value feature of WebAssembly in Firefox, as sponsored by Bloomberg.

In the "minimum viable product" version of WebAssembly published in 2018, there were a few artificial restrictions placed on the language. Functions could only return a single value; if a function would naturally return two values, it would have to return at least one of them by writing to memory. Loops couldn't take parameters; any loop state variables had to be stored to and loaded from indexed local variables at each iteration. Similarly, any block that would naturally return more than one result would also have to do so via locals.

This restriction is lifted with the multi-value proposal. Function types now map from result type to result type, where a result type is a sequence of value types. That is to say, just as functions can take multiple arguments, they can return multiple results. Similarly, with the multi-value proposal, block types are now the same as function types: loops and blocks can take arguments and return any number of results. This change improves the expressiveness of WebAssembly as a compilation target; a C++ program compiled to multi-value WebAssembly can be encoded in fewer bytes than before. Multi-value also establishes a base for other language extensions. For example, the exception handling proposal builds on multi-value to pass multiple values to catch blocks.
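To make the difference concrete, here is a loose Python analogy (not WebAssembly syntax): under the MVP restriction the second result has to travel through memory, while with multi-value both results are simply returned.

    # MVP-style: only one direct result, so the remainder is written
    # through an out-parameter standing in for linear memory.
    def divmod_mvp(n, d, memory, addr):
        memory[addr] = n % d    # second result goes via "memory"
        return n // d

    # Multi-value style: both results are returned directly.
    def divmod_mv(n, d):
        return n // d, n % d

    memory = [0] * 16
    q = divmod_mvp(17, 5, memory, 0)
    print(q, memory[0])      # 3 2
    print(divmod_mv(17, 5))  # (3, 2)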

So, that's multi-value. You would think that relaxing a restriction would be easy, but you'd be wrong! This task took me 5 months and had a number of interesting gnarly bits. This article is part one of two about interesting aspects of implementing multi-value in Firefox, specifically focussing on blocks. We'll talk about multi-value function calls next week.

multi-value in blocks

In the last article, I presented the basic structure of Firefox's WebAssembly support: there is a baseline compiler optimized for low latency and an optimizing compiler optimized for throughput. (There is also Cranelift, a new experimental compiler that may replace the current implementation of the optimizing compiler; but that doesn't affect the basic structure.)

The optimizing compiler applies traditional compiler techniques: SSA graph construction, where values flow into and out of graphs using the usual defs-dominate-uses relationship. The only control-flow joins are loop entry and (possibly) block exit, so the addition of loop parameters means that with multi-value there are some new phi variables at loop entry, and the expansion of block result count from [0,1] to [0,n] means that you may have more block exit phi variables. But these compilers are built to handle such situations; you just build the SSA and let the optimizing compiler go to town.

The problem comes in the baseline compiler.

from 1 to n

Recall that the baseline compiler is optimized for compilation speed, not the speed of the compiled code. If there are only ever going to be 0 or 1 results from a block, for example, the baseline compiler's internal data structures will use something like a Maybe<ValType> to represent that block result.

If you then need to expand this to hold a vector of values, the naïve approach of using a Vector<ValType> would mean heap allocation and indirection, and thus would regress the baseline compiler.

In this case, and in many other similar cases, the solution is to use value tagging: represent 0 or 1 value types directly in a word, and handle the general case by linking out to an external vector. As block types are function types, they actually appear as function types in the WebAssembly type section, so they are already parsed; the BlockType in that case can just refer out to already-allocated memory.
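Here is a toy Python model of that pattern (the real code does this with pointer tagging in a single machine word, in C++, and the class and method names below are made up): zero or one result is stored inline, and the general case is a reference to an already-parsed function type.

    # Toy model of a BlockType-like value: common cases stored inline,
    # general case pointing at an existing type from the type section.

    class FuncType:
        def __init__(self, params, results):
            self.params = params
            self.results = results

    class BlockType:
        def __init__(self, tag, payload=None):
            self.tag = tag          # "void", "single", or "func"
            self.payload = payload

        @classmethod
        def void(cls):
            return cls("void")

        @classmethod
        def single(cls, valtype):
            return cls("single", valtype)   # one result type, stored inline

        @classmethod
        def func(cls, functype):
            return cls("func", functype)    # reference to already-parsed type

        def results(self):
            if self.tag == "void":
                return []
            if self.tag == "single":
                return [self.payload]
            return self.payload.results      # the general, multi-value case

    print(BlockType.single("i32").results())                            # ['i32']
    print(BlockType.func(FuncType(["i32"], ["i32", "i64"])).results())  # ['i32', 'i64']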

In fact this value-tagging pattern applies all over the place. (The jit/ links above are for the optimizing compiler, but they relate to function calls; I will write about that next week.) Value tagging gives me a bit of pause, in that it's gnarly complexity and I didn't measure the speed of alternative implementations, but it was a useful migration strategy: value tagging minimizes performance risk to existing specialized use cases while adding support for new general cases. Gnarly it is, then.

control-flow joins

I didn't mention it in the last article, but there are two important invariants regarding stack discipline in the baseline compiler. Recall that there's a virtual stack, and that some elements of the virtual stack might be present on the machine stack. There are four kinds of virtual stack entry: register, constant, local, and spilled. Locals indicate local variable reads and are mostly like registers in practice; when registers spill to the stack, locals do too. (Why spill to the temporary stack instead of leaving the value in the local variable slot? Because locals are mutable. A local.get captures a local variable value at its point of execution. If future code changes the local variable value, you wouldn't want the captured value to change.)

Digressing, the stack invariants:

  1. Spilled values precede registers and locals on the virtual stack. If u and v are virtual stack entries and u is older than v, then if u is in a register or is a local, then v is not spilled.

  2. Older values precede newer values on the machine stack. Again for u and v, if they are both spilled, then u will be farther from the stack pointer than v.

There are five fundamental stack operations in the baseline compiler; let's examine them to see how the invariants are guaranteed. Recall that before multi-value, targets of non-local exits (e.g. of the br instruction) could only receive 0 or 1 value; if there is a value, it's passed in a well-known register (e.g. %rax or %xmm0). (On 32-bit machines, 64-bit values use a well-known pair of registers.)

push(v)
Results of WebAssembly operations never push spilled values onto either the virtual or the machine stack. v is either a register, a constant, or a reference to a local. Thus we guarantee both (1) and (2).
pop() -> v
Doesn't affect older stack entries, so (1) is preserved. If the newest stack entry is spilled, you know that it is closest to the stack pointer, so you can pop it by first loading it to a register and then incrementing the stack pointer; this preserves (2). Therefore if it is later pushed on the stack again, it will not be as a spilled value, preserving (1).
spill()
When spilling the virtual stack to the machine stack, you first traverse stack entries from new to old to see how far you need to spill. Once you get to a virtual stack entry that's already on the stack, you know that everything older has already been spilled, because of (1), so you switch to iterating back towards the new end of the stack, pushing registers and locals onto the machine stack and updating their virtual stack entries to be spilled along the way. This iteration order preserves (2). Note that because known constants never need to be on the machine stack, they can be interspersed with any other value on the virtual stack.
return(height, v)
This is the stack operation corresponding to a block exit (local or nonlocal). We drop items from the virtual and machine stack until the stack height is height. In WebAssembly 1.0, if the target continuation takes a value, then the jump passes a value also; in that case, before popping the stack, v is placed in a well-known register appropriate to the value type. Note however that v is not pushed on the virtual stack at the return point. Popping the virtual stack preserves (1), because a stack and its prefix have the same invariants; popping the machine stack also preserves (2).
capture(t)
Whereas return operations happen at block exits, capture operations happen at the target of block exits (the continuation). If no value is passed to the continuation, a capture is a no-op. If a value is passed, it's in a register, so we just push that register onto the virtual stack. Both invariants are obviously preserved.

Note that a value passed to a continuation via return() has a brief instant in which it has no name -- it's not on the virtual stack -- but only a location -- it's in a well-known place. capture() then gives that floating value a name.

Relatedly, there is another invariant, that the allocation of old values on block entry is the same as their allocation on block exit, so that all predecessors of the block exit flow all values via the same places. This is preserved by spilling on block entry. It's a big hammer, but effective.

So, given all this, how do we pass multiple values via return()? We don't have unlimited registers, so the %rax strategy isn't going to work.

The answer for the baseline compiler is informed by our lean into the stack machine principle. Multi-value returns are allocated in such a way that a capture() can push them onto the virtual stack. Because spilled values must precede registers and locals, we allocate older results on the stack and put the last result in a register (or register pair for i64 on 32-bit platforms). Note that it's possible in theory to allocate multiple results to registers; we'll touch on this next week.

Therefore the implementation of return(height, v1..vn) is straightforward: we first pop register results, then spill the remaining virtual stack items, then shuffle stack results down towards height. This should result in a memmove of contiguous stack results towards the frame pointer. However, because const values aren't present on the machine stack, depending on the stack height difference it may mean a split between moving some values toward the frame pointer and some towards the stack pointer, then filling in by spilling constants. It's gnarly, but it is what it is. Note that the links to the return and capture implementations above are to the post-multi-value world, so you can see all the details there.
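To tie the operations together, here is a small Python model of the virtual stack (heavily simplified, a sketch only, not the real C++): registers and constants live on the virtual stack, spill() pushes them onto the machine stack from oldest to newest, and a multi-value return leaves the last result in a register while older results end up spilled, so that the subsequent capture() can treat them as already-pushed stack values.

    # Toy model of the baseline compiler's virtual stack discipline.
    # Entries are ("reg", v), ("const", v) or ("spilled", index); the
    # "machine" stack is a plain Python list growing at the end.

    class VirtualStack:
        def __init__(self):
            self.vstack = []
            self.mstack = []

        def push_reg(self, v):
            self.vstack.append(("reg", v))

        def push_const(self, v):
            self.vstack.append(("const", v))

        def spill(self):
            # Walk from newest to oldest until something already spilled
            # (invariant 1: everything older is spilled too), then spill
            # forward from there, preserving invariant 2.
            i = len(self.vstack)
            while i > 0 and self.vstack[i - 1][0] != "spilled":
                i -= 1
            for j in range(i, len(self.vstack)):
                kind, v = self.vstack[j]
                if kind == "reg":                  # constants stay virtual
                    self.mstack.append(v)
                    self.vstack[j] = ("spilled", len(self.mstack) - 1)

        def pop(self):
            kind, v = self.vstack.pop()
            if kind == "spilled":
                v = self.mstack.pop()              # newest spill is on top
            return v

        def multi_return_and_capture(self, height, results):
            # Sketch of return(height, v1..vn) followed by capture():
            # older results go to the machine stack, the last result
            # travels in a "register" and is pushed by the capture.
            *stack_results, reg_result = results
            while len(self.vstack) > height:
                self.pop()
            for v in stack_results:
                self.mstack.append(v)
                self.vstack.append(("spilled", len(self.mstack) - 1))
            self.push_reg(reg_result)

    vs = VirtualStack()
    vs.push_reg(1); vs.push_const(2); vs.push_reg(3)
    vs.spill()
    vs.multi_return_and_capture(height=1, results=[10, 20, 30])
    print([vs.pop() for _ in range(3)])   # [30, 20, 10]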

that's it!

In summary, the hard part of multi-value blocks was reworking internal compiler data structures to be able to represent multi-value block types, and then figuring out the low-level stack manipulations in the baseline compiler. The optimizing compiler on the other hand was pretty easy.

When it comes to calls though, that's another story. We'll get to that one next week. Thanks again to Bloomberg for supporting this work; I'm really delighted that Igalia and Bloomberg have been working together for a long time (coming on 10 years now!) to push the web platform forward. A special thanks also to Mozilla's Lars Hansen for his patience reviewing these patches. Until next week, then, stay at home & happy hacking!

by Andy Wingo at April 03, 2020 10:56 AM

April 01, 2020

Bastien Nocera: PAM testing using pam_wrapper and dbusmock

(Bastien Nocera) On the road to libfprint and fprintd 2.0, we've been fixing some long-standing bugs, including one that required porting our PAM module from dbus-glib to sd-bus, systemd's D-Bus library implementation.

As you can imagine, I have confidence in my ability to write bug-free code at the first attempt, but the foresight to know that this code will be buggy if it's not tested (and to know there's probably a bug in the tests if they run successfully the first time around). So we will have to test that PAM module, thoroughly, before and after the port.

Replacing fprintd

First, to make it easier to run and instrument, we needed to replace fprintd itself. For this, we used dbusmock, which is both a convenience Python library and a way to write instrumentable D-Bus services, and wrote a template. There are a number of existing templates for a lot of session and system services, in case you want to test the integration of your code with NetworkManager, low-memory-monitor, or any number of other services.

We then used this to write tests for the command-line utilities, so we can both test our new template and test the command-line utilities themselves.
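A minimal sketch of what such a test can look like with python-dbusmock follows; note that the 'fprintd' template name and the exact behaviour of fprintd-list against the mock are assumptions here.

    # Sketch of a python-dbusmock based test for the fprintd utilities.
    import subprocess
    import unittest

    import dbusmock


    class TestFprintdUtils(dbusmock.DBusTestCase):
        @classmethod
        def setUpClass(cls):
            # The mocked fprintd lives on a private system bus for the test.
            cls.start_system_bus()
            cls.dbus_con = cls.get_dbus(system_bus=True)

        def setUp(self):
            # 'fprintd' is assumed to be the name of the template written
            # for this work; parameters would configure mock devices.
            self.p_mock, self.obj_fprintd = self.spawn_server_template(
                'fprintd', {}, stdout=subprocess.PIPE)

        def tearDown(self):
            self.p_mock.terminate()
            self.p_mock.wait()

        def test_list(self):
            out = subprocess.run(['fprintd-list', 'toto'],
                                 capture_output=True, text=True)
            # The exact output depends on the template and the tool; a real
            # test would assert on the expected message here.
            self.assertIsNotNone(out.stdout)


    if __name__ == '__main__':
        unittest.main()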

Replacing gdm

Now that we've got a way to replace fprintd and a physical fingerprint reader, we should write some tests for the (old) PAM module; for that, we also need to replace the PAM applications that would normally drive it: sudo, gdm, or the login authentication services.

Co-workers Andreas Schneider and Jakub Hrozek worked on pam_wrapper, an LD_PRELOAD library to mock the PAM library, and Python helpers to write simple PAM services. This LWN article explains how to test PAM applications and PAM modules.

After fixing a few bugs in pam_wrapper, and combining it with the fprintd dbusmock work above, we could wrap and test the fprintd PAM module like never before.
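Roughly, such a test looks like the sketch below; the service name, module path, and the idea that no conversation input is needed are assumptions, and the test process has to be started with pam_wrapper preloaded.

    # Sketch of a PAM-module test using pam_wrapper's Python bindings.
    # Run under the wrapper, for example:
    #   LD_PRELOAD=libpam_wrapper.so PAM_WRAPPER=1 \
    #   PAM_WRAPPER_SERVICE_DIR=$PWD/services python3 test_pam_fprintd.py
    # with services/fprintd-test containing a line such as:
    #   auth required /path/to/pam_fprintd.so
    # (service name, paths and prompts here are illustrative assumptions)

    import unittest

    import pypamtest


    class TestPamFprintd(unittest.TestCase):
        def test_fingerprint_auth(self):
            # One PAM transaction: just the authenticate step.
            tc = pypamtest.TestCase(pypamtest.PAMTEST_AUTHENTICATE)
            # The mocked fprintd (see the dbusmock template above) decides
            # whether the "swipe" succeeds; no conversation answers are
            # passed in this sketch.
            res = pypamtest.run_pamtest('toto', 'fprintd-test', [tc], [])
            self.assertIsNotNone(res)


    if __name__ == '__main__':
        unittest.main()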

Porting to sd-bus

Finally, porting the PAM module to sd-bus was pretty trivial: a loop of 1) writing tests that work against the old PAM module, 2) porting a section of the code (like the fingerprint reader enumeration, or the timeout support), and 3) testing against the new sd-bus based code. The result was no regressions that our tests could detect.

Conclusion

Both dbusmock and pam_wrapper are useful tools in your arsenal for writing tests, and given the (fairly) easy-to-use CI in GNOME's and FreeDesktop.org's GitLab instances, it would be a shame not to.

You might also be interested in umockdev, to mock a number of device types, and mocklibc (which, combined with dbusmock, powers polkit's unattended CI).

by Bastien Nocera (noreply@blogger.com) at April 01, 2020 05:53 PM