January 22, 2022

Thomas Vander Stichele: Quick way to process an Inbox folder in Obsidian

(Thomas Vander Stichele)

Obsidian’s Gems of the Year 2021 nomination has been a great source of cool ideas to add tweaks to my Obsidian setup.

In particular, Quick Capture (mac/iOS) and Inbox Processing was a great gem to uncover as I try and implement the weekly review stage of my Second Brain/PARA setup!

I noticed that the archive/move script was a little slow, taking several seconds to open up the dialog for selecting a folder, breaking my flow. I checked the code and noticed it built a set of folders recursively.

I simplified the code for my use case, removing the archive folder path and using the file explorer’s built-in move dialog (which is much faster), plus a callback to advance to the next file.

The resulting gist is “Obsidian: Archive current file and then open next file in folder (Templater script)” on GitHub.

I’m sure it could be improved further if I understood the execution, variable scope, and callback model better, but this is good enough for me!

I get very little coding time these days, and I hate working in an environment I haven’t had a chance to really master yet. It’s all trial and error, editing a JavaScript file in a markdown editor with no syntax highlighting. But it’s still a nice feeling when you can go in and out of a code base in a few hours and scratch the itch you had.


by Thomas at January 22, 2022 10:11 PM

January 21, 2022

Jean-François Fortin Tam: The post-2020 Linux server landscape metamorphosis

This is “part one” of a three-part blog post on the challenges of keeping up with the “software updates treadmill” in the land of Linux. The next two parts are going to be about the Linux desktop. This first part focuses on the server side and will require about 5 minutes to read.

Shadow of the Colossus photo: One does not easily disturb a slow-moving, 160-ton colossus…

It used to be that you could leisurely deploy a L.A.M.P. server, and stop caring about it for years because PHP’s releases, and the dependency changes in web applications, were happening really slowly. Not so anymore. With the 7.x and 8.x series, PHP has considerably sped up its releasing cadence, and shortened the shelf life of releases. I’ve seen a drastic shift happen in the policies of web application developers, including Matomo (née Piwik) and Kanboard. Even WordPress, one of the most conservative behemoths of the industry (understandable, given that they power roughly half of the websites in the world), requires PHP 7.4 and no longer runs on PHP 5.x.

“Just put everything in containers and continuous-deploy all that shit!” I hear you say, “It’s the future!” But I’m not a sysadmin, I’m not day-in-day-out into that crap, and the only reason I run a dedicated server machine in the office is because Matomo doesn’t scale well on shared hosting and their SaaS pricing is quite expensive for an individual when you don’t like being artificially capped to a certain number of visitors per month, and, y’know, “How hard can it be, really?”… but I am happiest when I never have to touch/upgrade that server and don’t have to learn rocket science to deploy something. I understand now how infrastructure work would eventually turn you into a Bastard Operator from Hell™.

Circa 2014, I deployed CentOS 7 on my personal server to be able to run Matomo with better performance, because the Pitivi website had a lot of visitors (which is useful to derive knowledge such as “what screen resolutions do people actually use and what can we afford for our UI’s design?”) and its Matomo database weighed multiple gigabytes.

Fast forward a couple of years, and I had fallen behind on Matomo updates, in part because newer PHP requirements meant resorting to third-party repositories to get a recent-enough version of PHP to run it. But I eventually did, and it worked, for a time.

Then in late 2019, CentOS 8 came out, so I switched to that, but you can’t just upgrade from one major CentOS version to another, you have to clean install, so it had to wait until early 2020 when I mustered enough courage and enrolled a friendly geeky neighbor to help me out with that yak shave. “Good, I’ll be set for another half-decade now”, I thought.

Then, at the end of 2020, CentOS 8’s support cycle got shortened with an EoL date set to the end of 2021, and the public was told to migrate to RHEL or CentOS Stream instead, and there was much discontent on the Interwebs about that, to say the least. But fair enough, I migrated to CentOS Stream, because I’m not running a mission-critical server powering the stock exchange; CentOS Stream was the easiest path forward, a couple commands and you’re done. Cool. I can live with that.

My respite was short lived however, and my stress levels are rising again as—since the announcement of CentOS Stream 9—I am realizing there may be no upgrade path planned for the now short-cycle CentOS. Or at least, the lack of such documentation in the announcement, the worrisome comments there (ctrl+F “upgrade”), the fact that my request for clarification remained unanswered, and a redhatter’s comment in this Reddit thread saying that there is no upgrade path (and none is planned), are all factors that do not inspire much confidence in me.

I love Red Hat (and the majority of my FLOSS friends work there), have tremendous admiration for their work, and wish them the best. While the creation of CentOS Stream may be a great move from a development & maintenance process standpoint—I can see the appeal, really—the way this situation was handled is not exactly the way I would have handled it from a PR & marketing standpoint (Red Hat got a ton of flak and eroded some of its trust in the process), and finding out that there is no upgrade path between the now short-lifespan CentOS (Stream) releases, is the final nail in the coffin when it comes to people like me being able to casually use this platform. I know I’m not the only one in this situation.

I would have preferred to remain on CentOS for my web infrastructure needs, but this may very reluctantly force me to switch to Debian Stable in a couple of months (when I actually get the time and courage to do so), because I can’t deal with this kind of tedious work a third time in such short timespans. And frankly, at this point, the prospect of going to Rocky Linux does not excite me particularly; I might encounter similar worries about the platform’s future or struggles when it comes to upgrades, so I’m more inclined to hedge my bets with Debian, because it’s the one constant in the Linux landscape: Debian was there in the beginning in 1993, and it can’t be killed—it’s reasonable to say now that it will always be there until the end of times, free of ownership influence. And at least the damned thing can be routinely upgraded from one release to the next. Life is too short to be spent clean-installing servers.

It remains to be seen whether Red Hat’s community enterprise OS strategy will be a net benefit beyond engineering considerations, and I realize I am not the target customer when it comes to enterprise scenarios, nor am I owed anything for free. Yet the price Red Hat has paid in potential brand damage, burning a lot of goodwill in exchange for increased development process efficiency and potentially a small sales uptick, in the resonating words of Dormin, “may be heavy indeed.”

by Jeff at January 21, 2022 09:45 PM

January 16, 2022

GStreamer: GStreamer Rust bindings 0.18.0 release

(GStreamer)

A new version of the GStreamer Rust bindings, 0.18.0, was released. Together with the bindings, a new version of the GStreamer Rust plugins was also released.

As usual this release follows the latest gtk-rs 0.15 release and the corresponding API changes.

This release includes optional support for the latest new GStreamer 1.20 APIs. As GStreamer 1.20 has not been released yet, these new APIs might still change. The minimum GStreamer version supported by the bindings is still 1.8, and the targeted GStreamer API version can be selected by applications via feature flags.

Apart from this, the new version features a lot of API cleanup and improvements, and the addition of a few missing bindings. As usual, the focus of this release was to make usage of GStreamer from Rust as convenient and complete as possible.

The new release also brings a lot of bugfixes, most of which were already part of the 0.17.x bugfix releases.

Details can be found in the release notes for gstreamer-rs.

The code and documentation for the bindings is available on the freedesktop.org GitLab, as well as on crates.io.

If you find any bugs, notice any missing features or other issues please report them in GitLab.

January 16, 2022 11:00 AM

January 15, 2022

Jan Schmidt: Pulling on a thread

(Jan Schmidt)

I’m attending the https://linux.conf.au/ conference online this weekend, which is always a good opportunity for some sideline hacking.

I found something boneheaded doing that today.

There have been a few times while inventing the OpenHMD Rift driver where I’ve noticed something strange and followed the thread until it made sense. Sometimes that leads to improvements in the driver, sometimes not.

In this case, I wanted to generate a graph of how long the computer vision processing takes – from the moment each camera frame is captured until poses are generated for each device.

To do that, I have some logging branches that output JSON events to log files, and I write scripts to process those. I used that data and produced:

Pose recognition latency.
dt = interpose spacing, delay = frame to pose latency

Two things caught my eye in this graph. The first is the way the baseline latency (pink lines) increases from ~20ms to ~58ms. The 2nd is the quantisation effect, where pose latencies are clearly moving in discrete steps.

Neither of those should be happening.

Camera frames are being captured from the CV1 sensors every 19.2ms, and it takes 17-18ms for them to be delivered across the USB. Depending on how many IR sources the cameras can see, figuring out the device poses can take a different amount of time, but the baseline should always hover around 17-18ms because the fast “device tracking locked” case takes as little as 1ms.

Did you see me mention 19.2ms as the interframe period? Guess what the spacing of those quantisation levels is in the graph? I recognised it as implying that something in the processing is tied to frame timing when it should not be.

OpenHMD Rift CV1 tracking timing

This 2nd graph helped me pinpoint what exactly was going on. It is cut from the part of the session where the latency had jumped up. What it shows is a ~1 frame delay between when the frame is received (frame-arrival-finish-local-ts) and when the initial analysis even starts!

That could imply that the analysis thread is just busy processing the previous frame and doesn’t get to start working on the new one yet – but the graph says that fast analysis is typically done in 1-10ms at most. It should rarely be busy when the next frame arrives.

This is where I found the boneheaded code – a rookie mistake I made when putting the image analysis threads in place early in the driver development, and never noticed.

There are 3 threads involved:

  • USB service thread, reading video frame packets and assembling pixels in framebuffers
  • Fast analysis thread, that checks tracking lock is still acquired
  • Long analysis thread, which does brute-force pose searching to reacquire / match unknown IR sources to device LEDs

These 3 threads communicate using frame worker queues passing frames between each other. Each analysis thread does this pseudocode:

while driver_running:
    Pop a frame from the queue
    Process the frame
    Sleep for new frame notification

The problem is in the 3rd line. If the driver is ever still processing the frame in line 2 when a new frame arrives – say because the computer got really busy – the thread sleeps anyway and won’t wake up until the next frame arrives. At that point, there’ll be 2 frames in the queue, but it still only processes one – so the analysis gains a 1 frame latency from that point on. If it happens a second time, it gets later by another frame! Any further and it starts reclaiming frames from the queues to keep the video capture thread fed – but it only reclaims one frame at a time, so the latency remains!

The fix is simple:

while driver_running:
   Pop a frame
   Process the frame
   if queue_is_empty():
     sleep for new frame notification
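
For illustration only, here is roughly the same pattern rendered as a small Python sketch using a condition variable (the real driver is C, and all names here are made up):

import threading
from collections import deque

frame_queue = deque()
cond = threading.Condition()
driver_running = True

def push_frame(frame):
    # Called by the USB service thread once a frame's pixels are assembled.
    with cond:
        frame_queue.append(frame)
        cond.notify()

def analysis_loop(process):
    while driver_running:
        with cond:
            # Only sleep when there is genuinely nothing left to process;
            # otherwise keep draining the backlog so latency can't build up.
            while not frame_queue and driver_running:
                cond.wait()
            if not driver_running:
                break
            frame = frame_queue.popleft()
        process(frame)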

Doing that for both the fast and long analysis threads changed the profile of the pose latency graph completely.

Pose latency and inter-pose spacing after fix

This is a massive win! To be clear, this has been causing problems in the driver for at least 18 months but was never obvious from the logs alone. A single good graph is worth a thousand logs.

What does this mean in practice?

The way the fusion filter I’ve built works, in between pose updates from the cameras, the position and orientation of each device are predicted / updated using the accelerometer and gyro readings. Particularly for position, using the IMU for prediction drifts fairly quickly. The longer the driver spends ‘coasting’ on the IMU, the less accurate the position tracking is. So, the sooner the driver can get a correction from the camera to the fusion filter the less drift we’ll get – especially under fast motion. Particularly for the hand controllers that get waved around.

Before: Left Controller pose delays by sensor
After: Left Controller pose delays by sensor

Poses are now being updated up to 40ms earlier and the baseline is consistent with the USB transfer delay.

You can also visibly see the effect of the JPEG decoding support I added over Christmas. The ‘red’ camera is directly connected to USB3, while the ‘khaki’ camera is feeding JPEG frames over USB2 that then need to be decoded, adding a few ms delay.

The latency reduction is nicely visible in the pose graphs, where the ‘drop shadow’ effect of pose updates tailing fusion predictions largely disappears and there are fewer large gaps in the pose observations when long analysis happens (visible as straight lines jumping from point to point in the trace):

Before: Left Controller poses
After: Left Controller poses

by thaytan at January 15, 2022 09:10 AM

January 09, 2022

Sebastian Pölsterl: scikit-survival 0.17 released

This release adds support for scikit-learn 1.0, which includes support for feature names. If you pass a pandas dataframe to fit, the estimator will set a feature_names_in_ attribute containing the feature names. When a dataframe is passed to predict, it is checked that the column names are consistent with those passed to fit. The example below illustrates this feature.

For a full list of changes in scikit-survival 0.17.0, please see the release notes.

Installation

Pre-built conda packages are available for Linux, macOS, and Windows via

 conda install -c sebp scikit-survival

Alternatively, scikit-survival can be installed from source following these instructions.

Feature Names Support

Prior to scikit-survival 0.17, you could pass a pandas dataframe to estimators’ fit and predict methods, but the estimator was oblivious to the feature names accessible via the dataframe’s columns attribute. With scikit-survival 0.17, and thanks to scikit-learn 1.0, feature names will be considered when a dataframe is passed.

Let’s illustrate feature names support using the Veteran’s Lung Cancer dataset.

from sksurv.datasets import load_veterans_lung_cancer
X, y = load_veterans_lung_cancer()
X.head(3)
Age_in_years Celltype Karnofsky_score Months_from_Diagnosis Prior_therapy Treatment
0 69.0 squamous 60.0 7.0 no standard
1 64.0 squamous 70.0 5.0 yes standard
2 38.0 squamous 60.0 3.0 no standard

The original data has 6 features, three of which contain strings, which we encode as numeric using OneHotEncoder.

from sksurv.preprocessing import OneHotEncoder
transform = OneHotEncoder()
Xt = transform.fit_transform(X)

Transforms now have a get_feature_names_out() method, which will return the name of features after the transformation.

transform.get_feature_names_out()
array(['Age_in_years', 'Celltype=large', 'Celltype=smallcell',
'Celltype=squamous', 'Karnofsky_score', 'Months_from_Diagnosis',
'Prior_therapy=yes', 'Treatment=test'], dtype=object)

The transformed data returned by OneHotEncoder is again a dataframe, which can be used to fit Cox’s proportional hazards model.

from sksurv.linear_model import CoxPHSurvivalAnalysis
model = CoxPHSurvivalAnalysis().fit(Xt, y)

Since we passed a dataframe, the feature_names_in_ attribute will contain the names of the dataframe used when calling fit.

model.feature_names_in_
array(['Age_in_years', 'Celltype=large', 'Celltype=smallcell',
'Celltype=squamous', 'Karnofsky_score', 'Months_from_Diagnosis',
'Prior_therapy=yes', 'Treatment=test'], dtype=object)

This is used during prediction to check that the data matches the format of the training data. For instance, when passing a raw numpy array instead of a dataframe, a warning will be issued.

pred = model.predict(Xt.values)
UserWarning: X does not have valid feature names, but CoxPHSurvivalAnalysis was fitted with feature names

Moreover, it will also check that the order of columns matches.

import pandas as pd

X_reordered = pd.concat(
    (Xt.drop("Age_in_years", axis=1), Xt.loc[:, "Age_in_years"]),
    axis=1,
)
pred = model.predict(X_reordered)
FutureWarning: The feature names should match those that were passed during fit. Starting version 1.2, an error will be raised.
Feature names must be in the same order as they were in fit.

For more details on feature names support, have a look at the scikit-learn release highlights.

January 09, 2022 03:45 PM

December 17, 2021

Seungha Yang: GStreamer ❤ Windows: A primer on the cool stuff you’ll find in the 1.20 release

The GStreamer community keeps focusing its efforts on improving Windows support and is still adding various super fascinating features for Windows. GStreamer is about to put out a new stable release (1.20) very soon, so you may want to know what’s new on the Windows front 😊

Of course there are not only new features, but also bug fixes and enhancements since the previous 1.18 stable release series. I guarantee you will find the Windows-specific elements and features in GStreamer 1.20 to be much more stable and optimized.

What’s new?

  • A new desktop capture element, named d3d11screencapturesrc, including a fancy GstDeviceProvider implementation to enumerate/select target monitors for capture
  • Direct3D11/DXVA decoder supports AV1 and MPEG2 codecs
  • VP9 decoding got more reliable and stable thanks to a newly written codec parser
  • Support for decoding interlaced H.264/AVC streams
  • Hardware-accelerated video deinterlacing
  • Video mixing with the Direct3D11 API
  • MediaFoundation API based hardware encoders gained the ability to receive Direct3D11 textures as an input

New Windows Desktop/Screen Capture Element

There is a new implementation of screen capture for Windows based on the Desktop Duplication API. This new implementation will likely show better performance than the other Windows screen capture elements, provided you use a sufficiently recent Windows version (Windows 10 or 11).

You may know that a Desktop Duplication API based element was already added in the 1.18 release, namely the dxgiscreencapsrc element. However, after the 1.18 release, I found various design issues with it and decided to rewrite the element to perform better and be nicer/cleaner. I expect that the old dxgiscreencapsrc element will be deprecated soon and that the newly implemented d3d11screencapturesrc will then be the primary Windows desktop capture element.

What’s better than dxgiscreencapsrc?

  • Multiple capture instances: One known limitation of the Desktop Duplication API is that only a single capture session per physical monitor is allowed in a single process (capturing multiple different monitors in a single process is allowed, though). Because of that limitation, you could not configure multiple dxgiscreencapsrc elements in your application to capture the same monitor.
    To overcome this, the new implementation holds a dedicated per-monitor capture object which behaves like a server. Each d3d11screencapturesrc element then requests frames from that capture object, in a single-server/multiple-client communication model. As a result of the new design, you can place multiple d3d11screencapturesrc elements capturing the same monitor in your application.
  • Performance improvement: The new element can convey the captured Direct3D11 texture as-is, without copying it into system memory. This is a major performance win when d3d11screencapturesrc is linked with Direct3D11-aware elements. Of course, d3d11screencapturesrc can also be linked with non-Direct3D11 elements, just like the old dxgiscreencapsrc.
  • Easier and fancier target monitor selection: The GstDeviceProvider implementation for this element makes it easier to enumerate the monitors you want to capture. Alternatively, your application may have its own monitor enumeration method; no worries, you can also specify the target monitor explicitly by passing an HMONITOR handle to d3d11screencapturesrc.
    Wondering which monitors can be captured via the new element? Just run gst-device-monitor-1.0.
    See the example below. You can filter gst-device-monitor-1.0 by specifying the Monitor/Source classes so that only monitor capture devices are shown.
gst-device-monitor-1.0.exe Monitor/Source
Probing devices...

Device found:

        name  : Generic PnP Monitor
        class : Source/Monitor
        caps  : video/x-raw(memory:D3D11Memory), format=BGRA, width=2560, height=1440, framerate=[ 0/1, 2147483647/1 ]
                video/x-raw, format=BGRA, width=2560, height=1440, framerate=[ 0/1, 2147483647/1 ]
        properties:
                device.api = d3d11
                device.name = "\\\\.\\DISPLAY1"
                device.path = "\\\\\?\\DISPLAY\#CMN152A\#5\&15b18d46\&0\&UID512\#\{e6f07b5f-ee97-4a90-b076-33f57bf4eaa7\}"
                device.primary = true
                device.type = internal
                device.hmonitor = 65537
                device.adapter.luid = 56049
                device.adapter.description = "AMD\ Radeon\(TM\)\ Graphics"
                desktop.coordinates.left = 0
                desktop.coordinates.top = 0
                desktop.coordinates.right = 1707
                desktop.coordinates.bottom = 960
                display.coordinates.left = 0
                display.coordinates.top = 0
                display.coordinates.right = 2560
                display.coordinates.bottom = 1440
        gst-launch-1.0 d3d11screencapturesrc monitor-handle=65537 ! ...

You may also be interested to learn that Andoni Morales Alastruey is working on a very awesome feature: extending the d3d11screencapturesrc element so that you can capture a specific window instead of the entire desktop area belonging to a physical monitor. I expect it will be one of the coolest things we will see in the future, although it will likely not make it into the upcoming 1.20 release at this point.

AV1 and MPEG2 Codec Support through Direct3D11/DXVA

After we introduced a new design/infrastructure for hardware-accelerated stateless video decoding into GStreamer (See also a blog post written by Víctor Jáquez), the GStreamer community has been focusing on the new design, called GstCodecs.

As described in Víctor’s blog post, I initially wanted to support hardware-accelerated video decoding for various codecs through Direct3D11/DXVA, and I implemented it based on Chromium’s code, including a base class so that the approach could easily be extended to other APIs, such as VA-API, NVDEC, and V4L2 stateless codecs. After that, we (Nicolas Dufresne, Víctor Jáquez, He Junyan and me) worked together on the infrastructure to make it more stable and mature.

After that, He Junyan and Víctor Jáquez implemented the infrastructure for the AV1 and MPEG2 codecs as well. Thanks to that work, our Direct3D11/DXVA implementation was able to adopt this support seamlessly. Now you will be able to use the d3d11av1dec and d3d11mpeg2dec elements (only when your GPU supports those codecs, of course).

Stabilized VP9 decoding via Newly Written Bitstream Parser

GStreamer provides functionality to parse compressed video/audio bitstreams, AVC/HEVC/VP9 codec streams for example. One major use case for the compressed video bitstream parser is stateless video decoding APIs, such as DXVA and VA-API.

Historically, when such bitstream parsers were written, GStreamer-vaapi was the main consumer of their output and therefore they focused on being VA-API friendly.

But things have changed. Nowadays the trend in GStreamer is towards stateless decoding implementations. That means our parser implementations are becoming more generic, and they keep being improved to support various use cases via the newly implemented GstCodecs infrastructure.

We also strongly recommend that end-users of hardware-accelerated video decoding use the newly written implementations (GstVA over GStreamer-VAAPI, or nvh264sldec over nvh264dec, for example). I expect our new GstCodecs-based decoder elements will become the primary implementations and be promoted over the existing ones in the near future.

Back to the VP9 decoding story: when I worked on Direct3D11/DXVA based VP9 decoding, I found that the existing VP9 bitstream parsing library was too VA-API specific and lacked functionality required by other stateless video decoding APIs (DXVA and NVDEC specifically). After reading the VP9 specification and the DXVA VP9 specification documents carefully, I decided to rewrite the parsing library to make it more generic and cleaner. Now, in my testing, the newly written stateless VP9 video decoders show a better compliance score than before with the new VP9 parser.

Interlaced H.264/AVC Decoding Support

As I mentioned above, I originally implemented the GstCodecs infrastructure based on Chromium’s code base. That was clean and lean in some aspects, but the Chromium code base didn’t support interlaced H.264/AVC decoding.

So, in order to support decoding interlaced H.264/AVC streams, I had to refactor GstCodecs’ code, and it was applied not only for Direct3D11/DXVA, but also for VA-API.

Recently I also added interlaced decoding support to the newly written NVIDIA stateless H.264/AVC decoder as well.

Hardware Accelerated Video Deinterlacing

As a follow-up, since we can now decode interlaced H.264/AVC streams, we needed to be able to process interlaced streams and deinterlace them for rendering. You now get this functionality by using the d3d11deinterlace element.

NOTE: The new d3d11deinterlace element should work well not only with streams decoded by a Direct3D11/DXVA decoder, but with all interlaced streams, including those produced by software-based decoders.
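
To make the usage concrete, here is a minimal sketch (not from the original post) of a playback pipeline that decodes an interlaced H.264 file with the Direct3D11/DXVA decoder and deinterlaces it on the GPU. The file name and the MP4 demux chain are placeholders, and which elements are actually available depends on your Windows and GPU setup:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Decode an interlaced H.264 file with d3d11h264dec and deinterlace it on
# the GPU with d3d11deinterlace before rendering with d3d11videosink.
pipeline = Gst.parse_launch(
    "filesrc location=interlaced.mp4 ! qtdemux ! h264parse ! "
    "d3d11h264dec ! d3d11deinterlace ! d3d11videosink"
)
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)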

Direct3D11 based Video Mixing/Composing

The d3d11compositor element was added to support composing/mixing multiple video streams into one, like the compositor (software implementation) or glvideomixer (GPU based, but for OpenGL) elements do. If you use other Direct3D11-aware elements in your pipeline, I’d recommend using d3d11compositor for video compositing; it will likely be the best candidate in terms of performance.

MediaFoundation Video Encoders got faster

See my blog post. You will likely see better performance than before (1.18) if you use a Direct3D11/DXVA decoder together with a MediaFoundation video encoder.

Acknowledgements

The above-mentioned improvements were not made by myself alone, but were the result of teamwork with other GStreamer developers. Special thanks to Nicolas Dufresne, Víctor Jáquez and He Junyan.

What’s next on the TO-DO list?

  • Direct3D12: This is the future/next technology on Windows, and I already implemented a proof-of-concept decoder which works well with AMD GPUs. Although it needs some enhancement, I expect we will end up needing Direct3D12-based video processing if your application is based on Direct3D12, or if you need more advanced features that the existing Direct3D11-based implementation cannot provide.
  • Add more GPU vendor-specific encoder API support: Recently I found that the existing Intel MSDK plugin could perform much better on Windows and be more featureful on both Windows and Linux, but from my perspective it needs to be redesigned, which I verified via my proposal. Moreover, there’s also an AMD GPU specific approach which I am looking at. Why not NVIDIA? I know how GStreamer’s NVIDIA encoder can be made better for Windows.
  • Direct3D Compute Shader (a.k.a. DirectCompute): That would be a very nice feature to support for general-purpose computation tasks, like most of the things CUDA or OpenCL can do (HDR tone-mapping for example).

Anything you want for Windows? Please ping me, I will likely be there 😁

by Seungha Yang at December 17, 2021 04:30 PM

Mathieu Duponchelle: awstranscriber

awstranscriber, a GStreamer wrapper for AWS Transcribe API

If all you want to know is how to use the element, you can head over here.

I actually implemented this element over a year ago, but never got around to posting about it, so this will be the first post in a series about speech-to-text, text processing and closed captions in GStreamer.

Speech-to-text has a long history, with multiple open source libraries implementing a variety of approaches for that purpose[1], but they don't necessarily offer either the same accuracy or ease of use as proprietary services such as Amazon's Transcribe API.

My overall goal for the project, which awstranscriber was only a part of, was the ability to generate a transcription for live streams and inject it into the video bitstream or carry it alongside.

The main requirements were to keep it as synchronized as possible with the content, while keeping latency in check. We'll see how these requirements informed the design of some of the elements, in particular when it came to closed captions.

My initial intuition about text was, to quote a famous philosopher: "How hard can it be?"; turns out the answer was "actually more than I would have hoped".

[1] pocketsphinx, Kaldi just to name a few

The element

In GStreamer terms, the awstranscriber element is pretty straightforward: take audio in, push timed text out.

The Streaming API for AWS is (roughly) synchronous: past a 10 second buffer duration, the service will only consume audio data in real time. I thus decided to make the element a live one by:

  • synchronizing its input to the clock
  • returning NO_PREROLL from its state change function
  • reporting a latency

Event handling is fairly light: The element doesn't need to handle seeks in any particular manner, only consumes and produces fixed caps, and can simply disconnect from and reconnect to the service when it gets flushed.

As the element is designed for a live use case with a fixed maximum latency, it can't wait for complete sentences to be formed before pushing text out. And as one intended consumer for its output is closed captions, it also can't just push the same sentence multiple times as it is getting constructed, because that would completely overflow the CEA 608 bandwidth (more about that in later blog posts, but think roughly 2 characters per video frame maximum).

Instead, the goal is for the element to push one word (or punctuation symbol) at a time.

Initial implementation

When I initially implemented the element, the Transcribe API had a pretty significant flaw for my use case: while it provided me with "partial" results, which sounded great for lowering the latency, there was no way to identify the items of a partial result from one message to the next.

Here's an illustration (this is just an example, the actual output is more complex).

After feeding five seconds of audio data to the service, I would receive a first message:

{
  words: [
    {
      start_time: 0.5,
      end_time: 0.8,
      word: "Hello",
    }
  ]

  partial: true,
}

Then after one more second I would receive:

{
  words: [
    {
      start_time: 0.5,
      end_time: 0.9,
      word: "Hello",
    },
    {
      start_time: 1.1,
      end_time: 1.6,
      word: "World",
    }
  ]

  partial: true,
}

and so on, until the service decided it was done with the sentence and started a new one. There were multiple problems with this, compounding each other:

  • The service seemed to have no predictable "cut-off" point, that is it would sometimes provide me with 30-second long sentences before considering it finished (partial: false) and starting a new one.

  • As long as a result was partial, the service could change any of the words it had previously detected, even if they were first reported 10 seconds prior.

  • The actual timing of the items could also shift (slightly)

This made the task of outputting one word at a time, just in time to honor the user-provided latency, seemingly impossible: as items could not be strictly identified from one partial result to the next, I could not tell whether a given word whose end time matched with the running time of the element had already been pushed or had been replaced with a new interpretation by the service.

Continuing with the above example, and admitting a 10-second latency, I could decide at 9 seconds running time to push "Hello", but then receive a new partial result:

{
  words: [
    {
      start_time: 0.5,
      end_time: 1.0,
      word: "Hey",
    },
    {
      start_time: 1.1,
      end_time: 1.6,
      word: "World",
    },
    ...
  ]

  partial: true,
}

What to then do with that "Hey"? Was it a new word that ought to be pushed? An old one with a new meaning arrived too late that ought to be discarded? Artificial intelligence attempting first contact?

Fortunately, after some head scratching and some (read: lots of) blankly looking at the JSON, I noticed a behavior which, while undocumented, seemed to always hold true: while any feature of an item could change, the start time would never grow past its initial value.

Given that, I finally managed to write some quite convoluted code that ended up yielding useful results, though punctuation was very hit and miss, and needed some more complex conditions to (sometimes) get output.

You can still see that code in all its glory here; I'm happy to say that it is gone now!

Second iteration

Supposedly, you always need to write a piece of code three times before it's good, but I'm happy with two in this case.

6 months ago or so, I stumbled upon an innocuously titled blog post from AWS' machine learning team:

Improve the streaming transcription experience with Amazon Transcribe partial results stabilization

And with those few words, all my problems were gone!

In practice when this feature is enabled, the individual words that form a partial result are explicitly marked as stable: once that is the case, they will no longer change, either in terms of timing or contents.

Armed with this, I simply removed all the ugly, complex, scarily fragile code from the previous iteration, and replaced it all with a single, satisfyingly simple index variable: when receiving a new partial result, simply push all words from index to last_stable_result, update index, done.
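
Sketched in Python for illustration only (the actual element is written in Rust, and the "stable" field name is an assumption based on the description above; the other fields mirror the JSON snippets shown earlier), the logic boils down to something like:

class PartialResultPusher:
    def __init__(self):
        self.index = 0  # number of items already pushed downstream

    def handle_partial_result(self, words):
        # Count the leading run of items the service has marked as stable.
        last_stable = 0
        for word in words:
            if not word.get("stable"):
                break
            last_stable += 1

        # Push every stable item we haven't pushed yet, then remember where
        # we stopped for the next partial result.
        to_push = words[self.index:last_stable]
        self.index = last_stable
        return to_push

When a result finally arrives with partial set to false, the index would presumably reset to zero before the next sentence starts.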

The output was not negatively impacted in any way, in fact now the element actually pushes out punctuation reliably as well, which doesn't hurt.

I also exposed a property on the element to let the user control how aggressively the service actually stabilizes results, offering a trade-off between latency and accuracy.

Quick example

If you want to test the element, you'll need to build gst-plugins-rs[1], set up an AWS account, and obtain credentials which you can either store in a credentials file, or provide as environment variables to rusoto.

Once that's done, and you have installed the plugin in the right place or set the GST_PLUGIN_PATH environment variable to the directory where the plugin got built, you should be able to run a pipeline such as:

gst-launch-1.0 uridecodebin uri=https://storage.googleapis.com/www.mathieudu.com/misc/chaplin.mkv name=d d. ! audio/x-raw ! queue ! audioconvert ! awstranscriber ! fakesink dump=true

Example output:

Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Redistribute latency...
Redistribute latency...
Redistribute latency...0.0 %)
00000000 (0x7f7618011a80): 49 27 6d                                         I'm             
00000000 (0x7f7618011ac0): 73 6f 72 72 79                                   sorry           
00000000 (0x7f7618011b00): 2e                                               .               
00000000 (0x7f7618011e10): 49                                               I               
00000000 (0x7f76180120c0): 64 6f 6e 27 74                                   don't           
00000000 (0x7f7618012100): 77 61 6e 74                                      want            
00000000 (0x7f76180127a0): 74 6f                                            to              
00000000 (0x7f7618012c70): 62 65                                            be              
00000000 (0x7f7618012cb0): 61 6e                                            an              
00000000 (0x7f7618012d70): 65 6d 70 65 72 6f 72                             emperor         
00000000 (0x7f7618012db0): 2e                                               .               
00000000 (0x7f7618012df0): 54 68 61 74 27 73                                That's          
00000000 (0x7f7618012e30): 6e 6f 74                                         not             
00000000 (0x7f7618012e70): 6d 79                                            my              
00000000 (0x7f7618012eb0): 62 75 73 69 6e 65 73 73                          business

I could probably recite that whole "The Dictator" speech by now, by the way; one more clip that is now ruined for me. The predicaments of multimedia engineering!

gst-inspect-1.0 awstranscriber for more information on its properties.

[1] you don't need to build the entire project, but instead just cd net/rusoto before running cargo build

Thanks

  • Sebastian Dröge at Centricular (gst Rust goodness)

  • Jordan Petridis at Centricular (help with the initial implementation)

  • cablecast for sponsoring this work!

Next

In future blog posts, I will talk about closed captions, probably make a few mistakes in the process, and explain why text processing isn't necessarily all that easy.

Feel free to comment if you have issues, or actually end up implementing interesting stuff using this element!

December 17, 2021 03:18 PM

December 15, 2021

Mathieu Duponchelle: webrtcsink

webrtcsink, a new GStreamer element for WebRTC streaming

webrtcsink is an all-batteries included GStreamer WebRTC producer, that tries its best to do The Right Thing™.

Following up on the last part of my last blog post, I have spent some time these past few months working on a WebRTC sink element to make use of the various mitigation techniques and congestion control mechanisms currently available in GStreamer.

This post will briefly present the implementation choices I made, the current features and my ideas for future improvements, with a short demo at the end.

Note that webrtcsink requires latest GStreamer main at the time of writing, all required patches will be part of the 1.20 release.

The element

The choice I made here was to make this element a simple sink: while it wraps webrtcbin, which supports both sending and receiving media streams, webrtcsink will only offer sendonly streams to its consumers.

The element, unlike webrtcbin, only accepts raw audio and video streams, and takes care of the encoding and payloading itself.

Properties are exposed to let the application control what codecs are offered to consumers (and in what order), for instance video-caps=video/x-vp9;video/x-vp8, and the choice of the actual encoders can be controlled through the GStreamer feature rank mechanism.

This decision means that webrtcsink has direct control over the encoders; in particular, it can update their target bitrate according to network conditions. More on that later.
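
As a rough usage sketch (assuming GStreamer main with the Rust plugins installed; the vp9enc rank bump, the videotestsrc source and the caps values are only examples, and the default signaller needs its companion signalling server running), setting the element up from Python could look something like this:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Encoder selection goes through the normal GStreamer feature-rank
# mechanism, e.g. bump vp9enc so it wins over other VP9 encoders.
feature = Gst.Registry.get().lookup_feature("vp9enc")
if feature is not None:
    feature.set_rank(Gst.Rank.PRIMARY + 1)

pipeline = Gst.Pipeline.new("producer")
src = Gst.ElementFactory.make("videotestsrc", None)
src.set_property("is-live", True)
convert = Gst.ElementFactory.make("videoconvert", None)
sink = Gst.ElementFactory.make("webrtcsink", None)

# Offer VP9 first, then VP8, to consumers.
sink.set_property("video-caps", Gst.Caps.from_string("video/x-vp9;video/x-vp8"))

for element in (src, convert, sink):
    pipeline.add(element)
src.link(convert)
convert.link(sink)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()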

Signalling

Applications that use webrtcsink can implement their own signalling mechanism by implementing a Rust API; the element however comes with its own default signalling protocol, implemented by the default signaller alongside a standalone signalling server script written in Python.

The protocol is based on the protocol from the gst-examples, extended to support a 1 producer -> N consumers configuration. It is admittedly a bit ugly but does the job; I have plans for improving this, see Future prospects.

Congestion control

webrtcsink makes use of the statistics it gathers thanks to the transport-cc RTP extension in order to modulate the target bitrate produced by the video encoders when congestion is detected on the network.

The heuristic I implemented is a hybrid of a Proof-of-Concept Matthew Waters implemented recently and the Google Congestion Control algorithm.

As far as my synthetic testing has gone, it works decently and is fairly reactive; it will however certainly evolve in the future as more real-life testing happens. More on that later.

Packet loss mitigation techniques

webrtcsink will offer to honor retransmission requests, and will propose sending ulpfec + red packets for Forward Error Correction on video streams.

The amount of FEC overhead is modified dynamically alongside the bitrate in order not to cause the peer connection to suffer from self-inflicted wounds: when the network is congested, sending more packets isn't necessarily the brightest idea!

The algorithm to update the overhead is very naive at the moment, it could be refined for instance by taking the roundtrip time into account: when that time is low enough, retransmission requests will usually be sufficient for addressing packet loss, and the element could reduce the amount of FEC packets it sends out accordingly.

Statistics monitoring

webrtcsink exposes the statistics from webrtcbin and adds a few of its own through a property on the element.

I have implemented a simple server/client application as an example; the web application can plot a few handpicked statistics for any given consumer, and it turned out to be quite helpful as a debugging/development tool. See the demo video for an illustration.

Future prospects

In no particular order, here is a wishlist for future improvements:

  • Implementing the default signalling server as a rust crate. This will allow running the signalling server either standalone, or letting webrtcsink instantiate it in process, thus reducing the amount of plumbing needed for basic usage. In addition, that crate would expose a trait to let applications extend the default protocol without having to reimplement their own.

  • Sanitize the default protocol: at the moment it is an ugly mixture of JSON and plaintext, it does the job but could be nicer.

  • More congestion control algorithms: at the moment the element exposes a property to pick the congestion control method, either homegrown or disabled, implementing more algorithms (for instance GCC, NADA or SCReAM) can't hurt.

  • Implementing flexfec: this is a longstanding wishlist item for me, ULP FEC has shortcomings that are addressed by flexfec, a GStreamer implementation would be generally useful.

  • High-level integration tests: I am not entirely sure what those would look like, but the general idea would be to set up a peer connection from the element to various browsers, apply various network conditions, and verify that the output isn't overly garbled / frozen / poor quality. That is a very open-ended task because the various components involved can't be controlled in a fully deterministic manner, and the tests should only act as a robust alarm mechanism and not try to validate the final output at the pixel level.

Demo

Thanks

This new element was made possible in part thanks to the contributions from

  • Matthew Waters at Centricular (webrtcbin)

  • Sebastian Dröge at Centricular (GStreamer Rust goodness)

  • Olivier from Collabora (RTP stack)

  • The good people at Pexip (RTP stack, transport-cc)

  • Sequence for sponsoring this work

This is not an exhaustive list!

December 15, 2021 12:00 AM

December 13, 2021

Andy Wingo: webassembly: the new kubernetes?

(Andy Wingo)

I had an "oh, duh, of course" moment a few weeks ago that I wanted to share: is WebAssembly the next Kubernetes?

katers gonna k8s

Kubernetes promises a software virtualization substrate that allows you to solve a number of problems at the same time:

  • Compared to running services on bare metal, Kubernetes ("k8s") lets you use hardware more efficiently. K8s lets you run many containers on one hardware server, and lets you just add more servers to your cluster as you need them.

  • The "cloud of containers" architecture efficiently divides up the work of building server-side applications. Your database team can ship database containers, your backend team ships java containers, and your product managers wire them all together using networking as the generic middle-layer. It cuts with the grain of Conway's law: the software looks like the org chart.

  • The container abstraction is generic enough to support lots of different kinds of services. Go, Java, C++, whatever -- it's not language-specific. Your dev teams can use what they like.

  • The operations team responsible for the k8s servers that run containers don't have to trust the containers that they run. There is some sandboxing and security built-in.

K8s itself is an evolution on a previous architecture, OpenStack. OpenStack had each container be a full virtual machine, with a whole kernel and operating system and everything. K8s instead uses containers, which generally don't require a kernel of their own. The result is that they are lighter-weight -- think Docker versus VirtualBox.

In a Kubernetes deployment, you still have the kernel at a central place in your software architecture. The fundamental mechanism of containerization is the Linux kernel process, with private namespaces. These containers are then glued together by TCP and UDP sockets. However, though one or more kernel processes per container does scale better than full virtual machines, it doesn't generally scale to millions of containers. And processes do have some start-up time -- you can't spin up a container for each request to a high-performance web service. These technical constraints lead to certain kinds of system architectures, with generally long-lived components that keep some kind of state.

k8s <=? w9y

Server-side WebAssembly is in a similar space as Kubernetes -- or rather, WebAssembly is similar to processes plus private namespaces. WebAssembly gives you a good abstraction barrier and (can give) high security isolation. It's even better in some ways because WebAssembly provides "allowlist" security -- it has no capabilities to start with, requiring that the "host" that runs the WebAssembly explicitly delegate some of its own capabilities to the guest WebAssembly module. Compare to processes which by default start with every capability and then have to be restricted.

Like Kubernetes, WebAssembly also gets you Conway's-law-affine systems. Instead of shipping containers, you ship WebAssembly modules -- and some metadata about what kinds of things they need from their environment (the 'imports'). And WebAssembly is generic -- it's a low level virtual machine that anything can compile to.

But, in WebAssembly you get a few more things. One is fast start. Because memory is data, you can arrange to create a WebAssembly module that starts with its state pre-initialized in memory. Such a module can start in microseconds -- fast enough to create one on every request, in some cases, just throwing away the state afterwards. You can run function-as-a-service architectures more effectively on WebAssembly than on containers. Another is that the virtualization is provided entirely in user-space. One process can multiplex between many different WebAssembly modules. This lets one server do more. And, you don't need to use networking to connect WebAssembly components; they can transfer data in memory, sometimes even without copying.

(A digression: this lightweight in-process aspect of WebAssembly makes it so that other architectures are also possible, e.g. this fun hack to sandbox a library linked into Firefox. They actually shipped that!)

I compare WebAssembly to K8s, but really it's more like processes and private namespaces. So one answer to the question as initially posed is that no, WebAssembly is not the next Kubernetes; that next thing is waiting to be built, though I know of a few organizations that have started already.

One thing does seem clear to me though: WebAssembly will be at the bottom of the new thing, and therefore that the near-term trajectory of WebAssembly is likely to follow that of Kubernetes, which means...

  • Champagne time for analysts!

  • The Gartner ✨✨Magic Quadrant✨✨™®© rides again

  • IBM spins out a new WebAssembly division

  • Accenture starts asking companies about their WebAssembly migration plan

  • The Linux Foundation tastes blood in the waters

And so on. I see turbulent waters in the near-term future. So in that sense, in which Kubernetes is not essentially a technical piece of software but rather a nexus of frothy commercial jousting, then yes, certainly: we have a fun 5 years or so ahead of us.

by Andy Wingo at December 13, 2021 03:50 PM

December 08, 2021

Víctor Jáquez: GstVA in GStreamer 1.20

It was a year and a half ago that I announced a new VA-API H.264 decoder element in gst-plugins-bad, and it was bundled in the GStreamer 1.18 release a couple of months later. Since then, we have been working on adding more decoders and filters, fixing bugs, and enhancing its design. I wanted to publish this blog post as soon as release 1.20 was announced, but since the development window is now closed, meaning no more new features will be included, I’ll publish it now, to create buzz around the next GStreamer release.

Here’s the list of new GstVA decoders (of course, they are only available if your driver supports them):

  • vah265dec
  • vavp8dec
  • vavp9dec
  • vaav1dec
  • vampeg2dec

Also, there are a couple new features in vah264dec (common to all gstcodecs-based H.264 decoders):

  • Supports interlaced streams (vah265dec and vampeg2dec too).
  • Added a compliance property to tweak the specification conformance, for example to lower the latency, or to enable non-standard features.

But not only decoders, there are two new elements for post-processing:

  • vapostproc
  • vadeinterlace

vapostproc is similar to vaapipostproc but without the deinterlace operation, since that was moved to another element. The reason is that some deinterlacing methods require holding a list of reference frames; those methods are therefore broken in vaapipostproc, and adding them would needlessly increase the complexity of the element. To keep things simple it’s better to handle deinterlacing in a different element.

This is the list of filters and features supported by vapostproc:

  • Color conversion
  • Resizing
  • Cropping
  • Color balance (Intel only -so far-)
  • Video direction (Intel only)
  • Skin tone enhancement (Intel only)
  • Denoise and Sharpen (Intel only)

And, I ought to say, HDR is in the pipeline, but it will be released after 1.20.

vadeinterlace, on the other hand, only does deinterlacing, but it supports all the methods currently available in the VA-API specification, using the new way to select the field to extract, since the old one (used by GStreamer-VAAPI and FFmpeg) is a bit more expensive.

Finally, if either of the two video filters cannot handle the incoming format, it configures itself in passthrough mode.
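
As an illustrative sketch (not from the post; the pipeline, resolutions and autovideosink choice are arbitrary), letting vapostproc handle color conversion and scaling from Python could look like this; whether it stays out of passthrough depends on the caps you request and on your driver:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Ask vapostproc to convert the test pattern to NV12 and downscale it to
# 1280x720 before display; if the requested caps matched the input exactly,
# the element would simply run in passthrough mode.
pipeline = Gst.parse_launch(
    "videotestsrc num-buffers=300 ! video/x-raw,width=1920,height=1080 ! "
    "vapostproc ! video/x-raw,format=NV12,width=1280,height=720 ! "
    "autovideosink"
)
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)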

But there are not only new elements, there’s also a new library!

Since many other elements need to share a common VADisplay in the GStreamer pipeline, the new library exposes only the GstVaDisplay object for now. The new library must stay thin and lean, exposing only what is requested by other elements, such as gst-msdk. We still have to merge, after 1.20, the addition of GstContext helpers, for example, and the plan is to expose the allocators and buffer pools later.

Another huge task is encoders. After the freeze, we’ll merge the first implementation of the H.264 encoder, and add more encoders in different iterations.

As I said in the previous blog post, all these elements are ranked as none, so they won’t be autoplugged, for example by playbin. To do so, users need to export the environment variable GST_PLUGIN_FEATURE_RANK as documented.

$ GST_PLUGIN_FEATURE_RANK=vah264dec:MAX,vah265dec:MAX,vampeg2dec:MAX,vavp8dec:MAX,vavp9dec:MAX gst-play-1.0 stream.mp4

Thanks a bunch to He Junyan, Seungha Yang and Nicolas Dufresne, for all the effort and care.


Still, the to-do list is large enough. Just to share what I have in my notes:

  • Add a new upload method in glupload to interop with VA surfaces — though this is unlikely to be merged since it creates a circular dependency between -base and -bad.
  • vavc1dec — it might need a rewrite of vc1parse.
  • vajpegdec — it needs a rewrite of jpegparse.
  • vaalphacombine — decoding alpha channel with VA within vp9alphacodebin and vp8alphacodebin
  • vamixer — similar to compositor, glmixer or vaapioverlay, to compose a single frame from different video streams.
  • And encoders (mainly H.264 and H.265).

As a final note, GStreamer-VAAPI has entered maintenance mode. The general plan, without any promises or dates, is to deprecate it once most of its use cases are covered by GstVA.

by vjaquez at December 08, 2021 11:58 AM

November 25, 2021

Jean-François Fortin Tam: CMSes & static site generators: why I (still) chose WordPress for my business websites

For many years, until 2021, the idéemarque* website was my own static HTML hand-written codebase, which had the advantage of performance and flexibility (vs “what a theme dictates”), but was also impossible to scale, because it had a bus factor of 1 and a pain level over 9000. I even had it version-controlled in Git all the way back to 2014 (back when I finally joined the Git masochists sect). I was the only person in the world who could maintain it or contribute to it, because, quite frankly, you need to reach geek level 30+ to enter that dungeon, while most people, including new generations, don’t know how to use computers.

Pictured: How I felt whenever I had to make changes to my static website.

I spent time evaluating various alternatives to “coding everything by hand”, including Hugo and Publii (I’ll spare you all the framework-style templating systems turbonerd crap like Bootstrap, Smarty, Django, etc.). Hugo and Publii are very cool conceptually, and would work wonders for a casual geek’s blog, but there are a number of problems they cannot address conceptually, in my view:

  • Not advanced and flexible enough for me to do “anything” easily (oh you want to integrate XYZ dynamic features? Yeah, let’s see how well you paint yourself into that corner)
  • Relies on their own themes that I can’t be bothered to learn hacking (and I don’t want to be hiring a dev specialized in those technologies to do it for me) just to be able to accomplish my vision (“I want it to look just like that!“). It’s easy to make a nice-looking hello-world website if you fit within the theme’s planned usecases, but as soon as you start saying “I want every core page to have a unique layout depending on the contents” and “I want the front page to look & behave differently from all the rest”, you run into limitations pretty quickly and say “Screw this, if I’m going to start hacking this thing, I’m no better off than writing my own custom website codebase myself.”
  • They are arguably designed and best suited for the “one user” usecase. Collaborative editing and permissions management? Not happening.
  • Nobody but turbonerds has the skills to understand and manage a static site generator. Most people can’t be bothered to handle files and folders. They live in the browser, and simply giving them a log-in to your site is the only way I can see to lower the barriers to entry among your team.

As much as we geeks love to hate WordPress and the bloat it represents, it is pretty much the standard website building platform that is visibly thriving, and that we know will still be Free & Open-Source and available ten years from now; and for all its warts, with enough experience and geekdom you can tweak and harden it into something fairly reliable and somewhat secure (the fact that automatic updates are now available for both the core and extensions helps; yes, it makes me nervous to think that things may change and possibly break on their own, but you can’t afford to have an un-patched website nowadays, and I think the security benefits of automatic updates outweigh the theoretical compatibility worries).

Combined with the new Gutenberg editing experience, some themes out there are also flexible enough to easily lay out complex pages without feeling like I’m running into limitations all the time.

Pictured: me deploying WordPress for my mostly-static sites and trying to make it fast.

Not everything is perfect of course. As of 2021, on the performance front, getting consistent and reliable caching working with SuperCache is a mindboggling experience, full of “mandelbugs” (like this one); in my case, each of my websites has at least some (or all) of the caching behavior not working (whether it is some pages never being able to generate cache files, or the cached files not being retained, no matter what you do and what combination of voodoo incantation and settings you use), but maybe someday someone will complete a heavy round of refactoring to improve the situation (maybe you can help there?) and things will Just Work™. But for now, I guess I’ll live with that.

All in all, it is only from 2019 onwards, after much research (and much technological progress in general), that I found myself with enough tooling to make this work in a way that would meet my expectations of design & workflow flexibility, and therefore feel confident enough that this will be my long-term solution for a particular type/segment of my websites. My personal website (of which this blog is only a subset) still is hand-coded, however, because it “does the job.”

Years ago, someone once told me that whenever someone in your team decides to write your company’s website from scratch (or using some templating system), they “inevitably end up reimplementing WordPress… poorly.”

So yeah. We’re using WordPress.


*: idéemarque is the first and only Free & Open-Source branding agency that contributes to the desktop Linux landscape on a daily basis, because that's what I do and I'm already in too deep. Or, as an academic would say, I'm just "unreasonably persistent."

by Jeff at November 25, 2021 06:15 PM

November 03, 2021

GStreamerGStreamer 1.19.3 unstable development release

(GStreamer)

GStreamer 1.19.3 unstable development release

The GStreamer team is pleased to announce the third development release in the unstable 1.19 release series.

The unstable 1.19 release series adds new features on top of the current stable 1.18 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

The unstable 1.19 release series is for testing and development purposes in the lead-up to the stable 1.20 series which is scheduled for release in a few weeks time. Any newly-added API can still change until that point, although it is rare for that to happen.

Full release notes will be provided in the near future, highlighting all the new features, bugfixes, performance optimizations and other important changes.

This development release is primarily for distributors and early adopters and anyone who still needs to update their build/packaging setup for Meson.

Binaries for Android, iOS, Mac OS X and Windows will also be available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please let us know of any issues you run into by filing an issue in Gitlab.

November 03, 2021 04:00 PM

October 30, 2021

Sebastian Pölsterlscikit-survival 0.16 released

I am proud to announce the release of version 0.16.0 of scikit-survival. The biggest improvement in this release is that you can now change the evaluation metric that is used in estimators’ score method. This is particularly useful for hyper-parameter optimization using scikit-learn’s GridSearchCV. You can now use as_concordance_index_ipcw_scorer, as_cumulative_dynamic_auc_scorer, or as_integrated_brier_score_scorer to adjust the score method to your needs. The example below illustrates how to use these in practice.

For a full list of changes in scikit-survival 0.16.0, please see the release notes.

Installation

Pre-built conda packages are available for Linux, macOS, and Windows via

 conda install -c sebp scikit-survival

Alternatively, scikit-survival can be installed from source following these instructions.

Hyper-Parameter Optimization with Alternative Metrics

The code below is also available as a notebook and can be executed directly from there.

In this example, we are going to use the German Breast Cancer Study Group 2 dataset. We want to fit a Random Survival Forest and optimize its max_depth hyper-parameter using scikit-learn’s GridSearchCV.

Let’s begin by loading the data.

import numpy as np
from sksurv.datasets import load_gbsg2
from sksurv.preprocessing import encode_categorical
gbsg_X, gbsg_y = load_gbsg2()
gbsg_X = encode_categorical(gbsg_X)
lower, upper = np.percentile(gbsg_y["time"], [10, 90])
gbsg_times = np.arange(lower, upper + 1)

Next, we create an instance of Random Survival Forest.

from sksurv.ensemble import RandomSurvivalForest
rsf_gbsg = RandomSurvivalForest(random_state=1)

We define that we want to evaluate the performance of each hyper-parameter configuration by 3-fold cross-validation.

from sklearn.model_selection import KFold
cv = KFold(n_splits=3, shuffle=True, random_state=1)

Next, we define the set of hyper-parameters to evaluate. Here, we search for the best value for max_depth between 1 and 10 (excluding). Note that we have to prefix max_depth with estimator__, because we are going to wrap the actual RandomSurvivalForest instance with one of the classes above.

cv_param_grid = {
    "estimator__max_depth": np.arange(1, 10, dtype=int),
}

Now, we can put all the pieces together and start searching for the best hyper-parameters that maximize concordance_index_ipcw.

from sklearn.model_selection import GridSearchCV
from sksurv.metrics import as_concordance_index_ipcw_scorer
gcv_cindex = GridSearchCV(
    as_concordance_index_ipcw_scorer(rsf_gbsg, tau=gbsg_times[-1]),
    param_grid=cv_param_grid,
    cv=cv,
).fit(gbsg_X, gbsg_y)

The same process applies when optimizing hyper-parameters to maximize cumulative_dynamic_auc.

from sksurv.metrics import as_cumulative_dynamic_auc_scorer
gcv_iauc = GridSearchCV(
    as_cumulative_dynamic_auc_scorer(rsf_gbsg, times=gbsg_times),
    param_grid=cv_param_grid,
    cv=cv,
).fit(gbsg_X, gbsg_y)

While as_concordance_index_ipcw_scorer and as_cumulative_dynamic_auc_scorer can be used with any estimator, as_integrated_brier_score_scorer is only available for estimators that provide the predict_survival_function method, which includes RandomSurvivalForest. If available, hyper-parameters that maximize the negative integrated time-dependent Brier score will be selected, because a lower Brier score indicates better performance.

from sksurv.metrics import as_integrated_brier_score_scorer
gcv_ibs = GridSearchCV(
    as_integrated_brier_score_scorer(rsf_gbsg, times=gbsg_times),
    param_grid=cv_param_grid,
    cv=cv,
).fit(gbsg_X, gbsg_y)

Finally, we can visualize the results of the grid search and compare the best performing hyper-parameter configurations (marked with a red dot).

import matplotlib.pyplot as plt
def plot_grid_search_results(gcv, ax, name):
ax.errorbar(
x=gcv.cv_results_["param_estimator__max_depth"].filled(),
y=gcv.cv_results_["mean_test_score"],
yerr=gcv.cv_results_["std_test_score"],
)
ax.plot(
gcv.best_params_["estimator__max_depth"],
gcv.best_score_,
'ro',
)
ax.set_ylabel(name)
ax.yaxis.grid(True)
_, axs = plt.subplots(3, 1, figsize=(6, 6), sharex=True)
axs[-1].set_xlabel("max_depth")
plot_grid_search_results(gcv_cindex, axs[0], "c-index")
plot_grid_search_results(gcv_iauc, axs[1], "iAUC")
plot_grid_search_results(gcv_ibs, axs[2], "$-$IBS")

Results of hyper-parameter optimization.

When optimizing for the concordance index, a high maximum depth works best, whereas the other metrics are best when choosing a maximum depth of 5 and 6, respectively.

October 30, 2021 10:41 AM

October 25, 2021

Jan Schmidt2.5 years of Oculus Rift

(Jan Schmidt)

Once again time has passed, and another update on Oculus Rift support feels due! As always, it feels like I’ve been busy with work and not found enough time for Rift CV1 hacking. Nevertheless, looking back over the history since I last wrote, there’s quite a lot to tell!

In general, the controller tracking is now really good most of the time. Like, wildly-swing-your-arms-and-not-lose-track levels (most of the time). The problems I’m hunting now are intermittent and hard to identify in the moment while using the headset – hence my enthusiasm over the last updates for implementing stream recording and a simulation setup. I’ll get back to that.

Outlier Detection

Since I last wrote, the tracking improvements have mostly come from identifying and rejecting incorrect measurements. That is, if I have 2 sensors active and 1 sensor says the left controller is in one place, but the 2nd sensor says it’s somewhere else, we’ll reject one of those – choosing the pose that best matches what we already know about the controller: the last known position, the gravity direction the IMU is detecting, and the last known orientation. The tracker will now also reject observations for a time if (for example) the reported orientation is outside the range we expect. The IMU gyroscope can track the orientation of a device for quite a while, so it can be relied on to identify strong pose priors once we’ve integrated a few camera observations to get the yaw correct.
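
As a crude illustration of the idea only (this is a sketch, not the actual OpenHMD logic, which also compares orientation and the IMU gravity direction; the helper name and threshold are assumptions):

import numpy as np

def accept_position_observation(observed_pos, predicted_pos, max_jump_m=0.3):
    # Reject a camera-derived position that disagrees too much with the
    # filter's current prediction for this device.
    delta = np.asarray(observed_pos) - np.asarray(predicted_pos)
    return float(np.linalg.norm(delta)) <= max_jump_m

# A pose 1 m away from where the filter expects the controller is rejected.
print(accept_position_observation([1.0, 0.0, 0.0], [0.0, 0.0, 0.0]))  # False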

It works really well, but I think improving this area is still where most future refinements will come. That and avoiding incorrect pose extractions in the first place.

Plot of headset tracking – orientation and position

The above plot is a sample of headset tracking, showing the extracted poses from the computer vision vs the pose priors / tracking from the Kalman filter. As you can see, there are excursions in both position and orientation detected from the video, but these are largely ignored by the filter, producing a steadier result.

Left Touch controller tracking – orientation and position

This plot shows the left controller being tracked during a Beat Saber session. The controller tracking plot is quite different, because controllers move a lot more than the headset, and have fewer LEDs to track against. There are larger gaps here in the timeline while the vision re-acquires the device – and in those gaps you can see the Kalman filter interpolating using IMU input only (sometimes well, sometimes less so).

Improved Pose Priors

Another nice improvement is in the way the search for a tracked device is made in a video frame. Before starting to look for a particular device, it now always gets the latest estimate of the previous device position from the fusion filter. Previously, it would use the estimate of the device pose as it was when the camera exposure happened – but between then and the moment we start analysis, more IMU observations and other camera observations might arrive and be integrated into the filter, which will have updated the estimate of where the device was in the frame.

This is the bit where I think the Kalman filter is particularly clever: Estimates of the device position at an earlier or later exposure can improve and refine the filter’s estimate of where the device was when the camera captured the frame we’re currently analysing! So clever. That mechanism (lagged state tracking) is what allows the filter to integrate past tracking observations once the analysis is done – so even if the video frame search takes 150ms (for example), it will correct the filter’s estimate of where the device was 150ms in the past, which ripples through and corrects the estimate of where the device is now.

LED visibility model

To further improve the identification of devices, I measured the actual angle from which LEDs are visible (about 75 degrees off axis) and measured their size. The pose matching now has a better idea of which LEDs should be visible for a proposed orientation and what pixel size we expect them to have at a particular distance.
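
As a rough sketch of what such a visibility test can look like (an illustration only, not the actual OpenHMD code; the helper name and vector conventions are assumptions):

import numpy as np

def led_visible(led_normal, led_to_camera, max_off_axis_deg=75.0):
    # An LED is treated as visible when the direction from the LED to the
    # camera is within max_off_axis_deg of the LED's outward normal.
    cos_angle = np.dot(led_normal, led_to_camera) / (
        np.linalg.norm(led_normal) * np.linalg.norm(led_to_camera))
    return cos_angle >= np.cos(np.radians(max_off_axis_deg))

# Example: an LED facing +Z seen from 80 degrees off axis is culled.
view = np.array([np.sin(np.radians(80)), 0.0, np.cos(np.radians(80))])
print(led_visible(np.array([0.0, 0.0, 1.0]), view))  # False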

Better Smoothing

I fixed a bug in the output pose smoothing filter where it would glitch as you turned completely around and crossed the point where the angle jumps from +pi to -pi or vice versa.
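
For the curious, the usual way to avoid that kind of glitch is to wrap angle differences into [-pi, pi); the snippet below is a generic sketch of the idea, not the actual OpenHMD code:

import math

def shortest_angle_delta(previous, current):
    # Wrapping the delta into [-pi, pi) means a rotation that crosses the
    # +pi/-pi boundary produces a tiny step instead of a ~2*pi jump.
    return (current - previous + math.pi) % (2.0 * math.pi) - math.pi

# Crossing the boundary yields a delta of ~0.02 rad, not ~2*pi.
print(shortest_angle_delta(math.pi - 0.01, -math.pi + 0.01))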

Improved Display Distortion Correction

I got a wide-angle hi-res webcam and took photos of a checkerboard pattern through the lens of my headset, then used OpenCV and panotools to calculate new distortion and chromatic aberration parameters for the display. For me, this has greatly improved things. I’m waiting to hear if that’s true for everyone, or if I’ve just fixed it for my headset.

Persistent Config Cache

Config blocks! A long time ago, I prototyped code to create a persistent OpenHMD configuration file store in ~/.config/openhmd. The rift-kalman-filter branch now uses that to store the configuration blocks that it reads from the controllers. The first time a controller is seen, it will load the JSON calibration block as before, but it will now store it in that directory – removing a multiple second radio read process on every subsequent startup.
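
In spirit, the cache works like the following sketch (illustrative Python only; the real implementation is C inside OpenHMD, and the file naming here is a made-up convention):

import json
from pathlib import Path

def load_calibration(device_serial, read_block_over_radio):
    cache_dir = Path.home() / ".config" / "openhmd"
    cache_file = cache_dir / f"{device_serial}.json"
    if cache_file.exists():
        # Cached from a previous run: skip the multi-second radio read.
        return json.loads(cache_file.read_text())
    block = read_block_over_radio()  # slow path, first run only
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(block))
    return block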

Persistent Room Configuration

To go along with that, I have an experimental rift-room-config branch that creates a rift-room-config.json file and stores the camera positions after the first startup. I haven’t pushed that to the rift-kalman-filter branch yet, because I’m a bit worried it’ll cause surprising problems for people. If the initial estimate of the headset pose is wrong, the code will back-project the wrong positions for the cameras, which will get written to the file and cause every subsequent run of OpenHMD to generate bad tracking until the file is removed. The goal is to have a loop that monitors whether the camera positions seem stable based on the tracking reports, and to use averaging and resetting to correct them if not – or at least to warn the user that they should re-run some (non-existent) setup utility.

Video Capture + Processing

The final big ticket item was a rewrite of how the USB video frame capture thread collects pixels and passes them to the analysis threads. This now does less work in the USB thread, so misses fewer frames, and also I made it so that every frame is now searched for LEDs and blob identities tracked with motion vectors, even when no further analysis will be done on that frame. That means that when we’re running late, it better preserves LED blob identities until the analysis threads can catch up – increasing the chances of having known LEDs to directly find device positions and avoid searching. This rewrite also opened up a path to easily support JPEG decode – which is needed to support Rift Sensors connected on USB 2.0 ports.

Session Simulator

I mentioned the recording simulator continues to progress. Since the tracking problems are now getting really tricky to figure out, this tool is becoming increasingly important. So far, I have code in OpenHMD to record all video and tracking data to a .mkv file. Then, there’s a simulator tool that loads those recordings. Currently it is capable of extracting the data back out of the recording, parsing the JSON and decoding the video, and presenting it to a partially implemented simulator that then runs the same blob analysis and tracking OpenHMD does. The end goal is a Godot based visualiser for this simulation, and to be able to step back and forth through time examining what happened at critical moments so I can improve the tracking for those situations.

To make recordings, there’s the rift-debug-gstreamer-record branch of OpenHMD. If you have GStreamer and the right plugins (gst-plugins-good) installed, and you set env vars like this, each run of OpenHMD will generate a recording in the target directory (make sure the target dir exists):

export OHMD_TRACE_DIR=/home/user/openhmd-traces/
export OHMD_FULL_RECORDING=1

Up Next

The next things that are calling to me are to improve the room configuration estimation and storage as mentioned above – to detect when the poses a camera is reporting don’t make sense because it’s been bumped or moved.

I’d also like to add back in tracking of the LEDs on the back of the headset headband, to support 360 tracking. I disabled those because they cause me trouble – the headband is adjustable relative to the headset, so the LEDs don’t appear where the 3D model says they should be and that causes jitter and pose mismatches. They need special handling.

One last thing I’m finding exciting is a new person taking an interest in Rift S and starting to look at inside-out tracking for that. That’s just happened in the last few days, so not much to report yet – but I’ll be happy to have someone looking at that while I’m still busy over here in CV1 land!

As always, if you have any questions, comments or testing feedback – hit me up at thaytan@noraisin.net or on @thaytan Twitter/IRC.

Thank you to the kind people signed up as Github Sponsors for this project!

by thaytan at October 25, 2021 04:44 PM

October 20, 2021

Bastien NoceraPSA: gnome-settings-daemon's MediaKeys API is going away

(Bastien Nocera)

 In 2007, Jan Arne Petersen added a D-Bus API to what was still pretty much an import into gnome-control-center of the "acme" utility I wrote to have all the keys on my iBook working.

It switched the code away from remapping keyboard keys to "XF86Audio*", to expecting players to contact the D-Bus daemon and ask to be forwarded key events.

 

Multimedia keys circa 2003

In 2013, we added support for controlling media players using MPRIS, as another interface. Fast-forward to 2021, and MPRIS support is ubiquitous, whether in free software, proprietary applications or even browsers. So we'll be parting with the "org.gnome.SettingsDaemon.MediaKeys" D-Bus API. If your application still wants to work with older versions of GNOME, it is recommended to at least handle the MediaKeys API's unavailability quietly.
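
A minimal sketch of what that quiet handling could look like (assuming PyGObject; "MyMediaPlayer" is a placeholder application name, and the method name and signature are taken from the legacy API as commonly documented, so treat them as assumptions to verify):

from gi.repository import Gio, GLib

bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)
try:
    bus.call_sync(
        "org.gnome.SettingsDaemon.MediaKeys",        # bus name
        "/org/gnome/SettingsDaemon/MediaKeys",       # object path
        "org.gnome.SettingsDaemon.MediaKeys",        # interface
        "GrabMediaPlayerKeys",
        GLib.Variant("(su)", ("MyMediaPlayer", 0)),  # application, time
        None, Gio.DBusCallFlags.NONE, -1, None)
except GLib.Error:
    # The API is not available on this GNOME version: stay quiet and rely
    # on MPRIS instead.
    pass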

 

Multimedia keys in 2021
 

TL;DR: Remove code that relies on gnome-settings-daemon's MediaKeys API, make sure to add MPRIS support to your app.

by Bastien Nocera (noreply@blogger.com) at October 20, 2021 01:12 PM

October 04, 2021

Stéphane CerveauGenerate a minimal GStreamer build, tailored to your needs

GStreamer is a powerful multimedia framework with over 30 libraries and more than 1600 elements in 230 plugins providing a wide variety of functionality. This makes it possible to build a huge variety of applications; however, it also makes it tricky to ship on a constrained device. Luckily, most applications only use a subset of this functionality, and up until now there wasn’t an easy way to generate a build with just enough GStreamer for a specific application.

Thanks to a partnership with Huawei, you can now use gst-build to generate a minimal GStreamer build, tailored to a specific application, or set of applications. In this blog post, we’ll look at the major changes that have been introduced in GStreamer to make this possible, and provide a small example of what can be achieved with minimal, custom builds.

gst-build

gst-build is the build system that GStreamer developers use. In previous posts, we described how to get started on Linux or Windows and how to use it as a daily development tool.

Since GStreamer 1.18, it is possible to build all of GStreamer into a single shared library called gstreamer-full. This library can include not only GStreamer’s numerous libraries, but also all the plugins and other GStreamer dependencies such as GLib. Applications can then either dynamically or statically link with gstreamer-full.

Creating the gstreamer-full combined library

Passing -Ddefault_library=static -Dintrospection=disabled on the Meson configure command line generates a static build of all the GStreamer libraries which support the static scheme. This also produces a shared library called gstreamer-full containing all of GStreamer. For now, GObject introspection needs to be disabled as the static build is not ready to support it (see gst-build-167).

Tailoring GStreamer

Generating a combined library doesn’t by itself reduce the total size. To achieve this goal, we need to select which libraries and plugins are included.

gst-build is a highly configurable build system that already provides options to select which plugins are built. But using the gstreamer-full mechanism, one can select exactly which libraries are included in the final gstreamer-full library by passing the -Dgst-full-libraries= argument to meson. The plugins are then automatically included according to the configuration and the dependencies available.

Let’s look at an example:

$ meson build-gst-full \
  --buildtype=release \
  --strip \
  --default-library=static \
  --wrap-mode=forcefallback \
  -Dauto_features=disabled \
  -Dgst-full-libraries=app,video,player \
  -Dbase=enabled \
  -Dgood=enabled \
  -Dbad=enabled \
  -Dgst-plugins-base:typefind=enabled \
  -Dgst-plugins-base:app=enabled \
  -Dgst-plugins-base:playback=enabled \
  -Dgst-plugins-base:volume=enabled \
  -Dgst-plugins-base:videoconvert=enabled \
  -Dgst-plugins-base:audioconvert=enabled \
  -Dgst-plugins-good:audioparsers=enabled \
  -Dgst-plugins-good:isomp4=enabled \
  -Dgst-plugins-good:deinterlace=enabled \
  -Dgst-plugins-good:audiofx=enabled \
  -Dgst-plugins-bad:videoparsers=enabled

In this example, we generate a gstreamer-full library using only the features we explicitly specify. The first step to do that is to disable the automatic selection of features based on which dependencies are already installed (-Dauto_features=disabled). Then we explicitly enable the features of each subpackage that we want (e.g. -Dgst-plugins-base:typefind=enabled -Dgst-plugins-base:app=enabled) and we use -Dgst-full-libraries=app,video,player to tell gst-build to bundle and expose only those specific libraries (app, video, player) in gstreamer-full. We also force Meson to build as many dependencies as possible itself by using the subproject fallback (--wrap-mode=forcefallback); this way, those dependencies are included in the gstreamer-full library.

Tailoring it further

In our collaboration with Huawei, we decided to push the idea of tailoring GStreamer further, increasing the granularity beyond the plugin level so that individual elements or other features inside each plugin can be selected.

GStreamer plugins contain different kinds of features. The most common type of plugin feature is the element, but you can also select other types of loadable features such as device providers, typefinders and dynamic types. For example, the ALSA plugin includes a device provider and various elements such as alsasrc and alsasink.

One key goal of the project was to be able to build only the features needed, reducing the binary file size. For example if the user selects only one element such as flacparse from the audioparsers plugin, the code used by the other parsers should not be included in the final binary.

Note that for now, this project has been focused on Linux platforms and has not been tested on other platforms such as Windows or macOS.

We first experimented with the linker option to garbage collect sections (--gc-sections). This option removes code sections which are not used by any part of the final program, except the public library methods. With the standard compiler options, there is normally a section per C source file, but using the -ffunction-sections and -fdata-sections compiler options, the compiler will generate one section per function and per data symbol.

But according to the documentation, this feature must be used with care: it can add a code overhead if no sections can be garbage-collected, and it can also cause incompatibilities with debuggers and slow things down (see the -fdata-sections comment in the gcc man page).

As GStreamer is a very widely used project, we decided to avoid this solution as it could possibly lead to inconsistent results.

Splitting the code inside GStreamer

Instead of creating new sections automatically, we decided to play with linker rules. The linker (ld) already only pulls in object files if they are called by other objects. So this has the effect of entirely omitting any code that isn’t used by the current program.

Up until now, every plugin had a function that would call the registration function for each feature present in the plugin. This function is called when the plugin is loaded. This plugin initialization function was the only one in each plugin with a predictable name. To be able to select plugins, we needed to expose a registration function for each feature. We were very careful to put these in the same file where the feature is implemented.

The plugin initialization function is now in its own file that can be ignored when linking features one by one. This work required modifying every single GStreamer plugin. A registration method has been declared through a set of macros for each feature available in the official GStreamer repositories.

To declare an element, you should use the macro GST_ELEMENT_REGISTER_DEFINE(element, "name", RANK, GST_TYPE). From any plugin you can then register the element by calling GST_ELEMENT_REGISTER(element, plugin).

The final part of this work was to create a “static” plugin in gstreamer-full which contains all the features (elements etc.) selected at gst-build configure time with -Dgst-full-elements= and the related options below.

With these in place, all the features which are not selected don’t get included.

Compose your GStreamer feature(s) menu:

Five new options have been added to gst-build:

  • gst-full-plugins: Select the plugin you’d like to include. By default, this is all the plugins enabled by the build process. At least one must be passed or all will be included.
  • gst-full-elements: Select the element(s) using the plugin1:elt1,elt2;plugin2:elt1 format
  • gst-full-typefind-functions: Select the typefind(s) using the plugin1:tf1,tf2;plugin2:tf1 format
  • gst-full-device-providers: Select the device-provider(s) using the plugin1:dp1,dp2;plugin2:dp1 format
  • gst-full-dynamic-types: Select the dynamic-type(s) using the plugin1:dt1,dt2;plugin2:dt1 format

You can find more information about the work achieved in the gst-build-199 merge request.

A light menu

Let’s start with a default build with all features and get some metrics to see the benefits. The gstreamer-full library will embed its external dependencies, such as GLib, as much as possible. Plugin dependencies will be kept dynamic, but as soon as we select specific plugins/features, the unused dependencies will be dropped.

The build has been performed using an official GStreamer Fedora Docker image.

$ docker pull registry.freedesktop.org/gstreamer/gst-plugins-bad/amd64/fedora:2021-03-30.0-master
$ meson build-gst-full \
  -Ddefault_library=static \
  -Dintrospection=disabled \
  --buildtype=release \
  --strip \
  --wrap-mode=forcefallback

$ ninja -C build-gst-full

After a successful build, we can reconfigure the previous gstreamer-full build by providing new options to the meson command line through --reconfigure. In this use case, we’ll enable only 3 elements from the coreelements plugin in GStreamer.

$ meson build-gst-full --reconfigure -Dgst-full-plugins=coreelements '-Dgst-full-elements=coreelements:filesrc,fakesink,identity' '-Dgst-full-libraries=[]'
$ ninja -C build-gst-full

In this example we first pass the plugin(s) we’d like to enable (coreelements), and then the elements we’d like to include (filesrc, fakesink and identity). We also remove all additional GStreamer libraries except the core library.

lib (stripped)          default              tailored
libgstreamer-full.so    49208656 (49.2 M)    3250256 (3.2 M)

This library can now be used, through its gstreamer-full pkg-config file, to build a custom GStreamer application.

Use of a linker script

As a final touch, we have also added an option to provide gst-build with a linker script to select exactly what gets included in the final gstreamer-full library. With this linker script you are now able to drop all the public code which is not used by your application and keep only the necessary code. See gst-build-195

The option is: gst-full-version-script=path_to_version_script

Wrapping up

Some interesting merge requests:

All this work is now available upstream (1.19.0) and should be available in the next 1.20 release of GStreamer.

As usual, if you would like to learn more about meson, gst-build or any other parts of GStreamer, please contact us!

October 04, 2021 12:00 AM

October 01, 2021

Christian SchallerPipeWire and fixing the Linux Video Capture stack

(Christian Schaller)

Wim Taymans

Wim Taymans laying out the vision for the future of Linux multimedia


PipeWire has already made great strides forward in terms of improving the audio handling situation on Linux, but one of the original goals was to also bring along the video side of the house. In fact in the first few releases of Fedora Workstation where we shipped PipeWire we solely enabled it as a tool to handle screen sharing for Wayland and Flatpaks. So with PipeWire having stabilized a lot for audio now we feel the time has come to go back to the video side of PipeWire and work to improve the state-of-art for video capture handling under Linux. Wim Taymans did a presentation to our team inside Red Hat on the 30th of September talking about the current state of the world and where we need to go to move forward. I thought the information and ideas in his presentation deserved wider distribution so this blog post is building on that presentation to share it more widely and also hopefully rally the community to support us in this endeavour.

The current state of video capture, usually webcams, handling on Linux is basically the v4l2 kernel API. It has served us well for a lot of years, but we believe that just like you don’t write audio applications directly to the ALSA API anymore, you should neither write video applications directly to the v4l2 kernel API anymore. With PipeWire we can offer a lot more flexibility, security and power for video handling, just like it does for audio. The v4l2 API is an open/ioctl/mmap/read/write/close based API, meant for a single application to access at a time. There is a library called libv4l2, but nobody uses it because it causes more problems than it solves (no mmap, slow conversions, quirks). But there is no need to rely on the kernel API anymore as there are GStreamer and PipeWire plugins for v4l2 allowing you to access it using the GStreamer or PipeWire API instead. So our goal is not to replace v4l2, just as it is not our goal to replace ALSA, v4l2 and ALSA are still the kernel driver layer for video and audio.

It is also worth considering that new cameras are getting more and more complicated, and thus configuring them is getting more complicated too. Driving this change is a new set of cameras on the way, often called MIPI cameras, as they adhere to the API standards set by the MIPI Alliance. Partly driven by this, v4l2 is in active development, with a Codec API addition, stateful/stateless codecs, DMABUF, the request API, and also a Media Controller (MC) graph with nodes, ports and links of processing blocks. This means that the threshold for an application developer to use these APIs directly is getting very high, in addition to the aforementioned issues of single application access, the security issues of direct kernel access and so on.

libcamera logo


Libcamera is meant to be the userland library for v4l2.


Of course we are not the only ones seeing the growing complexity of cameras as a challenge for developers and thus libcamera has been developed to make interacting with these cameras easier. Libcamera provides unified API for setup and capture for cameras, it hides the complexity of modern camera devices, it is supported for ChromeOS, Android and Linux.
One way to describe libcamera is as the MESA of cameras. Libcamera provides hooks to run (out-of-process) vendor extensions, for example for image processing or enhancement. Using libcamera is considered pretty much a requirement for embedded systems these days, and newer Intel chips will also have IPUs configurable with media controllers.

Libcamera is still under heavy development upstream and do not yet have a stable ABI, but they did add a .so version very recently which will make packaging in Fedora and elsewhere a lot simpler. In fact we have builds in Fedora ready now. Libcamera also ships with a set of GStreamer plugins which means you should be able to get for instance Cheese working through libcamera in theory (although as we will go into, we think this is the wrong approach).

Before I go further an important thing to be aware of here is that unlike on ALSA, where PipeWire can provide a virtual ALSA device to provide backwards compatibility with older applications using the ALSA API directly, there is no such option possible for v4l2. So application developers will need to port to something new here, be that libcamera or PipeWire. So what do we feel is the right way forward?

Ideal Linux Multimedia Stack

How we envision the Linux multimedia stack going forward


Above you see an illustration of what we believe should be how the stack looks going forward. If you made this drawing of what the current state is, then thanks to our backwards compatibility with ALSA, PulseAudio and Jack, all the applications would be pointing at PipeWire for their audio handling like they are in the illustration you see above, but all the video handling from most applications would be pointing directly at v4l2 in this diagram. At the same time we don’t want applications to port to libcamera either, as it doesn’t offer a lot of the flexibility that using PipeWire will; instead what we propose is that all applications target PipeWire in combination with the video camera portal API. Be aware that the video portal is not an alternative to or an abstraction of the PipeWire API, it is just a way to set up the connection to PipeWire that has the added bonus of working if your application is shipping as a Flatpak or another type of desktop container. PipeWire would then be in charge of talking to libcamera or v4l2 for video, just like PipeWire is in charge of talking with ALSA on the audio side. Having PipeWire be the central hub means we get a lot of the same advantages for video that we get for audio. For instance as the application developer you interact with PipeWire regardless of whether what you want is a screen capture, a camera feed or a video being played back. Multiple applications can share the same camera, and at the same time there are security measures in place to avoid the camera being used without your knowledge to spy on you. And also we can have patchbay applications that support video pipelines and not just audio, like Carla provides for Jack applications. To be clear this feature will not come for ‘free’ from Jack patchbays since Jack only does audio, but hopefully new PipeWire patchbays like Helvum can add video support.

So what about GStreamer you might ask. Well GStreamer is a great way to write multimedia applications and we strongly recommend it, but we do not recommend your GStreamer application using the v4l2 or libcamera plugins, instead we recommend that you use the PipeWire plugins, this is of course a little different from the audio side where PipeWire supports the PulseAudio and Jack APIs and thus you don’t need to port, but by targeting the PipeWire plugins in GStreamer your GStreamer application will get the full PipeWire featureset.
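
As a rough illustration of what such a port can look like, here is a minimal sketch assuming PyGObject and the GStreamer PipeWire plugin (element and property availability depends on your PipeWire and GStreamer versions):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
# pipewiresrc pulls the video stream through PipeWire instead of opening
# /dev/video* directly; in a sandboxed app the node to use would come from
# the camera portal.
pipeline = Gst.parse_launch("pipewiresrc ! videoconvert ! autovideosink")
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE, Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)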

So what is our plan of action?
So we will start putting the pieces in place for this, step by step, in Fedora Workstation. We have already started on this by working on the libcamera support in PipeWire and packaging libcamera for Fedora. We will set it up so that PipeWire has the option to switch between v4l2 and libcamera, so that most users can keep using v4l2 through PipeWire for the time being, while we work with upstream and the community to mature libcamera and its PipeWire backend. We will also enable the device discoverer for PipeWire.

We are also working on maturing the GStreamer elements for PipeWire for the video capture use case, as we expect a lot of application developers will just be using GStreamer as opposed to targeting PipeWire directly. We will start with Cheese as our initial testbed for this work, as it is a fairly simple application, using it as a proof of concept to have it use PipeWire for camera access. We are still trying to decide if we will make Cheese speak directly with PipeWire, or have it talk to PipeWire through the pipewiresrc GStreamer plugin, as both approaches have their pros and cons in the context of testing and verifying this.

We will also start working with the Chromium and Firefox projects to have them use the Camera portal and PipeWire for camera support just like we did work with them through WebRTC for the screen sharing support using PipeWire.

There are a few major items we are still trying to decide upon in terms of the interaction between PipeWire and the Camera portal API. It would be tempting to see if we can hide the Camera portal API behind the PipeWire API, or failing that at least hide it for people using the GStreamer plugin. That way all applications would get the portal support for free when porting to GStreamer, instead of requiring use of the Camera portal API as a second step. On the other hand, you need to set up the screen sharing portal yourself, so it would probably make things more consistent if we left it to application developers to do for camera access too.

What do we want from the community here?
The first step is just to help us with testing as we roll this out in Fedora Workstation and Cheese. While libcamera was written motivated by MIPI cameras, all webcams are meant to work through it, and thus all webcams are meant to work with PipeWire using the libcamera backend. At the moment that is not the case, and thus community testing and feedback is critical for helping us and the libcamera community to mature libcamera. We hope that by allowing you to easily configure PipeWire to use the libcamera backend (and switch back after you are done testing) we can get a lot of you to test and let us know which cameras are not working well yet.

A little further down the road please start planning moving any application you maintain or contribute to away from v4l2 API and towards PipeWire. If your application is a GStreamer application the transition should be fairly simple going from the v4l2 plugins to the pipewire plugins, but beyond that you should familiarize yourself with the Camera portal API and the PipeWire API for accessing cameras.

For further news and information on PipeWire follow our @PipeWireP twitter account and for general news and information about what we are doing in Fedora Workstation make sure to follow me on twitter @cfkschaller.
PipeWire

by uraeus at October 01, 2021 05:38 PM

September 29, 2021

Thibault SaunierGStreamer: one repository to rule them all

(Thibault Saunier)

For the last years, the GStreamer community has been analysing and discussing the idea of merging all the modules into one single repository. Since all the official modules are released in sync and the code evolves simultaneously between those repositories, having the code split was a burden and several core GStreamer developers believed that it was worth making the effort to consolidate them into a single repository. As announced a while back this is now effective and this post is about explaining the technical choices and implications of that change.

You can also check out our Monorepo FAQ for a list of questions and answers.

Technical details of the unification

Since we moved to meson as a build system a few years ago we implemented gst-build which leverages the meson subproject feature to build all GStreamer modules as one single project. This greatly enhanced the development experience of the GStreamer framework but we considered that we could improve it even more by having all GStreamer code in a single repository that looks the same as gst-build.

This is what the new unified git repository looks like: essentially gst-build, living in the main gstreamer repository, except that all the code from the GStreamer modules located in the subprojects/ directory is checked in.

This new setup now lives in the main default branch of the gstreamer repository; the master branches of all the other module repositories are now retired and frozen, and no new merge requests or code changes will be accepted there.

This is only the first step and we will consider reorganizing the repository in the future, but the goal is to minimize disruptions.

The technical process for merging the repositories looks like:

foreach GSTREAMER_MODULE
    git remote add GSTREAMER_MODULE.name GSTREAMER_MODULE.url
    git fetch GSTREAMER_MODULE.name
    git merge GSTREAMER_MODULE.name/master
    git mv list_all_files_from_merged_gstreamer_module() GSTREAMER_MODULE.shortname
    git commit -m "Moved all files from " + GSTREAMER_MODULE.name
endforeach

This allows us to keep the exact same history (and checksum of each commit) for all the old gstreamer modules in the new repository which guarantees that the code is still exactly the same as before.

Releases with the new setup

In the same spirit of avoiding disruption, releases will look exactly the same as before. In the new unique gstreamer repository we still have meson subprojects for each GStreamer module and they will have their own release tarballs. In practice, this means that not much (nothing?) should change for distribution packagers and consumers of GStreamer tarballs.

What should I do with my pending MRs in old modules repositories?

Since we cannot create new merge requests in your name on GitLab, we wrote a move_mrs_to_monorepo script that you can run yourself. The script is located in the gstreamer repository and you can start moving all your pending MRs by simply calling it (scripts/move_mrs_to_monorepo.py) and following the instructions.


You can also check out our Monorepo FAQ for a list of questions and answers.

Thanks to everyone in the community for providing us with all the feedback and thanks to Xavier Claessens for co-leading the effort.

We are still working on making the transition go as smoothly as possible, and if you have any questions don’t hesitate to come talk to us in #gstreamer on the OFTC IRC network.

Happy GStreamer hacking!

by thiblahute at September 29, 2021 09:34 PM

September 24, 2021

Christian SchallerFedora Workstation: Our Vision for Linux Desktop

(Christian Schaller)

Fedora Workstation
So I have spoken about our vision for Fedora Workstation quite a few times before, but I feel it is often useful to get back to it as we progress with our overall effort. So if you have read some of my blog posts about Fedora Workstation over the last 5 years, be aware that there is probably little new in here for you. If you haven’t read them however, this is hopefully a useful primer on what we are trying to achieve with Fedora Workstation.

The first few years after we launched Fedora Workstation in 2014 we focused a lot on establishing a good culture around what we were doing with Fedora, making sure that it was a good day-to-day desktop driver for people, and not just a great place to develop the operating system itself. I think it was Fedora Project Lead Matthew Miller who phrased it very well when he said that we want to be Leading Edge, not Bleeding Edge. We also took a good look at the operating system from an overall stance and tried to map out where Linux tended to fall short as a desktop operating system, and also tried to ask ourselves what our core audience would and should be. We refocused our efforts on being a great Operating System for all kinds of developers, but I think it is fair to say that we decided that was too narrow a wording, as our efforts are truly to reach makers of all kinds, like graphics artists and musicians, in addition to coders. So I thought I’d go through our key pillar efforts and talk about where they are at and where they are going.

Flatpak

Flatpak logo
One of the first things we concluded was that our story for people who wanted to deploy applications to our platform was really bad. The main challenge was that the platform was moving very fast and it was a big overhead for application developers to keep on top of the changes. In addition to that, since the Linux desktop is so fragmented, the application developers would have to deal with the fact that there were 20 different variants of this platform, all moving at a different pace. The way Linux applications were packaged, with each dependency being packaged independently of the application, created pains on both sides: for the application developer it meant the world kept moving underneath them with limited control, and for the distributions it meant packaging pains as different applications who all depended on the same library might work or fail with different versions of a given library. So we concluded we needed a system which allowed us to decouple the application from the host OS, to let application developers update their platform at a pace of their own choosing and at the same time unify the platform in the sense that the application should be able to run without problems on the latest Fedora releases, the latest RHEL releases or the latest versions of any other distribution out there. As we looked at it we realized there were some security downsides compared to the existing model, since the OS vendor would not be in charge of keeping all libraries up to date and secure, so sandboxing the applications ended up a critical requirement. At the time Alexander Larsson was working on bringing Docker to RHEL and Fedora so we tasked him with designing the new application model. The initial idea was to see if we could adjust Docker containers to the desktop usecase, but Docker containers as they stood at that time were very unsuited for the purpose of hosting desktop applications, and our experience working with the Docker upstream at the time was that they were not very welcoming to our contributions. So in light of how major the changes we would need to implement were, and the unlikelihood of getting them accepted upstream, Alex started on what would become Flatpak. Another major technology that was coincidentally being developed at the same time was OSTree by Colin Walters. To this day I think the best description of OSTree is that it functions as a git for binaries, meaning it allows you a simple way to maintain and update your binary applications with minimally sized updates. It also provides some disk deduplication which we felt was important due to the duplication of libraries and so on that containers bring with them. Finally, another major design decision Alex made was that the runtime/baseimage should be hosted outside the container, to make it possible to update the runtime independently of the application with relevant security updates etc.

Today there is a thriving community around Flatpaks, with the center of activity being Flathub, the Flatpak application repository. In Fedora Workstation 35 you should start seeing Flatpaks from Flathub being offered as long as you have 3rd party repositories enabled. Also underway is an effort led by Owen Taylor to integrate Flatpak building into the internal tools we use at Red Hat for putting RHEL together, with the goal of switching over to Flatpaks as our primary application delivery method for desktop applications in RHEL and to help us bridge the Fedora and RHEL application ecosystems.

You can follow the latest news from Flatpak through the official Flatpak twitter account.

Silverblue

So another major issue we decided needed improvement was that of OS upgrades (as opposed to application updates). The model pursued by Linux distros since their inception is one of shipping their OS as a large collection of independently packaged libraries. This setup is inherently fragile and requires a lot of quality engineering and testing to avoid problems, but even then things sometimes fail, especially in a fast moving OS like Fedora. A lot of configuration changes and updates have traditionally been done through scripts and similar, making rollback to an older version in cases where there is a problem also very challenging. Adventurous developers could also have made changes to their own copy of the OS that would break the upgrade later on. So thanks to all the great efforts to test and verify upgrades they usually go well for most users, but we wanted something even more sturdy. So the idea came up to move to an image based OS model, similar to what people had gotten used to on their phones. And OSTree once again became the technology we chose to do this, especially considering it was being used in Red Hat’s first foray into image based operating systems for servers (the server effort later got rolled into CoreOS as part of Red Hat acquiring CoreOS). The idea is that you ship the core operating system as a singular image and then to upgrade you just replace that image with a new image, and thus the risks of problems are greatly reduced. On top of that each of those images can be tested and verified as a whole by your QE and test teams. Of course we realized that a subset of people would still want to be able to tweak their OS, but once again OSTree came to our rescue as it allows developers to layer further RPMs on top of the OS image, including replacing current system libraries with for instance newer ones. The great thing about OSTree layering is that once you are done testing/using the layered RPMs you can with a very simple command just drop them again and go back to the upstream image. So combined with applications being shipped as Flatpaks this would create an OS that is a lot more sturdy, secure and simple to update and with a lot lower chance of an OS update breaking any of your applications. On top of that OSTree allows us to do easy OS rollbacks, so if the latest update somehow doesn’t work for you, you can quickly roll back while waiting for the issue you are having to be fixed upstream. And hence Fedora Silverblue was born as the vehicle for us to develop and evolve an image based desktop operating system.

You can follow our efforts around Silverblue through the offical Silverblue twitter account.

Toolbx

Toolbox with RHEL

Toolbox pet container with RHEL UBI


So Flatpak helped us address a lot of the gaps in making a better desktop OS on the application side, and Silverblue was the vehicle for our vision on the OS side, but we realized that we also needed some way for all kinds of developers to be able to easily take advantage of the great resource that is the Fedora RPM package universe and the wider tools universe out there. We needed something that provided people with a great terminal experience. We had already been working on various smaller improvements to the terminal for a while, but we realized we needed something a lot more substantial. Accessing an immutable OS like Silverblue through a terminal window tends to be quite limiting. So that is usually not what you want to do, and you also don’t want to rely on OSTree layering for running all your development tools and so on, as that is going to be potentially painful when you upgrade your OS.
Luckily the container revolution happening in the Linux world pointed us to the solution here too, as while containers were being rolled out the concept of ‘pet containers’ was also born. The idea of a pet container is that unlike general containers (sometimes referred to as cattle containers), pet containers are containers that you care about on an individual level, like your personal development environment. In fact pet containers even improve on how we used to do things, as they allow you to very easily maintain different environments for different projects. So for instance if you have two projects, hosted in two separate pet containers, where the two projects depend on two different versions of python, then containers make that simple as they ensure that there is no risk of one of your projects ‘contaminating’ the others with its dependencies, yet at the same time allow you to grab RPMs or other kinds of packages from upstream resources and install them in your container. In fact while inside your pet container the world feels a lot like it always has when on the linux command line. Thanks to the great effort of Dan Walsh and his team we had a growing number of easy to use container tools available to us, like podman. Podman is developed with the primary use case being running and deploying your containers at scale, managed by OpenShift and Kubernetes. But it also gave us the foundation we needed for Debarshi Ray to kick off the Toolbx project to ensure that we had an easy to use tool for creating and managing pet containers. As a bonus Toolbx allows us to achieve another important goal, to allow Fedora Workstation users to develop applications against RHEL in a simple and straightforward manner, because Toolbx allows you to create RHEL containers just as easily as it allows you to create Fedora containers.

You can follow our efforts around Toolbox on the official Toolbox twitter account

Wayland

Ok, so between Flatpak, Silverblue and Toolbox we have the vision clear for how to create a robust OS, with a great story for application developers to maintain and deliver applications for it, and with Toolbox providing a great developer story on top of this OS. But we also looked at the technical state of the Linux desktop and realized that there were some serious deficits we needed to address. One of the first ones we saw was the state of graphics, where X.org had served us well for many decades, but its age was showing and adding new features as they came in was becoming more and more painful. Kristian Høgsberg had started work on an alternative to X while still at Red Hat called Wayland, an effort he and a team of engineers were pushing forward at Intel. There was a general agreement in the wider community that Wayland was the way forward, but apart from Intel there was little serious development effort being put into moving it forward. On top of that, Canonical at the time had decided to go off on their own and develop their own alternative architecture in competition with X.org and Wayland. So as we were seeing a lot of things happening on the graphics horizon, like HiDPI, and were also getting requests to come up with a way to make Linux desktops more secure, we decided to team up with Intel and get Wayland into a truly usable state on the desktop. So we put many of our top developers, like Olivier Fourdan, Adam Jackson and Jonas Ådahl, on working on maturing Wayland as quickly as possible.
As things would have it we also ended up getting a lot of collaboration and development help coming in from the embedded sector, where companies such as Collabora were helping to deploy systems with Wayland onto various kinds of embedded devices and contributing fixes and improvements back up to Wayland (and Weston). To be honest I have to admit we did not fully appreciate what a herculean task it would end up being to get Wayland production ready for the desktop, and it took us quite a few Fedora releases before we decided it was ready to go. As you might imagine dealing with 30 years of technical debt is no easy thing to pay down, and while we kept moving forward at a steady pace there always seemed to be a new batch of issues to be resolved, but we managed to do so, not just by maturing Wayland, but also by porting major applications, such as Martin Stransky porting Firefox and Caolan McNamara porting LibreOffice over to Wayland. At the end of the day I think what saw us through to success was the incredible collaboration happening upstream between a large host of individual contributors and companies, and having the support of the X.org community. And even when we had the whole thing put together there were still practical issues to overcome, like how we had to keep defaulting to X.org in Fedora when people installed the binary NVidia driver, because that driver did not work with XWayland, the X backwards compatibility layer in Wayland. Luckily that is now in the process of becoming a thing of the past, with the latest NVidia driver updates supporting XWayland and us working closely with NVidia to ensure the driver and windowing stack work well.

PipeWire

Pipewire in action

Example of PipeWire running


So now we had a clear vision for the OS and a much improved and much more secure graphics stack in the form of Wayland, but we realized that all the new security features brought in by Flatpak and Wayland also made certain things like desktop capturing/remoting and web camera access a lot harder. Security is great and critical, but just like the old joke about the most secure computer being the one that is turned off, we realized that we needed to make sure these things kept working, but in a secure and better manner. Thankfully we have GStreamer co-creator Wim Taymans on the team and he thought he could come up with a pulseaudio equivalent for video that would allow us to offer screen capture and webcam access in a convenient and secure manner.
As Wim was prototyping what we called PulseVideo at the time, we also started discussing the state of audio on Linux. Wim had contributed to PulseAudio to add a security layer to it, to make it harder for a rogue application to, for instance, eavesdrop on you using your microphone, but since it was not part of the original design it wasn't a great solution. At the same time we talked about how our vision for Fedora Workstation was to make it the natural home for all kinds of makers, which included musicians, but how the separateness of the pro-audio community was getting in the way of that, especially due to the uneasy co-existence of PulseAudio on the consumer side and Jack for the pro-audio side. As part of his development effort Wim came to the conclusion that he could make the core logic of his new project so fast and versatile that it should be able to deal with the low latency requirements of the pro-audio community and also serve its purpose well on the consumer audio and video side. Having audio and video in one shared system would also be an improvement for us in terms of dealing with combined audio and video sources, as guaranteeing audio/video sync for instance had often been a challenge in the past. So Wim's effort evolved into what we today call PipeWire, which I am going to be brave enough to say has been one of the most successful launches of a major new Linux system component we have ever done. Replacing two old sound servers while at the same time adding video support is no small feat, but Wim is working very hard on fixing bugs as quickly as they come in and ensuring users have a great experience with PipeWire. And at the same time we are very happy that PipeWire now provides us with the ability to offer musicians and sound engineers a new home in Fedora Workstation.

You can follow our efforts on PipeWire on the PipeWire twitter account.

Hardware support and firmware

In parallel with everything mentioned above we were looking at the hardware landscape surrounding desktop Linux. One of the first things we realized was horribly broken was firmware support under Linux. More and more of the hardware smarts were being found in the firmware, yet firmware access under Linux and the firmware update story were basically non-existent. As we were discussing this problem internally, Peter Jones, who is our representative on the UEFI standards committee, pointed out that we were probably better poised than ever to actually do something about this problem, since UEFI was causing the firmware update process on most laptops and workstations to become standardized. So we teamed Peter up with Richard Hughes and out of that collaboration fwupd and LVFS were born. And in the years since we launched that we have gone from having next to no firmware updates available on Linux (and the little we had only available through painful processes like burning bootable CDs etc.) to now having a lot of hardware with firmware update support and more getting added almost on a weekly basis.
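
For anyone who has not tried it, the day-to-day workflow fwupd enables is pleasantly boring. On a machine whose vendor publishes firmware to the LVFS it boils down to roughly the following commands (a sketch of the common case; exact behaviour depends on your hardware, and GNOME Software can also handle all of this for you):

  $ fwupdmgr refresh      # fetch the latest firmware metadata from the LVFS
  $ fwupdmgr get-updates  # list devices that have a pending firmware update
  $ fwupdmgr update       # download and apply the updates (may require a reboot)
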
For the latest and greatest news around LVFS the best source of information is Richard Hughes' twitter account.

In parallel to this Adam Jackson worked on glvnd, which provided us with a way to have multiple OpenGL implementations on the same system. For those who have been using Linux for a while, I am sure you remember the pain of the NVidia driver and Mesa fighting over who provided OpenGL on your system, as it was all tied to a specific .so name. There were a lot of hacks being used out there to deal with that situation, of varying degrees of fragility, but with the advent of glvnd nobody has to care about that problem anymore.
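
If you are curious which vendor library glvnd is dispatching to on your system, something like the following works as a quick check. This assumes your libglvnd build honors the __GLX_VENDOR_LIBRARY_NAME override, which recent versions do:

  $ glxinfo -B | grep "OpenGL vendor"                                 # vendor picked by glvnd
  $ __GLX_VENDOR_LIBRARY_NAME=mesa glxinfo -B | grep "OpenGL vendor"  # force the Mesa implementation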

We also decided that we needed to have a part of the team dedicated to looking at what was happening in the market and work on covering important gaps. And by gaps I mean fixing the things that keep hardware vendors from being able to properly support Linux, not writing drivers for them. Instead we have been working closely with Dell and Lenovo to ensure that their suppliers provide drivers for their hardware, and when needed we work to provide a framework for them to plug their hardware into. This has led to a series of small but important improvements, like getting the fingerprint reader stack on Linux to a state where hardware vendors can actually support it, bringing Thunderbolt support to Linux through Bolt, support for high definition and gaming mice through the libratbag project, support in the Linux kernel for the new laptop privacy screen feature, improved power management support through the power profiles daemon and now recently hiring a dedicated engineer to get HDR support fully in place in Linux.

Summary

So to summarize: we are of course not over the finish line with our vision yet. Silverblue is a fantastic project, but we are not yet ready to declare it the official version of Fedora Workstation, mostly because we want to give the community more time to embrace the Flatpak application model and developers more time to embrace the pet container model. Especially applications like IDEs that cross the boundary between being in their own Flatpak sandbox while also interacting with things in your pet container and calling out to system tools like gdb need more work, but Christian Hergert has already done great work solving the problem in GNOME Builder while Owen Taylor has put together support for using Visual Studio Code with pet containers. So hopefully the wider universe of IDEs will follow suit; in the meantime one would need to call them from the command line from inside the pet container.
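
As a rough illustration of what the pet container workflow looks like today, the standard Toolbox commands are all you need (the container name here is just an example):

  $ toolbox create devbox      # create a mutable "pet" container based on the Fedora image
  $ toolbox enter devbox       # drop into a shell inside it
  $ sudo dnf install gcc gdb   # install your development tools inside the container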

The good thing here is that Flatpaks and Toolbox also work great on traditional Fedora Workstation; you can get the full benefit of both technologies even on a traditional distribution, so we can allow for a soft and easy transition.
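
For example, on a traditional Fedora Workstation install you can already pull applications from Flathub today with a couple of commands (GNOME Builder shown as an arbitrary example):

  $ flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
  $ flatpak install flathub org.gnome.Builder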

So for anyone who made it this far, apologies for this becoming a little novel, that was not my intention when I started writing it :)

Feel free to follow my personal twitter account for more general news and updates on what we are doing around Fedora Workstation.

by uraeus at September 24, 2021 05:40 PM

September 23, 2021

GStreamerGStreamer 1.19.2 unstable development release and git monorepo transition

(GStreamer)

GStreamer 1.19.2 unstable development release

The GStreamer team is pleased to announce the second development release in the unstable 1.19 release series.

The unstable 1.19 release series adds new features on top of the current stable 1.18 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

The unstable 1.19 release series is for testing and development purposes in the lead-up to the stable 1.20 series which is scheduled for release in a few weeks time. Any newly-added API can still change until that point, although it is rare for that to happen.

Full release notes will be provided in the near future, highlighting all the new features, bugfixes, performance optimizations and other important changes.

This development release is primarily for distributors and early adopters and anyone who still needs to update their build/packaging setup for Meson.

Packagers: please note that plugins may have moved between modules, so please take extra care and make sure inter-module version dependencies are such that users can only upgrade all modules in one go, instead of seeing a mix of 1.19 and 1.18 on their system. Also, gst-plugins-good 1.19.2 depends on 1.19.2 gst-plugins-base (even if the meson dependency check claims otherwise).

Binaries for Android, iOS, Mac OS X and Windows will also be available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please let us know of any issues you run into by filing an issue in Gitlab.

Git monorepo transition

Following the 1.19.2 release, all git master branches in git are now frozen and will stay frozen as we embark on some infrastructure plumbing that will see many of our git modules merged into a mono repository, which should massively streamline and improve our developer experience.

Expect a few days of disruptions, but we'll do our best to get the show back on the road as quickly as possible!

We'll let you know when it's all done and will post more details and migration instructions once everything's in place.

When we emerge on the other side there will be a shiny new main branch in the GStreamer repository and development will continue there.

Nothing will happen to the existing stable branches and repositories, and the existing master branches will also stay in place for the time being, but will no longer be updated.

There will still be separate release tarballs for the various modules.

More details soon, stay tuned.

September 23, 2021 12:00 PM

September 16, 2021

Christian SchallerCool happenings in Fedora Workstation land

(Christian Schaller)

Been some time since my last update, so I felt it was time to flex my blog writing muscles again and provide some updates on some of the things we are working on in Fedora in preparation for Fedora Workstation 35. This is not meant to be a comprehensive what's new article about Fedora Workstation 35, more a listing of some of the things we are doing as part of the Red Hat desktop team.

NVidia support for Wayland
One thing we have spent a lot of effort on for a long time now is getting full support for the NVidia binary driver under Wayland. It has been a recurring topic in our bi-weekly calls with the NVidia engineering team ever since we started looking at moving to Wayland. There has been basic binary driver support for some time, meaning you could run a native Wayland session on top of the binary driver, but the critical missing piece was that you could not get support for accelerated graphics when running applications through XWayland, our X.org compatibility layer. Which basically meant that any application requiring 3D support and which wasn't a native Wayland application yet wouldn't work. So over the last months we have had a great collaboration with NVidia around closing this gap, with them working closely with us in fixing issues in their driver while we have been fixing bugs and missing pieces in the rest of the stack. We have been reporting and discussing issues back and forth, allowing a very quick turnaround on issues as we find them, which of course all resulted in the NVidia 470.42.01 driver with XWayland support. I am sure we will find new corner cases that need to be resolved in the coming months, but I am equally sure we will be able to quickly resolve them due to the close collaboration we have now established with NVidia. And I know some people will wonder why we spent so much time working with NVidia around their binary driver, but the reality is that NVidia is the market leader, especially in the professional Linux workstation space, and there are a lot of people who would either end up not using Linux or using Linux with X without it, including a lot of Red Hat customers and Fedora users. And that is what I and my team are here for at the end of the day, to make sure Red Hat customers are able to get their job done using their Linux systems.

Lightweight kiosk mode
One of the wonderful things about open source is the constant flow of code and innovation between all the different parts of the ecosystem. For instance, one thing we on the RHEL side have often been asked about over the last few years is a lightweight and simple to use solution for people wanting to run single application setups, like information boards, ATM machines, cash registers, information kiosks and so on. For many use cases people felt that running a full GNOME 3 desktop underneath their application was either too resource hungry and/or created a risk that people accidentally end up in the desktop session. At the same time, from our viewpoint as a development team we didn't want a completely separate stack for this use case, as that would just increase our maintenance burden by forcing us to do a lot of things twice. So to solve this problem Ray Strode spent some time writing what we call GNOME Kiosk mode, which makes setting up a simple session running a single application easy, and without running things like the GNOME shell, tracker, evolution etc. This gives you a window manager with full support for the latest technologies such as compositing, libinput and Wayland, but coming in at about 18MB, which is about 71MB less than a minimal GNOME 3 desktop session. You can read more about the new Kiosk mode and how to use it in this great blog post from our savvy Edge Computing Product Manager Ben Breard. The kiosk mode session described in Ben's article about RHEL will be available with Fedora Workstation 35.
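
If you want to experiment with it, the rough shape of trying it on Fedora is installing the package and picking the kiosk session at the login screen. The package and session names below are my assumption; check Ben's post linked above for the authoritative instructions:

  $ sudo dnf install gnome-kiosk
  # then select the Kiosk session from the gear menu on the GDM login screen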

High-definition mouse wheel support
A major part of what we do is making sure that Red Hat Enterprise Linux customers and Fedora users get hardware support on par with what you find on other operating systems. We try our best to work with our hardware partners, like Lenovo, to ensure that such hardware support comes day and date with when those features are enabled on other systems, but some things end up taking longer for various reasons. Support for high-definition mouse wheels was one of those. Peter Hutterer, our resident input expert, put together a great blog post explaining the history and status of high-definition mouse wheel support. As Peter points out in his blog post the feature is not yet fully supported under Wayland, but we hope to close that gap in time for Fedora Workstation 35.

Mouse with HiRes scroll wheel

PipeWire
I feel I can't do one of these posts without talking about the latest developments in PipeWire, our unified audio and video server. Wim Taymans keeps working with the rapidly growing PipeWire community to fix issues as they are reported and add new features to PipeWire. Most recently Wim's focus has been on implementing S/PDIF passthrough over both S/PDIF and HDMI connections. This will allow us to send undecoded data over such connections, which is critical for working well with surround sound systems and soundbars. Also the PipeWire community has been working hard on further improving the Bluetooth support, with battery status support for the headset profile using Apple extensions. aptX-LL and FastStream codec support was also added. And of course a huge amount of bug fixes; it turns out that when you replace two different sound servers that have been around for close to two decades there are a lot of corner cases to cover :). Make sure to check out the two latest release notes for 0.3.35 and for 0.3.36 for details.
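
If you want to check which PipeWire version you are running and what it is up to, a couple of handy commands are the following (assuming the pipewire-pulse compatibility layer and the PipeWire utilities are installed, as they are by default on recent Fedora Workstation):

  $ pactl info | grep "Server Name"   # reports "PulseAudio (on PipeWire 0.3.x)" when PipeWire handles audio
  $ pw-top                            # live view of PipeWire nodes, their state and quantum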

EasyEffects is a great example of a cool new application built with PipeWire

Privacy screen
Another feature that we have been working on as a result of our Lenovo partnership is privacy screen support. For those not familiar with this technology, it basically allows you to reduce the readability of your screen when viewed from the side, so that if you are using your laptop at a coffee shop, for instance, a person sitting close by will have a much harder time trying to read what is on your screen. Hans de Goede has been shepherding the kernel side of this forward, working with Marco Trevisan from Canonical on the userspace part of it (which also makes this a nice example of cross-company collaboration), allowing you to turn this feature on or off. This feature though is not likely to fully land in time for Fedora Workstation 35, so we are looking at whether we will bring it in as an update to Fedora Workstation 35 or if it will be a Fedora Workstation 36 feature.

Penny

Zink inside the penny


As most of you know, the future of 3D graphics on Linux is the Vulkan API from the Khronos Group. This doesn't mean that OpenGL is going away anytime soon though, as there is a large host of applications out there using this API and for certain types of 3D graphics development developers might still choose to use OpenGL over Vulkan. Of course for us that creates a little bit of a challenge, because maintaining two 3D graphics interfaces is a lot of work, even with the great help and contributions from the hardware makers themselves. So we have been eyeing the Zink project for a while, which aims at re-implementing OpenGL on top of Vulkan, as a potential candidate for solving our long term needs to support the OpenGL API, but without drowning us in work while doing so. The big advantage of Zink is that it allows us to support one shared OpenGL implementation across all hardware and then focus our HW support efforts on the Vulkan drivers. As part of this effort Adam Jackson has been working on a project called Penny.

Zink implements OpenGL in terms of Vulkan, as far as the drawing itself is concerned, but presenting that drawing to the rest of the system is currently system-specific (GLX). For hardware that already has a Mesa driver, we use GBM. On NVIDIA's Vulkan (and probably any other binary stacks on Linux, and probably also environments like WSL or macOS + MoltenVK) we download the image from the GPU back to the CPU and then use the same software upload/display path as llvmpipe, which as you can imagine is Not Fast.

Penny aims to extend Zink by replacing both of those paths, and instead using the various Vulkan WSI extensions to manage presentation. Even for the GBM case this should enable higher performance since zink will have more information about the rendering pipeline (multisampling in particular is poorly handled atm). Future window system integration work can focus on Vulkan, with EGL and GLX getting features “for free” once they’re enabled in Vulkan.
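
If you want to see what Zink looks like today on a Mesa-based system, you can ask Mesa to route OpenGL through it for a single command. This is a quick sanity check rather than a supported configuration, and it assumes a Mesa build with Zink enabled:

  $ MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo -B | grep "OpenGL renderer"
  # the renderer string should mention "zink" when the override took effect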

3rd party software cleanup
Over time we have been working on adding more and more 3rd party software for easy consumption in Fedora Workstation. The problem we discovered though was that since this was done over time, with changing requirements and expectations, the functionality was not behaving in a very intuitive way and there were also new questions that needed to be answered. So Allan Day and Owen Taylor spent some time this cycle reviewing all the bits and pieces of this functionality and worked to clean it up. The goal is that when you enable third-party repositories in Fedora Workstation 35 it behaves in a much more predictable and understandable way and also includes a lot of applications from Flathub. Yes, that is correct: you should be able to install a lot of applications from Flathub in Fedora Workstation 35 without having to first visit the Flathub website to enable it; instead they will show up once you have turned the knob for general 3rd party application support.

Power profiles
Another item we spent quite a bit of time on for Fedora Workstation 35 is making sure we integrate the Power Profiles work that Bastien Nocera has been working on as part of our collaboration with Lenovo. Power Profiles is basically a feature that allows your system to behave in a smarter way when it comes to power consumption and thus prolongs your battery life. So for instance when we notice you are getting low on battery we can offer to put you into a strong power saving mode to prolong how long you can use the system until you can recharge. There is a more in-depth explanation of Power Profiles in the official README.
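
From the command line the feature is driven by power-profiles-daemon's small client tool; a quick way to poke at it looks roughly like this (the profile names shown are the standard ones, what is actually available depends on your hardware):

  $ powerprofilesctl list             # show the available profiles and which one is active
  $ powerprofilesctl set power-saver  # switch to the power saving profile
  $ powerprofilesctl get              # print the currently active profile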

Wayland
I usually also have ended up talking about Wayland in my posts, but I expect to be doing less of that going forward, as we have now covered all the major gaps we saw between Wayland and X.org. Jonas Ådahl got the headless support merged, which was one of our big missing pieces, and as mentioned above Olivier Fourdan, Jonas and others worked with NVidia on getting the binary driver with XWayland support working with GNOME Shell. Of course this being software we are never truly done; there will of course be new issues discovered, random bugs that need to be fixed, and also new features that need to be implemented. We already have our next big team focus in place, HDR support, which will need work from the graphics drivers, up through Mesa, into the window manager and the GUI toolkits and in the applications themselves. We have been investigating and trying out some things for a while already, but we are now ready to make this a main focus for the team. In fact we will soon be posting a new job listing for a full-time engineer to work on HDR vertically through the stack, so keep an eye out for that if you are interested in working on this. The job will be open to candidates who wish to work remotely, so as long as Red Hat has a business presence in the country you live in we should be able to offer you the job if you are the right candidate for us. Update: the job listing is now online for our HDR engineer.

BTW, if you want to see future updates and keep on top of other happenings from Fedora and Red Hat in the desktop space, make sure to follow me on twitter.

by uraeus at September 16, 2021 12:58 PM

September 08, 2021

GStreamerGStreamer 1.18.5 stable bug fix release

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the stable 1.18 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and important security fixes, and it should be safe to update from 1.18.x.

Highlighted bugfixes:

  • basesink: fix reverse frame stepping
  • downloadbuffer/sparsefile: several fixes for win32
  • systemclock: Update monotonic reference time when re-scheduling, fixes high CPU usage with gnome-music when pausing playback
  • audioaggregator: fix glitches when resyncing on discont
  • compositor: Fix NV12 blend operation
  • rtspconnection: Add IPv6 support for tunneled mode
  • avidemux: fix playback of some H.264-in-AVI streams
  • jpegdec: Fix crash when interlaced field height is not DCT block size aligned
  • qmlglsink: Keep old buffers around a bit longer if they were bound by QML
  • qml: qtitem: don't potentially leak a large number of buffers
  • rtpjpegpay: fix image corruption when compiled with MSVC on Windows
  • rtspsrc: seeking improvements
  • rtpjitterbuffer: Avoid generation of invalid timestamps
  • rtspsrc: Fix behaviour of select-streams, new-manager, request-rtcp-key and before-send signals with GLib >= 2.62
  • multiudpsink: Fix broken SO_SNDBUF get/set on Windows
  • openh264enc: fix broken sps/pps header generation and some minor leaks
  • mpeg2enc: fix interlace-mode detection and unbound memory usage if encoder can't keep up
  • mfvideosrc: Fix for negative MF stride and for negotiation when interlace-mode is specified
  • tsdemux: fix seek-with-stop regression and decoding errors after seeking with dvdlpcmdec
  • rtsp-server: seek handling improvements
  • gst-libav: fix build (and other issues) with ffmpeg 4.4
  • cerbero: spandsp: Fix build error with Visual Studio 2019
  • win32 packages: Fix hang in GLib when `G_SLICE` environment variable is set
  • various stability, performance and reliability improvements
  • memory leak fixes
  • build fixes

See the GStreamer 1.18.5 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Download tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-devtools, gstreamer-vaapi, gstreamer-sharp, gst-omx, or gstreamer-docs.

September 08, 2021 11:30 PM

August 12, 2021

Jan SchmidtOpenHMD update

(Jan Schmidt)

A while ago, I wrote a post about how to build and test my Oculus CV1 tracking code in SteamVR using the SteamVR-OpenHMD driver. I have updated those instructions and moved them to https://noraisin.net/diary/?page_id=1048 – so use those if you’d like to try things out.

The pandemic continues to sap my time for OpenHMD improvements. Since my last post, I have been working on various refinements. The biggest visible improvements are:

  • Adding velocity and acceleration API to OpenHMD.
  • Rewriting the pose transformation code that maps from the IMU-centric tracking space to the device pose needed by SteamVR / apps.

Adding velocity and acceleration reporting is needed in VR apps that support throwing things. It means that throwing objects and using gravity-grab to fetch objects works in Half-Life: Alyx, making it playable now.

The rewrite to the pose transformation code fixed problems where the rotation of controller models in VR didn’t match the rotation applied in the real world. Controllers would appear attached to the wrong part of the hand, and rotate around the wrong axis. Movements feel more natural now.

Ongoing work – record and replay

My focus going forward is on fixing glitches that are caused by tracking losses or outliers. Those problems happen when the computer vision code either fails to match what the cameras see to the device LED models, or when it matches incorrectly.

Tracking failure leads to the headset view or controllers ‘flying away’ suddenly. Incorrect matching leads to controllers jumping and jittering to the wrong pose, or swapping hands. Either condition is very annoying.

Unfortunately, as the tracking has improved the remaining problems get harder to understand and there is less low-hanging fruit for improvement. Further, when the computer vision runs at 52Hz, it’s impossible to diagnose the reasons for a glitch in real time.

I’ve built a branch of OpenHMD that uses GStreamer to record the CV1 camera video, plus IMU and tracking logs into a video file.

To go with those recordings, I’ve been working on a replay and simulation tool, that uses the Godot game engine to visualise the tracking session. The goal is to show, frame-by-frame, where OpenHMD thought the cameras, headset and controllers were at each point in the session, and to be able to step back and forth through the recording.

Right now, I’m working on the simulation portion of the replay, that will use the tracking logs to recreate all the poses.

by thaytan at August 12, 2021 05:30 PM

August 05, 2021

Bastien Nocerapower-profiles-daemon: Follow-up

(Bastien Nocera)

Just about a year after the original announcement, I think it's time to see the progress on power-profiles-daemon.

Note that I would still recommend you read the up-to-date project README if you have questions about why this project was necessary, and why a new project was started rather than building on an existing one.

 The project was born out of the need to make a firmware feature available to end-users for a number of lines of Lenovo laptops for them to be fully usable on Fedora. For that, I worked with Mark Pearson from Lenovo, who wrote the initial kernel support for the feature and served as our link to the Lenovo firmware team, and Hans de Goede, who worked on making the kernel interfaces more generic.

More generic, but in a good way

 With the initial kernel support written for (select) Lenovo laptops, Hans implemented a more generic interface called platform_profile. This interface is now the one that power-profiles-daemon will integrate with, and means that it also supports a number of Microsoft Surface, HP, Lenovo's own Ideapad laptops, and maybe Razer laptops soon.

 The next item to make more generic is Lenovo's "lap detection" which still relies on a custom driver interface. This should be soon transformed into a generic proximity sensor, which will mean I get to work some more on iio-sensor-proxy.

Working those interactions

 power-profiles-daemon landed in a number of distributions, sometimes enabled by default, sometimes not enabled by default (sigh, the less said about that the better), which fortunately meant that we had some early feedback available.

 The goal was always to have the user in control, but we still needed to think carefully about how the UI would look and how users would interact with it when a profile was temporarily unavailable, or the system started a "power saver" mode because battery was running out.

 The latter is something that David Redondo's work on the "HoldProfile" API made possible. Software can programmatically switch to the power-saver or performance profile for the duration of a command. This is useful to switch to the Performance profile when running a compilation (eg. powerprofilesctl jhbuild --no-interact build gnome-shell), or for gnome-settings-daemon to set the power-saver profile when low on battery.
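
If you want to see the HoldProfile API for yourself, the daemon's D-Bus interface can be inspected from a shell. The bus name and object path below are what current releases use, to the best of my knowledge:

 $ busctl --system introspect net.hadess.PowerProfiles /net/hadess/PowerProfiles
 # look for the HoldProfile method among the listed members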

 The aforementioned David Redondo and Kai Uwe Broulik also worked on the KDE interface to power-profiles-daemon, as Florian Müllner implemented the gnome-shell equivalent.

Promised by me, delivered by somebody else :)

 I took this opportunity to update the Power panel in Settings, which shows off the temporary switch to the performance mode, and the setting to automatically switch to power-saver when low on battery.

Low-Power, everywhere

 Talking of which, while it's important for the system to know that it's targeting a power saving behaviour, it's also pretty useful for applications to try and behave better.
 
 Maybe you've already integrated with "low memory" events using GLib, but thanks to Patrick Griffis you can be an even better ecosystem citizen and monitor whether the system is in "Power Saver" mode and adjust your application's behaviour.
 
 This feature will be available in GLib 2.70 along with documentation of useful steps to take. GNOME Software will already be using this functionality to avoid large automated downloads when energy saving is needed.
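
In GLib this surfaces as the GPowerProfileMonitor object, if memory serves. From a shell you can watch the equivalent state on the underlying daemon, which is handy when testing how your application reacts (bus name as for power-profiles-daemon above, an assumption on my part):

 $ busctl --system get-property net.hadess.PowerProfiles /net/hadess/PowerProfiles net.hadess.PowerProfiles ActiveProfile
 $ gdbus monitor --system --dest net.hadess.PowerProfiles   # watch profile changes live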

Availability

 The majority of the above features are available in the GNOME 41 development branches and should get to your favourite GNOME-friendly distribution for their next release, such as Fedora 35.

by Bastien Nocera (noreply@blogger.com) at August 05, 2021 03:50 PM

August 02, 2021

Phil NormandIntroducing the GNOME Web Canary flavor

(Phil Normand)

Today I am happy to unveil GNOME Web Canary which aims to provide bleeding edge, most likely very unstable builds of Epiphany, depending on daily builds of the WebKitGTK development version. Read on to know more about this.

Until recently the GNOME Web browser was available for end-users in two …

by Philippe Normand at August 02, 2021 12:00 PM

July 09, 2021

Víctor JáquezVideo decoding in GStreamer with Vulkan

Warning: Vulkan video is still work in progress, from specification to available drivers and applications. Do not use it for production software just yet.

Introduction

Vulkan is a cross-platform Application Programming Interface (API), backed by the Khronos Group, aimed at graphics developers for a wide range of different tasks. The interface is described by a common specification, and it is implemented by different drivers, usually provided by GPU vendors and Mesa.

One way to visualize Vulkan, at first glance, is like a low-level OpenGL API, but better described and easier to extend. Even more, it is possible to implement OpenGL on top of Vulkan. And, as far as I am told by my peers in Igalia, Vulkan drivers are easier and cleaner to implement than OpenGL ones.

A couple years ago, a technical specification group (TSG), inside the Vulkan Working Group, proposed the integration of hardware accelerated video compression and decompression into the Vulkan API. In April 2021 the formed Vulkan Video TSG published an introduction to the specification. Please, do not hesitate to read it. It’s quite good.

Matthew Waters worked on a GStreamer plugin using Vulkan, mainly for uploading, composing and rendering frames. Later, he developed a library mapping Vulkan objects to GStreamer. This work was key for what I am presenting here. In 2019, during the last GStreamer Conference, Matthew delivered a talk about his work. Make sure to watch it, it’s worth it.

Other key components for this effort were the base classes for decoders and the bitstream parsing libraries in GStreamer, jointly developed by Intel, Centricular, Collabora and Igalia. Both libraries allow using APIs for stateless video decoding and encoding within the GStreamer framework, such as Vulkan Video, VAAPI, D3D11, and so on.

When the graphics team in Igalia told us about the Vulkan Video TSG, we decided to explore the specification. Therefore, Igalia decided to sponsor part of my time to craft a GStreamer element to decode H.264 streams using these new Vulkan extensions.

Assumptions

As stated at the beginning of this text, this development has to be considered unstable and the APIs may change without further notice.

Right now, the only Vulkan driver that offers these extensions is the beta NVIDIA driver. You would need, at least, version 455.50.12 for Linux, but it would be better to grab the latest one. And, of course, I only tested this on Linux. I would like to thank NVIDIA for their Vk Video samples. Their test application drove my work.

Finally, this work assumes the use of the main development branch of GStreamer, because the base classes for decoders are quite recent. Naturally, you can use gst-build for an efficient upstream workflow.

Work done

This work basically consists of two new objects inside the GstVulkan code:

  • GstVulkanDeviceDecoder: a GStreamer object in GstVulkan library, inherited from GstVulkanDevice, which enables VK_KHR_video_queue and VK_KHR_video_decode_queue extensions. Its purpose is to handle codec-agnostic operations.
  • vulkanh264dec: a GStreamer element, inherited from GstH264Decoder, which tries to instantiate a GstVulkanDeviceDecoder to composite it and is in charge of handling codec-specific operations later, such as matching the parsed structures. It outputs, in the source pad, memory:VulkanImage featured frames, with NV12 color format.

So far this pipeline works without errors:

  $ gst-launch-1.0 filesrc location=big_buck_bunny_1080p_h264.mov ! parsebin ! vulkanh264dec ! fakesink

As you might see, the pipeline does not use vulkansink to render frames. This is because the Vulkan format output by the driver’s decoder device is VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, which is NV12 crammed in a single image, while for GstVulkan a NV12 frame is a buffer with two images, one per component. So the current color conversion in GstVulkan does not support this Vulkan format. That is future work, among other things.

You can find the merge request for this work in GStreamer’s Gitlab.

Future work

As was mentioned before, it is required to fully support the VK_FORMAT_G8_B8R8_2PLANE_420_UNORM format in GstVulkan. That requires thinking about how to keep backwards compatibility. Later, an implementation of the sampler to convert this format to RGB will be needed, so that decoded frames can be rendered by vulkansink.

Also, before implementing any new feature, the code and its abstractions will need to be cleaned up, since currently the division between codec-specific and codec-agnostic code is not strict, and it must be fixed.

Another important cleanup task is to enhance the way the Vulkan headers are handled. Since the required header files for video extensions are beta, they are not expected to be available in the system, so temporarily I had to add those headers as part of the GstVulkan library.

Then it will be possible to implement the H.265 decoder, since the NVIDIA driver also supports it.

Later on, it will be nice to start thinking about encoders. But this requires extending support for stateless encoders in GStreamer, something I want to do for the new VAAPI plugin too.

Thanks for bearing with me, and thanks to Igalia for sponsoring this work.

by vjaquez at July 09, 2021 05:38 PM

June 28, 2021

GStreamerGStreamer Rust bindings 0.17.0 release

(GStreamer)

A new version of the GStreamer Rust bindings, 0.17.0, was released.

As usual this release follows the latest gtk-rs release.

This is the first version that includes optional support for new GStreamer 1.20 APIs. As GStreamer 1.20 was not released yet, these new APIs might still change. The minimum supported version of the bindings is still GStreamer 1.8 and the targeted GStreamer API version can be selected by applications via feature flags.

Apart from this, the new version features a lot of API cleanup, especially of the subclassing APIs, and the addition of a few missing bindings. As usual, the focus of this release was to make usage of GStreamer from Rust as convenient and complete as possible.

The new release also brings a lot of bugfixes, most of which were already part of the 0.16.x bugfix releases.

A new release of the GStreamer Rust plugins will follow in the next days.

Details can be found in the release notes for gstreamer-rs.

The code and documentation for the bindings is available on the freedesktop.org GitLab as well as on crates.io.

If you find any bugs, notice any missing features or other issues please report them in GitLab.

June 28, 2021 11:00 PM

    June 24, 2021

    Seungha YangGStreamer Media Foundation Video Encoder Is Now Faster — Direct3D11 Awareness

    GStreamer Media Foundation Video Encoder Is Now Faster — Direct3D11 Awareness

    TL;DR

    GStreamer MediaFoundation video encoders (H.264, HEVC, and VP9 if supported by GPU) gained the ability to accept Direct3D11 textures, which will bring noticeable performance improvements

    As of the GStreamer 1.18 release, hardware accelerated Direct3D11/DXVA video decoding and MediaFoundation based video encoding features were landed.

    Those native Windows video APIs can be very helpful for application development/deployment, since they are hardware platform-agnostic APIs for the Windows platform. The questions is if they are sufficiently competitive with hardware-specific APIs such as NVIDIA NVCODEC SDK or Intel Media SDK?

    Probably the answer is … “NO”

    How much faster than before are things?

    One simple way to compare performance would be to measure the time spent for transcoding. Of course, encoded file size and visual quality are also very important factors. However, as per my experiments, resulting video file size and visual quality (in terms of PSNR) were very close to each other. Then our remaining interest is speed!

    Let’s take a look at my measurement. I performed the measurement by using one 4K H.264 video content with an NVIDIA RTX 3060 GPU and an Intel Core i7–1065G7 integrated GPU. For reference, NVCODEC and Intel Media SDK plugins were tested by using GStreamer 1.19.1 as well. Each test used performance (speed) oriented encoding options to be a fair comparison.

    - NVIDA RTX 3060

    GStreamer 1.18 — 2 min 1 sec
    GStreamer 1.19.1 1 min 9 sec
    NVCODEC plugin (nvh264dec/nvh264enc pair) — 1 min 19 sec

    - Intel Core i7–1065G7 integrated GPU

    GStreamer 1.18 — 3 min 8 sec
    GStreamer 1.19.1 — 2 min 45 sec
    Intel Media SDK plugin (msdkh264dec/msdkh264enc pair)3 min 10 sec

    So, is it true that the Direct3D11/DXVA and MediaFoundation combination can be faster than hardware-specific APIs? Yes, as you can see

    Note that such results would be very system environment and encoding option dependent, so, you’d likely see different numbers

Why the MediaFoundation plugin got faster

In GStreamer 1.18 the story was that, because of the lack of Direct3D11 integration on the MediaFoundation plugin side, each decoded frame (a Direct3D11 texture) had to be downloaded into system memory first, which is usually a very slow path. Then the memory was copied to another system memory area allocated by MediaFoundation. Moreover, the GPU driver would likely upload it to GPU memory again. Well, two visible redundant copies and another potential copy per frame!?!? hrm…

In GStreamer 1.19.1, thanks to the Direct3D11 integration, MediaFoundation can accept Direct3D11 textures, which means we don’t need to download the GPU texture and re-upload it any more.

More details

Since Direct3D11/DXVA, MediaFoundation, NVCODEC, and the Intel Media SDK all work with the underlying GPU hardware, the performance should not be much different in theory, unless there is visible overhead in the GPU vendor’s driver implementation.

Then the remaining factor would be API consumer-side optimization. And yes, from a GStreamer plugin implementation point of view, the Direct3D11/DXVA and MediaFoundation plugins are more optimized than the NVCODEC and MSDK plugins in terms of GPU memory transfer on Windows.

It doesn’t mean Direct3D11/DXVA and MediaFoundation themselves are superior APIs to the hardware-specific APIs at all. The difference is just the result of more or less optimized plugin implementations.

You can try this enhancement right now!

Install the official GStreamer 1.19.1 release, and just run this command:

gst-launch-1.0.exe filesrc location=where-your-h264-file-located ! parsebin ! d3d11h264dec ! queue ! mfh264enc ! h264parse ! mp4mux ! filesink location=my-output.mp4

You will likely be able to see the improvement for yourself :)
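
If you want to compare against the NVCODEC path from the measurements above, a roughly equivalent pipeline on an NVIDIA system would look something like the following (a sketch; element availability depends on your GStreamer build and driver):

gst-launch-1.0.exe filesrc location=where-your-h264-file-located ! parsebin ! nvh264dec ! queue ! nvh264enc ! h264parse ! mp4mux ! filesink location=nvcodec-output.mp4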

There are still a lot of interesting topics for better Windows support in GStreamer. Specifically, nicer text support via DirectWrite and fine tuned GPU scheduling via Direct3D12 are on our radar. And not only video features: we will keep improving various Windows specific features, including audio capture/render device support.

If you see any bugs, please contact me, or even better, file a bug report at GitLab. I’m watching it most of the time 😊

by Seungha Yang at June 24, 2021 04:01 PM

June 15, 2021

GStreamerIRC Channel has moved from Freenode to OFTC

(GStreamer)

Due to the widely reported issues at the Freenode IRC network, the official GStreamer discussion IRC channel has moved to #gstreamer on the OFTC IRC network alongside other Freedesktop projects.

You can connect to it with your existing IRC client, or using Matrix which has a browser client and native apps for all platforms.

For more information, please see the mailing list announcement.

June 15, 2021 08:30 AM

June 03, 2021

Thomas Vander SticheleAmazing Marvin and KeyCombiner

(Thomas Vander Stichele)

I recently came across an excellent tool called KeyCombiner that helps you practice keyboard shortcuts (3 sets for free, $29/6 months for more sets). I spent some time to create a set for Amazing Marvin, my current todo manager of choice.

The shareable URL to use in KeyCombiner is https://keycombiner.com/collecting/collections/shared/f1f78977-0920-4888-a86d-d00a7201502e

I generated it from the printed PDF version of Marvin’s keyboard guide and a bunch of manual editing, in a google sheet.

Keyboard shortcuts are great timesavers and help reduce friction, but it’s getting harder to learn them properly. This tool has been a great help for some other apps, for figuring out common shortcuts across apps, and for picking custom shortcuts (in other apps) that don’t conflict. If this is a problem you recognize, give KeyCombiner a try.

flattr this!

by Thomas at June 03, 2021 02:27 AM