## January 11, 2018

### Andy Wingo — spectre and the end of langsec

I remember in 2008 seeing Gerald Sussman, co-creator of the Scheme language, resignedly describing a sea change in the MIT computer science curriculum. In response to a question from the audience, he said:

> The work of engineers used to be about taking small parts that they understood entirely and using simple techniques to compose them into larger things that do what they want.
>
> But programming now isn't so much like that. Nowadays you muck around with incomprehensible or nonexistent man pages for software you don't know who wrote. You have to do basic science on your libraries to see how they work, trying out different inputs and seeing how the code reacts. This is a fundamentally different job.

Like many I was profoundly saddened by this analysis. I want to believe in constructive correctness, in math and in proofs. And so with the rise of functional programming, I thought that this historical slide from reason towards observation was just that, historical, and that the "safe" languages had a compelling value that would be evident eventually: that "another world is possible".

In particular I found solace in "langsec", an approach to assessing and ensuring system security in terms of constructively correct programs. One obvious application is parsing of untrusted input, and indeed the langsec.org website appears to emphasize this domain as one in which a programming languages approach can be fruitful. It is, after all, a truth universally acknowledged, that a program with good use of data types, will be free from many common bugs. So far so good, and so far so successful.

The basis of language security is starting from a programming language with a well-defined, easy-to-understand semantics. From there you can prove (formally or informally) interesting security properties about particular programs. For example, if a program has a secret k, but some untrusted subcomponent C of it should not have access to k, one can prove whether or not k can leak to C. This approach is taken, for example, by Google's Caja compiler to isolate components from each other, even when they run in the context of the same web page.

But the Spectre and Meltdown attacks have seriously set back this endeavor. One manifestation of the Spectre vulnerability is that code running in a process can now read the entirety of its address space, bypassing invariants of the language in which it is written, even if it is written in a "safe" language. This is currently being used by JavaScript programs to exfiltrate passwords from a browser's password manager, or bitcoin wallets.

Mathematically, in terms of the semantics of e.g. JavaScript, these attacks should not be possible. But practically, they work. Spectre shows us that the building blocks provided to us by Intel, ARM, and all the rest are no longer "small parts understood entirely"; that instead now we have to do "basic science" on our CPUs and memory hierarchies to know what they do.

What's worse, we need to do basic science to come up with adequate mitigations to the Spectre vulnerabilities (side-channel exfiltration of results of speculative execution). Retpolines, poisons and masks, et cetera: none of these are proven to work. They are simply observed to be effective on current hardware. Indeed, mitigations are anathema to the correctness-by-construction approach: if you can prove that a problem doesn't exist, what is there to mitigate?

Spectre is not the first crack in the edifice of practical program correctness. In particular, timing side channels are rarely captured in language semantics. But I think it's fair to say that Spectre is the most devastating vulnerability in the langsec approach to security that has ever been uncovered.

Where do we go from here? I see but two options. One is to attempt to make the machines targeted by secure language implementations behave rigorously as architecturally specified, and in no other way. This is the approach taken by all of the deployed mitigations (retpolines, poisoned pointers, masked accesses): modify the compiler and runtime to prevent the CPU from speculating through vulnerable indirect branches (prevent speculative execution), or from using fetched values in further speculative fetches (prevent this particular side channel). I think we are missing a model and a proof that these mitigations restore target architectural semantics, though.

However, if we did have a model of what a CPU does, we would have another opportunity, which is to incorporate that model in a semantics of the target language of a compiler (e.g. micro-x86 versus x86). It could be that this model produces a co-evolution of the target architectures as well, whereby Intel decides to disclose and expose more of its microarchitecture to user code. Caching and other microarchitectural side-effects would then become explicit rather than transparent.

Rich Hickey has this thing where he talks about "simple versus easy". Both of them sound good but for him, only "simple" is good whereas "easy" is bad. It's the sort of subjective distinction that can lead to an endless string of Worse Is Better Is Worse Bourbaki papers, according to the perspective of the author. Anyway, transparent caching in the CPU has been marvelously easy for most application developers and fantastically beneficial from a performance perspective. People needing constant-time operations have complained, of course, but that kind of person always complains. Could it be, though, that actually there is some other, better-is-better kind of simplicity that should replace the all-pervasive, now-treacherous transparent caching?

I don't know. All I will say is that an ad-hoc approach to determining which branches and loads are safe and which are not is not a plan that inspires confidence. Godspeed to the langsec faithful in these dark times.

## December 30, 2017

### Michael Sheldon — Speech Recognition – Mozilla’s DeepSpeech, GStreamer and IBus

Recently Mozilla released an open source implementation of Baidu’s DeepSpeech architecture, along with a pre-trained model using data collected as part of their Common Voice project.

In an attempt to make it easier for application developers to start working with the DeepSpeech model I’ve developed a GStreamer plugin, an IBus plugin and created some PPAs. To demonstrate what’s possible here’s a video of the IBus plugin providing speech recognition to any application under Linux:

Video of DeepSpeech IBus Plugin

### GStreamer DeepSpeech Plugin

I’ve created a GStreamer element which can be placed into an audio pipeline; it will then report any recognised speech via bus messages. It automatically segments audio based on configurable silence thresholds, making it suitable for continuous dictation.

Here are a couple of example pipelines using gst-launch.

To perform speech recognition on a file, printing all bus messages to the terminal:

```
gst-launch-1.0 -m filesrc location=/path/to/file.ogg ! decodebin ! audioconvert ! audiorate ! audioresample ! deepspeech ! fakesink
```

To perform speech recognition on audio recorded from the default system microphone, with changes to the silence thresholds:

```
gst-launch-1.0 -m pulsesrc ! audioconvert ! audiorate ! audioresample ! deepspeech silence-threshold=0.3 silence-length=20 ! fakesink
```

The source code is available here: https://github.com/Elleo/gst-deepspeech.

### IBus Plugin

I’ve also created a proof of concept IBus plugin which allows speech recognition to be used as an input method for virtually any application. It uses the above GStreamer plugin to perform speech recognition and then commits the text to the currently focused input field whenever a bus message is received from the deepspeech element.

It’ll need a lot more work before it’s really useful, especially in terms of adding in various voice editing commands, but hopefully it’ll provide a useful starting point for something more complete.

The source code is available here: https://github.com/Elleo/ibus-deepspeech

### PPAs

To make it extra easy to get started playing around with these projects I’ve also created a couple of PPAs for Ubuntu 17.10:

DeepSpeech PPA – This contains packages for libdeepspeech, libdeepspeech-dev, libtensorflow-cc and deepspeech-model (be warned, the model is around 1.3GB).

gst-deepspeech PPA – This contains packages for my GStreamer and IBus plugins (gstreamer1.0-deepspeech and ibus-deepspeech). Please note that you’ll also need the DeepSpeech PPA enabled to fulfil the dependencies of these packages.

I’d love to hear about any projects that find these plugins useful.

## December 22, 2017

### Sebastian Dröge — GStreamer Rust bindings release 0.10.0 & gst-plugin release 0.1.0

Today I’ve released version 0.10.0 of the Rust GStreamer bindings, and after a journey of more than 1½ years the first release of the GStreamer plugin writing infrastructure crate “gst-plugin”.

Check the repositories of both for more details, the code and various examples.

#### GStreamer Bindings

Some of the changes since the 0.9.0 release were already outlined in the previous blog post, and most of the other changes were also things I found while writing GStreamer plugins. For the full changelog, take a look at the CHANGELOG.md in the repository.

Other changes include

• I went over the whole API over the last few days, added any missing things I found, simplified the API where it made sense, changed functions to take Option<_> where allowed, etc.
• Bindings for using and writing typefinders. Typefinders are the part of GStreamer that try to guess what kind of media is to be handled based on looking at the bytes. Especially writing those in Rust seems worthwhile, considering that basically all of the GIT log of the existing typefinders consists of fixes for various kinds of memory-safety problems.
• Bindings for the Registry and PluginFeature were added, and the relevant API that works with paths/filenames was fixed to actually work on Paths.
• Bindings for the GStreamer Net library were added, allowing you to build applications that synchronize their media over the network by using PTP, NTP or a custom GStreamer protocol (for which there also exists a server). This could be used for building video walls, systems recording the same scene from multiple cameras, etc., and provides (depending on network conditions) up to sub-millisecond synchronization between devices.

Generally, this is something like a “1.0” release for me now (due to depending on too many pre-1.0 crates this is not going to be 1.0 anytime soon). The basic API is all there and nicely usable now, and hopefully without any bugs; the known-missing APIs are not too important for now and can easily be added at a later time when needed. At this point I don’t expect many API changes anymore.

#### GStreamer Plugins

The other important part of this announcement is the first release of the “gst-plugin” crate. This provides the basic infrastructure for writing GStreamer plugins and elements in Rust, without having to write any unsafe code.

I started experimenting with using Rust for this more than 1½ years ago, and while a lot of things have changed in that time, this release is a nice milestone. In the beginning there were no GStreamer bindings and I was writing everything manually, and there were also still quite a few pieces of code written in C. Nowadays everything is in Rust and using the automatically generated GStreamer bindings.

Unfortunately there is no real documentation for any of this yet; there’s only the autogenerated rustdoc documentation available from here, and various example GStreamer plugins inside the GIT repository that can be used as a starting point. Various people have already written their GStreamer plugins in Rust based on this.

The basic idea of the API is however that everything is as Rust-y as possible. Which might not be too much due to having to map subtyping, virtual methods and the like to something reasonable in Rust, but I believe it’s nice to use now. You basically only have to implement one or more traits on your structs, and that’s it. There’s still quite some boilerplate required, but it’s far less than what would be required in C. The best example at this point might be the audioecho element.

Over the next days (or weeks?) I’m not going to write any documentation yet, but instead will write a couple of very simple, minimal elements that do basically nothing and can be used as starting points to learn how all this works together. And will write another blog post or two about the different parts of writing a GStreamer plugin and element in Rust, so that all of you can get started with that.

Let’s hope that the number of new GStreamer plugins written in C is going to decrease in the future, and maybe even new people who would’ve never done that in C, with all the footguns everywhere, can get started with writing GStreamer plugins in Rust now.

### Sebastian Pölsterl — Denoising Autoencoder as TensorFlow estimator


I recently started to use Google's deep learning framework TensorFlow. Since version 1.3, TensorFlow includes a high-level interface inspired by scikit-learn. Unfortunately, as of version 1.4, only 3 different classification and 3 different regression models implementing the Estimator interface are included. To better understand the Estimator interface, Dataset API, and components in tf-slim, I started to implement a simple Autoencoder and applied it to the well-known MNIST dataset of handwritten digits. This post is about my journey.

I will assume that you are familiar with TensorFlow basics. The full code is available at https://github.com/sebp/tf_autoencoder.

## Estimators

The tf.estimator.Estimator is at the heart of TensorFlow's high-level interface and is similar to Keras's Model API. It hides most of the boilerplate required to train a model: managing Sessions, writing summary statistics for TensorBoard, or saving and loading checkpoints. An Estimator has three main methods: train, evaluate, and predict. Each of these methods requires a callable input function as its first argument that feeds the data to the estimator (more on that later).

## Custom estimators

You can write your own custom model implementing the Estimator interface by passing a function returning an instance of tf.estimator.EstimatorSpec as first argument to tf.estimator.Estimator.

```python
def model_fn(features, labels, mode):
    …
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        loss=total_loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops)
```

The first argument – mode – is one of tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL or tf.estimator.ModeKeys.PREDICT and determines which of the remaining values must be provided.

In TRAIN mode:

• loss: A Tensor containing a scalar loss value.
• train_op: An Op that runs one step of training. We can use the return value of tf.contrib.layers.optimize_loss here.

In EVAL mode:

• loss: A scalar Tensor containing the loss on the validation data.
• eval_metric_ops: A dictionary that maps metric names to Tensors of metrics to calculate, typically one of the tf.metrics functions.

In PREDICT mode:

• predictions: A dictionary that maps key names of your choice to Tensors containing the predictions from the model.

An important difference to the Estimators included with TensorFlow is that we need to call relevant tf.summary functions in model_fn ourselves. However, the Estimator will take care of writing summaries to disk so we can inspect them in TensorBoard.

## Autoencoder model

The Autoencoder model is straightforward: it consists of two major parts, an encoder and a decoder. The encoder has an input layer (28*28 = 784 dimensions in the case of MNIST) and one or more hidden layers, decreasing in size. In the decoder, we reverse the operations of the encoder by blowing the output of the smallest hidden layer up to the size of the input (optionally, with hidden layers of increasing size in-between). The loss function computes the difference between the original image and the reconstructed image (the output of the decoder). Common loss functions are mean squared error and cross-entropy.

To construct the encoder network, we specify a list containing the number of hidden units for each layer and (optionally) add dropout layers in-between:

```python
def encoder(inputs, hidden_units, dropout, is_training):
    net = inputs
    for num_hidden_units in hidden_units:
        net = tf.contrib.layers.fully_connected(
            net, num_outputs=num_hidden_units)
        if dropout is not None:
            net = slim.dropout(net, is_training=is_training)
    return net
```

In the full code, a helper add_hidden_layer_summary additionally adds a histogram of the activations and the fraction of non-zero activations to be displayed in TensorBoard. The latter is particularly useful when debugging networks with rectified linear units (ReLU). If too many hidden units return 0 values early during optimization, the model won't be able to learn anymore, in which case one would typically try to lower the learning rate or choose a different activation function.
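As an illustration of that metric, here is a plain-Python sketch (a hypothetical helper, not part of the actual code) of the fraction of non-zero activations:

```python
def fraction_nonzero(activations):
    """Fraction of activations that are non-zero, as reported in TensorBoard."""
    return sum(1 for a in activations if a != 0) / len(activations)

# A ReLU layer where most units output 0 is a sign of "dead" units:
relu_out = [0.0, 1.3, 0.0, 0.0, 2.1, 0.0]
print(fraction_nonzero(relu_out))  # 2 of 6 units are active
```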

The network of the decoder is almost identical; we just explicitly use a linear activation function (activation_fn=None) and no dropout in the last layer:

```python
def decoder(inputs, hidden_units, dropout, is_training):
    net = inputs
    for num_hidden_units in hidden_units[:-1]:
        net = tf.contrib.layers.fully_connected(
            net, num_outputs=num_hidden_units)
        if dropout is not None:
            net = slim.dropout(net, is_training=is_training)

    net = tf.contrib.layers.fully_connected(net, hidden_units[-1],
                                            activation_fn=None)
    tf.summary.histogram('activation', net)
    return net
```

You may have noticed that we did not specify any activation function so far. Thanks to TensorFlow's arg_scope context manager, we can easily set the activation function for all fully connected layers. At the same time we set an appropriate weight initializer and (optionally) use weight decay:

```python
def autoencoder(inputs, hidden_units, activation_fn, dropout, weight_decay, mode):
    is_training = mode == tf.estimator.ModeKeys.TRAIN

    weights_init = slim.initializers.variance_scaling_initializer()
    if weight_decay is None:
        weights_reg = None
    else:
        weights_reg = tf.contrib.layers.l2_regularizer(weight_decay)

    with slim.arg_scope([tf.contrib.layers.fully_connected],
                        weights_initializer=weights_init,
                        weights_regularizer=weights_reg,
                        activation_fn=activation_fn):
        net = encoder(inputs, hidden_units, dropout, is_training)
        n_features = inputs.shape[1].value
        decoder_units = hidden_units[:-1][::-1] + [n_features]
        net = decoder(net, decoder_units, dropout, is_training)
    return net
```

where slim.initializers.variance_scaling_initializer corresponds to the initialization of He et al., which is the current recommendation for networks with ReLU activations.
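To make the mirroring of encoder and decoder concrete, here is what decoder_units evaluates to for an example configuration of three hidden layers and MNIST-sized inputs:

```python
hidden_units = [128, 64, 32]   # encoder layer sizes (example values)
n_features = 28 * 28           # 784 input dimensions for MNIST

# Drop the innermost layer, reverse the remaining sizes, append the input size:
decoder_units = hidden_units[:-1][::-1] + [n_features]
print(decoder_units)  # [64, 128, 784]
```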

This concludes the architecture of the autoencoder. Next, we need to implement the model_fn function passed to tf.estimator.Estimator as outlined above.

## Autoencoder model_fn

First, we construct the network's architecture using the autoencoder function described above:

```python
logits = autoencoder(inputs=features,
                     hidden_units=hidden_units,
                     activation_fn=activation_fn,
                     dropout=dropout,
                     weight_decay=weight_decay,
                     mode=mode)
```

Subsequent steps depend on the value of mode. In prediction mode, we merely have to return the reconstructed image, therefore we make sure all values are within the interval [0; 1] by applying the sigmoid function:

```python
probs = tf.nn.sigmoid(logits)
predictions = {"prediction": probs}
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions)
```

In training and evaluation mode, we need to compute the loss, which is cross-entropy in this example:

```python
tf.losses.sigmoid_cross_entropy(labels, logits)
total_loss = tf.losses.get_total_loss(add_regularization_losses=is_training)
```

The second line is needed to add the $\ell_2$-losses used in weight decay.
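Written out, the total loss being minimized during training is the cross-entropy plus the weight decay penalty. Writing $\lambda$ for the weight_decay coefficient and $W_j$ for the weights of the fully connected layers (and noting that tf.nn.l2_loss, which l2_regularizer builds on, includes a factor of $\tfrac{1}{2}$):
$$L_{\text{total}} = L_{\text{CE}} + \frac{\lambda}{2} \sum_j \lVert W_j \rVert_2^2 .$$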

Most importantly, training relies on choosing an optimizer, here we use Adam and an exponential learning rate decay. The latter dynamically updates the learning rate during training according to the formula
$$\text{decayed learning rate} = \text{base learning rate} \cdot 0.96^{\lfloor i / 1000 \rfloor} ,$$ where $i$ is the current iteration. It would probably work as well without learning rate decay, but I included it for the sake of completeness.
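The staircase schedule can be checked in plain Python (a hypothetical helper for illustration, not part of the model code):

```python
def decayed_learning_rate(base_lr, step, decay_steps=1000, decay_rate=0.96):
    """Staircase exponential decay: drop by a factor of 0.96 every 1000 steps."""
    return base_lr * decay_rate ** (step // decay_steps)

print(decayed_learning_rate(0.001, 500))   # first plateau: still 0.001
print(decayed_learning_rate(0.001, 2500))  # 0.001 * 0.96**2
```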

```python
if mode == tf.estimator.ModeKeys.TRAIN:
    train_op = tf.contrib.layers.optimize_loss(
        loss=total_loss,
        learning_rate=learning_rate,
        optimizer="Adam",
        learning_rate_decay_fn=lambda lr, gs: tf.train.exponential_decay(
            lr, gs, 1000, 0.96, staircase=True),
        global_step=tf.train.get_global_step())

    # Add histograms for trainable variables
    for var in tf.trainable_variables():
        tf.summary.histogram(var.op.name, var)
```

Note that we add a histogram of all trainable variables for TensorBoard in the last part.

Finally, we compute the root mean squared error when in evaluation mode:

```python
if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {
        "rmse": tf.metrics.root_mean_squared_error(
            tf.cast(labels, tf.float64), tf.cast(probs, tf.float64))
    }
```

and return the specification of our autoencoder estimator:

```python
return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops=eval_metric_ops)
```

## Feeding data to an Estimator via the Dataset API

Once we constructed our estimator, e.g. via

```python
estimator = AutoEncoder(hidden_units=[128, 64, 32],
                        dropout=None,
                        weight_decay=1e-5,
                        learning_rate=0.001)
```

we would like to train it by calling train, which expects a callable that returns two tensors, one representing the input data and one the groundtruth data. The easiest way would be to use tf.estimator.inputs.numpy_input_fn, but instead I want to introduce TensorFlow's Dataset API, which is more generic.

The Dataset API comprises two elements:

1. tf.data.Dataset represents a dataset and any transformations applied to it.
2. tf.data.Iterator is used to extract elements from a Dataset. In particular, Iterator.get_next() returns the next element of a Dataset and typically is what is fed to an estimator.

Here, I'm using what is called an initializable Iterator, inspired by this post. We define one placeholder for the input image and one for the groundtruth image and initialize the placeholders before training starts using a hook. First, let's create a Dataset from the placeholders:

```python
placeholders = [
    tf.placeholder(data.dtype, data.shape, name='input_image'),
    tf.placeholder(data.dtype, data.shape, name='groundtruth_image')
]
dataset = tf.data.Dataset.from_tensor_slices(placeholders)
```

Next, we shuffle the dataset and allow retrieving data from it until the specified number of epochs has been reached:

```python
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.repeat(num_epochs)
```

When creating input for evaluation or prediction, we are going to skip these two steps.

Finally, we combine multiple elements into a batch and create an iterator from the dataset:

```python
dataset = dataset.batch(batch_size)

iterator = dataset.make_initializable_iterator()
next_example, next_label = iterator.get_next()
```
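The semantics of shuffle, repeat and batch can be mimicked by a plain-Python generator (an analogy only; the real Dataset builds graph operations instead):

```python
import random

def batches(data, num_epochs, batch_size, shuffle=True):
    """Plain-Python analogy of Dataset.shuffle().repeat().batch()."""
    for _ in range(num_epochs):
        items = list(data)
        if shuffle:
            random.shuffle(items)  # a fresh shuffle each epoch
        for i in range(0, len(items), batch_size):
            yield items[i:i + batch_size]

print(list(batches(range(5), num_epochs=2, batch_size=2, shuffle=False)))
# [[0, 1], [2, 3], [4], [0, 1], [2, 3], [4]]
```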

To initialize the placeholders, we need to call tf.Session.run with feed_dict = {placeholders[0]: input_data, placeholders[1]: groundtruth_data}. Since the Estimator will create a Session for us, we need a way to call our initialization code after the session has been created and before training begins. The Estimator's train, evaluate and predict methods accept a list of SessionRunHook subclasses as the hooks argument, which we can use to inject our code in the right place. Therefore, we first create a generic hook that runs after the session has been created:

```python
class IteratorInitializerHook(tf.train.SessionRunHook):
    """Hook to initialise data iterator after Session is created."""

    def __init__(self):
        self.iterator_initializer_func = None

    def after_create_session(self, session, coord):
        """Initialise the iterator after the session has been created."""
        assert callable(self.iterator_initializer_func)
        self.iterator_initializer_func(session)
```

To make things a little bit nicer, we create an InputFunction class which implements the __call__ method. Thus, it will behave like a function and we can pass it directly to tf.estimator.Estimator.train and related methods.

```python
class InputFunction:
    def __init__(self, data, batch_size, num_epochs, mode):
        self.data = data
        self.batch_size = batch_size
        self.mode = mode
        self.num_epochs = num_epochs
        self.init_hook = IteratorInitializerHook()

    def __call__(self):
        # Define placeholders
        placeholders = [
            tf.placeholder(self.data.dtype, self.data.shape, name='input_image'),
            tf.placeholder(self.data.dtype, self.data.shape, name='reconstruct_image')
        ]

        # Build dataset pipeline
        dataset = tf.data.Dataset.from_tensor_slices(placeholders)
        if self.mode == tf.estimator.ModeKeys.TRAIN:
            dataset = dataset.shuffle(buffer_size=10000)
            dataset = dataset.repeat(self.num_epochs)
        dataset = dataset.batch(self.batch_size)

        # Create iterator from dataset
        iterator = dataset.make_initializable_iterator()
        next_example, next_label = iterator.get_next()

        # Create initialization hook
        def _init(sess):
            feed_dict = dict(zip(placeholders, [self.data, self.data]))
            sess.run(iterator.initializer, feed_dict=feed_dict)

        self.init_hook.iterator_initializer_func = _init

        return next_example, next_label
```

Finally, we can use the InputFunction class to train our autoencoder for 30 epochs:

```python
from tensorflow.examples.tutorials.mnist import input_data as mnist_data

# Load MNIST (the download directory is an example path)
mnist = mnist_data.read_data_sets("MNIST_data")

train_input_fn = InputFunction(
    data=mnist.train.images,
    batch_size=256,
    num_epochs=30,
    mode=tf.estimator.ModeKeys.TRAIN)
estimator.train(train_input_fn, hooks=[train_input_fn.init_hook])
```

The video below shows ten reconstructed images from the test data and their corresponding groundtruth after each epoch of training.

## Denoising Autoencoder

A denoising autoencoder is a slight variation of the autoencoder described above. The only difference is that input images are randomly corrupted before they are fed to the autoencoder (we still use the original, uncorrupted image to compute the loss). This acts as a form of regularization to avoid overfitting.
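In scalar terms, corrupting a pixel means adding noise and clipping the result back into [0, 1] (a hypothetical plain-Python sketch; the actual code below operates on whole Tensors):

```python
def corrupt_pixel(pixel, noise):
    """Add noise and clip into [0, 1], mirroring tf.clip_by_value."""
    return min(max(pixel + noise, 0.0), 1.0)

print(corrupt_pixel(0.9, 0.4))    # clipped at the top: 1.0
print(corrupt_pixel(0.1, -0.5))   # clipped at the bottom: 0.0
print(corrupt_pixel(0.5, 0.25))   # unchanged by clipping: 0.75
```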

```python
def add_noise(input_img, groundtruth):
    noise_factor = 0.5  # a float in [0; 1)

    noise = noise_factor * tf.random_normal(input_img.shape.as_list())
    input_corrupted = tf.clip_by_value(tf.add(input_img, noise), 0., 1.)
    return input_corrupted, groundtruth
```

The function above takes two Tensors representing the input and groundtruth image, respectively, and corrupts the input image by the specified amount of noise. We can use this function to transform all of the images using Dataset's map function:

```python
dataset = dataset.map(add_noise, num_parallel_calls=4)
dataset = dataset.prefetch(512)
```

The function passed to map will be part of the compute graph, so you have to use TensorFlow operations to modify your input, or use tf.py_func. The num_parallel_calls argument speeds up preprocessing significantly, because multiple images are transformed in parallel. The second line ensures a certain number of corrupted images is precomputed; otherwise the transformation would only be applied when executing iterator.get_next(), which would result in a delay for each batch and bad GPU utilization. The video below shows the groundtruth, input and output of the denoising autoencoder for up to 60 epochs.

I hope this tutorial gave you some insight on how to implement a custom TensorFlow estimator and use the Dataset API.


### Sebastian Pölsterl — Denoising Autoencoder as TensorFlow estimator

Denoising Autoencoder as TensorFlow estimator

I recently started to use Google's deep learning framework TensorFlow. Since version 1.3, TensorFlow includes a high-level interface inspired by scikit-learn. Unfortunately, as of version 1.4, only 3 different classification and 3 different regression models implementing the Estimator interface are included. To better understand the Estimator interface, Dataset API, and components in tf-slim, I started to implement a simple Autoencoder and applied it to the well-known MNIST dataset of handwritten digits. This post is about my journey and is split in the following sections:

I will assume that you are familiar with TensorFlow basics. The full code is available at https://github.com/sebp/tf_autoencoder.

## Estimators

The tf.estimator.Estimator is at the heart TenorFlow's high-level interface and is similar to Kera's Model API. It hides most of the boilerplate required to train a model: managing Sessions, writing summary statistics for TensorBoard, or saving and loading checkpoints. An Estimator has three main methods: train, evaluate, and predict. Each of these methods requires a callable input function as first argument that feeds the data to the estimator (more on that later).

## Custom estimators

You can write your own custom model implementing the Estimator interface by passing a function returning an instance of tf.estimator.EstimatorSpec as first argument to tf.estimator.Estimator.

def model_fn(features, labels, mode):
…
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=total_loss,
train_op=train_op,
eval_metric_ops=eval_metric_ops)

The first argument – mode – is one of tf.estimator.ModeKeys.TRAIN, tf.estimator.ModeKeys.EVAL or tf.estimator.ModeKeys.PREDICT and determines which of the remaining values must be provided.

In TRAIN mode:

• loss: A Tensor containing a scalar loss value.
• train_op: An Op that runs one step of training. We can use the return value of tf.contrib.layers.optimize_loss here.

In EVAL mode:

• loss: A scalar Tensor containing the loss on the validation data.
• eval_metric_ops: A dictionary that maps metric names to Tensors of metrics to calculate, typically, one of the tf.metrics functions.

In PREDICT mode:

• predictions: A dictionary that maps key names of your choice to Tensors containing the predictions from the model.

An important difference to the Estimators included with TensorFlow is that we need to call relevant tf.summary functions in model_fn ourselves. However, the Estimator will take care of writing summaries to disk so we can inspect them in TenorBoard.

## Autoencoder model

The Autoencoder model is straightforward, it consists of two major parts: an encoder and an decoder. The encoder has an input layer (28*28 = 784 dimensions in the case of MNIST) and one or more hidden layers, decreasing in size. In the decoder, we reverse the operations of the encoder by blowing the output of the smallest hidden layer up to the size of the input (optionally, with hidden layers of increasing size in-between). The loss function computes the difference between the original image and the reconstructed image (the output of the decoder). Common loss functions are mean squared error and cross-entropy.

To construct the encoder network, we specify a list containing the number of hidden units for each layer and (optionally) add dropout layers in-between:

def encoder(inputs, hidden_units, dropout, is_training):
net = inputs
for num_hidden_units in hidden_units:
net = tf.contrib.layers.fully_connected(
net, num_outputs=num_hidden_units)
if dropout is not None:
net = slim.dropout(net, is_training=is_training)
return net

where add_hidden_layer_summary adds a histogram of the activations and the fraction of non-zero activations to be displayed in TensorBoard. The latter is particularly useful when debugging networks with rectified linear units (ReLU). If too many hidden units return 0 values early during optimization, the model won't be able to learn anymore, in which case one would typically try to lower the learning rate or choose a different activation function.

The network of the decoder is almost identical, we just explicitly use a linear activation function (activation_fn=None) and no dropout in the last layer:

def decoder(inputs, hidden_units, dropout, is_training):
    net = inputs
    for num_hidden_units in hidden_units[:-1]:
        net = tf.contrib.layers.fully_connected(
            net, num_outputs=num_hidden_units)
        if dropout is not None:
            net = slim.dropout(net, is_training=is_training)

    net = tf.contrib.layers.fully_connected(net, hidden_units[-1],
                                            activation_fn=None)
    tf.summary.histogram('activation', net)
    return net

You may have noticed that we did not specify any activation function so far. Thanks to TensorFlow's arg_scope context manager, we can easily set the activation function for all fully connected layers at once. At the same time, we set an appropriate weight initializer and (optionally) apply weight decay:

def autoencoder(inputs, hidden_units, activation_fn, dropout, weight_decay, mode):
    is_training = mode == tf.estimator.ModeKeys.TRAIN

    weights_init = slim.initializers.variance_scaling_initializer()
    if weight_decay is None:
        weights_reg = None
    else:
        weights_reg = tf.contrib.layers.l2_regularizer(weight_decay)

    with slim.arg_scope([tf.contrib.layers.fully_connected],
                        weights_initializer=weights_init,
                        weights_regularizer=weights_reg,
                        activation_fn=activation_fn):
        net = encoder(inputs, hidden_units, dropout, is_training)
        n_features = inputs.shape[1].value
        decoder_units = hidden_units[:-1][::-1] + [n_features]
        net = decoder(net, decoder_units, dropout, is_training)
    return net

where slim.initializers.variance_scaling_initializer corresponds to the initialization of He et al., which is the current recommendation for networks with ReLU activations.
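As a rough illustration of what that initializer does, weights are drawn with standard deviation sqrt(2 / fan_in), which keeps the variance of ReLU activations roughly constant across layers (he_init is a hypothetical helper for illustration, not the slim API):

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    # Variance-scaling (He) initialization: std = sqrt(2 / fan_in)
    return rng.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

rng = np.random.RandomState(42)
W = he_init(784, 128, rng)
print(round(float(W.std()), 4))  # close to sqrt(2 / 784), about 0.05
```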

This concludes the architecture of the autoencoder. Next, we need to implement the model_fn function passed to tf.estimator.Estimator as outlined above.

## Autoencoder model_fn

First, we construct the network's architecture using the autoencoder function described above:

logits = autoencoder(inputs=features,
                     hidden_units=hidden_units,
                     activation_fn=activation_fn,
                     dropout=dropout,
                     weight_decay=weight_decay,
                     mode=mode)

Subsequent steps depend on the value of mode. In prediction mode, we merely have to return the reconstructed image; to make sure all values lie within the interval [0; 1], we apply the sigmoid function:

probs = tf.nn.sigmoid(logits)
predictions = {"prediction": probs}
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions)

In training and evaluation mode, we need to compute the loss, which is cross-entropy in this example:

tf.losses.sigmoid_cross_entropy(labels, logits)
total_loss = tf.losses.get_total_loss(add_regularization_losses=is_training)

The second line is needed to add the $\ell_2$-losses used in weight decay.
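Written out by hand, the total loss is the data loss plus the weight-decay coefficient times the accumulated $\ell_2$ terms. A small numpy sketch (total_loss is an illustrative helper, assuming l2_regularizer's sum-of-squares-over-two convention from tf.nn.l2_loss):

```python
import numpy as np

def total_loss(data_loss, weights, weight_decay):
    # Each weight matrix contributes weight_decay * sum(w**2) / 2,
    # matching scale * tf.nn.l2_loss(w) in the graph version.
    l2 = sum(np.sum(w ** 2) / 2.0 for w in weights)
    return data_loss + weight_decay * l2

w = np.ones((2, 2))  # sum of squares = 4, so the l2 term is 2
print(total_loss(1.0, [w], 1e-5))
```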

Most importantly, training relies on choosing an optimizer, here we use Adam and an exponential learning rate decay. The latter dynamically updates the learning rate during training according to the formula
$$\text{decayed learning rate} = \text{base learning rate} \cdot 0.96^{\lfloor i / 1000 \rfloor} ,$$ where $i$ is the current iteration. It would probably work as well without learning rate decay, but I included it for the sake of completeness.

if mode == tf.estimator.ModeKeys.TRAIN:
    train_op = tf.contrib.layers.optimize_loss(
        loss=total_loss,
        learning_rate=learning_rate,
        optimizer="Adam",
        learning_rate_decay_fn=lambda lr, gs: tf.train.exponential_decay(
            lr, gs, 1000, 0.96, staircase=True),
        global_step=tf.train.get_global_step())

    # Add histograms for trainable variables
    for var in tf.trainable_variables():
        tf.summary.histogram(var.op.name, var)

Note that we add a histogram of all trainable variables for TensorBoard in the last part.
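The staircase schedule from the formula above can be checked in plain Python (decayed_lr is an illustrative helper mirroring tf.train.exponential_decay with staircase=True, not the TensorFlow op itself):

```python
def decayed_lr(base_lr, step, decay_steps=1000, decay_rate=0.96):
    # staircase=True: the exponent is the integer division step // decay_steps
    return base_lr * decay_rate ** (step // decay_steps)

print(decayed_lr(0.001, 999))   # still the base rate
print(decayed_lr(0.001, 1000))  # first decay: 0.001 * 0.96
```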

Finally, we compute the root mean squared error when in evaluation mode:

if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {
        "rmse": tf.metrics.root_mean_squared_error(
            tf.cast(labels, tf.float64), tf.cast(probs, tf.float64))
    }

and return the specification of our autoencoder estimator:
return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops=eval_metric_ops)

## Feeding data to an Estimator via the Dataset API

Once we have constructed our estimator, e.g. via

estimator = AutoEncoder(hidden_units=[128, 64, 32],
                        dropout=None,
                        weight_decay=1e-5,
                        learning_rate=0.001)

we would like to train it by calling train, which expects a callable that returns two tensors, one representing the input data and one the groundtruth data. The easiest way would be to use tf.estimator.inputs.numpy_input_fn, but instead I want to introduce TensorFlow's Dataset API, which is more generic.

The Dataset API comprises two elements:

1. tf.data.Dataset represents a dataset and any transformations applied to it.
2. tf.data.Iterator is used to extract elements from a Dataset. In particular, Iterator.get_next() returns the next element of a Dataset and typically is what is fed to an estimator.

Here, I'm using what is called an initializable Iterator, inspired by this post. We define one placeholder for the input image and one for the groundtruth image and initialize the placeholders before training starts using a hook. First, let's create a Dataset from the placeholders:

placeholders = [
    tf.placeholder(data.dtype, data.shape, name='input_image'),
    tf.placeholder(data.dtype, data.shape, name='groundtruth_image')
]
dataset = tf.data.Dataset.from_tensor_slices(placeholders)

Next, we shuffle the dataset and allow retrieving data from it until the specified number of epochs has been reached:

dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.repeat(num_epochs)

When creating input for evaluation or prediction, we are going to skip these two steps.
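The repeat/batch semantics can be sketched in plain Python with toy values: repeating concatenates the epochs into one stream, and batching then cuts that stream into fixed-size chunks (the numbers here are illustrative, not the MNIST ones):

```python
import itertools

data = list(range(5))  # a 5-element "dataset"
num_epochs = 2
batch_size = 2

# dataset.repeat(num_epochs): one stream containing every epoch
stream = list(itertools.chain.from_iterable([data] * num_epochs))

# dataset.batch(batch_size): fixed-size chunks; the last may be smaller
batches = [stream[i:i + batch_size] for i in range(0, len(stream), batch_size)]
print(batches)
# [[0, 1], [2, 3], [4, 0], [1, 2], [3, 4]]
```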

Finally, we combine multiple elements into a batch and create an iterator from the dataset:

dataset = dataset.batch(batch_size)

iterator = dataset.make_initializable_iterator()
next_example, next_label = iterator.get_next()

To initialize the placeholders, we need to call tf.Session.run with feed_dict = {placeholders[0]: input_data, placeholders[1]: groundtruth_data}. Since the Estimator will create a Session for us, we need a way to call our initialization code after the session has been created and before training begins. The Estimator's train, evaluate and predict methods accept a list of SessionRunHook subclasses as the hooks argument, which we can use to inject our code in the right place. Therefore, we first create a generic hook that runs after the session has been created:

class IteratorInitializerHook(tf.train.SessionRunHook):
    """Hook to initialise data iterator after Session is created."""

    def __init__(self):
        self.iterator_initializer_func = None

    def after_create_session(self, session, coord):
        """Initialise the iterator after the session has been created."""
        assert callable(self.iterator_initializer_func)
        self.iterator_initializer_func(session)

To make things a little bit nicer, we create an InputFunction class which implements the __call__ method. Thus, it will behave like a function and we can pass it directly to tf.estimator.Estimator.train and related methods.
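The __call__ trick itself is plain Python: any instance of a class that defines __call__ can be invoked like a function, which is why the Estimator accepts such an object as its input function. A tiny standalone example (Scaler is purely illustrative):

```python
class Scaler:
    """Configurable object that behaves like a function."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # Invoked when the instance is called like a function
        return self.factor * x

double = Scaler(2)
print(callable(double), double(21))  # True 42
```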

class InputFunction:
    def __init__(self, data, batch_size, num_epochs, mode):
        self.data = data
        self.batch_size = batch_size
        self.mode = mode
        self.num_epochs = num_epochs
        self.init_hook = IteratorInitializerHook()

    def __call__(self):
        # Define placeholders
        placeholders = [
            tf.placeholder(self.data.dtype, self.data.shape, name='input_image'),
            tf.placeholder(self.data.dtype, self.data.shape, name='reconstruct_image')
        ]

        # Build dataset pipeline
        dataset = tf.data.Dataset.from_tensor_slices(placeholders)
        if self.mode == tf.estimator.ModeKeys.TRAIN:
            dataset = dataset.shuffle(buffer_size=10000)
            dataset = dataset.repeat(self.num_epochs)
        dataset = dataset.batch(self.batch_size)

        # Create iterator from dataset
        iterator = dataset.make_initializable_iterator()
        next_example, next_label = iterator.get_next()

        # Create initialization hook
        def _init(sess):
            feed_dict = dict(zip(placeholders, [self.data, self.data]))
            sess.run(iterator.initializer, feed_dict=feed_dict)

        self.init_hook.iterator_initializer_func = _init

        return next_example, next_label

Finally, we can use the InputFunction class to train our autoencoder for 30 epochs:

from tensorflow.examples.tutorials.mnist import input_data as mnist_data

mnist = mnist_data.read_data_sets("MNIST_data")

train_input_fn = InputFunction(
    data=mnist.train.images,
    batch_size=256,
    num_epochs=30,
    mode=tf.estimator.ModeKeys.TRAIN)
estimator.train(train_input_fn, hooks=[train_input_fn.init_hook])

The video below shows ten reconstructed images from the test data and their corresponding groundtruth after each epoch of training:

## Denoising Autoencoder

A denoising autoencoder is a slight variation on the autoencoder described above. The only difference is that the input images are randomly corrupted before they are fed to the autoencoder (we still use the original, uncorrupted image to compute the loss). This acts as a form of regularization to avoid overfitting.
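Outside the graph, the corruption can be sketched in plain numpy to see what it does to pixel values: add scaled Gaussian noise, then clip back into [0; 1] (illustration only; inside the Dataset pipeline the TensorFlow ops below are required):

```python
import numpy as np

rng = np.random.RandomState(0)
image = rng.rand(28 * 28)  # a fake image with pixels in [0, 1]

noise_factor = 0.5
corrupted = np.clip(image + noise_factor * rng.randn(28 * 28), 0.0, 1.0)

# Clipping keeps the corrupted image a valid target for the sigmoid output
print(corrupted.min() >= 0.0, corrupted.max() <= 1.0)
```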

def add_noise(input_img, groundtruth):
    noise_factor = 0.5  # a float in [0; 1)
    noise = noise_factor * tf.random_normal(input_img.shape.as_list())
    input_corrupted = tf.clip_by_value(tf.add(input_img, noise), 0., 1.)
    return input_corrupted, groundtruth

The function above takes two Tensors representing the input and groundtruth image, respectively, and corrupts the input image by the specified amount of noise. We can use this function to transform all of the images using Dataset's map function:

dataset = dataset.map(add_noise, num_parallel_calls=4)
dataset = dataset.prefetch(512)

The function passed to map becomes part of the compute graph; thus you have to use TensorFlow operations to modify your input, or fall back to tf.py_func. The num_parallel_calls argument speeds up preprocessing significantly, because multiple images are transformed in parallel. The second line ensures that a certain number of corrupted images is precomputed; otherwise the transformation would only be applied when executing iterator.get_next(), which would result in a delay for each batch and poor GPU utilization. The video below shows the groundtruth, input and output of the denoising autoencoder for up to 60 epochs:

I hope this tutorial gave you some insight on how to implement a custom TensorFlow estimator and use the Dataset API.

sebp Fri, 12/22/2017 - 12:39

## December 19, 2017

### Christian Schaller — Why hasn’t The Year of the Linux Desktop happened yet?

Having spent 20 years of my life on Desktop Linux, I thought I should write up my thinking about why we haven't yet had the Linux on the Desktop breakthrough, and maybe more importantly talk about the avenues I see for that breakthrough still happening. A lot has been written about this over the years, with different people coming up with their explanations. My thesis is that there really isn't one reason, but rather a range of issues that have all contributed to holding the Linux Desktop back from reaching a bigger market. To put this into context: success here, in my mind, would be having something like 10% market share of desktop systems; that to me means we have reached critical mass. So let me start by listing some of the main reasons I see for why we are not at that 10% mark today, before going on to talk about how I think that goal might still be reached.

Things that have held us back

• Fragmented market
• One of the most common explanations for why the Linux Desktop never caught on more is the fragmented state of the Linux Desktop space. We have a large host of desktop projects like GNOME, KDE, Enlightenment, Cinnamon etc., and an even larger host of distributions shipping these desktops. I used to think this state should get a lot of the blame, and I still believe it owns some of the blame, but I have come to conclude in recent years that it is probably more of a symptom than a cause. If someone had come up with a model strong enough to let Desktop Linux break out of its current technical user niche, I am now convinced that model would easily have been strong enough to leave the Linux desktop fragmentation behind for all practical purposes, because at that point the alternative desktops for Linux would be about as important as the alternative MS Windows shells are. So in summary, the fragmentation hasn't helped for sure and is still not helpful, but it is probably a problem that has been overstated.

• Lack of special applications
• Another common item that has been pointed to is the lack of applications. For sure, in the early days of Desktop Linux the challenge you always had when trying to convince anyone to move to Desktop Linux was that they almost invariably had one or more applications they relied on that were only available on Windows. I remember in one of my first jobs after university, when I worked as a sysadmin, we had a long list of these applications that various parts of the organization relied on, be it special tools to interface with a supplier or the bank, or for dealing with nutritional values of food in the company cafeteria, etc. This is a problem that has been in rapid decline for the last 5-10 years due to the move to web applications, but I am sure that in a given major organization you can still probably find a few of them. Between the move to the web and Wine, though, I don't think this is a major issue anymore. So in summary, this was a major roadblock in the early years, but it is a lot less of an impediment these days.

• Lack of big name applications
• Adopting a new platform is always easier if you can take the applications you are familiar with with you, so the lack of things like MS Office and Adobe Photoshop would always make a switch less likely: in addition to switching OS you would also have to learn to use new tools. And along those lines there was always the challenge of file format compatibility, in the early days in the hard sense that you simply couldn't reliably load documents coming from some of these applications, and more recently softer problems like the lack of metrically identical fonts. The font issue, for example, has mostly been resolved since Google released fonts metrically compatible with the MS default fonts a few years ago, but it was definitely a hindrance to adoption for many years. The move to the web for a lot of these things has greatly reduced this problem too, with organizations adopting things like Google Docs at a rapid pace these days. So in summary, once again something that used to be a big problem, but which is at least a lot less of a problem these days; of course there are still apps not available for Linux that do stop people from adopting desktop Linux.

• Lack of API and ABI stability
• This is another item that many people have brought up over the years. I think I have personally vacillated over the importance of this one multiple times. Changing APIs are definitely not a fun thing for developers to deal with; they add extra work, often without bringing direct benefit to the application. The Linux packaging philosophy probably magnified this problem for developers, as anything that could be split out and packaged separately was, meaning that every application was always living on top of a lot of moving parts. That said, the reason I am sceptical of putting too much blame onto this is that you could always find stable subsets to rely on. For instance, if you targeted GTK2 or Qt back in the day and kept away from some of the more fast-moving stuff offered by GNOME and KDE, you would not be hit by this that often. And of course, if the Linux Desktop market share had been higher, people would have been prepared to deal with these challenges regardless, just like they are on other platforms that keep changing and evolving quickly, like the mobile operating systems.

• Apple resurgence
• This might of course be the result of subjective memory, but one of the times where it felt like there could have been a Linux desktop breakthrough was at the same time as Linux on the server started making serious inroads. The old Unix workstation market was coming apart and moving to Linux already, the worry of a Microsoft monopoly was at its peak, and Apple was in what seemed like mortal decline. There was a lot of media buzz around the Linux desktop, and VC-funded companies were set up to try to build a business around it. Reaching some kind of critical mass seemed like it could be within striking distance. Of course what happened was that Steve Jobs returned to Apple and we suddenly had MacOSX come onto the scene, taking at least some air out of the Linux Desktop space. The importance of this one I do find exceptionally hard to quantify though; part of me feels it had a lot of impact, but on the other hand it isn't 100% clear to me that the market and the players at the time would have been able to capitalize even if Apple had gone belly-up.

• Microsoft's aggressive response
• In the first 10 years of Desktop Linux there was no doubt that Microsoft was working hard to nip in the bud any sign of Desktop Linux gaining a foothold or momentum. I do remember for instance that Novell for quite some time was trying to establish a serious Desktop Linux business after having bought Miguel de Icaza's company Helix Code. However, a pattern quickly emerged: every time Novell or anyone else tried to announce a major Linux desktop deal, Microsoft came running in offering next-to-free Windows licensing to get people to stay put. Looking at Linux migrations even seemed to become a go-to tactic for negotiating better prices from Microsoft. So anyone wanting to attack the desktop market with Linux would have to contend with not only market inertia, but a general depression of the price of desktop operating systems, knowing that Microsoft would respond to any attempt to build momentum around Linux desktop deals with very aggressive sales efforts. This probably played an important part, as it meant that the pay-per-copy/subscription business model that for instance Red Hat built their server business around became really tough to make work in the desktop space: the price point ended up so low that it required gigantic volumes to become profitable, which of course is a hard thing to quickly achieve when fighting an entrenched market leader. So in summary, Microsoft in some sense successfully fended off Linux breaking through as a competitor, although it could be said they did so at the cost of fatally wounding the per-copy-fee business model they built their company around, ensuring that the next wave of competitors Microsoft had to deal with, like iOS and Android, based themselves on business models where the cost of the OS was assumed to be zero, thus contributing to the Windows Phone efforts being doomed.

• Piracy

• Red Hat mostly stayed away
• Few people probably remember or know this, but Red Hat was actually founded as a desktop Linux company. The first major investment in software development that Red Hat ever made was setting up the Red Hat Advanced Development Labs, hiring a bunch of core GNOME developers to move that effort forward. But when Red Hat pivoted to the server with the introduction of Red Hat Enterprise Linux, the desktop quickly started playing second fiddle. And before I proceed: all these events were many years before I joined the company, so just as with my other points here, read this as an analysis from someone without first-hand knowledge. While Red Hat has always offered a desktop product and has always been a major contributor to keeping the Linux desktop ecosystem viable, Red Hat was focused on server-side solutions, and the desktop offering was always aimed more narrowly at things like technical workstation customers and people developing towards the RHEL server. It is hard to say how big an impact Red Hat's decision not to go after this market has had; on one hand it would probably have been beneficial to have the Linux company with the deepest pockets and the strongest brand be a more active participant, but on the other hand staying mostly out of the fight gave other companies more room to give it a go.

• Canonical business model not working out
• This bullet point is probably going to be somewhat controversial considering I work for Red Hat (although this is my private blog with my own personal opinions), but on the other hand I feel one cannot talk about the trajectory of the Linux Desktop over the last decade without mentioning Canonical and Ubuntu. I have to assume that when Mark Shuttleworth was mulling over doing Ubuntu he probably saw a lot of the challenges that I mention above, especially the revenue-generation challenges that the competition from Microsoft created. So in the end he decided on the standard internet business model of the time, which was to quickly build up a huge userbase and deal with how to monetize it later. Ubuntu was launched with an effective price point of zero; in fact you could even get install media sent to you for free. The effort worked in the sense that Ubuntu quickly became the biggest player in the Linux desktop space, and it certainly helped the Linux desktop market share grow in the early years. Unfortunately I think it still basically failed, and the reason I am saying that is that it didn't manage to grow big enough to provide Ubuntu with enough revenue, through their app store or their partner agreements, to allow them to seriously re-invest in the Linux Desktop and fund the kind of marketing effort needed to take Linux to a less super-technical audience. So once it plateaued, what they had was enough revenue to keep a relatively barebones engineering effort going, but not the kind of income that would allow them to steadily build the Linux Desktop market further. Mark then tried to capitalize on the mindshare and market share he had managed to build by branching out into efforts like their TV and phone products, but all those efforts eventually failed.
It would probably be an article in itself to deeply discuss why the grow-the-userbase strategy failed here versus why, for instance, Android succeeded with this model, but I think the short version goes back to the fact that you had an entrenched market leader and the Linux Desktop isn't different enough from Mac or Windows desktops to drive the type of market change the transition from feature phones to smartphones was.
And to be clear, I am not criticizing Mark for the strategy he chose; if I were in his shoes back when he started Ubuntu, I am not sure I would have been able to come up with a different strategy that would plausibly have succeeded from his starting point. That said, it did contribute to pushing the expected price of desktop Linux even further down, making it even harder for people to generate significant revenue from desktop Linux. On the other hand one can argue that this would likely have happened anyway due to competitive pressure and Windows piracy. Canonical's recent pivot away from the desktop towards trying to build a business in the server and IoT space is in some sense a natural consequence of hitting the desktop growth plateau and not having enough revenue to invest in further growth.
So in summary, what was once seen as the most likely contender to take the Linux Desktop to critical mass turned out to have taken off with too little rocket fuel, and eventually gravity caught up with them. And what we can never know for sure is whether, during this run, they sucked so much air out of the market that it kept someone who could have taken us further with a different business model from jumping in.

• Original device manufacturer support
• This one is a bit of a chicken-and-egg issue. Yes, lack of (perfect) hardware support has for sure held Linux back on the Desktop, but lack of market share has also held hardware support back. As with any system, this is a question of reaching critical mass despite your challenges and thus eventually being so big that nobody can afford to ignore you. This is an area where even today we are still not fully there, but where I do feel we are getting closer all the time. When I installed Linux for the very first time, which I think was Red Hat Linux 3.1 (pre-RHEL days), I spent about a weekend fiddling just to get my sound card working; I think I had to grab an experimental driver from somewhere and compile it myself. These days I mostly expect everything to work out of the box except more unique hardware like ambient light sensors or fingerprint readers, but even such devices are starting to land, and thanks to efforts from vendors such as Dell things are looking pretty good here. But the memory of these issues is long, so a lot of people, especially those not using Linux themselves but who have heard about Linux, still assume hardware support is very much a hit-or-miss affair.

## What does the future hold?

So anyone who has read my blog posts probably knows I am an optimist by nature. This isn't just some kind of genetic disposition towards optimism, but also a philosophical belief that optimism breeds opportunity while pessimism breeds failure. So just because we haven't gotten the Linux Desktop to 10% market share so far doesn't mean it will not happen going forward; it just means we haven't achieved it yet. One of the key strengths of open source is that it is incredibly hard to kill: unlike proprietary software, it doesn't go away or stop getting developed just because a company goes out of business or decides to shut down a part of its business. As long as there is a strong community interested in pushing it forward, it remains and evolves, and thus when opportunity comes knocking again it is ready to try again. And that is definitely true of Desktop Linux, which from a technical perspective is better than it has ever been: the level of polish is higher than ever before, the level of hardware support is better than ever before, and the range of software available is better than ever before.

And the important thing to remember here is that we don't exist in a vacuum; the world around us constantly changes too, which means that the things or companies that blocked us in the past might not be around, or able to block us, tomorrow. Apple and Microsoft are very different companies today than they were 10 or 20 years ago, and their focus and who they compete with are very different. The dynamics of the desktop software market are changing with new technologies and paradigms all the time, like how online media consumption has moved from things like your laptop to phones and tablets. Five years ago I would have considered iTunes a big competitive problem; today the move to streaming services like Spotify, Hulu, Amazon or Netflix has made iTunes feel archaic and a symbol of bygone times.

And many of the problems we faced before, like weird Windows applications without a Linux counterpart, have been washed away by the switch to browser-based applications. And while Valve's SteamOS effort didn't take off, it has provided Linux users with access to a huge catalog of games, removing a reason that I know caused a few of my friends to mostly abandon using Linux on their computers. And you can actually, as a consumer, buy Linux from a range of vendors now who try to properly support Linux on their hardware, including a major player like Dell and smaller outfits like System76 and Purism.

And since I do work for Red Hat managing our Desktop Engineering team, I should address the question of whether Red Hat will be a major driver in taking Desktop Linux to that 10%. Well, Red Hat will continue to support and evolve our current RHEL Workstation product, and we are seeing steady growth of new customers for it. So if you are looking for a solid developer workstation for your company you should absolutely talk to Red Hat sales about RHEL Workstation, but Red Hat is not looking at aggressively targeting general consumer computers anytime soon. Caveat here: I am not a C-level executive at Red Hat, so I guess there is always a chance Jim Whitehurst or someone else in the top brass is mulling over a gigantic new desktop effort and I simply don't know about it, but I don't think it is likely and thus would not advise anyone to hold their breath waiting for such a thing to be announced :). That said, Red Hat, like any company out there, does react to market opportunities as they arise, so who knows what will happen down the road. And we will definitely keep pushing Fedora Workstation forward as the place to experience the leading edge of the Desktop Linux experience and a great portal into the world of Linux on servers and in the cloud.

So to summarize: there are a lot of things happening in the market that could provide the right set of people the opportunity they need to finally take Linux to critical mass. Whether there is anyone who has the timing and skills to pull it off is of course always an open question, and it is a question which will only be answered the day someone does it. The only thing I am sure of is that the Linux community is providing a stronger technical foundation for someone to succeed with than ever before, so the question is just whether someone can come up with the business model and the marketing skills to take it to the next level. There is also the chance that it will come in a shape we don't appreciate today. Maybe ChromeOS evolves into a more full-fledged operating system as it grows in popularity and thus ends up being the Linux on the Desktop endgame? Or maybe Valve relaunches their SteamOS effort and it provides the foundation for major general desktop growth? Or maybe market opportunities arise that cause us at Red Hat to go after the desktop market in a wider sense than we do today? Or maybe Endless succeeds with their vision for a Linux desktop operating system? Or maybe the idea of a desktop operating system gets supplanted to the degree that we in the end just sit there saying 'Alexa, please open the IDE and take dictation of this new graphics driver I am writing' (ok, probably not that last one ;)

And to be fair there are a lot of people saying that Linux already made it on the desktop in the form of things like Android tablets. Which is technically correct as Android does run on the Linux kernel, but I think for many of us it feels a bit more like a distant cousin as opposed to a close family member both in terms of use cases it targets and in terms of technological pedigree.

As a sidenote, I am heading off on Yuletide vacation tomorrow evening, taking my wife and kids to Norway to spend time with our family there. So don't expect a lot of new blog posts from me until I am back from DevConf in early February. I hope to see many of you at DevConf though; it is a great conference, and Brno is a great town even in freezing winter. As we say in Norway, there is no such thing as bad weather, only bad clothing.

## December 15, 2017

### Christian Schaller — Some predictions for 2018

So I spent a few hours polishing my crystal ball today, so here are some predictions for Linux on the Desktop in 2018. The advantage of course for me to publish these now is that I can then later selectively quote the ones I got right to prove my brilliance and the internet can selectively quote the ones I got wrong to prove my stupidity :)

Prediction 1: Meson becomes the defacto build system of the Linux community

Meson has been going from strength to strength this year, and a lot of projects
which passed on earlier attempts to replace autotools have adopted it. I predict this
trend will continue in 2018 and that by the end of the year everyone will agree that Meson
has replaced autotools as the Linux community's build system of choice. That said, I am not
convinced the Linux kernel itself will adopt Meson in 2018.

Prediction 2: Rust puts itself on a clear trajectory to replace C and C++ for low level programming

Another rising star of 2017 is the programming language Rust. And while its pace of adoption
will be slower than Meson's, I do believe that by the time 2018 comes to a close the general opinion will be
that Rust is the future of low-level programming, replacing old favorites like C and C++. Major projects
like GNOME and GStreamer are already adopting Rust at a rapid pace, and I believe even more projects will
join them in 2018.

Prediction 3: Apple's decline as a PC vendor becomes obvious

Ever since Steve Jobs died it has become quite clear, in my opinion, that the emphasis
on the traditional desktop is fading at Apple. The pace of hardware refreshes seems
to be slowing and macOS seems to be going more and more stale. Some pundits have already
started pointing this out, and I predict that in 2018 Apple will no longer be considered the
cool kid on the block for people looking for laptops, especially among the tech-savvy crowd.
Hopefully a good opportunity for Linux on the desktop to assert itself more.

Prediction 4: Traditional distro packaging for desktop applications
will start fading away in favour of Flatpak

From where I am standing, I think 2018 will be the breakout year for Flatpak as a replacement
for getting your desktop applications as RPMs or debs. I predict that by the end of 2018 more or
less every Linux desktop user will be running at least one Flatpak on their system.

Prediction 5: Linux Graphics competitive across the board

I think 2018 will be a breakout year for Linux graphics support. I think our GPU drivers and APIs will be competitive with any other platform, both in completeness and performance. So by the end of 2018 I predict that you will see Linux game ports by major porting houses
like Aspyr and Feral that perform just as well as their Windows counterparts. What is more, I also predict that by the end of 2018 discrete graphics will be considered a solved problem on Linux.

Prediction 6: H265 will be considered a failure

I predict that by the end of 2018 H265 will be considered a failed codec effort and the era of royalty-bearing media codecs will effectively start coming to an end. H264 will be considered the last successful royalty-bearing codec, and new codecs coming out will
all be open source and royalty free.

### Bastien Nocera — More Bluetooth (and gaming) features

In the midst of post-release bug fixing, we've also added a fair number of new features to our stack. As usual, new features span a number of different components, so integrators will have to be careful picking up all the components when, well, integrating.

Do you have a PlayStation 3 joypad that feels just a little bit "off"? You can't find the Sony logo anywhere on it? The figures on the face buttons look like barbed wire? And if it were a YouTube video, it would say "No copyright intended"?

Bingo. When plugged in via USB, those devices advertise themselves as SHANWAN or Gasia, and implement the bare minimum to work when plugged into a PlayStation 3 console. But as a Linux computer would behave slightly differently, we need to fix a couple of things.

The first fix was simple, but necessary to be able to do any work: disable the rumble motor that starts as soon as you plug the pad through USB.

Once that's done, we could work around the fact that the device isn't Bluetooth compliant, and hard-code the HID service it's supposed to offer.

Bluetooth LE Battery reporting

Bluetooth Low Energy is the new-fangled (7-year old) protocol for low throughput devices, from a single coin-cell powered sensor, to input devices. What's great is that there's finally a standardised way for devices to export their battery statuses. I've added support for this in BlueZ, which UPower then picks up for desktop integration goodness.

There are a number of Bluetooth LE joypads available for pickup, including a few that should be firmware upgradeable. Look for "Bluetooth 4" as well as "Bluetooth LE" when doing your holiday shopping.

gnome-bluetooth work

Finally, this is the boring part. Benjamin and I reworked code that's internal to gnome-bluetooth, as used in the Settings panel as well as the Shell, to make it use modern facilities like GDBusObjectManager. The overall effect of this is less code that is less brittle and more reactive when Bluetooth adapters come and go, such as when using airplane mode.

Apart from the kernel patch mentioned above (you'll know if you need it :), those features have been integrated in UPower 0.99.7 and in the upcoming BlueZ 5.48. And they will of course be available in Fedora, both in rawhide and as updates to Fedora 27 as soon as the releases have been done and built.

GG!

## December 14, 2017

### Sebastian Dröge — A GStreamer Plugin like the Rec Button on your Tape Recorder – A Multi-Threaded Plugin written in Rust

As Rust is known for “Fearless Concurrency”, that is being able to write concurrent, multi-threaded code without fear, it seemed like a good fit for a GStreamer element that we had to write at Centricular.

Previous experience with Rust for writing (mostly) single-threaded GStreamer elements and applications (also multi-threaded) were all quite successful and promising already. And in the end, this new element was also a pleasure to write and probably faster than doing the equivalent in C. For the impatient, the code, tests and a GTK+ example application (written with the great Rust GTK bindings, but the GStreamer element is also usable from C or any other language) can be found here.

#### What does it do?

The main idea of the element is that it basically works like the rec button on your tape recorder. There is a single boolean property called “record”, and whenever it is set to true it will pass-through data and whenever it is set to false it will drop all data. But different to the existing valve element, it

• Outputs a contiguous timeline without gaps, i.e. there are no gaps in the output when not recording. Similar to the recording you get on a tape recorder, you don’t have 10s of silence if you didn’t record for 10s.
• Handles and synchronizes multiple streams at once. When recording e.g. a video stream and an audio stream, every recorded segment starts and stops with both streams at the same time
• Is key-frame aware. If you record a compressed video stream, each recorded segment starts at a keyframe and ends right before the next keyframe to make it most likely that all frames can be successfully decoded
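The contiguous-timeline behaviour can be sketched in a few lines of plain Rust. This is a hypothetical simplification (a single stream, integer nanosecond timestamps, no keyframe handling), not the element's actual code: while not recording, input time keeps advancing, so the skipped duration is accumulated and subtracted from every recorded timestamp.

```rust
// Hypothetical sketch of the "contiguous timeline" idea: gaps created
// while not recording are removed from the output timestamps.
struct TimelineShift {
    recording: bool,
    skipped: u64,           // total duration dropped so far (ns)
    stop_time: Option<u64>, // input time when recording last stopped
}

impl TimelineShift {
    fn new() -> Self {
        TimelineShift { recording: true, skipped: 0, stop_time: None }
    }

    fn set_record(&mut self, record: bool, now: u64) {
        if self.recording && !record {
            // Stopping: remember where the gap begins.
            self.stop_time = Some(now);
        } else if !self.recording && record {
            // Restarting: add the whole gap to the accumulated skip.
            if let Some(stop) = self.stop_time.take() {
                self.skipped += now - stop;
            }
        }
        self.recording = record;
    }

    // Returns the shifted output timestamp, or None if dropped.
    // Assumes monotonically increasing input timestamps.
    fn handle_buffer(&mut self, ts: u64) -> Option<u64> {
        if self.recording { Some(ts - self.skipped) } else { None }
    }
}

fn main() {
    let mut shift = TimelineShift::new();
    assert_eq!(shift.handle_buffer(5), Some(5));
    shift.set_record(false, 10);               // stop at t = 10
    assert_eq!(shift.handle_buffer(15), None); // dropped while stopped
    shift.set_record(true, 20);                // restart at t = 20
    assert_eq!(shift.handle_buffer(20), Some(10)); // 10s gap removed
}
```

So a buffer arriving at t = 20s right after a 10s pause is output at t = 10s, just like the tape recorder analogy. The real element additionally has to apply the same shift to all streams at once.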

The multi-threading aspect here comes from the fact that in GStreamer each stream usually has its own thread, so in this case the video stream and the audio stream(s) would come from different threads but would have to be synchronized between each other.

The GTK+ example application for the plugin is playing a video with the current playback time and a beep every second, and allows to record this as an MP4 file in the current directory.

#### How did it go?

This new element was again based on the Rust GStreamer bindings and the infrastructure that I was writing over the last year or two for writing GStreamer plugins in Rust.

As written above, it generally went all fine and was quite a pleasure but there were a few things that seem noteworthy. But first of all, writing this in Rust was much more convenient and fun than writing it in C would’ve been, and I’ve written enough similar code in C before. It would’ve taken quite a bit longer, I would’ve had to debug more problems in the new code during development (there were actually surprisingly few things going wrong during development, I expected more!), and probably would’ve written less exhaustive tests because writing tests in C is just so inconvenient.

##### Rust does not prevent deadlocks

While this should be clear, and was also clear to myself before, this seems like it might need some reiteration. Safe Rust prevents data races, but not all possible bugs that multi-threaded programs can have. Rust is not magic, only a tool that helps you prevent some classes of potential bugs.

For example, you still can’t stop thinking about lock order when multiple mutexes are involved, and you can’t carelessly use condition variables without making sure that your conditions actually make sense and are accessed atomically. As a wise man once said, “the safest program is the one that does not run at all”, and a deadlocking program is very close to that.

The part about condition variables might be something that can be improved in Rust. Without this, you can easily end up in situations where you wait forever or your conditions are actually inconsistent. Currently Rust’s condition variables only require a mutex to be passed to the functions for waiting for the condition to be notified, but it would probably also make sense to require passing the same mutex to the constructor and notify functions to make it absolutely clear that you need to ensure that your conditions are always accessed/modified while this specific mutex is locked. Otherwise you might end up in debugging hell.
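A minimal sketch of that discipline with std's Condvar, where the condition flag is only read and modified while holding the same mutex that is passed to wait() (the function name here is made up for illustration):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Wait until another thread flips the flag. The flag is only touched
// while holding the same mutex that is passed to wait(), so no
// notification can be lost between checking and waiting.
fn wait_for_flag(pair: &(Mutex<bool>, Condvar)) -> bool {
    let (lock, cvar) = pair;
    let mut ready = lock.lock().unwrap();
    // wait() can wake spuriously, so always re-check the condition.
    while !*ready {
        ready = cvar.wait(ready).unwrap();
    }
    *ready
}

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair2 = Arc::clone(&pair);
    let t = thread::spawn(move || {
        let (lock, cvar) = &*pair2;
        *lock.lock().unwrap() = true; // modify condition under the mutex
        cvar.notify_one();
    });
    assert!(wait_for_flag(&pair));
    t.join().unwrap();
}
```

The API only enforces that *some* mutex is passed to wait(); that it is the same mutex guarding the flag is still up to the programmer, which is exactly the gap described above.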

Fortunately during development of the plugin I only ran into a simple deadlock, caused by accidentally keeping a mutex locked for too long and then running into conflict with another one. That is probably an easy trap to fall into, given that the most common way of unlocking a mutex is to let the mutex lock guard fall out of scope. This makes it impossible to forget to unlock the mutex, but it also makes it less explicit when the unlock happens, and sometimes explicitly unlocking by manually dropping the mutex lock guard is still necessary.
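The explicit-drop pattern looks like this (`sum_without_nesting` is a made-up name for illustration): dropping the first guard before taking the second lock keeps the critical sections short and avoids ever holding both mutexes at once.

```rust
use std::sync::Mutex;

// If one thread held `a` while taking `b` and another did the reverse,
// they could deadlock. Explicitly dropping the first guard, instead of
// letting it live to the end of the scope, avoids nested locking here.
fn sum_without_nesting(a: &Mutex<i32>, b: &Mutex<i32>) -> i32 {
    let guard_a = a.lock().unwrap();
    let x = *guard_a;
    drop(guard_a); // unlock `a` before touching `b`

    let guard_b = b.lock().unwrap();
    x + *guard_b
}

fn main() {
    let a = Mutex::new(1);
    let b = Mutex::new(2);
    println!("{}", sum_without_nesting(&a, &b)); // prints 3
}
```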

So in summary, while a big group of potential problems with multi-threaded programs are prevented by Rust, you still have to be careful to not run into any of the many others. Especially if you use lower-level constructs like condition variables, and not just e.g. channels. Everything is however far more convenient than doing the same in C, and with more support by the compiler, so I definitely prefer writing such code in Rust over doing the same in C.

##### Missing API

As usual, for the first dozen projects using a new library or new bindings to an existing library, you’ll notice some missing bits and pieces. That a relatively core part of GStreamer, the GstRegistry API, was missing was surprising nonetheless. True, you usually don’t use it directly, and I only needed it here for loading the new plugin from a non-standard location, but it was still surprising. Let’s hope this was the biggest oversight. If you look at the issues page on GitHub, you’ll find a few other things that are still missing, but nobody has needed them yet, so it’s probably fine for the time being.

Another batch of missing API that I noticed during development was that many manual (i.e. not auto-generated) bindings didn’t have the Debug trait implemented, or not in a very useful way. This is solved now; otherwise I wouldn’t have been able to properly log what is happening inside the element, to allow easier debugging later if something goes wrong.

Apart from that there were also various other smaller things that were missing, or bugs (see below) that I found in the bindings while going through all these. But those seem not very noteworthy – check the commit logs if you’re interested.

##### Bugs, bugs, bugs

I also found a couple of bugs in the bindings. They can be broadly categorized in two categories

• Annotation bugs in GStreamer. The auto-generated parts of the bindings are generated from an XML description of the API, that is generated from the C headers and code and annotations in there. There were a couple of annotations that were wrong (or missing) in GStreamer, which then caused memory leaks in my case. Such mistakes could also easily cause memory-safety issues though. The annotations are fixed now, which will also benefit all the other language bindings for GStreamer (and I’m not sure why nobody noticed the memory leaks there before me).
• Bugs in the manually written parts of the bindings. Similarly to the above, there was one memory leak, and another case where a function could have returned NULL but did not have this case covered on the Rust side by returning an Option.

Generally I was quite happy with the lack of bugs though, the bindings are really ready for production at this point. And especially, all the bugs that I found are things that are unfortunately “normal” and common when writing code in C, while Rust is preventing exactly these classes of bugs. As such, they have to be solved only once at the bindings layer, and then you’re free of them: you don’t have to spend any brain capacity on their existence anymore and can use your brain to solve the actual task at hand.

##### Inconvenient API

Similar to the missing API, whenever using some rather new API you will find things that are inconvenient and could ideally be done better. The biggest case here was the GstSegment API. A segment represents a (potentially open-ended) playback range and contains all the information to convert timestamps to the different time bases used in GStreamer. I’m not going to get into details here, best check the documentation for them.

A segment can be in different formats, e.g. in time or bytes. In the C API this is handled by storing the format inside the segment, and requiring you to pass the format together with the value to every function call, and internally there are some checks then that let the function fail if there is a format mismatch. In the previous version of the Rust segment API, this was done the same, and caused lots of unwrap() calls in this element.

But in Rust we can do better, and the new API for the segment now encodes the format in the type system (i.e. there is a Segment<Time>) and only values with the correct type (e.g. ClockTime) can be passed to the corresponding functions of the segment. In addition there is a type for a generic segment (which still has all the runtime checks) and functions to “cast” between the two.

Overall this gives more type-safety (the compiler already checks that you don’t mix calculations between seconds and bytes) and makes the API usage more convenient as various error conditions just can’t happen and thus don’t have to be handled. Or like in C, are simply ignored and not handled, potentially leaving a trap that can cause hard to debug bugs at a later time.
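The idea of moving the format into the type system can be illustrated with a phantom-type sketch. The names below mirror the concept but are not the bindings' actual API:

```rust
use std::marker::PhantomData;

// Hypothetical marker types standing in for GStreamer formats.
struct Time;
struct Bytes;

// The format is part of the type, so it needs no runtime storage.
struct Segment<F> {
    start: u64,
    stop: u64,
    _format: PhantomData<F>,
}

impl<F> Segment<F> {
    fn new(start: u64, stop: u64) -> Self {
        Segment { start, stop, _format: PhantomData }
    }

    // Clamp a position into the segment's range.
    fn clip(&self, pos: u64) -> u64 {
        pos.clamp(self.start, self.stop)
    }
}

// Only accepts a time segment: passing a Segment<Bytes> here is a
// compile-time error, where the C API would only fail at runtime.
fn percent_played(seg: &Segment<Time>, pos: u64) -> u64 {
    (seg.clip(pos) - seg.start) * 100 / (seg.stop - seg.start)
}

fn main() {
    let time_seg: Segment<Time> = Segment::new(0, 200);
    let byte_seg: Segment<Bytes> = Segment::new(0, 4096);
    assert_eq!(percent_played(&time_seg, 50), 25);
    assert_eq!(byte_seg.clip(9000), 4096);
    // percent_played(&byte_seg, 50); // would not compile
}
```

A generic, runtime-checked segment then corresponds to erasing the phantom parameter again, which is roughly what the "cast" functions between the two representations do.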

That Rust requires all errors to be handled makes it very obvious how many potential error cases the average C code out there is not handling at all, and also shows that a more expressive language than C can easily prevent many of these error cases at compile-time already.

## December 11, 2017

### GStreamer — GStreamer 1.12.4 stable release (binaries)

Pre-built binary images of the 1.12.4 stable release of GStreamer are now available for Windows 32/64-bit, iOS and Mac OS X and Android.

The builds are available for download from: Android, iOS, Mac OS X and Windows.

## December 09, 2017

### Sebastian Pölsterl — scikit-survival 0.5 released


Today, I released a new version of scikit-survival. This release adds support for the latest version of scikit-learn (0.19) and pandas (0.21). In turn, support for Python 3.4, scikit-learn 0.18 and pandas 0.18 has been dropped.

Many people are confused about the meaning of predictions. Often, they assume that predictions of a survival model should always be non-negative since the input is the time to an event. However, this is not always the case. In general, predictions are risk scores of arbitrary scale. In particular, survival models usually do not predict the exact time of an event, but the relative order of events. If samples are ordered according to their predicted risk score (in ascending order), one obtains the sequence of events, as predicted by the model. A more detailed explanation is available in the Understanding Predictions in Survival Analysis section of the documentation.

You can install the latest version via Anaconda (Linux, OSX and Windows):

conda install -c sebp scikit-survival

or via pip:

pip install -U scikit-survival
sebp Sat, 12/09/2017 - 12:33

### Survival functions

Loving the package. Is there a way to pull survival functions (a la Cox PH models) from your SVM and GBM estimators?

### Hi Joe,


there is a way, but it's currently not implemented, I'm afraid. See last paragraph of https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368 for details.

## December 07, 2017

### GStreamer — GStreamer 1.12.4 stable release

The GStreamer team is pleased to announce the fourth bugfix release in the stable 1.12 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.12.x.

See /releases/1.12/ for the full release notes.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Check out the release notes for GStreamer core, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-vaapi, or gst-omx, or download tarballs for gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-validate, gstreamer-vaapi, or gst-omx.

### Víctor Jáquez — Enabling HuC for SKL/KBL in Debian/testing

Recently, our friend Florent complained that it was impossible to set a constant bitrate when encoding H.264 using the low-power profile with gstreamer-vaapi.

Low-power (LP) profiles are VA-API entry points, available in Intel Skylake-based processors and successors, which provide video encoding with low power consumption.

Later on, Ullysses and Sree pointed out that CBR in LP is only possible if HuC is enabled in the kernel.

HuC is a firmware, loaded by the i915 kernel module, designed to offload some of the media functions from the CPU to the GPU. One of these functions is bitrate control when encoding. HuC saves unnecessary CPU-GPU synchronization.

In order to load HuC, it is required to first load GuC, another Intel firmware designed to perform graphics workload scheduling on the various graphics parallel engines.

How can we install and configure these firmwares to enable CBR in the low-power profile, among other things, on Debian/testing?

## Check i915 parameters

First we shall confirm that our kernel and our i915 kernel module can handle this functionality:

$ sudo modinfo i915 | egrep -i "guc|huc|dmc"
firmware:       i915/bxt_dmc_ver1_07.bin
firmware:       i915/skl_dmc_ver1_26.bin
firmware:       i915/kbl_dmc_ver1_01.bin
firmware:       i915/kbl_guc_ver9_14.bin
firmware:       i915/bxt_guc_ver8_7.bin
firmware:       i915/skl_guc_ver6_1.bin
firmware:       i915/kbl_huc_ver02_00_1810.bin
firmware:       i915/bxt_huc_ver01_07_1398.bin
firmware:       i915/skl_huc_ver01_07_1398.bin
parm:           enable_guc_loading:Enable GuC firmware loading (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           enable_guc_submission:Enable GuC submission (-1=auto, 0=never [default], 1=if available, 2=required) (int)
parm:           guc_log_level:GuC firmware logging level (-1:disabled (default), 0-3:enabled) (int)
parm:           guc_firmware_path:GuC firmware path to use instead of the default one (charp)
parm:           huc_firmware_path:HuC firmware path to use instead of the default one (charp)

## Install firmware

$ sudo apt install firmware-misc-nonfree


UPDATE: In order to install this Debian package, you should have enabled the non-free apt repository in your sources list.

Verify the firmware is installed:

....
$ cat /etc/modprobe.d/i915.conf
options i915 enable_guc_loading=1 enable_guc_submission=1

## Reboot

$ sudo systemctl reboot


## Verification

Now it is possible to verify that the i915 kernel module loaded the firmware correctly by looking at the kernel logs:

$ journalctl -b -o short-monotonic -k | egrep -i "i915|dmr|dmc|guc|huc"
[   10.303849] miau kernel: Setting dangerous option enable_guc_loading - tainting kernel
[   10.303852] miau kernel: Setting dangerous option enable_guc_submission - tainting kernel
[   10.336318] miau kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   10.338664] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_dmc_ver1_01.bin
[   10.339635] miau kernel: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_01.bin (v1.1)
[   10.361811] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_huc_ver02_00_1810.bin
[   10.362422] miau kernel: i915 0000:00:02.0: firmware: direct-loading firmware i915/kbl_guc_ver9_14.bin
[   10.393117] miau kernel: [drm] GuC submission enabled (firmware i915/kbl_guc_ver9_14.bin [version 9.14])
[   10.410008] miau kernel: [drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
[   10.559614] miau kernel: snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[   11.937413] miau kernel: i915 0000:00:02.0: fb0: inteldrmfb frame buffer device

That means that the HuC and GuC firmwares were loaded successfully. Now we can check the status of the modules using sysfs:

$ sudo cat /sys/kernel/debug/dri/0/i915_guc_load_status
GuC firmware status:
path: i915/kbl_guc_ver9_14.bin
fetch: SUCCESS
version wanted: 9.14
version found: 9.14
header: offset is 0; size = 128
uCode: offset is 128; size = 142272
RSA: offset is 142400; size = 256

GuC status 0x800330ed:
Bootrom status = 0x76
uKernel status = 0x30
MIA Core status = 0x3

Scratch registers:
0:     0xf0000000
1:     0x0
2:     0x0
3:     0x5f5e100
4:     0x600
5:     0xd5fd3
6:     0x0
7:     0x8
8:     0x3
9:     0x74240
10:     0x0
11:     0x0
12:     0x0
13:     0x0
14:     0x0
15:     0x0
$ sudo cat /sys/kernel/debug/dri/0/i915_huc_load_status
HuC firmware status:
path: i915/kbl_huc_ver02_00_1810.bin
fetch: SUCCESS
load: SUCCESS
version wanted: 2.0
version found: 2.0
header: offset is 0; size = 128
uCode: offset is 128; size = 218304
RSA: offset is 218432; size = 256

HuC status 0x00006080:

## Test GStreamer

$ gst-launch-1.0 videotestsrc num-buffers=1000 ! video/x-raw, format=NV12, width=1920, height=1080, framerate=(fraction)30/1 ! vaapih264enc bitrate=8000 keyframe-period=30 tune=low-power rate-control=cbr ! mp4mux ! filesink location=test.mp4
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Got context from element 'vaapiencodeh264-0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)"\(GstVaapiDisplayGLX\)\ vaapidisplayglx0";
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:11.620036001
Setting pipeline to PAUSED ...
Setting pipeline to NULL ...
Freeing pipeline ...
$ gst-discoverer-1.0 test.mp4
Analyzing file:///home/vjaquez/gst/master/intel-vaapi-driver/test.mp4
Done discovering file:///home/vjaquez/test.mp4

Topology:
  container: Quicktime
    video: H.264 (High Profile)

Properties:
  Duration: 0:00:33.333333333
  Seekable: yes
  Live: no
  Tags:
      video codec: H.264 / AVC
      bitrate: 8084005
      encoder: VA-API H264 encoder
      datetime: 2017-12-07T14:29:23Z
      container format: ISO MP4/M4A

Mission accomplished!

## References

## December 06, 2017

### Bastien Nocera — UTC and Anywhere on Earth support

A quick post to tell you that we finally added UTC support to Clocks' and the Shell's World Clocks section. And if you're into it, there's also Anywhere on Earth support.

You will need to have git master versions of libgweather (our cities and timezones database), and gnome-clocks. This feature will land in GNOME 3.28.

Many thanks to Giovanni for coming up with an API he was happy with after I attempted a couple of iterations on one. Enjoy!

Update: As expected, a bug crept in. Thanks to Colin Guthrie for spotting the error in the "Anywhere on Earth" timezone. See this section for the fun we have to deal with.

## November 26, 2017

### Sebastian Dröge — GStreamer Rust bindings release 0.9

About 3 months, a GStreamer Conference and two bug-fix releases have passed now since the GStreamer Rust bindings release 0.8.0. Today version 0.9.0 (and 0.9.1 with a small bugfix to export some forgotten types) with a couple of API improvements and lots of additions and cleanups was released. This new version depends on the new set of releases of the gtk-rs crates (glib/etc).

The full changelog can be found here, but below is a short overview of the (in my opinion) most interesting changes.

#### Tutorials

The basic tutorials 1 to 8 were ported from C to Rust by various contributors. The C versions and the corresponding explanatory text can be found here, and it should be relatively easy to follow the text together with the Rust code.
This should make learning to use GStreamer from Rust much easier, in combination with the few example applications that exist in the repository.

#### Type-safety Improvements

Previously querying the current playback position from a pipeline (and various other things analogous to that) gave you a plain 64-bit integer, just like in C. However in Rust we can easily do better.

The main problem with just getting an integer was that there are “special” values that have the meaning of “no value known”, specifically GST_CLOCK_TIME_NONE for values in time. In C this often causes bugs by code ignoring this special case and then doing calculations with such a value, resulting in completely wrong numbers. In the Rust bindings these are now expressed as an Option so that the special case has to be handled separately, and in combination with that, for timed values there is a new type called ClockTime that implements all the arithmetic traits and others, so you can still do normal arithmetic operations on the values while the implementation of those operations takes care of GST_CLOCK_TIME_NONE.

Also it was previously easy to get a value in bytes and add it to a value in time. Whenever multiple formats are possible, a new type called FormatValue is now used that combines the value itself with its format to prevent such mistakes.

#### Error Handling

Various operations in GStreamer can fail with a custom enum type: linking pads (PadLinkReturn), pushing a buffer (FlowReturn), changing an element’s state (StateChangeReturn). Previously handling these was not as convenient as the usual Result-based error handling in Rust. With this release, all these types provide a function into_result() that allows converting the enum into a Result that splits it into its good and bad cases, e.g. FlowSuccess and FlowError. Based on this, the usual Rust error handling is possible, including usage of the ?-operator.
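The None-propagating arithmetic described above can be sketched with a hypothetical newtype (this is not the bindings' real ClockTime implementation, just the idea): a value is either a known time or "none", and any arithmetic involving "none" yields "none" instead of silently producing garbage the way `GST_CLOCK_TIME_NONE + x` does in C.

```rust
use std::ops::Add;

// Hypothetical sketch: a time value that may be unknown, where the
// unknown case "poisons" arithmetic instead of corrupting results.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ClockTime(Option<u64>);

impl Add for ClockTime {
    type Output = ClockTime;
    fn add(self, other: ClockTime) -> ClockTime {
        match (self.0, other.0) {
            (Some(a), Some(b)) => ClockTime(Some(a + b)),
            _ => ClockTime(None), // any unknown operand → unknown result
        }
    }
}

fn main() {
    let position = ClockTime(Some(20));
    let offset = ClockTime(Some(22));
    let unknown = ClockTime(None); // role of GST_CLOCK_TIME_NONE
    assert_eq!(position + offset, ClockTime(Some(42)));
    assert_eq!(position + unknown, ClockTime(None));
}
```

The caller is then forced by the type to decide what an unknown position means, rather than accidentally doing math with a sentinel value.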
Once the Try trait is stable, it will also be possible to directly use the ?-operator on FlowReturn and the others before conversion into a Result.

All these enums are also marked as #[must_use] now, which causes a compiler warning if code is not specifically handling them (which could mean to explicitly ignore them), making it even harder to ignore errors caused by any failures of such operations.

In addition, all the examples and tutorials make use of the above now, and many examples were ported to the failure crate and implement proper error handling in all situations, for example the decodebin example.

#### Various New API

Apart from all of the above, a lot of new API was added, both for writing GStreamer-based applications, and making that easier, as well as for writing GStreamer plugins in Rust. For the latter, the gst-plugin-rs repository with various crates (and plugins) was ported to the GStreamer bindings and completely rewritten, but more on that in another blog post in the next couple of days, once the gst-plugin crate is released and published on crates.io.

## November 24, 2017

### Víctor Jáquez — Intel MediaSDK on Debian (testing)

Everybody knows it: installing Intel MediaSDK on GNU/Linux is a PITA. With CentOS or Yocto it is less cumbersome, if you blindly trust scripts run as root. I don’t like CentOS, I feel like I am living in the past with it. I like Debian (testing, of course) and I also wanted to understand a little more about MediaSDK. And this is what I did to get Intel MediaSDK working on Debian/testing.

First, I did a pristine installation of Debian testing with a netinst image in my NUC 6i5SYK, with a normal desktop user setup (GNOME 3). The madness comes later.

Intel identifies two types of MediaSDK installation: Gold and Generic. Gold is for CentOS, and Generic for the rest of the distributions. Obviously, Generic means you’re on your own.
For the purpose of this exercise I used as reference Generic Linux* Intel® Media Server Studio Installation.

Let’s begin by grabbing the Intel® Media Server Studio – Community Edition. You will need to register yourself and accept the user agreement, because this is proprietary software. At the end, you should have a tarball named MediaServerStudioEssentials2017R3.tar.gz

#### Extract the files for Generic installation

$ cd ~
$ tar xvf MediaServerStudioEssentials2017R3.tar.gz
$ cd MediaServerStudioEssentials2017R3
$ tar xvf SDK2017Production16.5.2.tar.gz
$ cd SDK2017Production16.5.2/Generic
$ mkdir tmp
$ tar -xvC tmp -f intel-linux-media_generic_16.5.2-64009_64bit.tar.gz


### Kernel

Bad news: in order to get MediaSDK working you need to patch the mainlined kernel.

Worse news: the available patches are only for version 4.4 of the kernel.

Still, systemd works on 4.4, as far as I know, so it would not be a big problem.

##### Grab building dependencies
$ sudo apt install build-essential devscripts libncurses5-dev
$ sudo apt build-dep linux


#### Grab kernel source

I like to use the sources from the git repository, since it would be possible to do some rebasing and blaming in the future.

$ cd ~
$ git clone https://github.com/torvalds/linux.git
...
$ git pull -v --tags
$ git checkout -b 4.4 v4.4


#### Extract MediaSDK patches

$ cd ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/patches/kmd/4.4
$ tar xvf intel-kernel-patches.tar.bz2
$ git am 0002-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch

TODO: Do I need to modify the EXTRAVERSION string in the kernel’s Makefile?

#### Build and install the kernel

Notice that we are using our current kernel configuration. That is error prone. I guess that is why I had to select NVM manually.

$ cp /boot/config-4.12.0-1-amd64 ./.config
$ make olddefconfig
$ make nconfig # -- select NVM
$ scripts/config --disable DEBUG_INFO
$ make deb-pkg
...
$ sudo dpkg -i linux-image-4.4.0+_4.4.0+-2_amd64.deb linux-headers-4.4.0+_4.4.0+-2_amd64.deb linux-firmware-image-4.4.0+_4.4.0+-2_amd64.deb

#### Configure GRUB2 to boot Linux 4.4 by default

This part was absolutely tricky for me. It took me a long time to figure out how to specify the kernel ID in the grubenv.

$ sudo vi /etc/default/grub


Change the line to GRUB_DEFAULT=saved (by default it is set to 0). Then update GRUB.

$ sudo update-grub

Now look for the ID of the installed kernel image in /boot/grub/grub.cfg and use it:

$ sudo grub-set-default "gnulinux-4.4.0+-advanced-2c246bc6-65bb-48ea-9517-4081b016facc>gnulinux-4.4.0+-advanced-2c246bc6-65bb-48ea-9517-4081b016facc"


Please note the ID appears twice, separated by a >: the part before the > identifies the GRUB submenu (“Advanced options”), and the part after it identifies the menu entry inside that submenu.

#### Copy MediaSDK firmware (and libraries too)

I like to use rsync rather than plain cp because of options like --dry-run and --itemize-changes, which let me verify what I am doing.

$ cd ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp
$ sudo rsync -av --itemize-changes ./lib /
$ sudo rsync -av --itemize-changes ./opt/intel/common /opt/intel
$ sudo rsync -av --itemize-changes ./opt/intel/mediasdk/{include,lib64,plugins} /opt/intel/mediasdk


All these directories contain blobs that do the MediaSDK magic. They are dlopened through hard-coded paths by mfx_dispatch, which will be explained later.

In /lib lives the firmware (kernel blob).

In /opt/intel/common… I have no idea what those shared objects are.

In /opt/intel/mediasdk/include live the header files for programming and compilation.

In /opt/intel/mediasdk/lib64 live the driver for the modified libva (iHD) and other libraries.

In /opt/intel/mediasdk/plugins live, well, plugins…

In conclusion, all these bytes are darkness and mystery.

$ sudo systemctl reboot

The system should boot, automatically, into GNU/Linux 4.4. Please log in with Xorg, not Wayland, since Wayland is not supported, as far as I know.

#### GStreamer

For compiling GStreamer I will use gst-uninstalled. Someone may say that I should use gst-build because it is newer and faster, but I feel more comfortable doing the following kind of hacks with the old & good autotools. Basically this is a reproduction of Quick-start guide to gst-uninstalled for GStreamer 1.x.

$ sudo apt build-dep gst-plugins-{base,good,bad}1.0
$ wget https://cgit.freedesktop.org/gstreamer/gstreamer/plain/scripts/create-uninstalled-setup.sh -q -O - | sh

I will modify the gst-uninstalled script and keep it outside of the repository. For that I will use the systemd file-hierarchy spec for user’s executables.

$ cd ~/gst
$ mkdir -p ~/.local/bin
$ mv master/gstreamer/scripts/gst-uninstalled ~/.local/bin
$ ln -sf ~/.local/bin/gst-uninstalled ./gst-master

Do not forget to edit your ~/.profile to add ~/.local/bin to the PATH environment variable.

#### Patch ~/.local/bin/gst-uninstalled

The modifications are to handle the three dependency libraries required by MediaSDK: libdrm, libva and mfx_dispatch.

diff --git a/scripts/gst-uninstalled b/scripts/gst-uninstalled
index 81f83b6c4..d79f19abd 100755
--- a/scripts/gst-uninstalled
+++ b/scripts/gst-uninstalled
@@ -122,7 +122,7 @@ GI_TYPELIB_PATH=$GST/gstreamer/gst:$GI_TYPELIB_PATH
 export LD_LIBRARY_PATH
 export DYLD_LIBRARY_PATH
 export GI_TYPELIB_PATH
-
+
 export PKG_CONFIG_PATH="\
 $GST_PREFIX/lib/pkgconfig\
 :$GST/gstreamer/pkgconfig\
@@ -140,6 +140,9 @@ $GST_PREFIX/lib/pkgconfig\
 :$GST/orc\
 :$GST/farsight2\
 :$GST/libnice/nice\
+:$GST/drm\
+:$GST/libva/pkgconfig\
+:$GST/mfx_dispatch\
 ${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
 
 export GST_PLUGIN_PATH="\
@@ -227,6 +230,16 @@ export GST_VALIDATE_APPS_DIR=$GST_VALIDATE_APPS_DIR:$GST/gst-editing-services/te
 export GST_VALIDATE_PLUGIN_PATH=$GST_VALIDATE_PLUGIN_PATH:$GST/gst-devtools/validate/plugins/
 export GIO_EXTRA_MODULES=$GST/prefix/lib/gio/modules:$GIO_EXTRA_MODULES
 
+# MediaSDK
+export LIBVA_DRIVERS_PATH=/opt/intel/mediasdk/lib64
+export LIBVA_DRIVER_NAME=iHD
+export LD_LIBRARY_PATH="\
+/opt/intel/common/mdf/lib64\
+:$GST/drm/.libs\
+:$GST/drm/intel/.libs\
+:$GST/libva/va/.libs\
+:$LD_LIBRARY_PATH"
+


Now, initialize the gst-uninstalled environment:

$ cd ~/gst
$ ./gst-master

##### libdrm

Grab libdrm from its repository and switch to the branch with the supported version by MediaSDK.

$ cd ~/gst/master
$ git clone git://anongit.freedesktop.org/mesa/drm
$ cd drm
$ git checkout -b intel libdrm-2.4.67


Extract the distributed tarball in the cloned repository.

$ tar -xv --strip-components=1 -C . -f ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/libdrm/2.4.67-64009/libdrm-2.4.67.tar.bz2

Then we can check the big delta between upstream and the changes done by Intel for MediaSDK. Let’s put it in a commit for later rebases.

$ git add -u
$ git add .
$ git commit -m "mediasdk changes"


Get build dependencies and compile.

$ sudo apt build-dep libdrm
$ ./configure
$ make -j8

Since the pkgconfig files (*.pc) of libdrm are generated to work installed, they need to be modified in order to work uninstalled.

$ prefix=${HOME}/gst/master/drm
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/.libs#" ${prefix}/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/*.pc

In order for the C preprocessor to find the uninstalled libdrm header files, we need to make them available in the path expected by the pkgconfig file, where right now they are not. To fix that we can create proper symbolic links.

$ cd ~/gst/master/drm
$ ln -s include/drm/ libdrm

#### libva

MediaSDK uses a modified version of libva. These modifications mess a bit with the open source version of libva, because Intel decided not to rename the library, or use some other disambiguation strategy. In gstreamer-vaapi we had to blacklist VA-API version 0.99, because that is the version number, arbitrarily set, of this modified libva for MediaSDK.

Again, grab the original libva from its repository and create a branch pointing at the divergence point. It was difficult to find the divergence commit id, since even the libva version number was changed. Doing some archaeology I guessed the branch point was version 1.0.15, but I’m not sure.

$ cd ~/gst/master
$ git clone https://github.com/01org/libva.git
$ cd libva
$ git checkout -b intel libva-1.0.15
$ tar -xv --strip-components=1 -C . -f ~/MediaServerStudioEssentials2017R3/SDK2017Production16.5.2/Generic/tmp/opt/intel/mediasdk/opensource/libva/1.67.0.pre1-64009/libva-1.67.0.pre1.tar.bz2
$ git add -u
$ git add .
$ git commit -m "mediasdk"

Before compiling, verify that the Makefile is going to link against the uninstalled libdrm. You can do that by grepping for LIBDRM in the Makefile.

Get compilation dependencies and build.

$ sudo apt build-dep libva
$ ./configure
$ make -j8


Modify the pkgconfig files for the uninstalled setup.

$ prefix=${HOME}/gst/master/libva
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/va/.libs#" ${prefix}/pkgconfig/*.pc
$ sed -i -e "s#^includedir=.*#includedir=${prefix}#" ${prefix}/pkgconfig/*.pc


$ cd ~/gst/master/libva/va
$ ln -sf drm/va_drm.h


#### mfx_dispatch

This is a static library that must be linked with MediaSDK applications; in our case, with the GStreamer plugin.

According to its documentation (included in the tarball):

the dispatcher is a layer that lies between the application and the SDK implementations. Upon initialization, the dispatcher locates the appropriate platform-specific SDK implementation. If there is none, it will select the software SDK implementation. The dispatcher will redirect subsequent function calls to the same functions in the selected SDK implementation.

In the tarball there is the source of the mfx_dispatcher, but it only compiles with cmake. I have not worked with cmake on uninstalled setups, but we are lucky: there is a repository with autotools support:

$ cd ~/gst/master
$ git clone https://github.com/lu-zero/mfx_dispatch.git


And compile. After running ./configure it is better to confirm, by grepping the generated Makefile, that the uninstalled versions of libdrm and libva are going to be used.

$ autoreconf --install
$ ./configure
$ make -j8

Finally, just as with the other libraries, it is required to fix the pkgconfig files:

$ prefix=${HOME}/gst/master/mfx_dispatch
$ sed -i -e "s#^libdir=.*#libdir=${prefix}/.libs#" ${prefix}/*.pc

$ vainfo
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.99 (libva 1.67.0.pre1)
vainfo: Driver version: 16.5.2.64009-ubit
vainfo: Supported profile and entrypoints
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: <unknown entrypoint>
      VAProfileH264ConstrainedBaseline: <unknown entrypoint>
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : <unknown entrypoint>
      VAProfileH264Main               : <unknown entrypoint>
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : <unknown entrypoint>
      VAProfileH264High               : <unknown entrypoint>
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : <unknown entrypoint>
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileVP9Profile0            : <unknown entrypoint>
      <unknown profile>               : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : <unknown entrypoint>

It works!

#### Compile GStreamer

I normally make a copy of ~/gst/master/gstreamer/scripts/git-update.sh in ~/.local/bin in order to modify it, like adding support for ccache, disabling gtkdoc and gobject-introspection, increasing the parallel tasks, etc. But that is out of the scope of this document.

$ cd ~/gst/master/
$ ./gstreamer/scripts/git-update.sh

Everything should build without issues and, at the end, we can test whether the gst-msdk elements are available:

$ gst-inspect-1.0 msdk
Plugin Details:
Name                     msdk
Description              Intel Media SDK encoders
Version                  1.13.0.1
Source release date      2017-11-23 16:39 (UTC)
Binary package           GStreamer Bad Plug-ins git
Origin URL               Unknown package origin

msdkh264dec: Intel MSDK H264 decoder
msdkh264enc: Intel MSDK H264 encoder
msdkh265dec: Intel MSDK H265 decoder
msdkh265enc: Intel MSDK H265 encoder
msdkmjpegdec: Intel MSDK MJPEG decoder
msdkmjpegenc: Intel MSDK MJPEG encoder
msdkmpeg2enc: Intel MSDK MPEG2 encoder
msdkvp8dec: Intel MSDK VP8 decoder
msdkvp8enc: Intel MSDK VP8 encoder

9 features:
+-- 9 elements


Great!

Now, let’s run a simple pipeline. Please note that the gst-msdk elements have rank zero, so they will not be autoplugged; it is necessary to craft the pipeline manually:

$ gst-launch-1.0 filesrc location= ~/test.264 ! h264parse ! msdkh264dec ! videoconvert ! xvimagesink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
libva info: VA-API version 0.99.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
libva info: Found init function __vaDriverInit_0_32
libva info: va_openDriver() returns 0
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Got EOS from element "pipeline0".
Execution ended after 0:00:02.502411331
Setting pipeline to PAUSED ...
Setting pipeline to NULL ...
Freeing pipeline ...


\o/

## November 20, 2017

### GStreamer — Orc 0.4.28 bug-fix release

The GStreamer team is pleased to announce another maintenance bug-fix release of liborc, the Optimized Inner Loop Runtime Compiler. Main changes since the previous release:

• Numerous undefined behaviour fixes
• Ability to disable tests
• Fix meson dist behaviour

## November 17, 2017

Earlier this year I worked on a certain GStreamer plugin that is called “ipcpipeline”. This plugin provides elements that make it possible to interconnect GStreamer pipelines that run in different processes.  In this blog post I am going to explain how this plugin works and the reason why you might want to use it in your application.

## Why ipcpipeline?

In GStreamer, pipelines are meant to be built and run inside a single process. Normally one wouldn’t even think about involving multiple processes for a single pipeline. You can (and should) involve multiple threads, of course, which is easily done using the queue element, in order to do parallel processing. But since you can involve multiple threads, why would you want to involve multiple processes as well?

Splitting part of a pipeline to a different process is useful when there is one or more elements that need to be isolated for security reasons. Imagine the case where you have an application that uses a hardware video decoder and therefore has device access privileges. Also imagine that in the same pipeline you have elements that download and parse video content directly from a network server, like most Video On Demand applications would do. Although I don’t mean to say that GStreamer is not secure, it can be a good idea to think ahead and make it as hard as possible for an attacker to take advantage of potential security flaws. In theory, maybe someone could exploit a bug in the container parser by sending it crafted data from a fake server and then take control of other things by exploiting those device access privileges, or cause a system crash. ipcpipeline could help to prevent that.

## How does it work?

In the – oversimplified – diagram below we can see what the media pipeline in a video player would look like with GStreamer:

With ipcpipeline, this pipeline can be split into two processes, like this:

As you can see, the split mainly involves 2 elements: ipcpipelinesink, which serves as the sink for the first pipeline, and ipcpipelinesrc, which serves as the source for the second pipeline. These two elements internally talk to each other through a unix pipe or socket, transferring buffers, events, queries and messages over this socket, thus linking the two pipelines together.
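The transport idea can be sketched with a plain socketpair: one end plays the role of ipcpipelinesink, serializing a buffer and writing it to the fd, while the other plays the role of ipcpipelinesrc and reads it back. This is only an illustration of the fd-based IPC concept, with a made-up length-prefixed framing; it is not the elements' actual wire protocol:

```python
import socket
import struct

# A socketpair stands in for the fd that links the two pipelines.
sink_end, src_end = socket.socketpair()

def send_buffer(sock, payload):
    # Made-up framing: 4-byte big-endian length, then the payload.
    # The real protocol also carries events, queries and messages.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_buffer(sock):
    (size,) = struct.unpack("!I", sock.recv(4))
    data = b""
    while len(data) < size:
        data += sock.recv(size - len(data))
    return data

send_buffer(sink_end, b"fake video frame")
received = recv_buffer(src_end)
print(received)  # b'fake video frame'
```

In the real plugin the two ends live in different processes, which is exactly what makes the isolation described above possible.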

This mechanism doesn’t look very special, though. You might be wondering at this point, what is the difference between using ipcpipeline and some other existing mechanism like a pair of fdsink/fdsrc or udpsink/udpsrc or RTP? What is special about these elements is that the two pipelines behave as if they were a single pipeline, with the elements of the second one being part of a GstBin in the first one:

The diagram above illustrates how you can think of a pipeline that uses the ipcpipeline mechanism. As you can see, ipcpipelinesink behaves as a GstBin that contains the whole remote pipeline. This practically means that whenever you change the state of ipcpipelinesink, the remote pipeline’s state changes as well. It also means that all messages, events and queries that make sense are forwarded from one pipeline to the other, trying to implement as closely as possible the behavior that a GstBin would have.

This design practically allows you to modify an existing application to use this split-pipeline mechanism without having to change the pipeline control logic or implement your own IPC for controlling the second pipeline. It is all integrated in the mechanism already.

ipcpipeline follows a master-slave design. The pipeline that controls the state changes of the other pipeline is called the “master”, while the other one is called the “slave”. In the above example, the pipeline that contains the ipcpipelinesink element is the “master”, while the other one is the “slave”. At the time of writing, the opposite setup is not implemented, so it is always the downstream part of the pipeline that can be slaved, and ipcpipelinesink is always the “master”.

While there can be only one “master” pipeline, it is possible to have multiple “slave” ones. This allows, for example, splitting an audio decoder and a video decoder into different processes:

It is also possible to have multiple ipcpipelinesink elements connect to the same slave pipeline. In this case, the slave pipeline will follow the state that is closest to PLAYING between the two states that it will get from the two ipcpipelinesinks. Also, messages from the slave pipeline will only be forwarded through one of the two ipcpipelinesinks, so you will not notice any duplicate messages. Behavior should be exactly the same as in the split slaves scenario.
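The “closest to PLAYING” rule amounts to taking the maximum over the usual state ordering (NULL < READY < PAUSED < PLAYING). A tiny sketch, using an illustrative IntEnum rather than the real GstState type:

```python
from enum import IntEnum

# Illustrative ordering; the real GstState enum has the same relative order.
class State(IntEnum):
    NULL = 1
    READY = 2
    PAUSED = 3
    PLAYING = 4

def slave_state(requested):
    # The slave follows whichever requested state is closest to PLAYING,
    # i.e. the maximum under the ordering above.
    return max(requested)

print(slave_state([State.PAUSED, State.PLAYING]).name)  # PLAYING
print(slave_state([State.NULL, State.READY]).name)      # READY
```

So if one ipcpipelinesink asks for PAUSED and the other for PLAYING, the slave goes to PLAYING.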

## Where is the code?

ipcpipeline is part of the GStreamer bad plugins set (here). Documentation is included with the code and there are also some examples that you can try out to get familiar with it. Happy hacking!

## November 01, 2017

### Sebastian Pölsterl — scikit-survival 0.4 released and presented at PyCon UK 2017

scikit-survival 0.4 released and presented at PyCon UK 2017

I'm pleased to announce that scikit-survival version 0.4 has been released.

This release adds CoxnetSurvivalAnalysis, which implements an efficient algorithm to fit Cox’s proportional hazards model with LASSO, ridge, and elastic net penalty. This allows fitting a Cox model to high-dimensional data and performing feature selection. Moreover, it includes support for Windows with Python 3.5 and later by making the cvxopt package optional.
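For reference, the elastic net penalty combines the L1 (LASSO) and L2 (ridge) terms added to the Cox partial likelihood. A minimal sketch of just the penalty term, with parameter names chosen here for illustration rather than taken from the scikit-survival API:

```python
def elastic_net_penalty(coefs, alpha=1.0, l1_ratio=0.5):
    """alpha * sum_j(l1_ratio * |b_j| + (1 - l1_ratio) / 2 * b_j**2).

    l1_ratio=1.0 gives the LASSO penalty, l1_ratio=0.0 the ridge penalty,
    and anything in between the elastic net.
    """
    return alpha * sum(
        l1_ratio * abs(b) + (1.0 - l1_ratio) / 2.0 * b * b for b in coefs
    )

print(elastic_net_penalty([1.0, -2.0], alpha=1.0, l1_ratio=1.0))  # 3.0
print(elastic_net_penalty([2.0], alpha=1.0, l1_ratio=0.0))        # 2.0
```

Driving the L1 weight up shrinks more coefficients exactly to zero, which is what enables the feature selection mentioned above.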

Install via conda:

conda install scikit-survival

or via pip (all platforms):

pip install -U scikit-survival

## PyCon UK

Last week, I presented an Introduction to Survival Analysis with scikit-survival at PyCon UK in Cardiff in front of a packed audience of genuinely interested people. I hope some people will give scikit-survival a try and use it in their work.

The slides of my presentation are available at https://k-d-w.org/pyconuk-2017/.

sebp Wed, 11/01/2017 - 23:29


## October 30, 2017

### Víctor Jáquez — GStreamer Conference 2017

This year, the GStreamer Conference happened in Prague, along with the traditional autumn Hackfest.

Prague is a beautiful city, though this year I couldn’t visit it as much as I wanted, since the Embedded Linux Conference Europe and the Open Source Summit also took place there, and Igalia, being a Linux Foundation sponsor, had a booth in the venue, where I talked about our work with WebKit, Snabb, and obviously, GStreamer.

But let’s get back to the GStreamer Hackfest and Conference.

One of the things I like most about the GStreamer project is its community: the people involved in it, writing code and sharing their work with many others. They might appear a bit tough at the beginning (or at least they seemed so to me), but in reality they are all kind and talented people, and I’m proud to consider myself part of this community. Nonetheless, it has a diversity problem, like many other Open Source communities.

During the Hackfest, Hyunjun and I met with Sree and talked about the plans for GStreamer-VAAPI, the new features in VA-API and libva, and how we could map them to GStreamer’s design. We also talked about future developments in the msdk elements, merged one year ago into gst-plugins-bad. I also talked a bit with Nicolas Dufresne about kmssink and DMABuf.

At the Conference, which happened in the same venue as the Hackfest, I talked with the authors of gstreamer-media-SDK. They are really energetic.

I delivered my usual talk about GStreamer-VAAPI. You can find the slides, as a web presentation, here. Also, as every year, our friends at Ubicast recorded the talks and made them available for streaming almost instantaneously:

My colleague Enrique talked in the Conference about the Media Source Extensions (MSE) on WebKit, and Hyunjun shared his experience with VA-API on Rust.

Also, at the conference venue, we showed a couple of demos. One of them was a MinnowBoard running WPE, rendering videos from YouTube using gstreamer-vaapi to decode video.

## October 25, 2017

### Sebastian Dröge — Multi-threaded raw video conversion and scaling in GStreamer

Another new feature that landed in GStreamer a while ago, and is included in the 1.12 release, is multi-threaded raw video conversion and scaling. The short story is that it led to, for example, a 3.2x speed-up converting 1080p video to 4k with 4 cores.

I had a few cases where a single core was not able to do rescaling in real-time anymore, even on a quite fast machine. One of the cases was 60fps 4k video in the v210 (10 bit YUV) color format, which is a lot of bytes per second in a not very processing-friendly format. GStreamer’s video converter and scaler is already quite optimized and using SIMD instructions like SSE or Neon, so there was not much potential for further optimizations in that direction.
However, basically every machine nowadays has multiple CPU cores that could be used, and raw video conversion/scaling is an almost perfectly parallelizable problem. Given how the conversion code was already written, it was relatively easy to add.

The way it works now is similar to the processing model of libraries like OpenMP or Rayon. The whole work is divided into smaller, equal sub-problems that are handled in parallel; then the runner waits until all parts are done and combines the result. In our specific case that means that each plane of the video frame is cut into 2, 4, or more slices of full rows, which are then converted separately. The “combining” step does not exist here: all sub-conversions are written directly to the correct place in the output already.
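The scheme can be sketched in plain Python: cut the rows of a stand-in frame into equal slices, convert each slice on its own thread, and write directly into a preallocated output so no combining step is needed. The per-row "conversion" below is a dummy; none of this is GStreamer API:

```python
from concurrent.futures import ThreadPoolExecutor

def convert_row(row):
    # Stand-in for the real per-row conversion/scaling work.
    return [px * 2 for px in row]

def convert_frame(frame, n_threads=4):
    out = [None] * len(frame)  # preallocated output: no combine step needed

    def work(start, stop):
        for i in range(start, stop):
            out[i] = convert_row(frame[i])

    # Cut the rows into n_threads slices of (nearly) equal size.
    step = -(-len(frame) // n_threads)  # ceiling division
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(work, s, min(s + step, len(frame)))
                   for s in range(0, len(frame), step)]
        for f in futures:
            f.result()  # wait until all slices are done, propagate errors
    return out

frame = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(convert_frame(frame, n_threads=2))  # [[2, 4], [6, 8], [10, 12], [14, 16]]
```

Because each slice writes to a disjoint range of rows, no locking is needed between the workers, which is what makes the problem almost perfectly parallelizable.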

As a small helper object for this kind of processing model, I wrote GstParallelizedTaskRunner which might also be useful for other pieces of code that want to do the same.

In the end it was not much work, but the results were satisfying. For example, the conversion of 1080p to 4k video in the v210 color format with 4 threads gave a speedup of 3.2x. At that point it looked like the main bottleneck was memory bandwidth, but I didn’t look closer, as this is already more than enough for the use cases I was interested in.

## October 21, 2017

### Sebastian Dröge — Rendering HTML5 video in Servo with GStreamer

At the Web Engines Hackfest in A Coruña at the beginning of October 2017, I was working on adding some proof-of-concept code to Servo to render HTML5 videos with GStreamer. For the impatient, the results can be seen in this video here.

And the code can be found here and here.

##### Details

Servo is Mozilla‘s experimental browser engine written in Rust, optimized for high-performance, parallelized rendering. Some parts of Servo are being merged into Firefox as part of Project Quantum, and already provide a lot of performance and stability improvements there.

During the hackfest I actually spent most of the time trying to wrap my head around the huge Servo codebase. It seems very well-structured and designed, exactly what you would expect when such a project is started from scratch by a company with decades of experience writing browser engines. After also having worked on WebKit in the past, I would say that you can see the difference between a legacy codebase from the end of the 90s and something written in a modern language with modern software engineering practices.

For the actual implementation of HTML5 video rendering via GStreamer, I started on top of the initial implementation that Philippe Normand had begun earlier. That one rendered the video in a separate window though, and did not work with the latest version of Servo anymore. I cleaned it up and made it work again (probably the best task for learning a new codebase), and then added support for actually rendering the video inside the web view.

This required quite a few additions on the Servo side, some of which are probably more hacks than anything else, but from the GStreamer side it was extremely simple. In Servo all the infrastructure for media rendering is currently still missing, while GStreamer has more than a decade of polishing for making integration into other software as easy as possible.

All the GStreamer code was written with the GStreamer Rust bindings, containing not a single line of unsafe code.

As you can see from the above video, the results work quite well already. Media controls or anything more fancy are not working though. Also, rendering is currently done completely in software, and an RGBA frame is then uploaded via OpenGL to the GPU for rendering. However, hardware codecs can already be used just fine, and basically every media format out there is supported.

##### Future

While this all might sound great, unfortunately Mozilla’s plans for media support in Servo are different. They’re planning to use the C++ Firefox/Gecko media backend instead of GStreamer. Best to ask them for reasons, I would probably not repeat them correctly.

Nonetheless, I’ll try to keep the changes updated with the latest Servo, and once they add more things for media support themselves, add the corresponding GStreamer implementations in my branch. It still provides value by showing that GStreamer is very well capable of handling web use cases (which it already showed in WebKit), as well as being a possibly better choice for people trying to use Servo on embedded systems or with hardware codecs in general. But as I’ll have to work based on what they do, I’m not going to add anything fundamentally new myself at this point, as I would have to rewrite it around whatever they decide for the implementation anyway.

Also once that part is there, having GStreamer directly render to an OpenGL texture would be added, which would allow direct rendering with hardware codecs to the screen without having the CPU worry about all the raw video data.

But for now, it’s waiting until they catch up with the Firefox/Gecko media backend.

## October 20, 2017

### Sebastian Dröge — DASH trick-mode playback in GStreamer: Fast-forward/rewind without saturating your network and CPU

GStreamer now has support for I-frame-only (aka keyframe) trick mode playback of DASH streams. It works only on DASH streams with ISOBMFF (aka MP4) fragments, and only if these contain all the required information. This is something I wanted to blog about since many months already, and it’s even included in the GStreamer 1.10 release already.

When trying to play back a DASH stream at rates much higher than real-time (say 32x), or playing the stream in reverse, you can easily run into various problems. This is something that was already supported by GStreamer in older versions, for DASH streams as well as local files or HLS streams, but it was far from ideal. What would happen is that you usually run out of available network bandwidth (you need to be able to download the stream 32x faster than usual), or out of CPU/GPU resources (it needs to be decoded 32x faster than usual), and even if all that works, there’s no point in displaying 960 (30fps at 32x) frames per second.

To get around that, GStreamer 1.10 can now (if explicitly requested with GST_SEEK_FLAG_TRICKMODE_KEY_UNITS) only download and decode I-frames. Depending on the distance of I-frames in the stream and the selected playback speed, this looks more or less smooth. Also depending on that, this might still yield too many frames to download or decode in real-time, so GStreamer also measures the distance between I-frames, how fast data can be downloaded, and whether decoders and sinks can catch up, to decide whether to skip over a couple of I-frames and maybe only download every third I-frame.
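As a toy model of that kind of decision: given the I-frame spacing, the playback rate, and how many frames per second the decoder and sink can actually handle, compute how many keyframes to skip over. The numbers and the function below are illustrative only, not dashdemux's actual heuristics:

```python
def keyframe_skip(iframe_interval_s, rate, max_decoded_fps):
    """Return n, meaning "download and decode only every n-th I-frame".

    At playback rate `rate`, keyframes arrive every iframe_interval_s / rate
    seconds of wall-clock time; if that exceeds what the decoder and sink
    can handle, skip enough keyframes to fit within the budget.
    """
    needed_fps = rate / iframe_interval_s  # keyframes per wall-clock second
    if needed_fps <= max_decoded_fps:
        return 1  # every I-frame fits, no skipping needed
    return int(-(-needed_fps // max_decoded_fps))  # ceiling division

# 2 s between I-frames, 32x playback, decoder comfortable at 10 fps:
print(keyframe_skip(2.0, 32, 10))  # 2 -> keep every 2nd keyframe
```

A real implementation would additionally fold in measured download speed and adapt continuously, as the text describes.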

If you want to test this, grab the playback-test from GStreamer, select the trickmode key-units mode, and seek in a DASH stream while providing a higher positive or negative (reverse) playback rate.

Let us know if you run into any problems with any specific streams!

##### Short Implementation Overview

From an implementation point of view this works by having the DASH element in GStreamer (dashdemux) not only download the ISOBMFF fragments but also parse the headers of each, to get the positions and distances of the I-frames in the fragment. Based on that it then decides which ones to download, or whether to skip ahead one or more fragments. The ISOBMFF headers are then passed to the MP4 demuxer (qtdemux), followed by discontinuous buffers that only contain the actual I-frames and nothing else. While this sounds rather simple from a high-level point of view, getting all the details right was the result of a couple of months of work by Edward Hervey and myself.

Currently the heuristics for deciding which I-frames to download and how much to skip ahead are rather minimal, but it’s working fine in many situations already. A lot of tuning can still be done though, and some streams are working less well than others which can also be improved.

## October 19, 2017

### Christian Schaller — Looking back at Fedora Workstation so far

So I have blogged regularly over the last few years about upcoming features in Fedora Workstation. Well, I thought that as we are putting the finishing touches on Fedora Workstation 27, I should look back at everything we have achieved since Fedora Workstation launched with Fedora 21. The efforts I highlight here are efforts where we have done significant or most of the development. There are of course a lot of other big changes that have happened over the last few years through the wider community that we leveraged and offer in Fedora Workstation; examples here include things like Meson and Rust. This post is not about those, but that said, I do want to write a post just talking about the achievements of the wider community at some point, because they are very important and crucial too. And along the same line, this post will not speak about the large number of improvements and bugfixes that we contributed to a long list of projects, like GNOME itself. This blog is about taking stock and taking some pride in what we have achieved so far, and the major hurdles we passed on our way to improving the Linux desktop experience.
This blog is also slightly different from my normal format, as I will not call out individual developers by name as I usually do; instead I will treat this as a totality and thus just say ‘we’.

• Wayland – We have been the biggest contributor since we joined the effort, and have taken the lead on putting in place all the pieces needed for actually using it on a desktop, including starting to ship it as our primary offering in Fedora Workstation 25. This includes putting a lot of effort into ensuring that XWayland works smoothly, to provide full legacy application support.
• Libinput – A new library we created for handling all input under both X and Wayland. This came about because Wayland needed input handling that was not tied to X, but it has improved input handling for X itself as well. Libinput is being rapidly developed and improved, with 1.9 coming out just a few days ago.
• glvnd – Dealing with multiple OpenGL implementations has been a pain under Linux for years. We worked with NVidia on this effort to ensure that you can install multiple OpenGL implementations on the system and have it use the correct one depending on which GPU and driver you are using. We keep expanding this solution to cover more use cases; for Fedora Workstation 27 we expect to bring glvnd support to XWayland, for instance.
• Porting Firefox to GTK3 – We ported Firefox to GTK3, including making sure it works under Wayland. This work also provided the foundation for HiDPI support in Firefox. We are the single biggest contributor to Firefox Linux support.
• Porting LibreOffice to GTK3 – We ported LibreOffice to GTK3, which included Wayland support, touch support and HiDPI support. Our team is one of the major contributors to LibreOffice and helps move the project forward on a lot of fronts.
• Google Drive integration – We extended the general Google integration in GNOME 3 to include support for Google Drive, as we found that a lot of our users were relying on Google Apps at work.
• Flatpak – We created Flatpak to lead the way in moving desktop applications into their own namespaces and containers, resolving a lot of long term challenges for desktop applications on Linux. We expect to have new infrastructure in place in Fedora soon to allow Fedora packagers to quickly and easily turn their applications into Flatpaks.
• Linux Firmware Service – We created the Linux Firmware Service to give Linux users easy access to UEFI firmware updates on their Linux systems, and worked with great vendors such as Dell and Logitech to get them to support it for their devices. Many bugs experienced by Linux users over the years could have been resolved by firmware updates, but with tooling being spotty, many Linux users were not even aware that fixes were available.
• GNOME Software – We created GNOME Software to give us a proper software store on Fedora and extended it over time with features such as fonts, GStreamer plugins, GNOME Shell extensions and UEFI firmware updates. Today it is the main store-type application used not just by us; our work has been adopted by other major distributions too.
• mp3, ac3 and aac support – We have spent a lot of time bringing support for some of the major audio codecs to Fedora, namely MP3, AC3 and AAC. In the age of streaming, codec support is maybe less important than it used to be, but there is still a lot of media on people's computers that they need and want access to.
• Fedora Media Writer – A cross-platform media creator making it very easy to create Fedora Workstation install media regardless of whether you are on Windows, Mac or Linux. As we move away from optical media, offering only ISO downloads started feeling more and more outdated; with the media writer we provide a uniform user experience for quickly creating your USB install media, which is especially important for new users coming from Windows and Mac environments.
• Captive portal – We added support for captive portals in Network Manager and GNOME 3, ensuring easy access to the internet over public wifi networks. This feature has been with us for a few years now, but it is still a much appreciated addition.
• HiDPI support – We worked to add HiDPI support across X, Wayland, GTK3 and GNOME 3. We led the way on HiDPI support under Linux and keep working on various applications to this day to polish up the support.
• Touch support – We worked to add touchscreen support across X, Wayland, GTK3 and GNOME 3. We spent significant resources enabling this, both for laptop touchscreens and for modern Wacom devices.
• QGNOME Platform – We created the QGNOME Platform to ensure that Qt applications work well under GNOME 3 and have a nice, native, integrated feel. So while we ship GNOME as our desktop offering, we want Qt applications to work well and feel native. This is an ongoing effort, but for many important applications it is already a great improvement.
• Nautilus improvements – Nautilus had been undermaintained for quite a while, so we had Carlos Soriano spend significant time reworking major parts of it and adding new features like renaming multiple files at once, updating the views and in general bringing it up to date.
• Night Light support in GNOME – We added support for automatically adjusting the color and light settings on your system, based on the light sensors found in modern laptops. This integrates functionality that previously required installing extra software such as Redshift.
• libratbag – We created a library that enables easy configuration of high-end mice and other kinds of input devices. This has led to increased collaboration with a lot of gaming-mice manufacturers to ensure full support for their devices under Linux.
• RADV – We created a fully open source Vulkan implementation for AMD GPUs, which recently got certified as Vulkan compliant. We wanted to give open source Vulkan a boost, so we created the RADV project, which now has an active community around it and is being tested with major games.
• GNOME Shell performance improvements – We have been working on various performance improvements to GNOME Shell over the last few years, with significant gains already made. We want to push the envelope further, though, and are planning a major hackfest around Shell performance and resource usage early next year.
• GNOME Terminal developer improvements – We worked to improve GNOME Terminal's features to make it an even better tool for developers, with items such as easier naming of terminals and notifications for long-running jobs.
• GNOME Builder – Improving the developer story is crucial for us, and we have been doing a lot of work to make GNOME Builder a great tool for developers, both for improving the desktop itself and for development in general.
• Pipewire – We created a new media server to unify audio, pro audio and video. The first version, which we are shipping in Fedora 27, handles our video capture.
• Fleet Commander – We launched Fleet Commander, our new tool for managing large Linux desktop deployments. It answers a long-standing call from many of Red Hat's major desktop customers, and from admins of large-scale Linux deployments at universities and similar institutions, for a powerful yet easy-to-use administration tool for large desktop deployments.

I am sure I missed something, but this is at least a decent list of Fedora Workstation highlights from the last few years. Next, on to my Fedora Workstation 27 blog post :)

## October 18, 2017

### Christian Schaller — Fleet Commander ready for takeoff!

Alberto Ruiz just announced Fleet Commander as production ready! Fleet Commander is our new tool for managing large deployments of Fedora Workstation and RHEL desktop systems. So head over to Alberto's Fleet Commander blog post for all the details.

## October 17, 2017

### Enrique Ocaña González — Attending the GStreamer Conference 2017

This weekend I’ll be in Node5 (Prague) presenting our Media Source Extensions platform implementation work in WebKit using GStreamer.

The Media Source Extensions HTML5 specification allows JavaScript to generate media streams for playback, and lets the web page have more control over complex use cases such as adaptive streaming.
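As a rough illustration of that flow, a minimal browser-side sketch of feeding a stream through MSE might look like this (the codec string and segment URLs below are assumptions for illustration, not taken from the talk):

```typescript
// Minimal MSE sketch: attach a MediaSource to a <video> element and feed
// it fetched media segments from JavaScript instead of a plain src URL.
const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  // Hypothetical codec string and segment names, for illustration only.
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
  for (const url of ['init.mp4', 'segment1.m4s', 'segment2.m4s']) {
    const data = await (await fetch(url)).arrayBuffer();
    sb.appendBuffer(data); // demuxing/decoding is done by the platform backend
    // appendBuffer is asynchronous; wait until the SourceBuffer is idle
    // before appending the next segment.
    await new Promise(resolve =>
      sb.addEventListener('updateend', resolve, { once: true }));
  }
  mediaSource.endOfStream();
});
```

An adaptive-streaming player builds on essentially this loop, choosing which quality variant's segments to fetch based on measured bandwidth.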

My plan for the talk is to start with a brief introduction about the motivation and basic usage of MSE. Next I’ll show a design overview of the WebKit implementation of the spec. Then we’ll go through the iterative evolution of the GStreamer platform-specific parts, as well as its implementation quirks and challenges faced during the development. The talk continues with a demo, some clues about the future work and a final round of questions.

Our recent MSE work has been on desktop WebKitGTK+ (the WebKit flavor powering Epiphany, aka GNOME Web), but we also have MSE working on WPE, optimized for a Raspberry Pi 2. We will be showing it at the Igalia booth, in case you want to see it working live.

I’ll also be attending the GStreamer Hackfest in the days before. There I plan to work on WebM support in MSE, focusing on any issues in the Matroska demuxer or the VP9/Opus/Vorbis decoders that break our use cases.

See you there!

UPDATE 2017-10-22:

The talk slides are available at https://eocanha.org/talks/gstconf2017/gstconf-2017-mse.pdf and the video is available at https://gstconf.ubicast.tv/videos/media-source-extension-on-webkit (the rest of the talks here).

## October 12, 2017

### Christian Schaller — AAC support will be available in Fedora Workstation 27!

So I am really happy to announce another major codec addition to Fedora Workstation 27, namely AAC. As you might have seen from Tom Callaway's announcement, it has just been cleared for inclusion in Fedora.

For those not well versed in the arcane lore of audio codecs: AAC is the codec used by things like iTunes and is found in a lot of general media files online. AAC stands for Advanced Audio Coding and was created by the MPEG working group as the successor to MP3. Especially with Apple embracing the format, there are a lot of files out there using it, and thus we wanted to support it in Fedora too.

What we will be shipping in Fedora is a modified version of the AAC implementation released by Google, which was originally written by Fraunhofer. On top of that we will of course provide GStreamer plugins to enable full support for playing and creating AAC files in GStreamer applications.

Be aware though that AAC is a bit of an umbrella term for a lot of different technologies, so you might come across files that claim to use AAC but which we cannot play back. The most likely reason would be that they require an AAC profile we do not support. The version of AAC we will be shipping has also been carefully created to fit within the requirements for software in Fedora, so if you are a packager, be aware that unlike with, for instance, MP3, this change does not mean you can package and ship any AAC implementation you want in Fedora.

I am expecting to have more major codec announcements soon, so stay tuned :)

## October 06, 2017

### Jean-François Fortin Tam — Software and hardware freedom: a recap of the September 16 episode of La Sphère

On September 13, I received a curious email inviting me to take part in the show "La Sphère" for an episode dedicated to free software, airing on Radio-Canada's main radio channel on Saturday, September 16.

A few minutes before the start of the show

The episode lasts about an hour, and the podcast version is split into several segments, but since I was brought in to comment across pretty much all of them, I invite you to listen to the full episode if you feel so inclined.

#### Murphy's law

Everything went well, although the potential topics I had prepared for did not match the questions I was asked on air:

• Having received four themes at midnight the night before the show, I hastily wrote, that very morning before heading to the studios, some 1200 words addressing them in a structured way. I had thus prepared talking points and clear examples to cite, in case I was asked about computer security, abuses by non-transparent corporations, or the perceived "futility" of free software in a world where the hardware is not necessarily under our control…
• That document, displayed on the screen of the Librem I had in front of me in the studio, ended up going unused, as the discussions took completely different turns. From the very first questions, I realized I would not get the opportunity to dive deep into technical, legal or philosophical matters, and that I therefore had to reorient my whole discussion strategy on the spot. All my answers during the show were thus constructed in real time, in the heat of the moment.

I was sometimes thrown stereotyped questions (a bit rhetorical, admittedly, but no doubt meant to raise the questions the target audience probably asks itself!), forcing me into a corrective/defensive position (where I first had to correct the misconception before I could even consider talking about anything else), but it is precisely relevant to debunk those misconceptions, since this is a popularization show for the general public…

#### A few moments of surprise

During the show, I also picked up on a few remarks around the table that were not as nuanced as I would have liked, as well as questions that sometimes left me speechless (such as "Deep down, don't people contribute to free software mainly to pad their CV, then drop it once they get hired?" and "If I want to deploy a CRM in a company, I could never use a Librem to do it"; in the second case, I was so taken aback by the sweeping nature of such a claim that I could only vaguely answer "Um… it depends?")

My reaction

If I had been able to prepare an answer to those two point-blank questions, I would for example have wanted to:

• say that nobody contributes to free software in such a Machiavellian way; contributing to free software is a matter of philosophy and ethics as much as methodology, and if your heart is not in it your contributions will not be convincing, so there would be nothing there to build a rich career on;
• find out what exactly my interlocutor meant regarding CRMs, given that free (or at least open) CRMs are plentiful and that, these days, such applications are mostly web-based anyway.

I would have wanted to open Pandora's box (the whole computer security and privacy angle, which is extremely rich in current events and particularly striking), respond to all the preconceptions, give impeccable examples and counter-arguments, but there was no time (as you can hear, the live show even had to end abruptly). At some point, you have to be aware of the constraints of the conversational flow and go for the simplest, most direct answer… otherwise, it would have taken three hours of discussion.

#### A positive outcome

I am admittedly a perfectionist (as you may have noticed above), but the fact remains that it was a good show. After all, let's put things in context:

• this is a show aimed at the general public, and most of the target listeners are not computer experts;
• in my eyes, it is almost miraculous that the better part of an hour of airtime on a national channel was devoted to the subject of free software.

I am therefore entirely grateful to the La Sphère team for having sought to popularize the subject and the stakes of free software… in the public sphere!