Jul 21 2018

Here’s a little timeline of some fun we had with the GNOME master Flatpak runtime last week:

  • Tuesday, July 10: a bad runtime build is published. Trying to start any application results in error while loading shared libraries: libdw.so.1: cannot open shared object file: No such file or directory. The problem is that the library is present in org.gnome.Sdk instead of org.gnome.Platform, where it is required.
  • Thursday, July 12: the bug is reported on WebKit Bugzilla (since it broke Epiphany Technology Preview).
  • Saturday, July 14: having returned from GUADEC, I notice the bug report and bisect the issue to a particular runtime build. Mathieu Bridon fixes the issue in the freedesktop SDK and opens a merge request.
  • Monday, July 16: Mathieu’s fix is committed. We now have to wait until Tuesday for the next build.
  • Tuesday, Wednesday, and Thursday: we deal with various runtime build failures. Each day, we get a new build log and try to fix whatever build failure is reported. Then, we wait until the next day and see what the next failure is. (I’m not aware of any way to build the runtime locally. No doubt it’s possible somehow, but there are no instructions for doing so.)
  • Friday, July 20: we wait. The build has succeeded and the log indicates the build has been published, but it’s not yet available via flatpak update.
  • Saturday, July 21: the successful build is now available. The problem is fixed.

As far as I know, it was not possible to run any nightly applications during this two-week period, except developer applications like Builder that depend on org.gnome.Sdk instead of the normal org.gnome.Platform. If you used Epiphany Technology Preview and wanted a functioning web browser, you had to run arcane commands to revert to the last good runtime version.

This multi-week response time is fairly typical for us. We need to improve our workflow somehow. It would be nice to be able to immediately revert to the last good build once a problem has been identified, for instance.

Meanwhile, even when the runtime is working fine, some apps have been broken for months without anyone noticing or caring. Perhaps it’s time for a rethink on how we handle nightly apps. It seems likely that only a few apps, like Builder and Epiphany, are actually being regularly used. The release team has some hazy future plans to take over responsibility for the nightly apps (but we have to take over the runtimes first, since those are more important), and we’ll need to somehow avoid these issues when we do so. Having some form of notifications for failed builds would be a good first step.

P.S. To avoid any possible misunderstandings: the client-side Flatpak technology itself is very good. It’s only the server-side infrastructure that is problematic here. Clearly we have a lot to fix, but it won’t require any changes in Flatpak.

Michael Catanzaro: On Flatpak Nightlies
Source: Planet Gnome

Jul 20 2018

Over on the Facebook code site, Daniel Xu announces the release of oomd under the GPLv2. Oomd is a user-space “out of memory” killer that was mentioned in our recent article on the block I/O latency controller and it uses the pressure stall information covered in an even more recent article.

“Oomd constantly monitors PSI [Pressure Stall Information] metrics to assess whether a system is under unrecoverable load. PSI alone is insufficient, so oomd also monitors the system holistically. This is in contrast to Linux’s OOM killer, which focuses primarily on the kernel’s concerns. Since OOM detection criteria can vary depending on workload, the plugin system supports customization to both the detection and process kill strategies.

Thanks to this new ability to monitor key system resource indicators, oomd is able to take corrective action in userspace before a system-wide OOM occurs. Corrective action is configured via a flexible plugin system that is capable of executing custom code. Thus, in addition to oomd’s default process SIGKILL behavior, application developers can customize their plugin with alternate strategies, such as sending a ‘back off’ RPC to the main workload or dumping system logs to a remote service.”
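To give a rough idea of what monitoring PSI metrics from user space can look like, here is a minimal Python sketch. It is not oomd's code and does not reflect its detection logic or plugin API. It assumes a kernel that exposes the PSI interface at /proc/pressure/memory, where each line has the form "some avg10=... avg60=... avg300=... total=...". The function names and the 10% threshold are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold (percent of time stalled over the last 10 seconds).
# This number is an assumption for the sketch, not a value taken from oomd.
FULL_AVG10_THRESHOLD = 10.0


@dataclass
class PsiLine:
    avg10: float   # share of time stalled, 10-second average (percent)
    avg60: float   # same, 60-second average
    avg300: float  # same, 300-second average
    total: int     # cumulative stall time in microseconds


def parse_psi(text):
    """Parse the text of a PSI file into a dict keyed by 'some' / 'full'."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = PsiLine(
            avg10=float(values["avg10"]),
            avg60=float(values["avg60"]),
            avg300=float(values["avg300"]),
            total=int(values["total"]),
        )
    return result


def under_pressure(psi, threshold=FULL_AVG10_THRESHOLD):
    """Return True if the 'full' 10-second average meets the threshold."""
    full = psi.get("full")
    return full is not None and full.avg10 >= threshold


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the memory PSI file (requires kernel PSI support)."""
    with open(path) as f:
        return parse_psi(f.read())
```

On a system with PSI enabled, calling under_pressure(read_memory_pressure()) in a polling loop would be the simplest possible detector. As the announcement notes, PSI alone is insufficient, so a real tool would combine this signal with other metrics before taking any corrective action.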
Open sourcing oomd, a new approach to handling OOMs
Source: LWN.Net