Keep your libraries updated

Jimmy Angelakos

This article discusses the necessity of keeping your software libraries updated.

Updates?

One of the benefits of the open-source world is the unmatched ability to issue updates rapidly to correct bugs and security vulnerabilities.

Software bug fixes are constantly being released, for everything from recently introduced regressions to long-standing bugs that went undetected for years. They are corrected by a wide range of people, from professionals paid to enhance the software to individual contributors who spot a problem.

Security updates eliminating threats have been known to roll out in a matter of hours, something rarely seen in the proprietary software world.

By not keeping our systems up to date, we miss out on these vital improvements to open-source software.

Repercussions of "if it's not broke, don't fix it"

It's easy to put off updating our software to later versions and to assume that a tested and monitored system is stable.

However, systems are only tested to the best of our abilities, and undiagnosed problems can still be present. We have often seen latent bugs discovered, or unexpected behaviours triggered, in software previously considered extremely stable.

Additionally, the more time that passes, the more we'll be locked into the old version, and the harder it will be to upgrade because the products, tools, and dependencies that we interact with will have become incompatible.

Why upgrading doesn't occur

Upgrades can be put off for many reasons, such as:

  • The "if we don't touch it, it won't break" mentality, and trusting our own tests over the fixes contributed by the open-source community.
  • Lack of knowledge about how to upgrade and how to test updates frequently.
  • No zero-downtime procedure defined, or reluctance to schedule maintenance windows due to the impact of downtime.
  • A difficult deployment procedure.

Once we have taken reasonable steps to eliminate the above factors, upgrading becomes easier and safer.

Real-world failures

Sometimes skipping an update or two (or four years’ worth of updates) for a library means that we can trigger unexpected breakages in other software that are very hard to troubleshoot. The potential impact of such complex incidents outweighs the impact of scheduling maintenance downtime for updates.

The reason that diagnosing problems with old versions is hard is often twofold:

  1. The bug is not in the software where it manifests, but in an underlying library.
  2. The behaviour cannot be reproduced unless the outdated library is taken into account.

Let’s look at two real-world cases where this happened in the context of PostgreSQL:

Scenario 1

The database appears to function normally. However, when it is backed up with pg_dump and restored elsewhere, the restored database exhibits troubling behaviour. When we create a function in the restored DB, running it results in:

ERROR: cache lookup failed for type 5150

The investigation starts with querying pg_type to find the culprit. However, that proves fruitless, as the same database copy, restored elsewhere, exhibits failures for a different type ID. Trying a Barman restore instead of pg_dump does not fix the problem either.

Some start to suspect a deep bug in PostgreSQL that corrupts its pg_catalog, unlikely as that seems.

The debuggers come out: people go hunting for the bug in Postgres’ code with gdb, but a surprising discovery is made. The error is coming not from within Postgres, but from the PGAudit extension.

A quick search confirms that this was triggered by a known bug that was fixed in PGAudit more than a year previously.

An equally quick check reveals that on the restored database systems, the PGAudit extension hadn’t been updated for two years.
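
A check of this kind can be scripted. The following is only a minimal sketch, not the procedure used in the incident: it assumes the rows come from the standard pg_available_extensions system view, and the helper name outdated_extensions and the sample version numbers are hypothetical. Note that it only detects an extension whose installed version lags behind the version shipped on the server; spotting an outdated operating system package still requires the package manager.

```python
# Query to run with any PostgreSQL client. pg_available_extensions is a
# standard system view; installed_version is NULL for extensions that are
# available on the server but not created in the current database.
EXTENSION_VERSIONS_SQL = """
SELECT name, installed_version, default_version
FROM pg_available_extensions
WHERE installed_version IS NOT NULL
ORDER BY name;
"""


def outdated_extensions(rows):
    """Return (name, installed, available) for every extension whose
    installed version differs from the version shipped on the server.

    `rows` is an iterable of (name, installed_version, default_version)
    tuples, shaped like the result of EXTENSION_VERSIONS_SQL.
    """
    return [
        (name, installed, available)
        for name, installed, available in rows
        if installed is not None and installed != available
    ]


# Hypothetical rows for illustration only; the versions are made up.
sample_rows = [
    ("pgaudit", "1.1", "1.3"),
    ("postgis", "3.4.2", "3.4.2"),
    ("hstore", None, "1.8"),
]

for name, installed, available in outdated_extensions(sample_rows):
    print(f"{name}: installed {installed}, available {available}")
```

Running such a check on every restored or cloned system, rather than only on the primary, is what would have surfaced the mismatch in this scenario.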

Scenario 2

A database used for PostGIS exhibits an unexplained performance slowdown that is restricted to specific geospatial data values.

SELECT ST_DistanceSphere('POINT(-150 33)',
'POINT(-120.120120 42.488888)');

is more than fifty times slower than

SELECT ST_DistanceSphere('POINT(-150 33)',
'POINT(-120.120120 42.4888881)');

The behaviour is not reproducible elsewhere, not even with the exact same combination of PostgreSQL and PostGIS versions.

Finally, someone manages to trigger this on a test system – but only for this exact data value.

We drag out the profilers. Only this particular number seems to cause the slowdown, and the culprit, according to perf, seems to be... a multiplication?!

+ 62.78% 61.29% postgres libm-2.23.so [.] __mul

A quick check confirms that this does not happen on systems with different glibc versions.

It turns out that some slow paths for sine/cosine calculations had been found and eliminated in the GNU libc mathematical functions almost two years previously.

The system in question used a glibc that was four years old.
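
Because the C library is so easy to overlook, it can help to record its version alongside the database and extension versions when comparing systems. The snippet below is a small sketch using Python's standard platform module; the wrapper name libc_version is hypothetical. It reports the C library that the Python interpreter itself is linked against, which on a typical Linux host is the system glibc, but that is an assumption worth verifying on the host in question.

```python
import platform


def libc_version():
    """Return (name, version) of the C library used by this Python
    interpreter, for example ('glibc', '2.23').

    Both strings are empty when the library cannot be determined, such as
    on platforms that do not use glibc.
    """
    return platform.libc_ver()


name, version = libc_version()
print(f"C library: {name or 'unknown'} {version or 'unknown'}")
```

Collecting this value from both the affected system and the systems where the problem could not be reproduced would make a difference in glibc versions visible early in the investigation.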

We have seen many other scenarios of old versions causing complex incidents, some even involving total downtime for long periods while they were being diagnosed and resolved.

Help your system have the latest tools that it needs – update.

We can see that skipping library updates can trigger breakages in the real world. There are clear benefits to upgrading early and upgrading often. Even though our system appears stable, a use case will come along that will trigger a bug. Even worse, it may trigger the bug further down the dependency chain, where it’s harder to troubleshoot. Also, by not updating, we are exposing our users to security risks that may appear down the road (we have seen that nobody is immune to this).

Locking down our dependencies does not pay off: a few months or years down the line, we will have no idea how to upgrade our system. The longer we wait, the harder it will be to upgrade, because the other products, tools, and dependencies that we interact with will have been upgraded and become incompatible.

Any potential issues caused by the upgrade are worth the risk because they will be easier to fix by comparison. Skipping minor version updates is especially hard to justify, as they usually contain no functionality changes or breaking changes. Scheduling downtime for updates will prolong your system's life and, in the long term, increase its uptime.

To summarise, eliminate the barriers and make it easier to update your system's dependencies often. The benefits are worth it.

Source

This article, written by Jimmy Angelakos, was migrated from the 2ndQuadrant blog.
