Fighting in the firmwar(e) trenches

Most Firmware Sucks (but it doesn’t have to)

mfMedium


Strategies to improve your firmware

In the past decade I have contributed to or run engineering for several startups, consulted on numerous projects, built multiple products, and written apps, tools, scripts, and a ton of firmware. I have made a rather painful observation:

Most firmware sucks.

This article hypothesizes why this is so. It then describes some practices that are readily adapted to firmware and improve quality, testability, and reliability. While many software engineers already do these, too many firmware developers do not.

Why Most Firmware Sucks

Firmware developers often come from software-adjacent fields like mechanical engineering, electrical engineering, or physics. They may have substantial technical training but often lack formal software training and the associated best practices. To make matters worse, firmware development and runtime environments are actively hostile:

  1. Logging can be hard or impossible (no filesystem, no persistent storage, no network, no console, no memory).
  2. printf debugging can bog down the CPU or bus, distorting runtime behavior.
  3. Try getting a peripheral device to pause while debugging. Go on, I’ll wait. It won’t though! That’s a little hardware joke for you.
  4. Multithreading is nonexistent (or more accurately, embedded threading is not like application multithreading).
  5. C is heavily favored over C++ by SDK developers, so put away that ultra-modern 30 year old language and use this 50 year old language instead.
  6. The tools are old, expensive, proprietary, quirky, and hard to script.

Bad news first: The firmware environment will not improve. The tools will remain shitty. The languages will remain old.

Now, the good news: Firmware can improve.

Firmware Is (Mostly) Software

Software engineering is a craft, not an art. A craft is the distilled experiences, practices, pain, and wisdom of practitioners [2]. Applying these software practices to firmware yields improvements in testability, flexibility, and maintenance, just as they do in other software engineering areas. I’m going to concentrate on just three practices that I’ve used on small, resource-constrained startup teams.

It’s also good to remember that firmware is (mostly) software, not entirely software. There are characteristics of firmware that make it hard in ways software isn’t, and vice versa. No amount of software engineering is going to remove the need for manual, laborious, in-hardware system testing.

1. Use gitflow (and git, of course)

Good version control systems are commonplace. A good strategy will make your firmware even better.

Firmware developers end up using git as a way to save their work, to tinker with ideas without touching the main branch, and, very occasionally, to collaborate. gitflow is a strategy that helps with software management complexity: rapidly changing requirements, hardware revisions, testing, verification, release, and maintenance. While firmware developers don’t frequently encounter the team scale and complexity pressures that trigger gitflow’s natural emergence, that doesn’t mean they can’t benefit from it. Here is my shortlist of good habits:

  1. Always use a remote repository. Accountability, if only to yourself, improves code quality. And you never know when the dev machine is going to die or when a project is going to be shelved…or restarted.
  2. Use gitflow to organize development initiatives into different feature, fix, and release branches.
  3. Use embroidered semantic versioning or equivalent to tag the build products. Ideally tagging is an automated part of the build process.
  4. Archive named build resources where the hardware team can find them. Firmware should never go directly from the firmware developer to a deployment device, a manufacturing line, or production. There are too many opportunities to introduce small changes with big consequences. Just because a firmware image is small enough to travel on a thumbdrive doesn’t mean it should.
  5. yolo git push origin master --force is not a branch management strategy.
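
One way to make tagging an automated part of the build is to let the build script inject the version into the image. A minimal sketch, assuming hypothetical macro names (FW_VERSION_MAJOR, FW_VERSION_MINOR, FW_VERSION_PATCH, FW_GIT_HASH) that the script passes on the compiler command line, for example from the current git tag and commit:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical macros: a build script would normally pass these with -D.
 * The defaults exist only so the file still compiles without the script. */
#ifndef FW_VERSION_MAJOR
#define FW_VERSION_MAJOR 1
#endif
#ifndef FW_VERSION_MINOR
#define FW_VERSION_MINOR 0
#endif
#ifndef FW_VERSION_PATCH
#define FW_VERSION_PATCH 0
#endif
#ifndef FW_GIT_HASH
#define FW_GIT_HASH "0000000"
#endif

/* Writes "MAJOR.MINOR.PATCH+HASH" into buf and returns snprintf's result
 * (the length the full string needs, excluding the terminator). */
int FirmwareVersionString(char *buf, size_t len)
{
    return snprintf(buf, len, "%d.%d.%d+%s",
                    FW_VERSION_MAJOR, FW_VERSION_MINOR, FW_VERSION_PATCH,
                    FW_GIT_HASH);
}
```

The same string can then name the archived build product, so the image on a device can always be traced back to a commit.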

2. Write Testable Firmware

Firmware is hard to write, to test, and to debug. Worse, developers often debug firmware at the same time they debug system behavior. “Firmware” in this context means just the code, that is, the conditional and sequential logic, data structures, and data transformations. “System behavior” is what makes firmware challenging, stuff like:

  1. real world, noisy, messy, sensor inputs
  2. events coming from peripherals on external/independent timing
  3. realtime requirements

But if we decouple firmware and system behavior, both tasks can be made easier. We can test and verify how our code behaves when a sensor dies, saturates, or locks, without having a sensor or creating the conditions that trigger the failure. We can verify correct data formatting, communications parsing and handling, accelerometer transforms — all without the encumbrance of hardware. We can debug system behavior with greater insight as well, knowing that at least the code has been shown to work, if not properly, at least as intended, with any peculiarities likely coming from the data and system itself, rather than bugs.

“Testable firmware”[3] has two defining characteristics:

  1. State independence: Data and context (state) are supplied to functions as parameters, with no direct access to system data or special registers, and little (preferably no) direct access to statically allocated data.
  2. Platform independence: All hardware-specific registers and includes are isolated to hardware-specific inline files. The codebase can be targeted to custom hardware or PC at buildtime without edits.

Remember, testable firmware won’t obviate in situ testing, which would require complete device emulation (significant effort for dubious benefit). The intent here is to leverage our powerful development environments to solve two smaller testing problems instead of one big one.

Let’s find and fix some specific causes of untestable code.

State Independence

We write untestable code when we access state not easily replicated in the test environment. Obviously, some statically defined data is unavoidable, as firmware favors static over dynamic allocation. The drawback is that this statically defined data must be available or replicated on all hardware targets, some of which have ACCEL_CON0 and some that don’t, so it is a good habit to eliminate direct access.

Take a look at your drivers (SPI or flash or accelerometer — all good places). Look for any functions directly accessing registers or manipulating statically defined data.

Here’s a simple example:

Don’t do this
Do this instead
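
A minimal sketch of that contrast follows. ACCEL_CON0 is modeled here as a plain variable so the file builds on a PC; the function names, the AccelCal struct, and the 16-bit register layout are assumptions for illustration, not a real part’s API.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped register so this compiles on a PC.
 * On a real target ACCEL_CON0 would come from a vendor header. */
static volatile uint32_t ACCEL_CON0;

/* Don't do this: the function reaches into a register and a static,
 * so a test cannot control what it sees. */
static int16_t g_accel_offset;

int16_t AccelReadX_Untestable(void)
{
    return (int16_t)((int16_t)(ACCEL_CON0 & 0xFFFFu) - g_accel_offset);
}

/* Do this instead: the raw register value and the calibration state
 * arrive as parameters, so any input can be replayed in a test. */
typedef struct {
    int16_t offset;   /* hypothetical calibration offset, in counts */
} AccelCal;

int16_t AccelConvertX(uint32_t raw_reg, const AccelCal *cal)
{
    return (int16_t)((int16_t)(raw_reg & 0xFFFFu) - cal->offset);
}
```

On the target, a thin caller reads the register once and hands the value to AccelConvertX; on a PC, a test passes whatever raw values it wants to exercise.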

Platform Independence

We write untestable code when it won’t compile without intrinsics, types, data, and libraries available only for the hardware target. To achieve platform independence we isolate all hardware-specific details to inlined files that are selected at compile time.

Take another look through your driver files. Look for any platform-specific includes or libraries. These are effectively hardware, unless SDK and library providers have x86/linux/whatever builds. Look for any machine intrinsics or platform-specific types. That’s all hardware too.

There’s probably quite a lot!

The solution is well-known: Write a Hardware Abstraction Layer (“HAL”).

Nota bene [4]: Most chip makers provide a “HAL” but their definition of “hardware abstraction” is chip variants, not runtime environments. *Your* HAL is higher level, providing a complete separation between the platform (SDK, libraries, registers, data structures, intrinsics) and the application.

Most developers already write “shim” files rather than accessing library code directly. Keep doing that! That’s the first part of writing a HAL.

Most developers also end up littering their code with preprocessor logic and conditionals to target different platforms and hardware versions. Absolutely stop doing that.

Let’s try an example.

Platform Independence Example

In firmware you occasionally need a busy-wait. To prevent multiple files from containing #include "nrf_delay.h" (a platform-specific file), one typically writes a wrapper delay function in time.c and tucks the dependency there, but this still leaves time.c hardware-dependent.

The best solution would be to copy-paste in the hardware-specific code only at compile time. This is exactly what the C/C++ preprocessor does! We use this functionality to inline the correct hardware-specific code into the non-hardware-specific code at compile time.

1. Remove the implementation for TimeMicrosecondDelay from time.c.

2. At the bottom of time.c add #include "time.inl".

3. Create a folder for each hardware target in the src directory. These folders contain .inl files that include all, and only, hardware-specific code.

The embedded target includes src/nrf/time.inl:
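
A minimal sketch of what that file might contain, assuming TimeMicrosecondDelay takes a uint32_t count (the signature is an assumption) and that the nRF5 SDK’s nrf_delay_us is the underlying busy-wait. It builds only against the vendor SDK, not on a PC:

```c
/* src/nrf/time.inl: hardware-specific code only (sketch). */
#include <stdint.h>
#include "nrf_delay.h"   /* the platform-specific header named above */

void TimeMicrosecondDelay(uint32_t microseconds)
{
    nrf_delay_us(microseconds);   /* SDK busy-wait */
}
```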

While the PC build redirects to src/x86/time.inl:
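
A minimal sketch of a PC stand-in, using the same assumed signature. It spins on the standard clock() function so that it needs nothing beyond the C library; a real PC build might prefer to sleep instead.

```c
/* src/x86/time.inl: PC stand-in for the target's busy-wait (sketch). */
#include <stdint.h>
#include <time.h>

void TimeMicrosecondDelay(uint32_t microseconds)
{
    /* clock() reports processor time, which advances while we spin. */
    const clock_t start = clock();
    const clock_t ticks =
        (clock_t)(((uint64_t)microseconds * (uint64_t)CLOCKS_PER_SEC) / 1000000u);

    while ((clock_t)(clock() - start) < ticks) {
        /* busy-wait, mirroring the behavior on the target */
    }
}
```

The build script then decides which folder is on the include path, so time.c itself never names a platform.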

This redirection can work the other way as well, providing functionality easily available in a test environment while avoiding issues on the target. For example, if logging is difficult on the target hardware, you can provide a minimal or empty implementation there, but on a PC, dump logging strings to console.
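
As a sketch of that logging case, assuming a hypothetical LogWrite function and a hypothetical log.inl file in each target folder:

```c
/* src/x86/log.inl (hypothetical): on a PC, send log strings to the console. */
#include <stdio.h>

int LogWrite(const char *message)
{
    /* Returns the number of characters written, or a negative value on error. */
    return printf("%s\n", message);
}

/* The target's log.inl (also hypothetical) could be an empty implementation:
 *
 *   int LogWrite(const char *message) { (void)message; return 0; }
 */
```

Application code calls LogWrite everywhere and never needs to know whether anything is listening.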

Through a single compile time script directive we can now target different runtime environments.

Nota bene: One serious risk is writing essentially different codebases in different .inl folders, letting the .c files atrophy to mere shims. Avoid this at all costs. The more code in the .inl files, the less “platform independence” really means, because it becomes one codebase on PC and quite another on the target. For example, if a function is ten lines of application code containing one hardware-specific line, that function stays in the .c file. Replace the hardware-specific line with a call to a function defined in the .inl file. The only code that goes in .inl files is hardware-specific code.
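
A minimal sketch of that split, shown as a single file so it compiles on its own. All names here (PlatformAdcRead, PlatformAdcScript, SensorReadAveraged) are hypothetical, and the "platform" half is a PC stand-in that replays scripted samples instead of reading an ADC:

```c
#include <stddef.h>
#include <stdint.h>

/* ---- would live in src/x86/sensor.inl: PC stand-in for the ADC ---- */
static const uint16_t *g_script;      /* samples the test wants replayed */
static size_t g_script_len;
static size_t g_script_pos;

void PlatformAdcScript(const uint16_t *samples, size_t count)
{
    g_script = samples;
    g_script_len = count;
    g_script_pos = 0;
}

uint16_t PlatformAdcRead(void)
{
    if (g_script_len == 0) {
        return 0;
    }
    const uint16_t value = g_script[g_script_pos];
    g_script_pos = (g_script_pos + 1) % g_script_len;
    return value;
}

/* ---- stays in sensor.c: application logic shared by every target ---- */
uint16_t SensorReadAveraged(uint8_t sample_count)
{
    uint32_t sum = 0;

    if (sample_count == 0) {
        return 0;
    }
    for (uint8_t i = 0; i < sample_count; ++i) {
        sum += PlatformAdcRead();   /* the one hardware-specific line */
    }
    return (uint16_t)(sum / sample_count);
}
```

The averaging logic is identical on every target; only PlatformAdcRead changes with the folder selected at build time.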

Some Concerns

Firmware engineers work on resource-constrained systems and are instinctively averse to anything that would potentially affect performance. Let’s address a few.

1. With no direct access, every register access requires a slow, expensive function call.

In my view, this is more theoretical concern than actual problem, and rarely driven by actual timing data. There are workarounds as well: Try keeping the accessor function limited to read/write and nothing more. A smart compiler and some coaxing (with direct examination of the assembly to confirm) can inline register access in the ISR with the function call overhead optimized away.
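
A minimal sketch of such accessors, with hypothetical names. Whether the calls are actually inlined depends on the compiler and optimization level, so the assembly still needs checking:

```c
#include <stdint.h>

/* Accessors limited to a single read or write so the compiler can inline them. */
static inline uint32_t RegRead(const volatile uint32_t *reg)
{
    return *reg;
}

static inline void RegWrite(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Hypothetical ISR body: if the pending flag (bit 0) is set, clear it.
 * The register address is a parameter, so a test can pass a plain variable. */
void TimerIsrBody(volatile uint32_t *status_reg)
{
    const uint32_t flags = RegRead(status_reg);

    if (flags & 0x1u) {
        RegWrite(status_reg, flags & ~0x1u);
    }
}
```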

2. With no direct hardware/peripheral/system calls, every function call requires a wrapper, bloating our callstack.

This is also more theoretical than practical. It’s true that abstraction tends to grow a callstack, and it’s true the abstraction cost (the wrapper call) is paid each time the callstack moves from application to HAL. It’s important to realize these calls are almost always one direction, with application code accessing hardware. This directionality prevents the successive chaining of wrapper calls that would really bloat a callstack.

3. When all function operands become arguments, our fat function frames will increase stack use.

This approach may require substantial code changes which can have a large impact — but it’s not the one you expect. Writing firmware like this increases transient stack consumption — but persistent memory consumption plummets. In my experience, code written to directly access statically allocated data often heavily abuses statically allocated data, keeping data in scope long past need. As code adopts a more functional style, so too does memory usage. More arguments are retrieved and constructed on the fly, passed to a function, and freed when the function is popped.

4. This abstraction hinders development and makes the code harder to understand.

Engineers are generally risk-averse. New ways of doing things introduce unknown risk. Thus it makes a certain sense to avoid new approaches. At the same time, we must avoid settling at local maxima [5] in our practice.

Any abstraction that adds complexity to the code hierarchy may well take longer to write, but a well-chosen abstraction reduces overall development time by simplifying comprehension and subsequent debugging. As developers spend most of their time reading and debugging code (even when writing it) a lightweight abstraction like this is a good trade.

Last, I find the old software adage holds in most areas of engineering: It is easier to make correct code fast than it is to make fast code correct.

3. Automate The Pipeline

An automated pipeline (“continuous integration”) can feel like overkill, especially for firmware. Why bother? After all, there is only one firmware engineer on the team, and the firmware doesn’t change very fast, and we only have a single seat of that weird compiler, and we only built one hardware prototype, and, and.

If we go to the effort of writing testable code, an automated pipeline is how we harvest from it a fundamental benefit: Development time. In hardware this means more than firmware developer time. It lets the team codify the requirements, edge cases, communication protocols, and observed field failures as tests and then run them every release. If a developer’s change unexpectedly and subtly breaks important behavior, that failure is immediately and automatically reported to the developer, rather than manifesting downstream, perhaps as a corrupted datafile during verification or QA testing or in production.
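
As a sketch of what codifying a protocol and its observed failures can look like, here is a made-up frame format and a regression function the pipeline could run on every build. The frame layout, names, and test cases are all assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame: [0xA5][payload length][payload...][sum of payload mod 256] */
typedef enum {
    FRAME_OK,
    FRAME_TOO_SHORT,
    FRAME_BAD_HEADER,
    FRAME_BAD_LENGTH,
    FRAME_BAD_CHECKSUM
} FrameStatus;

FrameStatus FrameValidate(const uint8_t *buf, size_t len)
{
    if (len < 3) {
        return FRAME_TOO_SHORT;
    }
    if (buf[0] != 0xA5u) {
        return FRAME_BAD_HEADER;
    }
    const size_t payload_len = buf[1];
    if (len != payload_len + 3u) {
        return FRAME_BAD_LENGTH;
    }
    uint8_t sum = 0;
    for (size_t i = 0; i < payload_len; ++i) {
        sum = (uint8_t)(sum + buf[2 + i]);
    }
    return (sum == buf[2 + payload_len]) ? FRAME_OK : FRAME_BAD_CHECKSUM;
}

/* Each case records a requirement or a failure once seen in the field.
 * Returns the number of failed cases; nonzero should fail the pipeline step. */
int FrameRegressionTests(void)
{
    const uint8_t good[]      = { 0xA5, 2, 0x10, 0x20, 0x30 };
    const uint8_t truncated[] = { 0xA5, 2, 0x10 };
    const uint8_t corrupt[]   = { 0xA5, 2, 0x10, 0x21, 0x30 };
    int failures = 0;

    failures += (FrameValidate(good, sizeof good) != FRAME_OK);
    failures += (FrameValidate(truncated, sizeof truncated) != FRAME_BAD_LENGTH);
    failures += (FrameValidate(corrupt, sizeof corrupt) != FRAME_BAD_CHECKSUM);
    return failures;
}
```

When a new failure turns up in the field, it becomes one more array and one more line in the regression function.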

None of these practices will eliminate the need for in-hardware testing, but they can reduce or eliminate manual software testing for each release. Firmware, while slow to change, still changes faster than hardware, and automated testing will make it safer to go faster.

Automation is not easy to implement. Embedded development toolchains are quirky, so broad recommendations fit almost no one. But the requirements are lower than you might think. A decent, networked, web-accessible, automated build system requires at minimum a build PC, a VPN (free), Jenkins (also free), Python scripts (free), and a bit of time (definitely not free).

And a proprietary compiler, of course. Those things are expensive.

Coda

My goal with this article is to share some of the useful lessons I’ve learned over the past decade working on small, resource-constrained startups. These lessons were generally learned the hard way (face first and at high speed). I hope this article encourages firmware engineers to engage with the abundance of ideas in the larger software domain, and challenges seasoned firmware engineers to find new ways to improve their practice [6].

Et cetera and digressions

  1. Version 1.0.0, last edited 3/11/20.
  2. Articles like this are only read by people trying to improve. Bad engineers tend to stay bad, and they do so by not improving. “Good engineering” is a practice, not a destination. More on that in another article.
  3. Testable firmware does not mean “Test Driven Development”, which is a useful (though laborious) technique, though I hope you see the obvious overlap in TDD and these recommendations. The book most germane to TDD and firmware is Test Driven Development for Embedded C.
  4. Nota bene is Latin for “Hey I can translate Latin on Google”. Trust me, this kills it with the dominarum!
  5. “local maxima”: Stay hungry. There’s always room to improve.
  6. There are many books on software development craft but two recent goodies are Robert Martin’s Clean Code and Clean Architecture.
