A Complex High-wire Act: Safety and Security in Embedded Systems

Embedded systems are now found in pretty well every type of vehicle, machine, and device on which our economies, our day-to-day interactions, and our lives depend. They are fundamental to the operation of everything from high-speed trains and industrial robots to medical ventilators and vacuum cleaners, and they are necessarily – and sometimes, perhaps, unnecessarily – connected. Increasingly, as well as being connected systems, these systems are safety-critical systems. This opens untold opportunities for malicious interference that can have serious, even tragic consequences. In short, because it is connected, no safety-critical system today can be safe if it is not also secure.

Design and delivery of an embedded software system that is both safe and secure is a complex high-wire act performed by specialists with very different ways of thinking and skill sets who must implement competing requirements in a system with limited computing resources. The key to success has two aspects.

First, we need to understand how we can best use the tools and techniques available to us for building both safety and security. Second, and critically, we must appreciate that, while the cultures of functional safety and cybersecurity have some affinities, they are not identical. Both are founded on intellectual rigour, discipline, and a healthy paranoia about things going wrong, but beyond that the similarities fade.

The functional safety specialist focuses on building systems that run as designed, can recover from faults, and, in the worst case, can assume a design safe state (DSS) in a timely manner. By contrast the cybersecurity specialist is unconcerned about what exactly the system is doing, so long as no malicious external actor can interfere with the system carrying out its duties. Her focus is determining how a system can be tricked into not behaving as specified. She assumes that the system will be reverse engineered repeatedly until a vulnerability is exposed and exploited to compromise the safety specialist’s meticulously engineered creation.

Once a system’s operation impacts safety, it necessitates a strong cybersecurity defense – functionally safe products that are easily hacked don’t survive long.

Whenever we discuss system safety, we must discuss functional safety. We use the term functional because to be safe a system must do something: it must function, and it must function safely. A system that is so safe it does nothing is of no use to anyone.

To be functionally safe an embedded software system must function as specified. It must be dependable. That is, it must do what it is expected to do (reliability), and it must do this within expected time limits (availability). For example, a train door control system must open and close the train doors, inform the train control system of the state of these doors, and alert it of any anomalies (such as the doors opening while the train is in motion), all in a timely manner.

If our door control system fails to alert the train control system of open doors, it is not reliable. It did not do what it was expected to do. If it checks the status of the doors too infrequently or fails to inform the train control system of the state of the doors when required, then it fails the availability test: it did not do what it was expected to do within the expected timeframe.

Reliability and availability are of course fundamental, not just to a complete functionally safe software system, but to its components. For example, the process checking the status of hardware sensors that communicate the state of train doors must perform each check as specified (reliability) and when specified (availability). It must also pass this information on to whatever process or component requires it and so on.
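To make the distinction concrete, the following is a minimal sketch of such a periodic check loop in C++. The sensor and reporting interfaces are hypothetical stand-ins, not a real train-control API; the point is simply that the check must happen (reliability) and must happen on time (availability).

```cpp
// Minimal sketch of a periodic door-status check with a deadline.
// DoorSensor and TrainControlBus are hypothetical stand-ins.
#include <chrono>
#include <thread>

using namespace std::chrono;

struct DoorState { bool closed; bool locked; };

struct DoorSensor {                              // hypothetical hardware abstraction
    DoorState read() const { return {true, true}; }
};

struct TrainControlBus {                         // hypothetical reporting channel
    void report(const DoorState&) {}
    void raise_alarm() {}
};

int main() {
    DoorSensor sensor;
    TrainControlBus bus;
    constexpr auto period   = milliseconds(50);  // how often the check must run (availability)
    constexpr auto deadline = milliseconds(10);  // maximum time allowed per check

    for (;;) {
        const auto start = steady_clock::now();

        const DoorState state = sensor.read();   // do what is expected (reliability)
        bus.report(state);                       // ...and report it to the train control system

        // If an iteration overruns its deadline, the system is no longer
        // available in the functional-safety sense: escalate.
        if (steady_clock::now() - start > deadline) {
            bus.raise_alarm();
        }
        std::this_thread::sleep_until(start + period);
    }
}
```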

A security breach in any software system can be costly. In addition to direct financial losses, the target of a breach may suffer harm, such as damage to its reputation and its relationships with customers.

With safety-critical systems the stakes are, by definition, much higher. Whatever its financial and reputational consequences, a security breach can endanger property, human lives, and the environment. The May 2021 ransomware attack against Colonial Pipeline halted almost half the fuel deliveries to the US East Coast. Imagine if instead of ransomware the attack had targeted the pipeline controls with the purpose of leaking fuel and igniting it.

The UNECE WP.29 automotive cybersecurity regulation adopted in June 2020 underscores the vulnerability of the myriad embedded systems in today’s vehicles and the importance of cybersecurity strategies to counter threats to them. These systems have already taken responsibility for life-critical decisions and actions in advanced driver assistance systems that keep the driver and occupants safe. As vehicles progress towards full autonomy these responsibilities will necessarily increase.

Focused on cybersecurity risk assessment and mitigation, and on ensuring secure software updates, the regulation underscores a simple truth about embedded systems in vehicles and, indeed, virtually all of today’s embedded systems: because they are connected, they cannot be safe unless they are also secure.

The WP.29 regulation of June 2020 is just the beginning. We should expect more regulatory initiatives and increasing requirements across industries and by jurisdictions around the world. Cybersecurity regulations are simply a useful tool for achieving an end; accountability ultimately rests with the individuals and teams delivering the safety-critical embedded systems. Just as the functional safety experts must ensure that the systems meet their functional safety requirements, the cybersecurity experts must ensure that cyberattacks cannot compromise these systems.

The consequences of a failure in many of today’s safety-critical embedded systems reach far beyond the systems themselves. Imagine, for instance, an embedded software system that manages communications between a high-speed train, such as the ICE trains in Germany, and the wayside infrastructure. If this system raises a false alarm, forcing a train to stop, the consequences percolate outwards across the rail system. The stopped train forces other trains on the line to reduce speed or stop, and perhaps causes others to be delayed or re-routed. And every unscheduled change increases the possibility of an accident.

Connectivity is both essential to today’s embedded systems and a security risk. Since the advent of the internet, and especially the Internet of Things (IoT), connectivity tends to mean a connection to the internet. In fact, when considering the implications of connectivity for the functional safety or cybersecurity of embedded systems, we should understand connectivity as any interface to any external entity with which the embedded system or one of its components interacts, whatever the mode of interaction, and however briefly.

When building cybersecurity defenses, we should not assume that attacks will come from outside our system. On the contrary, barring internal sabotage, while an attack will have its origins outside our system, the attack itself will likely be launched from inside the system, from some innocuous bit of code no one notices, such as an old and venerable device driver or service.

The 2021 T-Bone drone-enabled attack on Tesla vehicles illustrates well how connectivity always introduces security risk: hackers used the system connection manager (ConnMan) to gain entry to the vehicle’s infotainment system. There they stopped, but had their intent been malicious rather than instructional, they likely could have continued the attack to compromise the vehicle’s safety-critical systems. As is typical, once the attack methodology had been worked out, they needed less than a minute to get into a selected target.

In another notable attack, the point of entry was through USB ports. In an unacknowledged covert operation begun in 2006, the US allegedly used USB memory sticks to get the Stuxnet worm onto the computers of Iranian scientists and sabotage that country’s uranium enrichment centrifuges at the Natanz nuclear facility. No internet connection was required, nor was direct access to the target computers. All that was required was for the worm to be on some computer – somewhere. Migrating from computer to computer on USB sticks across various countries, the worm eventually reached the targeted Iranian machines.

The overwhelming benefits that drive connectivity introduce the opportunity for attackers and necessitate cybersecurity defenses – even in the safest embedded system.

We should note that, whatever the cybersecurity issues it brings along, connectivity is and will remain an essential requirement of embedded systems. These systems need connectivity to do their jobs, and expecting them to remain unconnected in order to protect them is unrealistic.

Connectivity is now also essential for keeping systems up to date. Over-the-air (OTA) updates in particular are a necessary component of many software release strategies, routinely used to distribute new features and to update software that may have been rushed to release. Critically, OTA is the vehicle of choice for maintaining system functional safety and cybersecurity. It is used to deliver bug patches to correct issues uncovered post-release and to update cybersecurity defences to counter new threats – and there will always be new threats.

In “Analytical Review of Cybersecurity for Embedded Systems” Abdulmohsan Aloseel and his co-authors nicely state the particular challenge of ensuring the cybersecurity of an embedded system:

The conflict between cybersecurity requirements and the computing capabilities of embedded systems makes it critical to implement sophisticated security countermeasures against cyber-attacks in an embedded system with limited resources, without draining those resources.

The key phrase is “without draining those resources”. Embedded systems typically lack the processing power to run applications such as virus scanners to protect them from cyber-attacks. If we remember that a functionally safe system must always have the power, CPU cycles, memory, and so on it needs to run dependably, we can see that implementing the necessary cybersecurity protections becomes a balancing act between competing requirements.

System simplicity is essential. No software is without bugs; even the most trivial bits of code can include errors. And the more complex the code the greater the likelihood that it will contain errors and that these errors will be missed by validation techniques such as static analysis and slip through testing.

In contrast to functional safety design, the fundamental design strategy for cybersecurity is defense in depth. This means layering the system and, at each potential point of entry, implementing defenses appropriate to that layer. These defenses include techniques such as using privilege levels to restrict access to specific components or even processes, using structures allocated from the heap so that they receive a new address at each allocation, and adding code to scramble, encrypt, decrypt, and run checks.
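As a minimal sketch of the first of these techniques, the POSIX snippet below acquires the one device it needs while privileged and then permanently drops to an unprivileged user, so that a later compromise of the process has far less to work with. The device path and user ID are illustrative only.

```cpp
// Privilege separation sketch (POSIX): open the needed resource, then drop
// privileges irrevocably before doing any further work.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main() {
    // Acquire the single resource the safety-critical task needs.
    int sensor_fd = open("/dev/door_sensor0", O_RDONLY);   // hypothetical device node
    if (sensor_fd < 0) { std::perror("open"); return EXIT_FAILURE; }

    // Drop root privileges for good (order matters: group before user).
    if (setgid(1000) != 0 || setuid(1000) != 0) {
        std::perror("drop privileges");
        return EXIT_FAILURE;
    }

    // From here on, the process can read its sensor but cannot reopen other
    // devices or otherwise widen the attack surface.
    char buf[16];
    ssize_t n = read(sensor_fd, buf, sizeof buf);
    std::printf("read %zd bytes of sensor data\n", n);
    close(sensor_fd);
    return EXIT_SUCCESS;
}
```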

Not surprisingly, such measures add complexity. While no single security measure may greatly increase complexity, their combined effect is far from negligible, pulling the design in a direction exactly contrary to the simplicity that streamlines the design and validation of functionally safe systems.

Fortunately, all is not contradiction. In fact, though functional safety and cybersecurity design requirements often pull in contrary directions and compete for limited resources, functional safety and cybersecurity both benefit from many of the same design features, tools, and techniques.

Functional safety and cybersecurity can only be achieved through close cooperation between software and hardware. Our focus here is on embedded system software, but we assume that the system will also take advantage of hardware features such as secure boot, Trusted Platform Module (TPM), ARM TrustZone and exception levels, and x86 Rings.

A key design requirement of functional safety systems is that safety-critical components be isolated from interference from other components. The OS architecture can be used to limit the amount of code implicated in the system’s safety-critical operations – for example, to a small OS kernel and a select number of resources required for carrying out the system’s functional safety tasks. All other tasks are excluded. This limited code set may also be used to define a small, trusted computing base (TCB) and thereby help reduce the system’s exploitable attack surface.

That is, both functional safety and cybersecurity may benefit from a smaller code base delimited by clear boundaries. The smaller the safety-critical code base, the smaller the amount of code that needs to be safety-validated and certified, the fewer the opportunities for this code to contain security vulnerabilities, and the less difficult it will be to identify and correct these vulnerabilities.

Containers bundle an application with all the supporting pieces it needs to operate: libraries, utilities, data, and configuration files. They provide an easy method for isolating applications with benefits for both functional safety and cybersecurity. Containers can be signed so that unsigned (untrusted) containers can be prevented from running, providing protection from malicious intrusions. Containers are also an ideal vehicle for code migration and updates, facilitating delivery of modular software updates and security upgrades.
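As a rough sketch of what such signature checking can look like, the snippet below verifies a detached Ed25519 signature over an image with libsodium before allowing it to run. libsodium is just one possible choice, and the surrounding key management and image handling are application-specific and omitted here.

```cpp
// Verify a detached Ed25519 signature over a container or update image
// before running it (libsodium; key and image handling omitted).
#include <sodium.h>
#include <cstdio>
#include <vector>

bool image_is_trusted(const std::vector<unsigned char>& image,
                      const unsigned char sig[crypto_sign_BYTES],
                      const unsigned char pubkey[crypto_sign_PUBLICKEYBYTES]) {
    return crypto_sign_verify_detached(sig, image.data(), image.size(),
                                       pubkey) == 0;
}

int main() {
    if (sodium_init() < 0) return 1;             // library must initialize first

    // In a real system the image and signature arrive over the update channel
    // and the public key is provisioned at manufacture.
    std::vector<unsigned char> image = {/* ... container bytes ... */};
    unsigned char sig[crypto_sign_BYTES] = {};
    unsigned char pubkey[crypto_sign_PUBLICKEYBYTES] = {};

    if (!image_is_trusted(image, sig, pubkey)) {
        std::puts("unsigned or tampered image: refusing to run it");
        return 1;
    }
    std::puts("image verified");
}
```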

Hypervisors allow multiple, diverse embedded systems to run concurrently on the same board. Each system runs as though it were running directly on hardware, isolated from the hosting hypervisor and the other systems on the board. Hypervisors are ideal for mixed-criticality systems; that is, systems whose components have different functional safety requirements. For instance, a system certified to IEC 62304 Class C can run in one hypervisor environment, isolated from interference from non-safety systems running in other contained and isolated environments. As with containers, the isolation a hypervisor provides is useful both for functional safety and for cybersecurity.

Additionally, by abstracting stable running environments for safety-critical systems in a self-contained environment, hypervisors reduce the cost and the risk associated with migrating these systems to new hardware platforms. If the hypervisor meets the system’s functional safety and cybersecurity needs, relatively little work is needed to ensure the system continues to run safely and securely in a virtualized environment on the new hardware.

Fortunately, the programming tools available today provide safeguards that were unavailable to developers only a few years ago. For example, the new C++20 standard (September 2020) provides alternatives to the pointer and stack management dangers that kept many a C and C++ developer up at night. Other languages, such as Rust and Ada, offer increasingly interesting possibilities for developing functionally safe systems. Rust achieves memory safety without techniques, such as garbage collection, that are problematic for real-time systems. Ada, the language developed for the US Department of Defense in the early 1980s, is still in use and is famous for catching errors at compile time rather than leaving them to be discovered by testing at runtime.
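A small illustration of those guard rails, assuming nothing beyond the standard library: std::span (new in C++20) keeps a buffer and its length together, and smart pointers make ownership explicit, removing two classic sources of pointer trouble.

```cpp
#include <array>
#include <cstdio>
#include <memory>
#include <numeric>
#include <span>

// The callee states how it will use the buffer; there is no separate length
// parameter to drift out of sync with the pointer.
int sum(std::span<const int> values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

int main() {
    std::array<int, 4> readings{1, 2, 3, 4};     // size is part of the type
    std::printf("sum = %d\n", sum(readings));

    // Ownership is explicit and released automatically: no missing delete,
    // no double free.
    auto buffer = std::make_unique<std::array<int, 4>>(readings);
    std::printf("sum = %d\n", sum(*buffer));
}
```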

Similarly, many compilers offer sophisticated options for closing security vulnerabilities. These include options for marking the stack as non-executable, which forces attackers to resort to complicated techniques such as return-oriented programming, and options for building position-independent executables (PIE), so that the locations of executables in memory can be randomized and attackers must first find and leak those addresses before they can proceed with an attack. Compilers can also be instructed to insert stack overflow checks, such as stack canaries, that detect when a stack overflow has occurred.
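By way of illustration, the build line in the comment below combines several of these hardening options as GCC and Clang spell them (the source file name is hypothetical), and the function shows the kind of bounded copy that a stack canary backstops at run time.

```cpp
// Hypothetical hardened build:
//
//   g++ -O2 -fstack-protector-strong -fPIE -pie \
//       -Wl,-z,noexecstack -Wl,-z,relro,-z,now door_controller.cpp
//
// -fstack-protector-strong inserts stack canaries, -fPIE/-pie produce a
// position-independent executable so its load address can be randomized,
// and the linker flags mark the stack non-executable and harden the GOT.
#include <cstdio>
#include <cstring>

void copy_id(const char* input) {
    char id[8];
    std::strncpy(id, input, sizeof id - 1);      // bounded copy; an unbounded
    id[sizeof id - 1] = '\0';                    // strcpy here is the kind of
    std::printf("id: %s\n", id);                 // overflow a canary detects
}

int main(int argc, char** argv) {
    copy_id(argc > 1 ? argv[1] : "default");
}
```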

Many system validation techniques valuable for demonstrating that a system meets functional safety requirements are equally valuable for improving cybersecurity robustness. These include:

Static code analysis, a mainstay of software safety engineering, is a valuable tool for ensuring cybersecurity as well as functional safety. Static code analysis tools often provide settings that focus analysis on issues that are exploitable by cyberattacks. Numerous source code security analysis tools are also available.

Automated testing is especially useful for code migration and updates, where regression testing is required to ensure that the software performs identically whenever the underlying platform changes. Automated testing is also essential if the team uses continuous integration techniques during development, although this is less common in development workflows for functional safety.

Fault injection, usually used during automated testing, is another key tool available to both functional safety and cybersecurity specialists. It involves deliberately adding faults to a program at compile time or at run time to force the program to execute its fault recovery code, which in the absence of errors would never be tested.
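A minimal sketch of compile-time fault injection: building the snippet below with -DINJECT_SENSOR_FAULT forces the otherwise-unreachable recovery path to run so that tests can exercise it. The names are illustrative.

```cpp
#include <cstdio>
#include <optional>

std::optional<int> read_sensor() {
#ifdef INJECT_SENSOR_FAULT
    return std::nullopt;                         // injected fault: sensor "fails"
#else
    return 42;                                   // nominal reading
#endif
}

bool enter_design_safe_state() {
    std::puts("fault detected: entering design safe state");
    return true;
}

int main() {
    if (auto value = read_sensor()) {
        std::printf("sensor value: %d\n", *value);
        return 0;
    }
    // Recovery code that, without fault injection, tests would never reach.
    return enter_design_safe_state() ? 0 : 1;
}
```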

Fuzz testing, a subcategory of fault injection, inputs invalid, unexpected, and random data to test a system’s ability to deal with such input. It is particularly valuable for evaluating a system’s ability to withstand cyberattacks when it is used to inject data at trust boundaries; that is, when data is passed between components of different privilege levels.
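For example, a libFuzzer-style harness needs only a single entry point. Here it feeds fuzzer-generated data to a toy command parser standing in for code that sits at a trust boundary (build with clang++ -g -O1 -fsanitize=fuzzer,address).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy parser: expects "OPEN" or "CLOSE" followed by a one-byte door id.
static int parse_door_command(const uint8_t* data, size_t size) {
    if (size >= 5 && std::memcmp(data, "OPEN", 4) == 0)  return data[4];
    if (size >= 6 && std::memcmp(data, "CLOSE", 5) == 0) return data[5];
    return -1;                                   // unrecognized command
}

// The fuzzer calls this repeatedly with mutated inputs, hunting for crashes,
// hangs, and sanitizer-detected undefined behavior.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    parse_door_command(data, size);
    return 0;
}
```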

As with so much in engineering, success in implementing functional safety and cybersecurity in embedded systems comes down to intangibles, the so-called “soft skills”: the working culture in which these systems are designed, built, and implemented.

Complementary yet competing skills and attitudes support both functional safety and security.

First among the attributes required of a working culture that will deliver functionally safe systems hardened against cyberattacks is, along with a healthy paranoia that things just do go awry, a healthy humility, and with it a sense of the limits of our understanding. Specifically, as functional safety specialists we must recognize that we are not security specialists, and as security specialists we must accept that we are not functional safety specialists. Each domain is its own discipline, and it is, literally, dangerous to assume that because we are competent in one, we are competent in the other. What we can do, though, is work with the other specialists to negotiate the best solutions possible for our conflicting requirements.

Second, if we have accepted our limitations, we must also recognize that neither functional safety nor cybersecurity can be achieved simply by adding technology. More abstractions and layers, more encryption and checks, canaries, gadgets, and patches will not in themselves produce the results we need. These must be considered and implemented in the context of a culture where safety and security are understood to be integral to design and, most importantly, where the planning, design, development, release, and maintenance of our software are driven first by our system’s safety and security requirements.

Finally – and in the world of embedded software development driven by aggressive release schedules, this may require some rejigging of our thinking – we must empower ourselves to adjust feature sets and release schedules to ensure that our functional safety and cybersecurity requirements are met. Nothing will be so costly as our failure to do so.

Kalle Dalheimer is a software pioneer and the co-founder of KDE. He’s also the founder and CEO of KDAB, the leading consulting, training, and development company for Qt, C++, and OpenGL.
