
Assuring Cyber-Physical Systems in an Age of Rising Autonomy

source link: https://insights.sei.cmu.edu/blog/assuring-cyber-physical-systems-in-an-age-of-rising-autonomy/

As developers continue to build greater autonomy into cyber-physical systems (CPSs), such as unmanned aerial vehicles (UAVs) and automobiles, these systems aggregate data from an increasing number of sensors. The systems use this data for control and for otherwise acting in their operational environments. However, more sensors not only produce more, and more precise, data; they also require a more complex architecture to correctly transfer and process multiple data streams. This increase in complexity brings additional challenges for functional verification and validation (V&V), a greater potential for faults (errors and failures), and a larger attack surface. What’s more, CPSs often cannot distinguish faults from attacks.

To address these challenges, researchers from the SEI and Georgia Tech collaborated on an effort to map the problem space and develop proposals for solving the challenges of increasing sensor data in CPSs. This SEI Blog post provides a summary of our work, which comprised research threads addressing four subcomponents of the problem:

  • addressing error propagation induced by learning components
  • mapping fault and attack scenarios to the corresponding detection mechanisms
  • defining a security index of the ability to detect tampering based on the monitoring of specific physical parameters
  • determining the impact of clock offset on the precision of reinforcement learning (RL)

Later I will describe these research threads, which are part of a larger body of research we call Safety Analysis and Fault Detection Isolation and Recovery (SAFIR) Synthesis for Time-Sensitive Cyber-Physical Systems. First, let’s take a closer look at the problem space and the challenges we are working to overcome.

More Data, More Problems

CPS developers want more and better data so their systems can make better decisions and more precise evaluations of their operational environments. To achieve those goals, developers add more sensors to their systems and improve the ability of these sensors to gather more data. However, feeding the system more data has several implications: more data means the system must execute more, and more complex, computations. Consequently, these data-enhanced systems need more powerful central processing units (CPUs).

More powerful CPUs introduce various concerns, such as energy management and system reliability. Larger CPUs also raise questions about electrical demand and electromagnetic compatibility (i.e., the ability of the system to withstand electromagnetic disturbances, such as storms or adversarial interference).

The addition of new sensors means systems need to aggregate more data streams. This need drives greater architectural complexity. Moreover, the data streams must be synchronized. For instance, the information obtained from the left side of an autonomous automobile must arrive at the same time as information coming from the right side.
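
The synchronization requirement can be made concrete with a minimal sketch. The pairing logic, buffer names, and 5 ms tolerance below are illustrative assumptions, not part of the SEI/Georgia Tech work:

```python
from collections import deque

# Illustrative only: pair timestamped readings from two sensor streams
# (e.g., left- and right-side sensors) and flag pairs whose timestamps
# drift apart by more than a tolerance.
SYNC_TOLERANCE_S = 0.005  # 5 ms, an invented budget for this sketch

def pair_streams(left, right, tolerance=SYNC_TOLERANCE_S):
    """Yield (left_reading, right_reading, in_sync) tuples.

    `left` and `right` are iterables of (timestamp, value) pairs,
    assumed to be sorted by timestamp.
    """
    left_q, right_q = deque(left), deque(right)
    while left_q and right_q:
        (tl, vl), (tr, vr) = left_q[0], right_q[0]
        if abs(tl - tr) <= tolerance:
            yield (tl, vl), (tr, vr), True
            left_q.popleft()
            right_q.popleft()
        elif tl < tr:
            # Left reading has no close partner: report it as unsynchronized.
            yield (tl, vl), None, False
            left_q.popleft()
        else:
            yield None, (tr, vr), False
            right_q.popleft()
```

A downstream fusion stage could then drop or flag any pair reported as out of sync rather than silently combining misaligned data.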

More sensors, more data, and a more complex architecture also raise challenges concerning the safety, security, and performance of these systems, whose interaction with the physical world raises the stakes. CPS developers face heightened pressure to ensure that the data on which their systems rely is accurate, that it arrives on schedule, and that an external actor has not tampered with it.

A Question of Trust

As developers strive to imbue CPSs with greater autonomy, one of the biggest hurdles is gaining the trust of users who depend on these systems to operate safely and securely. For example, consider something as simple as the air pressure sensor in your car’s tires. In the past, we had to check the tires physically, with an air pressure gauge, often after we’d been driving for miles on dangerously underinflated tires. The sensors we have today let us know in real time when we need to add air. Over time, we have come to depend on these sensors. However, the moment we get a false alert telling us our front driver’s side tire is underinflated, we lose trust in the ability of the sensors to do their job.

Now, consider a similar system in which the sensors pass their information wirelessly, and a flat-tire warning triggers a safety operation that prevents the car from starting. A malicious actor learns how to generate a false alert from a spot across the parking lot or merely jams your system. Your tires are fine, your car is fine, but your car’s sensors, either detecting a simulated problem or entirely incapacitated, will not let you start the car. Extend this scenario to autonomous systems operating in airplanes, public transportation systems, or large manufacturing facilities, and trust in autonomous CPSs becomes even more critical.

As these examples demonstrate, CPSs are susceptible to both internal faults and external attacks from malicious adversaries. Examples of the latter include the Maroochy Shire incident involving sewage services in Australia in 2000, the Stuxnet attack targeting industrial control systems at Iranian nuclear facilities in 2010, and the keylogger virus that infected the ground control stations of a U.S. drone fleet in 2011.

Trust is critical, and it lies at the heart of the work we have been doing with Georgia Tech. It is a multidisciplinary problem. Ultimately, what developers seek to deliver is not just a piece of hardware or software, but a cyber-physical system comprising both hardware and software. Developers need an assurance case, a convincing argument that can be understood by an external party. The assurance case must demonstrate that the way the system was engineered and tested is consistent with the underlying theories used to gather evidence supporting the safety and security of the system. Making such an assurance case possible was a key part of the work described in the following sections.

Addressing Error Propagation Induced by Learning Components

As I noted above, autonomous CPSs are complex platforms that operate in both the physical and cyber domains. They employ a mix of different learning components that inform specific artificial intelligence (AI) functions. Learning components gather data about the environment and the system to help the system make corrections and improve its performance.

To achieve the level of autonomy needed by CPSs when operating in uncertain or adversarial environments, CPSs make use of learning algorithms. These algorithms use data collected by the system—before or during runtime—to enable decision making without a human in the loop. The learning process itself, however, is not without problems, and errors can be introduced by stochastic faults, malicious activity, or human error.

Many groups are working on the problem of verifying learning components. In most cases, they are interested in the correctness of the learning component itself. This line of research aims to produce an integration-ready component that has been verified against certain stochastic properties, such as probabilistic guarantees on its behavior. The work we conducted in this research thread, however, examines the problem of integrating a learning-enabled component within a system.

For example, we ask, How can we define the architecture of the system so that we can fence off any learning-enabled component and assess that the data it is receiving is correct and arriving at the right time? Furthermore, can we assess that the system outputs can be controlled against some notion of correctness? For instance, does the commanded acceleration keep my car within the speed limit? This kind of fencing is necessary to determine whether we can trust that the system itself is correct (or, at least, not badly wrong), in contrast to verifying a running learning component directly, which is not possible today.
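
To make the idea of a fence concrete, here is a minimal sketch of a runtime wrapper around a learning-enabled component. The class name, freshness window, and acceleration bound are hypothetical choices for illustration, not the mechanisms used in the actual work:

```python
import time

class LearningComponentFence:
    """Illustrative wrapper that gates a learning-enabled component.

    Inputs are checked for freshness (arriving at the right time);
    outputs are clamped to a configured safe envelope. Names and
    bounds are invented for this sketch.
    """

    def __init__(self, model, max_input_age_s=0.05, max_accel_mps2=3.0):
        self.model = model                    # e.g., a learned controller
        self.max_input_age_s = max_input_age_s
        self.max_accel_mps2 = max_accel_mps2

    def step(self, reading):
        # 1. Input fence: is the data arriving at the right time?
        age = time.time() - reading["timestamp"]
        if age > self.max_input_age_s:
            raise RuntimeError(f"stale sensor data ({age:.3f}s old)")

        # 2. Run the learning-enabled component itself.
        command = self.model(reading["value"])

        # 3. Output fence: keep the command inside a notion of correctness,
        #    e.g., commanded acceleration within a safe envelope.
        return max(-self.max_accel_mps2, min(self.max_accel_mps2, command))
```

The point of such a wrapper is that the system-level argument rests on the fence, which can be verified conventionally, rather than on the learned model inside it.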

To address these questions, we described the various errors that can appear in CPS components and affect the learning process. We also provided theoretical tools that can be used to verify the presence of such errors. Our aim was to create a framework that operators of CPSs can use to assess their operation when using data-driven learning components. To do so, we adopted a divide-and-conquer approach that used the Architecture Analysis & Design Language (AADL) to create a representation of the system’s components and their interconnections, constructing a modular environment that allows for the inclusion of different detection and learning mechanisms. This approach supports full model-based development, including system specification, analysis, system tuning, integration, and upgrade over the lifecycle.
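
The structural side of this approach can be pictured with a toy component graph. The AADL models in the actual work are far richer; the component names below are hypothetical, and the check shown (every learning-enabled component must have a detection mechanism attached) stands in for the fuller analyses described above:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    learning_enabled: bool = False
    monitors: list = field(default_factory=list)  # attached detection mechanisms

@dataclass
class SystemModel:
    components: dict = field(default_factory=dict)
    connections: list = field(default_factory=list)  # (src, dst) links

    def add(self, comp):
        self.components[comp.name] = comp

    def connect(self, src, dst):
        self.connections.append((src, dst))

    def unfenced_learning_components(self):
        """Learning-enabled components with no detection mechanism attached."""
        return [c.name for c in self.components.values()
                if c.learning_enabled and not c.monitors]

# Toy UAV-flavored example (names are illustrative only).
model = SystemModel()
model.add(Component("gps"))
model.add(Component("imu"))
model.add(Component("state_estimator", learning_enabled=True,
                    monitors=["residual_check"]))
model.add(Component("controller", learning_enabled=True))  # not yet fenced
model.connect("gps", "state_estimator")
model.connect("imu", "state_estimator")
model.connect("state_estimator", "controller")

print(model.unfenced_learning_components())  # ['controller']
```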

We used a UAV system to illustrate how errors propagate throughout system components when adversaries attack the learning processes, and to obtain safety tolerance thresholds. We focused only on specific learning algorithms and detection mechanisms. We then investigated their convergence properties, as well as the errors that can disrupt those properties.
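
As a toy illustration of how an attack on the learning process disrupts convergence, consider a simple online estimator fed a fraction of adversarially biased measurements. The estimator, numbers, and tolerance below are invented for this sketch and are not the UAV results:

```python
import random

def run_estimator(true_value=10.0, n=5000, attack_fraction=0.0, bias=4.0, seed=0):
    """Running-average estimator; a fraction of measurements is biased."""
    rng = random.Random(seed)
    estimate = 0.0
    for k in range(1, n + 1):
        z = true_value + rng.gauss(0.0, 0.5)
        if rng.random() < attack_fraction:
            z += bias                      # adversarially biased measurement
        estimate += (z - estimate) / k     # running-average update
    return estimate

TOLERANCE = 0.5  # invented safety tolerance threshold for the sketch
for frac in (0.0, 0.1, 0.3):
    err = abs(run_estimator(attack_fraction=frac) - 10.0)
    print(f"attack_fraction={frac:.1f}  error={err:.3f}  "
          f"within tolerance: {err <= TOLERANCE}")
```

With clean data the estimate converges to the true value; as the attacked fraction grows, the limit shifts until the residual error exceeds the tolerance, which is the kind of threshold-crossing behavior the analysis quantifies.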

The results of this investigation provide the starting point for a CPS designer’s guide to utilizing AADL for system-level analysis, tuning, and upgrade in a modular fashion. This guide could comprehensively describe the different errors in the learning processes during system operation. With these descriptions, the designer can automatically verify the proper operation of the CPS by quantifying marginal errors and integrating the system into AADL to evaluate critical properties in all lifecycle phases. To learn more about our approach, I encourage you to read the paper A Modular Approach to Verification of Learning Components in Cyber-Physical Systems.

Mapping Fault and Attack Scenarios to Corresponding Detection Mechanisms

UAVs have become more susceptible to both stochastic faults (arising in the different components that make up the system) and malicious attacks that compromise either the physical components (sensors, actuators, airframe, etc.) or the software coordinating their operation. Different research communities, using an assortment of tools that are often incompatible with one another, have been investigating the causes and effects of faults that occur in UAVs. In this research thread, we sought to identify the core properties and elements of these approaches and decompose them, thereby enabling designers of UAV systems to consider all the different results on faults and the associated detection techniques via an integrated algorithmic approach. In other words, if your system is under attack, how do you select the best mechanism for detecting that attack?

The issue of faults and attacks on UAVs has been widely studied, and a number of taxonomies have been proposed to help engineers design mitigation strategies for various attacks. In our view, however, these taxonomies were insufficient. We proposed a decision process made of two elements: first, a mapping from fault or attack scenarios to abstract error types, and second, a survey of detection mechanisms based on the abstract error types they help detect. Using this approach, designers could use both elements to select a detection mechanism to protect the system.
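
The two-element decision process can be sketched schematically. The scenario names, abstract error types, and detection mechanisms below are simplified placeholders, not the taxonomy or survey from the actual work:

```python
# Element 1: map fault or attack scenarios to abstract error types.
SCENARIO_TO_ERRORS = {
    "gps_spoofing":       {"biased_measurement"},
    "imu_stuck_at_fault": {"frozen_value"},
    "actuator_jamming":   {"loss_of_control_authority"},
    "network_replay":     {"stale_data", "biased_measurement"},
}

# Element 2: map detection mechanisms to the abstract error types they detect.
MECHANISM_DETECTS = {
    "residual_based_detector":     {"biased_measurement"},
    "stuck_value_monitor":         {"frozen_value"},
    "timestamp_freshness_check":   {"stale_data"},
    "control_performance_monitor": {"loss_of_control_authority"},
}

def select_mechanisms(scenario):
    """Return detection mechanisms covering at least one abstract error
    type induced by the given fault or attack scenario."""
    errors = SCENARIO_TO_ERRORS.get(scenario, set())
    return sorted(m for m, covers in MECHANISM_DETECTS.items()
                  if covers & errors)

print(select_mechanisms("network_replay"))
# ['residual_based_detector', 'timestamp_freshness_check']
```

A designer worried about a given scenario composes the two mappings to arrive at candidate detection mechanisms, rather than re-deriving that link from scratch for each system.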

To classify the attacks on UAVs, we created a list of component compromises, focusing on those that reside at the intersection of the physical and the digital realms. The list is far from comprehensive, but it is adequate for representing the major qualities that describe the effects of those attacks on the system. We contextualized the list in terms of attacks and faults on sensing, actuating, and communication components, as well as more complex attacks targeting multiple elements to cause system-wide errors:

