In our modern society, computerized or digital control systems have been used to reliably automate many of the industrial operations that we take for granted, from the power plant to the automobiles we drive. However, the complexity of these systems and how the designers integrate them, the roles and responsibilities of the humans that interact with the systems, and the cyber security of these highly networked systems have led to a new paradigm in research philosophy for next-generation control systems. Resilient control systems consider all of these elements and those disciplines that contribute to a more effective design, such as cognitive psychology, computer science, and control engineering, to develop interdisciplinary solutions. These solutions consider things such as how to tailor the control system operating displays to best enable the user to make an accurate and reproducible response, how to design in cybersecurity protections such that the system defends itself from attack by changing its behaviors, and how to better integrate widely distributed computer control systems to prevent cascading failures that result in disruptions to critical industrial operations. In the context of cyber-physical systems, resilient control systems are an aspect that focuses on the unique interdependencies of a control system, as compared to information technology computer systems and networks, due to its importance in operating our critical industrial operations.


Introduction

Originally intended to provide a more efficient mechanism for controlling industrial operations, the development of digital control systems allowed for flexibility in integrating distributed sensors and operating logic while maintaining a centralized interface for human monitoring and interaction. This ease of readily adding sensors and logic through software, which was once done with relays and isolated analog instruments, has led to wide acceptance and integration of these systems in all industries. However, these digital control systems have often been integrated in phases to cover different aspects of an industrial operation and connected over a network, leading to a complex interconnected and interdependent system. While the control theory applied is often nothing more than a digital version of its analog counterpart, the dependence of digital control systems upon communications networks has precipitated the need for cybersecurity due to potential effects on the confidentiality, integrity and availability of the information. To achieve resilience in the next generation of control systems, therefore, addressing the complex control system interdependencies, including human systems interaction and cybersecurity, will be a recognized challenge.


Defining resilience

Research in resilience engineering over the last decade has focused on two areas, organizational and information technology. Organizational resilience considers the ability of an organization to adapt and survive in the face of threats, including the prevention or mitigation of unsafe, hazardous or compromising conditions that threaten its very existence. Information technology resilience has been considered from a number of standpoints. Networking resilience has been considered as quality of service. Computing has considered such issues as dependability and performance in the face of unanticipated changes. However, based upon the application of control dynamics to industrial processes, functionality and determinism are primary considerations that are not captured by the traditional objectives of information technology.

Considering the paradigm of control systems, one definition has been suggested: "Resilient control systems are those that tolerate fluctuations via their structure, design parameters, control structure and control parameters". However, this definition is taken from the perspective of control theory application to a control system. The malicious actor and cyber security are not directly considered, which might suggest the definition "an effective reconstitution of control under attack from intelligent adversaries," which was also proposed. However, this definition focuses only on resilience in response to a malicious actor. To address the cyber-physical aspects of a control system, a definition of resilience should consider both benign and malicious human interaction, in addition to the complex interdependencies of the control system application.

The term “recovery” has been used in the context of resilience, paralleling the response of a rubber ball that stays intact when a force is exerted on it and recovers its original dimensions after the force is removed. Considering the rubber ball in terms of a system, resilience could then be defined as its ability to maintain a desired level of performance or normalcy without irrecoverable consequences. While resilience in this context is based upon the yield strength of the ball, control systems require an interaction with the environment, namely the sensors, valves and pumps that make up the industrial operation. To be reactive to this environment, a control system requires an awareness of its state in order to make corrective changes to the industrial process and maintain normalcy. With this in mind, in consideration of the discussed cyber-physical aspects of human systems integration and cyber security, as well as other definitions for resilience at a broader critical infrastructure level, the following can be deduced as a definition of a resilient control system:

"A resilient control system is one that maintains state awareness and an accepted level of operational normalcy in response to disturbances, including threats of an unexpected and malicious nature"

Considering the flow of a digital control system as a basis, a resilient control system framework can be designed. Referring to the left side of Fig. 1, a resilient control system holistically considers the measures of performance or normalcy for the state space. At the center, an understanding of performance and priority provides the basis for an appropriate response by a combination of human and automation, embedded within a multi-agent, semi-autonomous framework. Finally, to the right, information must be tailored to the consumer to address the need and position a desirable response. Several examples or scenarios of how resilience differs and provides benefit to control system design are available in the literature.


Areas of resilience

Some primary tenets of resilience, as contrasted to traditional reliability, have presented themselves in considering an integrated approach to resilient control systems. These cyber-physical tenets complement the fundamental concept of dependable or reliable computing by characterizing resilience in regard to control system concerns, including design considerations that provide a level of understanding and assurance in the safe and secure operation of an industrial facility. These tenets are discussed individually below to summarize some of the challenges that must be addressed in order to achieve resilience.


Human systems

The benign human has an ability to quickly understand novel solutions, and provides the ability to adapt to unexpected conditions. This behavior can provide additional resilience to a control system, but reproducibly predicting human behavior is a continuing challenge. Captured historic human preferences can be applied to Bayesian inference and Bayesian belief networks, but ideally a solution would consider direct understanding of human state using sensors such as an EEG. Considering control system design and interaction, the goal would be to tailor the amount of automation necessary to achieve some level of optimal resilience for this mixed initiative response. The human would be presented with the actionable information that provides the basis for a targeted, reproducible response.
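As a rough illustration of applying Bayesian inference to historic human preferences, the sketch below keeps a Beta-distributed belief about how often an operator accepts an automated recommendation and updates it after each observed decision. The class name, the acceptance/override framing, and the thresholds used to suggest an automation level are all assumptions made for this example; they are not taken from the resilient control literature.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about the probability that an operator
    accepts an automated recommendation (hypothetical framing)."""
    alpha: float = 1.0  # prior pseudo-count of acceptances (uniform prior)
    beta: float = 1.0   # prior pseudo-count of overrides

    def update(self, accepted: bool) -> None:
        # With a Bernoulli likelihood and a Beta prior, Bayes' rule reduces
        # to incrementing the matching pseudo-count.
        if accepted:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean estimate of the acceptance probability.
        return self.alpha / (self.alpha + self.beta)


def suggested_automation_level(belief: BetaBelief) -> str:
    """Illustrative policy only; the 0.8 and 0.5 thresholds are assumptions."""
    if belief.mean >= 0.8:
        return "automate"
    if belief.mean >= 0.5:
        return "recommend"
    return "manual"


# Hypothetical history: the operator accepted eight of nine recommendations.
history = [True, True, False, True, True, True, True, True, True]
belief = BetaBelief()
for accepted in history:
    belief.update(accepted)
level = suggested_automation_level(belief)
```

A Bayesian belief network would generalize this single-variable update to several dependent variables, such as operator workload, alarm rate and time of day.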


Cyber security

In contrast to the challenges of prediction and integration of the benign human with control systems, the abilities of the malicious actor (or hacker) to undermine desired control system behavior also create a significant challenge to control system resilience. Application of dynamic probabilistic risk analysis used in human reliability can provide some basis for the benign actor. However, the decidedly malicious intentions of an adversarial individual, organization or nation make the modeling of the human actor variable in both objectives and motives. In defining a control system response to such intentions, though, the malicious actor looks for some level of recognized behavior to gain an advantage and provide a pathway to undermining the system. Whether performed separately in preparation for a cyber attack, or on the system itself, these behaviors can provide opportunity for a successful attack without detection. Therefore, in considering resilient control system architecture, atypical designs that embed actively and passively implemented randomization of attributes would be suggested to reduce this advantage.
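One way to picture randomization of attributes is a polling schedule whose order and timing change on every cycle, so that an observer cannot rely on a fixed, recognizable pattern. The sketch below is only an assumption about what such randomization might look like; the function name, the choice of polling order and delay as the randomized attributes, and the jitter fraction are all hypothetical.

```python
import random
import secrets
from typing import List, Optional, Tuple


def randomized_poll_plan(
    devices: List[str],
    base_interval_s: float,
    jitter_fraction: float = 0.25,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, float]]:
    """Return (device, delay_before_poll_s) pairs in a shuffled order.

    Hypothetical helper: both the polling order and the delay before each
    poll are randomized within bounds around the nominal interval.
    """
    if not 0.0 <= jitter_fraction < 1.0:
        raise ValueError("jitter_fraction must be in [0, 1)")
    # Default to an OS-backed generator so the plan is hard to predict;
    # a seeded random.Random can be injected for repeatable testing.
    rng = rng or secrets.SystemRandom()
    order = list(devices)
    rng.shuffle(order)
    low = base_interval_s * (1.0 - jitter_fraction)
    high = base_interval_s * (1.0 + jitter_fraction)
    return [(name, rng.uniform(low, high)) for name in order]


# Example with made-up device tags and a seeded generator for repeatability.
plan = randomized_poll_plan(["PLC-1", "PLC-2", "RTU-7"], 1.0, rng=random.Random(0))
```

Any real design would also have to respect the determinism and stability requirements of the control loops, which bound how much timing variation is tolerable.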


Complex networks and networked control systems

While much of the current critical infrastructure is controlled by a web of interconnected control systems, in architectures termed either distributed control systems (DCS) or supervisory control and data acquisition (SCADA), the application of control is moving toward a more decentralized state. In moving to a smart grid, the complex interconnected nature of individual homes, commercial facilities and diverse power generation and storage creates both an opportunity and a challenge in ensuring that the resulting system is more resilient to threats. The ability to operate these systems to achieve a global optimum for multiple considerations, such as overall efficiency, stability and security, will require mechanisms to holistically design complex networked control systems. Multi-agent methods suggest a mechanism to tie a global objective to distributed assets, allowing for management and coordination of assets for optimal benefit, along with semi-autonomous but constrained controllers that can react rapidly to maintain resilience under rapidly changing conditions.


Base Metrics for Resilient Control Systems

Establishing a metric that can capture the resilience attributes can be complex, at least if considered based upon differences between the interactions or interdependencies. Evaluating the control, cyber and cognitive disturbances, especially if considered from a disciplinary standpoint, leads to measures that have already been established. However, if the metric were instead based upon a normalizing dynamic attribute, such as a performance characteristic that can be impacted by degradation, an alternative is suggested. Specifically, applications of base metrics to resilience characteristics are given as follows for each type of disturbance:

*Physical Disturbances:
**Time Latency Affecting Stability
**Data Integrity Affecting Stability
*Cyber Disturbances:
**Time Latency
**Data Confidentiality, Integrity and Availability
*Cognitive Disturbances:
**Time Latency in Response
**Data Digression from Desired Response

Such performance characteristics exist with both time and data integrity. Time, both in terms of delay of mission and communications latency, and data, in terms of corruption or modification, are normalizing factors. In general, the idea is to base the metric on “what is expected” and not necessarily the actual initiator of the degradation. Considering time as a metrics basis, resilient and un-resilient systems can be observed in Fig. 2. Dependent upon the abscissa metrics chosen, Fig. 2 reflects a generalization of the resiliency of a system. Several common terms are represented on this graphic, including robustness, agility, adaptive capacity, adaptive insufficiency, resiliency and brittleness. To overview the definitions of these terms, an explanation of each is provided below:

*Agility: The derivative of the disturbance curve. This average defines the ability of the system to resist degradation on the downward slope, but also to recover on the upward. Primarily considered a time based term that indicates impact to mission. Considers both short term system and longer term human responder actions.
*Adaptive Capacity: The ability of the system to adapt or transform from impact and maintain minimum normalcy. Considered a value between 0 and 1, where 1 is fully operational and 0 is the resilience threshold.
*Adaptive Insufficiency: The inability of the system to adapt or transform from impact, indicating an unacceptable performance loss due to the disturbance. Considered a value between 0 and -1, where 0 is the resilience threshold and -1 is total loss of operation.
*Brittleness: The area under the disturbance curve as intersected by the resilience threshold. This indicates the impact from the loss of operational normalcy.
*Phases of Resilient Control System Preparation and Disturbance Response:
**Recon: Maintaining proactive state awareness of system conditions and degradation
**Resist: System response to recognized conditions, both to mitigate and counter
**Respond: System degradation has been stopped and system performance is returning
**Restore: Longer term performance restoration, which includes equipment replacement
*Resiliency: The converse of brittleness, which for a resilient system is “zero” loss of minimum normalcy.
*Robustness: A positive or negative number associated with the area between the disturbance curve and the resilience threshold, indicating either the capacity or insufficiency, respectively.

On the abscissa of Fig. 2, it can be recognized that cyber and cognitive influences can affect both the data and the time, which underscores the relative importance of recognizing these forms of degradation in resilient control designs. For cybersecurity, a single cyberattack can degrade a control system in multiple ways. Additionally, control impacts can be characterized as indicated. While these terms are fundamental and may seem of little value to those correlating impact in terms like cost, the development of use cases provides a means by which this relevance can be codified. For example, given the impact to system dynamics or data, the performance of the control loop can be directly ascertained and show approach to instability and operational impact.
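The terms defined above can be made concrete with a small numerical sketch. It assumes a sampled disturbance curve whose performance values are normalized as in the definitions (1 is fully operational, 0 is the resilience threshold, -1 is total loss) and that the curve is piecewise linear between samples. The function names, and the reading of robustness as the signed area between the curve and the threshold, are assumptions of this example rather than an established formulation.

```python
from typing import Dict, Sequence


def _signed_area(t: Sequence[float], p: Sequence[float]) -> float:
    # Trapezoidal area between the curve and the threshold (p = 0);
    # portions below the threshold count as negative.
    return sum(0.5 * (p[i] + p[i + 1]) * (t[i + 1] - t[i]) for i in range(len(t) - 1))


def _area_below_threshold(t: Sequence[float], p: Sequence[float]) -> float:
    # Area enclosed between the threshold and the parts of the curve below it.
    area = 0.0
    for i in range(len(t) - 1):
        dt = t[i + 1] - t[i]
        a, b = p[i], p[i + 1]
        if a >= 0.0 and b >= 0.0:
            continue  # segment stays at or above the threshold
        if a <= 0.0 and b <= 0.0:
            area += 0.5 * (-a - b) * dt  # segment entirely at or below
        else:
            # Segment crosses the threshold: only a triangle lies below it.
            neg = min(a, b)
            frac = -neg / (abs(a) + abs(b))  # share of dt spent below
            area += 0.5 * (-neg) * frac * dt
    return area


def resilience_metrics(t: Sequence[float], p: Sequence[float]) -> Dict[str, float]:
    """Illustrative metrics for a sampled, normalized disturbance curve."""
    if len(t) != len(p) or len(t) < 2:
        raise ValueError("need equal-length inputs with at least two samples")
    if any(t[i + 1] <= t[i] for i in range(len(t) - 1)):
        raise ValueError("times must be strictly increasing")
    slopes = [(p[i + 1] - p[i]) / (t[i + 1] - t[i]) for i in range(len(t) - 1)]
    down = [s for s in slopes if s < 0.0]
    up = [s for s in slopes if s > 0.0]
    return {
        # >= 0 reads as adaptive capacity, < 0 as adaptive insufficiency.
        "worst_performance": min(p),
        "robustness": _signed_area(t, p),
        "brittleness": _area_below_threshold(t, p),
        # Agility, split into average degradation and recovery slopes.
        "mean_degradation_rate": sum(down) / len(down) if down else 0.0,
        "mean_recovery_rate": sum(up) / len(up) if up else 0.0,
    }


# Made-up curve: performance dips below the threshold and then recovers.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [1.0, 0.5, -0.5, 0.5, 1.0]
metrics = resilience_metrics(t, p)
```

For this made-up curve the system shows adaptive insufficiency (worst performance of -0.5) and a brittleness of 0.25 in units of normalized performance times time.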


Resilience Manifold for Design and Operation

The very nature of control systems implies a starting point for the development of resilience metrics. That is, the control of a physical process is based upon quantifiable performance and measures, including first principles and stochastic measures. The ability to provide this measurement, which is the basis for correlating operational performance and adaptation, then also becomes the starting point for correlation of the data and time variations that can come from the cognitive and cyber-physical sources. Effective understanding is based upon developing a manifold of adaptive capacity that correlates the design (and operational) buffer. For a power system, this manifold is based upon the real and reactive power assets, the controllable assets having the latitude to maneuver, and the impact of disturbances over time. For a modern distribution system (MDS), these assets can be aggregated from the individual contributions as shown in Fig. 3. For this figure, these assets include: a) a battery, b) an alternate tie line source, c) an asymmetric P/Q-conjectured source, d) a distribution static synchronous compensator (DSTATCOM), and e) a low latency, four quadrant source with no energy limit.
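The aggregation of individual asset contributions can be sketched very roughly by summing real (P) and reactive (Q) power limits. This example assumes each asset's capability is a simple rectangle in the P/Q plane and ignores latency, energy limits and time dependence, all of which the manifold described above accounts for. The class, field names and numeric ratings are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AssetCapability:
    """Rectangular P/Q capability (a simplifying assumption)."""
    name: str
    p_min_kw: float
    p_max_kw: float
    q_min_kvar: float
    q_max_kvar: float


def aggregate(assets: Iterable[AssetCapability]) -> AssetCapability:
    # For rectangular regions, the combined capability is obtained by
    # summing the corresponding limits of every asset.
    assets = list(assets)
    return AssetCapability(
        name="aggregate",
        p_min_kw=sum(a.p_min_kw for a in assets),
        p_max_kw=sum(a.p_max_kw for a in assets),
        q_min_kvar=sum(a.q_min_kvar for a in assets),
        q_max_kvar=sum(a.q_max_kvar for a in assets),
    )


# Made-up ratings for three of the asset types listed above.
fleet = [
    AssetCapability("battery", -50.0, 50.0, 0.0, 0.0),
    AssetCapability("tie line", 0.0, 100.0, -20.0, 20.0),
    AssetCapability("DSTATCOM", 0.0, 0.0, -30.0, 30.0),
]
total = aggregate(fleet)
```

Real capability regions are generally not rectangular (for example, an apparent power limit couples P and Q), so this sum should be read as an upper-level illustration only.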


Examples of Resilient Control System Developments

1) When considering the current digital control system designs, the cyber security of these systems is dependent upon what are considered border protections, i.e., firewalls, passwords, etc. If a malicious actor compromised the digital control system for an industrial operation by a man-in-the-middle attack, data can be corrupted within the control system. The industrial facility operator would have no way of knowing the data has been compromised until someone such as a security engineer recognized that the attack was occurring. As operators are trained to provide a prompt, appropriate response to stabilize the industrial facility, there is a likelihood that the corrupt data would lead the operator to react to the situation and cause a plant upset. In a resilient control system, as per Fig. 1, cyber and physical data is fused to recognize anomalous situations and warn the operator.

2) As our society becomes more automated for a variety of drivers, including energy efficiency, the need to implement ever more effective control algorithms naturally follows. However, advanced control algorithms are dependent upon data from multiple sensors to predict the behaviors of the industrial operation and make corrective responses. This type of system can become very brittle, insofar as any unrecognized degradation in a sensor itself can lead to incorrect responses by the control algorithm and potentially a worsened condition relative to the desired operation for the industrial facility. Therefore, implementation of advanced control algorithms in a resilient control system also requires the implementation of diagnostic and prognostic architectures to recognize sensor degradation, as well as failures of industrial process equipment associated with the control algorithms.
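As a minimal sketch of the kind of diagnostic check the second example calls for, the code below compares redundant sensor readings against their median and flags any reading that strays beyond a tolerance. It assumes at least three redundant sensors measure the same quantity; the function name, sensor tags and tolerance are hypothetical, and real diagnostic and prognostic architectures are considerably richer.

```python
from statistics import median
from typing import Dict, List, Tuple


def screen_redundant_sensors(
    readings: Dict[str, float], tolerance: float
) -> Tuple[float, List[str]]:
    """Return (consensus_value, suspect_sensor_names).

    Hypothetical helper: the median acts as the consensus, and a sensor is
    a suspect when it differs from the consensus by more than `tolerance`.
    """
    if len(readings) < 3:
        raise ValueError("need at least three redundant readings")
    consensus = median(readings.values())
    suspects = sorted(
        name for name, value in readings.items()
        if abs(value - consensus) > tolerance
    )
    return consensus, suspects


# Made-up temperature readings; the third sensor has drifted.
readings = {"TT-101A": 350.2, "TT-101B": 349.8, "TT-101C": 371.0}
consensus, suspects = screen_redundant_sensors(readings, tolerance=2.0)
```

A control algorithm could then use the consensus value and warn the operator about the suspect sensor instead of acting on the degraded reading.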


Resilient Control System Solutions and the Need for Interdisciplinary Education

In our world of advancing automation, our dependence upon these advancing technologies will require educated skill sets from multiple disciplines. The challenges may appear simply rooted in better design of control systems for greater safety and efficiency. However, the evolution of the technologies in the current design of automation has created a complex environment in which a cyber-attack, human error (whether in design or operation), or a damaging storm can wreak havoc on the basic infrastructure. The next generation of systems will need to consider the broader picture to ensure a path forward where failures do not lead to ever greater catastrophic events. One critical resource is students, who are expected to develop the skills necessary to advance these designs and who require both a perspective on the challenges and the contributions of others to fulfill the need. Addressing this need, courses have been developed to provide the perspectives and relevant examples to overview the issues and provide opportunity to create resilient solutions at universities such as George Mason University and Northeastern. The tie to critical infrastructure operations is an important aspect of these courses.

Through the development of technologies designed to set the stage for next generation automation, it has become evident that effective teams comprise several disciplines (T.R. McJunkin, C.G. Rieger, B.K. Johnson, D.S. Naidu, J.F. Gardner, L.H. Beaty, I. Ray, K. L. Le Blanc, M. Guryan, "Interdisciplinary Education through “Edu-tainment”: Electric Grid Resilient Control Systems Course," 122nd ASEE Annual Conference and Exposition, June 2015). However, developing a level of effectiveness can be time consuming, and when done in a professional environment can expend a lot of energy and time that provides little obvious benefit to the desired outcome. It is clear that the earlier these STEM disciplines can be successfully integrated, the more effective they are at recognizing each other’s contributions and working together to achieve a common set of goals in the professional world. Team competition at venues such as Resilience Week will be a natural outcome of developing such an environment, allowing interdisciplinary participation and providing an exciting challenge to motivate students to pursue a STEM education.


Standardizing Resilience and Resilient Control System Principles

Standards and policy that define resilience nomenclature and metrics are needed to establish a value proposition for investment, which includes government, academia and industry. The IEEE Industrial Electronics Society has taken the lead in forming a technical committee toward this end. The purpose of this committee will be to establish metrics and standards associated with codifying promising technologies that promote resilience in automation. This effort is distinct from the more supply chain community focus on resilience and security, such as the efforts of ISO and NIST.

