Lightweight Kernel Operating System

A lightweight kernel (LWK) operating system is one used in a large computer with many processor cores, termed a parallel computer. A massively parallel high-performance computing (HPC) system is particularly sensitive to operating system overhead. Traditional multi-purpose operating systems are designed to support a wide range of usage models and requirements. To support that range of needs, they provide a large number of system processes, which are often interdependent. The computing overhead of these processes makes the amount of processor time available to a parallel application unpredictable. A very common parallel programming model is the bulk synchronous parallel model, which often employs the Message Passing Interface (MPI) for communication. Synchronization events occur at specific points in the application code. If one processor takes longer to reach such a point than the others, all of them must wait, and the overall finish time increases. Unpredictable operating system overhead is one significant reason a processor might take longer to reach the synchronization point than the others.
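
The effect of one slow processor on a bulk synchronous run can be illustrated with a small model. The sketch below is not taken from any particular system: the function name `bsp_finish_time`, the fixed per-step work, and the noise values are all assumptions chosen for illustration. Each superstep ends at a barrier, so its duration is set by the slowest processor.

```python
def bsp_finish_time(work, noise):
    """Total run time of a toy bulk synchronous parallel job.

    work  -- compute time per superstep, identical on every processor (assumed)
    noise -- noise[s][p] is the extra OS overhead that processor p
             experiences during superstep s (illustrative values)
    """
    total = 0.0
    for step_noise in noise:
        # All processors wait at the barrier for the slowest one,
        # so the step costs the work plus the largest delay.
        total += work + max(step_noise)
    return total


# Four processors, three supersteps, one time unit of work per step.
quiet = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
noisy = [
    [0.0, 0.5, 0.0, 0.0],  # processor 1 is interrupted in step 0
    [0.0, 0.0, 0.0, 0.5],  # processor 3 is interrupted in step 1
    [0.5, 0.0, 0.0, 0.0],  # processor 0 is interrupted in step 2
]

print(bsp_finish_time(1.0, quiet))  # 3.0
print(bsp_finish_time(1.0, noisy))  # 4.5
```

In the noisy case no single processor loses more than 0.5 time units, yet the job as a whole loses 1.5, because every interruption stalls all processors at the next barrier.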


Examples

Custom lightweight kernel operating systems, used on some of the fastest computers in the world, help alleviate this problem. The IBM Blue Gene line of supercomputers runs various versions of the CNK operating system. The Cray XT4 and Cray XT5 supercomputers run Compute Node Linux, while the earlier XT3 ran the lightweight kernel Catamount, which was based on SUNMOS.

Sandia National Laboratories has an almost two-decade commitment to lightweight kernels on its high-end HPC systems. Sandia and University of New Mexico researchers began work on SUNMOS for the Intel Paragon in the early 1990s. This operating system evolved into Puma, then Cougar (which achieved the first teraflop on ASCI Red), and then Catamount on Red Storm. Sandia continues its work in LWKs with a new R&D effort called Kitten.


Characteristics

Although it is surprisingly difficult to exactly define what a lightweight kernel is, there are some common design goals:
* Targeted at massively parallel environments composed of thousands of processors with distributed memory and a tightly coupled network.
* Provide necessary support for scalable, performance-oriented scientific applications.
* Offer a suitable development environment for parallel applications and libraries.
* Emphasize efficiency over functionality.
* Maximize the amount of resources (e.g., CPU, memory, and network bandwidth) allocated to the application.
* Seek to minimize time to completion for the application.


Implementation

LWK implementations vary, but all strive to provide applications with predictable and maximum access to the central processing unit (CPU) and other system resources. To achieve this, they usually use simplified algorithms for scheduling and memory management. System services (e.g., daemons) are limited to the absolute minimum. Available services, such as job launch, are constructed in a hierarchical fashion to ensure scalability to thousands of nodes. Networking protocols for communication between nodes in the system are also carefully selected and implemented to ensure scalability. One such example is the Portals network programming application programming interface (API).

Lightweight kernel operating systems assume access to a small set of nodes that run full-service operating systems, to which some of the necessary services are offloaded: login access, compiling environments, batch job submission, and file I/O.

By restricting services to only those that are absolutely necessary and by streamlining those that are provided, the overhead (sometimes called noise) of the lightweight operating system is minimized. This allows a significant and predictable share of the processor cycles to be given to the parallel application. Since the application can make consistent progress on each processor, the processors reach their synchronization points faster, ideally at the same time, and less time is lost waiting.
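
To see why a hierarchical job launch scales to thousands of nodes, it helps to count forwarding rounds in a simple tree model. The sketch below is an assumption made for illustration, not a description of any particular launcher: the function name `tree_launch_rounds` and the fan-out of 32 are hypothetical, and the model assumes each newly started node forwards the launch request to `fanout` further nodes in the next round.

```python
def tree_launch_rounds(nodes, fanout):
    """Rounds needed for a tree-structured launch to reach `nodes` nodes.

    Toy model: round 0 starts one root node; in every later round each
    node started in the previous round starts `fanout` new nodes.
    """
    if nodes < 1 or fanout < 1:
        raise ValueError("nodes and fanout must be positive")
    reached = 1   # nodes started so far (the root)
    frontier = 1  # nodes started in the most recent round
    rounds = 0
    while reached < nodes:
        frontier *= fanout
        reached += frontier
        rounds += 1
    return rounds


print(tree_launch_rounds(10_000, 32))  # 3
```

Under these assumptions 10,000 nodes are reached in three rounds, whereas a flat launch from a single node would have to contact the other 9,999 nodes itself.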


Future

The last supercomputers running lightweight kernels are the remaining IBM Blue Gene systems running CNK. A new direction for lightweight kernels is to combine them with a full-featured OS, such as Linux, on a many-core node. These multikernel operating systems run a lightweight kernel on some of the CPU cores of a node, while the other cores run the full-featured OS and provide the services that lightweight kernels omit. By combining the two, users get the Linux features they need along with the deterministic behavior and scalability of lightweight kernels.
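
The division of labor on such a node can be pictured with a toy model. Everything in the sketch below is hypothetical: the service names, the split between the two kernels, and the functions `partition_cores` and `route_request` are illustrative assumptions, and real multikernel systems define their own partitioning and forwarding mechanisms.

```python
# Hypothetical set of services the lightweight kernel handles itself;
# anything else is assumed to be forwarded to the Linux cores.
LWK_SERVICES = {"memory_map", "yield", "network_send"}


def partition_cores(core_ids, linux_cores):
    """Split a node's cores: the first `linux_cores` run Linux, the rest the LWK."""
    if not 0 < linux_cores < len(core_ids):
        raise ValueError("both kernels need at least one core")
    return {"linux": core_ids[:linux_cores], "lwk": core_ids[linux_cores:]}


def route_request(service):
    """Return which kernel handles a service request in this toy model."""
    return "lwk" if service in LWK_SERVICES else "linux"


layout = partition_cores(list(range(8)), linux_cores=2)
print(layout)                       # {'linux': [0, 1], 'lwk': [2, 3, 4, 5, 6, 7]}
print(route_request("yield"))       # lwk
print(route_request("file_open"))   # linux
```

The application runs on the LWK cores, which stay free of background services, while less performance-critical requests are handled on the Linux cores.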

