DOME Project

DOME was a Dutch government-funded project between IBM and ASTRON, organised as a public-private partnership and focusing on the Square Kilometre Array (SKA), the world's largest planned radio telescope. SKA will be built in Australia and South Africa. The objective of the DOME project was to develop a technology roadmap that applies to both SKA and IBM. The five-year project started in 2012 and was co-funded by the Dutch government and IBM Research in Zürich, Switzerland, and ASTRON in the Netherlands. The project officially ended on 30 September 2017.

The DOME project focused on three areas of computing (green computing, data and streaming, and nano-photonics) and was partitioned into seven research projects.

* P1 Algorithms & Machines: As traditional computing scaling has essentially hit a wall, a new set of methodologies and principles is needed for the design of future large-scale computers. This is the umbrella project for the other six.
* P2 Access Patterns: When faced with storing petabytes of data per day, new thinking about data storage tiering and storage media must be developed.
* P3 Nano Photonics: Fiber-optic communication over long distances and between systems is nothing new, but there is a lot to do for optical communication within computer systems and within the telescopes themselves.
* P4 Microservers: New demands for higher computing density, higher performance per watt and reduced system complexity suggest a new kind of custom-designed server.
* P5 Accelerators: With the flattening of general computing performance, special architectures for reaching the next level of performance will be investigated for specialized tasks such as signal processing and analysis.
* P6 Compressive Sampling: Fundamental research into tailored signal processing and machine learning algorithms for the capture, processing and analysis of radio astronomy data. Compressive sensing, algebraic systems, machine learning and pattern recognition are focus areas.
* P7 Real-Time Communication: Reduce the latency caused by redundant network operations in very large-scale systems and optimize the use of communications bandwidth so that the correct data gets to the correct processing unit in real time.


P1 Algorithms & Machines

Computer design has changed dramatically in recent decades, but the old paradigms still reign. Current designs stem from single computers working on small data sets in one location. SKA will face a completely different landscape, working on an extremely large data set, collected at a myriad of geographically separated locations, using tens of thousands of separate computers in real time. The fundamental principles for designing such a machine will have to be reexamined. Parameters concerning power envelope, accelerator technologies, workload distribution, memory size, CPU architecture and node intercommunication must be investigated to draw a new baseline to design from. The tools resulting from this project were slated to be open-sourced in early 2018. This fundamental research serves as the umbrella for the other six focus areas, helping to make sound decisions about architectural direction. A first step will be a retrospective analysis of the design of the LOFAR and MeerKAT telescopes and the development of a design tool for very large, distributed computers.


P2 Access Patterns

This project focuses on the very large amount of data that DOME must handle. SKA will generate petabytes of data daily, and this data must be handled differently according to urgency and geographical location, whether it is near the telescope arrays or in the datacenters. A complex tiered solution must be devised using many technologies that are currently beyond the state of the art. The driving forces behind the designs will be lowest possible cost, accessibility and energy efficiency. This multi-tier approach will combine several kinds of software technologies to analyze, sift, distribute, store and retrieve data on hardware ranging from traditional storage media like magnetic tape and hard drives to newly developed technologies like phase-change memory. The suitability of a storage medium depends heavily on the usage patterns when writing and reading data, and these patterns will change over time, so the designs must also leave room for change.
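As a rough illustration of how access patterns could drive tier placement, the following Python sketch maps a read frequency and a latency requirement to one of the three storage media named in this section. The `AccessPattern` fields, the `choose_tier` function and both thresholds are hypothetical, chosen only for illustration; they are not figures or designs from the DOME project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    """Hypothetical summary of how a data set is used."""
    reads_per_day: float    # how often the data is read back
    max_latency_ms: float   # slowest acceptable time to first byte


def choose_tier(pattern: AccessPattern) -> str:
    """Pick a storage medium for a data set.

    The thresholds below are illustrative assumptions, not measured values.
    """
    # Data that must be served almost immediately goes to the fastest medium.
    if pattern.max_latency_ms < 1.0:
        return "phase-change memory"
    # Data read at least about once a day stays on disk.
    if pattern.reads_per_day >= 1.0:
        return "hard drive"
    # Rarely read data is archived on the cheapest medium.
    return "magnetic tape"
```

Because usage patterns change over time, a real system would re-evaluate placement periodically and migrate data between tiers; this sketch only shows a single placement decision.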


P3 Nano Photonics

Data transport is a major factor influencing the design of DOME from the largest scales to the smallest. The cost of communicating electrically over copper wires will drive the application of low-power photonic interconnects, from connections between collecting antennas and datacenters to connections between devices inside the computers. Both IBM and ASTRON have advanced research programs in nano-photonics, beamforming and optical links, and they will combine their efforts for the new designs. This research project is divided into four R&D sections, investigating digital optical interconnects, analog optical interconnects and analog optical signal processing.

# Digital optical interconnect technology for astronomy signal processing boards.
# Analog optical interconnection technology for focal-plane array front-ends.
# Analog optical interconnection technology for photonic phased array receiver tiles.
# Analog optical interconnection and signal processing technology for photonic focal-plane arrays.

In February 2013 at the International Solid-State Circuits Conference (ISSCC), IBM and École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland showed a 100 Gbit/s analog-to-digital converter (ADC). In February 2014 at ISSCC, IBM and ASTRON demonstrated a 400 Gbit/s ADC.


P4 Microservers

In 2012 a team at IBM led by Ronald P. Luijten started pursuing a computationally dense and energy-efficient 64-bit compute server design based on commodity components and running Linux. A system-on-chip (SoC) design, in which most necessary components fit on a single chip, would meet these goals best, and a definition of "microserver" emerged in which essentially a complete motherboard (except RAM and boot flash) fits on a chip. ARM, x86 and Power ISA-based solutions were investigated, and a solution based on Freescale's Power ISA-based dual-core P5020 / quad-core P5040 processor came out on top.


Design

The resulting microserver fits the same form factor as a standard FB-DIMM socket. The SoC, about 20 GB of DRAM and a few control chips (such as the PSoC 3 from Cypress, used for monitoring, debugging and booting) make up a complete compute node with physical dimensions of 133×55 mm. The card's pins carry one SATA port, five Gbit and two 10 Gbit Ethernet ports, one SD card interface, one USB 2 interface, and power. The compute card operates within a 35 W power envelope with headroom up to 70 W. The idea is to fit about a hundred of these compute cards in a 2U drawer of a 19" rack, together with network switchboards for external storage and communication. Cooling will be provided by the Aquasar hot-water cooling solution pioneered by the SuperMUC supercomputer in Germany.
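The figures above imply a dense power and bandwidth budget per drawer. The following Python sketch works through that arithmetic, assuming exactly 100 cards per drawer (the text says "about a hundred") and that every card runs at either its 35 W envelope or its 70 W headroom. The `ComputeCard` and `drawer_budget` names are hypothetical helpers introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeCard:
    """Per-card figures taken from the description above."""
    nominal_watts: float = 35.0   # stated power envelope
    peak_watts: float = 70.0      # stated headroom
    ten_gbit_ports: int = 2       # 10 Gbit Ethernet ports per card


def drawer_budget(cards: int = 100, card: ComputeCard = ComputeCard()) -> dict:
    """Aggregate power and 10 Gbit Ethernet capacity for one 2U drawer.

    `cards=100` is an assumption standing in for "about a hundred".
    """
    return {
        "nominal_kw": cards * card.nominal_watts / 1000.0,
        "peak_kw": cards * card.peak_watts / 1000.0,
        # Total 10 Gbit Ethernet capacity across all cards, in Tbit/s.
        "ten_gbit_tbps": cards * card.ten_gbit_ports * 10 / 1000.0,
    }
```

Under these assumptions a single drawer draws 3.5 kW nominally and up to 7 kW at full headroom, which is consistent with the choice of hot-water cooling over air.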


Future

In late 2013 a new SoC was chosen. Freescale's newer 12-core T4240 is significantly more powerful and operates within the same power envelope as the P5020. A new prototype microserver card was built and validated in early 2014 for larger-scale deployment in the full 2U drawer. Later, an 8-core ARMv8 board was developed using the LS2088A part from NXP (formerly Freescale). At the end of 2017, IBM was licensing the technology to a startup that planned to bring it to market by mid-2018.


P5 Accelerators

Traditional high-performance processors hit a performance wall during the late 2000s, when clock speeds could no longer be increased because of rising power requirements. One solution is to offload the most common and/or compute-intensive tasks to specialized hardware called accelerators. This research area will try to identify these tasks and design algorithms and hardware to overcome the bottlenecks. There will probably be accelerators for pattern detection, parsing, data lookup and signal processing. The hardware will fall into two classes: fixed accelerators for static tasks, and programmable accelerators for a family of tasks with similar characteristics. The project will also look at massively parallel computing using commodity graphics processors.


P6 Compressive Sampling

The compressive sampling project is fundamental research into signal processing, in collaboration with Delft University of Technology. In radio astronomy, the capture, analysis and processing of signals is extremely compute-intensive and involves enormous datasets. The goal is to perform sampling and compression simultaneously and to use machine learning to decide what to keep and what to throw away, preferably as close to the data collectors as possible. The project aims to develop compressive sampling algorithms for capturing the signal and for calibrating the patterns to keep, in an ever-increasing number of pattern clusters. The research will also tackle the problems of degraded pattern quality, outlier detection, object classification and image formation.
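To make the idea of sampling and compressing in one step concrete, the following Python sketch takes fewer measurements than the signal has samples and then recovers a sparse signal from them. The source does not name a specific algorithm; matching pursuit is swapped in here as a simple, well-known greedy recovery method, and the random Gaussian sensing matrix, the function names and all sizes are assumptions made for illustration.

```python
import math
import random


def make_sensing_matrix(m, n, seed=0):
    """Return n random Gaussian columns of length m, each scaled to unit norm."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n):
        col = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(v * v for v in col))
        columns.append([v / norm for v in col])
    return columns


def measure(columns, x):
    """Compress a length-n signal x into m measurements y = A x."""
    m = len(columns[0])
    y = [0.0] * m
    for coeff, col in zip(x, columns):
        if coeff != 0.0:
            for i in range(m):
                y[i] += coeff * col[i]
    return y


def matching_pursuit(columns, y, max_iters=50, tol=1e-9):
    """Greedy sparse recovery: repeatedly pick the column that best
    matches the residual and subtract its contribution."""
    n = len(columns)
    x_hat = [0.0] * n
    residual = list(y)
    for _ in range(max_iters):
        if math.sqrt(sum(r * r for r in residual)) < tol:
            break  # measurements are fully explained
        corrs = [sum(c * r for c, r in zip(col, residual)) for col in columns]
        j = max(range(n), key=lambda idx: abs(corrs[idx]))
        x_hat[j] += corrs[j]
        residual = [r - corrs[j] * c for r, c in zip(residual, columns[j])]
    return x_hat
```

For example, a length-64 signal with a single nonzero entry can be measured with `measure(make_sensing_matrix(16, 64), x)`, giving 16 values instead of 64, and `matching_pursuit` then locates that entry from the measurements alone. Signals with more nonzero entries generally need more measurements, and recovery is then no longer guaranteed by this simple greedy method.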


P7 Real-Time Communication

Moving data from the collectors to the processing facilities is traditionally bogged down by high-latency I/O and low-bandwidth connections, and data is often duplicated along the way because the communication network lacks purposeful design. This research project will try to reduce latency to a minimum and design the I/O systems so that data is written directly into the processing engines of an exascale computer design. The first phase will identify system bottlenecks and investigate remote direct memory access (RDMA). The second phase will investigate applying standard RDMA technology to interconnect networking. Phase three includes the development of functional prototypes. (Source: ASTRON & IBM Center for Exascale Technology, RT Communication.)

