Sunway SW26010

	Sunway SW26010 The SW26010 is a 260-core manycore processor designed by the Shanghai Integrated Circuit Technology and Industry Promotion Center (ICC for short)(Chinese: 上海集成电路技术与产业促进中心 (简称ICC)). It implements the Sunway architecture, a 64-bit reduced instruction set computing (RISC) architecture designed in China. The SW26010 has four clusters of 64 ''Compute-Processing Elements'' (CPEs) which are arranged in an eight-by-eight array. The CPEs support SIMD instructions and are capable of performing eight double-precision floating-point operations per cycle. Each cluster is accompanied by a more conventional general-purpose core called the ''Management Processing Element'' (MPE) that provides supervisory functions. Each cluster has its own dedicated DDR3 SDRAM controller and a memory bank with its own address space. The processor runs at a clock speed of 1.45 GHz. The CPE cores feature 64 KB of scratchpad memory for data and 16 KB for instructions, ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Manycore Processor Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores (from a few tens of cores to thousands or more). Manycore processors are used extensively in embedded computers and high-performance computing. Contrast with multicore architecture Manycore processors are distinct from multi-core processors in being optimized from the outset for a higher degree of explicit parallelism, and for higher throughput (or lower power consumption) at the expense of latency and lower single-thread performance. The broader category of multi-core processors, by contrast, are usually designed to efficiently run ''both'' parallel ''and'' serial code, and therefore place more emphasis on high single-thread performance (e.g. devoting more silicon to out of order execution, deeper pipelines, more superscalar execution units, and larger, more general caches), and shared memory. These techniqu ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Instruction (computing) In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ''implementation''. In general, an ISA defines the supported instructions, data types, registers, the hardware support for managing main memory, fundamental features (such as the memory consistency, addressing modes, virtual memory), and the input/output model of a family of implementations of the ISA. An ISA specifies the behavior of machine code running on implementations of that ISA in a fashion that does not depend on the characteristics of that implementation, providing binary compatibility between implementations. This enables multiple implementations of an ISA that differ in characteristics such as performance, physical size, and monetary cost (among other things), but that are capable of running the same machine code, so that a ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Loongson Loongson () is the name of a family of general-purpose, MIPS architecture-compatible microprocessors, as well as the name of the Chinese fabless company (Loongson Technology) that develops them. The processors are alternately called Godson processors, which are described as its academic name. History The ''Godson'' processors, based on MIPS architecture, were initially developed at the ''Institute of Computing Technology'' (ICT), Chinese Academy of Sciences (CAS). The chief architect was Professor . The development of the first Loongson chip was started in 2001. The aim of the Godson project was to develop "high performance general-purpose microprocessors in China", and to become technologically self-sufficient as part of the Made in China 2025 plan. The development was supported by funding via the 10th and 11th Five-Year Plans. In 2010 the company was commercialised as a separate entity, and in April 2010 ''Loongson Technology Corporation Limited'' was formally established ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Massively Parallel Processor Array A massively parallel processor array, also known as a multi purpose processor array (MPPA) is a type of integrated circuit which has a massively parallel array of hundreds or thousands of CPUs and RAM memories. These processors pass work to one another through a reconfigurable interconnect of channels. By harnessing a large number of processors working in parallel, an MPPA chip can accomplish more demanding tasks than conventional chips. MPPAs are based on a software parallel programming model for developing high-performance embedded system applications. Architecture MPPA is a MIMD (Multiple Instruction streams, Multiple Data) architecture, with distributed memory accessed locally, not shared globally. Each processor is strictly encapsulated, accessing only its own code and memory. Point-to-point communication between processors is directly realized in the configurable interconnect. The MPPA's massive parallelism and its distributed memory MIMD architecture distinguishes it fro ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	LINPACK Benchmark The LINPACK Benchmarks are a measure of a system's Floating-point arithmetic, floating-point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense ''n'' by ''n'' system of linear equations ''Ax'' = ''b'', which is a common task in engineering. The latest version of these benchmark (computing), benchmarks is used to build the TOP500 list, ranking the world's most powerful supercomputers. The aim is to approximate how fast a computer will perform when solving real problems. It is a simplification, since no single computational task can reflect the overall performance of a computer system. Nevertheless, the LINPACK benchmark performance can provide a good correction over the peak performance provided by the manufacturer. The peak performance is the maximal theoretical performance a computer can achieve, calculated as the machine's frequency, in cycles per second, times the number of operations per cycle it can perform. The actual per ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	PFLOPS In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate measure than measuring instructions per second. Floating-point arithmetic Floating-point arithmetic is needed for very large or very small real numbers, or computations that require a large dynamic range. Floating-point representation is similar to scientific notation, except everything is carried out in base two, rather than base ten. The encoding scheme stores the sign, the exponent (in base two for Cray and VAX, base two or ten for IEEE floating point formats, and base 16 for IBM Floating Point Architecture) and the significand (number after the radix point). While several similar formats are in use, the most common is ANSI/IEEE Std. 754-1985. This standard defines the format for 32-bit numbers called ''single precision'', as well as 64-b ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	TOP500 The TOP500 project ranks and details the 500 most powerful non-distributed computing, distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing and bases rankings on HPL (benchmark), HPL, a portable implementation of the high-performance LINPACK Benchmark (computing), benchmark written in Fortran for distributed-memory computers. The 60th TOP500 was published in November 2022. Since June 2022, USA's Frontier (supercomputer), Frontier is the most powerful supercomputer on TOP500, reaching 1102 petaFlops (1.102 exaFlops) on the LINPACK benchmarks. The United States has by far the highest share of total computing ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Supercomputer A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second ( FLOPS) instead of million instructions per second (MIPS). Since 2017, there have existed supercomputers which can perform over 1017 FLOPS (a hundred quadrillion FLOPS, 100 petaFLOPS or 100 PFLOPS). For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (1011) to tens of teraFLOPS (1013). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers. Supercomputers play an important role in the field of computational science, and are used for a wide range of computationally intensive tasks in var ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Sunway TaihuLight The Sunway TaihuLight ( ''Shénwēi·tàihú zhī guāng'') is a Chinese supercomputer which, , is ranked fourth in the TOP500 list, with a LINPACK benchmark rating of 93 petaflops. The name is translated as ''divine power, the light of Taihu Lake''. This is nearly three times as fast as the previous Tianhe-2, which ran at 34 petaflops. , it is ranked as the 16th most energy-efficient supercomputer in the Green500, with an efficiency of 6.051 GFlops/watt. It was designed by the National Research Center of Parallel Computer Engineering & Technology (NRCPC) and is located at the National Supercomputing Center in Wuxi in the city of Wuxi, in Jiangsu province, China. The Sunway TaihuLight was the world's fastest supercomputer for two years, from June 2016 to June 2018, according to the TOP500 lists. The record was surpassed in June 2018 by IBM's Summit. Architecture The Sunway TaihuLight uses a total of 40,960 Chinese-designed SW26010 manycore 64-bit RISC processors based on th ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	L2 Cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels (of I- or D-cache), or even any level, sometimes some latter or all levels are implemented with eDRAM. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) which is part of the memory management unit (MMU) whi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Data Cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels (of I- or D-cache), or even any level, sometimes some latter or all levels are implemented with eDRAM. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) which is part of the memory management unit (MMU) whi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Instruction Cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels (of I- or D-cache), or even any level, sometimes some latter or all levels are implemented with eDRAM. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) which is part of the memory management unit (MMU) w ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]