Many-task Computing

	Many-task Computing Many-task computing (MTC) in computational science is an approach to parallel computing that aims to bridge the gap between two computing paradigms: high-throughput computing (HTC) and high-performance computing (HPC) (I. Raicu, I. Foster, Y. Zhao, "Many-Task Computing for Grids and Supercomputers", IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS08), 2008). Definition: MTC is reminiscent of HTC, but it "differs in the emphasis of using many computing resources over short periods of time to accomplish many computational tasks (i.e. including both dependent and independent tasks), where the primary metrics are measured in seconds (e.g. FLOPS, tasks/s, MB/s I/O rates), as opposed to operations (e.g. jobs) per month. MTC denotes high-performance computations comprising multiple distinct activities, coupled via file system operations. Tasks may be small or large, uniprocessor or multiprocessor, compute-intensive or data-intensive. The set of tasks may be static or ...
	Computational Science Computational science, also known as scientific computing, technical computing or scientific computation (SC), is a division of science, and more specifically the computer sciences, which uses advanced computing capabilities to understand and solve complex physical problems. While this typically extends into computational specializations, this field of study includes: algorithms (numerical and non-numerical), that is, mathematical models, computational models, and computer simulations developed to solve problems in the sciences (e.g., physical, biological, and social), engineering, and the humanities; computer hardware that develops and optimizes the advanced system hardware, firmware, networking, and data management components needed to solve computationally demanding problems; and the computing infrastructure that supports both the science and engineering problem solving and the developmental computer and information science. In practical use, it is typically the application of compu ...
	Parallel Computing Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. S.V. Adve et al. (November 2008), "Parallel Computing Research at Illinois: The UPCRC Agenda" (PDF), Parallel@Illinois, University of Illinois at Urbana-Champaign: "The main techniques for these performance benefits—increased clock frequency and smarter but increasingly complex architectures—are now hitting the so-called power wall. The computer industry has accepted that future performance inc ...
	Computing Paradigm A programming paradigm is a relatively high-level way to conceptualize and structure the implementation of a computer program. A programming language can be classified as supporting one or more paradigms. Paradigms are separated along and described by different dimensions of programming. Some paradigms are about implications of the execution model, such as allowing side effects, or whether the sequence of operations is defined by the execution model. Other paradigms are about the way code is organized, such as grouping into units that include both state and behavior. Yet others are about syntax and grammar. Some common programming paradigms include (shown in hierarchical relationship): imperative: code directly controls execution flow and state change, with explicit statements that change a program state; procedural: organized as procedures that call each other; object-oriented: organized as objects that contain both data structure and associated behavior, uses data struct ...
	High-throughput Computing In computer science, high-throughput computing (HTC) is the use of many computing resources over long periods of time to accomplish a computational task. Challenges: The HTC community is also concerned with the robustness and reliability of jobs over a long time scale, that is, with being able to create a reliable system from unreliable components. This research is similar to transaction processing, but at a much larger and distributed scale. Some HTC systems, such as HTCondor and PBS, can run tasks on opportunistic resources. It is a difficult problem, however, to operate in this environment. On one hand, the system needs to provide a reliable operating environment for the user's jobs; on the other, it must not compromise the integrity of the execute node and must allow the owner to always have full control of their resources. Vs. high-performance vs. many-task: There are many differences between high-throughput computing, high-performance computing (HPC), and many-task comput ...
	High-performance Computing High-performance computing (HPC) is the use of supercomputers and computer clusters to solve advanced computation problems. Overview: HPC integrates systems administration (including network and security knowledge) and parallel programming into a multidisciplinary field that combines digital electronics, computer architecture, system software, programming languages, algorithms and computational techniques. HPC technologies are the tools and systems used to implement and create high-performance computing systems. Recently, HPC systems have shifted from supercomputing to computing clusters and grids. Because of the need for networking in clusters and grids, high-performance computing technologies are being promoted by the use of a collapsed network backbone, because the collapsed backbone architecture is simple to troubleshoot and upgrades can be applied to a single router as opposed to multiple ones. HPC integrates with data analytics in AI engineering workflows to generate ...
	Data-intensive Computing Data-intensive computing is a class of parallel computing applications which use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size and typically referred to as big data. Computing applications that devote most of their execution time to computational requirements are deemed compute-intensive, whereas applications are deemed data-intensive if they require large volumes of data and devote most of their processing time to input/output and manipulation of data. Introduction: The rapid growth of the Internet and World Wide Web led to vast amounts of information available online. In addition, business and government organizations create large amounts of both structured and unstructured information, which need to be processed, analyzed, and linked. Vinton Cerf described this as an “information avalanche” and stated, “we must harness the Internet’s energy before the information it has unleashed buries us”. An International Data ...
	Embarrassingly Parallel In parallel computing, an embarrassingly parallel workload or problem (also called embarrassingly parallelizable, perfectly parallel, delightfully parallel or pleasingly parallel) is one where little or no effort is needed to split the problem into a number of parallel tasks. This is due to minimal or no dependency on communication between the parallel tasks, or on results passed between them. These differ from distributed computing problems, which need communication between tasks, especially communication of intermediate results. They are easier to perform on server farms which lack the special infrastructure used in a true supercomputer cluster. They are well-suited to large, Internet-based volunteer computing platforms such as BOINC, and suffer less from parallel slowdown. The opposite of embarrassingly parallel problems are inherently serial problems, which cannot be parallelized at all. A common example of an embarrassingly parallel problem is 3D video renderi ...
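The independence between tasks described in the entry above can be illustrated with a minimal Python sketch. The function names `render_frame` and `render_all` are hypothetical, and the arithmetic inside `render_frame` is only a placeholder for real per-task work; the point is that each task reads nothing but its own input, so the tasks can be handed to a worker pool in any order without coordination. A thread pool is used here for portability; CPU-bound work in CPython would more likely use a process pool.

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number):
    """Hypothetical stand-in for one independent task.

    The result depends only on frame_number: no shared state is read
    or written, and no intermediate results are exchanged with other tasks.
    """
    return sum(i * i for i in range(frame_number + 1))


def render_all(frame_numbers, workers=4):
    """Run every task on a worker pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_frame, frame_numbers))


if __name__ == "__main__":
    print(render_all(range(5)))
```

Because no task waits on another, the same function could be distributed across separate machines without changing `render_frame` itself, which is what makes such workloads a good fit for loosely coupled platforms.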
	MapReduce MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel and distributed algorithm on a cluster. A MapReduce program is composed of a map procedure, which performs filtering and sorting (such as sorting students by first name into queues, one queue for each name), and a reduce method, which performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance. The model is a specialization of the split-apply-combine strategy for data analysis. It is inspired by the map and reduce functions commonly used in functional programming, "Our abstracti ...
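The student-name example in the entry above can be sketched in plain Python. This is a single-process illustration of the map, group, and reduce steps only, not the API of any MapReduce framework; the function names and the sample list `STUDENTS` are assumptions made for the example, and the `shuffle` function stands in for the grouping that a real framework would perform across distributed servers.

```python
from collections import defaultdict


def map_phase(students):
    """Map: emit one (first_name, 1) pair per student record."""
    for full_name in students:
        first_name = full_name.split()[0]
        yield first_name, 1


def shuffle(pairs):
    """Group emitted values by key: one queue per first name, sorted by name."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return dict(sorted(queues.items()))


def reduce_phase(queues):
    """Reduce: summarize each queue by counting its entries."""
    return {name: sum(values) for name, values in queues.items()}


def name_frequencies(students):
    """Chain the three steps to obtain first-name frequencies."""
    return reduce_phase(shuffle(map_phase(students)))


# Hypothetical sample input for illustration only.
STUDENTS = ["Ada Lovelace", "Alan Turing", "Ada Yonath", "Grace Hopper", "Alan Kay"]

if __name__ == "__main__":
    print(name_frequencies(STUDENTS))
```

Each call to `map_phase` touches one record at a time and each queue is reduced independently, which is the property a distributed implementation relies on to run both phases in parallel.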
	BOINC The Berkeley Open Infrastructure for Network Computing (BOINC, pronounced to rhyme with "oink") is an open-source middleware system for volunteer computing (a type of distributed computing). Developed originally to support SETI@home, it became the platform for many other applications in areas as diverse as medicine, molecular biology, mathematics, linguistics, climatology, environmental science, and astrophysics, among others. The purpose of BOINC is to enable researchers to utilize processing resources of personal computers and other devices around the world. BOINC development began with a group based at the Space Sciences Laboratory (SSL) at the University of California, Berkeley, and led by David P. Anderson, who also led SETI@home. As a high-performance volunteer computing platform, BOINC brings together 34,236 active participants employing 136,341 active computers (hosts) worldwide, processing daily on average 20.164 PetaFLOPS (it would be the 21st largest processing capa ...