HOME

TheInfoList



OR:

The Byte Sieve is a computer-based implementation of the
Sieve of Eratosthenes In mathematics, the sieve of Eratosthenes is an ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e., not prime) the multiples of each prime, starting with the first prime n ...
published by ''
Byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
'' as a
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
performance benchmark. It first appeared in the September 1981 edition of the magazine and was revisited on occasion. Although intended to compare the performance of different languages on the same computers, it quickly became a widely used machine benchmark. The Sieve was one of the more popular benchmarks of the home computer era, another being the
Creative Computing Benchmark The Creative Computing Benchmark, also called Ahl's Simple Benchmark, is a computer benchmark that was used to compare the performance of the BASIC programming language on various machines. It was first introduced in the November 1983 issue of ''Cr ...
of 1983, and the Rugg/Feldman benchmarks, mostly seen in the UK in this era. ''Byte'' later published the more thorough
NBench NBench, short for Native mode Benchmark and later known as BYTEmark, is a synthetic computing benchmark program developed in the mid-1990s by the now defunct BYTE magazine intended to measure a computer's CPU, FPU, and Memory System speed. Hist ...
in 1995 to replace it.


History


Origins

Jim Gilbreath of the Naval Ocean System Center had been considering the concept of writing a small language benchmarking program for some time, desiring one that would be portable across languages, small enough that the program code would fit on a single printed page, and did not rely on specific features like hardware multiplication or division. The solution was inspired by a meeting with
Chuck Forsberg Charles Alton "Chuck" Forsberg (May 6, 1944 – September 24, 2015) developed two data transmission protocols popular in the 1990s, for uploading and downloading files from dial-up bulletin board systems. He received a Dvorak Award for Excell ...
at the January 1980 USENIX meeting in
Boulder, CO Boulder is a home rule city that is the county seat and most populous municipality of Boulder County, Colorado, United States. The city population was 108,250 at the 2020 United States census, making it the 12th most populous city in Colora ...
, where Forsberg mentioned an implementation of the sieve written by
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
. Gilbreath felt the sieve would be an ideal benchmark as it avoided indirect tests on arithmetic performance, which varied widely between systems. The algorithm mostly stresses array lookup performance and basic logic and branching capabilities. Nor does it require any advanced language features like
recursion Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
or advanced collection types. The only modification from Knuth’s original version was to remove a multiplication by two and replace it with an addition instead. Machines with hardware multipliers would otherwise run so much faster that the rest of the performance would be hidden. After six months of effort porting it to as many platforms as he had access to, the first results were introduced in the September 1981 edition of ''
Byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
'' in an article entitled "A High-Level Language Benchmark". Gilbreath was quick to point out that: The article provided reference implementations in ten languages, including more popular selections like
BASIC BASIC (Beginners' All-purpose Symbolic Instruction Code) is a family of general-purpose, high-level programming languages designed for ease of use. The original version was created by John G. Kemeny and Thomas E. Kurtz at Dartmouth College ...
, C,
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Fren ...
,
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily us ...
, and FORTRAN, and some less well-known examples like
Forth Forth or FORTH may refer to: Arts and entertainment * ''forth'' magazine, an Internet magazine * ''Forth'' (album), by The Verve, 2008 * ''Forth'', a 2011 album by Proto-Kaw * Radio Forth, a group of independent local radio stations in Scotla ...
, ZSPL,
Ratfor Ratfor (short for ''Rational Fortran'') is a programming language implemented as a preprocessor for Fortran 66. It provides modern control structures, unavailable in Fortran 66, to replace GOTOs and statement numbers. Features Ratfor provides ...
,
PL/1 PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
and PLMX. Example runs were provided for a variety of machines, mostly
Zilog Z80 The Z80 is an 8-bit microprocessor introduced by Zilog as the startup company's first product. The Z80 was conceived by Federico Faggin in late 1974 and developed by him and his 11 employees starting in early 1975. The first working samples wer ...
or
MOS 6502 The MOS Technology 6502 (typically pronounced "sixty-five-oh-two" or "six-five-oh-two") William Mensch and the moderator both pronounce the 6502 microprocessor as ''"sixty-five-oh-two"''. is an 8-bit microprocessor that was designed by a small te ...
-based. The best time was initially 16.5 seconds, turned in by Ratfor on a 4 MHz Z80 machine, but
Gary Kildall Gary Arlen Kildall (; May 19, 1942 – July 11, 1994) was an American computer scientist and microcomputer entrepreneur. During the 1970s, Kildall created the CP/M operating system among other operating systems and programming tools, a ...
personally provided a version in
Digital Research Digital Research, Inc. (DR or DRI) was a company created by Gary Kildall to market and develop his CP/M operating system and related 8-bit, 16-bit and 32-bit systems like MP/M, Concurrent DOS, FlexOS, Multiuser DOS, DOS Plus, DR DOS and ...
's prototype version of PL/1 that ran in 14 seconds and set the mark for this first collection. The slowest was Microsoft COBOL on the same machine, which took a whopping 5115 seconds (almost one and a half hours), longer even than interpreted languages like BASIC. A notable feature of this first run was that C, Pascal and PL/1 all turned in a roughly similar performance that easily beat the various interpreters. A second set of tests was carried out on more powerful machines, with
Motorola 68000 The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Sector ...
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
turning in the fastest times at 1.12 seconds, slightly besting C on a
PDP-11/70 The PDP-11 is a series of 16-bit minicomputers sold by Digital Equipment Corporation (DEC) from 1970 into the 1990s, one of a set of products in the Programmed Data Processor (PDP) series. In total, around 600,000 PDP-11s of all models were sold, ...
and almost twice as fast as 8086 assembler. Most PDP-11 and
HP-3000 The HP 3000 series is a family of 16-bit and 32-bit minicomputers from Hewlett-Packard. It was designed to be the first minicomputer with full support for time-sharing in the hardware and the operating system, features that had mostly been limite ...
times were much slower, on the order of 10 to 50 seconds. Tests on these machines using only high-level languages was led by NBS Pascal on the PDP-11, at 2.6 seconds.
UCSD Pascal UCSD Pascal is a Pascal programming language system that runs on the UCSD p-System, a portable, highly machine-independent operating system. UCSD Pascal was first released in 1977. It was developed at the University of California, San Diego (UCS ...
provided another interesting set of results as the same program can be run on multiple machines. Running on the dedicated
Ithaca Intersystems Ithaca Intersystems was a microcomputer manufacturer in the 1970s and 1980s, located in Ithaca, New York. The early years drew on engineering talent from Cornell University when the founders, including Steven Edelman, worked in a small rented spa ...
Pascal-100 machine, a
Pascal MicroEngine Pascal MicroEngine is a series of microcomputer products manufactured by Western Digital from 1979 through the mid-1980s, designed specifically to run the UCSD p-System efficiently. Compared to other microcomputers, which use a machine language ...
based computer, it ran in 54 seconds, while on the Z80 it was 239, and 516 on the Apple II.


Spread

Gilbreath, this time along with his brother Gary, revisited the code in the January 1983 edition of ''Byte''. This version removed most of the less popular languages, leaving Pascal, C, FORTRAN IV, and COBOL, while adding
Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, Tur ...
and
Modula-2 Modula-2 is a structured, procedural programming language developed between 1977 and 1985/8 by Niklaus Wirth at ETH Zurich. It was created as the language for the operating system and application software of the Lilith personal workstation. It w ...
. Thanks to readers providing additional samples, the number of machines, operating systems and languages compared in the resulting tables was greatly expanded.
Motorola 68000 The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Sector ...
(68k) assembly remained the fastest, almost three times the speed of the
Intel 8086 The 8086 (also called iAPX 86) is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus (allowi ...
running at the same 8 MHz clock. Using high-level languages the two were closer in performance, with the 8086 generally better than half the speed of the 68k and often much closer. A wider variety of
minicomputer A minicomputer, or colloquially mini, is a class of smaller general purpose computers that developed in the mid-1960s and sold at a much lower price than mainframe and mid-size computers from IBM and its direct competitors. In a 1970 survey, ...
s and
mainframe A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications like bulk data processing for tasks such as censuses, industry and consumer statistics, enterprise ...
s was also included, with times that the 68k generally beat except for the very fastest machines like the
IBM 3033 The IBM 303XIBM used a capital X when referring to 303X, as did print media; see Computerworld ref below. is a discontinued line of mainframe computers, the first model of which, the IBM 3033 Processor, nicknamed "The Big One", was introduced Mar ...
and high-end models of the
VAX VAX (an acronym for Virtual Address eXtension) is a series of computers featuring a 32-bit instruction set architecture (ISA) and virtual memory that was developed and sold by Digital Equipment Corporation (DEC) in the late 20th century. The VA ...
. Older machines like the
Data General Nova The Data General Nova is a series of 16-bit minicomputers released by the American company Data General. The Nova family was very popular in the 1970s and ultimately sold tens of thousands of units. The first model, known simply as "Nova", was ...
, PDP-11 and
HP-1000 The HP 2100 is a series of 16-bit minicomputers that were produced by Hewlett-Packard (HP) from the mid-1960s to early 1990s. Tens of thousands of machines in the series were sold over its twenty-five year lifetime, making HP the fourth largest mi ...
were nowhere near as fast as the 68k. Gilbreath's second article appeared as the benchmark was becoming quite common as a way to compare the performance of various machines, let alone languages. In spite of his original warning not to do so, it soon began appearing in magazine advertisements as a way to compare performance against the competition, and as a general benchmark. ''Byte'' once again revisited the sieve later in August 1983 as part of a whole-magazine series of articles on the C language. In this case the use was more in keeping with the original intent, using a single
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
and running it on a single machine to compare the performance of C compilers on the
CP/M-86 CP/M-86 was a version of the CP/M operating system that Digital Research (DR) made for the Intel 8086 and Intel 8088. The system commands are the same as in CP/M-80. Executable files used the relocatable .CMD file format. Digital Research als ...
operating system, on CP/M-80, and for the
IBM PC The IBM Personal Computer (model 5150, commonly known as the IBM PC) is the first microcomputer released in the IBM PC model line and the basis for the IBM PC compatible de facto standard. Released on August 12, 1981, it was created by a team ...
. In spite of Gilbreath's concern in the original article, by this time the code had become almost universal for testing, and one of the articles remarked that "The Sieve of Eratosthenes is a mandatory benchmark". It was included in the ''Byte'' UNIX Benchmark Suite introduced in August 1984.


Today

New versions of the code continue to appear for new languages, e
Rosetta Code
and
GitHub GitHub, Inc. () is an Internet hosting service for software development and version control using Git. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous ...
has many versions available. It is often used as an example of
functional programming In computer science, functional programming is a programming paradigm where programs are constructed by Function application, applying and Function composition (computer science), composing Function (computer science), functions. It is a declar ...
in spite of the common version not actually using the sieve algorithm.


Implementation

The provided implementation calculated odd primes only, so the 8191 element array actually represented primes less than 16385. As shown in a sidebar table, the 0th element represented 3, 1st element 5, 2nd element 7, and so on. This is the original BASIC version of the code presented in 1981. The dialect is not specified, but a number of details mean it does not run under early versions of
Microsoft BASIC Microsoft BASIC is the foundation software product of the Microsoft company and evolved into a line of BASIC interpreters and compiler(s) adapted for many different microcomputers. It first appeared in 1975 as Altair BASIC, which was the first ve ...
(4.x and earlier), among these the use of long variable names like and . The lack of line numbers may suggest a minicomputer variety that reads source from a text file, but may have also been a printing error. REM Eratosthenes Sieve Prime Number Program in BASIC 1 SIZE = 8190 2 DIM FLAGS(8191) 3 PRINT "Only 1 iteration" 5 COUNT = 0 6 FOR I = 0 TO SIZE 7 FLAGS (I) = 1 8 NEXT I 9 FOR I = 0 TO SIZE 10 IF FLAGS (I) = 0 THEN 18 11 PRIME = I+I + 3 12 K = I + PRIME 13 IF K > SIZE THEN 17 14 FLAGS (K) = 0 15 K = K + PRIME 16 GOTO 13 17 COUNT = COUNT + 1 18 NEXT I 19 PRINT COUNT," PRIMES" And in C, with some whitespace adjustments from the original: #define true 1 #define false 0 #define size 8190 #define sizepl 8191 char flags izepl main()


Notes


References


Citations


Bibliography

* * * {{cite book , first=Donald , last=Knuth , title=The Art of Computer Programming Volume 2: Seminumerical Algorithms , date=1969 , publisher=Addison-Wesley , url=https://books.google.com/books?id=_NHDswEACAAJ , isbn=978-0-201-89684-8 Benchmarks (computing) History of computing Articles with example BASIC code Articles with example C code