In computer science, the five-minute rule is a rule of thumb for deciding whether a data item should be kept in memory or stored on disk and read back into memory when required. It was first formulated by Jim Gray and Gianfranco Putzolu in 1985 and was revised in 1997 and 2007 (free version in ACM Queue, September 2008) to reflect changes in the relative cost and performance of memory and persistent storage. The rule is as follows:
The 5-minute random rule: cache randomly accessed disk pages that are re-used every 5 minutes or less.
Gray also issued a counterpart one-minute rule for sequential access:
The 1-minute rule: cache sequentially accessed disk pages that are re-used every 1 minute or less.
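The two rules as stated above can be read as a simple threshold test on how often a page is re-used. The following is a minimal sketch of that reading, not code from Gray's papers; the function name `should_cache` and its parameters are hypothetical, and the thresholds are the literal five-minute and one-minute figures from the rule statements.

```python
# Thresholds taken from the two rule statements above, in seconds.
RANDOM_THRESHOLD_S = 5 * 60      # 5-minute random rule
SEQUENTIAL_THRESHOLD_S = 1 * 60  # 1-minute sequential rule


def should_cache(reuse_interval_s: float, sequential: bool = False) -> bool:
    """Return True if a page re-used every `reuse_interval_s` seconds
    falls within the rule's threshold for its access pattern.

    Hypothetical helper for illustration only.
    """
    threshold = SEQUENTIAL_THRESHOLD_S if sequential else RANDOM_THRESHOLD_S
    return reuse_interval_s <= threshold


# A page re-used every 2 minutes qualifies under the random rule
# but not under the sequential rule.
print(should_cache(120))
print(should_cache(120, sequential=True))
```

In practice the thresholds are not constants: they depend on hardware prices and performance, which is why the rule has been revised over time.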
Although the 5-minute rule was invented in the realm of databases, it has also been applied elsewhere, for example in Network File System (NFS) cache capacity planning.

The original 5-minute rule was derived from the following cost-benefit computation:

BreakEvenIntervalinSeconds = (PagesPerMBofRAM / AccessesPerSecondPerDisk) × (PricePerDiskDrive / PricePerMBofRAM)

Applying it to 2007 data yields a break-even interval of approximately 90 minutes for magnetic-disk-to-DRAM caching, 15 minutes for SSD-to-DRAM caching, and 2 hours for disk-to-SSD caching. The disk-to-DRAM interval was thus somewhat shorter than the "five-hour rule" that Gray and Putzolu had anticipated in 1987 would apply to RAM and disks by 2007.

Because PagesPerMBofRAM falls as the page size grows, the break-even interval shortens for larger pages. According to calculations by NetApp engineer David Dale, as reported in The Register, the figures for disk-to-DRAM caching in 2008 were as follows: "The 50KB page break-even was five minutes, the 4KB one was one hour and the 1KB one was five hours. There needed to be a 50-fold increase in page size to cache for break-even at five minutes." Regarding disk-to-SSD caching in 2010, the same source reported that "A 250KB page break even with SLC was five minutes, but five hours with a 4KB page size. It was five minutes with a 625KB page size with MLC flash and 13 hours with a 4KB MLC page size."

In 2000, Gray and Shenoy applied a similar calculation to web page caching and concluded that a browser should "cache web pages if there is any chance they will be re-referenced within their lifetime." (Jim Gray and Prashant Shenoy, "Rules of Thumb in Data Engineering", MS-TR-99-100)
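The cost-benefit formula above is straightforward to evaluate directly. The sketch below transcribes it into Python; the function names and the `pages_per_mb` helper are hypothetical, and the input values (8 KB pages, 64 accesses per second per disk, a $2,000 drive, $15 per MB of RAM) are illustrative assumptions chosen to resemble late-1990s hardware, not figures taken from this article.

```python
def pages_per_mb(page_size_kb: float) -> float:
    """Number of pages that fit in one megabyte of RAM (1 MB = 1024 KB)."""
    return 1024 / page_size_kb


def break_even_interval_seconds(
    pages_per_mb_of_ram: float,
    accesses_per_second_per_disk: float,
    price_per_disk_drive: float,
    price_per_mb_of_ram: float,
) -> float:
    """Direct transcription of the break-even formula:
    (PagesPerMBofRAM / AccessesPerSecondPerDisk)
      * (PricePerDiskDrive / PricePerMBofRAM)
    """
    technology_ratio = pages_per_mb_of_ram / accesses_per_second_per_disk
    economic_ratio = price_per_disk_drive / price_per_mb_of_ram
    return technology_ratio * economic_ratio


# Illustrative inputs (assumptions, see the paragraph above).
interval_8kb = break_even_interval_seconds(pages_per_mb(8), 64, 2000.0, 15.0)
print(round(interval_8kb))  # 267 seconds, i.e. a little under five minutes

# Halving the page size doubles PagesPerMBofRAM and so doubles the interval.
interval_4kb = break_even_interval_seconds(pages_per_mb(4), 64, 2000.0, 15.0)
print(interval_4kb / interval_8kb)
```

With everything else held fixed, the interval is inversely proportional to page size, which is consistent with the pattern in the figures quoted above, where larger pages break even at shorter intervals.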

