Working set is a concept in

computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

which defines the amount of memory that a

process A process is a series or set of activities that interact to produce a result; it may occur once-only or be recurrent or periodic. Things called a process include: Business and management * Business process, activities that produce a specific s ...

requires in a given time interval.

Definition

Peter Denning (1968) defines "the working set of information

W(t, \tau)

of a

at time

t

to be the collection of information referenced by the process during the process time interval

(t - \tau, t)

". Typically the units of information in question are considered to be memory pages. This is suggested to be an approximation of the set of pages that the process will access in the future (say during the next

\tau

time units), and more specifically is suggested to be an indication of what pages ought to be kept in main memory to allow most progress to be made in the execution of that process.

Rationale

The effect of the choice of what pages to be kept in main memory (as distinct from being ''paged out'' to auxiliary storage) is important: if too many pages of a process are kept in main memory, then fewer other processes can be ready at any one time. If too few pages of a process are kept in main memory, then its

page fault In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space ...

frequency is greatly increased and the number of active (non-suspended) processes currently executing in the system approaches zero. The working set model states that a process can be in

RAM Ram, ram, or RAM most commonly refers to: * A male sheep * Random-access memory, computer memory * Ram Trucks, US, since 2009 ** List of vehicles named Dodge Ram, trucks and vans ** Ram Pickup, produced by Ram Trucks Ram, ram, or RAM may also ref ...

if and only if all of the pages that it is currently using (often approximated by the most recently used pages) can be in RAM. The model is an all or nothing model, meaning if the pages it needs to use increases, and there is no room in RAM, the process is swapped out of memory to free the memory for other processes to use. Often a heavily loaded computer has so many processes queued up that, if all the processes were allowed to run for one

scheduling A schedule (, ) or a timetable, as a basic time-management tool, consists of a list of times at which possible tasks, events, or actions are intended to take place, or of a sequence of events in the chronological order in which such things ...

time slice In computing, preemption is the act performed by an external scheduler — without assistance or cooperation from the task — of temporarily interrupting an executing task, with the intention of resuming it at a later time. This preemptive s ...

, they would refer to more pages than there is RAM, causing the computer to " thrash". By swapping some processes from memory, the result is that processes—even processes that were temporarily removed from memory—finish much sooner than they would if the computer attempted to run them all at once. The processes also finish much sooner than they would if the computer only ran one process at a time to completion since it allows other processes to run and make progress during times that one process is waiting on the hard drive or some other global resource. In other words, the working set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible. Thus it optimizes CPU utilization and throughput.

Implementation

The main hurdle in implementing the working set model is keeping track of the working set. The working set window is a moving window. At each memory reference a new reference appears at one end and the oldest reference drops off the other end. A page is in the working set if it is referenced in the working set window. To avoid the overhead of keeping a list of the last ''k'' referenced pages, the working set is often implemented by keeping track of the time ''t'' of the last reference, and considering the working set to be all pages referenced within a certain period of time. The working set isn't a

page replacement algorithm In a computer operating system that uses paging for virtual memory management, page replacement algorithms decide which memory pages to page out, sometimes called swap out, or write to disk, when a page of memory needs to be allocated. Page repl ...

, but page-replacement algorithms can be designed to only remove pages that aren't in the working set for a particular process. One example is a modified version of the clock algorithm called WSClock.

Variants

Working set can be divided into ''code'' working set and ''data'' working set. This distinction is important when code and data are separate at the relevant level of the memory hierarchy, as if ''either'' working set does not fit in that level of the hierarchy, thrashing will occur. In addition to the code and data themselves, on systems with

virtual memory In computing, virtual memory, or virtual storage, is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a ver ...

, the

memory map In computer science, a memory map is a structure of data (which usually resides in memory itself) that indicates how memory is laid out. The term "memory map" has different meanings in different contexts. #It is the fastest and most flexible cach ...

(of virtual memory to physical memory) entries of the pages of the working set must be cached in the

translation lookaside buffer A translation lookaside buffer (TLB) is a memory CPU cache, cache that stores the recent translations of virtual memory address to a physical memory Memory_address, location. It is used to reduce the time taken to access a user memory location. It ...

(TLB) for the process to progress efficiently. This distinction exists because code and data are cached in small blocks (

cache line A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...

s), not entire pages, but address lookup is done at the page level. Thus even if the code and data working sets fit into cache, if the working sets are split across many pages, the virtual address working set may not fit into TLB, causing TLB thrashing. Analogs of working set exist for other limited resources, most significantly processes. If a set of processes requires frequent interaction between multiple processes, then it has a that must be coscheduled in order to progress: If the processes are not scheduled simultaneously – for example, if there are two processes but only one core on which to execute them – then the processes can only advance at the rate of one interaction per time slice. Other resources include

file handle In Unix and Unix-like computer operating systems, a file descriptor (FD, less frequently fildes) is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket. File descriptors typically ha ...

s or

network socket A network socket is a software structure within a network node of a computer network that serves as an endpoint for sending and receiving data across the network. The structure and properties of a socket are defined by an application programming ...

s – for example, copying one file to another is most simply done with two file handles: one for input, one for output, and thus has a "file handle working set" size of two. If only one file handle is available, copying can still be done, but requires acquiring a file handle for the input, reading from it (say into a buffer), releasing it, then acquiring a file handle for the output, writing to it, releasing it, then acquiring the input file handle again and repeating. Similarly a server may require many sockets, and if it is limited would need to repeatedly release and re-acquire sockets. Rather than thrashing, these resources are typically ''required'' for the program, and if it cannot acquire enough resources, it simply fails.

Definition

Rationale

Implementation

Variants

See also

References

Further reading