Memcached (pronounced variously /mɛmkæʃˈdiː/ ''mem-cash-dee'' or /ˈmɛmkæʃt/ ''mem-cashed'') is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached is free and open-source software, licensed under the Revised BSD license.
Memcached runs on Unix-like operating systems (Linux and macOS) and on Microsoft Windows. It depends on the libevent library.
Memcached's APIs provide a very large hash table distributed across multiple machines. When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order. Applications using Memcached typically send reads and additions to the in-RAM cache first, falling back on a slower backing store, such as a database, only when the cache cannot serve them.
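The LRU purge described above can be illustrated with a toy, single-server cache. This is a minimal sketch, not Memcached's implementation: the class name `LRUCache` is hypothetical, and capacity is counted in items, whereas a real Memcached server budgets its memory in bytes.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Toy single-server cache that evicts the least recently used key when full.

    Simplification (assumption): capacity counts items, whereas a real
    Memcached server budgets memory in bytes.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Insertion order doubles as recency order: front = least recently used.
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None                      # cache miss
        self._items.move_to_end(key)         # reading a key makes it most recently used
        return self._items[key]

    def set(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)         # writing also counts as a use
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # table full: purge the least recently used entry
```

With a capacity of 2, setting `"a"` and `"b"`, reading `"a"`, and then setting `"c"` purges `"b"` rather than `"a"`, because `"a"` was used more recently.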
Memcached has no internal mechanism to track misses that may happen; however, some third-party utilities provide this functionality.
Memcached was first developed by Brad Fitzpatrick for his website LiveJournal on May 22, 2003. It was originally written in Perl, then later rewritten in C by Anatoly Vorobey, then employed by LiveJournal. Memcached is now used by many other systems, including YouTube, Reddit, Facebook, Pinterest, Twitter, Wikipedia, and Method Studios.
Google App Engine, Google Cloud Platform, Microsoft Azure, IBM Bluemix, and Amazon Web Services also offer a Memcached service through an API.
Software architecture
The system uses a client–server architecture. The servers maintain a key–value associative array; the clients populate this array and query it by key. Keys are up to 250 bytes long, and values can be at most 1 megabyte in size by default.
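A client library might reject oversized items before sending them to a server. The following is a minimal sketch of such a check; `check_item` is a hypothetical helper, the 1 MB limit is taken as 1024 × 1024 bytes (an assumption, since the server-side limit is configurable and also covers per-item overhead), and it assumes the text-protocol rule that keys may not contain whitespace or control characters.

```python
MAX_KEY_BYTES = 250                     # key length limit stated above
DEFAULT_MAX_VALUE_BYTES = 1024 * 1024   # assumed 1 MB default item limit


def check_item(key: str, value: bytes,
               max_value_bytes: int = DEFAULT_MAX_VALUE_BYTES) -> None:
    """Raise ValueError if the key or value would be rejected (hypothetical helper)."""
    raw_key = key.encode("utf-8")       # the limit applies to bytes, not characters
    if not 0 < len(raw_key) <= MAX_KEY_BYTES:
        raise ValueError(f"key must be 1..{MAX_KEY_BYTES} bytes, got {len(raw_key)}")
    # Assumption: keys may not contain spaces or control characters (0x00-0x20, 0x7F).
    if any(b <= 0x20 or b == 0x7F for b in raw_key):
        raise ValueError("key contains whitespace or control characters")
    if len(value) > max_value_bytes:
        raise ValueError(f"value is {len(value)} bytes; limit is {max_value_bytes}")
```

Checking on the client side only saves a round trip; the server still enforces its own configured limits.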
Clients use client-side libraries to contact the servers, which by default expose their service at port 11211. Both TCP and UDP are supported. Each client knows all servers; the servers do not communicate with each other. If a client wishes to set or read the value corresponding to a certain key, the client's library first computes a hash of the key to determine which server to use. This gives a simple form of sharding and a scalable shared-nothing architecture across the servers. The server computes a second hash of the key to determine where to store or read the corresponding value. The servers keep the values in RAM (and, starting in 1.6.0, in an auxiliary cache on disk using an external storage server option); if a server runs out of available memory or disk, it discards the oldest values. Therefore, clients must treat Memcached as a transitory cache; they cannot assume that data stored in Memcached is still there when they need it. Other databases, such as MemcacheDB and Couchbase Server, provide persistent storage while maintaining Memcached protocol compatibility.
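The client-side step, hashing a key to choose a server, can be sketched as follows. This is an illustration under stated assumptions, not any particular client library's algorithm: `pick_server` is a hypothetical name, and it uses CRC32 with a simple modulo over the server list. A stable hash such as CRC32 matters because Python's built-in `hash()` is randomised per process for strings, which would make two clients disagree about where a key lives.

```python
import zlib
from typing import Sequence


def pick_server(key: str, servers: Sequence[str]) -> str:
    """Map a key to exactly one server (simple modulo scheme, hypothetical helper).

    Every client configured with the same server list, in the same order,
    computes the same answer, so no coordination between servers is needed.
    """
    if not servers:
        raise ValueError("no servers configured")
    h = zlib.crc32(key.encode("utf-8"))  # stable across processes and languages
    return servers[h % len(servers)]
```

For example, `pick_server("userrow:42", ["10.0.0.1:11211", "10.0.0.2:11211"])` (hypothetical addresses) always returns the same entry. A drawback of plain modulo hashing is that adding or removing a server remaps most keys; many client libraries use consistent hashing instead to limit that churn.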
If all client libraries use the same hashing algorithm to determine servers, then clients can read each other's cached data.
A typical deployment has several servers and many clients. However, it is possible to use Memcached on a single computer, acting simultaneously as client and server. The size of its hash table is often very large: it is limited only by the available memory across all the servers in the cluster in a data center. Where high-volume, wide-audience Web publishing requires it, this may stretch to many gigabytes. Memcached can be equally valuable in situations where either the number of requests for content is high or the cost of generating a particular piece of content is high. Applications with particularly demanding caching needs can use a built-in proxy to define and configure complex client–server routes.
Security
Most deployments of Memcached are within trusted networks where clients may freely connect to any server. However, sometimes Memcached is deployed in untrusted networks or where administrators want to exercise control over the clients that are connecting. For this purpose, Memcached can be compiled with optional SASL authentication support. The SASL support requires the binary protocol.
A presentation at Black Hat USA 2010 revealed that a number of large public websites had left Memcached open to inspection, analysis, retrieval, and modification of data.
Even within a trusted organisation, the flat trust model of Memcached may have security implications. For simplicity and efficiency, all Memcached operations are treated equally. Clients with a valid need for access to low-security entries within the cache gain access to ''all'' entries within the cache, even when these are higher-security and the client has no justifiable need for them. If a cache key can be predicted, guessed, or found by exhaustive searching, its cache entry may be retrieved.
In situations such as high-volume web publishing, some attempt may be made to separate setting data from reading it. A farm of outward-facing content servers has ''read'' access to a Memcached instance containing published pages or page components, but no write access. When new content is published (and is not yet in Memcached), a request is instead sent to content-generation servers that are not publicly accessible; these create the content unit and add it to Memcached. The content server then retries retrieving it and serves it outwards.
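The read/write split described above can be simulated in a single process to show the control flow. This is a minimal sketch under clear assumptions: a plain dictionary stands in for the Memcached instance, and `ReadOnlyCacheView`, `make_generator`, and `serve` are hypothetical names. In a real deployment the separation would be enforced by network placement and credentials rather than by a Python class.

```python
from typing import Callable, Dict, Optional


class ReadOnlyCacheView:
    """What an outward-facing content server holds: lookups only, no write methods."""

    def __init__(self, store: Dict[str, str]) -> None:
        self._store = store

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)


def make_generator(store: Dict[str, str]) -> Callable[[str], None]:
    """Private content-generation service: the only component that writes."""
    def generate(key: str) -> None:
        store[key] = f"<html>rendered {key}</html>"  # placeholder for real rendering
    return generate


def serve(key: str, cache: ReadOnlyCacheView,
          request_generation: Callable[[str], None]) -> str:
    page = cache.get(key)
    if page is None:
        # Not in the cache yet: ask the non-public generator to build and store it,
        # then retry the read.
        request_generation(key)
        page = cache.get(key)
        if page is None:
            raise LookupError(f"generation failed for {key!r}")
    return page
```

Only the first request for a page reaches the generator; later requests are served from the cache through the read-only view.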
Used as a DDoS attack vector
In February 2018, Cloudflare reported that misconfigured Memcached servers were being used to launch large-scale DDoS attacks. The Memcached protocol over UDP has a huge amplification factor of more than 51,000. Victims of the DDoS attacks include GitHub, which was flooded with 1.35 Tbit/s of peak incoming traffic.
This issue was mitigated in Memcached version 1.5.6, which disabled the UDP protocol by default.
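The amplification factor is the ratio of traffic reflected at the victim to traffic sent by the attacker. As a rough, idealised illustration only, assuming every spoofed request achieved the maximum factor of 51,000 (real attacks vary), the attacker bandwidth needed to produce GitHub's 1.35 Tbit/s peak would be:

```latex
\[
A = \frac{\text{bytes reflected to the victim}}{\text{bytes sent by the attacker}}, \qquad
R_{\text{attacker}} \approx \frac{R_{\text{victim}}}{A}
  = \frac{1.35 \times 10^{12}\ \text{bit/s}}{51\,000}
  \approx 2.65 \times 10^{7}\ \text{bit/s}
  \approx 26.5\ \text{Mbit/s}
\]
```

A modest uplink is therefore enough to generate terabit-scale traffic when many exposed UDP servers reflect it.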
Example code
''Note that all functions described on this page are pseudocode only. Memcached calls and programming languages may vary based on the API used.''
Converting database or object creation queries to use Memcached is simple. Typically, when using straight database queries, example code would be as follows:
function get_foo(int userid)
    data = db_select("SELECT * FROM users WHERE userid = ?", userid)
    return data
After conversion to Memcached, the same call might look like the following:
function get_foo(int userid)
    /* first try the cache */
    data = memcached_fetch("userrow:" + userid)
    if not data
        /* not found : request database */
        data = db_select("SELECT * FROM users WHERE userid = ?", userid)
        /* then store in cache until next get */
        memcached_add("userrow:" + userid, data)
    end
    return data
The client would first check whether a Memcached value with the unique key "userrow:userid" exists, where userid is some number. If the key does not exist, it would select from the database as usual and store the result under that key using the Memcached add function.
However, if only this API call were modified, the application would end up reading incorrect data after any database update: the Memcached "view" of the data would become out of date. Therefore, in addition to the "add" call, the update path also needs a call to the Memcached set function.
function update_foo(int userid, string dbUpdateString)
    /* first update database */
    result = db_execute(dbUpdateString)
    if result
        /* database update successful : fetch data to be stored in cache */
        data = db_select("SELECT * FROM users WHERE userid = ?", userid)
        /* the previous line could also look like data = createDataFromDBString(dbUpdateString) */
        /* then store in cache until next get */
        memcached_set("userrow:" + userid, data)
    end
This call would update the currently cached data to match the new data in the database, assuming the database query succeeds. An alternative approach would be to invalidate the cache with the Memcached delete function, so that subsequent fetches result in a cache miss. Similar action would need to be taken when database records are deleted, so that the cache is always either correct or merely incomplete, never stale.
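The three write-side options just described (refresh with set, invalidate with delete, and removal on record deletion) can be compared side by side. This is a simplified sketch: a dictionary stands in for the database, `FakeMemcached` is a hypothetical in-memory stand-in for a client, and the function names are illustrative. Unlike update_foo, it builds the new row directly instead of re-reading it from the database.

```python
from typing import Dict, Optional


class FakeMemcached:
    """In-memory stand-in for a Memcached client (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = value           # unconditional overwrite

    def delete(self, key: str) -> None:
        self._data.pop(key, None)         # deleting a missing key is harmless


db: Dict[int, dict] = {1: {"userid": 1, "name": "old"}}
cache = FakeMemcached()
cache.set("userrow:1", dict(db[1]))       # the cache currently mirrors the database


def update_foo_refresh(userid: int, name: str) -> None:
    """Update the database first, then overwrite the cached copy (set)."""
    db[userid] = {"userid": userid, "name": name}
    cache.set(f"userrow:{userid}", dict(db[userid]))


def update_foo_invalidate(userid: int, name: str) -> None:
    """Update the database first, then drop the cached copy (delete)."""
    db[userid] = {"userid": userid, "name": name}
    cache.delete(f"userrow:{userid}")     # the next read misses and reloads


def delete_foo(userid: int) -> None:
    """Deleting a record must also remove its cache entry."""
    db.pop(userid, None)
    cache.delete(f"userrow:{userid}")
```

Refreshing keeps the next read fast; invalidating is simpler and avoids caching data that may never be read again.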
An alternative cache-invalidation strategy is to store a random number in an agreed-upon cache entry and to incorporate this number into all keys that are used to store a particular kind of entry. To invalidate all such entries at once, change the random number. Existing entries (which were stored using the old number) will no longer be referenced and so will eventually expire or be recycled.
function store_xyz_entry(int key, string value)
    /* Retrieve the random number - use zero if none exists yet.
     * The key-name used here is arbitrary. */
    seed = memcached_fetch(":xyz_seed:")
    if not seed
        seed = 0
    end
    /* Build the key used to store the entry and store it.
     * The key-name used here is also arbitrary. Notice that the "seed" and the user's "key"
     * are stored as separate parts of the constructed hashKey string: ":xyz_data:(seed):(key)."
     * This is not mandatory, but is recommended. */
    string hashKey = sprintf(":xyz_data:%d:%d", seed, key)
    memcached_set(hashKey, value)
/* "fetch_entry," not shown, follows identical logic to the above. */
function invalidate_xyz_cache()
    existing_seed = memcached_fetch(":xyz_seed:")
    /* Treat a missing seed as zero, matching store_xyz_entry */
    if not existing_seed
        existing_seed = 0
    end
    /* Coin a different random seed */
    do
        seed = rand()
    until seed != existing_seed
    /* Now store it in the agreed-upon place. All future requests will use this number.
     * Therefore, all existing entries become un-referenced and will eventually expire. */
    memcached_set(":xyz_seed:", seed)
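The seed-based invalidation in the pseudocode above can be exercised end to end with a plain dictionary standing in for the cache. This is a sketch under assumptions: the dictionary replaces a real Memcached client, and `fetch_xyz_entry` is a hypothetical name for the fetch function the pseudocode leaves out. A missing seed is treated as zero in every function, and new seeds are drawn from 1 upward, so a freshly coined seed can never collide with that implicit default.

```python
import random
from typing import Optional

SEED_KEY = ":xyz_seed:"   # agreed-upon key holding the current seed (name is arbitrary)
cache: dict = {}          # stand-in for a Memcached client (hypothetical)


def _current_seed() -> int:
    seed = cache.get(SEED_KEY)
    return 0 if seed is None else seed    # use zero if no seed exists yet


def _data_key(key: int) -> str:
    return f":xyz_data:{_current_seed()}:{key}"


def store_xyz_entry(key: int, value: str) -> None:
    cache[_data_key(key)] = value


def fetch_xyz_entry(key: int) -> Optional[str]:
    return cache.get(_data_key(key))


def invalidate_xyz_cache() -> None:
    existing = _current_seed()
    seed = existing
    while seed == existing:               # coin a seed different from the current one
        seed = random.randint(1, 2**31 - 1)
    # Old entries stay in the dict here; a real server would evict them over time.
    cache[SEED_KEY] = seed
```

After `invalidate_xyz_cache()`, every lookup builds a key containing the new seed, so all previously stored entries become unreachable in a single write.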
Usage
* MySQL - directly supports the Memcached API as of version 5.6.
* Oracle Coherence - directly supports the Memcached API as of version 12.1.3.
* Infinispan - directly supports Memcached.
See also
* Amazon ElastiCache
* Aerospike
* Couchbase Server
* Redis
* Mnesia
* MemcacheDB
* Hazelcast
* Apache Cassandra
* ScyllaDB
* Tarantool
* Ehcache
* Infinispan