Memcached

Memcached (pronounced variously ''mem-cash-dee'' or ''mem-cashed'') is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached is free and open-source software, licensed under the Revised BSD license. Memcached runs on Unix-like operating systems (Linux and macOS) and on Microsoft Windows. It depends on the libevent library.

Memcached's APIs provide a very large hash table distributed across multiple machines. When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order. Applications using Memcached typically layer requests and additions into RAM before falling back on a slower backing store, such as a database.

Memcached has no internal mechanism to track misses which may happen. However, some third-party utilities provide this functionality.

Memcached was first developed by Brad Fitzpatrick for his website LiveJournal, on May 22, 2003. It was originally written in Perl, then later rewritten in C by Anatoly Vorobey, then employed by LiveJournal. Memcached is now used by many other systems, including YouTube, Reddit, Facebook, Pinterest, Twitter, Wikipedia, and Method Studios. Google App Engine, Google Cloud Platform, Microsoft Azure, IBM Bluemix and Amazon Web Services also offer a Memcached service through an API.
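The least-recently-used purging mentioned above can be illustrated with a small sketch. This is not Memcached's implementation: the `LruCache` class is hypothetical, and it limits the cache by number of entries, whereas a real server is limited by memory. It only shows the ordering rule, namely that reading a key protects it from being the next one purged.

```python
from collections import OrderedDict


class LruCache:
    """Toy in-memory cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used entry comes first

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # a read makes the key the most recent
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # purge the least recently used entry


cache = LruCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.set("c", 3)  # the cache is full, so "b" is purged
```

After the last call, `cache.get("b")` returns `None`, while `"a"` and `"c"` are still present.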


Software architecture

The system uses a client–server architecture. The servers maintain a key–value associative array; the clients populate this array and query it by key. Keys are up to 250 bytes long and values can be at most 1 megabyte in size.

Clients use client-side libraries to contact the servers which, by default, expose their service at port 11211. Both TCP and UDP are supported. Each client knows all servers; the servers do not communicate with each other. If a client wishes to set or read the value corresponding to a certain key, the client's library first computes a hash of the key to determine which server to use. This gives a simple form of sharding and scalable shared-nothing architecture across the servers. The server computes a second hash of the key to determine where to store or read the corresponding value. The servers keep the values in RAM; if a server runs out of RAM, it discards the oldest values. Therefore, clients must treat Memcached as a transitory cache; they cannot assume that data stored in Memcached is still there when they need it. Other databases, such as MemcacheDB and Couchbase Server, provide persistent storage while maintaining Memcached protocol compatibility.

If all client libraries use the same hashing algorithm to determine servers, then clients can read each other's cached data.

A typical deployment has several servers and many clients. However, it is possible to use Memcached on a single computer, acting simultaneously as client and server. The size of its hash table is often very large. It is limited to available memory across all the servers in the cluster of servers in a data center. Where high-volume, wide-audience Web publishing requires it, this may stretch to many gigabytes. Memcached can be equally valuable for situations where either the number of requests for content is high, or the cost of generating a particular piece of content is high.
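The client-side server selection described above can be sketched as follows. This is a minimal illustration, not the algorithm of any particular client library: the server names are hypothetical, and the choice of CRC32 with a simple modulo is an assumption made for brevity.

```python
import zlib

# Hypothetical server list; 11211 is the default port mentioned in the text.
SERVERS = [
    "cache1.example:11211",
    "cache2.example:11211",
    "cache3.example:11211",
]


def pick_server(key, servers=SERVERS):
    """Map a key to one server: hash the key, then take it modulo the server count."""
    key_hash = zlib.crc32(key.encode("utf-8"))
    return servers[key_hash % len(servers)]


# Any client that uses the same server list and the same function
# reaches the same server for a given key.
server_for_user = pick_server("userrow:42")
```

Because the servers never talk to each other, this mapping is the only thing that ties a key to a machine. A plain modulo remaps most keys when the server list changes, which is one reason some client libraries use consistent hashing instead.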


Security

Most deployments of Memcached are within trusted networks where clients may freely connect to any server. However, sometimes Memcached is deployed in untrusted networks or where administrators want to exercise control over the clients that are connecting. For this purpose Memcached can be compiled with optional SASL authentication support. The SASL support requires the binary protocol.

A presentation at BlackHat USA 2010 revealed that a number of large public websites had left Memcached open to inspection, analysis, retrieval, and modification of data.

Even within a trusted organisation, the flat trust model of Memcached may have security implications. For efficient simplicity, all Memcached operations are treated equally. Clients with a valid need for access to low-security entries within the cache gain access to ''all'' entries within the cache, even when these are higher-security and that client has no justifiable need for them. If the cache key can be predicted, guessed or found by exhaustive searching, its cache entry may be retrieved.

Some attempt to isolate setting and reading data may be made in situations such as high-volume web publishing. A farm of outward-facing content servers has ''read'' access to Memcached containing published pages or page components, but no write access. Where new content is published (and is not yet in Memcached), a request is instead sent to content generation servers that are not publicly accessible to create the content unit and add it to Memcached. The content server then retries the retrieval and serves the content outwards.


Used as a DDoS attack vector

In February 2018, Cloudflare reported that misconfigured Memcached servers were used to launch DDoS attacks on a large scale. The Memcached protocol over UDP has a huge amplification factor, of more than 51000. Victims of the DDoS attacks include GitHub, which was flooded with 1.35 Tbit/s peak incoming traffic. This issue was mitigated in Memcached version 1.5.6, which disabled the UDP protocol by default.
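To give a sense of what an amplification factor of this size means, the arithmetic can be sketched as below. The request and response sizes are assumed figures chosen for illustration, not measurements taken from this article, and the bandwidth estimate assumes every reflected request reaches the full factor, which a real attack would not.

```python
def amplification_factor(request_bytes, response_bytes):
    """Bytes delivered to the victim per byte sent by the attacker."""
    return response_bytes / request_bytes


def attacker_bandwidth(target_bits_per_second, factor):
    """Bandwidth the attacker must send to reach the target rate at a given factor."""
    return target_bits_per_second / factor


# Assumed figures: a 15-byte spoofed UDP request that triggers
# a 768,000-byte (750 KiB) response.
factor = amplification_factor(15, 768_000)  # 51200.0

# Upper-bound illustration for the 1.35 Tbit/s peak quoted in the text.
needed_bits_per_second = attacker_bandwidth(1.35e12, factor)
```

Under these assumptions the attacker would need to send only about 26 Mbit/s of spoofed requests.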


Example code

''Note that all functions described on this page are pseudocode only. Memcached calls and programming languages may vary based on the API used.''

Converting database or object creation queries to use Memcached is simple. Typically, when using straight database queries, example code would be as follows:

    function get_foo(int userid)
        data = db_select("SELECT * FROM users WHERE userid = ?", userid)
        return data

After conversion to Memcached, the same call might look like the following:

    function get_foo(int userid)
        /* first try the cache */
        data = memcached_fetch("userrow:" + userid)
        if not data
            /* not found : request database */
            data = db_select("SELECT * FROM users WHERE userid = ?", userid)
            /* then store in cache until next get */
            memcached_add("userrow:" + userid, data)
        end

        return data

The client would first check whether a Memcached value with the unique key "userrow:userid" exists, where userid is some number. If the result does not exist, it would select from the database as usual, and set the unique key using the Memcached API add function call.

However, if only this API call were modified, the server would end up fetching incorrect data following any database update actions: the Memcached "view" of the data would become out of date. Therefore, in addition to creating an "add" call, an update call would also be needed using the Memcached set function.

    function update_foo(int userid, string dbUpdateString)
        /* first update database */
        result = db_execute(dbUpdateString)
        if result
            /* database update successful : fetch data to be stored in cache */
            data = db_select("SELECT * FROM users WHERE userid = ?", userid)
            /* the previous line could also look like data = createDataFromDBString(dbUpdateString) */
            /* then store in cache until next get */
            memcached_set("userrow:" + userid, data)
        end

This call would update the currently cached data to match the new data in the database, assuming the database query succeeds.
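The read and update pattern above can also be written as a runnable sketch. Everything here is a stand-in chosen for illustration: `FakeCache` is a hypothetical in-memory substitute for a Memcached client (it mimics only get, add and set), and `users_table` is a dictionary playing the role of the database.

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a Memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def add(self, key, value):
        # Like the "add" call above: store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def set(self, key, value):
        self._data[key] = value  # store unconditionally
        return True


cache = FakeCache()
users_table = {42: {"userid": 42, "name": "alice"}}  # stand-in for the database
db_reads = 0  # counts how often the "database" is consulted


def get_foo(userid):
    global db_reads
    key = f"userrow:{userid}"
    data = cache.get(key)  # first try the cache
    if data is None:       # not found: ask the database
        db_reads += 1
        row = users_table.get(userid)
        if row is None:
            return None
        data = dict(row)
        cache.add(key, data)  # store in the cache for the next get
    return data


def update_foo(userid, **changes):
    users_table[userid].update(changes)  # first update the database
    cache.set(f"userrow:{userid}", dict(users_table[userid]))  # then refresh the cache


first = get_foo(42)   # miss: reads the database and fills the cache
second = get_foo(42)  # hit: served from the cache
update_foo(42, name="bob")
third = get_foo(42)   # hit: sees the refreshed row
```

Only the first call touches the database, so `db_reads` ends at 1, and the third call returns the updated name without another database read.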
An alternative approach would be to invalidate the cache with the Memcached delete function, so that subsequent fetches result in a cache miss. Similar action would need to be taken when database records were deleted, to maintain either a correct or incomplete cache.

An alternate cache-invalidation strategy is to store a random number in an agreed-upon cache entry and to incorporate this number into all keys that are used to store a particular kind of entry. To invalidate all such entries at once, change the random number. Existing entries (which were stored using the old number) will no longer be referenced and so will eventually expire or be recycled.

    function store_xyz_entry(int key, string value)
        /* Retrieve the random number - use zero if none exists yet.
         * The key-name used here is arbitrary. */
        seed = memcached_fetch(":xyz_seed:")
        if not seed
            seed = 0
        end
        /* Build the key used to store the entry and store it.
         * The key-name used here is also arbitrary. Notice that the "seed" and the user's "key"
         * are stored as separate parts of the constructed hashKey string: ":xyz_data:(seed):(key)."
         * This is not mandatory, but is recommended. */
        string hashKey = sprintf(":xyz_data:%d:%d", seed, key)
        memcached_set(hashKey, value)

    /* "fetch_entry," not shown, follows identical logic to the above. */

    function invalidate_xyz_cache()
        existing_seed = memcached_fetch(":xyz_seed:")
        /* Coin a different random seed */
        do
            seed = rand()
        until seed != existing_seed
        /* Now store it in the agreed-upon place. All future requests will use this number.
         * Therefore, all existing entries become un-referenced and will eventually expire. */
        memcached_set(":xyz_seed:", seed)
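The seed-based invalidation strategy above can be tried out with a runnable sketch. A plain dictionary stands in for Memcached here, which is an assumption for illustration only: a dictionary never expires or recycles entries, so the orphaned entry stays visible in it, whereas a real server would eventually reclaim it. The helper names `current_seed`, `entry_key` and `fetch_xyz_entry` are hypothetical.

```python
import random

cache = {}  # hypothetical stand-in for Memcached

SEED_KEY = ":xyz_seed:"


def current_seed():
    """Return the stored seed, or zero if none exists yet."""
    return cache.get(SEED_KEY, 0)


def entry_key(key):
    """Build the full cache key from the current seed and the caller's key."""
    return f":xyz_data:{current_seed()}:{key}"


def store_xyz_entry(key, value):
    cache[entry_key(key)] = value


def fetch_xyz_entry(key):
    return cache.get(entry_key(key))


def invalidate_xyz_cache():
    """Pick a seed different from the current one and store it."""
    old_seed = current_seed()
    new_seed = old_seed
    while new_seed == old_seed:
        new_seed = random.randrange(1, 2**31)
    cache[SEED_KEY] = new_seed


store_xyz_entry(7, "hello")
before = fetch_xyz_entry(7)  # found under the seed-0 key
invalidate_xyz_cache()
after = fetch_xyz_entry(7)   # looked up under the new seed, so it is a miss
```

After invalidation the old value is still stored under `":xyz_data:0:7"`, but no lookup builds that key any more, which is what makes every such entry unreachable at once.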


Usage

* MySQL - directly supports the Memcached API as of version 5.6.
* Oracle Coherence - directly supports the Memcached API as of version 12.1.3.
* Infinispan - directly supports Memcached.


See also

* Amazon ElastiCache
* Aerospike
* Couchbase Server
* Redis
* Mnesia
* MemcacheDB
* Hazelcast
* Cassandra
* Tarantool
* Ehcache
* Infinispan

