Capacity Optimization
Capacity optimization is a general term for technologies used to improve storage utilization by shrinking stored data. The primary technologies used for capacity optimization are data deduplication and data compression. These are delivered as software or hardware, either integrated with storage systems or delivered as standalone products. Deduplication algorithms look for redundancy in sequences of bytes across comparison windows. Sequences are typically identified with cryptographic hash functions and compared against the history of previously seen sequences; where possible, the first uniquely stored version of a sequence is referenced rather than stored again. Methods for selecting data windows range from fixed-size blocks (for example, 4 KB) to whole-file comparisons, the latter known as single-instance storage (SIS). Capacity optimization generally refers to the use of this kind of technology in a storage system. An example of this kind of system is the Venti storage system in the open-source Plan 9 operating system. There ar ...
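
The reference-instead-of-store idea can be illustrated with a minimal content-addressed block store, in the spirit of (though far simpler than) systems such as Venti: blocks are keyed by a cryptographic hash, so writing the same block twice keeps only one copy. The class name and 4 KB block size below are illustrative assumptions, not part of any particular product.

```python
import hashlib

class BlockStore:
    """Toy content-addressed store: each unique block is kept once, keyed by its hash."""

    def __init__(self):
        self.blocks = {}          # content hash -> block bytes

    def put(self, block: bytes) -> str:
        """Store a block and return its content hash (the 'reference')."""
        ref = hashlib.sha256(block).hexdigest()
        # If an identical block was stored before, this is a no-op.
        self.blocks.setdefault(ref, block)
        return ref

    def get(self, ref: str) -> bytes:
        return self.blocks[ref]

def store_data(store: BlockStore, data: bytes, block_size: int = 4096) -> list[str]:
    """Split data into fixed-size blocks and return the list of block references."""
    return [store.put(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

if __name__ == "__main__":
    store = BlockStore()
    data = b"A" * 8192 + b"B" * 4096      # two identical 4 KB blocks plus one unique block
    refs = store_data(store, data)
    print(len(refs), "block references,", len(store.blocks), "unique blocks stored")
    # -> 3 block references, 2 unique blocks stored
```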


Data Deduplication
In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn lower capital expenditure by reducing the overall amount of storage media required to meet storage capacity needs. It can also be applied to network data transfers to reduce the number of bytes that must be sent. The deduplication process requires comparison of data 'chunks' (also known as 'byte patterns'), which are unique, contiguous blocks of data. These chunks are identified and stored during a process of analysis and compared to other chunks within existing data. Whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduc ...
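
A rough sketch of how deduplication can cut network transfer volume, assuming fixed 4 KB chunks and SHA-256 references; real implementations differ in chunking strategy, reference format, and protocol.

```python
import hashlib

CHUNK_SIZE = 4096          # fixed-size chunking; many systems use variable-size chunks
REF_SIZE = 32              # a SHA-256 reference is 32 bytes

def deduplicated_transfer_size(data: bytes) -> tuple[int, int]:
    """Return (bytes actually sent, bytes saved) if repeated chunks are sent as references."""
    seen = set()
    sent = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        ref = hashlib.sha256(chunk).digest()
        if ref in seen:
            sent += REF_SIZE                  # redundant chunk: send only the reference
        else:
            seen.add(ref)
            sent += len(chunk) + REF_SIZE     # new chunk: send the data plus its reference
    return sent, len(data) - sent

if __name__ == "__main__":
    payload = (b"header" + b"\x00" * 4090) * 100   # 100 identical 4 KB chunks
    sent, saved = deduplicated_transfer_size(payload)
    print(f"sent {sent} of {len(payload)} bytes ({saved} saved)")
```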


Data Compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy; no information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, which is used for error detection and correction, or with line coding, the means for mapping data onto a signal. ...
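
As a small illustration of lossless coding, the sketch below uses Python's standard-library zlib (a DEFLATE encoder/decoder) to show that redundant data compresses well and that decompression restores the original exactly; the sample input is arbitrary.

```python
import zlib

original = b"the quick brown fox " * 200          # highly redundant input

compressed = zlib.compress(original, 9)           # encoder: fewer bits than the original
restored = zlib.decompress(compressed)            # decoder: reverses the process

assert restored == original                       # lossless: no information is lost
print(f"{len(original)} bytes -> {len(compressed)} bytes "
      f"({len(compressed) / len(original):.1%} of original size)")
```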


Cryptographic Hash Functions
A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string of fixed size, n bits) that has special properties desirable for cryptography:
* the probability of a particular n-bit output result (hash value) for a random input string ("message") is 2^(-n) (as for any good hash), so the hash value can be used as a representative of the message;
* finding an input string that matches a given hash value (a 'pre-image') is infeasible, unless the value is selected from a known pre-calculated dictionary ("rainbow table"). The 'resistance' to such a search is quantified as security strength: a cryptographic hash with n bits of hash value is expected to have a 'preimage resistance' strength of n bits. A 'second preimage' resistance strength, with the same expectations, refers to the similar problem of finding a second message that matches a given hash value when one message is already known;
* finding any pair of different mes ...
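
A brief sketch using Python's hashlib shows the fixed n-bit output (n = 256 for SHA-256) and why the digest can serve as a compact representative of a message; the example strings are arbitrary.

```python
import hashlib

msg1 = b"capacity optimization"
msg2 = b"capacity optimisation"        # differs from msg1 by a single character

h1 = hashlib.sha256(msg1).hexdigest()
h2 = hashlib.sha256(msg2).hexdigest()

# The output is always 256 bits (64 hex characters), regardless of input length.
assert len(h1) == len(h2) == 64

# A tiny change in the message yields a completely different hash value,
# so the digest can stand in for the message when checking for duplicates.
print(h1)
print(h2)
```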


Single-instance Storage
Single-instance storage (SIS) is a system's ability to take multiple copies of content and replace them with a single shared copy. It is a means to eliminate data duplication and to increase efficiency. SIS is frequently implemented in file systems, e-mail server software, data backup, and other storage-related computer software. Single-instance storage is a simple variant of data deduplication: while data deduplication may work at a segment or sub-block level, single-instance storage works at the whole-file level and eliminates redundant copies of entire files or e-mail messages.

Concept
In the case of an e-mail server, single-instance storage means that a single copy of a message is held within its database while individual mailboxes access the content through a reference pointer. However, there is a common misconception that the primary benefit of single-instance storage in mail servers is a reduction in disk space requirements. The truth is that its primary benefit is to ...
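
A minimal sketch of the e-mail case, in which each message body is stored once and mailboxes hold only reference pointers; the class and method names are illustrative and do not reflect any particular server's API.

```python
import hashlib
from collections import defaultdict

class MessageStore:
    """Toy single-instance store: one copy of each message body, shared by reference."""

    def __init__(self):
        self.bodies = {}                     # content hash -> message body
        self.mailboxes = defaultdict(list)   # mailbox name -> list of reference pointers

    def deliver(self, recipients: list[str], body: bytes) -> None:
        ref = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(ref, body)    # the body is stored only on first delivery
        for mailbox in recipients:
            self.mailboxes[mailbox].append(ref)

    def read(self, mailbox: str) -> list[bytes]:
        return [self.bodies[ref] for ref in self.mailboxes[mailbox]]

if __name__ == "__main__":
    store = MessageStore()
    store.deliver(["alice", "bob", "carol"], b"Quarterly report attached ...")
    print(len(store.mailboxes), "mailboxes,", len(store.bodies), "stored copy")
    # -> 3 mailboxes, 1 stored copy
```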




WAN Optimization
WAN optimization is a collection of techniques for improving data transfer across wide area networks (WANs). In 2008, the WAN optimization market was estimated at $1 billion and was forecast to grow to $4.4 billion by 2014, according to Gartner, a technology research firm. In 2015, Gartner estimated the WAN optimization market at $1.1 billion. The most common measures of TCP data-transfer efficiency (i.e., optimization) are throughput, bandwidth requirements, latency, protocol optimization, and congestion, as manifested in dropped packets. In addition, the WAN itself can be classified with regard to the distance between endpoints and the amount of data transferred. Two common business WAN topologies are Branch to Headquarters and Data Center to Data Center (DC2DC). In general, "Branch" WAN links are closer, use less bandwidth, support more simultaneous connections, handle smaller and shorter-lived connections, and carry a greater variety of protocols. The ...
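
One reason latency matters as much as raw bandwidth on a WAN: a single TCP connection's throughput is bounded by its window size divided by the round-trip time. The sketch below computes that bound for an illustrative 64 KB window over a few hypothetical RTTs; the figures are examples, not measurements.

```python
WINDOW_BYTES = 64 * 1024          # illustrative TCP receive window (no window scaling)

def max_throughput_mbps(rtt_ms: float, window_bytes: int = WINDOW_BYTES) -> float:
    """Upper bound on single-connection TCP throughput: window size / round-trip time."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000

# Hypothetical round-trip times: campus LAN, branch-to-HQ WAN, intercontinental WAN.
for rtt in (2, 30, 100):
    print(f"RTT {rtt:3d} ms -> at most {max_throughput_mbps(rtt):7.1f} Mbit/s per connection")
```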